Nazar Mammedov

Software Engineer

Retrieval Augmented Generation: a session from Microsoft Reactor


Python + AI: Retrieval Augmented Generation (RAG)

On October 9, 2025, I watched the "Python + AI: Retrieval Augmented Generation" session by Pamela Fox at Microsoft Reactor.
This week the series continues with a session on Vision Models on October 14.
Here are the key ideas I learned from the RAG session last week.

The session had a lot of useful code and tool demonstrations, but I will focus only on concepts here.

Why Do We Need RAG?

  • LLMs are limited: they don't have the most recent, accurate, or complete information on specialized domains.
  • LLMs also lack internal knowledge because they are not trained on private data.
  • The question becomes: how do we integrate domain knowledge?

Two Methods of Solving This Problem

  • Fine-tuning: re-training an LLM on specific data, which can be time-consuming and expensive.
  • Retrieval Augmented Generation (RAG): giving the LLM additional information retrieved from a stored knowledge base.

RAG Simplified

  1. Store domain data in easy-to-retrieve storage, typically a vector database.
  2. When a user asks a question, retrieve contextual information.
  3. Tell the LLM to answer the question using the retrieved information.
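The three steps above can be sketched in plain Python. Everything here is a toy stand-in: `embed` is a fake character-frequency "embedding" (a real system would call an embedding model), and the "vector database" is just a list.

```python
def embed(text: str) -> list[float]:
    # Toy embedding: a character-frequency vector, standing in for a real model.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# Step 1: store domain data as (text, vector) pairs.
docs = ["The parrot is red", "The server room is on floor 3"]
store = [(d, embed(d)) for d in docs]

# Step 2: retrieve the most similar chunk for a question.
def retrieve(question: str) -> str:
    qv = embed(question)
    return max(store, key=lambda pair: cosine(qv, pair[1]))[0]

# Step 3: build a prompt telling the LLM to answer from the retrieved context.
question = "Which floor is the server room on?"
context = retrieve(question)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a real pipeline the embedding call, the vector store, and the final LLM call would each be a service; the shape of the flow stays the same.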

How to Store Domain Knowledge (Example Tools)

  • Store documents if you want clickable citations in your answers — Azure Blob Storage
  • Extract textual data from documents — Azure Document Intelligence
  • Split text into chunks — Python
  • Vectorize chunks using an embedding model — Azure OpenAI
  • Index documents and chunks — Azure AI Search

What is a Good Chunking Approach?

  • Make chunks about 512 tokens, with 25% overlap between consecutive chunks.
  • Keep semantic units, such as tables, in one chunk.
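A minimal sketch of the 512-token / 25%-overlap guidance, approximating tokens by whitespace splitting (a real pipeline would use the embedding model's tokenizer):

```python
def chunk(tokens: list[str], size: int = 512, overlap: int = 128) -> list[list[str]]:
    # Each chunk starts (size - overlap) tokens after the previous one,
    # so consecutive chunks share `overlap` tokens (128 = 25% of 512).
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

tokens = ["tok"] * 1000  # stand-in for a tokenized document
chunks = chunk(tokens)   # chunks start at token 0, 384, 768
```

The overlap means a sentence cut at a chunk boundary still appears whole in the neighboring chunk; keeping semantic units like tables intact needs format-aware splitting on top of this.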

How to Improve RAG Queries

  • Multiturn support — tell the LLM to take previous messages in the conversation into account.
  • Query rewriting — use an LLM to rewrite the user's question into a better search query.
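These two ideas combine naturally: the conversation history is fed to an LLM that rewrites the latest question into a standalone search query. The prompt wording below is my own, not the session's, and the Contoso example conversation is made up for illustration:

```python
def build_rewrite_prompt(history: list[str], question: str) -> str:
    # Ask the model to resolve references ("the second one") against
    # the conversation so the search query works on its own.
    turns = "\n".join(history)
    return (
        "Given the conversation so far, rewrite the user's last question "
        "as a standalone search query.\n\n"
        f"Conversation:\n{turns}\n\n"
        f"Last question: {question}\n"
        "Standalone query:"
    )

history = [
    "user: What plans does Contoso offer?",
    "assistant: Contoso offers Basic and Premium plans.",
]
prompt = build_rewrite_prompt(history, "How much does the second one cost?")
```

The rewritten query (something like "Contoso Premium plan price") is what gets sent to the retriever, instead of the ambiguous follow-up question.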

Problems in Retrieving Information

  • Retrieving based on keyword matches only can miss semantic relationships.
  • Retrieving based on vector searches only can miss documents with exact term matches.

How to Solve Retrieval Quality Problem

  • Adopt a hybrid approach: combine keyword and vector-based search results and re-rank them.
  • Include metadata search if needed to enhance retrieval quality.
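One common way to merge and re-rank the two result lists is Reciprocal Rank Fusion (RRF), which is, for example, what Azure AI Search uses for hybrid queries. The document IDs below are illustrative:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # RRF: score(doc) = sum over result lists of 1 / (k + rank),
    # with rank starting at 1; k=60 is the commonly used constant.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc4"]  # exact-term matches
vector_hits = ["doc1", "doc2", "doc3"]   # semantic matches
fused = rrf([keyword_hits, vector_hits])
# doc1 and doc3 appear in both lists, so they rise to the top
```

Documents found by both retrieval methods accumulate score from each list, which is why hybrid search surfaces them above documents that only one method found.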

Relevant research: https://lnkd.in/e73dwnNX

Is RAG Good for Us? Yes.

  • Despite increasing LLM context window capacity, RAG is still relevant.
  • Relying on long context windows for everything can be slow, expensive, and environmentally costly, and it does not guarantee quality results.
  • Techniques for RAG implementation have improved.

If you are interested in this problem, read more here: https://lnkd.in/erYBMYAW

  • #RAG
  • #AI
  • #ML
  • #vector
  • #reactor
