
Nazar Mammedov
Software Engineer
Retrieval augmented generation: session from MS Reactor
Python + AI: Retrieval Augmented Generation (RAG)
On October 9, 2025 I watched the "Python + AI: Retrieval Augmented Generation" session by Pamela Fox at Microsoft Reactor.
This week the series continues with a session on Vision Models on October 14.
Here are the key ideas I took away from the RAG session last week.
The session included many useful code and tool demonstrations, but I will focus only on the concepts here.
Why Do We Need RAG?
- LLMs are limited: their training data may lack the most recent, most accurate, or most complete information about specialized domains.
- LLMs don't know an organization's internal knowledge because they are not trained on private data.
- The question is: How to integrate domain knowledge?
Two Methods of Solving This Problem
- Fine-tuning: re-training an LLM on domain-specific data; this can be time-consuming and expensive.
- Retrieval Augmented Generation (RAG): giving the LLM additional information at query time from a stored knowledge base.
RAG Simplified
- Store domain data in an easy-to-retrieve storage — vector database.
- When a user asks a question, retrieve contextual information.
- Tell the LLM to answer the question using the retrieved information.
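The three steps above can be sketched in a few lines of Python. Everything here is a toy stand-in: the chunk texts and three-dimensional vectors are made up for illustration, and in a real system both the stored chunks and the user's query would be vectorized by an embedding model before the similarity search.

```python
import math

# Toy "vector database": (chunk text, embedding) pairs. The 3-dim vectors
# are invented; a real system would get them from an embedding model.
KNOWLEDGE = [
    ("The refund policy allows returns within 30 days.", [0.9, 0.1, 0.0]),
    ("Support hours are 9am to 5pm on weekdays.",        [0.1, 0.9, 0.1]),
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(query_vector, k=1):
    """Step 2: fetch the k stored chunks most similar to the query vector."""
    ranked = sorted(KNOWLEDGE, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, query_vector):
    """Step 3: tell the LLM to answer using only the retrieved context."""
    context = "\n".join(retrieve(query_vector))
    return ("Answer the question using ONLY the sources below.\n"
            f"Sources:\n{context}\n\nQuestion: {question}")

# The query vector here is hand-picked to land near the first chunk.
prompt = build_prompt("What is the refund policy?", [1.0, 0.0, 0.0])
```

The prompt is then sent to the LLM, whose answer is grounded in the retrieved chunk rather than its training data.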
How to Store Domain Knowledge (Example Tools)
- Store the original documents if you want clickable citations in your answers — Azure Blob Storage
- Extract textual data from documents — Azure Document Intelligence
- Split text into chunks — Python
- Vectorize chunks using an embedding model — Azure OpenAI
- Index documents and chunks — Azure AI Search
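As a rough illustration of how these steps fit together, here is a stub version of the ingestion pipeline in plain Python. The function names and the one-number "embedding" are invented for this sketch; a real pipeline would call the Azure services named above.

```python
def extract_text(document):
    """Stand-in for Azure Document Intelligence: return the document's text."""
    return document["text"]

def split_into_chunks(text, size=40):
    """Naive character-based splitter, just to show the shape of the step."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def stub_embed(chunk):
    """Placeholder embedding: a real system would call an embedding model."""
    return [float(len(chunk))]

def index_document(document):
    """Build one index record per chunk, keeping a citation to the source."""
    records = []
    for chunk in split_into_chunks(extract_text(document)):
        records.append({"source": document["name"],
                        "text": chunk,
                        "vector": stub_embed(chunk)})
    return records

records = index_document({"name": "policy.pdf",
                          "text": "Returns are accepted within 30 days of purchase."})
```

Keeping the source name on every record is what later makes clickable citations possible.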
What is a Good Chunking Approach?
- Make chunks about 512 tokens, with roughly 25% overlap between adjacent chunks.
- Keep semantic units, such as tables, within a single chunk.
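The sliding-window part of that advice can be sketched as below, assuming the text has already been tokenized. In practice you would count real tokens with the embedding model's tokenizer (e.g. tiktoken) rather than use a plain list, and you would add logic to avoid splitting semantic units.

```python
def chunk_tokens(tokens, chunk_size=512, overlap_ratio=0.25):
    """Split a token list into fixed-size chunks with the given overlap."""
    # With 512 tokens and 25% overlap, each chunk starts 384 tokens after the last.
    step = int(chunk_size * (1 - overlap_ratio))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):  # last chunk reached the end
            break
    return chunks

tokens = list(range(1000))  # stand-in for 1000 real tokens
chunks = chunk_tokens(tokens)
```

The overlap means a sentence cut off at a chunk boundary still appears whole in the neighboring chunk.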
How to Improve RAG Queries
- Multiturn support — tell the LLM to take previous messages in the conversation into account.
- Query rewriting — use an LLM to rewrite the user's question into a better search query.
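Query rewriting is mostly prompt construction: a follow-up like "How much does it cost?" is useless as a search query until pronouns are resolved from the conversation. Here is a sketch of building that rewrite prompt; the actual LLM call is omitted, and the prompt wording is my own, not from the session.

```python
def build_rewrite_prompt(history, question):
    """Ask an LLM (call not shown) to turn a follow-up question into a
    standalone search query, using the prior turns for context."""
    turns = "\n".join(f"{role}: {text}" for role, text in history)
    return ("Rewrite the last user question as a standalone search query, "
            "resolving any pronouns from the conversation.\n"
            f"Conversation:\n{turns}\nuser: {question}\n"
            "Standalone query:")

history = [("user", "Tell me about the Pro plan."),
           ("assistant", "The Pro plan includes priority support.")]
prompt = build_rewrite_prompt(history, "How much does it cost?")
```

The LLM's completion (e.g. a query mentioning the Pro plan explicitly) is what gets sent to the search index, not the raw user question.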
Problems in Retrieving Information
- Retrieving based on keyword matches only can miss semantically related documents.
- Retrieving based on vector search only can miss documents that contain exact term matches.
How to Solve Retrieval Quality Problem
- Adopt a hybrid approach: combine keyword and vector-based search results and re-rank them.
- Include metadata search if needed to enhance retrieval quality.
Relevant research: https://lnkd.in/e73dwnNX
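One standard way to combine the two ranked result lists is Reciprocal Rank Fusion (RRF), which is also what Azure AI Search uses for hybrid queries. A minimal sketch, with invented document IDs (k=60 is the conventional constant):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists into one.
    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents ranked well by multiple retrievers rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_exact_term", "doc_b", "doc_c"]  # from keyword search
vector_hits  = ["doc_semantic", "doc_b", "doc_d"]    # from vector search
fused = rrf_fuse([keyword_hits, vector_hits])
```

Because "doc_b" appears in both lists, it outranks documents that only one retriever found — exactly the behavior the hybrid approach is after.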
Is RAG Good for Us? Yes.
- Despite increasing LLM context window capacity, RAG is still relevant.
- Relying on long context windows for everything can be slow, expensive, and environmentally costly, and result quality is not guaranteed.
- Techniques for RAG implementation have improved.
If you are interested in this problem, read more here: https://lnkd.in/erYBMYAW
#RAG #AI #ML #vector #reactor