Nazar Mammedov

Software Engineer

Retrieval Augmented Generation: a session from Microsoft Reactor


Python + AI: Retrieval Augmented Generation (RAG)

On October 9, 2025, I watched the "Python + AI: Retrieval Augmented Generation" session by Pamela Fox at Microsoft Reactor.
This week the series continues with a session on Vision Models on October 14.
Here are the key ideas I learned from the RAG session last week.

The session had a lot of useful code and tool demonstrations, but I will focus only on concepts here.

Why Do We Need RAG?

  • LLMs are limited: they don't have the most recent, accurate, or complete information on specialized domains.
  • LLMs also lack internal knowledge because they are not trained on private data.
  • The question becomes: how do we integrate domain knowledge?

Two Methods of Solving This Problem

  • Fine-tuning: re-training an LLM on specific data, which can be time-consuming and expensive.
  • Retrieval Augmented Generation (RAG): giving the LLM additional information retrieved from a stored knowledge base.

RAG Simplified

  1. Store domain data in easy-to-retrieve storage, typically a vector database.
  2. When a user asks a question, retrieve contextual information.
  3. Tell the LLM to answer the question using the retrieved information.
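The three steps above can be sketched in plain Python. Everything here is a toy stand-in: `embed` is a fake character-frequency "embedding" (a real system would call an embedding model), and the "vector database" is just a list.

```python
def embed(text: str) -> list[float]:
    # Toy embedding: a character-frequency vector, standing in for a real model.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# Step 1: store domain data as (text, vector) pairs.
docs = ["The parrot is red", "The server room is on floor 3"]
store = [(d, embed(d)) for d in docs]

# Step 2: retrieve the most similar chunk for a question.
def retrieve(question: str) -> str:
    qv = embed(question)
    return max(store, key=lambda pair: cosine(qv, pair[1]))[0]

# Step 3: build a prompt telling the LLM to answer from the retrieved context.
question = "Which floor is the server room on?"
context = retrieve(question)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a real pipeline the embedding call, the vector store, and the final LLM call would each be a service; the shape of the flow stays the same.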

How to Store Domain Knowledge (Example Tools)

  • Store documents if you want clickable citations in your answers — Azure Blob Storage
  • Extract textual data from documents — Azure Document Intelligence
  • Split text into chunks — Python
  • Vectorize chunks using an embedding model — Azure OpenAI
  • Index documents and chunks — Azure AI Search

What is a Good Chunking Approach?

  • Make chunks about 512 tokens, with 25% overlap between consecutive chunks.
  • Keep semantic units, such as tables, in one chunk.
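A minimal sketch of the 512-token / 25%-overlap guidance, approximating tokens by whitespace splitting (a real pipeline would use the embedding model's tokenizer):

```python
def chunk(tokens: list[str], size: int = 512, overlap: int = 128) -> list[list[str]]:
    # Each chunk starts (size - overlap) tokens after the previous one,
    # so consecutive chunks share `overlap` tokens (128 = 25% of 512).
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

tokens = ["tok"] * 1000  # stand-in for a tokenized document
chunks = chunk(tokens)   # chunks start at token 0, 384, 768
```

The overlap means a sentence cut at a chunk boundary still appears whole in the neighboring chunk; keeping semantic units like tables intact needs format-aware splitting on top of this.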

How to Improve RAG Queries

  • Multiturn support — tell the LLM to take previous messages in the conversation into account.
  • Query rewriting — use an LLM to rewrite the user's question into a better search query.
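These two ideas combine naturally: the conversation history is fed to an LLM that rewrites the latest question into a standalone search query. The prompt wording below is my own, not the session's, and the Contoso example conversation is made up for illustration:

```python
def build_rewrite_prompt(history: list[str], question: str) -> str:
    # Ask the model to resolve references ("the second one") against
    # the conversation so the search query works on its own.
    turns = "\n".join(history)
    return (
        "Given the conversation so far, rewrite the user's last question "
        "as a standalone search query.\n\n"
        f"Conversation:\n{turns}\n\n"
        f"Last question: {question}\n"
        "Standalone query:"
    )

history = [
    "user: What plans does Contoso offer?",
    "assistant: Contoso offers Basic and Premium plans.",
]
prompt = build_rewrite_prompt(history, "How much does the second one cost?")
```

The rewritten query (something like "Contoso Premium plan price") is what gets sent to the retriever, instead of the ambiguous follow-up question.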

Problems in Retrieving Information

  • Retrieving based on keyword matches only can miss semantic relationships.
  • Retrieving based on vector searches only can miss documents with exact term matches.

How to Solve Retrieval Quality Problem

  • Adopt a hybrid approach: combine keyword and vector-based search results and re-rank them.
  • Include metadata search if needed to enhance retrieval quality.
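One common way to merge and re-rank the two result lists is Reciprocal Rank Fusion (RRF), which is, for example, what Azure AI Search uses for hybrid queries. The document IDs below are illustrative:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # RRF: score(doc) = sum over result lists of 1 / (k + rank),
    # with rank starting at 1; k=60 is the commonly used constant.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc4"]  # exact-term matches
vector_hits = ["doc1", "doc2", "doc3"]   # semantic matches
fused = rrf([keyword_hits, vector_hits])
# doc1 and doc3 appear in both lists, so they rise to the top
```

Documents found by both retrieval methods accumulate score from each list, which is why hybrid search surfaces them above documents that only one method found.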

Relevant research: https://lnkd.in/e73dwnNX

Is RAG Good for Us? Yes.

  • Despite increasing LLM context window capacity, RAG is still relevant.
  • Relying on long context windows for everything can be slow, expensive, and environmentally costly, and it does not guarantee quality results.
  • Techniques for RAG implementation have improved.

If you are interested in this problem, read more here: https://lnkd.in/erYBMYAW

  • #RAG
  • #AI
  • #ML
  • #vector
  • #reactor
