Discovering Retrieval Augmented Generation

Project Source Code

Get the project source code below, and follow along with the lesson material.

Download Project Source Code

To set up the project on your local machine, please follow the directions provided in the README.md file. If you run into any issues with running the project source code, then feel free to reach out to the author in the course's Discord channel.

  • [00:00 - 00:50] Welcome back. In this lesson, we will learn about retrieval-augmented generation, also called RAG. The problem is that LLMs are limited to their training data, which can be outdated, incomplete, or even false. Let's say we want to learn about climate change. We may ask a question to a large language model, but the model may have picked up misinformation during training. How can we fix that? The idea is to add a retrieval step, where we pull information from a trusted source and use it for generation. Let me show you in the demo app. Here we ask a very specific question about climate change, and what we get is a very precise answer: as you can see, it even includes exact figures, and it is accurate. How is it possible?

    [00:51 - 01:07] Because the model is using reports from the IPCC, a global scientific organisation dedicated to climate change studies. That is what allows our model to construct the answer. We can even ask for a page number, and we can see that on page 24 there is a source for the answer to our question.

    [01:08 - 02:52] Now, how is it possible to make this search? There is a very popular implementation, which is semantic search. Let us understand what it is. It is made possible by embeddings. An embedding is simply a function that converts an input, here a text, into an n-dimensional vector. Here you can see a text embedding converting a string into a vector. A vector is a list of floats, and it is often given a geometric representation as an arrow. For text embeddings, n is somewhere between 100 and 10,000, and you can find many embeddings already trained and available on the internet, or fully managed ones, such as the OpenAI API, for instance. Now, why would we want to convert text into a vector? Embeddings are constructed in such a way that words with close meanings have close vectors. For instance, let's look at the vector for dog, the vector for cat, and the vector for apartment. As we can see with the naked eye, dog and cat look much closer, and we use a quantitative metric, such as cosine similarity, to formalize this proximity. Now, why is this useful for our use case? Let's imagine we have a query about pets. Using the embedding, we build a vector for the query, and then we can look up all the vectors which are close to it. So if we have, for instance, an extract about dogs from our report, we may fetch it, and we get a search based on semantic proximity (a small code sketch of this idea follows after the transcript). Let's look at a second diagram to get a better understanding.

    [02:53 - 03:17] Imagine I'm a user. I start here and ask a question to the LLM app. We then run a semantic search against the vector store, which returns the chunks that are similar to my query, and then we pass everything back to the model, so we get a high-quality answer (a sketch of this flow also follows below). So we've presented the pattern, and in the next lesson we will build the vector store. See you soon!
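
To make the embedding idea from the transcript concrete, here is a minimal sketch in Python. It assumes the OpenAI embeddings API (mentioned in the lesson as one fully managed option), uses `text-embedding-3-small` purely as an example model choice, and uses cosine similarity as the proximity metric; any other embedding provider or metric would work the same way.

```python
import numpy as np
from openai import OpenAI  # assumes the openai>=1.x SDK and an OPENAI_API_KEY env var

client = OpenAI()

def embed(texts: list[str]) -> list[np.ndarray]:
    """Convert each text into an n-dimensional vector (a list of floats)."""
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [np.array(item.embedding) for item in response.data]

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Quantify proximity: values near 1 mean close meanings, near 0 mean unrelated ones."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

dog, cat, apartment = embed(["dog", "cat", "apartment"])
print(cosine_similarity(dog, cat))        # relatively high: close meanings
print(cosine_similarity(dog, apartment))  # lower: unrelated meanings
```

The embedding does all the semantic work here; the similarity function is just a dot product normalized by the vector lengths.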
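And here is a rough sketch of the full flow from the diagram, under the same assumptions. The `VectorStore` interface and the `embed_one` and `llm` callables are hypothetical placeholders, standing in for the vector store we will build in the next lesson and for whatever embedding and chat-completion calls the app uses.

```python
from typing import Callable, Protocol

class VectorStore(Protocol):
    # Hypothetical interface: return the text chunks whose vectors are closest to the query vector.
    def search(self, query_vector: list[float], top_k: int) -> list[str]: ...

def answer_with_rag(
    question: str,
    embed_one: Callable[[str], list[float]],  # e.g. a wrapper around the embed() sketch above
    vector_store: VectorStore,
    llm: Callable[[str], str],                # hypothetical chat-completion wrapper
    top_k: int = 3,
) -> str:
    """Follow the diagram: embed the question, fetch similar chunks, answer from them."""
    query_vector = embed_one(question)                 # 1. turn the question into a vector
    chunks = vector_store.search(query_vector, top_k)  # 2. semantic search in the vector store
    context = "\n\n".join(chunks)                      # 3. hand the retrieved chunks to the model
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)                                 # 4. grounded, high-quality answer
```

Keeping the retrieval step as a plain function like this makes it easy to swap in the real vector store once it is built.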