Word embeddings and nearest neighbors
Get the project source code below, and follow along with the lesson material.
Download Project Source Code
To set up the project on your local machine, please follow the directions provided in the README.md file. If you run into any issues with running the project source code, then feel free to reach out to the author in the course's Discord channel.
[00:00 - 00:05] So how do language models predict? We're going to look at the architecture of a large language model.
[00:06 - 00:12] Here it's the, quote-unquote, macro architecture. We're going to be looking at the large chunks of the large language model, just at a very high level.
[00:13 - 00:17] And then we'll dive deeper and deeper. OK, so we talked before.
[00:18 - 00:26] We said this gray box is the LLM, and it takes in a bunch of input words, and it outputs the next word. All right, so let's break that apart into three different steps.
[00:27 - 00:33] I haven't labeled any of those steps yet. All you see on this slide is just that there are three different steps.
[00:34 - 00:41] For this first step, we're going to take the word. Let's say the input word is "you." So I mentioned that there are input words, plural, but for now let's just simplify.
[00:42 - 00:43] We just have one input word. It's "you."
[00:44 - 00:49] And we want to convert that word into vectors. So let me explain what vectors are.
[00:50 - 00:57] Vectors are an array of numbers. So let's say we have this array of numbers, 0, 1, 0, 0, 0, negative 1.
[00:58 - 01:05] This is just-- sorry, I want to see something real quick here. Oh, goodness.
[01:06 - 01:09] OK, there we go. There we go.
[01:10 - 01:18] So here's a set of numbers. And I'm going to say vectors, but you can think of these vectors as, again, just arrays.
[01:19 - 01:27] Arrays of two numbers in this case. Well, how do we understand or visualize what these arrays look like?
[01:28 - 01:33] Well, the first method is just to treat them as coordinates. So we can plot them here.
[01:34 - 01:41] So you plot these coordinates. You can also consider these points as arrows starting from the origin and ending up at that point.
[01:42 - 01:57] So you can visualize vectors as either points or direction with length, the direction with length being the arrows and the points being the circles that I drew. To simplify, I'm going to refer to vectors as points every so often, at least when I illustrate them.
[01:58 - 02:06] But when I speak, I'll say vectors. But when you think vectors-- sorry, when I say vectors, you can just think points in space.
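To make this concrete, here is a minimal sketch (my own, not from the course's project code) of a two-number array read both as a point and as an arrow from the origin; NumPy is assumed only for the length calculation.

```python
import numpy as np

# A 2-D vector is just an array of two numbers.
v = np.array([0.0, 1.0])    # the vector (0, 1) from the example
w = np.array([0.0, -1.0])   # the vector (0, -1)

# Read as a point: its coordinates on the plane.
print("as a point:", v.tolist())           # [0.0, 1.0]

# Read as an arrow from the origin: a direction with a length.
print("length of v:", np.linalg.norm(v))   # 1.0
print("length of w:", np.linalg.norm(w))   # 1.0
```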
[02:07 - 02:11] OK, so really quickly. Looks like-- so Ken asked, where does EOS come from?
[02:12 - 02:24] End of a document fed into training? So EOS actually occurs only at the very, very end of-- so it's not as granular as a document.
[02:25 - 02:36] What will usually happen is you don't have EOS until the end of a corpus. And so a corpus could be more than just a single document.
[02:37 - 02:41] It could be all of Wikipedia, for example. But none of that really matters in pre-training.
[02:42 - 02:47] And so I'll talk about different stages of training later on as well. But in instruction tuning, this matters a lot.
[02:48 - 02:57] So in instruction tuning, end of sequence is just at the end of a typical response. And that matters because you don't want the chat to be blabbing on and on and on and on.
[02:58 - 03:05] And so that's sort of a problem that was common with the early versions of LLMs, where the model just blabs on and on and on and on.
[03:06 - 03:12] And so we would fix that during instruction tuning. So long story short, end of sequence is end of a very long corpus.
[03:13 - 03:17] Or it's the end of a response you expect the model to give during training. Yeah.
[03:18 - 03:25] So I will talk more about instruction following and instruction fine-tuning later on. Yeah.
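To illustrate the two cases just described, here is a hedged sketch; the token string "<eos>" and the example texts below are invented for illustration and are not taken from the lesson.

```python
EOS = "<eos>"  # placeholder token string, assumed for illustration

# Pre-training: a single EOS at the very end of a long corpus
# (which may span many documents, e.g. all of Wikipedia).
pretraining_corpus = "first document ... last document " + EOS

# Instruction tuning: each target response ends with EOS, which is what
# teaches the model to stop instead of blabbing on.
training_example = {
    "prompt": "Summarize this paragraph.",
    "response": "Here is a short summary. " + EOS,
}

print(training_example["response"])
```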
[03:26 - 03:29] Cool. OK, so back to these slides here.
[03:30 - 03:36] So we've talked about how to visualize these arrays of numbers. Or in other words, these vectors.
[03:37 - 03:41] For now, put aside the visualization. We're going to come back to that later on.
[03:42 - 03:51] But let's say that we have a mapping from words to vectors. We know that "you" maps to 0, 1, "are" maps to 0.7, and "cold" maps to 0, negative 1.
[03:52 - 03:59] I just arbitrarily defined this mapping. And later on, I'll tell you how you can actually create this mapping on your own.
[04:00 - 04:03] But now, this is the mapping. So we saw that our input word was "you."
[04:04 - 04:06] So let's look at "you" in this mapping. We know that "you" is 0, 1.
[04:07 - 04:11] So let's use that. "You" now converts into 0, 1.
[04:12 - 04:22] That process of looking up what our word is-- sorry, looking up the vector for our word is called embedding. In other words, we're converting words into vectors.
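As a minimal sketch of this embed step, here is the toy mapping written as a Python dictionary; the words follow the slide as read aloud, and the second coordinate for "are" is an assumption, since the transcript only gives 0.7.

```python
# Toy word-to-vector mapping from the slide; the second coordinate for
# "are" is assumed, since only 0.7 is given in the transcript.
embeddings = {
    "you":  [0.0, 1.0],
    "are":  [0.0, 0.7],
    "cold": [0.0, -1.0],
}

def embed(word):
    """Step 1: look up the vector (embedding) for a word."""
    return embeddings[word]

print(embed("you"))   # [0.0, 1.0]
```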
[04:23 - 04:27] Now, our next step is to then transform these vectors. And I use the word transform intentionally.
[04:28 - 04:37] This block in the middle, as you can probably imagine, is the transformer that we alluded to earlier. So the second step here is to transform the vectors.
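Putting the macro architecture together, here is a high-level sketch; every function body below is a placeholder standing in for the real components, and the third step (producing the next word) is taken from the earlier description of the gray box's output.

```python
# High-level sketch of the three-step macro architecture.
# All function bodies are placeholders, not the course's actual code.

def embed(words):
    # Step 1: convert each input word into a vector.
    return [[0.0, 1.0] for _ in words]    # dummy vectors

def transform(vectors):
    # Step 2: the transformer turns the input vectors into new vectors.
    return vectors                        # identity stand-in

def predict_next_word(vectors):
    # Step 3: turn the transformed vectors back into the next word.
    return "cold"                         # dummy prediction

print(predict_next_word(transform(embed(["you"]))))
```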