Q&A

Project Source Code

Get the project source code below, and follow along with the lesson material.

Download Project Source Code

To set up the project on your local machine, please follow the directions provided in the README.md file. If you run into any issues with running the project source code, then feel free to reach out to the author in the course's Discord channel.

This lesson preview is part of the Fundamentals of transformers - Live Workshop course and can be unlocked immediately with a single-time purchase. Already have access to this course? Log in here.

  • [00:00 - 00:03] And again, you can ask me any questions if you have any. I'll be happy to answer them.

    [00:04 - 00:18] Let me now go to what folks were talking about earlier. OK, so I took some notes while folks were asking questions during the Q&A. So the first question was about how we actually curate these datasets. The next question was, yeah, what happens to formatting?

    [00:19 - 00:24] We stripped out all the formatting. And then the third was about how do we categorize all this knowledge?

    [00:25 - 00:28] And then the specific question was, like, what's a layer? What's an architecture?

    [00:29 - 00:39] And we have all sorts of these problems. So this is also what Ken had said that we're basically learning about an application with many moving parts.

    [00:40 - 00:50] It's not just one forward pass through a model. So to help categorize all the information that we're getting about LLMs, we should divide all of the information that we know into four different parts.

    [00:51 - 00:59] This is generally-- or these categories apply generally to ML. So this is not specific to LLMs.

    [01:00 - 01:06] This is just how to understand any machine learning model. So for folks in the audience that have been working with ML for a while, this is probably already old news.

    [01:07 - 01:13] But I'm going to talk about it anyways, because I think it's helpful for understanding what's going on. So there are four main compartments of knowledge.

    [01:14 - 01:19] There's data, model, algorithm, and optimization. So data here means what you're training on.

    [01:20 - 01:22] That's pretty straightforward. In LLM case, it's text.

    [01:23 - 01:28] In the Vision Transformer case, it's images. It could be videos, for example.

    [01:29 - 01:32] Whatever data you're trying to learn from. The second is model.

    [01:33 - 01:38] So the model is the majority of what we talked about today. This is how you do inference.

    [01:39 - 01:44] For the linear regression case, the line is the model. In our case, the transformer is the model.

    [01:45 - 01:51] So the model tells us how to actually do prediction. There's a third part, which is the algorithm.

    [01:52 - 01:58] And this algorithm is effectively-- oh, so actually, sorry. Let me make a change here.

    [01:59 - 02:03] That's application. So the third part here is application.

    [02:04 - 02:12] And application means what are you predicting and what are you using that prediction for? The prediction could, for example, be-- in our case, it's the next word.

    [02:13 - 02:18] And then what's the usage of predicting the next word? Well, now you can generate lots and lots of text.

    [02:19 - 02:25] So maybe in other cases, it's to generate an image. Or maybe it's to generate a depth estimation of how far away the objects are in an image.

    [02:26 - 02:31] There are many different applications. And this ties into what your model is predicting specifically.

    [02:32 - 02:37] This last one, optimization, is the expensive part of working with LLMs: how you actually learn.

    [02:38 - 02:47] How do you train this model on the data that you have, for the application you're interested in? So these are the four different general categories of knowledge.
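The four compartments above can be made concrete with the simplest case mentioned in the lesson, linear regression, where the line is the model. This is a minimal illustrative sketch, not code from the course; all numbers and names are made up.

```python
# Mapping the four compartments onto plain linear regression.

# 1. Data: what you're training on (here, points near the line y = 2x + 1;
#    for an LLM it's text, for a Vision Transformer it's images).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# 2. Model: how you do inference (for linear regression, the line itself).
w, b = 0.0, 0.0
predict = lambda x: w * x + b

# 3. Application: what you predict and what the prediction is used for
#    (here, predicting y for a new x; for an LLM, the next word).

# 4. Optimization: how you actually learn (gradient descent on squared error).
lr = 0.05
for _ in range(500):
    grad_w = sum(2 * (predict(x) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (predict(x) - y) for x, y in zip(xs, ys)) / len(xs)
    w, b = w - lr * grad_w, b - lr * grad_b

print(round(w, 2), round(b, 2))  # -> 2.0 1.0, recovering y = 2x + 1
```

Swapping out any one compartment (images instead of text, a transformer instead of a line, next-word prediction instead of y, Adam instead of plain gradient descent) gives a different system, which is why the four-way split is a useful map.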

    [02:48 - 03:00] So now back to our specific example before of RNNs and CNNs and so on and so forth. So a lot of those architectures, like GANs, convolutions, most of that falls under model.

    [03:01 - 03:11] So when you're learning about how a model works, it usually falls under model. Now, the RNN we mentioned, though, mostly doesn't differ in the operations.

    [03:12 - 03:19] Like, yes, it also differs in how you construct different layers together and whatnot. But the most important part of RNN is how it represents information.

    [03:20 - 03:32] So arguably, you could say that falls in the application. I mean, none of these-- you don't have to be precise about which categories you use, but it could be helpful to think about these four categories when you're learning about new pieces of knowledge.

    [03:33 - 03:44] That was possibly confusing, because I rushed through that really quickly. But before I talk about these two, do folks have more questions about what I've talked about here?

    [03:45 - 03:52] Or is there a particular direction you'd like to steer in? Basically, I think from now on out for the next 20 minutes, it's really up to the questions that folks have.

    [03:53 - 04:10] Happy to answer anything. So would the application part of this be-- if we were to participate in the cohort, the application is what we need to bring to the table?

    [04:11 - 04:12] Yeah. Pretty much.

    [04:13 - 04:15] OK. Yeah.

    [04:16 - 04:17] Yep. What do you want to build?

    [04:18 - 04:20] Yeah, pretty much. And what do you want to build?

    [04:21 - 04:27] And then it ties into what the model actually predicts. And you have to find the right combination of the two.

    [04:28 - 04:32] Yeah. Yeah.

    [04:33 - 04:36] Any other questions? I don't call these.

    [04:37 - 04:59] Let's start going. Could you comment on-- one thing I've been curious about is when you do choose to use one of these foundation models and use RAG or something on top of it, is there a way that you scope the output to your particular topic of expertise that you're intending the solution to have?

    [05:00 - 05:21] Because if you want-- I did read a little bit about some people were exploiting a model that was fine-tuned by one company to actually learn information about another company. So are there methods to basically put guardrails on that model?

    [05:22 - 05:25] Yeah. That is a great question, and it's an important one.

    [05:26 - 05:36] So the general idea is, how do we get a model to not output things that we don't want it to output? Maybe we don't want the model to be racist or sexist or anything, right?

    [05:37 - 05:44] Or the second is we don't want it to output any dangerous information. So how do you prevent it from predicting any harmful outputs?

    [05:45 - 05:55] And there are two main ways. One is based on the output: classify it into one of two categories, safe or not safe, and determine whether or not to return that result.
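The first approach, gating the output on a safety classification, can be sketched as below. This is a hypothetical illustration: the keyword check stands in for a real learned safety classifier, and all names here are made up rather than from any actual library.

```python
# Guardrail approach 1: classify the model's output as safe / not safe,
# and only return it if it passes. A real system would use a trained
# classifier here; the keyword list is a stand-in.
BLOCKED_TOPICS = {"weapon", "exploit"}   # illustrative only

def is_safe(text: str) -> bool:
    """Toy safety classifier: flag outputs mentioning blocked topics."""
    return not any(topic in text.lower() for topic in BLOCKED_TOPICS)

def guarded_generate(generate, prompt: str) -> str:
    """Run the model, then gate its output on the safety check."""
    output = generate(prompt)
    return output if is_safe(output) else "I can't help with that."

# Usage with a stand-in generator:
fake_llm = lambda p: "Here is how to build a weapon..."
print(guarded_generate(fake_llm, "anything"))  # -> I can't help with that.
```

The key property is that the guardrail sits outside the model: the base model is unchanged, and the filter decides what is returned.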

    [05:56 - 05:59] And the third one-- sorry. And the second one is to actually fine-tune.

    [06:00 - 06:06] So we mentioned instruction tuning a little bit before. In short, you have three different phases in training.

    [06:07 - 06:09] You have your pre-training. That's the most expensive part.

    [06:10 - 06:16] For the most part, everyone in this call, myself included, doesn't need to pre-train. There are plenty of open-source pre-trained models that are very high quality.

    [06:17 - 06:21] You can just take one off the shelf. The second step is instruction following.

    [06:22 - 06:30] Instruction following you generally want to do on your own for your specific use case. But there are also very high-quality instruction-tuned models available.

    [06:31 - 06:36] The third one, the most important one, would be fine-tuning. And this is usually done with very, very few parameters.

    [06:37 - 06:45] It's done with 0.1% of parameters. And it's in this third step that you then begin to add things, like guardrails.

    [06:46 - 06:59] Fine-tune on responses where you have an undesirable question and force the model to learn to not respond to that question. This would also be the place to fine-tune on data that's specific to your use case.
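The reason this third step is cheap enough to do yourself is the "0.1% of parameters" point above: the base weights stay frozen and only a small adapter is trained. Here is a hypothetical back-of-envelope sketch in the style of low-rank (LoRA-style) fine-tuning; the sizes are illustrative, not from the course.

```python
# Parameter-efficient fine-tuning, by the numbers: freeze the base weight
# matrix and train only a small low-rank update (LoRA-style).
d = 1000                              # hidden size of one base layer
base_params = d * d                   # frozen: 1,000,000 parameters

r = 1                                 # rank of the trainable update A @ B
adapter_params = d * r + r * d        # trainable: A is d x r, B is r x d

fraction = adapter_params / (base_params + adapter_params)
print(f"trainable fraction: {fraction:.2%}")  # -> trainable fraction: 0.20%
```

So guardrail or domain-specific fine-tuning touches on the order of a tenth of a percent of the weights, which is why it is dramatically cheaper than pre-training.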

    [07:00 - 07:05] So is that the equivalent of negative training data? Yeah, pretty much.

    [07:06 - 07:12] There are red-teaming datasets that you can find open source online everywhere. And those would be the datasets you want to use.

    [07:13 - 07:13] Interesting. Yeah.

    [07:14 - 07:18] Yeah. Yeah, good questions.

    [07:19 - 07:30] Yeah, any other questions? Anything out of your minds?

    [07:31 - 07:38] Really appreciate the opportunity to be able to learn some in this format. I think it's really cool.

    [07:39 - 07:42] Yeah, thanks for coming. I'm really happy that I got to meet all of you.

    [07:43 - 08:04] And then I guess one thing I did want to say, too-- so some general comments along the lines of how much do I need to know and whatnot. And a specific phrase was, don't want to hold back other people.

    [08:05 - 08:22] I just want to say, at a general level, I know this is easier said than done, that don't be intimidated by all the mumbo-jumbo jargon that's online everywhere. Like I had mentioned, most of the blog posts that attempt to explain transformers are just copy-pasted diagrams from the original paper.

    [08:23 - 08:26] A, they're not very clear. And B, they're probably wrong anyways.

    [08:27 - 08:42] So I'm hoping that this-- I know this is a lot of content in these last four hours. But I hope that a lot of it was more accessible than other content on the web.

    [08:43 - 08:45] It doesn't have to be intimidating. And you can certainly do it.

    [08:46 - 08:54] The problem is sifting through all the noise that other people have created across the web. And I think that's what I'm hoping to help do for you guys.

    [08:55 - 09:05] Yeah. For what it's worth, Alvin, that's probably the biggest selling point you have.

    [09:06 - 09:14] You've spent four hours of my time on a subject that I know I need to get into. I know I need to learn.

    [09:15 - 09:21] Yeah. Was I tempted by the salary?

    [09:22 - 09:23] Yeah. Yeah.

    [09:24 - 09:28] Yeah. I'm not stupid.

    [09:29 - 09:39] Software development was a second career for me, after 20 years of being in sales.

    [09:40 - 09:46] Oh, wow. When the internet hit, I'm like, I better learn about this internet thing.

    [09:47 - 09:51] And I got a second bachelor's. Wow.

    [09:52 - 10:06] And I made a transition into software development. And I just received my master's from Boston U in 2021.

    [10:07 - 10:11] Congrats. Well, I'm not-- I'm a lifelong learner.

    [10:12 - 10:27] And I need to sift through the noise. Just like you said, you hit the nail on the head with that comment.

    [10:28 - 10:39] I would like-- I've heard talk about Newline and everything. I read those emails too.

    [10:40 - 10:43] Some are funny. Some are like, OK, whatever.

    [10:44 - 11:04] But when I saw this, and I saw the email on this, I'm like, well, this may help me sift through the noise of ML, generative AI. What's Gemini?

    [11:05 - 11:11] I know what ChatGPT is. So it's been very, very helpful.

    [11:12 - 11:23] And I'm extremely interested in learning not only for the money, but because I'm a lifelong learner. And I love to learn.

    [11:24 - 11:28] So that's all I'm going to say about that. Yeah, that's great.

    [11:29 - 11:41] I hope that-- yeah, I like that a lot. That's who I want to be, too.

    [11:42 - 11:48] I aspire to that. Yeah, any other thoughts from folks?

    [11:49 - 12:08] OK. Well, I mean-- Zao, I don't know if you had any closing remarks, but my thoughts here are-- yeah, thank you all for coming.

    [12:09 - 12:15] I mean, I'm really, really happy that I got to meet you all. And hopefully provide some value to all of you.

    [12:16 - 12:23] This is a-- it's a mouthful. There's a lot of stuff in this field, a lot of random stuff in this field.

    [12:24 - 12:53] So maybe my last comment here is that reading Twitter or blog posts or papers is sort of like drinking from a fire hose. And given that I'm sitting in the midst of it and all of you are too, I've found the best technique for filtering out all of the random stuff, and knowing what will stick and what I actually need to learn, is just letting time run its course.

    [12:54 - 13:10] What will happen is after three months or so, if you still hear about that weird technique you heard about three months ago, then it's likely something worth learning about. So my suggestion is not to drink directly from the fire hose, but let time filter out what's worth learning.

    [13:11 - 13:16] And I don't mean wait a year or two years, but wait a month or two. And then see what sticks.

    [13:17 - 13:21] And that's sort of what I do. And even now-- I mean, ChatGPT first came out, what, two years ago, right?

    [13:22 - 13:31] It's only now that I finally understand what the fundamentals are. It's only now that the field as a whole understands what the fundamentals are and what has remained unchanged.

    [13:32 - 13:36] So in short, if things are overwhelming, that's OK. It's overwhelming for everybody.

    [13:37 - 13:47] But you might not need to deal with it all in the moment. Just let time run its course and see what is actually left.

    [13:48 - 13:50] Yeah, so thanks all. Yeah.

    [13:51 - 14:08] And then thank you guys for coming. I guess to address the final concern is that I want to mention something that Ken and Bob brought up.

    [14:09 - 14:17] As part of the course, we'll help you with the cost calculation. So let me give an example.

    [14:18 - 14:35] Originally, as part of the live workshop, we actually had a section on optimization of FLOPs, on the compute side of things. As you can tell, Alvin is a specialist in doing that.

    [14:36 - 15:07] And so we can basically go over-- so if you want to build, let's say, for example, a vertical foundational model or a RAG-based application, we can help calculate a lot of the costs. And then, generally speaking, as an observation from our AI applications, there's CAPEX versus-- there's the initial expenditure, which is capitalization cost, versus variable expenditures.

    [15:08 - 15:37] Generally speaking, as you go from foundational models into more fine-tuning, and then into retrieval-augmented generation, and then AI agents, it shifts from a lot of front-loaded cost into more variable cost. And where someone basically-- let's say, for example, you build a writing tool to help writers, or an SEO writing tool.

    [15:38 - 15:51] And for the SEO writing tool, you basically have a Llama-based inference streaming service in the back end. Or it calls into something like GPT-4.

    [15:52 - 16:02] But those things basically are more variable costs. So as you get subscribers, as you get usage, it basically scales up as it goes.
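The CAPEX-versus-variable-cost point above can be sketched with a toy comparison. Every number below is a made-up placeholder for illustration, not real provider pricing or a figure from the course.

```python
# Hypothetical back-of-envelope cost comparison: up-front (CAPEX-like)
# fine-tuning plus cheap hosted inference, versus a pure pay-as-you-go API.
upfront_finetune = 5000.0        # one-time fine-tuning cost, $ (made up)
hosted_per_1k = 0.0005           # self-hosted inference per 1k tokens, $ (made up)
api_per_1k = 0.002               # pay-as-you-go API per 1k tokens, $ (made up)

def total_cost(tokens_k: float, upfront: float, per_1k: float) -> float:
    """Total spend = fixed up-front cost + variable usage cost."""
    return upfront + tokens_k * per_1k

# At low usage the pure-variable option wins; at high usage the
# up-front-heavy option amortizes its fixed cost and pulls ahead.
for tokens_k in (100_000, 10_000_000):
    api = total_cost(tokens_k, 0.0, api_per_1k)
    finetuned = total_cost(tokens_k, upfront_finetune, hosted_per_1k)
    print(f"{tokens_k:>12,}k tokens  api=${api:,.0f}  finetuned=${finetuned:,.0f}")
```

The shape of the comparison, not the specific numbers, is the point: variable-cost approaches scale with subscribers and usage, while foundational-model work front-loads the spend.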

    [16:03 - 16:13] You don't have to put up the expenditure up front. I think what you're mentioning, Ken, is certainly on the foundational model side, or even on the vertical foundational model side.

    [16:14 - 16:30] Certainly it is true that those models are enormously expensive. But I don't think that's going to necessarily-- we can certainly help build these things, like what Maya mentioned.

    [16:31 - 16:42] But everyone else basically wants to build AI applications as well. So that's just to provide additional context on Ken's question.

    [16:43 - 17:02] So-- [AUDIO OUT] And as a closing remark, we're going to upload this course. And then we're going to ask for all of your individual Discord handles.

    [17:03 - 17:18] The reason why is-- well, if you want to, you can basically engage with us in the Discord channel. And then as you go through the material, as Alvin mentioned, it sometimes takes a little bit to sink in.

    [17:19 - 17:35] And as you engage with the material, rewatch it, and so forth, you can kind of ask questions in the Discord channel. And then I'll follow up with you guys about any individual kind of concerns that you guys may have.

    [17:36 - 17:52] And I'll give you a Calendly link if you want to book a time with me as well. So any comments, questions, or concerns that you guys want to talk about?

    [17:53 - 18:07] I look forward to getting the email from you. OK.

    [18:08 - 18:09] Great. Thank you.

    [18:10 - 18:31] And then we'll-- yeah, I'll do that after the call. And then if you guys enjoyed the conversation, we'll follow up with-- we would appreciate a testimonial or a star rating from you as well.

    [18:32 - 18:41] So we'll follow up with that as well. Sounds great.

    [18:42 - 18:42] Thank you. All right.

    [18:43 - 18:46] Thank you so much, guys. You know, hope you enjoyed it.

    [18:47 - 18:54] So-- Definitely did. Thanks again, Alvin.

    [18:55 - 19:00] Really good job. And look forward to working with you.

    [19:01 - 19:02] Thanks. Yeah, absolutely.

    [19:03 - 19:08] All right. Thank you, guys.

    [19:09 - 19:10] Have a great day, guys. Bye.

    [19:11 - 19:14] Thank you. Yeah.

    [19:15 - 19:21] See you. Bye.

    [19:22 - 19:43] [END PLAYBACK]