Intro

ChatGPT is to LLM, as Kleenex is to tissue

What are LLMs?

Tokens

Demo - Manual LLM inference 

LLM generate text

What LLMs predict

Vectors, intuitively

Word embeddings and nearest neighbors

Demo - Semantic meaning of word embeddings 

The architecture for a Large Language Model

How LLMs predict

Self-attention adds context

Demo - Adding "context" to a vector

MLP transforms

Demo - The necessity of non-linearities

How Transformers predict

Absolute positional encoding

Demo Cons of absolute positional bias

Demo - skip connections

Batch norm

RMS norm

How LLMs use position

Workshop-feedback-qa

How LLMs attend

Modern day transformer architectures

Modern LLM connection to papers

You can take the workshop from anywhere in the world, as long as you
have a computer and an internet connection. You also have the opportunity to ask the instructors questions live.

Live and remote

Learn at your own pace, whenever it's convenient for you. With no
rigid schedule to worry about, you can take the course on your own
terms.

Recorded

Learn by building while you learn the concepts.

Build

Join a [vibrant community](/discord) of other students who
are also learning with Fundamentals of transformers. Ask questions, get feedback and
collaborate with others to take your skills to the next level.

Community

In this workshop, we dive deep into Large Language Models (LLMs) to help you understand, build, and optimize their architecture for real-world applications. LLMs are transforming industries—from customer support to content creation—but understanding how these models work, let alone optimizing them, can be challenging. 

In this comprehensive 9-module series, we cover:

The technical essentials of LLMs, including autoregressive decoding, positional encoding, and multi-head attention
The entire LLM lifecycle, from pretraining on massive datasets to fine-tuning and instruction tuning for specialized tasks
Best practices for evaluating LLMs, identifying bottlenecks, and leveraging state-of-the-art architectures for efficiency and scalability

this workshop includes hours of in-depth instruction, hands-on coding exercises, and access to a community forum for support and discussions. You'll also gain exclusive access to source code templates, an expansive reference library, and downloadable materials for continued learning.

It's taught by [Alvin Wan](https://www.linkedin.com/in/alvinwan/), a Senior Research Scientist at Apple and a PhD student at UC Berkeley with international recognition for his impactful contributions in efficient AI and design. With his practical industry experience and research insights, you’ll be guided from fundamentals to advanced concepts with clarity and precision.

By the end of this workshop, you’ll not only understand how to create and optimize LLMs but also how to apply this knowledge across various applications in tech and business.

Everyone knows chatgpt, but how do modern large language models fully work? The fundamentals start at the transformer. This workshop is a workshop to dymstify the transformer and be able to run through concept to code on how the transformer work. This workshops combines concept at an intutive level, to code, to math all with the intent at providing an end to end understanding at the fundamentals of large language models.

Fundamentals of transformers - Live Workshop

Miniaturized version of Huggingface’s LLM inference utility

Interactive demos to understand word embeddings and model representation capacity

Visualization utilities for real-world LLMs, to understand self-attention

Intro to LLM Basics: Learn foundational concepts, including terminology like models, data, algorithms, and optimization.

Autoregressive Decoding: Grasp how LLMs predict words through conditional generation, supported by manual inference demos.

LLM Prediction Mechanism: Explore LLM architecture, with an intuitive look at vectors and word embeddings.

Semantic Meaning in Embeddings: See how word embeddings represent semantic meaning through nearest neighbors and vector demos.

Transformer Core Mechanics: Unpack the inner workings of a transformer layer, including self-attention and context addition.

Non-Linear Transformations: Discover why non-linearities are essential, supported by hands-on matrix multiplication and MLP demos.

Positional Encoding: Learn absolute and relative positional encoding techniques, plus RMS Norm for positional bias management.

Differences between absolute and relative positional encoding

Attention Mechanisms: Delve into “forward-facing” and multi-head attention to understand attention values.

Advanced Attention: Study grouped-query attention and its importance in handling large data.

Current Transformer Models: Analyze academic and modern transformer diagrams, identifying bottlenecks in today’s LLMs.

Build a Mini LLM Inference Tool: Create a simplified version of Huggingface’s LLM utility to understand LLM operation.

Understand Word Embeddings: Develop interactive demos exploring word embeddings and how models represent words

Visualize Self-Attention: Use visualization tools to understand the role of self-attention in language models.

---
title: Workshop-feedback-qa
privateVideoUrl: https://stream.mux.com/N9702t00jC5Zcu00ArKzNdCwQrXbmlB76mq6kqtL1RLeu4.m3u8
---


Alvin Wan

Okay, got it. How are you guys doing? Do you guys feel like you're getting

Feel free to unmute yourself, talk, and you don't have to just, yeah, clap.

All right, I'm going to go into a little bit of PowerPoint presentation.

Oh, and this part is supposed to be interactive. So don't, not like a lecture.

So feel free to unmute yourself and about do live questions.

So we surveyed over 120 people and to understand what the biggest pains were

and some of the problems with the biggest pains were they're unable to

unable to understand the lingo, be able to not be able to judge what's good or

Be able to apply AI, be able to understand the internals of AI and be able to

build a foundational model yourself and be able to

Build on top of AI. So I'm curious. How do you guys feel after after two hours

of conversation with Alvin? How do you do where would you basically feel like

Now feel free to unmute yourself and just talk

I do feel like I have a better understanding of what's happening a bit more

internally there, but I guess the one thing that I've been dying to learn is

I want to understand how data needs to be prepared to do these things like to

And then potentially to augment it using rag or my understanding is you can

make use of vector databases as well to layer on top of a foundational model.

And the reason it's a little selfish. The reason I'm curious is that I've been

a data guy for a long time supporting machine learning engineers.

So I have a deep understanding of how to prepare data, for example, that might

Audio event detection where you have audio that gets force aligned and then

gets converted into mel bins and then then the training occurs. And so I'm

Support fine tuning LLM's from a data perspective. How did what does that look

like is the inputs are not just like big text files right.

And I've seen some references where it shows like repetitive sequences of

inputs. So it'd be like, I, I am, I am cool. Right.

Yeah, that's a, that's a great question. My understanding is there's a lot of

prepared text files already. And, and so there's rapid jam. There's a lot of

prepared attacks. Alvin, do you want to go into kind of how to prepare the, the

Yeah, it's a good question. We can tack down at the end right here.

Basically, because of how the transformer predicts, you actually don't need to

do anything too special. So you did mention, yeah, we do have repetitive text.

But this will take a little bit more explaining. You have some of the, you

actually have some of the core concepts already needed to explain this, but you

actually don't. Yeah, the only thing you need to prepare is, for example,

pieces or strings that don't contribute to the meaning of the text. Right. So

there's not much to compare in terms of the format of the text. So, for example

, but we can talk about that at the end too. I think it's a great question. And

Anyone else. How do you guys feel about kind of about being able to understand

The punctuation formatting is really mysterious to me. I mean, because Alvin,

you just, you've been doing fantastic. Alvin, by the way, you obviously have a

deep mastery and great exposition skills.

I'm not just saying that because we both got our PhDs Berkeley, but whatever.

Anyway, I just find that because you just said to my dad that some of the

formatting gets stripped out the formatting stripped out.

However, as the form that didn't get preserved in this auto regressive model.

knew it, but what was really helpful in the presentation so far is just there's

an outer loop and an inner loop and it's not just like a deep neural network

It needs it's really an app and it's I think that's something that bothers me,

LLM. Okay, it's a model. No, it's really not what it is. It's an app and it's

a bunch of tricks and a bunch of loops and some of it is inner products and

some of it is matrix algebra. It's just is a lot more to what is a lot more

finesse and human contribution than your sort of the press would lead you to

Yeah, what would help me a little bit was I read a post on GPD without the Wolf

Yeah, I read that one. That was fantastic because he's basically a genius and

he said he doesn't understand how it works. I thought that was just a really

post. Yeah, I thought what was really valuable is that at particular section,

this is just the recipe on how they figured it out and it doesn't work any

other way and he doesn't know why. So it's I don't know, like cooking like

there are certain steps that you don't know why it works, but you still do that

step and about tastes good in the end. Yeah.

But there are transformless approaches now that are yielding decent output. It

's definitely not the transformers and this the be all end all. It's something

Kodak for our huge corpus and I can believe that works a lot better than the

automatic programming that people are trying to get out of it because

there are only a finite number of ways to say the knowledge that humanity has

much more vast and more creative or more unique in a way. So I'm not surprised