Welcome to
AI Bootcamp 2
AI Engineering in the Enterprise
Course Syllabus and Content
Week 1
Byte-Level Models & Sampling Decoders
3 Units
- 01 Tokenization deep dive: Byte-level language modeling vs. traditional tokenization
- Learn how byte-level models process raw UTF-8 bytes directly, with a vocabulary size of 256
- Understand how this approach removes the need for subword tokenizers like BPE or SentencePiece
- Compare byte-level models to tokenized models with larger vocabularies (e.g., 30k–50k tokens)
- Analyze the trade-offs between the two approaches in terms of simplicity, multilingual text handling, model size, and performance (a minimal byte-level sketch follows below)
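A minimal sketch of the byte-level idea in plain Python: the raw UTF-8 bytes serve directly as token IDs, so the vocabulary is fixed at 256 and no trained tokenizer is needed.

```python
# Byte-level "tokenization": raw UTF-8 bytes are the token IDs.
text = "héllo, 世界"
byte_tokens = list(text.encode("utf-8"))
print(byte_tokens)                                   # every ID is in range(256)
assert all(t < 256 for t in byte_tokens)             # vocabulary size is fixed at 256
assert bytes(byte_tokens).decode("utf-8") == text    # lossless for any language, no OOV
```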
- 02 State-of-the-art decoders
- Explore decoding strategies that influence LLM output diversity and fluency
- Top-k sampling
- Learn how Top-k sampling truncates the output distribution to the k most likely tokens (e.g., k=16)
- Understand how Top-k sampling balances creativity and control, and why it’s especially effective with small vocabularies such as a byte-level model’s 256 tokens
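A minimal NumPy sketch of Top-k sampling (the k=16 default mirrors the example above; everything else is illustrative):

```python
import numpy as np

def top_k_sample(logits: np.ndarray, k: int = 16, rng=None) -> int:
    """Truncate the distribution to the k most likely tokens, then sample."""
    rng = rng or np.random.default_rng()
    top = np.argsort(logits)[-k:]                     # indices of the k largest logits
    probs = np.exp(logits[top] - logits[top].max())   # softmax over the truncated set
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))

print(top_k_sample(np.random.randn(256)))             # e.g., a byte-level vocabulary
```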
- Nucleus (Top-p) sampling
- Learn how Nucleus (Top-p) sampling dynamically includes tokens up to a cumulative probability p (e.g., p=0.9)
- Understand how Top-p sampling produces more adaptive and coherent completions than Top-k, especially in unpredictable generation tasks
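The same sketch adapted to Nucleus sampling; note how the size of the candidate set now adapts to the shape of the distribution:

```python
import numpy as np

def top_p_sample(logits: np.ndarray, p: float = 0.9, rng=None) -> int:
    """Sample from the smallest token set whose cumulative probability reaches p."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                   # tokens, most likely first
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    nucleus = order[:cutoff]                          # size adapts per step
    return int(rng.choice(nucleus, p=probs[nucleus] / probs[nucleus].sum()))

print(top_p_sample(np.random.randn(256)))
```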
- Beam search
- Learn how Beam search keeps multiple candidate completions in parallel and scores them to select the most likely overall path
- Understand why Beam search is useful for deterministic outputs (e.g., code, structured data) and why it can lead to repetitive or bland completions in open-ended generation
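A toy Beam search sketch; `toy_lm` is a deterministic stand-in for a real model’s next-token log-probabilities:

```python
import numpy as np

def beam_search(next_log_probs, beam_width: int = 4, length: int = 8):
    """Keep the beam_width highest-scoring partial sequences at every step."""
    beams = [([], 0.0)]                                # (token sequence, total log-prob)
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            lp = next_log_probs(seq)                   # model's next-token log-probs
            for tok in np.argsort(lp)[-beam_width:]:   # expand with the best tokens
                candidates.append((seq + [int(tok)], score + float(lp[tok])))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]                                 # most likely overall path

toy_lm = lambda seq: np.log(np.random.default_rng(len(seq)).dirichlet(np.ones(256)))
print(beam_search(toy_lm))
```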
- Speculative decoding (OpenAI-style)
- Learn how Speculative decoding speeds up inference by letting a small model propose multiple token candidates in parallel, which a larger model verifies
- Understand how speculative decoding works internally and why it is gaining popularity in production systems like Groq and OpenAI APIs
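A much-simplified control-flow sketch of speculative decoding; `draft_next` and `target_next` are hypothetical greedy-next-token callables, and real systems verify all draft positions in one batched target pass with stochastic acceptance:

```python
def speculative_step(tokens, draft_next, target_next, n_draft: int = 4):
    """One speculative round: the small draft model proposes, the target verifies."""
    proposal = []
    for _ in range(n_draft):                       # cheap sequential draft steps
        proposal.append(draft_next(tokens + proposal))
    accepted = []
    for tok in proposal:
        expected = target_next(tokens + accepted)  # large model's own choice
        accepted.append(expected)                  # agree -> keep the draft token;
        if expected != tok:                        # disagree -> keep the target's
            break                                  # correction and stop the round
    return tokens + accepted
```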
- 03 Mini-lab: Compare decoding methods on a complex prompt
- Run the same input prompt using Top-k, Top-p, and Beam search decoding
- Measure differences in diversity, accuracy, repetition, and latency across the methods
- Discuss which strategy works best for each context and explain why
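One possible lab harness using Hugging Face transformers; the model choice and prompt are illustrative, and diversity/repetition metrics would be layered on top:

```python
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("Explain why the sky is blue:", return_tensors="pt")

configs = {
    "top-k": dict(do_sample=True, top_k=16),
    "top-p": dict(do_sample=True, top_p=0.9),
    "beam":  dict(num_beams=5, do_sample=False),
}
for name, kwargs in configs.items():
    start = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=60, **kwargs)
    print(f"--- {name} ({time.perf_counter() - start:.2f}s) ---")
    print(tok.decode(out[0], skip_special_tokens=True))
```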
Week 2
Markov Chains & Reinforcement Learning Foundations
4 Units
- 01 Markov Decision Processes (MDPs) as LLM analogies
- Learn how token generation in LLMs can be framed as a Markov process
- Understand the key components of an MDP (states, actions, transition dynamics, rewards) and how they map conceptually to autoregressive decoding (see the sketch below)
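A conceptual sketch of the mapping (illustrative, not a training setup): the state is the token prefix, the action is the next token, and reward typically arrives only at the end of the sequence:

```python
from dataclasses import dataclass

@dataclass
class DecodingMDP:
    eos_token: str = "<eos>"

    def step(self, state: tuple[str, ...], action: str):
        next_state = state + (action,)   # transition is deterministic: append the token
        reward = 0.0                     # sparse; usually scored at sequence end
        done = action == self.eos_token
        return next_state, reward, done

print(DecodingMDP().step(("The", "sky"), "is"))
```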
- 02 Monte Carlo vs Temporal Difference (TD) learning
- Explore how Monte Carlo methods learn from complete episodes while TD methods bootstrap from one-step estimates, and what each implies for learning from sequences
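The two tabular update rules side by side (a sketch over a dict-based value function V):

```python
def mc_update(V, state, observed_return, alpha=0.1):
    """Monte Carlo: wait for the episode to end, then use the full return."""
    V[state] += alpha * (observed_return - V[state])

def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.99):
    """TD(0): update immediately, bootstrapping from the next state's estimate."""
    V[state] += alpha * (reward + gamma * V[next_state] - V[state])

V = {"s0": 0.0, "s1": 0.5}
mc_update(V, "s0", observed_return=1.0)
td0_update(V, "s0", reward=0.0, next_state="s1")
print(V)
```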
- 03 Q-learning & Policy Gradients (conceptual overview)
- Learn the concept of Q-learning as a method to estimate how good an action (token) is in a specific context (prompt state)
- Learn the concept of Policy gradients as a method to directly optimize the probability distribution over actions to maximize long-term reward
- Understand how Q-learning and Policy gradients form the basis of RLHF, DPO, and advanced training techniques for aligning LLM behavior
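Minimal sketches of both ideas (tabular Q-learning and a REINFORCE-style objective; in practice an autodiff framework differentiates the latter):

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Tabular Q-learning: estimate how good action (token) a is in state (prompt) s."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next].values()) - Q[s][a])

def reinforce_loss(log_probs, rewards):
    """Scale each sampled action's log-prob by its reward-to-go; minimizing this
    pushes probability mass toward actions that earned high long-term reward."""
    returns = np.cumsum(rewards[::-1])[::-1]          # reward-to-go per step
    return float(-(np.array(log_probs) * returns).sum())

Q = {"s": {"a": 0.0}, "s'": {"a": 0.5}}
q_update(Q, "s", "a", r=1.0, s_next="s'")
print(Q["s"]["a"], reinforce_loss([-0.2, -1.1], [0.0, 1.0]))
```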
- 04 RL in decoding, CoT prompting, and feedback loops
- Understand how RL ideas can be applied at inference time, without any training, by introducing dynamic feedback loops
- Apply reward scoring or confidence thresholds to adjust CoT (Chain-of-Thought) reasoning steps
- Use external tools (e.g., validators or search APIs) as part of a feedback loop that rewards correct or complete answers
- Understand how RL concepts power speculative decoding verification, scratchpad agents, and dynamic rerouting during generation
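A training-free feedback loop in miniature; `generate` and `validate` are hypothetical stand-ins for an LLM call and an external checker (calculator, unit test, search API):

```python
import random

def best_of_n(prompt: str, generate, validate, n: int = 4) -> str:
    """Sample n chain-of-thought candidates, score each externally, keep the best."""
    candidates = [generate(prompt) for _ in range(n)]   # diverse CoT samples
    scored = [(validate(c), c) for c in candidates]     # external reward signal
    return max(scored)[1]                               # highest-reward answer

print(best_of_n("17 * 23 = ?",
                generate=lambda p: str(random.randint(380, 400)),
                validate=lambda a: 1.0 if a == "391" else 0.0))
```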
Week 3
Advanced Retrieval Methods
9 Units
- 01 Cartridge-based retrieval (self-study distillation)
- Learn how to modularize retrieval into topic- or task-specific “cartridges.”
- Understand that cartridges are pre-distilled context sets for self-querying agents
- Study how this approach is inspired by OpenAI’s retrieval plugin and LangChain’s retriever routers
- See how cartridges improve retrieval precision by narrowing memory to high-relevance windows
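A toy sketch of the cartridge pattern; every name here is an illustrative stand-in, and a real router would be a classifier or retriever router rather than keyword matching:

```python
# A "cartridge" is a pre-distilled, topic-scoped context set.
cartridges = {
    "billing":  ["Invoices are issued monthly.", "Refunds take 5 business days."],
    "security": ["API keys rotate every 90 days.", "SSO is enforced for admins."],
}

def route(query: str) -> str:
    return "billing" if any(w in query.lower() for w in ("refund", "invoice")) else "security"

def retrieve(query: str, k: int = 1) -> list[str]:
    docs = cartridges[route(query)]            # search only the selected cartridge
    overlap = lambda d: sum(w in d.lower() for w in query.lower().split())
    return sorted(docs, key=overlap, reverse=True)[:k]

print(retrieve("How long do refunds take?"))   # ['Refunds take 5 business days.']
```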
- 02 Late interaction methods (ColQwen-Omni, audio+image chunks)
- Study late interaction architectures (like ColQwen-Omni) that separate dense retrieval from deep semantic fusion
- Explore how these models support chunking and retrieval over image, audio, and video-text combinations using attention-based fusion at scoring time
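The core of late interaction is MaxSim-style scoring, sketched here for one query and one document (random vectors stand in for real token embeddings):

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Each query token vector matches its best document token vector;
    semantic fusion happens at scoring time, not at indexing time."""
    sims = query_vecs @ doc_vecs.T           # (n_query_tokens, n_doc_tokens)
    return float(sims.max(axis=1).sum())     # best document match per query token

q = np.random.randn(8, 128)                  # 8 query token embeddings
d = np.random.randn(40, 128)                 # 40 document chunk embeddings
print(maxsim_score(q, d))
```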
- 03 Multi-vector DB vs standard DB
- Understand how multi-vector retrieval systems (e.g., ColBERT-style late-interaction indexes, Turbopuffer) store multiple vectors per document to support fine-grained relevance
- Contrast this with standard single-vector-per-doc retrieval (e.g., FAISS), and learn when multi-vector setups are worth the extra complexity
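For contrast, the standard single-vector-per-document pattern with FAISS (random vectors stand in for real embeddings):

```python
import faiss
import numpy as np

dim = 128
docs = np.random.randn(1000, dim).astype("float32")   # one vector per document
index = faiss.IndexFlatL2(dim)                        # exact L2 nearest-neighbor index
index.add(docs)

query = np.random.randn(1, dim).astype("float32")
distances, ids = index.search(query, 5)               # top-5 nearest documents
print(ids)
```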
- 04 Query routing logic and memory-index hybrids
- Implement index routing systems where queries are conditionally routed (a routing sketch follows this list):
- short factual query → lexical index
- long reasoning query → dense retriever
- visual question → image embedding index
- Learn how to fuse local memory with global vector stores for agentic long-term retrieval
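The routing sketch referenced above; the heuristic classifier is a deliberately simple stand-in for a learned router:

```python
def route_query(query: str, has_image: bool = False) -> str:
    """Send each query to the index best suited to its shape."""
    if has_image:
        return "image_embedding_index"   # visual question -> image embeddings
    if len(query.split()) <= 6:
        return "lexical_index"           # short factual query -> keyword/BM25
    return "dense_retriever"             # long reasoning query -> dense vectors

print(route_query("capital of France"))
print(route_query("walk me through why attention cost grows quadratically with context"))
```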
- 05 Contrastive loss vs triplet loss
- Compare the two core objectives used for fine-tuning retrievers
- Understand how each behaves in hard-negative-rich domains like code or finance
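Both objectives in miniature (NumPy sketches over single embedding vectors; real training batches these):

```python
import numpy as np

def contrastive_infonce(q, pos, negs, temp=0.05):
    """InfoNCE-style contrastive loss: one positive against many negatives."""
    logits = np.array([q @ pos] + [q @ n for n in negs]) / temp
    logits -= logits.max()                              # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))

def triplet_loss(anchor, pos, neg, margin=0.2):
    """Triplet loss: keep the positive closer than the negative by a margin."""
    return max(0.0, np.linalg.norm(anchor - pos) - np.linalg.norm(anchor - neg) + margin)

rng = np.random.default_rng(0)
q, pos, neg = rng.standard_normal((3, 64))
print(contrastive_infonce(q, pos, [neg]), triplet_loss(q, pos, neg))
```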
- 06 Tri-encoder vs cross-encoder performance trade-offs
- Explore the architectural trade-offs between bi-/tri-encoders and cross-encoders
- Learn when to use hybrid systems (e.g., bi-encoder retrieval + cross-encoder reranking)
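The hybrid pattern with sentence-transformers; the checkpoints named are common public models, shown as examples:

```python
from sentence_transformers import CrossEncoder, SentenceTransformer, util

docs = ["Cats are small domesticated felines.",
        "Stock prices fell sharply on Monday.",
        "Dogs are loyal companion animals."]
query = "facts about felines"

bi = SentenceTransformer("all-MiniLM-L6-v2")          # cheap recall stage
hits = util.semantic_search(bi.encode(query, convert_to_tensor=True),
                            bi.encode(docs, convert_to_tensor=True), top_k=2)[0]

cross = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")   # precise rerank stage
pairs = [(query, docs[h["corpus_id"]]) for h in hits]
scores = cross.predict(pairs)                         # joint query-document attention
print(max(zip(scores, pairs))[1][1])                  # best reranked document
```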
- 07 Triplet-loss fundamentals and semi-hard negative mining
- Dive into triplet formation strategies, focusing on how to find semi-hard negatives (similar but incorrect results that challenge the model)
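A sketch of the selection rule: semi-hard negatives sit farther from the anchor than the positive but still inside the margin, so they yield a useful non-zero gradient:

```python
import numpy as np

def semi_hard_negatives(anchor, positive, candidates, margin=0.2):
    d_pos = np.linalg.norm(anchor - positive)
    dists = np.linalg.norm(candidates - anchor, axis=1)
    mask = (dists > d_pos) & (dists < d_pos + margin)   # harder than easy, not hardest
    return candidates[mask]

rng = np.random.default_rng(0)
anchor, positive = rng.standard_normal((2, 32))
pool = rng.standard_normal((100, 32))
print(len(semi_hard_negatives(anchor, positive, pool)))
```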
- 08 Cohere Rerank API & SBERT fine-tuning (sbert.net, Hugging Face)
- Learn to use off-the-shelf rerankers like Cohere’s API or fine-tune SBERT models to optimize document ranking post-retrieval
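A minimal call against Cohere’s Rerank endpoint (the API key is a placeholder and the model name may need updating to whatever Cohere currently ships; check their docs):

```python
import cohere

co = cohere.Client("YOUR_API_KEY")                     # placeholder key
response = co.rerank(
    model="rerank-english-v3.0",                       # example model name
    query="How do I rotate API keys?",
    documents=["API keys rotate every 90 days.", "Invoices are issued monthly."],
    top_n=1,
)
for result in response.results:
    print(result.index, result.relevance_score)        # ranked doc position + score
```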
- 09 Hard-negative mining strategies
- Implement pipelines that automatically surface confusing negatives
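One way such a pipeline can look, sketched with dot-product scores: rank with the current model and keep the top-scoring documents that are not labeled relevant:

```python
import numpy as np

def mine_hard_negatives(query_vec, doc_vecs, relevant_ids, n=5):
    """Return the n highest-ranked documents that are NOT true positives,
    i.e., the confusing negatives the next training round should learn from."""
    scores = doc_vecs @ query_vec                 # current model's ranking
    ranked = np.argsort(scores)[::-1]
    return [int(i) for i in ranked if int(i) not in relevant_ids][:n]

rng = np.random.default_rng(0)
print(mine_hard_negatives(rng.standard_normal(64),
                          rng.standard_normal((200, 64)),
                          relevant_ids={3, 7}))
```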