Welcome to

AI Bootcamp 2

AI Engineering in the Enterprise

Course Syllabus and Content

Week 1

Byte-Level Models & Sampling Decoders

3 Units

  • 01
    Tokenization deep dive - Byte-level language modeling vs traditional tokenization
     
    • Learn how byte-level models process raw UTF-8 bytes directly, with a vocabulary size of 256
    • Understand how this approach removes the need for subword tokenizers like BPE or SentencePiece
    • Compare byte-level models to tokenized models with larger vocabularies (e.g., 30k–50k tokens)
    • Analyze the trade-offs between the two approaches in terms of simplicity, multilingual handling, model size, and performance (a minimal encoding sketch follows this unit)
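
To make the byte-level idea concrete, here is a minimal Python sketch; it only shows that a byte-level "tokenizer" is plain UTF-8 encoding, and the function names are illustrative rather than from any particular library.

```python
# A byte-level "tokenizer" is just UTF-8 encoding: the vocabulary is the
# 256 possible byte values, with no merge rules and no out-of-vocabulary tokens.

def byte_encode(text: str) -> list[int]:
    """Map text to token IDs 0-255 by taking its raw UTF-8 bytes."""
    return list(text.encode("utf-8"))

def byte_decode(ids: list[int]) -> str:
    """Invert the mapping; 'replace' guards against truncated multi-byte chars."""
    return bytes(ids).decode("utf-8", errors="replace")

for s in ["hello", "héllo", "你好"]:
    ids = byte_encode(s)
    # ASCII needs 1 byte per char, accented Latin 2, CJK typically 3, so
    # byte-level sequences get longer for non-English text.
    print(f"{s!r}: {len(s)} chars -> {len(ids)} tokens {ids}")
    assert byte_decode(ids) == s
```

The sequence-length inflation visible in the output is the main cost of the 256-entry vocabulary, and it is the flip side of never needing BPE or SentencePiece.
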
  • 02
    State-of-the-art decoders
     
    • Explore decoding strategies that influence LLM output diversity and fluency

    • Top-k sampling

      • Learn how Top-k sampling truncates the output distribution to the k most likely tokens (e.g., k=16)
      • Understand how Top-k sampling balances creativity and control, and why it’s especially effective with small vocabularies such as a byte-level model’s 256 byte values (see the sketch below)
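
A minimal sketch of the truncation step, assuming raw logits from some model; numpy only, with illustrative names.

```python
import numpy as np

def top_k_sample(logits: np.ndarray, k: int = 16, rng=None) -> int:
    """Sample a token ID from only the k highest-probability entries."""
    rng = rng or np.random.default_rng()
    top = np.argpartition(logits, -k)[-k:]           # indices of the k largest logits
    probs = np.exp(logits[top] - logits[top].max())  # stable softmax over survivors
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))

rng = np.random.default_rng(0)
logits = rng.normal(size=256)                        # toy 256-entry byte vocabulary
print([top_k_sample(logits, k=16, rng=rng) for _ in range(5)])
```
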
    • Nucleus (Top-p) sampling

      • Learn how Nucleus (Top-p) sampling dynamically includes tokens up to a cumulative probability p (e.g., p=0.9)
      • Understand how Top-p sampling produces more adaptive and coherent completions than Top-k, especially in unpredictable generation tasks
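
The same shape of sketch for nucleus sampling; again numpy only, with the cutoff computed fresh on every call.

```python
import numpy as np

def top_p_sample(logits: np.ndarray, p: float = 0.9, rng=None) -> int:
    """Sample from the smallest token set whose cumulative probability >= p."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                 # tokens from most to least likely
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), p)) + 1
    keep = order[:cutoff]                           # just enough tokens to reach p
    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))

rng = np.random.default_rng(0)
print(top_p_sample(rng.normal(size=256), p=0.9, rng=rng))
```

Because the cutoff depends on the distribution's shape, a confident model keeps only a handful of tokens while an uncertain one keeps many; that is exactly the adaptivity this unit contrasts with Top-k's fixed k.
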
    • Beam search

      • Learn how Beam search keeps multiple candidate completions in parallel and scores them to select the most likely overall path
      • Understand why Beam search is useful for deterministic outputs (e.g., code, structured data) and why it can lead to repetitive or bland completions in open-ended generation
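
A compact, self-contained sketch of the bookkeeping, with a toy log-probability table standing in for a real model's forward pass.

```python
import numpy as np

def beam_search(step_logprobs, beam_width=3, max_len=5, eos=0):
    """Keep the beam_width best-scoring partial sequences at every step."""
    beams = [([], 0.0)]                          # (token sequence, total log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:           # finished hypotheses carry over
                candidates.append((seq, score))
                continue
            lp = step_logprobs(seq)              # model's next-token log-probs
            for tok in np.argsort(lp)[-beam_width:]:
                candidates.append((seq + [int(tok)], score + float(lp[tok])))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

# Toy "model": a fixed table of logits indexed by sequence length.
table = np.random.default_rng(1).normal(size=(6, 6))
def toy_logprobs(seq):
    logits = table[len(seq)]
    return logits - np.log(np.exp(logits).sum())

for seq, score in beam_search(toy_logprobs):
    print(seq, round(score, 2))
```

Note in the output how the surviving beams often share a long prefix; that collapse toward a single high-probability path is the repetitiveness the unit warns about for open-ended generation.
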
    • Speculative decoding (OpenAI-style)

      • Learn how Speculative decoding speeds up inference by letting a small draft model cheaply propose a run of candidate tokens, which the larger model then verifies in a single parallel forward pass
      • Understand how speculative decoding works internally and why it is gaining popularity in production systems such as Groq and the OpenAI APIs
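
The accept/reject rule below is a toy rendering of standard speculative sampling (the draft model proposes, the target model verifies); both "models" are stand-in random distributions, and in a real system the verification loop is a single batched forward pass of the large model.

```python
import numpy as np

rng = np.random.default_rng(0)
V = 8  # toy vocabulary size

def target_dist(prefix):
    """Stand-in next-token distribution of the large (target) model."""
    g = np.random.default_rng(sum(prefix) + 7 * len(prefix))
    p = g.random(V)
    return p / p.sum()

def draft_dist(prefix):
    """Stand-in small model: the target smoothed toward uniform."""
    return 0.7 * target_dist(prefix) + 0.3 / V

def speculative_step(ctx, gamma=4):
    drafted = []
    for _ in range(gamma):                        # cheap sequential drafting
        q = draft_dist(ctx + drafted)
        drafted.append(int(rng.choice(V, p=q)))
    out = []
    for i, x in enumerate(drafted):               # conceptually ONE parallel verify pass
        prefix = ctx + drafted[:i]
        p, q = target_dist(prefix), draft_dist(prefix)
        if rng.random() < min(1.0, p[x] / q[x]):  # accept where the target agrees enough
            out.append(x)
        else:
            resid = np.maximum(p - q, 0.0)        # resample from the residual mass
            out.append(int(rng.choice(V, p=resid / resid.sum())))
            break                                 # drop everything after a rejection
    return out

print(speculative_step([1, 2, 3]))                # several tokens per target-model call
```

This accept/resample rule preserves the target model's output distribution exactly; the speedup comes from emitting several tokens per expensive model call whenever the draft is a good guesser.
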
  • 03
    Mini-lab - Compare decoding methods on a complex prompt
     
    • Run the same input prompt using Top-k, Top-p, and Beam search decoding
    • Measure differences in diversity, accuracy, repetition, and latency across the methods
    • Discuss which strategy works best for each context and explain why
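
A possible starting harness for the lab, assuming the Hugging Face transformers package with distilgpt2 as a stand-in model; the diversity metric is a deliberately crude placeholder.

```python
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
prompt = "Explain why the sky is blue, step by step:"
inputs = tok(prompt, return_tensors="pt")

configs = {
    "top-k": dict(do_sample=True, top_k=16),
    "top-p": dict(do_sample=True, top_p=0.9, top_k=0),  # top_k=0 disables the default cutoff
    "beam":  dict(do_sample=False, num_beams=5),
}
for name, cfg in configs.items():
    start = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=60,
                         pad_token_id=tok.eos_token_id, **cfg)
    latency = time.perf_counter() - start
    new_tokens = out[0][inputs["input_ids"].shape[1]:].tolist()
    diversity = len(set(new_tokens)) / max(len(new_tokens), 1)  # crude distinct-token ratio
    print(f"--- {name}: {latency:.2f}s, distinct-token ratio {diversity:.2f} ---")
    print(tok.decode(out[0], skip_special_tokens=True), "\n")
```
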
Week 2

Markov Chains & Reinforcement Learning Foundations

4 Units

  • 01
    Markov Decision Processes (MDP) as LLM analogies
     
    • Learn how token generation in LLMs can be framed as a Markov process
    • Understand the key components of an MDP: states, actions, transition dynamics, rewards, and the discount factor
    • Understand how these map conceptually to autoregressive decoding, where the prompt plus the tokens generated so far is the state and the next token is the action (see the sketch below)
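
One way to pin the mapping down is a small annotated class; nothing below is a standard API, just named stand-ins for the unit's concepts.

```python
from dataclasses import dataclass, field

@dataclass
class LMDecodingMDP:
    state: list[int] = field(default_factory=list)   # prompt + tokens generated so far

    def actions(self, vocab_size: int = 256) -> range:
        """Action space: every token ID the model could emit next."""
        return range(vocab_size)

    def step(self, action: int) -> "LMDecodingMDP":
        """Transition dynamics: a deterministic append of the chosen token."""
        return LMDecodingMDP(self.state + [action])

# Reward is typically sparse: zero at each step, then a score on the finished
# sequence (e.g. from an RLHF reward model). The LM itself plays the policy,
# with pi(action | state) = softmax(logits(state)).

s = LMDecodingMDP([72, 105])          # "Hi" as byte tokens
print(s.step(33).state)               # appending "!" gives [72, 105, 33]
```
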
  • 02
    Monte Carlo vs Temporal Difference (TD) learning
     
    • Explore Monte Carlo methods, which learn from complete episodes, and Temporal Difference (TD) methods, which bootstrap from step-by-step value estimates, as two ways of learning from sequences (contrasted in the sketch below)
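
The contrast is easiest to see side by side on a toy episodic chain; everything below is a self-contained sketch, not course-provided lab code.

```python
import numpy as np

# States 0..4, reward -1 per step, episode ends at state 4. Both methods
# estimate V(s), the expected return from each state, from the same episodes.
rng = np.random.default_rng(0)

def episode():
    s, traj = 0, []
    while s < 4:
        s_next = min(s + int(rng.integers(1, 3)), 4)   # move 1 or 2 states forward
        traj.append((s, -1.0, s_next))
        s = s_next
    return traj

V_mc, V_td = np.zeros(5), np.zeros(5)
alpha, gamma = 0.1, 1.0
for _ in range(2000):
    traj = episode()
    # Monte Carlo: wait for the full return G, then update every visited state.
    G = 0.0
    for s, r, s_next in reversed(traj):
        G = r + gamma * G
        V_mc[s] += alpha * (G - V_mc[s])
    # TD(0): bootstrap from the current estimate of the next state, step by step.
    for s, r, s_next in traj:
        V_td[s] += alpha * (r + gamma * V_td[s_next] - V_td[s])

print("MC:", V_mc.round(2), " TD:", V_td.round(2))   # both converge to similar values
```
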
  • 03
    Q-learning & Policy Gradients (conceptual overview)
     
    • Learn the concept of Q-learning as a method to estimate how good an action (token) is in a specific context (prompt state)
    • Learn the concept of Policy gradients as a method to directly optimize the probability distribution over actions to maximize long-term reward
    • Understand how Q-learning and Policy gradients form the basis of RLHF, DPO, and advanced training techniques for aligning LLM behavior
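
The two update rules fit in a few lines as tabular sketches; real LLM training replaces the tables with networks, but the math has the same shape.

```python
import numpy as np

alpha, gamma = 0.1, 0.99

def q_update(Q, s, a, r, s_next):
    """Q-learning: nudge Q(s, a) toward reward + discounted best next action."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

def pg_update(logits, s, a, G):
    """REINFORCE-style policy gradient: raise log pi(a|s) in proportion to return G."""
    probs = np.exp(logits[s] - logits[s].max())
    probs /= probs.sum()
    grad = -probs                     # d log pi(a|s) / d logits = onehot(a) - probs
    grad[a] += 1.0
    logits[s] += alpha * G * grad

Q = np.zeros((3, 4))                  # 3 toy "prompt states" x 4 candidate tokens
logits = np.zeros((3, 4))
q_update(Q, s=0, a=2, r=1.0, s_next=1)
pg_update(logits, s=0, a=2, G=1.0)
print(Q[0], logits[0])                # both now prefer action 2 in state 0
```
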
  • 04
    RL in decoding, CoT prompting, and feedback loops
     
    • Understand how RL ideas can be applied at inference time, without any additional training, by introducing dynamic feedback

      • Apply reward scoring or confidence thresholds to adjust CoT (Chain-of-Thought) reasoning steps
      • Use external tools (e.g., validators or search APIs) as part of a feedback loop that rewards correct or complete answers
      • Understand how RL concepts power speculative decoding verification, scratchpad agents, and dynamic rerouting during generation
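
A hedged sketch of the training-free pattern: sample several chain-of-thought candidates, score each with an external verifier, and keep the best. Both generate_cot and verify are hypothetical stand-ins, not a real model or API.

```python
import random

def generate_cot(prompt: str, seed: int) -> str:
    """Stand-in for one sampled chain-of-thought completion from an LLM."""
    random.seed(seed)
    return f"{prompt} -> reasoning path #{seed} (score hint {random.random():.2f})"

def verify(candidate: str) -> float:
    """Stand-in reward: in practice a validator, unit test, or search-API check."""
    return float(candidate.split("hint ")[1].rstrip(")"))

def best_of_n(prompt: str, n: int = 4) -> str:
    candidates = [generate_cot(prompt, seed) for seed in range(n)]
    # The reward signal steers selection at inference time; no weights change.
    return max(candidates, key=verify)

print(best_of_n("Is 1729 a sum of two cubes?"))
```
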
Week 3

Advanced Retrieval Methods

9 Units

  • 01
    Cartridge-based retrieval (self-study distillation)
     
    • Learn how to modularize retrieval into topic- or task-specific “cartridges.”
    • Understand that cartridges are pre-distilled context sets for self-querying agents
    • Study how this approach is inspired by OpenAI’s retrieval plugin and LangChain’s retriever routers
    • See how cartridges improve retrieval precision by narrowing memory to high-relevance windows
  • 02
    Late interaction methods (ColQwen-Omni, audio+image chunks)
     
    • Study late interaction architectures (like ColQwen-Omni) that separate dense retrieval from deep semantic fusion
    • Explore how these models support chunking and retrieval over image, audio, and video-text combinations using attention-based fusion at scoring time
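
The core scoring primitive in this family is ColBERT-style MaxSim; below is a minimal numpy version, with random unit vectors standing in for per-token embeddings from any modality.

```python
import numpy as np

def maxsim_score(q_vecs: np.ndarray, d_vecs: np.ndarray) -> float:
    """Late interaction: q_vecs (Lq, dim) and d_vecs (Ld, dim) keep one vector
    per token; for each query token take its best document match, then sum."""
    sims = q_vecs @ d_vecs.T              # (Lq, Ld) token-to-token similarities
    return float(sims.max(axis=1).sum())

rng = np.random.default_rng(0)
def rand_unit(n, dim=8):
    v = rng.normal(size=(n, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

query = rand_unit(4)                      # 4 query-token embeddings
docs = [rand_unit(12), rand_unit(30)]     # per-token embeddings for two documents
print([round(maxsim_score(query, d), 3) for d in docs])
```

The point to notice is that each side is still encoded independently (so documents can be pre-indexed), while the fine-grained interaction happens only at scoring time.
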
  • 03
    Multi-vector DB vs standard DB
     
    • Understand how multi-vector databases (e.g., ColBERT, Turbopuffer) store multiple vectors per document to support fine-grained relevance
    • Contrast this with standard single-vector-per-doc retrieval (e.g., FAISS), and learn when multi-vector setups are worth the extra complexity
  • 04
    Query routing logic and memory-index hybrids
     
    • Implement index routing systems where queries are conditionally routed:

      • short factual query → lexical index
      • long reasoning query → dense retriever
      • visual question → image embedding index
    • Learn how to fuse local memory with global vector stores for agentic long-term retrieval
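
A hedged sketch of the conditional routing shape; the heuristics and index names are illustrative placeholders, not a production routing policy.

```python
def route(query: str, has_image: bool = False) -> str:
    if has_image:
        return "image_embedding_index"     # visual question -> image embeddings
    if len(query.split()) <= 6:
        return "lexical_index"             # short factual query -> BM25 / keywords
    return "dense_retriever"               # long reasoning query -> dense vectors

INDEXES = {
    "lexical_index":         lambda q: f"BM25 hits for {q!r}",
    "dense_retriever":       lambda q: f"kNN over dense vectors for {q!r}",
    "image_embedding_index": lambda q: f"CLIP-style lookup for {q!r}",
}

for q, img in [("Who wrote Dune?", False),
               ("Walk me through why the attention mask matters in this trace", False),
               ("What breed is this dog?", True)]:
    idx = route(q, has_image=img)
    print(f"{idx:24s} -> {INDEXES[idx](q)}")
```
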

  • 05
    Contrastive loss vs triplet loss
     
    • Compare the two core objectives used for fine-tuning retrievers
    • Understand how each behaves in hard-negative-rich domains like code or finance
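
Both objectives fit in a few lines on toy embeddings; the margins below are illustrative defaults, not tuned values.

```python
import numpy as np

def contrastive_loss(a, b, label, margin=0.5):
    """Pairwise: pull positives together (label=1), push negatives (label=0)
    apart until they clear the margin."""
    d = np.linalg.norm(a - b)
    return label * d**2 + (1 - label) * max(0.0, margin - d) ** 2

def triplet_loss(anchor, pos, neg, margin=0.2):
    """Relative: the positive must beat the negative by at least `margin`."""
    d_pos = np.linalg.norm(anchor - pos)
    d_neg = np.linalg.norm(anchor - neg)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
a, p, n = (v / np.linalg.norm(v) for v in rng.normal(size=(3, 8)))
print("contrastive(+):", round(contrastive_loss(a, p, 1), 3))
print("contrastive(-):", round(contrastive_loss(a, n, 0), 3))
print("triplet:", round(triplet_loss(a, p, n), 3))
```

The practical difference: contrastive loss judges each pair in isolation against an absolute margin, while triplet loss only constrains the relative ordering, which matters in hard-negative-rich domains where absolute distances are hard to calibrate.
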
  • 06
    Tri-encoder vs cross-encoder performance trade-offs
     
    • Explore the architectural trade-offs between bi-/tri-encoders and cross-encoders
    • Learn when to use hybrid systems (e.g., bi-encoder retrieval + cross-encoder reranking)
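
The two-stage shape in miniature, with stand-in scorers; the point is where the compute goes, not the scoring functions themselves.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, CORPUS = 16, 1000
doc_vecs = rng.normal(size=(CORPUS, DIM))
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

def bi_encode(query: str) -> np.ndarray:
    """Stand-in bi-encoder: embeds the query independently of any document."""
    g = np.random.default_rng(len(query))
    v = g.normal(size=DIM)
    return v / np.linalg.norm(v)

def cross_score(query: str, doc_id: int) -> float:
    """Stand-in cross-encoder: jointly scores (query, doc); costly per pair."""
    return float(np.random.default_rng(doc_id + len(query)).random())

query = "how do I rotate API keys safely?"
top50 = np.argsort(doc_vecs @ bi_encode(query))[-50:]   # stage 1: cheap dot products over 1,000 docs
reranked = sorted(top50, key=lambda d: cross_score(query, int(d)), reverse=True)
print([int(d) for d in reranked[:5]])                   # stage 2: the slow scorer runs only 50 times
```

The hybrid wins because the expensive pairwise model runs 50 times instead of 1,000, while the final ordering still benefits from its joint query-document attention.
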
  • 07
    Triplet-loss fundamentals and semi-hard negative mining
     
    • Dive into triplet formation strategies, focusing on how to mine semi-hard negatives (similar but incorrect results that challenge the model); a selection sketch follows this unit
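
The semi-hard condition d(a, p) < d(a, n) < d(a, p) + margin translates directly into a mask over candidate negatives; a minimal numpy sketch, with random vectors standing in for embeddings.

```python
import numpy as np

def semi_hard_negatives(anchor, positive, candidates, margin=0.2):
    """Return candidate indices that rank below the positive but are within
    `margin` of it: far enough to be wrong, close enough to teach the model."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(candidates - anchor, axis=1)
    return np.flatnonzero((d_neg > d_pos) & (d_neg < d_pos + margin))

rng = np.random.default_rng(0)
anchor, positive = rng.normal(size=(2, 8))
candidates = rng.normal(size=(100, 8))
print(semi_hard_negatives(anchor, positive, candidates, margin=1.0))
```
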
  • 08
    Cohere Rerank API & SBERT fine-tuning (sbert.net, Hugging Face)
     
    • Learn to use off-the-shelf rerankers like Cohere’s API or fine-tune SBERT models to optimize document ranking post-retrieval
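
A minimal SBERT reranking example, assuming the sentence-transformers package and the public ms-marco cross-encoder checkpoint from Hugging Face; the Cohere path is analogous through its hosted rerank endpoint.

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
query = "how to rotate database credentials"
docs = [
    "Rotating credentials regularly limits the blast radius of a leak.",
    "Our cafeteria menu rotates weekly between three cuisines.",
    "Use a secrets manager to automate database password rotation.",
]
scores = model.predict([(query, d) for d in docs])   # one joint pass per (query, doc) pair
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:7.2f}  {doc}")
```
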
  • 09
    Hard-negative mining strategies
     
    • Implement pipelines that automatically surface confusing negatives
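
A hedged pipeline sketch: score the corpus with the current retriever and keep its top-ranked documents that are not labeled relevant; those confusing negatives become the next round of training triplets. The vectors and labels below are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
N_DOCS, DIM = 500, 16
doc_vecs = rng.normal(size=(N_DOCS, DIM))
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

def mine_hard_negatives(query_vec, positive_ids, k=20, n_neg=4):
    scores = doc_vecs @ query_vec                   # score corpus with the current model
    top = np.argsort(scores)[::-1][:k]              # its k best-scoring documents
    hard = [int(d) for d in top if int(d) not in positive_ids]
    return hard[:n_neg]                             # highest-scoring wrong answers

q = rng.normal(size=DIM)
q /= np.linalg.norm(q)
print(mine_hard_negatives(q, positive_ids={3, 42}))
```
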