Welcome to AI Bootcamp 2: AI Engineering in the Enterprise
Course Syllabus and Content
Advanced RAG with Multi-Media RAG
1 Unit
- 01 Advanced RAG with Multi-Media RAG
Advanced RAG Reranker Training & Triplet Fundamentals
- Learn contrastive loss vs triplet loss approaches for training retrievers
- Understand bi-encoder vs cross-encoder performance trade-offs
- Master triplet-loss fundamentals and semi-hard negative mining strategies
- Fine-tune rerankers using Cohere Rerank API & SBERT (sbert.net, Hugging Face); see the sketch below
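A minimal sketch of the triplet-loss setup described above, using the sentence-transformers (SBERT) training API. The base model, the toy triplet, and the hyperparameters are illustrative placeholders; real training data would come from mined (semi-)hard negatives.

```python
# Hedged sketch: fine-tune a bi-encoder retriever with triplet loss (SBERT).
# The base model and the single toy triplet are placeholders.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

# Each example is (anchor query, positive passage, negative passage).
train_examples = [
    InputExample(texts=[
        "What is triplet loss?",
        "Triplet loss pulls anchors toward positives and away from negatives.",
        "The Eiffel Tower is located in Paris.",
    ]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Enforces: dist(anchor, positive) + margin < dist(anchor, negative)
train_loss = losses.TripletLoss(model=model, triplet_margin=0.5)

model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=1, warmup_steps=10)
```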
Multimodal & Metadata RAG
- Index and query images, tables, and structured JSON using ColQwen-Omni (ColPali-based late interaction for audio, video, and visual documents)
- Implement metadata filtering, short vs long-term indices, and query routing logic
Cartridges RAG Technique
- Learn how Cartridges compress large corpora into small, trainable KV-cache structures for efficient retrieval (~39x less memory, ~26x faster)
- Master the Self-Study training approach using synthetic Q&A and context distillation for generalized question answering
Cartridge-Based Retrieval
- Learn modular retrieval systems with topic-specific "cartridges" for precision memory routing
Late Interaction Methods
- Study architectures like ColQwen-Omni that combine multimodal (text, audio, image) retrieval using late interaction fusion
Multi-Vector vs Single-Vector Retrieval
- Compare ColBERT/Turbopuffer vs FAISS, and understand trade-offs in granularity, accuracy, and inference cost (see the sketch below)
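An illustrative contrast between the two styles, with random arrays standing in for real encoder outputs: single-vector search stores one embedding per document (FAISS here), while multi-vector late interaction keeps one embedding per token and scores with ColBERT-style MaxSim.

```python
# Hedged sketch: single-vector (FAISS) vs multi-vector (MaxSim) scoring.
# Random vectors stand in for real encoder outputs.
import numpy as np
import faiss

dim, n_docs = 128, 1000

# Single-vector: one embedding per document, one similarity per document.
doc_vecs = np.random.rand(n_docs, dim).astype("float32")
faiss.normalize_L2(doc_vecs)
index = faiss.IndexFlatIP(dim)            # exact inner-product search
index.add(doc_vecs)

query_vec = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query_vec)
scores, ids = index.search(query_vec, 5)  # top-5 documents

# Multi-vector: one embedding per token, aggregated with MaxSim.
def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """For each query token, take its best-matching doc token, then sum."""
    sim = query_tokens @ doc_tokens.T      # (n_q_tokens, n_d_tokens)
    return float(sim.max(axis=1).sum())

q_tok = np.random.rand(8, dim)             # 8 query-token embeddings
d_tok = np.random.rand(120, dim)           # 120 document-token embeddings
print(ids[0], maxsim(q_tok, d_tok))
```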
Query Routing & Hybrid Memory Systems
- Explore dynamic routing between lexical, dense, and multimodal indexes
Loss Functions for Retriever Training
- Compare contrastive loss vs triplet loss, and learn about semi-hard negative mining (see the sketch below)
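A small sketch of semi-hard negative selection: a negative counts as semi-hard when it is farther from the anchor than the positive but still inside the margin, i.e. d(a, p) < d(a, n) < d(a, p) + margin. The vectors below are random placeholders.

```python
# Hedged sketch: pick semi-hard negatives for triplet training.
import numpy as np

def semi_hard_negatives(anchor, positive, candidates, margin):
    """Indices of candidates with d(a, p) < d(a, n) < d(a, p) + margin."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(candidates - anchor, axis=1)
    return np.where((d_neg > d_pos) & (d_neg < d_pos + margin))[0]

rng = np.random.default_rng(0)
anchor, positive = rng.normal(size=32), rng.normal(size=32)
candidates = rng.normal(size=(100, 32))     # pool of in-batch negatives
print(semi_hard_negatives(anchor, positive, candidates, margin=1.0))
```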
Reranker Tuning with SBERT or APIs
- Fine-tune rerankers (SBERT, Cohere API), evaluate with MRR/nDCG, and integrate them into retrieval loops (see the sketch below)
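A minimal reranking-plus-evaluation sketch with the SBERT CrossEncoder and a toy MRR computation; the model name, query, and passages are placeholders.

```python
# Hedged sketch: rerank candidates with an SBERT cross-encoder, score with MRR.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How does triplet loss work?"
candidates = [
    "Triplet loss pulls anchors toward positives and away from negatives.",
    "FAISS is a library for efficient similarity search.",
    "Semi-hard negatives lie just inside the triplet margin.",
]
scores = reranker.predict([(query, passage) for passage in candidates])
ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)

def mean_reciprocal_rank(runs):
    """runs: per-query 0/1 relevance labels, already in ranked order."""
    total = 0.0
    for labels in runs:
        for rank, relevant in enumerate(labels, start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(runs)

print(ranked[0], mean_reciprocal_rank([[1, 0, 0], [0, 1, 0]]))
```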
Exercises: Advanced RAG Techniques
- Implement triplet loss vs contrastive loss for reranker training with semi-hard negative mining
- Build multimodal RAG systems with images, tables, and query routing
- Compare single-vector (FAISS) vs multi-vector (ColBERT) retrieval
- Create cartridge-based RAG with topic-specific memory routing
Advanced AI-Evals & Monitoring
1 Unit
- 01 Advanced AI-Evals & Monitoring
Advanced AI-Evals & Monitoring
- Scale LLM-as-judge evaluation to bulk multimodal outputs
- Build dashboards comparing judge accuracy vs IR metrics
- Implement automatic build gates that fail when accuracy drops below 95%
Agent Failure Analysis Deep Dive
- Create transition-state heatmaps & tool-state visualizations
- Construct failure matrices with LLM-based classification
- Develop systematic debugging workflows
Enhancing RAG with Contextual Retrieval Recipes
- Use Instructor-driven synthetic data (Anthropic GitHub)
- Integrate web-search solutions (e.g., exa.ai)
- Apply Logfire and Braintrust augmentations
- Implement Cohere reranker + advanced logging
Advanced Synthetic & Statistical Validation
- Generate persona-varied synthetic questions (angry/confused personas) and rewrite questions for better retrieval
- Perform embedding-diversity checks and JSONL corpus structuring
- Work with multi-vector databases
- Build a parallel experimentation harness using ThreadPoolExecutor (see the sketch below)
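A small sketch of the parallel experimentation harness; run_experiment is a hypothetical stand-in for an actual retrieval-plus-evaluation call.

```python
# Hedged sketch: evaluate many configurations concurrently with ThreadPoolExecutor.
from concurrent.futures import ThreadPoolExecutor, as_completed
import random
import time

def run_experiment(config: dict) -> dict:
    """Hypothetical stand-in for a retrieval + LLM-judge evaluation call."""
    time.sleep(random.uniform(0.1, 0.3))       # simulate network latency
    return {"config": config, "score": random.random()}

configs = [{"chunk_size": c, "top_k": k} for c in (256, 512) for k in (3, 5, 10)]

results = []
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(run_experiment, cfg) for cfg in configs]
    for future in as_completed(futures):
        results.append(future.result())

print(max(results, key=lambda r: r["score"]))
```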
Strategic Feedback Collection
- Collect different types of feedback; prefer binary feedback (thumbs up/down) over star ratings
- Distinguish between two segment types: lack of data vs lack of capabilities
- Address common but fixable capability issues
Dynamic Prompting & Validation
- Build a dynamic UI with chain-of-thought wrapping using XML or streaming
- Incorporate regex validators (e.g., catching fake emails generated by the LLM); see the sketch below
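A lightweight validator of the kind described above; the list of suspicious domains is an illustrative choice.

```python
# Hedged sketch: regex validator that flags placeholder-looking emails in LLM output.
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SUSPICIOUS_DOMAINS = {"example.com", "test.com", "email.com"}

def flag_fake_emails(llm_output: str) -> list[str]:
    """Return emails whose domain looks fabricated, for review before display."""
    return [email for email in EMAIL_RE.findall(llm_output)
            if email.split("@")[1].lower() in SUSPICIOUS_DOMAINS]

print(flag_fake_emails("Please contact jane.doe@example.com for a refund."))
```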
Data Segmentation & Prioritization
- Segment data based on patterns
- Apply the Expected Value formula: Impact × Percentage of Queries × Probability of Success (see the sketch below)
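The prioritization formula above, as a tiny scoring helper; the segment names and numbers are invented for illustration.

```python
# Hedged sketch: rank query segments by Expected Value =
# Impact x Percentage of Queries x Probability of Success.
segments = [
    {"name": "pricing questions",   "impact": 8, "pct_queries": 0.25, "p_success": 0.90},
    {"name": "multi-hop lookups",   "impact": 9, "pct_queries": 0.05, "p_success": 0.40},
    {"name": "greeting small talk", "impact": 2, "pct_queries": 0.30, "p_success": 0.95},
]

def expected_value(segment: dict) -> float:
    return segment["impact"] * segment["pct_queries"] * segment["p_success"]

for segment in sorted(segments, key=expected_value, reverse=True):
    print(f"{segment['name']}: EV = {expected_value(segment):.2f}")
```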
Topic Discovery with BERTopic
- Configure and apply BERTopic for unsupervised topic discovery
- Set up the embedding model, UMAP, and HDBSCAN for effective clustering (see the sketch after this list)
- Visualize topic similarities and relationships
- Analyze satisfaction scores by topic to identify pain points
- Create matrices showing relationship between topics and satisfaction
- Identify the "danger zone" of high-volume, low-satisfaction query areas
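A sketch of the BERTopic setup described above, plus a satisfaction-by-topic summary; the queries and thumbs-up/down labels are synthetic placeholders, and the UMAP/HDBSCAN parameters are common defaults rather than prescribed values.

```python
# Hedged sketch: BERTopic with explicit embedding, UMAP, and HDBSCAN models,
# then satisfaction by topic to spot the high-volume, low-satisfaction "danger zone".
import random
import pandas as pd
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from umap import UMAP
from hdbscan import HDBSCAN

# Placeholder queries and thumbs-up/down labels standing in for real logs.
themes = ["billing refunds", "password reset", "shipping delays", "api errors"]
docs = [f"question about {t}, case {i}" for i in range(50) for t in themes]
satisfaction = [random.randint(0, 1) for _ in docs]

topic_model = BERTopic(
    embedding_model=SentenceTransformer("all-MiniLM-L6-v2"),
    umap_model=UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric="cosine"),
    hdbscan_model=HDBSCAN(min_cluster_size=10, metric="euclidean",
                          cluster_selection_method="eom", prediction_data=True),
)
topics, _ = topic_model.fit_transform(docs)

df = pd.DataFrame({"topic": topics, "satisfied": satisfaction})
summary = df.groupby("topic").agg(volume=("satisfied", "size"),
                                  satisfaction=("satisfied", "mean"))
print(summary.sort_values(["satisfaction", "volume"], ascending=[True, False]))
```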
Persona-Driven Synthetic Queries
- Generate diverse queries (angry, curious, confused users) to stress-test retrieval and summarization pipelines
Regex & Schema Validators for LLM Outputs
- Add lightweight automated checks for emails, JSON formats, and other structural expectations
Segmentation-Driven Summarization
- Build summarization-specific chunks, integrate financial metadata, and compare with BM25 retrieval
Failure-Type Segmentation
- Classify failures into retrieval vs generation errors to guide improvement priorities
Clustering Queries with BERTopic
- Use UMAP + HDBSCAN to group user queries into semantically meaningful clusters
Mapping Feedback to Topics
- Overlay evaluator scores onto clusters to identify weak performance areas
Danger Zone Heatmaps
- Visualize query volume vs success rates to prioritize high-impact fixes
Feedback-to-Reranker Loop
- Build iterative reranking systems driven by topic segmentation and evaluation feedback
Dynamic Prompting for Tool Selection
- Teach LLMs to output structured tool calls reliably (JSON schema, guardrails, few-shot examples); see the sketch below
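One way to make tool calls reliable is to validate the model's JSON against a strict schema and re-prompt on failure; a minimal sketch with Pydantic (the tool names and the raw output string are placeholders).

```python
# Hedged sketch: validate an LLM tool call against a Pydantic schema.
import json
from typing import Literal, Optional
from pydantic import BaseModel, ValidationError

class ToolCall(BaseModel):
    tool: Literal["search_docs", "get_invoice", "escalate_to_human"]
    query: str

def parse_tool_call(raw: str) -> Optional[ToolCall]:
    """Return a validated call, or None so the caller can re-prompt with the error."""
    try:
        return ToolCall.model_validate(json.loads(raw))
    except (json.JSONDecodeError, ValidationError):
        return None

print(parse_tool_call('{"tool": "search_docs", "query": "refund policy"}'))
```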
Tool Disambiguation and Clarification Loops
- Design prompts that force models to ask clarifying questions before executing
XML-Based CoT Streaming for Agents
- Output reasoning traces in structured XML-like format for real-time dashboards or UIs
Production-Grade Project
- Deploy a full RAG + fine-tuned LLM service
- Add multiple tools with RAG and implement tool routing
- Include multimodal retrieval, function-calling, LLM-judge pipeline, and monitoring
- Achieve ≥ 95% end-to-end task accuracy
Exercises: AI Evaluation & Monitoring Pipeline
- Build LLM-as-judge evaluation pipelines with accuracy dashboarding
- Apply BERTopic for failure analysis and danger zone heatmaps
- Generate persona-driven synthetic queries for stress-testing
- Implement automated quality gates with statistical validation
Intro RL & RLHF
1 Unit
- 01 Intro RL & RLHF
Markov Processes as LLM Analogies
- Frame token generation as a Markov Decision Process (MDP) with states, actions, and rewards
Monte Carlo vs Temporal Difference Learning
- Compare Monte Carlo episode-based learning with Temporal Difference updates, and their relevance to token-level prediction (see the sketch below)
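A toy comparison on a 5-state random-walk chain: Monte Carlo waits for the full episode return, while TD(0) bootstraps from the next state's current estimate. Everything here is illustrative.

```python
# Hedged sketch: Monte Carlo vs TD(0) value estimation on a toy chain.
import random

N_STATES, GAMMA, ALPHA, EPISODES = 5, 0.9, 0.1, 2000

def rollout():
    """Random walk from the middle state; reward 1 only for exiting on the right."""
    s, traj = 2, []
    while 0 <= s < N_STATES:
        s_next = s + random.choice([-1, 1])
        reward = 1.0 if s_next == N_STATES else 0.0
        traj.append((s, reward, s_next))
        s = s_next
    return traj

v_mc = [0.0] * N_STATES
v_td = [0.0] * N_STATES
for _ in range(EPISODES):
    traj = rollout()
    # Monte Carlo: update toward the full discounted return G.
    g = 0.0
    for s, r, _ in reversed(traj):
        g = r + GAMMA * g
        v_mc[s] += ALPHA * (g - v_mc[s])
    # TD(0): update toward r + gamma * V(s') using the current estimate.
    for s, r, s_next in traj:
        v_next = v_td[s_next] if 0 <= s_next < N_STATES else 0.0
        v_td[s] += ALPHA * (r + GAMMA * v_next - v_td[s])

print("MC:", [round(v, 2) for v in v_mc])
print("TD:", [round(v, 2) for v in v_td])
```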
Q-Learning & Policy Gradients
- Explore conceptual foundations of Q-learning and policy gradients as the basis of RLHF and preference optimization (see the sketch below)
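A tabular Q-learning sketch on the same toy chain, showing the off-policy max backup that underlies value-based methods; the hyperparameters are arbitrary illustrative choices.

```python
# Hedged sketch: tabular Q-learning with an epsilon-greedy policy.
# Action 0 moves left, action 1 moves right; reward 1 for exiting on the right.
import random

N_STATES, GAMMA, ALPHA, EPS, EPISODES = 5, 0.9, 0.1, 0.2, 3000
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(s, a):
    s_next = s + (1 if a == 1 else -1)
    reward = 1.0 if s_next == N_STATES else 0.0
    done = not (0 <= s_next < N_STATES)
    return s_next, reward, done

for _ in range(EPISODES):
    s, done = 2, False
    while not done:
        if random.random() < EPS:
            a = random.randint(0, 1)                      # explore
        else:
            a = max(range(2), key=lambda act: Q[s][act])  # exploit
        s_next, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[s_next])
        Q[s][a] += ALPHA * (target - Q[s][a])             # off-policy max backup
        s = s_next

print([[round(q, 2) for q in row] for row in Q])
```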
RL in Decoding and Chain-of-Thought
- Apply RL ideas during inference without retraining, including CoT prompting with reward feedback and speculative decoding verification
Exercises: RL Foundations with Neural Networks
- Implement token generation as MDP with policy and value networks
- Compare Monte Carlo vs Temporal Difference learning for value estimation
- Build Q-Learning from tables to DQN with experience replay
- Implement REINFORCE with baseline subtraction and entropy regularization
RL & RLHF Framework
1 Unit
- 01 RL & RLHF Framework
DSPy + RL Integration
- Explore DSPy's prompt optimizers and the RL support built into its pipelines
LangChain RL
- Use LangChain's experimental RL chain for reinforcement learning tasks
RL Fine-Tuning with OpenAI API
- Implement RL fine-tuning using OpenAI's API
RL Fine-Tuning Applications
- Apply RL fine-tuning for state-of-the-art email generation
- Apply RL fine-tuning for summarization tasks
RL Fine-Tuning with OpenPipe
- Use OpenPipe for RL fine-tuning workflows
DPO/PPO/GRPO Comparison
- Compare Direct Preference Optimization (DPO), Proximal Policy Optimization (PPO), and Group Relative Policy Optimization (GRPO) approaches (see the DPO sketch below)
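For orientation: PPO optimizes a clipped policy-gradient objective against a learned reward model, GRPO replaces the learned value baseline with group-relative advantages over sampled completions, and DPO folds the preference signal directly into a supervised-style loss. A sketch of the DPO objective computed from per-sequence log-probabilities (random tensors stand in for real values):

```python
# Hedged sketch: the core DPO loss from policy and frozen-reference log-probs.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """-log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)]), averaged over the batch."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

batch = 4
loss = dpo_loss(torch.randn(batch), torch.randn(batch),
                torch.randn(batch), torch.randn(batch))
print(loss)
```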
Reinforcement Learning with Verifiable Rewards (RLVR)
- Learn about RLVR methodology for training with verifiable reward signals
Rubric-Based RL Systems
- Explore rubric-based systems to guide RL at inference time for multi-step reasoning
Training Agents to Control Web Browsers
- Train agents to control web browsers with RL and Imitation Learning
Exercises: RL Frameworks & Advanced Algorithms
- Compare DSPy vs LangChain for building QA systems
- Implement GRPO and RLVR algorithms
- Build multi-turn agents with turn-level credit assignment
- Create privacy-preserving multi-model systems (PAPILLON) with utility-privacy tradeoffs
How RAG Finetuning and RLHF Fit in Production
1 Unit
- 01 How RAG Finetuning and RLHF Fit in Production
End-to-End LLM Finetuning & Orchestration using RL
- Prepare instruction-tuning datasets (synthetic + human)
- Finetune a small LLM on your RAG tasks
- Use RL to finetune on the same dataset and compare results across all approaches
- Select the appropriate finetuning approach and build RAG
- Implement orchestration patterns (pipelines, agents)
- Set up continuous monitoring integration using Braintrust
RL Frameworks in Practice
- Use DSPy, OpenAI API, LangChain's RLChain, OpenPipe ART, and PufferLib for RLHF tasks
Rubric-Based Reward Systems
- Design interpretable rubrics to score reasoning, structure, and correctness (see the sketch below)
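A rubric can be turned directly into a scalar reward; the criteria, weights, and string checks below are illustrative, not a prescribed scheme.

```python
# Hedged sketch: an interpretable rubric collapsed into a scalar reward in [0, 1].
def rubric_reward(answer: str) -> float:
    criteria = {
        "cites_a_source":  (0.4, "Source:" in answer),
        "under_200_words": (0.3, len(answer.split()) <= 200),
        "has_conclusion":  (0.3, "in summary" in answer.lower()),
    }
    return sum(weight for weight, passed in criteria.values() if passed)

print(rubric_reward("In summary, retrieval helps here. Source: internal docs."))
```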
Real-World Applications of RLHF
- Explore applications in summarization, email tuning, and web agent fine-tuning
RL and RLHF for RAG
- Apply RL techniques to optimize retrieval and generation in RAG pipelines
- Use RLHF to improve response quality based on user feedback and preferences
Exercises: End-to-End RAG with Finetuning & RLHF
- Finetune a small LLM (Llama 3.2 3B or Qwen 2.5 3B) on the ELI5 dataset using LoRA/QLoRA (see the sketch after this list)
- Apply RLHF with rubric-based rewards to optimize responses
- Build production RAG with DSPy orchestration, logging, and monitoring
- Compare base → finetuned → RLHF-optimized models
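A minimal LoRA setup for the finetuning exercise, using Hugging Face transformers and peft; the model name and hyperparameters are reasonable example choices rather than course-mandated values, and the Llama checkpoint is gated, so it requires Hugging Face access.

```python
# Hedged sketch: attach LoRA adapters to a small causal LM with peft.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-3.2-3B-Instruct"   # example checkpoint (gated)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # typically well under 1% of total weights
```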