Lessons

Explore all newline lessons


lesson

How RAG Finetuning and RLHF Fit in Production

- End-to-End LLM Finetuning & Orchestration using RL
  - Prepare instruction-tuning datasets (synthetic + human)
  - Finetune a small LLM on your RAG tasks
  - Use RL to finetune on the same dataset and compare results across all approaches
  - Select the appropriate finetuning approach and build RAG
  - Implement orchestration patterns (pipelines, agents)
  - Set up continuous monitoring integration using Braintrust
- RL Frameworks in Practice
  - Use DSPy, the OpenAI API, LangChain's RLChain, OpenPipe ART, and PufferLib for RLHF tasks
- Rubric-Based Reward Systems
  - Design interpretable rubrics to score reasoning, structure, and correctness
- Real-World Applications of RLHF
  - Explore applications in summarization, email tuning, and web agent fine-tuning
- RL and RLHF for RAG
  - Apply RL techniques to optimize retrieval and generation in RAG pipelines
  - Use RLHF to improve response quality based on user feedback and preferences
- Exercises: End-to-End RAG with Finetuning & RLHF
  - Finetune a small LLM (Llama 3.2 3B or Qwen 2.5 3B) on the ELI5 dataset using LoRA/QLoRA (see the LoRA setup sketch after this list)
  - Apply RLHF with rubric-based rewards to optimize responses
  - Build production RAG with DSPy orchestration, logging, and monitoring
  - Compare base → finetuned → RLHF-optimized models
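A minimal LoRA setup sketch for the finetuning exercise above, assuming Hugging Face transformers and peft; the Qwen checkpoint name and the hyperparameters (rank, alpha, target modules) are illustrative choices, not values prescribed by the lesson.

```python
# Illustrative LoRA setup for finetuning a small instruction model on RAG-style Q&A pairs.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "Qwen/Qwen2.5-3B-Instruct"  # placeholder; a Llama 3.2 3B checkpoint works the same way

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA adapters on the attention projections keep the trainable parameter count small,
# so a 3B model can be tuned on a single modest GPU.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total weights
```

From here the wrapped model can be passed to any causal-LM training loop over the instruction dataset.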

lesson

RL & RLHF Frameworks

- DSPy + RL Integration
  - Explore DSPy's prompt optimizer and the RL system built into the pipeline
- LangChain RL
  - Use LangChain's experimental RL chain for reinforcement learning tasks
- RL Fine-Tuning with the OpenAI API
  - Implement RL fine-tuning using OpenAI's API
- RL Fine-Tuning Applications
  - Apply RL fine-tuning for state-of-the-art email generation
  - Apply RL fine-tuning for summarization tasks
- RL Fine-Tuning with OpenPipe
  - Use OpenPipe for RL fine-tuning workflows
- DPO/PPO/GRPO Comparison
  - Compare Direct Preference Optimization, Proximal Policy Optimization, and Group Relative Policy Optimization (GRPO)
- Reinforcement Learning with Verifiable Rewards (RLVR)
  - Learn the RLVR methodology for training with verifiable reward signals
- Rubric-Based RL Systems
  - Explore rubric-based systems to guide RL at inference time for multi-step reasoning
- Training Agents to Control Web Browsers
  - Train agents to control web browsers with RL and Imitation Learning
- Exercises: RL Frameworks & Advanced Algorithms
  - Compare DSPy vs LangChain for building QA systems
  - Implement GRPO and RLVR algorithms (see the advantage-computation sketch after this list)
  - Build multi-turn agents with turn-level credit assignment
  - Create privacy-preserving multi-model systems (PAPILLON) with utility-privacy tradeoffs
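A small Python sketch of the group-relative advantage at the core of GRPO, paired with a toy verifiable reward in the spirit of RLVR; the reward check and the sampled completions are invented for demonstration only.

```python
# GRPO normalizes rewards within a group of completions sampled for the same prompt,
# avoiding a separate value network. The reward here is a toy RLVR-style exact-match check.
import numpy as np

def verifiable_reward(completion: str, reference: str) -> float:
    # RLVR-style reward: 1.0 if the completion ends with the reference answer, else 0.0.
    return 1.0 if completion.strip().endswith(reference.strip()) else 0.0

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> np.ndarray:
    # Subtract the group mean and divide by the group std: correct samples get positive
    # advantages, incorrect ones negative, regardless of the absolute reward scale.
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# One prompt, four sampled completions, scored against a known answer.
completions = ["... the answer is 42", "... the answer is 41",
               "... the answer is 42", "... I am not sure"]
rewards = [verifiable_reward(c, "42") for c in completions]
print(group_relative_advantages(rewards))
```

In a full GRPO loop these advantages would weight the policy-gradient update for each completion's tokens.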

lesson

Intro to RL & RLHF

- Markov Processes as LLM Analogies
  - Frame token generation as a Markov Decision Process (MDP) with states, actions, and rewards
- Monte Carlo vs Temporal Difference Learning
  - Compare Monte Carlo episode-based learning with Temporal Difference updates, and their relevance to token-level prediction
- Q-Learning & Policy Gradients
  - Explore conceptual foundations of Q-learning and policy gradients as the basis of RLHF and preference optimization
- RL in Decoding and Chain-of-Thought
  - Apply RL ideas during inference without retraining, including CoT prompting with reward feedback and speculative decoding verification
- Exercises: RL Foundations with Neural Networks
  - Implement token generation as an MDP with policy and value networks
  - Compare Monte Carlo vs Temporal Difference learning for value estimation (see the sketch after this list)
  - Build Q-Learning from tables to DQN with experience replay
  - Implement REINFORCE with baseline subtraction and entropy regularization
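A toy, pure-Python comparison of Monte Carlo and TD(0) value estimation on a tiny "token chain" MDP, echoing the lesson's framing of generation as an MDP; the states, rewards, and policy are made up solely for illustration.

```python
# Monte Carlo vs TD(0) value estimation on a fixed 5-token chain with a terminal reward,
# loosely analogous to a sequence-level score at end of generation.
STATES = ["<s>", "the", "cat", "sat", "<eos>"]
REWARD = {"<eos>": 1.0}      # reward only when the terminal token is reached
GAMMA = 0.9

def rollout():
    # The toy policy always advances one token, so every episode is the same chain.
    return [(s, REWARD.get(s_next, 0.0), s_next) for s, s_next in zip(STATES, STATES[1:])]

# Monte Carlo: average full discounted returns observed from each state.
V_mc = {s: 0.0 for s in STATES}
counts = {s: 0 for s in STATES}
for _ in range(100):
    G = 0.0
    for s, r, _ in reversed(rollout()):
        G = r + GAMMA * G
        counts[s] += 1
        V_mc[s] += (G - V_mc[s]) / counts[s]

# TD(0): bootstrap from the current estimate of the next state's value after each step.
V_td = {s: 0.0 for s in STATES}
ALPHA = 0.1
for _ in range(100):
    for s, r, s_next in rollout():
        V_td[s] += ALPHA * (r + GAMMA * V_td[s_next] - V_td[s])

print("MC:", {s: round(v, 3) for s, v in V_mc.items()})
print("TD:", {s: round(v, 3) for s, v in V_td.items()})
```

Both estimates converge toward the same discounted values here; the difference is that MC waits for the full episode while TD updates after every transition.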

lesson

Advanced AI-Evals & Monitoring

- Advanced AI-Evals & Monitoring
  - Scale LLM-judge for bulk multimodal outputs
  - Build dashboards comparing judge accuracy vs IR metrics
  - Implement auto-gate builds if accuracy drops below 95%
- Agent Failure Analysis Deep Dive
  - Create transition-state heatmaps & tool-state visualizations
  - Construct failure matrices with LLM classification
  - Develop systematic debugging workflows
- Enhancing RAG with Contextual Retrieval Recipes
  - Use Instructor-driven synthetic data (Anthropic GitHub)
  - Integrate web-search solutions (e.g., exa.ai)
  - Apply LogFire and Braintrust augmentations
  - Implement a Cohere reranker + advanced logging
- Advanced Synthetic & Statistical Validation
  - Generate persona-varied synthetic questions (angry/confused personas) and rewrite questions for better retrieval
  - Perform embedding-diversity checks and JSONL corpus structuring
  - Work with multi-vector databases
  - Build a parallel experimentation harness using ThreadPoolExecutor
- Strategic Feedback Collection
  - Collect feedback of different types; use binary feedback (thumbs up/down) instead of stars
  - Distinguish between two segment types: lack of data vs lack of capabilities
  - Address common but fixable capability issues
- Dynamic Prompting & Validation
  - Build a dynamic UI with chain-of-thought wrapping using XML or streaming
  - Incorporate validators with regex (e.g., checking fake emails generated by the LLM)
- Data Segmentation & Prioritization
  - Segment data based on patterns
  - Apply the Expected Value formula: Impact × Percentage of Queries × Probability of Success
- Topic Discovery with BERTopic
  - Configure and apply BERTopic for unsupervised topic discovery
  - Set up the embedding model, UMAP, and HDBSCAN for effective clustering
  - Visualize topic similarities and relationships
  - Analyze satisfaction scores by topic to identify pain points
  - Create matrices showing the relationship between topics and satisfaction
  - Identify the "danger zone" of high-volume, low-satisfaction query areas (see the BERTopic sketch after this list)
- Persona-Driven Synthetic Queries
  - Generate diverse queries (angry, curious, confused users) to stress-test retrieval and summarization pipelines
- Regex & Schema Validators for LLM Outputs
  - Add lightweight automated checks for emails, JSON formats, and other structural expectations
- Segmentation-Driven Summarization
  - Build summarization-specific chunks, integrate financial metadata, and compare with BM25 retrieval
- Failure-Type Segmentation
  - Classify failures into retrieval vs generation errors to guide improvement priorities
- Clustering Queries with BERTopic
  - Use UMAP + HDBSCAN to group user queries into semantically meaningful clusters
- Mapping Feedback to Topics
  - Overlay evaluator scores onto clusters to identify weak performance areas
- Danger Zone Heatmaps
  - Visualize query volume vs success rates to prioritize high-impact fixes
- Feedback-to-Reranker Loop
  - Build iterative reranking systems driven by topic segmentation and evaluation feedback
- Dynamic Prompting for Tool Selection
  - Teach LLMs to output structured tool calls reliably (JSON schema, guardrails, few-shots)
- Tool Disambiguation and Clarification Loops
  - Design prompts that force models to ask clarifying questions before executing
- XML-Based CoT Streaming for Agents
  - Output reasoning traces in a structured XML-like format for real-time dashboards or UIs
- Production-Grade Project
  - Deploy a full RAG + fine-tuned LLM service
  - Add multiple tools with RAG and implement tool routing
  - Include multimodal retrieval, function-calling, an LLM-judge pipeline, and monitoring
  - Achieve ≥ 95% end-to-end task accuracy
- Exercises: AI Evaluation & Monitoring Pipeline
  - Build LLM-as-judge evaluation pipelines with accuracy dashboarding
  - Apply BERTopic for failure analysis and danger zone heatmaps
  - Generate persona-driven synthetic queries for stress-testing
  - Implement automated quality gates with statistical validation
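A sketch of the BERTopic workflow described above: cluster user queries, then overlay binary feedback per topic to locate the high-volume, low-satisfaction "danger zone". The dataset, column names, and model choices are placeholders, not the lesson's exact configuration.

```python
# Cluster queries with BERTopic (embeddings + UMAP + HDBSCAN), then rank topics by
# volume and thumbs-up rate to find high-expected-value areas to fix.
import pandas as pd
from bertopic import BERTopic
from umap import UMAP
from hdbscan import HDBSCAN
from sentence_transformers import SentenceTransformer

df = pd.DataFrame({
    "query": ["how do I reset my password?", "export report to csv",
              "why is my invoice wrong?", "reset password link broken"] * 25,
    "thumbs_up": [1, 1, 0, 0] * 25,   # binary feedback, as recommended over star ratings
})

topic_model = BERTopic(
    embedding_model=SentenceTransformer("all-MiniLM-L6-v2"),
    umap_model=UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric="cosine"),
    hdbscan_model=HDBSCAN(min_cluster_size=10, metric="euclidean", prediction_data=True),
)
df["topic"], _ = topic_model.fit_transform(df["query"].tolist())

# Volume vs satisfaction per topic: large clusters with low thumbs-up rates are the
# highest-expected-value targets (Impact x Percentage of Queries x Probability of Success).
summary = df.groupby("topic").agg(volume=("query", "size"),
                                  satisfaction=("thumbs_up", "mean"))
print(summary.sort_values(["satisfaction", "volume"], ascending=[True, False]))
```

The same per-topic table feeds the danger-zone heatmap and the feedback-to-reranker loop listed above.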

lesson

Advanced RAG with Multi-Media Retrieval

- Advanced RAG Reranker Training & Triplet Fundamentals
  - Learn contrastive loss vs triplet loss approaches for training retrievers
  - Understand tri-encoder vs cross-encoder performance trade-offs
  - Master triplet-loss fundamentals and semi-hard negative mining strategies
  - Fine-tune rerankers using the Cohere Rerank API & SBERT (sbert.net, Hugging Face)
- Multimodal & Metadata RAG
  - Index and query images, tables, and structured JSON using ColQwen-Omni (ColPali-based late interaction for audio, video, and visual documents)
  - Implement metadata filtering, short vs long-term indices, and query routing logic
- Cartridges RAG Technique
  - Learn how Cartridges compress large corpora into small, trainable KV-cache structures for efficient retrieval (~39x less memory, ~26x faster)
  - Master the Self-Study training approach using synthetic Q&A and context distillation for generalized question answering
- Cartridge-Based Retrieval
  - Learn modular retrieval systems with topic-specific "cartridges" for precision memory routing
- Late Interaction Methods
  - Study architectures like ColQwen-Omni that combine multimodal (text, audio, image) retrieval using late interaction fusion
- Multi-Vector vs Single-Vector Retrieval
  - Compare ColBERT/Turbopuffer vs FAISS, and understand trade-offs in granularity, accuracy, and inference cost
- Query Routing & Hybrid Memory Systems
  - Explore dynamic routing between lexical, dense, and multimodal indexes
- Loss Functions for Retriever Training
  - Compare contrastive loss vs triplet loss, and learn about semi-hard negative mining
- Reranker Tuning with SBERT or APIs
  - Fine-tune rerankers (SBERT, Cohere API), evaluate with MRR/nDCG, and integrate into retrieval loops
- Exercises: Advanced RAG Techniques
  - Implement triplet loss vs contrastive loss for reranker training with semi-hard negative mining (see the triplet-loss sketch after this list)
  - Build multimodal RAG systems with images, tables, and query routing
  - Compare single-vector (FAISS) vs multi-vector (ColBERT) retrieval
  - Create cartridge-based RAG with topic-specific memory routing
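A minimal triplet-loss finetuning sketch using the classic sentence-transformers fit() API, for the retriever-training exercise above; the anchor/positive/negative strings, margin, and batch size are placeholders, and in practice the negatives would come from semi-hard negative mining rather than hand-picking.

```python
# Triplet-loss finetuning of a bi-encoder retriever with sentence-transformers.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

train_examples = [
    InputExample(texts=[
        "what is the refund window?",                         # anchor (query)
        "Refunds are accepted within 30 days of purchase.",   # positive passage
        "Our offices are closed on public holidays.",         # negative passage
    ]),
    # ... more (anchor, positive, negative) triplets mined from retrieval logs
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# TripletLoss pulls anchor-positive pairs together and pushes anchor-negative pairs
# at least `triplet_margin` apart in embedding space.
train_loss = losses.TripletLoss(model=model, triplet_margin=0.5)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```

After training, the tuned encoder can be evaluated with MRR/nDCG and slotted in ahead of (or alongside) a cross-encoder reranker in the retrieval loop.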