RL & RLHF Framework

- DSPy + RL Integration
  - Explore DSPy's prompt optimizer and RL system built into the pipeline
- LangChain RL
  - Use LangChain's experimental RL chain for reinforcement learning tasks
- RL Fine-Tuning with OpenAI API
  - Implement RL fine-tuning using OpenAI's API
- RL Fine-Tuning Applications
  - Apply RL fine-tuning for state-of-the-art email generation
  - Apply RL fine-tuning for summarization tasks
- RL Fine-Tuning with OpenPipe
  - Use OpenPipe for RL fine-tuning workflows
- DPO/PPO/GRPO Comparison
  - Compare Direct Preference Optimization (DPO), Proximal Policy Optimization (PPO), and Group Relative Policy Optimization (GRPO) approaches
- Reinforcement Learning with Verifiable Rewards (RLVR)
  - Learn about RLVR methodology for training with verifiable reward signals
- Rubric-Based RL Systems
  - Explore rubric-based systems to guide RL at inference time for multi-step reasoning
- Training Agents to Control Web Browsers
  - Train agents to control web browsers with RL and Imitation Learning
- Exercises: RL Frameworks & Advanced Algorithms
  - Compare DSPy vs LangChain for building QA systems
  - Implement GRPO and RLVR algorithms
  - Build multi-turn agents with turn-level credit assignment
  - Create privacy-preserving multi-model systems (PAPILLON) with utility-privacy tradeoffs
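
As a possible starting point for the GRPO and RLVR exercise, the sketch below pairs a programmatically checkable reward with group-relative advantages. It is a minimal illustration and not a full training loop: the function names `verifiable_reward` and `group_relative_advantages`, the "last number in the completion" answer check, the sample completions, and the choice of population standard deviation are all assumptions made for this example. Policy-gradient updates, clipping, and any KL penalty are omitted.

```python
import re
import statistics


def verifiable_reward(completion: str, expected: str) -> float:
    """Hypothetical RLVR-style reward: 1.0 if the last number in the
    completion matches the expected answer string, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == expected else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples drawn for the same
    prompt: (r - group mean) / (group std + eps).
    Assumption: population standard deviation; implementations differ."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Illustrative group of sampled completions for the prompt "What is 2 + 2?"
completions = [
    "2 + 2 = 4",
    "The answer is 5",
    "I think it is 4",
    "Not sure",
]
rewards = [verifiable_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)

for text, reward, adv in zip(completions, rewards, advantages):
    print(f"{reward:.1f}  {adv:+.3f}  {text}")
```

Because each advantage is computed relative to the group, no separate value model is needed for this step; correct completions receive positive advantages and incorrect ones negative, and a group with identical rewards yields all-zero advantages.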