Algorithmic Foundations of Interactive Learning
Spring 2025. 17-740. Tuesday / Thursday 11:00-12:20. GHC 4215. Lecture Recordings
Announcements 📣
Course Overview 📝
Interactive learning is a dynamic approach to machine learning where systems learn and adapt through continuous interaction with their environment or users, receiving feedback and adjusting their behavior in response. These techniques are currently experiencing a resurgence across various domains of artificial intelligence and machine learning, from robotics to language modeling. In this advanced theory course, students will explore interactive learning from its foundational principles to recent applications, including fine-tuning Large Language Models (LLMs) and robot learning from demonstration.
Key topics include:
- Online Learning: Learning under distribution shift.
- Game Solving: Using no-regret algorithms to compute equilibria.
- Reinforcement Learning: Sequential decision making. Model-free, model-based, and hybrid RL.
- Imitation Learning & Applications to Robotics: Learning from demonstrations. Behavioral cloning, DAgger, and inverse RL.
- RL from Human Feedback & Applications to Language Modeling: Learning from preferences. PPO, DPO, SPO.
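To make the Game Solving topic above concrete, here is a minimal sketch of two Hedge (multiplicative weights) learners playing a zero-sum matrix game against each other. It is an illustration only, not course material: the function names `hedge_self_play` and `duality_gap`, the example payoff matrix, the step size, and the number of rounds are all assumptions chosen for the example. The idea it shows is that when both players have low regret, their time-averaged strategies approach a minimax equilibrium.

```python
import numpy as np


def hedge_self_play(payoff, rounds=20000, eta=0.01):
    """Run Hedge for both players of a zero-sum matrix game.

    payoff[i, j] is the row player's loss (and the column player's gain)
    when row plays action i and column plays action j.
    Returns the time-averaged mixed strategies of both players.
    """
    n_rows, n_cols = payoff.shape
    row_cum_loss = np.zeros(n_rows)
    col_cum_loss = np.zeros(n_cols)
    row_avg = np.zeros(n_rows)
    col_avg = np.zeros(n_cols)

    for _ in range(rounds):
        # Hedge: weight each action by exp(-eta * cumulative loss).
        # Subtracting the minimum only improves numerical stability.
        p = np.exp(-eta * (row_cum_loss - row_cum_loss.min()))
        p /= p.sum()
        q = np.exp(-eta * (col_cum_loss - col_cum_loss.min()))
        q /= q.sum()

        row_avg += p
        col_avg += q

        # Expected loss of each pure action against the opponent's mix.
        row_cum_loss += payoff @ q
        col_cum_loss += -(payoff.T @ p)  # column player maximizes payoff

    return row_avg / rounds, col_avg / rounds


def duality_gap(payoff, p, q):
    """Best-response gap of (p, q); it is zero exactly at an equilibrium."""
    return (p @ payoff).max() - (payoff @ q).min()


if __name__ == "__main__":
    # Hypothetical 2x2 game used only for illustration.
    A = np.array([[2.0, -1.0], [-1.0, 1.0]])
    p_bar, q_bar = hedge_self_play(A)
    print("row strategy:", p_bar)
    print("column strategy:", q_bar)
    print("duality gap:", duality_gap(A, p_bar, q_bar))
```

The duality gap of the averaged strategies is bounded by the sum of the two players' average regrets, so it shrinks as the number of rounds grows (with a suitably small step size).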
Schedule (Tentative) 📅
Online Learning
- Jan. 14
- Jan. 16
- Intro to Online Learning / Weighted Majority [Scribe]
- JAB Course Note, Weighted Majority, Universal Portfolios
- Jan. 21
- Information Theory Overview [Scribe]
- MacKay Ch. 2, What is Entropy?
- Jan. 23
- Maximum Entropy & FTRL [Scribe]
- MaxEnt, The Entropy Principle, FTRL, Hedge
- Jan. 28
Game Solving
- Jan. 30
- Minimax via No Regret I [Video] [Scribe]
- AdaBoost, Roth Textbook Ch. 2
- Feb. 4
- Minimax via No Regret II [Video] [Scribe]
- AdaBoost, Roth Textbook Ch. 2, Gabriele’s CFR Note
Sequential Decision Making (RL/IL)
- Feb. 6
- Foundations of MDPs [Video], [Scribe]
- Akshay’s Note, Nan’s Note
- Feb. 11
- DAgger & Covariate Shift in IL (HW #1a Out) [HW #1a]
- Invitation to Imitation, DAgger
- Feb. 13
- Feb. 18
- Policy Gradients: Introduction [Video], [Scribe]
- Wen’s Slides
- Feb. 20
- Policy Gradients: Baselines / Actor-Critic [Video], [Scribe]
- PPO, PPO Tricks, GAE, REINFORCE for LLMs
- Feb. 25
- 3 Views of The Natural Policy Gradient [Video], [Scribe]
- NPG, Covariant Policy Search, TRPO
- Feb. 27
Spring Break 🏝️
- Mar. 11
- Model-based RL as Game-Solving [Video], [Scribe]
- Wen’s Simulation Lemma Note, Agnostic SysID, DREAMER
- Mar. 13
- HW #1b Pres. [Video]
- Mar. 18
- Hybrid RL (Guest Lecture: Yuda Song) [Proj. Proposal]
- Yuda’s Note, HyQ, LAMPS
- Mar. 20
- Model Predictive Control & Test-Time Scaling
- Sparse Sampling, MCTS, THOR, MPPI, Expert Iteration, Dual Policy Iteration, V-STaR, Self-Improvement
Imitation Learning
- Mar. 25
- IL as Game Solving (Proj. Proposal Due)
- MaxEnt IRL, Moment Matching, Wen’s Soft VI Note
- Mar. 27
- Efficient Inverse RL
- FILTER, Hybrid IRL, Efficient Imitation Under Misspecification
- Apr. 1
- IRL2: Inverse RL In Real Life (Guest Lecture: Sanjiban Choudhury)
- Diffusion Policy
- Apr. 3
- Spring Carnival 🎡 (No Class)
RLHF
- Apr. 8
- The Information Geometry of RLHF
- RL from Prefs., PPO+RM, All Roads Lead to Likelihood
- Apr. 10
- The Value of Interaction in RLHF
- DPO, HyPO, All Roads Lead to Likelihood
- Apr. 15
- Apr. 17
- RLHF as Game Solving
- SPO
Project Presentations
- Apr. 22
- Project Pres.
- Presenters TBD
- Apr. 24
- Project Pres.
- Presenters TBD
Instructors 👨‍🏫


