Algorithmic Foundations of Interactive Learning

Spring 2025. 17-740. Tuesday / Thursday 11:00-12:20. GHC 4215. Lecture Recordings

Announcements 📣

Hello World!

Nov 10

We can’t wait to meet you! 👋

Course Overview 📝

Interactive learning is a dynamic approach to machine learning where systems learn and adapt through continuous interaction with their environment or users, receiving feedback and adjusting their behavior in response. These techniques are currently experiencing a resurgence across various domains of artificial intelligence and machine learning, from robotics to language modeling. In this advanced theory course, students will explore interactive learning from its foundational principles to recent applications, including fine-tuning Large Language Models (LLMs) and robot learning from demonstration.

Key topics include:

  1. Online Learning: Learning under distribution shift.
  2. Game Solving: Using no-regret algorithms to compute equilibria.
  3. Reinforcement Learning: Sequential decision making. Model-free, model-based, and hybrid RL.
  4. Imitation Learning & Applications to Robotics: Learning from demonstrations. Behavioral cloning, DAgger, and inverse RL.
  5. RL from Human Feedback & Applications to Language Modeling: Learning from preferences. PPO, DPO, SPO.
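As a small taste of topics 1 and 2, here is a minimal sketch (not course-provided code) of two Hedge learners playing a zero-sum matrix game against each other. The function names, the example loss matrix, the step size `eta`, and the round count are all illustrative assumptions. The idea it shows is standard: when both players have low regret, their time-averaged strategies approach a minimax equilibrium, and the duality gap measures how close they are.

```python
import math


def hedge_weights(cum_losses, eta):
    """Hedge distribution: weight each action by exp(-eta * cumulative loss)."""
    base = min(cum_losses)  # shift by the minimum for numerical stability
    raw = [math.exp(-eta * (c - base)) for c in cum_losses]
    total = sum(raw)
    return [w / total for w in raw]


def hedge_self_play(loss, rounds=5000, eta=0.01):
    """Run Hedge for both players; loss[i][j] is the row player's loss.

    The column player receives the negated loss (zero-sum game).
    Returns the time-averaged mixed strategies of both players.
    """
    n, m = len(loss), len(loss[0])
    row_cum, col_cum = [0.0] * n, [0.0] * m
    row_avg, col_avg = [0.0] * n, [0.0] * m
    for _ in range(rounds):
        p = hedge_weights(row_cum, eta)
        q = hedge_weights(col_cum, eta)
        for i in range(n):
            row_avg[i] += p[i] / rounds
            # expected loss of row action i against the column mix q
            row_cum[i] += sum(loss[i][j] * q[j] for j in range(m))
        for j in range(m):
            col_avg[j] += q[j] / rounds
            # the column player's loss is the negative of the row player's
            col_cum[j] -= sum(p[i] * loss[i][j] for i in range(n))
    return row_avg, col_avg


def duality_gap(loss, p, q):
    """Best-response value against p minus best-response value against q."""
    n, m = len(loss), len(loss[0])
    worst_for_row = max(sum(p[i] * loss[i][j] for i in range(n)) for j in range(m))
    best_for_row = min(sum(loss[i][j] * q[j] for j in range(m)) for i in range(n))
    return worst_for_row - best_for_row


# Hypothetical 2x2 game chosen only for illustration.
EXAMPLE_LOSS = [[2.0, -1.0], [-1.0, 1.0]]

if __name__ == "__main__":
    p, q = hedge_self_play(EXAMPLE_LOSS)
    print("row average:", p)
    print("column average:", q)
    print("duality gap:", duality_gap(EXAMPLE_LOSS, p, q))
```

For this example matrix the exact equilibrium has both players putting probability 2/5 on their first action, with game value 1/5, so the printed averages should land near that and the gap should be small. The averaged strategies are the ones that converge; the per-round strategies may keep cycling.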

Schedule (Tentative) 📅

Online Learning

Jan. 14
Course Overview [Slides][Video]
Syllabus
Jan. 16
Intro to Online Learning / Weighted Majority [Scribe]
JAB Course Note, Weighted Majority, Universal Portfolios
Jan. 21
Information Theory Overview [Scribe]
MacKay Ch. 2, What is Entropy?
Jan. 23
Maximum Entropy & FTRL [Scribe]
MaxEnt, The Entropy Principle, FTRL, Hedge
Jan. 28
FTRL II [Video] [Scribe]
OGD

Game Solving

Jan. 30
Minimax via No Regret I [Video] [Scribe]
AdaBoost, Roth Textbook Ch. 2
Feb. 4
Minimax via No Regret II [Video] [Scribe]
AdaBoost, Roth Textbook Ch. 2, Gabriele’s CFR Note

Sequential Decision Making (RL/IL)

Feb. 6
Foundations of MDPs [Video], [Scribe]
Akshay’s Note, Nan’s Note
Feb. 11
HW #1a Out. DAgger & Covariate Shift in IL [HW #1a]
Invitation to Imitation, DAgger
Feb. 13
HW #1b Out. Approximate Policy Iteration [HW #1b], [Scribe]
MACRL Ch. 8, CPI, PSDP, NRPI
Feb. 18
Policy Gradients: Introduction [Video], [Scribe]
Wen’s Slides
Feb. 20
Policy Gradients: Baselines / Actor-Critic [Video], [Scribe]
PPO, PPO Tricks, GAE, REINFORCE for LLMs
Feb. 25
3 Views of The Natural Policy Gradient [Video], [Scribe]
NPG, Covariant Policy Search, TRPO
Feb. 27
HW #1a Due. Learning by Cheating [Slides]
LBC, SequIL

Spring Break 🏝️

Mar. 11
Model-based RL as Game-Solving [Video], [Scribe]
Wen’s Simulation Lemma Note, Agnostic SysID, DREAMER
Mar. 13
HW #1b Pres. [Video]
Mar. 18
Hybrid RL (Guest Lecture: Yuda Song) [Proj. Proposal]
Yuda’s Note, HyQ, LAMPS
Mar. 20
Model Predictive Control & Test-Time Scaling
Sparse Sampling, MCTS, THOR, MPPI, Expert Iteration, Dual Policy Iteration, V-STaR, Self-Improvement

Imitation Learning

Mar. 25
Proj. Proposal Due. IL as Game Solving
MaxEnt IRL, Moment Matching, Wen’s Soft VI Note
Mar. 27
Efficient Inverse RL
FILTER, Hybrid IRL, Efficient Imitation Under Misspecification
Apr. 1
IRL2: Inverse RL In Real Life (Guest Lecture: Sanjiban Choudhury)
Diffusion Policy
Apr. 3
Spring Carnival 🎡 (No Class)

RLHF

Apr. 8
The Information Geometry of RLHF
RL from Prefs., PPO+RM, All Roads Lead to Likelihood
Apr. 10
The Value of Interaction in RLHF
DPO, HyPO, All Roads Lead to Likelihood
Apr. 15
REBEL and REFUEL (Guest Lecture: Wen Sun)
REBEL, REFUEL
Apr. 17
RLHF as Game Solving
SPO

Project Presentations

Apr. 22
Project Pres.
Presenters TBD
Apr. 24
Project Pres.
Presenters TBD

Instructors 👨‍🏫
