Algorithmic Foundations of Interactive Learning

Spring 2025. 17-740. Tuesday / Thursday 11:00-12:20. GHC 4215.

Announcements 📣

Hello World!

Nov 10

We can’t wait to meet you! 👋

Course Overview 📝

Interactive learning is an approach to machine learning in which systems learn and adapt through repeated interaction with their environment or users, receiving feedback and adjusting their behavior in response. These techniques are seeing renewed interest across artificial intelligence and machine learning, from robotics to language modeling. In this advanced theory course, students will explore interactive learning from its foundational principles to recent applications, including fine-tuning Large Language Models (LLMs) and robot learning from demonstration.

Key topics include:

  1. Online Learning: Learning under distribution shift.
  2. Game Solving: Using no-regret algorithms to compute equilibria.
  3. Reinforcement Learning: Sequential decision making. Model-free, model-based, and hybrid RL.
  4. Imitation Learning & Applications to Robotics: Learning from demonstrations. Behavioral cloning, DAgger, and inverse RL.
  5. RL from Human Feedback & Applications to Language Modeling: Learning from preferences. PPO, DPO, SPO.
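
To give a flavor of how topics 1 and 2 connect, here is a minimal sketch (not course material) of two Hedge learners playing a zero-sum matrix game against each other. When both players have low regret, their time-averaged strategies approach an equilibrium. The function names, the step size, and the convention that `payoff[i][j]` is the row player's payoff are all assumptions made for this illustration.

```python
import math


def hedge_weights(cum_loss, eta):
    """Hedge distribution: p_i proportional to exp(-eta * cumulative loss of action i)."""
    base = min(cum_loss)  # shift by the minimum so every exponent is <= 0
    w = [math.exp(-eta * (c - base)) for c in cum_loss]
    total = sum(w)
    return [x / total for x in w]


def hedge_self_play(payoff, rounds=5000):
    """Run Hedge for both players and return their time-averaged strategies.

    payoff[i][j] is the row player's payoff (assumed convention); the row
    player maximizes it and the column player minimizes it.
    """
    n, m = len(payoff), len(payoff[0])
    # Step size of order sqrt(log(#actions) / rounds); an illustrative choice.
    eta_row = math.sqrt(math.log(n) / rounds)
    eta_col = math.sqrt(math.log(m) / rounds)
    row_loss = [0.0] * n
    col_loss = [0.0] * m
    avg_p = [0.0] * n
    avg_q = [0.0] * m
    for _ in range(rounds):
        p = hedge_weights(row_loss, eta_row)
        q = hedge_weights(col_loss, eta_col)
        for i in range(n):
            # Row player's loss for action i is its negative expected payoff against q.
            row_loss[i] -= sum(payoff[i][j] * q[j] for j in range(m))
            avg_p[i] += p[i] / rounds
        for j in range(m):
            # Column player's loss for action j is the expected payoff it concedes to p.
            col_loss[j] += sum(p[i] * payoff[i][j] for i in range(n))
            avg_q[j] += q[j] / rounds
    return avg_p, avg_q


def duality_gap(payoff, p, q):
    """Best-response payoff against q minus worst-case payoff of p (0 at equilibrium)."""
    n, m = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * q[j] for j in range(m)) for i in range(n))
    worst_for_p = min(sum(p[i] * payoff[i][j] for i in range(n)) for j in range(m))
    return best_row - worst_for_p
```

For example, calling `hedge_self_play([[2.0, -1.0], [-1.0, 1.0]])` and passing the result to `duality_gap` should give a small gap, with the row player's average strategy close to the equilibrium mix of 2/5 on the first action.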

Schedule (Tentative) 📅

Online Learning

Jan. 14
Course Overview [Slides]
Syllabus
Jan. 16
Intro to Online Learning / Hedge
JAB Course Note, Weighted Majority, Hedge, Universal Portfolios
Jan. 21
Information Theory and Maximum Entropy
MacKay Ch. 2, MaxEnt
Jan. 23
Online Gradient Descent
OGD
Jan. 26
Buffer / Follow-the-Leader
FTRL

Game Solving

Jan. 28
Computing Equilibria I
AdaBoost, Roth Textbook Ch. 2
Feb. 4
Computing Equilibria II
AdaBoost, Roth Textbook Ch. 2

Sequential Decision Making (RL/IL)

Feb. 6
Foundations of MDPs
Nan’s Note 1, HW #1 Out
Feb. 11
DAgger & Covariate Shift in IL
Invitation to Imitation, DAgger
Feb. 13
Approximate Policy Iteration
CPI, PSDP, NRPI
Feb. 18
Policy Gradients
Wen’s Slides
Feb. 20
The Natural Policy Gradient
NPG, Covariant Policy Search
Feb. 25
TRPO & PPO
TRPO, PPO
Feb. 27
HW #1 Pres.
HW #2 Out

Spring Break 🏝️

Mar. 11
Model-based RL
Wen’s Simulation Lemma Note, Nan’s Note 4
Mar. 13
Hybrid RL (Guest Lecture: Yuda Song)
HyQ, LAMPS, Proj. Proposal Due
Mar. 18
Learning by Cheating
LBC, SequIL
Mar. 20
Model Predictive Control & Test-Time Scaling
MCTS

Inverse RL

Mar. 25
IL as Game Solving
MaxEnt IRL, Moment Matching
Mar. 27
Efficient Inverse RL
FILTER, Hybrid IRL
Apr. 1
IRL2: Inverse RL In Real Life (Guest Lecture: Sanjiban Choudhury)
Diffusion Policy, DREAMER
Apr. 3
HW #2 Pres.

RLHF

Apr. 8
The Information Geometry of RLHF
RL from Prefs., PPO+RM
Apr. 10
The Value of Interaction in RLHF
DPO, HyPO
Apr. 15
REBEL and REFUEL (Guest Lecture: Wen Sun)
REBEL, REFUEL
Apr. 17
RLHF as Game Solving
SPO

Project Presentations

Apr. 22
Project Pres.
Presenters TBD
Apr. 24
Project Pres.
Presenters TBD

Instructors 👨‍🏫
