Algorithmic Foundations of Interactive Learning

Spring 2025. 17-740. Tuesday / Thursday 11:00-12:20. GHC 4215. Lecture Recordings, Scribe Notes

Announcements 📣

Hello World!

Nov 10

We can’t wait to meet you! 👋

Course Overview 📝

Interactive learning is an approach to machine learning in which systems learn and adapt through continuous interaction with their environment or users, receiving feedback and adjusting their behavior in response. These techniques are experiencing a resurgence across artificial intelligence and machine learning, from robotics to language modeling. In this advanced theory course, students will explore interactive learning from its foundational principles to recent applications, including fine-tuning Large Language Models (LLMs) and robot learning from demonstration.

Key topics include:

Online Learning: Learning under distribution shift.
Game Solving: Using no-regret algorithms to compute equilibria.
Reinforcement Learning: Sequential decision making. Model-free, model-based, and hybrid RL.
Imitation Learning & Applications to Robotics: Learning from demonstrations. Behavioral cloning, DAgger, and inverse RL.
RL from Human Feedback & Applications to Language Modeling: Learning from preferences. PPO, DPO, SPO.
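
As a rough illustration of the "Game Solving" topic above, the sketch below runs two copies of Hedge (multiplicative weights) against each other in a small zero-sum matrix game and returns their time-averaged strategies, which approach a minimax equilibrium. This is not taken from the course materials: the function names, the example payoff matrix, and the step size `eta = sqrt(ln(K) / T)` are all assumptions chosen for illustration.

```python
import math


def hedge_weights(cum_loss, eta):
    """Hedge distribution: weights proportional to exp(-eta * cumulative loss)."""
    base = min(cum_loss)  # shift for numerical stability; the distribution is unchanged
    w = [math.exp(-eta * (c - base)) for c in cum_loss]
    total = sum(w)
    return [x / total for x in w]


def hedge_self_play(payoff, rounds=5000):
    """Approximate a minimax equilibrium of a zero-sum matrix game.

    payoff[i][j] is the row player's loss (the column player's gain),
    assumed to lie in [-1, 1]. Both players run Hedge; the time-averaged
    strategies are returned.
    """
    n, m = len(payoff), len(payoff[0])
    eta = math.sqrt(math.log(max(n, m)) / rounds)  # assumed step size
    cum_row, cum_col = [0.0] * n, [0.0] * m
    avg_row, avg_col = [0.0] * n, [0.0] * m
    for _ in range(rounds):
        p = hedge_weights(cum_row, eta)
        q = hedge_weights(cum_col, eta)
        # Each player observes the expected loss of every action
        # against the opponent's current mixed strategy.
        for i in range(n):
            cum_row[i] += sum(payoff[i][j] * q[j] for j in range(m))
        for j in range(m):
            cum_col[j] -= sum(p[i] * payoff[i][j] for i in range(n))
        for i in range(n):
            avg_row[i] += p[i] / rounds
        for j in range(m):
            avg_col[j] += q[j] / rounds
    return avg_row, avg_col


def duality_gap(payoff, p, q):
    """Column's best-response value against p minus row's best-response value against q."""
    n, m = len(payoff), len(payoff[0])
    col_best = max(sum(p[i] * payoff[i][j] for i in range(n)) for j in range(m))
    row_best = min(sum(payoff[i][j] * q[j] for j in range(m)) for i in range(n))
    return col_best - row_best


if __name__ == "__main__":
    # Hypothetical 2x2 game whose unique equilibrium is (1/3, 2/3) for both players.
    game = [[1.0, -1.0], [-1.0, 0.0]]
    p, q = hedge_self_play(game)
    print("row strategy:", p)
    print("column strategy:", q)
    print("duality gap:", duality_gap(game, p, q))
```

The duality gap is bounded by the sum of the two players' average regrets, so it shrinks at roughly a 1/sqrt(T) rate as `rounds` grows. Note that the averaged strategies converge, while the per-round iterates may keep cycling.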

Schedule (Tentative) 📅

Online Learning

Jan. 14

Course Overview [Slides][Video]: Syllabus

Jan. 16

Intro to Online Learning / Weighted Majority [Video] [Scribe]: JAB Course Note, Weighted Majority, Universal Portfolios

Jan. 21

Information Theory Overview [Video] [Scribe]: MacKay Ch. 2, What is Entropy?

Jan. 23

Maximum Entropy & FTRL [Video] [Scribe]: MaxEnt, The Entropy Principle, FTRL, Hedge

Jan. 28

FTRL II [Video] [Scribe]: OGD

Game Solving

Jan. 30

Minimax via No Regret I [Video] [Scribe]: Adaboost, Roth Textbook Ch. 2

Feb. 4

Minimax via No Regret II [Video] [Scribe]: Adaboost, Roth Textbook Ch. 2, Gabriele’s CFR Note

Sequential Decision Making (RL/IL)

Feb. 6

Foundations of MDPs [Video], [Scribe]: Akshay’s Note, Nan’s Note

Feb. 11

HW #1a Out. DAgger & Covariate Shift in IL [HW #1a], [Video] [Scribe]: Invitation to Imitation, DAgger

Feb. 13

HW #1b Out. Approximate Policy Iteration [HW #1b], [Video] [Scribe]: MACRL Ch. 8, CPI, PSDP, NRPI

Feb. 18

Policy Gradients: Introduction [Video], [Scribe]: Wen’s Slides

Feb. 20

Policy Gradients: Baselines / Actor-Critic [Video], [Scribe]: PPO, PPO Tricks, GAE, REINFORCE for LLMs

Feb. 25

3 Views of The Natural Policy Gradient [Video], [Scribe]: NPG, Covariant Policy Search, TRPO

Feb. 27

HW #1a due. Learning by Cheating [Slides] [Video]: LBC, SequIL

Spring Break 🏝️

Mar. 11

Model-based RL as Game-Solving [Video], [Scribe]: Wen’s Simulation Lemma Note, Agnostic SysID, DREAMER

Mar. 13

HW #1b Pres. [Video]

Mar. 18

Hybrid RL (Guest Lecture: Yuda Song) [Proj. Proposal] [Video] [Scribe]: Yuda’s Note, HyQ, LAMPS

Mar. 20

Model Predictive Control & Test-Time Scaling [Video] [Scribe]: Sparse Sampling, MCTS, THOR, MPPI, Expert Iteration, Dual Policy Iteration, V-STaR, Self-Improvement

Imitation Learning

Mar. 25

Proj. Proposal Due. IL as Game Solving [Video] [Slides] [Scribe]: MaxEnt IRL, Moment Matching, Wen’s Soft VI Note

Mar. 27

Efficient Inverse RL [Video] [Slides] [Scribe]: FILTER, Hybrid IRL, Efficient Imitation Under Misspecification

Apr. 1

Imitation Learning In Real Life (Guest Lecture: Sanjiban Choudhury) [Video] [Scribe]: EIL, SAILOR

Apr. 3

Spring Carnival 🎡 (No Class)

RLHF

Apr. 8

HW #2 Out. The Information Geometry of RLHF [HW #2] [Video] [Slides] [Scribe]: RL from Prefs., PPO+RM, DPO, All Roads Lead to Likelihood

Apr. 10

The Value of Interaction in RLHF [Video] [Slides] [Scribe]: HyPO, All Roads Lead to Likelihood

Apr. 15

RL via Regressing Relative Rewards (Guest Lecture: Wen Sun) [Video] [Scribe]: REBEL, REFUEL, Code Tutorial

Apr. 17

RLHF as Game Solving [Video] [Slides] [Scribe]: SPO, DNO

Project Presentations

Apr. 22

Project Pres.: Presenters TBD

Apr. 24

Project Pres.: Presenters TBD

Instructors 👨‍🏫