
Advanced Machine Learning

University of Washington, Autumn 2023

About

This course will cover advanced machine learning, from VC dimension to ChatGPT. It is divided into two parts: theoretical and empirical. The first part covers topics such as VC dimension, Rademacher complexity, empirical risk minimization (ERM), generalization bounds, and optimization basics. The second part covers the components and development of advanced ML systems such as GPT-3.

Lecture

Tuesdays and Thursdays, 10:00 to 11:20 am PT, CSE2 G04 (Gates building).

Lecture 1 Introduction (slides).

Lecture 2 Introduction to learning theory: PAC learning of finite hypothesis classes in the realizable case (March 30). See Chapters 2 and 3 of Understanding Machine Learning (PDF).
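As a quick illustration (not part of the course materials), the standard sample-complexity bound for ERM over a finite hypothesis class in the realizable case, m ≥ ⌈ln(|H|/δ)/ε⌉, can be evaluated directly. The function name and example numbers are hypothetical.

```python
import math

def pac_sample_complexity(h_size: int, epsilon: float, delta: float) -> int:
    """Samples sufficient for ERM over a finite class H in the realizable case:
    m >= ln(|H| / delta) / epsilon."""
    if h_size < 1 or not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("need |H| >= 1 and epsilon, delta in (0, 1)")
    return math.ceil(math.log(h_size / delta) / epsilon)

# Example: 1000 hypotheses, error at most 0.1 with probability at least 0.95.
print(pac_sample_complexity(1000, epsilon=0.1, delta=0.05))
```

Note that the dependence on the class size is only logarithmic: doubling |H| adds ln(2)/ε samples.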

Lecture 3 See edstem post.

Lecture 4 See edstem post.

Lecture 5 Convergence of Gradient Descent. The smooth and convex case starts at Sec. 2.3 here; Lemmas 1 and 2 discussed in class are Lemmas 2.8 and 2.9 there.
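A numerical sanity check (not the lecture's proof): the sketch below runs gradient descent with step size 1/L on a simple L-smooth convex quadratic and compares the suboptimality against the standard bound f(x_T) − f(x*) ≤ L‖x_0 − x*‖²/(2T). The quadratic, starting point, and helper names are illustrative assumptions.

```python
def make_quadratic(a):
    """f(x) = 0.5 * sum(a_i * x_i^2) with a_i > 0: convex and L-smooth with L = max(a),
    minimized at x* = 0 with f* = 0."""
    def f(x):
        return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))
    def grad(x):
        return [ai * xi for ai, xi in zip(a, x)]
    return f, grad

def gradient_descent(grad, x0, step, num_steps):
    """Plain GD: x <- x - step * grad(x)."""
    x = list(x0)
    for _ in range(num_steps):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

a = [4.0, 0.5, 0.01]
L = max(a)
f, grad = make_quadratic(a)
x0 = [1.0, -2.0, 3.0]
dist0_sq = sum(xi * xi for xi in x0)  # ||x0 - x*||^2 with x* = 0
for T in (1, 10, 100):
    xT = gradient_descent(grad, x0, step=1.0 / L, num_steps=T)
    bound = L * dist0_sq / (2 * T)
    print(T, f(xT), bound)
    assert f(xT) <= bound  # the O(1/T) guarantee for smooth convex f
```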

Lecture 6 Convergence of Gradient Descent, continued. The smooth and strongly convex case is here. See Chapter 14 of the textbook for the analysis of GD in the Lipschitz setting and for the analysis of SGD.
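For an L-smooth, μ-strongly convex function, gradient descent with step size 1/L contracts the squared distance to the minimizer geometrically: ‖x_T − x*‖² ≤ (1 − μ/L)^T ‖x_0 − x*‖². The sketch below checks this on a diagonal quadratic, an assumed example where μ and L are simply the smallest and largest curvatures.

```python
def gd_distance_sq(a, x0, num_steps):
    """Run GD with step 1/L on f(x) = 0.5 * sum(a_i x_i^2), a_i > 0,
    and return ||x_T - x*||^2 (the minimizer is x* = 0)."""
    L = max(a)
    x = list(x0)
    for _ in range(num_steps):
        x = [xi - (ai / L) * xi for ai, xi in zip(a, x)]  # gradient of f is a_i * x_i
    return sum(xi * xi for xi in x)

a = [4.0, 1.0, 0.5]            # mu = min(a) = 0.5, L = max(a) = 4.0
mu, L = min(a), max(a)
x0 = [1.0, -2.0, 3.0]
d0 = sum(xi * xi for xi in x0)
for T in (1, 10, 50):
    dT = gd_distance_sq(a, x0, T)
    rate_bound = (1 - mu / L) ** T * d0
    print(T, dT, rate_bound)
    assert dT <= rate_bound    # linear convergence at rate (1 - mu / L)
```

The ratio κ = L/μ (the condition number) controls the rate, so roughly κ·log(1/ε) iterations suffice for accuracy ε.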

Lecture 7 Continued analysis of SGD, modern optimizers. See here for the Adam slides.
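For reference, a minimal scalar sketch of the Adam update (Kingma and Ba): exponential moving averages of the gradient and squared gradient, bias correction, then a per-coordinate normalized step. The toy objective and hyperparameters below are assumptions chosen for illustration.

```python
import math

def adam_minimize(grad, theta0, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, num_steps=1000):
    """Adam on a single scalar parameter."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, num_steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g          # first-moment EMA
        v = beta2 * v + (1 - beta2) * g * g      # second-moment EMA
        m_hat = m / (1 - beta1 ** t)             # bias correction for zero init
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta = adam_minimize(lambda th: 2 * (th - 3.0), theta0=0.0, lr=0.1, num_steps=1000)
print(theta)
```

Because the step is divided by sqrt(v_hat), early updates have magnitude close to lr regardless of the gradient's scale.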

Lecture 8 Generalization bounds and online learning.
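As one concrete online-learning example (chosen here for illustration; the lecture may cover other algorithms), the Halving algorithm predicts with the majority vote of the hypotheses still consistent with the data. Each mistake removes at least half of them, so in the realizable case it makes at most log₂|H| mistakes. The threshold class and data stream are assumptions.

```python
import math
import random

def halving(hypotheses, stream, target):
    """Halving algorithm for online binary classification (realizable case).
    hypotheses: list of callables x -> {0, 1}; target provides the true labels."""
    version_space = list(hypotheses)
    mistakes = 0
    for x in stream:
        votes = sum(h(x) for h in version_space)
        prediction = 1 if 2 * votes >= len(version_space) else 0   # majority vote
        label = target(x)
        if prediction != label:
            mistakes += 1
        # Keep only hypotheses consistent with the revealed label.
        version_space = [h for h in version_space if h(x) == label]
    return mistakes, version_space

def threshold(t):
    return lambda x: int(x >= t)

H = [threshold(t) for t in range(64)]
rng = random.Random(0)
stream = [rng.randrange(64) for _ in range(200)]
mistakes, remaining = halving(H, stream, target=H[37])
print(mistakes, math.log2(len(H)))
```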

Lecture 9 Generalization bounds. Rademacher complexity - See chapter 26.
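To make the definition concrete, the sketch below computes the empirical Rademacher complexity R(A) = E_σ[max_{a∈A} (1/m) Σᵢ σᵢ aᵢ] exactly, by enumerating all 2^m sign vectors, for the label patterns of threshold functions on a few points, and compares it with Massart's lemma, R(A) ≤ max_{a∈A} ‖a − ā‖ · √(2 ln N)/m. The points and the threshold class are assumptions.

```python
import itertools
import math

def empirical_rademacher(vectors):
    """Exact R(A) = E_sigma[ max_{a in A} (1/m) sum_i sigma_i a_i ] over all 2^m sign vectors."""
    m = len(vectors[0])
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=m):
        total += max(sum(s * ai for s, ai in zip(sigma, a)) for a in vectors) / m
    return total / 2 ** m

def massart_bound(vectors):
    """Massart's lemma: R(A) <= max_a ||a - a_bar|| * sqrt(2 ln N) / m."""
    n, m = len(vectors), len(vectors[0])
    a_bar = [sum(col) / n for col in zip(*vectors)]
    radius = max(math.dist(a, a_bar) for a in vectors)
    return radius * math.sqrt(2 * math.log(n)) / m

# Thresholds h_t(x) = +1 if x >= t else -1 on m = 8 points give m + 1 distinct patterns.
points = [0.5, 1.3, 2.0, 2.7, 3.1, 4.4, 5.0, 6.2]
A = sorted({tuple(1 if x >= t else -1 for x in points) for t in points + [float("inf")]})
print(len(A), empirical_rademacher(A), massart_bound(A))
```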

Lecture 10 Introduction to language modelling.

Lecture 11 Introduction to language modelling continued.

Lecture 12 Introduction to language modelling continued.
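A toy illustration of next-token prediction, far simpler than the models discussed in lecture: a bigram language model with add-one (Laplace) smoothing, evaluated by perplexity, the exponential of the average negative log-likelihood per token. The corpus and function names are made up for this sketch.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Return p(next | prev) from bigram counts with add-one smoothing, plus the vocabulary."""
    vocab = sorted(set(tokens))
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    def prob(nxt, prev):
        row = counts[prev]
        return (row[nxt] + 1) / (sum(row.values()) + len(vocab))
    return prob, vocab

def perplexity(prob, tokens):
    """exp of the mean negative log-likelihood of each token given the previous one."""
    nll = -sum(math.log(prob(nxt, prev)) for prev, nxt in zip(tokens, tokens[1:]))
    return math.exp(nll / (len(tokens) - 1))

corpus = "<s> the cat sat on the mat </s> <s> the dog sat on the rug </s>".split()
prob, vocab = train_bigram(corpus)
print(prob("cat", "the"), perplexity(prob, corpus))
```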

Lecture 13 Recent GPT-style language models.
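GPT-style models generate text autoregressively, repeatedly turning next-token logits into a probability distribution and choosing a token. A common decoding knob is temperature, sketched below; the logits are hypothetical, and real decoders often add other strategies (e.g. top-k or nucleus sampling) not shown here.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive (use argmax for greedy decoding)")
    scaled = [z / temperature for z in logits]
    m = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(logits, temperature=1.0, rng=random):
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]   # hypothetical scores for a 3-token vocabulary
for T in (0.5, 1.0, 2.0):
    print(T, softmax_with_temperature(logits, T))
print(sample_next_token(logits, temperature=0.7, rng=random.Random(0)))
```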

Lecture 14 Language modeling architecture: from zero to llama.
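The central operation of a decoder-only Transformer is causal (masked) self-attention: position i attends only to positions j ≤ i, with weights softmax(q_i·k_j/√d). The single-head sketch below omits learned Q/K/V projections (it reuses the inputs directly), multiple heads, and the LLaMA-specific components (RMSNorm, SwiGLU, rotary embeddings); the toy inputs are assumptions.

```python
import math

def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.
    q, k, v: lists of T vectors (lists of floats); position i attends to j <= i only."""
    d = len(q[0])
    out = []
    for i in range(len(q)):
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in range(i + 1)]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]             # softmax over visible positions
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))])
    return out

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy token representations (T = 3, d = 2)
print(causal_self_attention(x, x, x))
```

Since the first position can only see itself, its output equals its own value vector.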

Lecture 15 GPT-style language models, generalization bounds.

Lecture 16 Generalization bounds, scaling, multimodal models.
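A back-of-the-envelope calculation often used when discussing scaling: training compute is approximately C ≈ 6ND FLOPs for N parameters and D training tokens, and the Chinchilla results are commonly summarized as roughly 20 tokens per parameter for compute-optimal training. Both are approximations, used here as assumptions.

```python
import math

def training_flops(n_params, n_tokens):
    """Approximate training compute: ~6 FLOPs per parameter per token (forward + backward)."""
    return 6 * n_params * n_tokens

def compute_optimal_split(flops, tokens_per_param=20):
    """Solve C = 6 * N * D with D = tokens_per_param * N for (N, D)."""
    n = math.sqrt(flops / (6 * tokens_per_param))
    return n, tokens_per_param * n

print(f"{training_flops(175e9, 300e9):.2e}")    # GPT-3 scale: 175B params, 300B tokens
n, d = compute_optimal_split(training_flops(70e9, 1.4e12))
print(f"N ~ {n:.2e} params, D ~ {d:.2e} tokens")
```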

Lecture 17 Multimodal, continued.
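CLIP-style multimodal models are trained with a symmetric contrastive loss: for a batch of N matched image-text pairs, cosine-similarity logits are scored with cross-entropy in both directions, with the matched pair as the target. The sketch fixes the temperature at 0.07 (CLIP learns it); the embeddings are made-up stand-ins for encoder outputs.

```python
import math

def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def _cross_entropy_diagonal(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    img = [_normalize(v) for v in image_emb]
    txt = [_normalize(v) for v in text_emb]
    logits = [[sum(a * b for a, b in zip(iv, tv)) / temperature for tv in txt] for iv in img]
    logits_t = [list(col) for col in zip(*logits)]    # text-to-image direction
    return (_cross_entropy_diagonal(logits) + _cross_entropy_diagonal(logits_t)) / 2

images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
matched = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
shuffled = [matched[1], matched[2], matched[0]]
print(clip_contrastive_loss(images, matched), clip_contrastive_loss(images, shuffled))
```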

Lecture 18 Efficient deep learning.
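One common efficiency technique is low-bit quantization. A minimal per-tensor sketch of absmax int8 quantization: scale values so the largest magnitude maps to 127, round to integers, and divide by the same scale to dequantize. The weights are hypothetical, and practical methods typically use finer-grained scales and special handling of outliers.

```python
def absmax_quantize(xs):
    """Symmetric int8 quantization: max |x| maps to 127, then round to the nearest integer."""
    amax = max(abs(x) for x in xs)
    scale = 127.0 / amax if amax > 0 else 1.0
    q = [max(-127, min(127, round(x * scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    return [v / scale for v in q]

weights = [0.12, -0.5, 0.03, 0.49, -0.07]
q, scale = absmax_quantize(weights)
restored = dequantize(q, scale)
print(q)
print([abs(a - b) for a, b in zip(weights, restored)])
```

Rounding error per value is at most 0.5/scale, so a single large outlier (which shrinks the scale) costs precision for every other value.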

Lecture 19 Chain of thought prompting and instruction tuning.
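In instruction tuning, a widespread (though not universal) convention is to train on (instruction, response) pairs and compute the next-token loss only on the response tokens. The sketch below shows that masking on hypothetical per-token log-probabilities; the helper names are illustrative.

```python
def response_loss_mask(prompt_len, total_len):
    """1 for response positions, 0 for prompt positions."""
    return [0] * prompt_len + [1] * (total_len - prompt_len)

def masked_mean_nll(token_logprobs, mask):
    """Average negative log-likelihood over positions where mask == 1."""
    selected = [-lp for lp, keep in zip(token_logprobs, mask) if keep]
    if not selected:
        raise ValueError("mask selects no tokens")
    return sum(selected) / len(selected)

# Hypothetical log-probabilities: a 2-token prompt followed by a 3-token response.
logprobs = [-2.0, -1.5, -0.5, -0.25, -0.75]
mask = response_loss_mask(prompt_len=2, total_len=len(logprobs))
print(masked_mean_nll(logprobs, mask))  # 0.5
```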

Lecture 20 Final project presentations and closing remarks.

Assignments

Link to gradescope.

Homework 1 Version 4 (PDF, source). Due on Tuesday, May 2nd at 11:59pm.

Homework 2 Version 1 (PDF). Due on Sunday, June 4th at 11:59pm. Only one person per group should submit.

Extra Reading Assignment

Week 1 April 27 - May 4: GPT-2. Due 11:59pm on May 4. For more details, see here.

Week 2 May 4 - May 11: GPT-3. Due 11:59pm on May 11. For more details, see here.

Week 3 May 11 - May 18: Transformers. Due 11:59pm on May 18. For more details, see here.

Week 4 May 18 - May 25: CLIP. Due 11:59pm on May 25. For more details, see here.

Week 5 May 25 - Jun 1: InstructGPT. Due 11:59pm on Jun 1. For more details, see here.

Project

The project can be a replication of existing research, original empirical research, or a summary of a line of theoretical work (with a potential extension). There are three milestones: (1) a proposal describing what you will work on, (2) version 1, which checks whether you are on track to finish the project in time, and (3) the final version, which includes the full report.

Resources:

Deadlines:

  • Proposal: Monday, May 1, 11:59pm
  • Version 1: Friday, May 12, 11:59pm
  • Final version: Friday, June 2, 11:59pm

Office hours for project milestones

  • Proposal office hours: Thursday, April 27, 9am, Allen Center, CSE1 678
  • Version 1 and final version office hours: Fridays, May 5 to June 2, 10am, Allen Center, CSE1 678

Grading for the project is distributed as follows: 10% for the proposal, 25% for version 1, and 65% for the final version. The project is 50% of the total course grade.

Contact dettmers@cs.washington.edu if you have any issues with the project, such as finding a team, finding a project, or making progress on your project.

Tim will also be available for office hours centered around the project. Office hours will be posted soon.

Grading

For students enrolled in CSE 493, your class grade will be determined by:

  • 25% assignment 1
  • 25% assignment 2
  • 50% final project

For students enrolled in CSE 599, your class grade will be determined by:

  • 20% assignment 1
  • 20% assignment 2
  • 10% reading assignment
  • 50% final project
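The weights above, combined with the project breakdown (10% proposal, 25% version 1, 65% final version), can be checked with a short calculation. This is an illustrative sketch only: the scores are hypothetical, on a 0 to 100 scale, and the dictionary keys are made-up names.

```python
PROJECT_PARTS = {"proposal": 0.10, "version_1": 0.25, "final_version": 0.65}
COURSE_WEIGHTS = {
    "CSE 493": {"assignment_1": 0.25, "assignment_2": 0.25, "final_project": 0.50},
    "CSE 599": {"assignment_1": 0.20, "assignment_2": 0.20,
                "reading_assignment": 0.10, "final_project": 0.50},
}

def project_score(milestones):
    """Weighted project score from the three milestone scores."""
    return sum(w * milestones[name] for name, w in PROJECT_PARTS.items())

def course_grade(course, scores, milestones):
    """Weighted course grade; the final_project component comes from the milestones."""
    scores = dict(scores, final_project=project_score(milestones))
    return sum(w * scores[name] for name, w in COURSE_WEIGHTS[course].items())

milestones = {"proposal": 100, "version_1": 80, "final_version": 90}
print(course_grade("CSE 599",
                   {"assignment_1": 85, "assignment_2": 90, "reading_assignment": 100},
                   milestones))
```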