
CS 229S - Systems for Machine Learning

Overview

Deep learning and specifically Large Language Models (LLMs) are now used to serve billions of users across applications such as search, data management, and productivity assistants. Their widespread adoption amplifies the need for high performance in both training and serving.

However, performance is hindered by a multitude of hurdles across the computing stack, including the increasing complexity of the models, massive sizes of training and inference data, heterogeneity of the available accelerators and multi-node platforms, and diverse network properties.

This course will focus on performance efficiency and scalability of deep learning systems. We will cover a diverse set of topics on efficient training, fine-tuning, and inference, with an emphasis on Transformer architectures and LLMs.

Our slides are available on the Calendar page.
For a collection of Systems for ML research, see the Scaling Intelligence Lab.
To contact the teaching team, please email cs229s-24-staff@lists.stanford.edu.

Teaching Team

Azalia Mirhoseini

azalia@stanford.edu

Course Creator and Instructor

Dan Fu

danfu@cs.stanford.edu

Teaching Assistant

Jordan Juravsky

jbj@stanford.edu

Teaching Assistant

Liana Patel

lianapat@stanford.edu

Teaching Assistant


Logistics

Where: In person and online, Gates B3

When: Mondays and Fridays, 1:30 PM - 2:20 PM

Prerequisites: Knowledge of basic computer science principles and skills sufficient to write a reasonably non-trivial program in Python/NumPy (equivalent to CS106A, CS106B, or CS106X); familiarity with probability theory (equivalent to CS 109, MATH151, or STATS 116); and familiarity with multivariable calculus and linear algebra (equivalent to MATH51 or CS205). A foundation in machine learning (e.g., CS221, CS229, CS230, or CS124) is preferable.


Class

Welcome to the second iteration of CS229S!

The course format will primarily include lectures taught by course instructors. In the latter half of the course, we will also hear from guest experts in various aspects of systems for machine learning, representing a diverse set of experiences.

We are excited to create custom assignments for this course! Assignments are as follows:

  • Project 1 (1.5 Weeks): Implementing and Training Transformers for Language Modeling, Hardcoding the Weights of a Two-Layer Transformer to Solve Associative Recall (in a Jupyter Notebook!), and Arithmetic Intensity Math for Multi-Headed Attention (see the back-of-envelope sketch after this list)
  • Project 2 (1 Week): Transformer Inference with KV Caching and Speculative Decoding, Including the Inference Math for Both Techniques and Their Implementation in Karpathy's nanoGPT Backbone (a minimal KV-caching sketch follows this list)
  • Final Project (7.5 Weeks)
  • Participation: Attending and Asking Questions in Guest Lectures
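
As a taste of the arithmetic intensity analysis in Project 1, here is a back-of-envelope sketch (illustrative only, not the course's reference solution). It estimates FLOPs per byte moved for a single matrix multiply, assuming fp16 storage and made-up shapes for the Q·Kᵀ score computation of one attention head; the helper name is ours, not the assignment's.

```python
# Rough arithmetic intensity (FLOPs per byte moved) for a GEMM, assuming
# fp16 (2 bytes/element) and that each operand is read or written exactly once.
# The shapes and the helper name are illustrative, not from the course.
def gemm_intensity(m, k, n, bytes_per_elem=2):
    flops = 2 * m * k * n                                   # one multiply-add per (i, j, l)
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read A and B, write C
    return flops / bytes_moved

# Attention scores Q @ K^T for a single head (head_dim = 64):
print(gemm_intensity(m=2048, k=64, n=2048))  # prefill over a long prompt: ~60 FLOPs/byte
print(gemm_intensity(m=1,    k=64, n=2048))  # single-token decode: ~1 FLOP/byte
```

The contrast between the two cases illustrates why single-token decoding tends to be memory-bandwidth bound rather than compute bound on modern accelerators.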
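
Similarly, here is a minimal sketch of the KV-caching idea behind Project 2 for a single attention layer, with assumed shapes and function names (it is not the course's or nanoGPT's implementation). Each decoding step computes attention only for the newest token and appends that token's key/value to a growing cache instead of re-encoding the entire prefix.

```python
# Minimal single-layer KV-caching sketch in PyTorch; names and shapes are
# illustrative assumptions, not the course's reference code.
import torch
import torch.nn.functional as F

def attend_with_cache(q, k_new, v_new, cache):
    """q, k_new, v_new: (batch, heads, 1, head_dim) for the newest token only."""
    if cache["k"] is None:
        k, v = k_new, v_new
    else:
        # Append the new token's key/value; past tokens are never recomputed.
        k = torch.cat([cache["k"], k_new], dim=2)
        v = torch.cat([cache["v"], v_new], dim=2)
    cache["k"], cache["v"] = k, v
    # Scaled dot-product attention of the new query over all cached keys/values.
    scores = (q @ k.transpose(-2, -1)) / (q.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v

# Toy decode loop: 4 steps, one token per step, cache grows each step.
cache = {"k": None, "v": None}
for _ in range(4):
    q = torch.randn(1, 8, 1, 64)  # query for the one new token
    k = torch.randn(1, 8, 1, 64)
    v = torch.randn(1, 8, 1, 64)
    out = attend_with_cache(q, k, v, cache)
print(cache["k"].shape)  # torch.Size([1, 8, 4, 64])
```

Speculative decoding builds on the same decode loop: a small draft model proposes several tokens and the large model verifies them in one batched forward pass, amortizing the large model's per-token memory traffic.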

  • Past Iterations: Fall 2023