
CS 229S - Systems for Machine Learning


Deep learning and neural networks are being increasingly adopted across industries. They now serve billions of users in applications such as search, data management, and productivity assistants. As models become more capable, this large-scale adoption will continue to grow rapidly. Given this widespread use, there is an increasing need to achieve high performance for both training and serving deep-learning models. However, performance is hindered by a multitude of infrastructure and lifecycle hurdles: the increasing complexity of the models, the massive sizes of training and inference data, the heterogeneity of the available accelerators and multi-node platforms, and diverse network properties. The slow adaptation of systems to new algorithms creates a bottleneck for the rapid evolution of deep-learning models and their applications. This course will focus on the performance efficiency and scalability of deep learning systems. We will cover a diverse set of topics on efficient training, finetuning, and inference, with an emphasis on Transformer architectures and Large Language Models.

Teaching Team

Azalia Mirhoseini

Course Creator and Instructor

Simran Arora

Course Creator and Instructor

Hermann Kumbong

Head Teaching Assistant


Where: In Person, OSHMAN McMurtry Art and Art History Building, Oshman Presentation Space, Room 102

When: Mondays and Wednesdays, 10:30-11:20am PST.

Prerequisites: Knowledge of basic computer science principles and skills at a level sufficient to write a reasonably non-trivial computer program in Python/NumPy (equivalent to CS106A, CS106B, or CS106X), familiarity with probability theory (equivalent to CS 109, MATH151, or STATS 116), and familiarity with multivariable calculus and linear algebra (equivalent to MATH51 or CS205). Foundations of Machine Learning (e.g. CS221, CS229, CS230, or CS124) is preferable.


Welcome to the first iteration of CS229S!

The course format will primarily include lectures taught by course instructors. In the latter half of the course, we will also hear from guest experts in various aspects of systems for machine learning, representing a diverse set of experiences.

We are excited to create custom assignments for this course! Assignments are as follows:

  • Project 1 (2.5 Weeks): Implementing and Training Transformers for Language Modeling, Hardcoding the Weights of a Two-Layer Transformer to Solve Associative Recall (in a Jupyter Notebook!), Arithmetic Intensity Math for Multi-Headed Attention, Programming a CUDA Kernel for 1D Depthwise Convolution
  • Project 2 (1 Week): Transformer Inference with KV Caching and Speculative Decoding, Inference Math for Both Techniques, Implementing Both Techniques in Karpathy's NanoGPT Backbone
  • Final Project (5 Weeks)
  • Participation: Attending and Asking Questions in Guest Lectures