Calendar
Week 1
Sep 27
Lecture by Azalia Introduction
Course Overview: Why systems for machine learning?
Logistics
Week 2
Oct 02
Lecture by Simran Transformer Architecture
Introduction to Sequence Modeling
Strengths and Challenges of Early Recurrent Neural Networks
Transformers Walkthrough
Pretraining and Fine-Tuning
Oct 04
Lecture by Azalia Hardware Aware Algorithm Design
Introduction to Processors, Parallelism, and the GPU Memory Hierarchy
Introduction to Arithmetic Intensity and Measures of Efficiency
Hardware Aware Algorithms for Matrix Multiplication
Principles for Achieving High Performance
Week 3
Oct 09
Lecture by Azalia Analyzing the Performance of Transformers
Measuring the FLOPs of MLP and Transformer Training and Inference (Intro to Backpropagation)
Reviewing Autoregressive Generation and KV Caching
Measuring the Efficiency of KV Caching (FLOPs, Arithmetic Intensity)
Speculative Decoding
Oct 11
Lecture by Simran Hardware Aware Algorithm Design
Understanding the Scaling Bottlenecks of Standard Attention
Main Ideas in Efficient Attention Algorithms (Sparsity, Low Rank, Kernelization)
Motivating Hardware Aware Design. Profiling Standard Attention and Computing Arithmetic Intensity.
Reviewing the GPU execution model. How is attention executed on GPUs?
Reviewing Three Key Ideas: Fusion, Tiling, Caching vs. Recompute
Detailed Walkthrough of FlashAttention v1 and v2
Week 4
Oct 16
Lecture by Simran Memory Efficient Neural Networks
Pruning and Sparsity (Structured vs. Unstructured Pruning Tradeoffs, Sparse Tensor Cores and N:M Sparsity, Magnitude and Regression Based Pruning, Pruning Sensitivity Analysis)
Quantization (How Numbers Are Represented in Computers, K-means Quantization, Linear Quantization)
Knowledge Distillation
Oct 18
Lecture by Azalia Adapting Large Language Models
Scaling Laws
Zero-shot, Few-shot, Emergent Abilities
Instruction Following Models
RLHF, RLAIF, Constitutional AI
Parameter Efficient Finetuning
Week 5
Week 6
Oct 30
Speaker Deepak Narayanan – Parallelism
Stanford, Microsoft, NVIDIA
Watch Deepak's awesome talk here! YouTube
Nov 01
Lecture by Simran Efficient Attention-Free Architectures
Revisit Attention Scaling Bottlenecks
Introduction to Convolutions, Fourier Transforms, and the FFT-Convolution Theorem
Tradeoffs of Transformer vs. RNN vs. CNN Training & Inference Efficiency, and Inductive Biases
State Space Models Walkthrough
Limitations of State Space Models; Towards Input-Dependent Sequence Mixers
Week 7
Nov 06
Speaker Ce Zhang – Decentralized Training
ETH Zurich, Together.ai, University of Chicago
Training on Heterogeneous Compute
Communication Compression
RedPajama and Data Quality
Nov 08
Lecture by Azalia Adapting Large Language Models
Supervised Fine-tuning Loss
Reinforcement Learning Fundamentals
LLM Fine-tuning with RL
Reward Modeling
Week 8
Week 9