Calendar

Sep 27

Presented by AzaliaIntroduction

Course Overview: Why systems for machine learning?
Logistics

Oct 02

Presented by SimranTransformer Architecture

Introduction to Sequence Modeling
Strengths and Challenges of Early Recurrent Neural Networks
Transformers Walkthrough
Pretraining and Fine-Tuning

Oct 04

Presented by AzaliaHardware Aware Algorithm Design

Introduction to Processors, Parallelism, and the GPU Memory Hierarchy
Introduction to Arithmetic Intensity and Measures of Efficiency
Hardware Aware Algorithms for Matrix Multiplication
Principles for Achieving High Performance

Oct 09

Presented by AzaliaAnalyzing the Peformance of Transformers

Measuring the FLOPs of MLP and Transformer Training and Inference (Intro to Backpropagation)
Reviewing Autoregressive Generation and KV Caching
Measuring the Efficiency of KV Caching (FLOPs, Arithmetic Intensity)
Speculative Decoding

Oct 11

Presented by SimranHardware Aware Algorithm Design

Understanding the Scaling Bottlenecks of Standard Attention
Main Ideas in Efficient Attention Algorithms (Sparsity, Low Rank, Kernelization)
Motivating Hardware Aware Design. Profiling Standard Attention and Computing Arithmetic Intensity.
Reviewing the GPU execution model. How is attention executed on GPUs?
Reviewing Three Key Ideas: Fusion, Tiling, Caching vs. Recompute
Detailed Walkthrough of FlashAttention v1 and v2

Oct 16

Presented by SimranMemory Efficient Neural Networks

Pruning and Sparsity (Structured vs. Unstructured Pruning Tradeoffs, Sparse Tensor Cores and N:M Sparsity, Magnitude and Regression Based Pruning, Pruning Sensitivity Analysis)
Quantization (How are numbers represented in computers?, K-means Quantization, Linear Quantization)
Knowledge Distillation

Oct 18

Presented by AzaliaAdapting Large Language Models

Scaling Laws
Zero-short, Few-shot, Emergent Abilities
Instruction Following Models
RLHF-RLAIF-Constitutional AI
Parameter Efficient Finetuning

Oct 23

SpeakerTim Dettmers - Efficient Training and Inference: University of Washington

Watch Tim's awesome talk here! YouTube

Oct 25

Presented by AzaliaParallelism Fundamentals

Data Parallelism (All-Reduce, Ring All-Reduce, ZeRO, PyTorch FSDP)
Tensor Parallelism
Pipeline Parallelism
Automatic Parallelization (Alpa)
Comparisons, Pros and Cons of Each Strategy

Oct 30

SpeakerDeepak Narayanan – Parallelism: Stanford, Microsoft, NVidia

Watch Deepak's awesome talk here! YouTube

Nov 01

Presented by SimranEfficient Attention-Free Archiectures

Revisit Attention Scaling Bottlenecks

Introduction to Convolutions, Fourier Transforms, and the FFT-Convolution Theorem

Tradeoffs of Transformer vs. RNN vs. CNN Training & Inference Efficiency, and Inductive Biases

State Space Models Walkthrough

Limitations of State Space Models; Towards Input-Dependent Sequence Mixers

Nov 06

SpeakerCe Zhang – Decentralized Training: ETH Zurich, Together.ai, University of Chicago

Training on Heterogeneous Compute

Communication Compression

RedPajama and Data Quality

Nov 08

Presented by AzaliaAdapting Large Language Models

Supervised Fine-tuning Loss

Reinforcement Learning Fundamentals

LLM Fine-tuning with RL

Reward Modeling

Nov 13

SpeakerWilliam Fedus – Sparse Models, MoE: OpenAI

Nov 15

SpeakerChris Ré: Stanford University

Nov 20: HolidayThanksgiving Break
Nov 22: HolidayThanksgiving Break

Nov 27

SpeakerTianqi Chen - TVM and ML Compilers: Carnegie Mellon University

Whatch Tianqi's awesome talk here! YouTube

Nov 29

Presented by AzaliaScheduling ML Clusters

Week 10

Dec 4

Presented by SimranData Pipelines

Weak Supervision for Data Labeling
Pretraining Data Selection Methods

Dec 6

Presented by AzaliaResearch for ML Systems

Calendar

Week 1

Week 2

Week 3

Week 4

Week 5

Week 6

Week 7

Week 8

Week 9

Week 10