

Week 1

Sep 27
Lecture by Azalia: Introduction
  • Course Overview: Why systems for machine learning?
  • Logistics

Week 2

Oct 02
Lecture by Simran: Transformer Architecture
  • Introduction to Sequence Modeling
  • Strengths and Challenges of Early Recurrent Neural Networks
  • Transformers Walkthrough
  • Pretraining and Fine-Tuning
Oct 04
Lecture by Azalia: Hardware Aware Algorithm Design
  • Introduction to Processors, Parallelism, and the GPU Memory Hierarchy
  • Introduction to Arithmetic Intensity and Measures of Efficiency
  • Hardware Aware Algorithms for Matrix Multiplication
  • Principles for Achieving High Performance
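To make the arithmetic-intensity topic above concrete, here is a minimal sketch (function name, fp16 default, and the one-pass traffic assumption are mine, not course material):

```python
def matmul_stats(m, n, k, bytes_per_elt=2):
    """FLOPs, ideal memory traffic, and arithmetic intensity for C = A @ B.

    Assumes each operand moves between HBM and on-chip memory exactly once
    (a lower bound on traffic) and fp16 storage by default.
    A is m x k, B is k x n, C is m x n. Illustrative sketch only.
    """
    flops = 2 * m * n * k                                 # one multiply + one add per term
    traffic = bytes_per_elt * (m * k + k * n + m * n)     # read A, read B, write C
    return flops, traffic, flops / traffic
```

Comparing this intensity against the accelerator's ratio of peak FLOP/s to peak memory bandwidth indicates whether the kernel is compute-bound or memory-bound.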

Week 3

Oct 09
Lecture by Azalia: Analyzing the Performance of Transformers
  • Measuring the FLOPs of MLP and Transformer Training and Inference (Intro to Backpropagation)
  • Reviewing Autoregressive Generation and KV Caching
  • Measuring the Efficiency of KV Caching (FLOPs, Arithmetic Intensity)
  • Speculative Decoding
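A back-of-the-envelope model of the KV-caching topic above (a rough single-head count with simplified constants; the function and its exact constants are my sketch, not course numbers):

```python
def decode_step_flops(t, d, kv_cache):
    """Rough attention FLOPs to emit the token at position t (single head,
    head dimension d; projection costs and exact constants omitted).

    With a KV cache the step costs O(t * d): one query row scored against
    t cached keys, then weighted over t cached values. Without a cache,
    the full t x t attention is recomputed every step.
    """
    per_query = 2 * t * d + 2 * t * d      # q . K^T scores + weights . V
    return per_query if kv_cache else t * per_query
```

The t-fold gap is why autoregressive decoding without a cache is impractical at long sequence lengths, and why cached decoding ends up memory-bound rather than compute-bound.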
Oct 11
Lecture by Simran: Hardware Aware Algorithm Design
  • Understanding the Scaling Bottlenecks of Standard Attention
  • Main Ideas in Efficient Attention Algorithms (Sparsity, Low Rank, Kernelization)
  • Motivating Hardware Aware Design. Profiling Standard Attention and Computing Arithmetic Intensity.
  • Reviewing the GPU execution model. How is attention executed on GPUs?
  • Reviewing Three Key Ideas: Fusion, Tiling, Caching vs. Recompute
  • Detailed Walkthrough of FlashAttention v1 and v2
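The tiling idea behind the FlashAttention walkthrough above can be sketched at toy scale with a single query row and an online softmax (plain Python lists; a hypothetical helper, not the course's kernel):

```python
import math

def attention_row_tiled(scores, values, block=4):
    """One query row of softmax(q . K^T) V computed tile-by-tile.

    Mirrors the FlashAttention idea: keep a running max m, normalizer l,
    and output accumulator o, rescaling the old partials whenever a new
    tile raises the max, so the full score row is never materialized.
    """
    m, l, o = float("-inf"), 0.0, 0.0
    for start in range(0, len(scores), block):
        tile_s = scores[start:start + block]
        tile_v = values[start:start + block]
        m_new = max(m, max(tile_s))
        scale = math.exp(m - m_new)        # exp(-inf) == 0.0 on the first tile
        l, o = l * scale, o * scale        # rescale old partials to the new max
        for s, v in zip(tile_s, tile_v):
            p = math.exp(s - m_new)
            l += p
            o += p * v
        m = m_new
    return o / l
```

The result matches the standard two-pass softmax attention exactly, while only ever holding one tile of scores in fast memory.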

Week 4

Oct 16
Lecture by Simran: Memory Efficient Neural Networks
  • Pruning and Sparsity (Structured vs. Unstructured Pruning Tradeoffs, Sparse Tensor Cores and N:M Sparsity, Magnitude and Regression Based Pruning, Pruning Sensitivity Analysis)
  • Quantization (How are numbers represented in computers?, K-means Quantization, Linear Quantization)
  • Knowledge Distillation
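Of the quantization schemes listed above, symmetric linear quantization is the simplest; a toy sketch (names and clipping policy are my assumptions):

```python
def linear_quantize(xs, num_bits=8):
    """Symmetric linear quantization of floats to signed integers.

    A single scale maps the largest magnitude onto the top of the integer
    range; values are rounded and clipped. Toy per-tensor sketch only.
    """
    qmax = 2 ** (num_bits - 1) - 1                    # 127 for int8
    scale = (max(abs(x) for x in xs) / qmax) or 1.0   # avoid zero scale for all-zero input
    q = [min(qmax, max(-qmax - 1, round(x / scale))) for x in xs]
    return q, scale

def linear_dequantize(q, scale):
    return [qi * scale for qi in q]
```

The quantize/dequantize round trip bounds the per-element error by half a quantization step, which is the basic accuracy/size tradeoff the lecture topics explore.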
Oct 18
Lecture by Azalia: Adapting Large Language Models
  • Scaling Laws
  • Zero-shot, Few-shot, Emergent Abilities
  • Instruction Following Models
  • RLHF, RLAIF, Constitutional AI
  • Parameter Efficient Finetuning
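One widely used parameter-efficient finetuning scheme is a low-rank adapter in the spirit of LoRA; a toy list-of-lists sketch (function names, shapes, and defaults are my assumptions, not course code):

```python
def matvec(M, v):
    return [sum(w * x for w, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha=2.0, r=1):
    """Low-rank-adapted forward pass: y = W x + (alpha / r) * B (A x).

    The pretrained weight W (d_out x d_in) stays frozen; only the factors
    A (r x d_in) and B (d_out x r) are trained, shrinking the trainable
    parameter count from d_out * d_in to r * (d_in + d_out).
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))   # rank-r correction, never forming B @ A
    s = alpha / r
    return [b + s * u for b, u in zip(base, update)]
```

Applying A first and then B keeps the extra cost proportional to r, so small ranks add almost no inference overhead.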

Week 5

Oct 23
Speaker: Tim Dettmers - Efficient Training and Inference
University of Washington
  • Watch Tim's awesome talk here! YouTube
Oct 25
Lecture by Azalia: Parallelism Fundamentals
  • Data Parallelism (All-Reduce, Ring All-Reduce, ZeRO, PyTorch FSDP)
  • Tensor Parallelism
  • Pipeline Parallelism
  • Automatic Parallelization (Alpa)
  • Comparisons, Pros and Cons of Each Strategy
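The ring all-reduce named above can be simulated in a few lines (a single-process sketch with assumed names; real implementations overlap communication with compute):

```python
def ring_all_reduce(worker_data):
    """Simulated ring all-reduce: reduce-scatter then all-gather.

    Each of n workers holds a same-length vector split into n chunks.
    After 2 * (n - 1) steps every worker holds the elementwise sum, and
    each worker has sent only (n - 1)/n of the data twice, which is why
    the algorithm's bandwidth cost is nearly independent of n.
    """
    n = len(worker_data)
    c = len(worker_data[0]) // n                 # chunk size (length assumed divisible)
    buf = [list(w) for w in worker_data]

    def chunk(i, j):
        return buf[i][j * c:(j + 1) * c]

    def put(i, j, vals):
        buf[i][j * c:(j + 1) * c] = vals

    # Reduce-scatter: in step s, worker i passes chunk (i - s) mod n to its right neighbor.
    for s in range(n - 1):
        sends = [(i, (i - s) % n, chunk(i, (i - s) % n)) for i in range(n)]
        for i, j, vals in sends:                 # all sends happen "simultaneously"
            dst = (i + 1) % n
            put(dst, j, [a + b for a, b in zip(chunk(dst, j), vals)])
    # All-gather: circulate each fully reduced chunk around the ring.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n, chunk(i, (i + 1 - s) % n)) for i in range(n)]
        for i, j, vals in sends:
            put((i + 1) % n, j, vals)
    return buf
```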

Week 6

Oct 30
Speaker: Deepak Narayanan - Parallelism
Stanford, Microsoft, NVIDIA
  • Watch Deepak's awesome talk here! YouTube
Nov 01
Lecture by Simran: Efficient Attention-Free Architectures
  • Revisit Attention Scaling Bottlenecks
  • Introduction to Convolutions, Fourier Transforms, and the FFT-Convolution Theorem
  • Tradeoffs of Transformer vs. RNN vs. CNN Training & Inference Efficiency, and Inductive Biases
  • State Space Models Walkthrough
  • Limitations of State Space Models; Towards Input-Dependent Sequence Mixers
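The recurrent view of the state space models above fits in a few lines (a scalar-state toy with assumed names; real SSMs use vector states and matrix parameters):

```python
def ssm_scan(a, b, c, u):
    """Recurrent view of a scalar, time-invariant state space model:
    x_t = a * x_{t-1} + b * u_t,  y_t = c * x_t.

    Because a, b, c are fixed across time, the same map also admits an
    equivalent convolution with kernel k_t = c * a**t * b, enabling
    parallel training; the input-dependent mixers mentioned above give
    up that fixed kernel in exchange for selectivity.
    """
    x, ys = 0.0, []
    for u_t in u:
        x = a * x + b * u_t
        ys.append(c * x)
    return ys
```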

Week 7

Nov 06
Speaker: Ce Zhang - Decentralized Training
ETH Zurich, University of Chicago
  • Training on Heterogeneous Compute
  • Communication Compression
  • RedPajama and Data Quality
Nov 08
Lecture by Azalia: Adapting Large Language Models
  • Supervised Fine-tuning Loss
  • Reinforcement Learning Fundamentals
  • LLM Fine-tuning with RL
  • Reward Modeling

Week 8

Nov 13
Speaker: William Fedus - Sparse Models, MoE
Nov 15
Speaker: Chris Ré
Stanford University

Nov 20
Holiday: Thanksgiving Break
Nov 22
Holiday: Thanksgiving Break

Week 9

Nov 27
Speaker: Tianqi Chen - TVM and ML Compilers
Carnegie Mellon University
  • Watch Tianqi's awesome talk here! YouTube
Nov 29
Lecture by Azalia: Scheduling ML Clusters

Week 10

Dec 4
Lecture by Simran: Data Pipelines
  • Weak Supervision for Data Labeling
  • Pretraining Data Selection Methods
Dec 6
Lecture by Azalia: Research for ML Systems