---
title: Week 3
---

Oct 09
: **Lecture**{: .label .label-green }[Throughput, Latency, and Efficient Inference](#)
  - What are the different types of inference workloads? Understanding throughput and latency.
  - Reviewing Autoregressive Generation
  - Inference Efficiency Properties of RNNs vs. Transformers
  - Techniques for Efficient Inference: KV-caching, FlexGen, vLLM, etc.

Oct 11
: **Lecture**{: .label .label-green }[Alternate Architectures to Transformers & Course Project Intro](#)
  - Reviewing the Efficiency Properties of Convolutions vs. RNNs vs. Transformers
  - Recent Subquadratic Models
  - Hardware-Aware Algorithms for the Subquadratic Models
  - Course Project Introduction