Course information
- Instructor: Minlan Yu.
- Semester: Fall 2024.
- Website: https://github.com/minlanyu/cs243-site/tree/fall2024
- Outline: Data parallelism and sharding, model parallelism and pipelining, parameter server and all-reduce, collective communication optimizations; LLM training, LLM serving, throughput-latency tradeoffs, distributed serving; NCCL as a service, flow scheduling, RDMA, congestion control, ethics; checkpointing, fault tolerance, diagnosis; data ingestion, LLM training in production, TPU, sustainable AI.
- Technologies: Python, C++/CUDA, PyTorch, vLLM, Amazon Web Services (AWS).
Project
Our final project integrated KV cache sparsification into vLLM in a performant and memory-efficient way. Details of the project can be found here.
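To give a rough idea of what KV cache sparsification means, here is a minimal, illustrative sketch of a heavy-hitter-style policy that keeps only the most-attended cached tokens. This is not the project's actual vLLM integration; the function name, tensor shapes, and `keep_ratio` parameter are assumptions made purely for illustration.

```python
import torch


def sparsify_kv_cache(keys, values, attn_scores, keep_ratio=0.25):
    """Illustrative KV cache sparsification (not the project's implementation).

    keys, values: [num_tokens, num_heads, head_dim] cached key/value tensors
    attn_scores:  [num_tokens] accumulated attention each cached token has received
    keep_ratio:   fraction of cached tokens to retain
    """
    num_tokens = keys.shape[0]
    keep = max(1, int(num_tokens * keep_ratio))
    # Retain the tokens that have received the most attention so far
    # (a heavy-hitter heuristic) and drop the rest to save GPU memory.
    top_idx = torch.topk(attn_scores, keep).indices.sort().values
    return keys[top_idx], values[top_idx], top_idx
```

The interesting part in practice is doing this without fragmenting vLLM's paged KV memory or adding latency to the decode loop, which is what the project focused on.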