Introduction to Efficient Deep Learning Systems
The "Efficient Deep Learning Systems" project is a comprehensive educational program that caters to students and professionals interested in mastering the principles and techniques of efficient deep learning systems. This project offers a course that is jointly taught at the Faculty of Computer Science at HSE University and the Yandex School of Data Analysis. The course aims to equip learners with advanced skills for optimizing deep learning models, deploying applications, and managing complex computational tasks.
Course Structure and Topics
The course is structured over several weeks, each focusing on a different aspect of deep learning systems.
Week 1: Introduction
This initial week provides an overview of the course and introduces core concepts of GPU architecture and the CUDA API, which are crucial for efficient deep learning computations. Seminars offer hands-on exploration of CUDA operations in PyTorch and introduce benchmarking techniques.
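To give a flavor of the benchmarking material, here is a minimal sketch of timing a GPU operation with CUDA events in PyTorch; the matrix size and the matmul workload are illustrative choices, and a CUDA device is assumed to be available.

```python
import torch

def benchmark_matmul(n=4096, iters=10):
    """Time a square matmul on the GPU using CUDA events."""
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")

    # Warm-up: the first call pays one-time initialization costs.
    torch.matmul(a, b)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        torch.matmul(a, b)
    end.record()

    # Kernel launches are asynchronous: synchronize before reading the timer.
    torch.cuda.synchronize()
    print(f"avg time per matmul: {start.elapsed_time(end) / iters:.2f} ms")

if __name__ == "__main__":
    benchmark_matmul()
```

Using CUDA events rather than Python-side wall-clock timing avoids under-counting work that is still queued on the GPU.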
Week 2: Experiment Management and Testing
Students are introduced to experiment tracking, model and data versioning, and techniques for testing deep learning code in Python. Through lectures and seminars, learners work with tools such as DVC, Weights & Biases, and pytest to build and manage rigorous experimental workflows.
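To illustrate the testing side, here is a minimal pytest sketch; the tiny linear model and the "overfit one batch" sanity check are hypothetical examples, not the course's actual assignments.

```python
import pytest
import torch

@pytest.fixture
def model():
    # A stand-in for a real model under test.
    return torch.nn.Linear(16, 4)

def test_output_shape(model):
    x = torch.randn(8, 16)
    assert model(x).shape == (8, 4)

def test_can_overfit_one_batch(model):
    # A common sanity check: the loss on a single fixed batch should decrease.
    x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    losses = []
    for _ in range(50):
        loss = torch.nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    assert losses[-1] < losses[0]
```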
Week 3: Training Optimizations and Profiling
This week emphasizes optimizing training processes and profiling deep learning code. Mixed-precision training and techniques for efficient data storage and loading are covered, along with profiling tools such as py-spy and nvprof for analyzing and improving model training.
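As an illustration of mixed-precision training, the sketch below uses PyTorch's automatic mixed precision (torch.autocast plus a gradient scaler); the model, data, and hyperparameters are placeholders, and a CUDA device is assumed.

```python
import torch

model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for step in range(100):
    x = torch.randn(32, 512, device="cuda")
    y = torch.randint(0, 10, (32,), device="cuda")

    optimizer.zero_grad()
    # Run the forward pass in float16 where it is numerically safe to do so.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.cross_entropy(model(x), y)

    # Scale the loss to avoid float16 gradient underflow; the scaler
    # unscales gradients before the optimizer step.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```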
Week 4: Basics of Distributed Machine Learning
Learners explore distributed training methods, focusing on process-based communication and parameter server architectures. Practical seminars cover multiprocessing basics and parallel GloVe training, offering hands-on experience in distributed computing.
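The multiprocessing basics can be sketched as a toy parameter-server-style exchange: worker processes send tensors through a queue, and a central process averages them. The tensors stand in for real gradients and are purely illustrative.

```python
import torch
import torch.multiprocessing as mp

def worker(rank, queue):
    # A stand-in for a locally computed gradient.
    grad = torch.full((4,), float(rank))
    queue.put(grad)

def main():
    num_workers = 4
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(rank, queue)) for rank in range(num_workers)]
    for p in procs:
        p.start()
    # The "server" aggregates contributions from all workers.
    avg = sum(queue.get() for _ in range(num_workers)) / num_workers
    for p in procs:
        p.join()
    print("averaged gradient:", avg)

if __name__ == "__main__":
    main()
```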
Week 5: Data-Parallel Training and All-Reduce
The focus shifts to data-parallel training of neural networks and the All-Reduce algorithms that implement it efficiently. Seminars provide an introduction to PyTorch Distributed and data-parallel training primitives.
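Here is a minimal sketch of the All-Reduce primitive with PyTorch Distributed on a single machine, using the CPU-friendly gloo backend; the world size, port, and tensor contents are toy values.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def run(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Each process contributes its own tensor; after all_reduce, every
    # process holds the element-wise sum across all ranks.
    tensor = torch.ones(2) * rank
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: {tensor}")

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 4
    mp.spawn(run, args=(world_size,), nprocs=world_size)
```

In data-parallel training, the same collective is applied to gradients so that every replica performs an identical optimizer step.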
Week 6: Training Large Models
As models grow in size and complexity, this week delves into model parallelism, gradient checkpointing, offloading, and sharding. Practical sessions provide hands-on experience with gradient checkpointing and tensor parallelism.
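To make gradient checkpointing concrete, here is a minimal sketch with torch.utils.checkpoint: activations inside each checkpointed block are recomputed during the backward pass instead of being stored, trading compute for memory. The model depth and sizes are illustrative.

```python
import torch
from torch.utils.checkpoint import checkpoint

# A deep stack of blocks whose activations would normally all be kept alive.
blocks = torch.nn.ModuleList(
    torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU())
    for _ in range(8)
)

x = torch.randn(16, 1024, requires_grad=True)
out = x
for block in blocks:
    # Only the block's input is saved; intermediate activations are
    # recomputed when backward reaches this block.
    out = checkpoint(block, out, use_reentrant=False)
out.sum().backward()
```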
Week 7: Python Web Application Deployment
The course transitions to deploying deep learning models as web applications. Topics include building production-ready web services with technologies such as Docker and Prometheus, and implementing APIs over HTTP and gRPC.
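As a small deployment sketch, the following serves a model over HTTP; FastAPI with uvicorn is one common stack (the course's exact framework choice may differ), and the model and request schema are hypothetical placeholders.

```python
import torch
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = torch.nn.Linear(4, 2)  # stand-in for a real trained model
model.eval()

class PredictRequest(BaseModel):
    features: list[float]  # expected length: 4

@app.post("/predict")
def predict(request: PredictRequest):
    with torch.inference_mode():
        logits = model(torch.tensor(request.features))
    return {"logits": logits.tolist()}

# Run with: uvicorn main:app --port 8000  (assuming this file is main.py)
```

Such a service is then typically containerized with Docker and instrumented with Prometheus metrics for production monitoring.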
Week 8: LLM Inference Optimizations
Students study the optimizations used in large language model (LLM) inference, focusing on techniques such as KV caching, batch inference, and FlashAttention. Practical work includes Triton programming and layer fusion.
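The idea behind KV caching can be shown in a few lines: keys and values of already-generated tokens are stored, so each decoding step attends over the cache plus a single new token instead of recomputing the whole prefix. This sketch uses one attention head with no projections, purely for illustration.

```python
import torch
import torch.nn.functional as F

head_dim = 64
cache_k, cache_v = [], []

def decode_step(q, k, v):
    # Append this step's key/value to the cache, then attend over everything.
    cache_k.append(k)
    cache_v.append(v)
    keys = torch.cat(cache_k, dim=1)    # (1, seq_so_far, head_dim)
    values = torch.cat(cache_v, dim=1)
    return F.scaled_dot_product_attention(q, keys, values)

for step in range(5):
    # Query, key, and value for the newest token only.
    q = torch.randn(1, 1, head_dim)
    k = torch.randn(1, 1, head_dim)
    v = torch.randn(1, 1, head_dim)
    out = decode_step(q, k, v)
    print(step, out.shape)  # (1, 1, head_dim)
```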
Week 9: Efficient Model Inference
This week targets efficient model inference strategies, discussing hardware utilization, knowledge distillation, quantization, and efficient model architectures. Practical seminars cover techniques such as data-free quantization and GPTQ in the PyTorch environment.
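As one concrete quantization example (simpler than the data-free and GPTQ methods from the seminar), here is a sketch of PyTorch's post-training dynamic quantization, which stores Linear weights in int8 and quantizes activations on the fly; the model is an illustrative placeholder.

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(256, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

# Replace every Linear layer with a dynamically quantized int8 version.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 256)
print(quantized(x).shape)  # same interface, smaller weights
```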
Week 10: Guest Lecture
The course concludes with a guest lecture providing additional insights and advanced topics to enhance the learning experience.
Grading and Assignments
Students' performance is assessed through several homework assignments spread across the course duration. These assignments focus on key topics such as training pipeline design, code profiling, distributed training, and model deployment and optimization. The final grade is calculated as a weighted sum of scores from these assignments.
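Schematically, with score s_i on assignment i and weights w_i chosen by the course (the actual weights are specified in the syllabus, so this is only the general form):

```latex
\text{final grade} = \sum_i w_i \, s_i, \qquad \sum_i w_i = 1
```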
Course Contributors
The course is guided by a team of expert instructors, including Max Ryabinin, Just Heuristic, Alexander Markovich, Anton Chigin, and Ruslan Khaidurov, who bring extensive experience and knowledge in the field of deep learning systems.
Past Versions
For those interested in previous versions of the course, materials from the years 2021, 2022, and 2023 are available, providing a broader perspective on the evolution of the course content and teaching approach.