Introduction to the Deep Learning for Audio (DLA) Project
The Deep Learning for Audio (DLA) project is a course on audio processing with deep learning. It was taught in autumn 2024 at the CS Faculty of the Higher School of Economics (HSE) and combines theoretical material with hands-on practice.
Course Structure
Each week of the course covers a different area of audio processing with deep learning. The material is delivered through lectures, seminars, self-study materials, and Q&A sessions.
Week 1: Introduction to the Course
- Lecture: An overview of the course structure and expectations.
- Seminar: Basics of experiment tracking, understanding Hydra, and using Git and Visual Studio Code.
- Self-Study: Students are introduced to PyTorch, a fundamental tool for the course.
Week 2: Digital Signal Processing
- Lecture: Students explore signals, the Fourier Transform, spectrograms, the mel scale, and mel-frequency cepstral coefficients (MFCCs).
- Seminar: Hands-on experience with Digital Signal Processing (DSP) practices, creating spectrograms, and frequency filtering.
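The Week 2 topics can be illustrated with a minimal sketch, which is not taken from the course materials. It assumes the common HTK-style mel formula and uses a naive O(N^2) DFT with a Hann window in place of an optimized FFT. The function names, sample rate, and frame sizes are all hypothetical choices made for illustration.

```python
import cmath
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame (naive O(N^2))."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(signal, frame_len, hop):
    """Magnitude spectrogram: one DFT per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        frames.append(dft_magnitudes(frame))
    return frames


# Hypothetical test signal: a pure 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
FRAME_LEN = 256
TONE_HZ = 1000.0
signal = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(1024)]

spec = spectrogram(signal, frame_len=FRAME_LEN, hop=128)

# The strongest bin of the first frame should sit at the tone frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
print(peak_hz, hz_to_mel(peak_hz))
```

In practice a library FFT and a mel filterbank would replace the hand-written loops; the sketch only shows how frames, windowing, and frequency bins relate.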
Weeks 3-4: Speech Recognition
- Lectures: Covering metrics, datasets, Connectionist Temporal Classification (CTC), Listen, Attend and Spell (LAS), the RNN Transducer (RNN-T), and language models.
- Seminars: Practical applications like audio augmentations and hybrid model training.
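Two of the speech recognition topics above lend themselves to a short sketch: the CTC best-path decoding rule (collapse repeated labels, then drop blanks) and word error rate (WER), a standard recognition metric. This is an illustrative sketch rather than the course's implementation. The function names and the choice of label 0 as the blank are assumptions.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse repeated labels, then drop blanks (CTC best-path rule)."""
    decoded = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev_row[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev_row = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        row = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev_row[j - 1] + (ref_word != hyp_word)
            deletion = prev_row[j] + 1
            insertion = row[j - 1] + 1
            row.append(min(substitution, deletion, insertion))
        prev_row = row
    return prev_row[-1] / len(ref)


# Per-frame argmax labels, with 0 as the blank symbol.
labels = ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0])
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(labels, wer)
```

Note that the blank between the two runs of label 1 is what lets CTC emit the same label twice in a row; without it, the repeats would collapse into one.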
Week 5: Advanced Speech Recognition and Audio Self-Supervision
- Lecture: Self-supervised models for audio and audio large language models (Audio LLMs).
Weeks 6-7: Source Separation
- Lectures: Discussing architectures for source separation and denoising, including models such as Demucs and DCCRN.
- Seminars: Exploring practical aspects such as the Wiener filter, streaming processing, and performance metrics.
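As a rough illustration of the seminar topics, the sketch below computes a per-bin Wiener gain and the scale-invariant signal-to-distortion ratio (SI-SDR), a metric often used for separation and denoising. Whether the course uses SI-SDR specifically is an assumption, as are the function names and the use of known signal and noise power spectra.

```python
import math


def wiener_gain(signal_power, noise_power):
    """Per-bin Wiener gain S / (S + N), assuming known power spectra."""
    return [s / (s + n) if s + n > 0 else 0.0 for s, n in zip(signal_power, noise_power)]


def si_sdr(estimate, target):
    """Scale-invariant SDR in dB between an estimate and a reference signal."""
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    # Project the estimate onto the target to remove any overall scaling.
    scale = dot / target_energy
    projection = [scale * t for t in target]
    residual = [e - p for e, p in zip(estimate, projection)]
    signal_energy = sum(p * p for p in projection)
    residual_energy = sum(r * r for r in residual)
    if residual_energy == 0:
        return math.inf
    return 10.0 * math.log10(signal_energy / residual_energy)


gains = wiener_gain([9.0, 1.0, 0.0], [1.0, 1.0, 1.0])

target = [1.0, 0.0, -1.0, 0.0]
estimate = [1.1, 0.1, -0.9, 0.1]  # target plus a small constant offset
score = si_sdr(estimate, target)
rescaled_score = si_sdr([2.0 * e for e in estimate], target)
print(gains, score, rescaled_score)
```

Rescaling the estimate leaves the score unchanged, which is the property that distinguishes SI-SDR from plain SDR. In a real system the gain would be applied to each STFT bin, with noise power estimated rather than given.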
Week 8: Audio-Visual Deep Learning
- Lecture: Fusion of audio and visual data for enhanced applications like source separation and speech recognition.
- Q&A Sessions: Covering project discussions, including creating intelligent voice assistants.
Projects and Assignments
The course includes practical homework and project opportunities:
- Speech Recognition Model Training
- Audio-Visual Speech Separation Model Training
Templates and detailed guidelines are provided to assist students in completing these tasks effectively.
Resources and Support
A dedicated YouTube playlist hosts lecture recordings, primarily in Russian, with some sections having English subtitles. The materials have been developed by a skilled team of contributors over several years, providing deep insights and practical knowledge in the field.
Contributors
The course material has been prepared and delivered by a team of experts, including Maxim Kaledin, Petr Grinberg, Grigory Fedorov, and several others.
Access to Past Versions
Students can also review the content from previous years (2020-2023) for a broader perspective and understanding of the evolution of deep learning applications in audio processing.
The DLA course is an extensive exploration of the integration of deep learning with audio, preparing students for innovative applications and research in this rapidly growing field.