dla - Comprehensive Course on Advanced Audio Deep Learning Techniques and Applications

Introduction to the Deep Learning for Audio (DLA) Project

The Deep Learning for Audio (DLA) project is a course on audio processing with deep learning. It was taught in the autumn of 2024 at the CS Faculty of the Higher School of Economics (HSE) and combines theoretical material with hands-on practice.

Course Structure

Each week covers a different area of audio processing with deep learning, through a combination of lectures, seminars, self-study materials, and Q&A sessions.

Week 1: Introduction to the Course

Lecture: An overview of the course structure and expectations.
Seminar: Basics of experiment tracking, configuration with Hydra, Git, and Visual Studio Code.
Self-Study: An introduction to PyTorch, a fundamental tool for the course.

Week 2: Digital Signal Processing

Lecture: Signals, the Fourier transform, spectrograms, the mel scale, and mel-frequency cepstral coefficients (MFCCs).
Seminar: Hands-on digital signal processing (DSP), including creating spectrograms and frequency filtering.
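
To make the Week 2 topics concrete, here is a minimal sketch of a magnitude spectrogram and a Hz-to-mel conversion, written with the Python standard library only. It is not taken from the course materials. The function names, the frame length, the hop size, and the choice of the HTK-style mel formula (one of several conventions in use) are all assumptions made for illustration. Real code would normally use an FFT from a numerical library instead of the naive DFT shown here.

```python
import cmath
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mels with the HTK-style formula (an assumed convention)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def hann(length):
    """Periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / length) for i in range(length)]


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Short-time Fourier transform magnitudes, computed with a naive DFT.

    Returns a list of frames; each frame holds frame_len // 2 + 1 magnitudes
    (the non-negative frequency bins).
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Hypothetical test signal: a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
frame_len = 64
signal = [math.sin(2.0 * math.pi * 1000.0 * t / sample_rate) for t in range(512)]

spec = magnitude_spectrogram(signal, frame_len=frame_len, hop=32)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / frame_len
print(f"frames={len(spec)}, bins={len(spec[0])}, peak at {peak_hz:.0f} Hz")
print(f"1000 Hz is about {hz_to_mel(1000.0):.1f} mel")
```

Each bin spans sample_rate / frame_len Hz, so a longer frame gives finer frequency resolution at the cost of coarser time resolution.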

Weeks 3-4: Speech Recognition

Lectures: Metrics, datasets, Connectionist Temporal Classification (CTC), LAS (Listen, Attend and Spell), RNN-T (RNN Transducer), and language models.
Seminars: Practical work on audio augmentations and hybrid model training.
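
Two of the ideas listed for these weeks are easy to illustrate in a few lines: word error rate (WER), a standard speech recognition metric, and the collapsing rule used in greedy CTC decoding. The sketch below is an illustration under stated assumptions, not the course's implementation. The function names are hypothetical, and the blank index of 0 is an assumed convention.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute/insert/delete cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def ctc_greedy_collapse(frame_labels, blank=0):
    """Collapse a per-frame label sequence the CTC way:
    merge consecutive repeats, then drop blanks.
    A blank between two equal labels keeps them as two separate outputs.
    """
    output = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


print(wer("the cat sat on the mat", "the cat sit on mat"))
print(ctc_greedy_collapse([0, 1, 1, 0, 1, 2, 2, 0]))
```

In the WER example the hypothesis has one substitution and one deletion against a six-word reference. In the CTC example the blank between the two runs of label 1 is what allows the repeated label to survive collapsing.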

Week 5: Advanced Speech Recognition and Audio Self-Supervision

Lecture: Self-supervised models for audio and large language models for audio (Audio LLMs).

Weeks 6-7: Source Separation

Lectures: Architectures for source separation and denoising, including models such as Demucs and DCCRN.
Seminars: Practical aspects such as the Wiener filter, streaming processing, and performance metrics.
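
As an example of a performance metric for separation and denoising, the sketch below computes the scale-invariant signal-to-distortion ratio (SI-SDR). The source does not say which metrics the seminars cover, so the choice of SI-SDR is an assumption, as are the function name and the test signals. The code uses plain Python lists to stay self-contained.

```python
import math


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB.

    Both signals are zero-meaned, the estimate is projected onto the target,
    and the energy of that projection is compared with the residual energy.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]

    # Optimal scaling of the target that best explains the estimate.
    alpha = sum(e * t for e, t in zip(est, tgt)) / (sum(t * t for t in tgt) + eps)
    projection = [alpha * t for t in tgt]
    residual = [e - p for e, p in zip(est, projection)]

    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(r * r for r in residual)
    return 10.0 * math.log10(signal_energy / (noise_energy + eps) + eps)


# Hypothetical signals: a clean tone plus a weaker interfering tone.
length = 200
target = [math.sin(2.0 * math.pi * 5 * i / length) for i in range(length)]
interference = [0.1 * math.sin(2.0 * math.pi * 13 * i / length) for i in range(length)]
estimate = [t + d for t, d in zip(target, interference)]
rescaled = [3.0 * e for e in estimate]

print(f"SI-SDR: {si_sdr(estimate, target):.2f} dB")
print(f"SI-SDR after rescaling the estimate: {si_sdr(rescaled, target):.2f} dB")
```

The interference here has one hundredth of the target's energy, so the value comes out at about 20 dB, and multiplying the estimate by a constant leaves it unchanged. That invariance is the reason the metric projects onto the target before comparing energies.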

Week 8: Audio-Visual Deep Learning

Lecture: Fusing audio and visual data for tasks such as source separation and speech recognition.
Q&A Sessions: Project discussions, including building intelligent voice assistants.

Projects and Assignments

The course includes practical homework and project opportunities:

Speech Recognition Model Training
Audio-Visual Speech Separation Model Training

Templates and detailed guidelines are provided to assist students in completing these tasks effectively.

Resources and Support

A dedicated YouTube playlist hosts the lecture recordings. They are primarily in Russian, and some sections have English subtitles. The materials have been developed by a team of contributors over several years.

Contributors

The course material has been prepared and delivered by a team of experts, including Maxim Kaledin, Petr Grinberg, Grigory Fedorov, and several others.

Access to Past Versions

Students can also review the content from previous years (2020-2023) for a broader perspective and understanding of the evolution of deep learning applications in audio processing.

The DLA course is an extensive exploration of the integration of deep learning with audio, preparing students for innovative applications and research in this rapidly growing field.