ASR_Theory - Speech Recognition Insights: Theories, Practices, and Innovative Applications

Introduction to ASR_Theory

The ASR_Theory project collects summaries and notes from the creator's first stage of graduate study in speech recognition. The repository combines theoretical background with practical experience, drawing on academic papers and the creator's own perspective.

Transition and Updates

This repository no longer receives updates. For ongoing developments and future research results, visit the Meta-Speech website or join the official WeChat group. To join the group, open an issue in the repository.

Personal Blog for Summary Insights

The creator also maintains a personal blog on CSDN, which shares recent learnings and reflections on speech recognition.

PPT and Papers: A Resourceful Collection

The collection of slides (PPT) and academic papers is one of the highlights of this repository. The slides were prepared during the creator's first year of using the Kaldi speech recognition toolkit to build acoustic models such as GMM-HMM and NN-HMM. The paper collection consists of scholarly articles read from the first year of study onward. Papers were added periodically while the repository was maintained; since it no longer receives updates, newer material is available through the channels listed under Transition and Updates.

Insights from INTERSPEECH Google Presentation

One notable inclusion is Google's INTERSPEECH 2018 presentation. The repository describes these slides as comprehensive and well structured, which makes them a useful learning resource for deep learning applications in speech recognition.

Deep Learning Summary

The repository also includes an image, compiled by the creator, that summarizes recent deep learning network architectures. In addition, three related projects are available on GitHub, each building acoustic models with a different speech modeling unit:

ASR_Syllable: Builds speech recognition acoustic models with syllables as modeling units.
ASR_WORD: Builds speech recognition acoustic models with words as modeling units.
ASR_Phone: Builds speech recognition acoustic models with phonemes as modeling units.
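
The three modeling units differ only in how finely a transcript is split before it becomes the target sequence of an acoustic model. The following is a minimal sketch of that idea, not code from the ASR_* projects: the lexicon, the phone symbols, and the function name to_units are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy lexicon (an assumption for illustration, not taken from
# the ASR_* projects): each word maps to its syllables, and each syllable
# is a list of phone symbols.
TOY_LEXICON = {
    "speech": [["S", "P", "IY", "CH"]],
    "recognition": [["R", "EH", "K"], ["AH", "G"], ["N", "IH"], ["SH", "AH", "N"]],
}

LEVELS = ("word", "syllable", "phone")


def to_units(transcript, level):
    """Split a transcript into modeling units at the requested level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    words = transcript.lower().split()
    if level == "word":
        return words
    units = []
    for word in words:
        # Raises KeyError for words missing from the toy lexicon.
        for syllable in TOY_LEXICON[word]:
            if level == "syllable":
                units.append("_".join(syllable))  # one unit per syllable
            else:
                units.extend(syllable)  # one unit per phone
    return units


if __name__ == "__main__":
    for level in LEVELS:
        print(level, to_units("speech recognition", level))
```

In general, finer units such as phones keep the unit inventory small and can cover words not seen in training, while coarser units such as words yield shorter target sequences but require every word to appear in the training data.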

Together, these resources give students, researchers, and practitioners a foundation for building their knowledge of automatic speech recognition and deep learning.