Neural Networks: Zero to Hero
"Neural Networks: Zero to Hero" is a highly accessible online course designed to take learners from the fundamental principles of neural networks to more advanced concepts. This course unfolds through a series of engaging YouTube videos with corresponding Jupyter notebooks provided for hands-on coding exercises. The course structure encourages active participation, offering exercises in the video descriptions to enhance understanding.
Lecture 1: Introduction to Neural Networks and Backpropagation
The journey begins with an exploration of neural networks and backpropagation, built up by implementing micrograd, a tiny scalar-valued automatic differentiation (autograd) engine with a small neural network library on top. The lecture assumes a basic understanding of Python and some calculus, and focuses on how neural networks learn through backpropagation.
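The core mechanism is small enough to sketch in plain Python: each value remembers how it was produced, so gradients can flow backward through the computation graph via the chain rule. The following is a minimal illustration in the spirit of micrograd (addition and multiplication only), not the full implementation:

```python
class Value:
    """A scalar that tracks its own gradient through the computation graph."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad   # d(out)/d(self) = 1
            other.grad += out.grad  # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # chain rule
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# Usage: d(a*b + a)/da = b + 1 = 3.0 at a=2, b=2
a, b = Value(2.0), Value(2.0)
L = a * b + a
L.backward()
print(a.grad)  # 3.0
```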
Lecture 2: Introduction to Language Modeling
In this lecture, students implement a bigram character-level language model, the first step of the makemore project and a stepping stone toward more complex models such as Transformers. The session introduces the core framework of language modeling (model training, sampling, and loss evaluation) while building fluency with torch.Tensor.
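The counting flavor of a bigram model fits in a short script. The sketch below uses a tiny toy word list (the lecture works with a large dataset of names) and a '.' token to mark word boundaries; the data and details here are illustrative:

```python
import torch

# Toy corpus; the lecture uses a large dataset of names instead.
words = ["emma", "olivia", "ava"]
chars = sorted(set("".join(words)))
stoi = {c: i + 1 for i, c in enumerate(chars)}
stoi['.'] = 0                      # '.' marks word start and end
itos = {i: c for c, i in stoi.items()}

# Count bigram transitions in a (V, V) tensor.
V = len(stoi)
N = torch.zeros((V, V), dtype=torch.int32)
for w in words:
    seq = ['.'] + list(w) + ['.']
    for c1, c2 in zip(seq, seq[1:]):
        N[stoi[c1], stoi[c2]] += 1

# Normalize rows into probabilities (with add-one smoothing).
P = (N + 1).float()
P /= P.sum(dim=1, keepdim=True)

# Sample a new "word" from the model.
g = torch.Generator().manual_seed(42)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print("".join(out))

# Average negative log-likelihood of the training set (the loss).
log_likelihood, n = 0.0, 0
for w in words:
    seq = ['.'] + list(w) + ['.']
    for c1, c2 in zip(seq, seq[1:]):
        log_likelihood += torch.log(P[stoi[c1], stoi[c2]]).item()
        n += 1
print(f"nll = {-log_likelihood / n:.4f}")
```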
Lecture 3: Building a Multilayer Perceptron (MLP)
This session expands on the makemore project by implementing a multilayer perceptron (MLP) character-level language model. Learners are introduced to foundational machine learning concepts such as hyperparameters, underfitting and overfitting, and evaluation with held-out data.
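A minimal sketch of the architecture in PyTorch, in the spirit of the lecture (which follows Bengio et al. 2003): embed each of the previous block_size characters, concatenate the embeddings, and pass them through one hidden layer. All sizes below are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

# Illustrative hyperparameters (the lecture tunes these).
vocab_size, block_size = 27, 3      # 26 letters + '.', context of 3 chars
n_embd, n_hidden = 10, 200

g = torch.Generator().manual_seed(42)
C  = torch.randn((vocab_size, n_embd), generator=g)            # embedding table
W1 = torch.randn((block_size * n_embd, n_hidden), generator=g)
b1 = torch.randn(n_hidden, generator=g)
W2 = torch.randn((n_hidden, vocab_size), generator=g)
b2 = torch.randn(vocab_size, generator=g)
params = [C, W1, b1, W2, b2]
for p in params:
    p.requires_grad = True

def forward(X, Y):
    emb = C[X]                                          # (B, block_size, n_embd)
    h = torch.tanh(emb.view(X.shape[0], -1) @ W1 + b1)  # (B, n_hidden)
    logits = h @ W2 + b2                                # (B, vocab_size)
    return F.cross_entropy(logits, Y)

# One gradient step on a dummy batch (real training loops over the dataset).
X = torch.randint(0, vocab_size, (32, block_size))
Y = torch.randint(0, vocab_size, (32,))
loss = forward(X, Y)
for p in params:
    p.grad = None
loss.backward()
for p in params:
    p.data -= 0.1 * p.grad
print(loss.item())
```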
Lecture 4: Activations, Gradients, and Batch Normalization
In this lecture, students examine the internals of multilayer perceptrons, scrutinizing the statistics of forward-pass activations and backward-pass gradients and the problems that arise when they are poorly scaled. The lecture also introduces Batch Normalization, a technique that made deep neural networks substantially easier to train.
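Batch normalization standardizes each hidden unit's pre-activations across the batch, then applies a learned scale and shift. A minimal forward pass for a 2-D input, as a sketch (at inference time, running statistics replace batch statistics, which torch.nn.BatchNorm1d handles automatically):

```python
import torch

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Minimal batch-norm forward pass for a (batch, features) input."""
    mean = x.mean(dim=0, keepdim=True)                 # per-feature batch mean
    var = x.var(dim=0, keepdim=True, unbiased=False)   # per-feature batch variance
    xhat = (x - mean) / torch.sqrt(var + eps)          # zero mean, unit variance
    return gamma * xhat + beta                         # learned scale and shift

x = torch.randn(32, 100)
gamma, beta = torch.ones(100), torch.zeros(100)
out = batchnorm_forward(x, gamma, beta)
print(out.mean(dim=0)[:3], out.std(dim=0)[:3])  # approx. 0 and 1 per feature
```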
Lecture 5: Advanced Backpropagation Techniques
Building on previous lectures, this session is a deep dive into manual backpropagation through a neural network, without relying on PyTorch's autograd, sharpening intuition about how gradients flow through neural nets. This approach equips learners with a robust understanding of network optimization.
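The flavor of the exercise: derive a gradient by hand and verify it against autograd. Below is a hedged sketch for a linear layer followed by a mean-squared loss; the variable names are illustrative:

```python
import torch

# Forward pass: a linear layer and a mean-squared loss.
g = torch.Generator().manual_seed(0)
x = torch.randn(32, 10, generator=g)
W = torch.randn(10, 5, generator=g, requires_grad=True)
y = torch.randn(32, 5, generator=g)

out = x @ W                     # (32, 5)
loss = ((out - y) ** 2).mean()
loss.backward()                 # autograd's answer, for comparison

# Manual backward pass, applying the chain rule step by step.
dout = 2 * (out - y) / out.numel()   # d(mean((out-y)^2)) / d(out)
dW = x.T @ dout                      # d(x @ W) / dW, shape (10, 5)

print(torch.allclose(dW, W.grad))    # True: manual gradient matches autograd
```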
Lecture 6: Building a Convolutional Neural Network
This session grows the existing MLP into a deeper, tree-like convolutional architecture similar to DeepMind's WaveNet. Along the way it offers insight into how modules in deep learning frameworks such as torch.nn are designed.
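One way to sketch the hierarchical idea with torch.nn: merge consecutive pairs of character embeddings at each level, doubling the effective receptive field per layer. The FlattenConsecutive module and all sizes below are illustrative (the lecture's version also interleaves batch normalization):

```python
import torch
import torch.nn as nn

class FlattenConsecutive(nn.Module):
    """Group n consecutive embeddings, doubling the receptive field per level."""
    def __init__(self, n):
        super().__init__()
        self.n = n
    def forward(self, x):                # x: (B, T, C)
        B, T, C = x.shape
        x = x.view(B, T // self.n, C * self.n)
        return x.squeeze(1)              # drop the time dim once it reaches 1

vocab_size, n_embd, n_hidden = 27, 24, 128

# Tree-like model: 8 context characters merged pairwise over 3 levels.
model = nn.Sequential(
    nn.Embedding(vocab_size, n_embd),
    FlattenConsecutive(2), nn.Linear(2 * n_embd, n_hidden), nn.Tanh(),
    FlattenConsecutive(2), nn.Linear(2 * n_hidden, n_hidden), nn.Tanh(),
    FlattenConsecutive(2), nn.Linear(2 * n_hidden, n_hidden), nn.Tanh(),
    nn.Linear(n_hidden, vocab_size),
)
X = torch.randint(0, vocab_size, (32, 8))   # batch of 8-character contexts
print(model(X).shape)                       # torch.Size([32, 27])
```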
Lecture 7: Creating a GPT Model
Participants construct a Generatively Pretrained Transformer (GPT) and explore how the same architecture underlies modern systems such as ChatGPT. Prior familiarity with language modeling and PyTorch is recommended.
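The heart of the Transformer is self-attention: each position emits a query, key, and value, and attends to earlier positions with weights derived from query-key dot products. A minimal single-head, causally masked sketch; the dimensions are illustrative:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(1337)
B, T, C = 4, 8, 32          # batch, sequence length, embedding dim
head_size = 16
x = torch.randn(B, T, C)

# Linear projections for key, query, and value.
key   = torch.nn.Linear(C, head_size, bias=False)
query = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)

k, q, v = key(x), query(x), value(x)            # each (B, T, head_size)

# Scaled dot-product attention scores.
wei = q @ k.transpose(-2, -1) / head_size**0.5  # (B, T, T)

# Causal mask: position t may only attend to positions <= t.
tril = torch.tril(torch.ones(T, T))
wei = wei.masked_fill(tril == 0, float('-inf'))
wei = F.softmax(wei, dim=-1)

out = wei @ v                                   # (B, T, head_size)
print(out.shape)                                # torch.Size([4, 8, 16])
```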
Lecture 8: Developing a GPT Tokenizer
This lecture focuses on the critical role of tokenizers in large language models: the translation layer between strings and tokens. The session covers training a tokenizer with the Byte Pair Encoding (BPE) algorithm and discusses the implications of tokenization for language model behavior.
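The core of Byte Pair Encoding is a simple loop: count adjacent token pairs, merge the most frequent pair into a new token, repeat. A minimal training sketch over raw UTF-8 bytes (a production tokenizer adds a regex split pattern, special tokens, and decoding):

```python
from collections import Counter

def get_stats(ids):
    """Count occurrences of each adjacent token pair."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

# Train a tiny BPE vocabulary on raw UTF-8 bytes.
text = "aaabdaaabac"
ids = list(text.encode("utf-8"))
merges = {}
num_merges = 3
for step in range(num_merges):
    pair = get_stats(ids).most_common(1)[0][0]  # most frequent pair
    new_id = 256 + step                         # new token id past the byte range
    ids = merge(ids, pair, new_id)
    merges[pair] = new_id
print(ids, merges)
```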
The course is ongoing, with future lectures promising to further refine students' understanding of neural network and machine learning concepts within the realm of modern AI technologies.
License
The course content is released under the MIT License, encouraging widespread use and adaptation.