Multi-layer Perceptron as an N-gram Language Model
The MLP project trains a Multi-layer Perceptron (MLP) to act as an n-gram Language Model, following the paper "A Neural Probabilistic Language Model" by Bengio et al. (2003). The same model is built in several different ways that all converge on the same result, which highlights the trade-offs of each approach.
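To make the architecture concrete, here is a minimal sketch of a Bengio-style MLP language model in PyTorch. It is not the project's actual code; the class name `MLP` and the sizes (`vocab_size`, `context_length`, `embedding_size`, `hidden_size`) are illustrative assumptions. The embeddings of the previous context tokens are concatenated, passed through a tanh hidden layer, and mapped to logits over the vocabulary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    """Bengio-style n-gram language model: embed the previous context
    tokens, concatenate the embeddings, and predict the next token."""
    def __init__(self, vocab_size, context_length, embedding_size, hidden_size):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, embedding_size)              # token embeddings
        self.fc1 = nn.Linear(context_length * embedding_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, vocab_size)                    # logits over the vocabulary

    def forward(self, idx):                 # idx: (batch, context_length) token ids
        emb = self.wte(idx)                 # (batch, context_length, embedding_size)
        x = emb.view(idx.size(0), -1)       # concatenate the context embeddings
        x = torch.tanh(self.fc1(x))         # hidden layer, tanh as in Bengio et al.
        return self.fc2(x)                  # unnormalized next-token logits

# toy usage: batch of 4 contexts of 3 tokens each, vocabulary of 27 characters
model = MLP(vocab_size=27, context_length=3, embedding_size=16, hidden_size=128)
idx = torch.randint(0, 27, (4, 3))
targets = torch.randint(0, 27, (4,))
loss = F.cross_entropy(model(idx), targets)   # average negative log-likelihood
```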
Diverse Implementations
The MLP is realized through multiple parallel implementations, each producing the same end result by distinctly different means:
- C Version: spells out every individual operation required to train the MLP, offering the clearest view of exactly what the model computes.
- Numpy Version: introduces the Array abstraction, grouping operations into functions that act on whole arrays. Even with this abstraction, both the forward and the backward pass must still be written out by hand (see the sketch after this list).
- PyTorch Version: harnesses the Autograd engine to simplify the process. The PyTorch `Tensor` plays the same role as the Array in the numpy version, but additionally tracks the computational graph. The user specifies only the forward pass, and PyTorch computes the gradients when `backward()` is invoked on the loss.
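To make the contrast concrete, here is a hedged sketch (not the project's actual code; the shapes and variable names are illustrative) of the kind of hand-written forward and backward pass the numpy version requires for a tanh hidden layer followed by a softmax. In the PyTorch version, everything after the loss computation is replaced by a single `loss.backward()` call.

```python
import numpy as np

# hypothetical shapes for illustration
B, D_in, H, V = 4, 48, 128, 27            # batch, input dim, hidden dim, vocab
rng = np.random.default_rng(0)
x = rng.normal(size=(B, D_in))            # concatenated context embeddings
y = rng.integers(0, V, size=B)            # next-token targets
W1 = rng.normal(size=(D_in, H)) * 0.1; b1 = np.zeros(H)
W2 = rng.normal(size=(H, V)) * 0.1;    b2 = np.zeros(V)

# forward pass, written out by hand
h_pre = x @ W1 + b1
h = np.tanh(h_pre)
logits = h @ W2 + b2
logits -= logits.max(axis=1, keepdims=True)         # for numerical stability
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(B), y]).mean()

# backward pass, also by hand (this is what Autograd automates)
dlogits = probs.copy()
dlogits[np.arange(B), y] -= 1
dlogits /= B
dW2 = h.T @ dlogits;    db2 = dlogits.sum(axis=0)
dh = dlogits @ W2.T
dh_pre = (1.0 - h**2) * dh                           # tanh derivative
dW1 = x.T @ dh_pre;     db1 = dh_pre.sum(axis=0)
```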
Advantages Offered by PyTorch
PyTorch stands out for several significant reasons:
- It provides a highly efficient `Tensor` object, akin to numpy's `Array`. Some API details differ, but PyTorch adds device management, so Tensors can run on GPUs, which can drastically speed up computation.
- The Autograd engine records the computational graph of Tensor operations and automatically computes the necessary gradients, so the user never has to write a backward pass by hand.
- PyTorch's `nn` library offers a large collection of pre-built layers and loss functions, minimizing the effort required to implement common deep learning components.
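A small example, not taken from the project's code, that illustrates all three points at once: a model and data moved to a GPU when one is available, pre-built `nn` layers and a built-in loss, and gradients obtained with a single `backward()` call. The layer sizes here are arbitrary placeholders.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"   # device management

# pre-built layers and loss from the nn library
model = nn.Sequential(nn.Linear(48, 128), nn.Tanh(), nn.Linear(128, 27)).to(device)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(4, 48, device=device)           # a batch of inputs on the chosen device
y = torch.randint(0, 27, (4,), device=device)   # target token ids

loss = loss_fn(model(x), y)   # forward pass: Autograd records the computational graph
loss.backward()               # backward pass: gradients land in each parameter's .grad
print(model[0].weight.grad.shape)               # torch.Size([128, 48])
```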
Achievements and Future Goals
The MLP already achieves a notably lower validation loss than a traditional count-based n-gram model, despite using fewer parameters. The trade-off is increased computation during training and, to a lesser extent, during inference, since the dataset is effectively compressed into the model's parameters.
The roadmap for advancing this project includes:
- Fine-tuning hyperparameters to improve the model's performance. The current validation loss is 2.06, already better than the count-based 4-gram model's 2.11, but there remains room for optimization.
- Unifying the three implementations (C, numpy, PyTorch) so that all of them deliver consistent results.
- Developing illustrative diagrams to enhance understanding of the module's operations.
Conclusion
This MLP project is not merely about creating a functional n-gram Language Model; it is also an exploration of how the same computation can be expressed at different levels of abstraction, from hand-written C to PyTorch's automatic differentiation. Ongoing refinements aim to strike a balance between model quality and computational cost.
License
The project is open-source under the MIT license, promoting wide accessibility and collaborative enhancement.