Introduction to WebGPT
WebGPT is an innovative project that represents a significant step forward in web application technology. It is built on WebGPU, a browser API roughly six years in the making that is set to change how web browsers interact with graphics processing units (GPUs). By granting web applications near-native access to the GPU, WebGPU enables advanced computation in the browser through compute shaders, and WebGPT puts that capability to work.
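To make the underlying mechanism concrete, here is a minimal, self-contained sketch (not taken from the WebGPT codebase) of how a page requests a GPU device and dispatches a WGSL compute shader through WebGPU; the buffer contents and shader are purely illustrative.

```javascript
// Minimal WebGPU compute example (illustrative; not from the WebGPT repo).
async function runComputeExample() {
  if (!navigator.gpu) throw new Error("WebGPU is not available in this browser.");
  const adapter = await navigator.gpu.requestAdapter();
  const device = await adapter.requestDevice();

  // A WGSL compute shader that doubles every element of a storage buffer.
  const shaderCode = `
    @group(0) @binding(0) var<storage, read_write> data: array<f32>;
    @compute @workgroup_size(64)
    fn main(@builtin(global_invocation_id) id: vec3<u32>) {
      data[id.x] = data[id.x] * 2.0;
    }`;

  const module = device.createShaderModule({ code: shaderCode });
  const pipeline = device.createComputePipeline({
    layout: "auto",
    compute: { module, entryPoint: "main" },
  });

  // Upload 64 input values into a GPU storage buffer.
  const input = new Float32Array(64).fill(1.0);
  const buffer = device.createBuffer({
    size: input.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
    mappedAtCreation: true,
  });
  new Float32Array(buffer.getMappedRange()).set(input);
  buffer.unmap();

  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [{ binding: 0, resource: { buffer } }],
  });

  // Record and submit a compute pass that runs one 64-thread workgroup.
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(1);
  pass.end();
  device.queue.submit([encoder.finish()]);
}
```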
What is WebGPT?
WebGPT is a transformer model implemented in vanilla JavaScript and HTML. It is intended primarily as a proof of concept and an educational tool, demonstrating the potential of the WebGPU API. The project has successfully tested models with up to 500 million parameters and anticipates supporting even larger models with further optimization.
Current Performance Metrics
WebGPT has been tested on a 2020 M1 Mac with varying performance times depending on the model size and precision:
- 5 million parameters: roughly 3 milliseconds per token.
- 117 million parameters: roughly 30 milliseconds per token.
- 377 million parameters: roughly 70 milliseconds per token.
- 775 million parameters: roughly 120 milliseconds per token.
- 1.5 billion parameters: functional but currently unstable, at roughly 1000 milliseconds per token due to remaining inefficiencies.
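For intuition, these per-token latencies map directly onto generation speed: roughly 30 milliseconds per token corresponds to about 33 tokens per second, while roughly 120 milliseconds per token corresponds to about 8 tokens per second.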
How to Run WebGPT
Running WebGPT is straightforward, as it consists only of HTML and JavaScript files. However, because WebGPU is still rolling out gradually, it must be run in a compatible browser. WebGPU ships in Chrome version 113, but the easiest way to ensure compatibility is to use the Chrome Canary or Edge Canary browsers.
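As a quick sanity check before loading any model, a page can test whether the browser actually exposes WebGPU. The snippet below is an illustrative check, not code from the repository.

```javascript
// Illustrative WebGPU availability check (not part of the WebGPT sources).
async function checkWebGPU() {
  if (!("gpu" in navigator)) {
    console.warn("WebGPU is unavailable; try Chrome 113+, Chrome Canary, or Edge Canary.");
    return false;
  }
  // Even with the API present, a usable adapter may not be available.
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) {
    console.warn("No suitable GPU adapter was found.");
    return false;
  }
  return true;
}
```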
The project provides two different models: a preliminary GPT-Shakespeare model and a more developed GPT-2 with 117 million parameters. For more technical guidance on running these models or importing custom models, refer to the main.js file, or use the conversion scripts available in the misc folder.
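To give a sense of what importing custom weights involves, the sketch below fetches a raw float32 weight file and uploads it into a GPU storage buffer. The file format and function name are assumptions made for illustration; the actual loading logic lives in main.js and the misc conversion scripts.

```javascript
// Illustrative only: the real loading code is in main.js, and the weight
// file layout produced by the conversion scripts may differ.
async function loadWeightsToGPU(device, url) {
  // Fetch a raw float32 weight file (hypothetical format).
  const bytes = await (await fetch(url)).arrayBuffer();
  const weights = new Float32Array(bytes);

  // Copy the weights into a GPU storage buffer for use by compute shaders.
  const buffer = device.createBuffer({
    size: weights.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
  });
  device.queue.writeBuffer(buffer, 0, weights);
  return buffer;
}
```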
Test out WebGPT on the demo website KMeans.org, though it is recommended to clone the repository and run it locally, since loading the model weights remotely can be slow. When cloning, note that Git LFS is required to download the model files.
Development Roadmap
The developers of WebGPT have charted an extensive roadmap to improve its functionality and performance. Completed tasks include GPU-based embeddings/de-embeddings, streamlined pipeline reinitialization, buffer reuse, and kernel optimizations. Future goals focus on optimizing GPU operations, such as improving the attention kernels for larger models and speeding up generation through attention caching (sketched below). The team also plans to turn WebGPT into a user-friendly package and to explore additional instructional content.
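To illustrate why attention caching is on the roadmap, the sketch below shows the idea in plain JavaScript: keys and values for past tokens are computed once and reused, so each decoding step only projects the newest token. The helper names (projectKV, projectQ, attend) are hypothetical and do not correspond to functions in the WebGPT code.

```javascript
// Conceptual sketch of attention (KV) caching; helper names are hypothetical.
function decodeStep(model, cache, newToken) {
  // Project only the newest token into key/value vectors...
  const { k, v } = model.projectKV(newToken);
  cache.keys.push(k);    // ...and append them to the running cache,
  cache.values.push(v);  // so earlier tokens are never re-projected.

  // Attention for the new token then reads against the full cached history.
  const q = model.projectQ(newToken);
  return model.attend(q, cache.keys, cache.values);
}
```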
Acknowledgments
The creation of WebGPT was enriched by invaluable resources and insights from various experts. Notably, Andrej Karpathy's YouTube series on neural networks and constructing GPT models from scratch offered considerable guidance. Additional inspiration and code support came from the nanoGPT repository and LatitudeGames' JavaScript implementation of OpenAI's GPT-3 tokenizer.
WebGPT marks a milestone in web technology, paving the way for future advancements in web-based computational capability and accessibility.