Introduction to ggml
ggml is a tensor library for machine learning. It stands out for its low-level, cross-platform implementation and its support for integer quantization, which together let models run on a broad range of hardware. ggml is continually evolving, with much of the ongoing development visible in related projects such as llama.cpp and whisper.cpp.
Key Features
- Low-level cross-platform implementation: ggml provides a minimal core infrastructure compatible with various operating systems, giving developers a consistent, flexible foundation.
- Integer quantization support: quantized integer types reduce model size and improve the performance and efficiency of machine learning models.
- Broad hardware support: whether you are working on a personal computer or in a high-performance computing environment, ggml targets a wide array of hardware architectures.
- Automatic differentiation: built-in automatic differentiation supports gradient-based optimization and training tasks.
- Optimizers: ggml ships with optimization algorithms such as Adam and L-BFGS, which are central to training and fine-tuning models.
- No third-party dependencies: by not relying on external libraries, ggml avoids compatibility issues and keeps the build process simple.
- Zero memory allocations during runtime: memory is allocated up front, so no dynamic allocations occur while computing, keeping performance predictable.
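To make the integer-quantization idea concrete, here is a sketch in Python of a block-wise 8-bit scheme similar in spirit to ggml's Q8_0 type, where each small block of weights stores one float scale plus 8-bit integers. The block size and function names here are illustrative; ggml's actual on-disk quantization formats differ in detail:

```python
import math

BLOCK_SIZE = 32  # ggml quantizes weights in small fixed-size blocks

def quantize_block(values):
    """Map a block of floats to int8 values plus one float scale (Q8_0-style sketch)."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0
    quants = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, quants

def dequantize_block(scale, quants):
    """Recover approximate floats from the quantized representation."""
    return [q * scale for q in quants]

weights = [math.sin(i * 0.1) for i in range(BLOCK_SIZE)]
scale, quants = quantize_block(weights)
restored = dequantize_block(scale, quants)

# 8-bit storage is roughly 4x smaller than float32, at the cost of a
# small, bounded reconstruction error (at most about half the scale).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.6f} max_err={max_err:.6f}")
```

The design choice to quantize per block, rather than with one scale for a whole tensor, keeps the error proportional to the local magnitude of the weights, which is why quantized models lose so little accuracy in practice.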
Building ggml
To get started with ggml, clone the repository from GitHub, set up a Python virtual environment, and install the necessary dependencies. Once these preliminary steps are completed, you can proceed to build the examples provided in the repository using CMake commands.
git clone https://github.com/ggerganov/ggml
cd ggml
# install python dependencies in a virtual environment
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# build the examples
mkdir build && cd build
cmake ..
cmake --build . --config Release -j 8
Running GPT inference
ggml comes with example programs that showcase its capabilities. For instance, you can run the GPT-2 "small" model (117M parameters) with the following commands:
../examples/gpt-2/download-ggml-model.sh 117M
./bin/gpt-2-backend -m models/gpt-2-117M/ggml-model.bin -p "This is an example"
Refer to the 'examples' folder in the ggml repository for more detailed instructions and additional examples.
Platform-specific Configurations
ggml can be built against several platform-specific backends, letting developers target environments such as CUDA, hipBLAS, and SYCL:
Using CUDA
Enable the CUDA backend and point CMake at the CUDA compiler:
cmake -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.1/bin/nvcc ..
Using hipBLAS
Configure for hipBLAS support with:
cmake -DCMAKE_C_COMPILER="$(hipconfig -l)/clang" -DCMAKE_CXX_COMPILER="$(hipconfig -l)/clang++" -DGGML_HIPBLAS=ON ..
Using SYCL
For SYCL, install the applicable toolchain (e.g. Intel oneAPI) and run CMake with the corresponding settings, such as -DGGML_SYCL=ON. The repository provides examples for both Linux and Windows systems.
Compiling for Android
Building ggml for Android involves downloading the NDK, configuring the appropriate environment variables, and transferring the compiled binaries and model files to your Android device:
# Create directories
adb shell 'mkdir /data/local/tmp/bin'
adb shell 'mkdir /data/local/tmp/models'
# Transfer the files
adb push bin/* /data/local/tmp/bin/
adb push src/libggml.so /data/local/tmp/
adb push models/gpt-2-117M/ggml-model.bin /data/local/tmp/models/
# Execute on Android
adb shell
cd /data/local/tmp
export LD_LIBRARY_PATH=/data/local/tmp
./bin/gpt-2-backend -m models/ggml-model.bin -p "this is an example"
Additional Resources
For those interested in delving deeper into ggml, several resources are available:
- An Introduction to ggml providing a comprehensive overview.
- Details on The GGUF File Format, which is integral to ggml's structure and functionality.
These resources offer valuable insights into harnessing ggml for efficient and effective machine learning development.
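As a small taste of the GGUF format mentioned above: per the GGUF specification, a file begins with the 4-byte magic GGUF, a uint32 format version, and two uint64 counts (the number of tensors and the number of metadata key-value pairs), all little-endian. The sketch below packs and parses just this fixed header; it is not a full GGUF reader, and the function names are illustrative:

```python
import struct

GGUF_MAGIC = b"GGUF"

def write_header(version, n_tensors, n_kv):
    """Pack the fixed GGUF header: 4-byte magic, uint32 version, two uint64 counts."""
    return GGUF_MAGIC + struct.pack("<IQQ", version, n_tensors, n_kv)

def read_header(data):
    """Parse the fixed header, raising on a bad magic."""
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version, n_tensors, n_kv = struct.unpack_from("<IQQ", data, 4)
    return version, n_tensors, n_kv

header = write_header(version=3, n_tensors=2, n_kv=5)
print(read_header(header))  # -> (3, 2, 5)
```

After this 24-byte header, a real GGUF file continues with the metadata key-value pairs and tensor descriptors, which is where model hyperparameters and tensor layouts live.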