Introduction to GGUF Tools
gguf-tools is a work-in-progress library for handling GGUF files, a format that has become central to local machine learning as the container for model weights and metadata. The library aims not only to be functional but also to provide a readable code base that effectively documents the GGUF format. This documentation aspect is particularly significant because GGUF files are used extensively by Georgi Gerganov's llama.cpp project.
The gguf-tools utility builds on this library to perform a range of operations on GGUF files, demonstrating its use in practical scenarios. It currently provides several subcommands, each serving a distinct purpose.
GGUF Tools Functionalities
Viewing GGUF File Information
The gguf-tools show file.gguf command prints detailed information about the contents of a GGUF file: all key-value pairs, including arrays, and all tensor metadata. Tensor offsets are absolute, measured from the start of the file, which makes the file's layout easier to follow.
Example Output:
./gguf-tools show models/phi-2.Q8_0.gguf | head -20
models/phi-2.Q8_0.gguf (ver 3): 20 key-value pairs, 325 tensors
...
Comparing GGUF Files
One of the more interesting features is the gguf-tools compare file1.gguf file2.gguf command. It helps determine the relationship between two models, such as whether one is a fine-tune of the other or whether both derive from the same parent model. The tool does this by computing, for each pair of matching tensors, the average weight difference expressed as a percentage. This can be particularly useful for understanding how much a fine-tuning process changed a model.
Example Output:
./gguf-tools compare mistral-7b-instruct-v0.2.Q8_0.gguf \
solar-10.7b-instruct-v1.0-uncensored.Q8_0.gguf
[token_embd.weight]: avg weights difference: 44.539944%
...
Inspecting Tensor Weights
The gguf-tools inspect-tensor file.gguf tensor.name [count] command lets users examine the weights of a specified tensor. If count is given, only the first count weights are displayed. This is useful for studying the effects of quantization, measuring the error it introduces, or fingerprinting models.
Extracting Models from Mixtral
The gguf-tools split-mixtral command extracts a single model from the Mixtral 7B MoE, selecting experts according to a specified sequence of MoE IDs. It is an exploratory command, written mainly as an exercise in using the library. While the models it produces may not perform well, the functionality illustrates what the library makes possible.
GGUF Library API
Documentation primarily resides within the implementation itself, and users can refer to gguf-tools.c for concrete usage examples. Although the library is still actively evolving, its API is straightforward and well commented, which makes it easy to follow.
Limitations and References
At present, the library does not support many of the quantization formats, which marks a clear area for further development. For details on the file layout and metadata, users can consult the official GGUF specification; the quantization formats themselves are defined in the llama.cpp (ggml) code base.
In summary, gguf-tools is a growing suite for exploring and manipulating GGUF files, reflecting the library's dual aim of serving practical needs and documenting the format for the machine learning community.