Overview of the axodox-machinelearning Project
The axodox-machinelearning project provides a complete C++ implementation of Stable Diffusion, a state-of-the-art image synthesis technique. It can generate images from text prompts, perform image-to-image generation, and handle inpainting, all without relying on Python. Everything runs within a single process, which keeps deployments small and simple: a few executables, library files, and the model weights. Because the library can be integrated into almost any application, it is particularly attractive to developers of real-time graphics applications and games, which are typically written in C++.
ControlNet Support
The axodox-machinelearning library supports ControlNet, which conditions image generation on a guidance image derived from an input picture, steering the output toward specific features of that input. For example, an OpenPose estimator lets creators define the desired pose of the generated character; HED edge detection can be used to restyle an image as a comic book while preserving its original layout; and a depth-based ControlNet can retain the scene composition while changing the character depicted.
Feature Extractors
The library also provides GPU-accelerated feature extractors for producing these guidance inputs (a usage sketch follows the list below). It includes:
- Pose Estimation: Extracts human skeletons from images using OpenPose.
- Depth Estimation: Computes per-pixel depth from a single image using MiDaS.
- Edge Detection: Identifies edges within an image using Holistically-Nested Edge Detection (HED).
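The sketch below gives a rough idea of how an extractor's output could feed ControlNet-guided generation. The names used here (OnnxEnvironment, PoseEstimator, EstimatePose, StableDiffusionOptions, ConditionImage, StableDiffusionInferer, TextureData) are illustrative assumptions rather than the library's documented API; the repository's samples show the actual classes.

```cpp
// Illustrative sketch only: the types and members below are assumptions and
// may not match the library's actual API.
#include <memory>

#include "Include/Axodox.MachineLearning.h" // hypothetical umbrella header

using namespace Axodox::MachineLearning;

void GeneratePoseGuidedImage(const std::shared_ptr<OnnxEnvironment>& environment,
                             const TextureData& referencePhoto)
{
  // Run the OpenPose-based extractor on a reference photo to obtain a skeleton map.
  PoseEstimator poseEstimator{environment};
  auto poseMap = poseEstimator.EstimatePose(referencePhoto);

  // Pass the skeleton map to the ControlNet-enabled pipeline as the condition
  // image, so the generated character adopts the same pose.
  StableDiffusionOptions options{};
  options.Prompt = "a knight in ornate armor, dramatic lighting";
  options.ConditionImage = poseMap;

  StableDiffusionInferer inferer{environment};
  auto result = inferer.RunInference(options);
}
```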
Code Examples
For developers eager to dive into practical applications, the repository includes simple code examples.
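As a taste of what such an example might look like, here is a minimal text-to-image sketch. The header path and the OnnxEnvironment, StableDiffusionOptions, and StableDiffusionInferer names are assumptions, so consult the repository's README for the authoritative samples.

```cpp
// Minimal text-to-image sketch; the header path and type names are
// assumptions, not the library's confirmed API.
#include <memory>

#include "Include/Axodox.MachineLearning.h" // hypothetical umbrella header

using namespace Axodox::MachineLearning;

int main()
{
  // Point the environment at a directory containing the ONNX model files
  // (text encoder, UNet, VAE) converted for DirectML.
  auto environment = std::make_shared<OnnxEnvironment>(L"models/stable-diffusion-1.5");

  // Describe the generation request.
  StableDiffusionOptions options{};
  options.Prompt = "a watercolor painting of a lighthouse at dawn";
  options.NegativePrompt = "blurry, low quality";
  options.StepCount = 20;
  options.GuidanceScale = 7.f;

  // Run inference and write the resulting image to disk.
  StableDiffusionInferer inferer{environment};
  auto image = inferer.RunInference(options);
  image.Save(L"output.png"); // hypothetical helper for saving the result

  return 0;
}
```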
Reference Models
The library relies on AI models formatted in ONNX, optimized for DirectML via Microsoft Olive. Models ready for use with this library include:
- Stable Diffusion 1.5 with ControlNet support
- Realistic Vision 1.4 with ControlNet support
- ControlNet with various feature extractors
Users can also bring their own models, following guidance on converting them appropriately.
Technical Background
The axodox-machinelearning project stores its models in the ONNX format and executes them with ONNX Runtime, which supports multiple platforms (such as Windows and Linux) and execution providers (such as NVIDIA CUDA / TensorRT). The library's functionality is showcased in an integration example called Unpaint, a WinUI-based app that is also useful for evaluating performance. While the code mainly targets Windows with DirectML, it can be ported to other platforms with minimal adjustments.
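To make the execution stack more concrete, the sketch below shows how an ONNX Runtime session can be created with the DirectML execution provider using the standard ONNX Runtime C++ API; this illustrates the underlying runtime setup rather than the library's own wrapper classes, and the model path is made up for the example.

```cpp
// Creating an ONNX Runtime session that runs on the GPU via DirectML.
#include <onnxruntime_cxx_api.h>
#include <dml_provider_factory.h>

int main()
{
  Ort::Env env{ORT_LOGGING_LEVEL_WARNING, "axodox-demo"};

  Ort::SessionOptions sessionOptions;
  sessionOptions.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
  sessionOptions.DisableMemPattern(); // memory pattern optimization is not supported by the DirectML provider
  sessionOptions.SetExecutionMode(ExecutionMode::ORT_SEQUENTIAL);

  // Register the DirectML execution provider on GPU adapter 0; on other
  // platforms a CUDA or TensorRT provider could be appended here instead.
  Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_DML(sessionOptions, 0));

  // Load one of the converted ONNX models (the path is illustrative).
  Ort::Session session{env, L"models/stable-diffusion-1.5/unet/model.onnx", sessionOptions};
  return 0;
}
```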
Licensing
The source code of the axodox-machinelearning library is available under the MIT license, which permits use in a wide range of applications, including commercial ones.
Integration Guidelines
Developers can access prebuilt versions through NuGet under the Axodox.MachineLearning name, with support for both desktop and UWP projects on the x64 platform. Integration involves adding the appropriate packages, compiling with C++20, and including the library's headers in your code.
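For illustration, a consuming translation unit might look like the following once the package is referenced and C++20 is enabled; the header path and the OnnxEnvironment type are assumptions, so check the package contents for the actual header to include.

```cpp
// Hypothetical smoke test for the package reference: the header path and the
// OnnxEnvironment type are assumptions; verify them against the package.
#include <memory>

#include "Include/Axodox.MachineLearning.h"

using namespace Axodox::MachineLearning;

int main()
{
  // Constructing the environment confirms that headers, libraries and the
  // C++20 compiler settings are wired up correctly.
  auto environment = std::make_shared<OnnxEnvironment>(L"models");
  return environment ? 0 : 1;
}
```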
Building the Project
To modify and test the axodox-machinelearning library itself, developers need Visual Studio 2022 with the relevant C++ development workloads. After building, the environment can be configured so that local project builds override the installed packages, making it easy to iterate on changes.
In summary, the axodox-machinelearning project is a robust, efficient tool for developers interested in integrating advanced image synthesis capabilities into C++ applications, particularly those in the realm of real-time graphics and game development.