# model deployment
pytorch-lightning
PyTorch Lightning streamlines pretraining, finetuning, and deployment of PyTorch models by separating research code from engineering boilerplate, so the same code scales from a laptop to multi-GPU clusters. It integrates with LitServe for model serving, keeping the path from training to inference short, and handles tasks such as classification, segmentation, and summarization out of the box. Comprehensive documentation and community-driven examples make it a dependable foundation for a wide range of deep learning projects.
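To make the workflow concrete, here is a minimal Lightning sketch; the toy model, random data, and hyperparameters are illustrative placeholders, not anything from the project's docs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset


class TinyClassifier(pl.LightningModule):
    """Toy classifier; Lightning keeps the training loop out of the model code."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.net(x), y)
        self.log("train_loss", loss)  # Lightning handles logging/aggregation
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


# Random stand-in data; any DataLoader works here.
data = TensorDataset(torch.randn(128, 16), torch.randint(0, 2, (128,)))
trainer = pl.Trainer(max_epochs=1, accelerator="auto")  # picks CPU/GPU automatically
trainer.fit(TinyClassifier(), DataLoader(data, batch_size=32))
```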
phi3-Chinese
The phi3-Chinese project collects assorted training variants of the phi3 model, whose compact size makes it well suited to mobile deployment. The repository includes models such as Phi-3-mini-128k-instruct-Chinese, along with training, inference, and deployment tutorials. While there are discrepancies between claimed and actual performance, the model remains promising for lightweight applications, and improvements to its tokenization could further boost efficiency. Explore the collected variants to compare their characteristics and weights.
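As a sketch of how one of these variants might be loaded with Hugging Face transformers; the model id below simply mirrors the name mentioned above and is an assumption, so check the repository for the actual weights path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id echoing the variant named in the repo; the real Hub path may differ.
model_id = "Phi-3-mini-128k-instruct-Chinese"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("你好，请介绍一下你自己。", return_tensors="pt")  # "Hello, please introduce yourself."
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```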
ortex
Ortex, an Elixir wrapper for ONNX Runtime, eases the deployment of ONNX models by supporting concurrent and distributed execution through Nx.Serving. It targets multiple backends, including CUDA and Core ML, for efficient inference and straightforward model handling. Designed for models exported from PyTorch and TensorFlow, it offers a storage-only tensor implementation that integrates cleanly into Elixir applications. Installation means adding Ortex to the dependencies in mix.exs; Rust is required for compilation.
llama3-Chinese-chat
The project presents llama3 in Chinese, aimed at fostering Chinese LLM learning and collaboration. It encourages community contributions through pull requests, dataset development, and model improvements, and offers tutorials on model localization and deployment using tools such as LMStudio. Featuring a llama3.1 Chinese DPO model and comprehensive documentation, it invites users to take part in model testing and content expansion, providing an interactive platform for AI practitioners.
autotrain-advanced
AutoTrain Advanced offers an intuitive no-code platform for fast training and deployment of machine learning models: with data in the expected format, a model can be trained in just a few steps. It runs on both Colab and Hugging Face Spaces, charging only for the resources actually used. Local installation requires Python 3.10 and compatible packages, ideally inside a conda environment. Users can work through either a graphical interface or the command line, backed by extensive documentation for support.
machine-learning
This repository provides an extensive overview of data science and machine learning with Python, featuring Jupyter Notebook guides on key topics like deep learning, model deployment, and reinforcement learning. It covers essential areas such as neural networks, time series analysis, and A/B testing, utilizing libraries like scikit-learn and TensorFlow for practical and theoretical learning needs. Delve into detailed tutorials on Fasttext, Graph Neural Networks, and Transformers to enhance machine learning expertise.
fastllm
Fastllm is a pure C++ inference library with no third-party dependencies, delivering high performance on ARM, x86, and NVIDIA platforms. It supports quantization of Hugging Face models and OpenAI-compatible API server setups, with dynamic batching across multi-GPU and CPU deployments. Its front-end/back-end separation improves device compatibility, and it integrates with models such as ChatGLM and LLaMA. Python bindings also allow custom model structures, with extensive documentation for straightforward setup and use.
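A rough sketch of those Python bindings; the module name, model path, and method below follow the pattern in the project's older examples and are assumptions that may differ across fastllm versions:

```python
# Assumed module name from older fastllm examples; consult the project docs.
from fastllm_pytools import llm

# "model.flm" is a placeholder path to a model already converted/quantized
# into fastllm's own format.
model = llm.model("model.flm")
print(model.response("你好"))  # single-shot chat response ("Hello")
```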
serving
TensorFlow Serving provides a stable, scalable platform for deploying machine learning models in production. It integrates tightly with TensorFlow while accommodating other model types, and can serve multiple versions of a model simultaneously. Notable features include gRPC and HTTP inference endpoints, model version updates without client-side code changes, low-latency inference, and request batching for efficient GPU use, making it well suited to teams that need dependable model lifecycle management and version control.
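A minimal client sketch against the documented REST predict endpoint, assuming a server already running locally on the default HTTP port (8501) with a hypothetical model named my_model:

```python
import requests

# Default TensorFlow Serving REST endpoint; "my_model" is a placeholder name.
url = "http://localhost:8501/v1/models/my_model:predict"
payload = {"instances": [[1.0, 2.0, 5.0]]}  # shape must match the model's input

resp = requests.post(url, json=payload)
resp.raise_for_status()
print(resp.json()["predictions"])  # one prediction per instance
```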
mmdeploy
MMDeploy is an open-source toolset for deploying deep learning models efficiently. It converts Torch models into formats consumed by inference backends such as ONNX Runtime, ncnn, TensorRT, and OpenVINO, covering more than 2,300 AI models, and is compatible with a wide range of hardware. Designed for the OpenMMLab ecosystem, it integrates with models from codebases such as mmdet, mmseg, and mmocr. The platform includes a C/C++ SDK for extensive customization and supports multiple inference backends on Linux, Windows, macOS, and Android.
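A short sketch of running inference through the SDK's Python bindings, assuming a detection model already converted into an SDK model directory; the model path and image are placeholders:

```python
import cv2
from mmdeploy_runtime import Detector

img = cv2.imread("demo.jpg")  # placeholder input image
detector = Detector(
    model_path="mmdeploy_models/faster-rcnn",  # placeholder: converted SDK model dir
    device_name="cpu",
    device_id=0,
)
bboxes, labels, _ = detector(img)  # boxes with scores, class labels, (optional) masks
print(bboxes.shape, labels.shape)
```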
yolort
This project combines training and inference for object detection using a dynamic shape strategy, built on the YOLOv5 model framework. It embeds pre-processing and post-processing directly in the model graph, which eases deployment on platforms such as LibTorch, ONNX Runtime, TVM, and TensorRT. The design takes cues from the Ultralytics YOLOv5, so it feels familiar to anyone used to torchvision's models. Recent enhancements include TensorRT C++ interface integration and expanded ONNX Runtime support. The project installs via PyPI or from source with minimal dependencies, simplifying both Python and C++ deployment.
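A minimal usage sketch in the spirit of the project's README; the image path is a placeholder, and because pre/post-processing live inside the graph, a file path goes in and detections come out:

```python
from yolort.models import yolov5s

# Pretrained small YOLOv5 variant with a confidence threshold.
model = yolov5s(pretrained=True, score_thresh=0.45)
model.eval()

predictions = model.predict("bus.jpg")  # placeholder image path
print(predictions[0]["boxes"], predictions[0]["labels"], predictions[0]["scores"])
```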
torchchat
torchchat runs large language models across platforms such as desktop, server, iOS, and Android. It offers multimodal capabilities via the Llama 3.2 11B model and integrates tightly with PyTorch, supporting execution modes such as eager and AOT Inductor. Key features include access to well-known LLMs, broad hardware and OS compatibility, and flexible quantization and execution schemes.
executorch
ExecuTorch facilitates efficient on-device AI inference by streamlining the deployment of PyTorch models to diverse mobile and edge platforms, from smartphones to microcontrollers. As a component of the PyTorch Edge ecosystem, it ensures cross-platform compatibility and lets developers keep using familiar PyTorch tools for model development and deployment. A lightweight runtime delivers optimized performance by fully utilizing hardware such as CPUs, NPUs, and DSPs. Comprehensive documentation and model adaptation examples are available from the project.
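A hedged sketch of the documented export path, taking a toy eager PyTorch module through torch.export and the Edge dialect to a .pte artifact; exact APIs may shift between ExecuTorch releases:

```python
import torch
from executorch.exir import to_edge


class AddOne(torch.nn.Module):
    """Toy module standing in for a real model."""

    def forward(self, x):
        return x + 1


example_inputs = (torch.randn(4),)
exported = torch.export.export(AddOne(), example_inputs)  # ATen dialect program
program = to_edge(exported).to_executorch()  # lower to the ExecuTorch format

# The .pte file is what the on-device ExecuTorch runtime loads.
with open("add_one.pte", "wb") as f:
    f.write(program.buffer)
```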
llama2.c
Llama 2 Everywhere (L2E) provides a portable AI OS that runs on minimal hardware, widening AI accessibility through compact, specialized models. It brings AI models to low-resource settings, including schools without internet access, and contributes to global AI decentralization with a flexible ecosystem. By integrating advances from the llama2.c project, it improves portability and performance, connecting AI with robotics and IoT environments. Explore its capabilities in both training and deployment.
Feedback Email: [email protected]