#training
seq2seq-couplet
This project uses a seq2seq model built with TensorFlow to generate Chinese couplets. It ships with a demo and requires Python 3.6 and a couplet dataset. Training runs through `couplet.py`, with metrics such as loss and BLEU score tracked in TensorBoard, and interrupted training sessions can be resumed for continued training. The trained model can be deployed as a web service via `server.py` or Docker. Example output pairs '天朗气清风和畅' with '云蒸霞蔚日光辉'. Suitable for those interested in NLP and language generation.
VoiceFlow-TTS
VoiceFlow uses rectified flow matching to improve the efficiency and quality of text-to-speech synthesis. The repository implements an ICASSP 2024 paper and offers a detailed guide covering environment setup, data preparation, training, and inference. By rectifying the flow, it straightens sampling trajectories, improving synthesis quality at a given step budget. Utility scripts and model configurations allow customization across datasets, and experimental features such as voice conversion and likelihood estimation broaden the scope of flow matching in speech synthesis. Aimed at developers looking for efficient TTS solutions.
thorough-pytorch
Understand PyTorch through a well-organized course that guides both newcomers and experienced users. The curriculum spans fundamental to advanced PyTorch topics, including core modules, model deployment, and common deep learning operations. Participants strengthen their programming skills and learn to apply PyTorch in practical scenarios. Learners can take part in practice sessions, join collaborative study, and use the accompanying video tutorials to deepen their grasp of PyTorch's capabilities.
multimodal
TorchMultimodal is a PyTorch library for multimodal, multi-task model training. It provides modular fusion layers, adaptable datasets, and pretrained model classes that compose cleanly with components from the wider PyTorch ecosystem. The library includes numerous examples for training, fine-tuning, and evaluating models on various multimodal tasks. Implementations of models such as ALBEF, BLIP-2, CLIP, and DALL-E 2 make it easier to replicate state-of-the-art research, offering a valuable resource for researchers and developers working on multimodal model training.
llm.c
llm.c enables efficient pretraining of GPT-2 and GPT-3 in plain C/CUDA, sidestepping large frameworks such as PyTorch. The project is developed collaboratively, emphasizing both educational clarity and practical large-model training, and community ports to other programming languages make it suitable for a diverse range of deep learning practitioners.
LMFlow
LMFlow offers an inclusive toolbox for efficient finetuning of large-scale machine learning models, built to be accessible to the whole community. It supports diverse optimizers, conversation templates for models such as Llama-3 and Phi-3, and advanced techniques like speculative decoding and LISA for memory-efficient training. Recognized with the Best Demo Paper Award at NAACL 2024, it also provides tools for chatbot deployment and model evaluation, suited for practitioners aiming to adapt and deploy large models effectively.
bigscience
This workshop explores large language models based on the Megatron-GPT2 architecture through detailed training runs and experiments. It addresses model scaling, training dynamics, and instabilities, supported by extensive documentation and logs. By providing resources such as code repositories and training scripts, the project fosters transparency and collaboration within the AI community and points toward future advances in language modeling.
parler-tts
Parler-TTS is an open-source model for generating high-quality speech in a range of speaker styles. It provides complete access to datasets, training code, and model weights under permissive licenses. The model supports fast synthesis and is trained on extensive audiobook data, making it a suitable framework for researchers and developers. Parler-TTS allows speech characteristics to be customized through simple text prompts.
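As a concrete illustration of the text-prompt control described above, here is a minimal sketch following the project's published quickstart; the checkpoint name and API details are taken from that example and may change between releases.

```python
# Sketch of Parler-TTS prompt-driven synthesis, following the project's
# quickstart example; checkpoint name and API may differ across releases.
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_mini_v0.1")
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_mini_v0.1")

prompt = "Hey, how are you doing today?"
# The description is a plain-text prompt that steers speaker style and delivery.
description = "A female speaker delivers a slightly expressive speech in a very clear recording."

input_ids = tokenizer(description, return_tensors="pt").input_ids
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
sf.write("parler_out.wav", generation.cpu().numpy().squeeze(), model.config.sampling_rate)
```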
OLMo
OLMo is AI2's suite of open language models, developed to support open science. It provides detailed instructions for PyTorch-based setup and offers models such as OLMo 1B and 7B, trained on the Dolma dataset. Users can access training checkpoints and run inference, facilitated by integration with Hugging Face. The repository aims to support transparent, reproducible research in language modeling.
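A minimal sketch of the Hugging Face integration mentioned above; the checkpoint id and the `trust_remote_code` flag follow earlier published examples and are assumptions that may vary by release.

```python
# Load an OLMo checkpoint through Hugging Face Transformers and generate text.
# Checkpoint id and trust_remote_code requirement depend on the release used.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1B", trust_remote_code=True)

inputs = tokenizer("Language modeling is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```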
unified-io-2
Unified-IO 2 integrates vision, language, audio, and action into a single multimodal model, with demo, training, and inference capabilities included. Recent updates add PyTorch code for improved audio processing and ViT-VQGAN integration, supporting complex datasets with robust pre-processing. Designed for both TPU and GPU use, it enables efficient training and evaluation with JAX on the T5X framework, along with data visualization and task-specific model optimization. Unified-IO 2 is an active contribution to autoregressive multimodal model research.
onnxruntime
ONNX Runtime accelerates machine learning inference and training across platforms. It supports models exported from frameworks like PyTorch and TensorFlow, as well as libraries like scikit-learn and XGBoost, with a focus on hardware-specific optimization. On multi-node NVIDIA GPUs, it can notably reduce training time with minimal changes to existing PyTorch scripts. Compatible with all major operating systems, ONNX Runtime improves performance while cutting costs.
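To make the workflow concrete, here is a minimal sketch of the common path: exporting a PyTorch module to ONNX and running it with ONNX Runtime. Model and tensor names are illustrative.

```python
# Export a small PyTorch model to ONNX, then run inference with ONNX Runtime.
import numpy as np
import torch
import onnxruntime as ort

model = torch.nn.Linear(4, 2).eval()
torch.onnx.export(model, torch.randn(1, 4), "linear.onnx",
                  input_names=["input"], output_names=["output"])

# InferenceSession selects an execution provider (CPU here; CUDA and others
# are available depending on the build).
sess = ort.InferenceSession("linear.onnx", providers=["CPUExecutionProvider"])
out = sess.run(None, {"input": np.random.randn(1, 4).astype(np.float32)})
print(out[0].shape)  # (1, 2)
```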
Fast-SRGAN
This project provides an efficient solution for real-time super-resolution of low-resolution videos, built on an SRGAN-inspired architecture with pixel-shuffle upsampling. Capable of processing video at up to 720p and 30 fps on MPS devices, it offers a pretrained model for image inference and detailed instructions for custom training with editable CLI configurations. The repository welcomes contributions for model enhancement and feature expansion.
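Pixel shuffle is a standard sub-pixel upsampling technique; the sketch below is a generic PyTorch illustration of such a block, not Fast-SRGAN's exact layer configuration.

```python
# Generic pixel-shuffle upsampling block as used in SRGAN-style generators.
# Channel counts and activation are assumptions, not Fast-SRGAN's config.
import torch
import torch.nn as nn

class UpsampleBlock(nn.Module):
    def __init__(self, channels: int, scale: int = 2):
        super().__init__()
        # The conv expands channels by scale**2; PixelShuffle then rearranges
        # them into a spatial grid that is `scale` times larger per side.
        self.conv = nn.Conv2d(channels, channels * scale ** 2, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)
        self.act = nn.PReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.shuffle(self.conv(x)))

x = torch.randn(1, 64, 90, 160)    # low-resolution feature map
print(UpsampleBlock(64)(x).shape)  # torch.Size([1, 64, 180, 320])
```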
bayesian-flow-networks
Discover Bayesian Flow Networks, designed for modeling both continuous and discrete data. This project offers flexible loss functions, versatile probability models, and the scripts needed for training and testing. It includes ready-made experiments on MNIST and CIFAR-10, implemented in PyTorch, and emphasizes reproducibility, benefiting researchers and practitioners in data-centric domains.
PickScore
The Pick-a-Pic project offers open-source datasets and a model for studying text-to-image user preferences. Available datasets include the original v1 and a v2 with over a million examples, alongside the PickScore preference model. The repository includes a web application, installation instructions, and guides for inference, training, evaluation, and dataset download, with a demo available on HF Spaces.
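As a hedged sketch of preference scoring with PickScore, following the repository's published example; the processor and model ids are taken from that example and should be treated as assumptions.

```python
# Score candidate images against a prompt with PickScore (CLIP-style model).
# Processor/model ids follow the repository's example and may change.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModel

processor = AutoProcessor.from_pretrained("laion/CLIP-ViT-H-14-laion2B-s32B-b79K")
model = AutoModel.from_pretrained("yuvalkirstain/PickScore_v1").eval()

images = [Image.open("a.png"), Image.open("b.png")]  # hypothetical candidates
prompt = "a watercolor painting of a fox"

image_inputs = processor(images=images, return_tensors="pt")
text_inputs = processor(text=prompt, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    img = model.get_image_features(**image_inputs)
    txt = model.get_text_features(**text_inputs)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    scores = model.logit_scale.exp() * (txt @ img.T)[0]

print(scores)  # higher score = image better matches user preferences
```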
tensorflow-yolov3
This project delivers an implementation of YOLOv3 in TensorFlow 2.0, bringing compatibility fixes and improvements over older versions. It enables rapid deployment with pre-trained models, supports training on custom datasets, and allows starting from COCO weights. Well-documented scripts and guides make it accessible to both hobbyists and professionals interested in exploring YOLOv3 in TensorFlow.
chatglm_finetuning
This project enables fine-tuning of ChatGLM models, offering diverse tuning options with integrations for PyTorch Lightning, ColossalAI, and Transformers-style trainers. It includes guidance for LoRA and other fine-tuning methods, installation instructions, data-processing scripts, and continual updates for improved model application.
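For readers unfamiliar with LoRA, the sketch below shows the general idea using the peft library; this is a generic illustration, not this repository's own training stack, and the checkpoint and target-module names are assumptions.

```python
# Generic LoRA setup with peft, illustrating the low-rank adapter idea.
# NOT this repository's code; checkpoint and target modules are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)

config = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,                        # adapter scaling factor
    target_modules=["query_key_value"],   # attention projections (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the small adapter weights train
```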
AnyDoor
This project presents a zero-shot approach to object-level image customization, allowing image personalization without large custom datasets. Key features include released training and inference code, online demos on ModelScope and Hugging Face, and robust models for applications such as virtual try-on and face swapping. Installation goes through Conda or Pip and builds on the ControlNet framework, with community contributions extending its capabilities. It aims to simplify intricate image-generation tasks, providing a practical tool for contemporary image processing.
ao
Torchao gives PyTorch users practical tools to optimize inference and training through quantization and sparsity, improving model efficiency. It delivers significant speed and memory improvements via weight and activation quantization, and for training it introduces Float8 data types and sparse training for resource efficiency. Compatibility with PyTorch's `torch.compile()` and FSDP2 eases integration into existing workflows, and it also supports custom kernel development and experimental features. Suitable for researchers and developers looking to improve performance while maintaining accuracy.
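A minimal sketch of weight-only quantization with torchao, following its quickstart; API names may shift between versions and should be checked against the installed release.

```python
# Weight-only int8 quantization with torchao, then torch.compile.
# API follows torchao's quickstart; names may vary across versions.
import torch
from torchao.quantization import quantize_, int8_weight_only

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
).eval()

quantize_(model, int8_weight_only())  # swaps weights to int8 in place
model = torch.compile(model)          # compiles fused quantized kernels

with torch.no_grad():
    out = model(torch.randn(8, 1024))
print(out.shape)
```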
Feedback Email: [email protected]