# Pre-trained models

## gpt-neo
This open-source project provides a framework for training GPT-3-class language models, using model and data parallelism built on the mesh-tensorflow library. It supports both TPU and GPU environments and offers features such as local and linear attention and Mixture-of-Experts layers. Although active development ceased in August 2021, the repository remains a valuable reference for enthusiasts and professionals interested in large-scale model training. Integration with HuggingFace Transformers makes the released checkpoints easy to experiment with, for beginners and advanced users alike. The successor repository, GPT-NeoX, carries the work forward on GPU hardware, driven by community contributions and open-source collaboration.
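GPT-Neo's local attention restricts each token to a fixed window of preceding tokens instead of the full causal context. A minimal single-head sketch in NumPy (the window size and dimensions here are illustrative, not GPT-Neo's actual configuration):

```python
import numpy as np

def local_causal_attention(q, k, v, window=4):
    """Single-head attention where token i attends only to
    tokens in [i - window + 1, i] (causal, fixed window)."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                      # (T, T) similarity scores
    idx = np.arange(T)
    # mask out positions that are in the future or outside the local window
    allowed = (idx[None, :] <= idx[:, None]) & (idx[:, None] - idx[None, :] < window)
    scores = np.where(allowed, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 16)) for _ in range(3))
out = local_causal_attention(q, k, v, window=4)
print(out.shape)  # (8, 16)
```

Because every row of the mask keeps at least the diagonal, the softmax is always well defined; the cost per token becomes proportional to the window size rather than the sequence length.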
## CPM
Discover an innovative makeup transfer framework excelling in both color and pattern applications. The CPM model integrates an enhanced color branch and an original pattern branch, and offers four new datasets for thorough training and evaluation. Access pre-trained models, follow installation guidelines, and run the framework as detailed in usage instructions. Explore detailed qualitative comparison results online.
## keras-nlp
KerasHub is a comprehensive library of natural language processing, computer vision, audio, and multimodal models for TensorFlow, JAX, and PyTorch. Built on Keras 3, it offers a rich collection of pre-trained models and modular building blocks for diverse applications. Model definitions are identical across frameworks, making fine-tuning straightforward on both GPUs and TPUs and letting checkpoints move between frameworks at no conversion cost. KerasHub also supports model- and data-parallel training for large workloads.
## MetaCLIP
This project presents an innovative method for curating CLIP data that prioritizes data quality over quantity. It features a transparent and scalable approach to data curation, managing over 300B image-text pairs from CommonCrawl without needing prior models. By focusing on signal preservation and noise reduction, it offers improved data quality compared to other open-source initiatives. MetaCLIP integrates OpenAI CLIP's training framework for precise and unbiased model comparisons and includes metadata and training data distribution details for a complete understanding of pretraining datasets, catering to those aiming to enhance their data pipeline comprehensively.
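The core curation idea is to match raw image-text pairs against a set of metadata entries and balance the result so that frequent ("head") concepts are capped while rare ("tail") concepts are kept whole. A toy sketch of that balancing step, with a tiny threshold and naive matching rather than the paper's exact procedure:

```python
import random

def curate(pairs, metadata, t=2, seed=0):
    """Toy sketch of metadata-balanced curation:
    1) keep only pairs whose text matches a metadata entry,
    2) cap each entry's matched pool at t pairs.
    Real MetaCLIP matches against hundreds of thousands of entries and
    uses a much larger threshold; this just illustrates the balancing."""
    rng = random.Random(seed)
    buckets = {m: [] for m in metadata}
    for img, text in pairs:
        for m in metadata:
            if m in text.lower():
                buckets[m].append((img, text))
    curated = []
    for m, matched in buckets.items():
        if len(matched) > t:                  # head entry: subsample to the cap
            matched = rng.sample(matched, t)
        curated.extend(matched)               # tail entry: keep everything
    return curated

pairs = [("img%d" % i, txt) for i, txt in enumerate(
    ["a photo of a dog", "dog on grass", "my dog sleeping",
     "a red bicycle", "cat in a box"])]
print(len(curate(pairs, ["dog", "bicycle"], t=2)))  # 3
```

Here the three "dog" pairs are subsampled down to two, the single "bicycle" pair is kept, and the unmatched "cat" pair is dropped, yielding three curated pairs.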
## RNA-FM
RNA-FM utilizes a foundation model pre-trained on diverse RNA sequences to improve RNA structure and function predictions. The project outperforms current RNA language models in accuracy. The addition of mRNA-FM broadens its scope to coding sequences, aiding protein and mRNA research. Comprehensive tutorials and pre-trained models make it accessible, enabling the generation of contextual embeddings to address complex RNA tasks. Discover our guides and resources to fully leverage RNA-FM for scientific progress.
## PromptPapers
PromptPapers is a curated, open-source reading list on prompt-based learning, covering work that improves training procedures and unifies tasks for pre-trained language models. It gives researchers a structured overview of prompt-learning methods and invites contributions via pull requests, keeping the collection of significant papers up to date. A useful starting point for exploring efficient model adaptation and advances in language processing.
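The basic pattern behind much of this literature is cloze-style prompting: wrap the input in a template containing a mask slot, let a masked language model score candidate filler words, and map the winning word back to a label through a "verbalizer". A minimal sketch, with a hypothetical stand-in for a real masked LM:

```python
def prompt_classify(text, fill_mask,
                    template="{text} It was [MASK].",
                    verbalizer={"positive": "great", "negative": "terrible"}):
    """Cloze-style prompt-learning sketch: templated input, masked-LM
    scoring of verbalizer words, best word mapped back to its label."""
    prompt = template.format(text=text)
    scores = fill_mask(prompt, list(verbalizer.values()))  # word -> probability
    best_word = max(scores, key=scores.get)
    return next(lbl for lbl, w in verbalizer.items() if w == best_word)

# Hypothetical stand-in for a real masked LM's scoring function.
def toy_fill_mask(prompt, candidates):
    return {w: (0.9 if ("loved" in prompt) == (w == "great") else 0.1)
            for w in candidates}

print(prompt_classify("I loved this movie.", toy_fill_mask))  # positive
print(prompt_classify("I hated this movie.", toy_fill_mask))  # negative
```

The appeal of this setup is that classification is recast as the pre-training task itself (mask filling), so no new classification head needs to be trained from scratch.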
## Medical_NLP
A detailed repository of medical NLP resources including evaluations, competitions, datasets, papers, and pre-trained models, maintained by third-party contributors. It features Chinese and English benchmarks like CMB, CMExam, and PromptCBLUE, highlights ongoing and past events such as BioNLP Workshop and MedVidQA, and catalogs diverse datasets like Huatuo-26M and MedMentions. The repository also provides access to open-source models like BioBERT and BlueBERT, and large language models including ApolloMoE, catering to researchers in the medical NLP sphere.
## ComfyUI-AnimateAnyone-Evolved
This project provides a refined solution for converting image sequences into stylized videos, optimized for GPUs comparable to RTX 3080. It utilizes various samplers and schedulers like DDIM, DPM++ 2M Karras, LCM, and Euler for efficient video generation up to 120+ frames. The integration with ComfyUI ensures a modular workflow. Future enhancements focus on accelerating processing speeds through pre-trained models and techniques like RCFG and stable-fast conversion.
## openWakeWord
OpenWakeWord is a flexible wakeword-detection library for voice-enabled applications, shipping pre-trained models for commonly used words and phrases. It supports noise suppression and voice activity detection (VAD) to improve accuracy, runs on platforms such as Linux and Windows, and is designed for straightforward integration and efficient real-time detection. New models can be trained easily from synthetic speech data, so custom wakewords do not require large recorded datasets.
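A wakeword model emits a score in [0, 1] for each short audio frame (openWakeWord processes audio in roughly 80 ms chunks), and the application decides when to fire. A simple post-processing sketch, with a threshold-plus-debounce rule whose numbers are illustrative rather than the library's defaults:

```python
def detect_activations(scores, threshold=0.5, patience=2):
    """Sketch of wakeword post-processing: fire an activation only
    after `patience` consecutive frame scores exceed `threshold`,
    then reset the streak (a simple debounce against flicker)."""
    activations, streak = [], 0
    for i, s in enumerate(scores):
        streak = streak + 1 if s >= threshold else 0
        if streak == patience:
            activations.append(i)   # frame index where the detection fires
            streak = 0
    return activations

frame_scores = [0.1, 0.2, 0.7, 0.8, 0.3, 0.9, 0.9, 0.9]
print(detect_activations(frame_scores))  # [3, 6]
```

Requiring consecutive high-scoring frames trades a few frames of latency for far fewer false activations from single noisy frames.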
## Latte
The project presents an innovative approach to video generation using Latent Diffusion Transformers with PyTorch. It utilizes spatio-temporal token extraction and Transformer blocks for modeling video distribution in latent spaces, improving video quality on datasets such as FaceForensics and Taichi-HD. Including efficient model variants and extensions for text-to-video generation, the project achieves advanced performance benchmarks. The integration into diffusers also lowers GPU demands, facilitating access to efficient video creation infrastructures.
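One simple way to turn a latent video into spatio-temporal tokens is tube embedding: split the latent clip into non-overlapping (time, height, width) blocks and flatten each into a token. Latte itself studies several patch-embedding variants; this NumPy sketch, with illustrative patch sizes, shows the reshape mechanics:

```python
import numpy as np

def video_to_tokens(video, pt=2, ph=4, pw=4):
    """Split a latent video of shape (T, H, W, C) into non-overlapping
    (pt, ph, pw) tubes and flatten each tube into one token."""
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)       # group the tube dims together
    return x.reshape(-1, pt * ph * pw * C)     # (num_tokens, token_dim)

video = np.zeros((16, 32, 32, 4))              # e.g. a VAE latent clip
tokens = video_to_tokens(video)
print(tokens.shape)  # (512, 128): 8*8*8 tubes, each 2*4*4*4 values
```

The resulting token sequence is what the Transformer blocks operate on, so the choice of tube size directly trades sequence length against per-token dimensionality.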
## DNABERT
DNABERT employs pre-trained encoders to enhance DNA sequence analysis, offering extensive resources like source codes and visualization tools. An extension of Hugging Face's transformers for genomic DNA, DNABERT is continually updated, featuring DNABERT-2 for multi-species genomes. It supports general and task-specific fine-tuning, offering efficiency and ease of use for researchers employing NVIDIA GPUs on Linux, ultimately facilitating advanced genomic insights.
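DNABERT tokenizes DNA as overlapping k-mers (it offers models for k from 3 to 6; DNABERT-2 later switched to BPE tokenization). The conversion is a one-liner:

```python
def seq_to_kmers(seq, k=6):
    """DNABERT-style tokenization: a DNA sequence becomes the list of
    its overlapping k-mers, which are then looked up in the vocabulary."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

print(seq_to_kmers("ATGCGTAC", k=6))  # ['ATGCGT', 'TGCGTA', 'GCGTAC']
```

A sequence of length L yields L - k + 1 tokens, so the overlapping scheme preserves single-nucleotide resolution at the cost of a longer token sequence.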
## OpenDelta
OpenDelta offers a versatile framework for parameter-efficient tuning, referred to as delta tuning. It allows tweaking selective parameters while keeping most unchanged, with support for methods like prefix-tuning and LoRA applicable to diverse pre-trained models. It is compatible with Python 3.8.13 and PyTorch 1.12.1, but adaptable to other versions. Recent updates enhance documentation and BMTrain integration, providing tools to inspect model modifications.
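LoRA, one of the methods OpenDelta supports, illustrates the delta-tuning idea well: the pre-trained weight stays frozen and only a low-rank pair of matrices is trained. A minimal NumPy sketch of the math (dimensions and scaling chosen for illustration, not OpenDelta's API):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4                              # hidden size and low rank (illustrative)
W = rng.normal(size=(d, d))               # frozen pre-trained weight
A = rng.normal(size=(r, d)) * 0.01        # trainable down-projection
B = np.zeros((d, r))                      # trainable up-projection, zero-init

def lora_forward(x, alpha=8):
    """LoRA-style delta tuning: W is untouched; only (A, B) are trained,
    contributing a low-rank update (alpha / r) * B @ A."""
    return x @ (W + (alpha / r) * B @ A).T

x = rng.normal(size=(2, d))
out = lora_forward(x)
# B is zero-initialized, so before any training the delta is exactly zero:
print(np.allclose(out, x @ W.T))  # True
```

Here the trainable parameters number 2 * r * d = 512 versus d * d = 4096 for the frozen weight, which is the source of delta tuning's efficiency.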
## Coloring-greyscale-images
This open-source project leverages neural networks to turn grayscale photos into color images, featuring step-by-step tutorials from basic neural models to complex GAN architectures. With insights into color space conversion, this project also explores efficient image resolutions and pretrained model optimizations, offering developers and researchers a comprehensive resource for mastering AI-driven image colorization.
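The usual colorization setup works in Lab color space: the network sees only the lightness channel L and predicts the two chroma channels (a, b), which are then stacked back into a Lab image. A structural sketch with a hypothetical stand-in model (the Lab-to-RGB conversion is omitted):

```python
import numpy as np

def colorize(gray, predict_ab):
    """Sketch of Lab-space colorization: the model maps an (H, W)
    lightness channel to (H, W, 2) chroma channels; the three channels
    are stacked into an (H, W, 3) Lab image."""
    ab = predict_ab(gray)                                 # (H, W, 2)
    return np.concatenate([gray[..., None], ab], axis=-1)  # (H, W, 3)

# Hypothetical model: predicts zero chroma, i.e. a neutral gray image.
toy_model = lambda g: np.zeros(g.shape + (2,))
lab = colorize(np.ones((32, 32)), toy_model)
print(lab.shape)  # (32, 32, 3)
```

Predicting only two channels from one, rather than three RGB channels from scratch, is what makes the Lab formulation attractive: the input lightness is reused verbatim in the output.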
## overeasy
Overeasy enables the creation of custom computer vision solutions with zero-shot models, supporting tasks like bounding box detection, classification, and segmentation without extensive datasets. The tool offers easy installation and features robust agents and execution graphs to facilitate the management and visualization of image processing workflows.
## bert4torch
This open-source project supports loading and fine-tuning large language models such as ChatGLM, LLaMA, and Baichuan, and enables deployment with a single command. It also provides BERT, RoBERTa, ALBERT, and GPT models for flexible fine-tuning, with extensive practical examples validated on public datasets. The toolkit can load weights from the transformers library and offers convenient training-process monitoring. Initially developed against `torch==1.10`, it now also supports `torch==2.0`, making it a versatile resource for developers who want flexibility and ease in model training and deployment.
## torchgeo
TorchGeo provides specialized resources including datasets, samplers, and models for geospatial data processing. It bridges the gap between machine learning and geospatial data, facilitating tasks such as image classification and semantic segmentation. Integrated with PyTorch, it allows seamless incorporation into workflows, while its support for pre-trained weights accommodates multispectral imaging needs. Compatibility with tools like Lightning enhances its use in reproducible and experimental workflows, positioning TorchGeo as an essential tool for geospatial deep learning.
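Geospatial rasters are far larger than a network's input, so TorchGeo's samplers yield fixed-size windows over them (its real samplers, such as `GridGeoSampler`, work in CRS coordinates rather than pixels). A simplified pixel-space sketch of grid sampling:

```python
def grid_samples(width, height, patch=256, stride=256):
    """Sketch of grid-style geospatial sampling: yield (x, y, w, h)
    windows tiling a large raster, GridGeoSampler-style but in plain
    pixel coordinates with illustrative sizes."""
    for y in range(0, height - patch + 1, stride):
        for x in range(0, width - patch + 1, stride):
            yield (x, y, patch, patch)

tiles = list(grid_samples(1024, 512))
print(len(tiles))  # 8 tiles: 4 columns x 2 rows
```

With stride equal to patch size the tiles are non-overlapping, which suits inference; a smaller stride produces overlapping patches, which is common for training.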
## VADER
VADER improves video-text alignment and video-generation efficiency by fine-tuning video models with gradients from pre-trained discriminative reward models, avoiding the need for large labeled datasets. The method extends generation to three times the training sequence length and outperforms gradient-free alternatives. It optimizes video models such as VideoCrafter2, Open-Sora V1.2, and ModelScope for streamlined production.
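The key mechanism is that the reward model is differentiable, so the generator's parameters can be updated by gradient ascent on the reward instead of by gradient-free search. A deliberately tiny scalar sketch of that loop (the quadratic "reward" and learning rate are illustrative stand-ins, not VADER's actual models):

```python
def reward(theta):
    """Stand-in differentiable reward, peaking at theta = 3. In VADER the
    reward model scores generated frames and its gradient flows back
    through the sampling chain into the video model's weights."""
    return -(theta - 3.0) ** 2

def reward_grad(theta):
    return -2.0 * (theta - 3.0)   # analytic gradient of the toy reward

theta = 0.0                        # 'model parameter' (illustrative scalar)
for _ in range(100):
    theta += 0.1 * reward_grad(theta)   # gradient ASCENT on the reward
print(round(theta, 3))  # 3.0
```

Following the reward's own gradient is what makes the approach sample-efficient relative to methods that only observe scalar reward values.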
## assets
This repository provides a suite of visual assets, pre-trained models, and curated datasets that integrate smoothly with the Ultralytics YOLO ecosystem. It covers tasks such as object detection and image classification, and is suitable for both personal and commercial applications. Users can download pre-trained models and run inference with minimal effort, while the bundled assets and datasets support diverse machine learning projects. With comprehensive documentation and varied licensing options, the repository serves hobbyists and professionals alike in advancing their computer vision capabilities.