# Pretrained models
xlnet
XLNet implements a generalized permutation language modeling objective for unsupervised language representation learning. Built on Transformer-XL, XLNet handles long-context language tasks effectively, achieving top-tier results in question answering, sentiment analysis, and document ranking. It outperforms BERT on numerous benchmarks, making it a robust choice for text classification and reading comprehension. Available pretrained models and flexible fine-tuning options cover a wide range of NLP needs.
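For orientation, here is a minimal sketch of loading a pretrained XLNet checkpoint for classification through the Hugging Face Transformers library; this is one common route, not the repo's own TensorFlow training code, and the head is randomly initialized until fine-tuned:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# "xlnet-base-cased" is the standard pretrained checkpoint on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
model = AutoModelForSequenceClassification.from_pretrained("xlnet-base-cased", num_labels=2)

inputs = tokenizer("XLNet handles long contexts well.", return_tensors="pt")
logits = model(**inputs).logits  # unnormalized class scores; fine-tune before trusting them
```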
awesome-recommend-system-pretraining-papers
This curated paper list tracks the latest advances in pretrained recommendation models, highlighting large language models and new methods in sequence representation and user modeling. It gives a comprehensive overview spanning many datasets and studies, and welcomes community contributions. It collects significant work from conferences such as SIGIR, CIKM, and WSDM, covering techniques including graph pretraining and generative recommendation, and is maintained by Xiangyang Li of Peking University.
FollowYourEmoji
FollowYourEmoji is a diffusion-based framework for precise portrait animation driven by target landmark sequences, accepted to SIGGRAPH Asia 2024. The tool offers fine-grained control for creating realistic, smooth animated portraits. Built on models such as AnimateDiff and tools including MediaPipe, it serves both creators and developers seeking sophisticated animation solutions. Pretrained models and comprehensive guides make it straightforward to start animating dynamic portraits.
snac
SNAC is a neural audio codec that compresses audio into hierarchical tokens at low bitrates, making it well suited to language-model-based audio generation. SNAC provides pretrained models for mono audio across various bitrates and sample rates, covering both music and speech. It saves bitrate by sampling coarse tokens less frequently than fine ones, which makes it a good fit for modeling audio structure up to 3 minutes long. A Python API handles practical encoding and decoding.
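A minimal encode/decode sketch, assuming the checkpoint naming used on the Hugging Face Hub; confirm the exact identifier and sample rate against the SNAC README:

```python
import torch
from snac import SNAC

# Checkpoint identifier assumed from the Hub naming convention; verify before use
model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()

audio = torch.randn(1, 1, 24000)  # (batch, channels, samples): one second of mono audio
with torch.inference_mode():
    codes = model.encode(audio)          # list of token tensors, coarse to fine
    reconstruction = model.decode(codes)  # waveform rebuilt from the hierarchical tokens
```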
chronos-forecasting
Chronos applies language model architectures to time series forecasting, improving precision and efficiency. It transforms time series into token sequences to produce probabilistic forecasts. Chronos integrates with AutoGluon, which simplifies deployment and advanced analytics. Optimized inference and comprehensive datasets are available on HuggingFace, backed by Amazon's machine learning research. Ideal for researchers and analysts seeking stronger forecasting tools.
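A short probabilistic-forecast sketch following the usage shown in the repo's README; the random-walk context here is a stand-in for a real series:

```python
import numpy as np
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",   # pretrained checkpoint on the Hugging Face Hub
    device_map="cpu",
    torch_dtype=torch.bfloat16,
)

context = torch.randn(100).cumsum(0)  # stand-in for a real univariate series
forecast = pipeline.predict(context, prediction_length=12)  # (series, samples, horizon)

# Summarize the sample paths into quantile bands for plotting or decision-making
low, median, high = np.quantile(forecast[0].numpy(), [0.1, 0.5, 0.9], axis=0)
```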
transformers
Access a wide range of pretrained transformer models suitable for various applications in text, vision, and audio, with easy integration using JAX, PyTorch, and TensorFlow. The Transformers library by Hugging Face offers tools for deploying and refining these models, promoting collaboration among developers and researchers. Benefit from reduced computational demands, flexible model configurations, and the ability to transition seamlessly across different frameworks. Applicable to tasks such as sentiment analysis, object detection, and speech recognition, these models support the development of contemporary AI solutions.
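The quickest entry point is the `pipeline` API, which bundles a default pretrained checkpoint, preprocessing, and postprocessing behind one call:

```python
from transformers import pipeline

# A task-level pipeline downloads a sensible default checkpoint automatically
classifier = pipeline("sentiment-analysis")
print(classifier("Pretrained models make prototyping fast."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```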
vampnet
This repository offers tools and guidance for building and fine-tuning generative music models with the Descript Audio Codec. Setup requires Python 3.9 because of dependency constraints. Key features include pretrained models licensed under CC BY-NC-SA 4.0, interactive sessions through a Gradio UI, and configuration management with argbind. The repository also supports single- and multi-GPU training and debugging, as well as fine-tuning on your own audio, with customizable configuration files driving model development and deployment.
octo
Octo is a transformer-based diffusion policy for robotic control, trained on 800k diverse robot trajectories. Integrating language commands and RGB inputs, it handles varied action spaces with limited resources. It can be evaluated zero-shot or finetuned to transfer to new robotic settings, and pretrained checkpoints with detailed guides ease deployment, while advanced attention mechanisms enhance adaptability and resource efficiency.
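A loading sketch following the pattern documented in the Octo README (JAX-based); the module path and checkpoint name are taken from there and should be re-checked against the current release:

```python
import jax
from octo.model.octo_model import OctoModel

# Pretrained checkpoint hosted on the Hugging Face Hub by the Octo authors
model = OctoModel.load_pretrained("hf://rail-berkeley/octo-base-1.5")
task = model.create_tasks(texts=["pick up the spoon"])  # language-conditioned task

# `observation` must be a dict of batched camera images / proprioception that
# matches the model's input spec; see the repo's examples for how to build it.
# action = model.sample_actions(observation, task, rng=jax.random.PRNGKey(0))
```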
speechbrain
SpeechBrain, an open-source PyTorch toolkit, simplifies Conversational AI development with over 200 training recipes for speech and text processing tasks. It includes capabilities like speech recognition, speaker recognition, and speech enhancement, suitable for rapid prototyping and educational use. The toolkit integrates easily with HuggingFace pretrained models and offers extensive documentation, facilitating research and development in complex AI systems. Discover its features and models tailored for diverse AI applications, balancing ease of use with advanced technical capabilities.
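A minimal transcription sketch using a SpeechBrain checkpoint hosted on HuggingFace; note the import path moved in SpeechBrain 1.0 (older versions use `speechbrain.pretrained`), so adjust to your installed version:

```python
from speechbrain.inference.ASR import EncoderDecoderASR  # speechbrain.pretrained on < 1.0

# Pretrained checkpoint hosted under the SpeechBrain organization on Hugging Face
asr = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
)
print(asr.transcribe_file("example.wav"))  # path to a local audio file
```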
diffwave
DiffWave is a diffusion-based neural vocoder that transforms Gaussian noise into high-quality speech through iterative refinement. It conditions on log-scaled Mel spectrograms for precise control, and supports fast inference, multi-GPU training, and mixed-precision training. Recent updates add unconditional waveform synthesis and a fast sampling algorithm. With pretrained models and audio samples readily available, DiffWave offers a robust solution for both research and practical speech synthesis tasks.
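An inference sketch following the pattern in the repo's README; the checkpoint path is a placeholder and the random tensor stands in for a real log-Mel spectrogram:

```python
import torch
from diffwave.inference import predict as diffwave_predict

model_dir = "/path/to/pretrained/checkpoint"  # placeholder: downloaded DiffWave weights
# Stand-in input: use a real log-scaled Mel spectrogram with shape [N, C, W]
spectrogram = torch.randn(1, 80, 200)
audio, sample_rate = diffwave_predict(spectrogram, model_dir, fast_sampling=True)
```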
ProphetNet
ProphetNet, from Microsoft Research Asia's NLC group, offers a comprehensive suite of natural language generation tools, including pretrained models with future n-gram prediction, GLGE baselines, joint generator-ranker systems, and diffusion models such as GENIE and AR-Diffusion. The CRITIC module enhances LLM reliability through interaction with external tools. The project follows open-source principles and Microsoft's code of conduct.
malaya
Malaya is a versatile natural language toolkit for Bahasa Malaysia, built on PyTorch. Compatible with Python 3.6 and above, it installs from PyPI and ships comprehensive documentation plus a library of pretrained models on Hugging Face. Supported by KeyReply, Nvidia, and the TensorFlow Research Cloud, Malaya is an essential tool for researchers and developers working on Malaysian-language projects, with detailed resources and community support enhancing its usability.
klaam
Klaam offers advanced Arabic speech technology built on models such as Wav2Vec2 and FastSpeech2 for speech recognition, classification, and text-to-speech. It supports both Modern Standard Arabic and dialects such as Egyptian, leveraging datasets like MGB-3 and Common Voice. Comprehensive guides ease integration into projects, making it well suited to developers working on Arabic language processing.
GPT2-Chinese
The GPT2-Chinese project provides a comprehensive toolkit for training Chinese language models with the GPT-2 architecture. It supports both a BERT tokenizer and BPE models, enabling generation of varied text such as poetry and novels. The repository offers diverse pretrained models, from classical Chinese to song lyrics, useful for NLP practitioners. It handles large training corpora and encourages community collaboration through discussions and contributed models.
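A generation sketch using the BERT-tokenizer-plus-GPT-2 recipe this repo popularized; the checkpoint below is a community model on the Hugging Face Hub used here as an illustrative stand-in, not a model shipped by the repo itself:

```python
from transformers import BertTokenizerFast, GPT2LMHeadModel, TextGenerationPipeline

# "uer/gpt2-chinese-cluecorpussmall" is a community checkpoint trained with the
# same BERT-tokenizer approach; swap in your own trained model directory.
tokenizer = BertTokenizerFast.from_pretrained("uer/gpt2-chinese-cluecorpussmall")
model = GPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-cluecorpussmall")
generator = TextGenerationPipeline(model, tokenizer)
print(generator("这是很久之前的事情了", max_length=60, do_sample=True))
```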
StyleShot
StyleShot offers a novel approach to style transfer without test-time tuning, using a style-aware encoder and the StyleGallery dataset. The method efficiently replicates styles such as 3D, flat, and abstract, outperforming existing techniques. Resources on platforms like HuggingFace make it readily accessible to AI researchers and developers.
Feedback Email: [email protected]