# Transformers

nlp-recipes
This repository offers tools and examples for developing NLP systems using cutting-edge AI techniques. It features Jupyter notebooks and utility functions for state-of-the-art scenarios, supporting multilingual tasks such as text classification and intelligent chatbots. This resource highlights the use of pretrained models like BERT and transformers to speed up solution development, including integration with Azure Machine Learning and the use of prebuilt APIs for effective NLP task management.
peft
Parameter-Efficient Fine-Tuning (PEFT) offers a cost-effective way to adapt large pretrained models with reduced computational and storage needs, maintaining high performance similar to fully fine-tuned models. PEFT works with tools like Transformers, Diffusers, and Accelerate, making it versatile for model training and inference across different domains. This method helps in managing large models on consumer hardware by minimizing memory consumption without compromising accuracy.
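As a quick orientation to the workflow, the sketch below wraps a small Hugging Face model with a LoRA adapter through PEFT; the base checkpoint and LoRA hyperparameters are arbitrary illustrative choices, not recommendations from the project.

```python
# Minimal LoRA setup with PEFT; model name and hyperparameters are placeholders.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # sequence classification head
    r=8,                         # rank of the low-rank update matrices
    lora_alpha=16,               # scaling factor for the update
    lora_dropout=0.1,
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter weights are trainable
```

The wrapped `model` can then be trained with any standard loop or the Hugging Face `Trainer`, with the frozen base weights left untouched.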
best_AI_papers_2021
Explore key AI developments of 2021, featuring breakthroughs with ethical focus, bias awareness, and innovative applications that enhance quality of life. This list provides insights through video summaries, detailed articles, and code repositories, giving a broad understanding of the year's AI achievements. Discover advances from OpenAI's DALL·E to innovations in computer vision and neuroprosthetics, all while considering the critical choices in AI technology implementation.
LTSF-Linear
The LTSF-Linear project introduces a set of linear models that enhance time series forecasting capabilities, surpassing the performance of traditional Transformers. These include the Linear, NLinear, and DLinear models, crafted to effectively accommodate trend, seasonality, and distribution variations. Offering high efficiency with low memory and parameter requirements, these models also provide interpretability via weight visualization. The project includes well-documented Python implementations and benchmarks, supporting both univariate and multivariate forecasting tasks. The models efficiently streamline the training process and enable quick inference, showing notable improvements over existing methods in capturing temporal dynamics.
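To make the idea concrete, here is an independent toy sketch of the DLinear decomposition (not code from the repository): split the input series into a moving-average trend and a seasonal remainder, then map each part to the forecast horizon with its own linear layer.

```python
import torch
import torch.nn as nn

class DLinearSketch(nn.Module):
    """Toy single-channel DLinear-style model: trend/seasonal decomposition
    followed by two independent linear projections (illustrative only)."""
    def __init__(self, seq_len: int, pred_len: int, kernel_size: int = 25):
        super().__init__()
        self.moving_avg = nn.AvgPool1d(kernel_size, stride=1,
                                       padding=kernel_size // 2,
                                       count_include_pad=False)
        self.linear_trend = nn.Linear(seq_len, pred_len)
        self.linear_seasonal = nn.Linear(seq_len, pred_len)

    def forward(self, x):                      # x: (batch, seq_len)
        trend = self.moving_avg(x.unsqueeze(1)).squeeze(1)   # smoothed trend
        seasonal = x - trend                                  # remainder
        return self.linear_trend(trend) + self.linear_seasonal(seasonal)

forecast = DLinearSketch(seq_len=96, pred_len=24)(torch.randn(8, 96))  # -> (8, 24)
```

The official implementations add multivariate handling and padding details, but the core model really is this small.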
REaLTabFormer
The REaLTabFormer framework uses sequence-to-sequence models and GPT-2 to generate realistic relational and tabular data. It is designed for dataset synthesis, handling both relational structures and independent observations. Available on PyPI, REaLTabFormer is easy to install and use, and performs well in prediction tasks. It employs techniques like target masking and statistical bootstrapping to reduce overfitting, with a user-friendly interface for creating validators that ensure high-quality synthetic samples.
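A minimal usage sketch, assuming the documented `REaLTabFormer` interface with `fit` and `sample`; the file name is a placeholder and the exact constructor arguments should be checked against the project docs.

```python
# Illustrative tabular synthesis with REaLTabFormer (argument names follow the
# project's documented examples as recalled here; "my_table.csv" is a placeholder).
import pandas as pd
from realtabformer import REaLTabFormer

df = pd.read_csv("my_table.csv")                  # table of independent observations

rtf_model = REaLTabFormer(model_type="tabular")   # non-relational mode
rtf_model.fit(df)                                  # trains the GPT-2-based generator
synthetic = rtf_model.sample(n_samples=1000)       # DataFrame of synthetic rows
```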
shared_colab_notebooks
This repository contains a wide range of Google Colaboratory notebooks catering to tasks in NLP, NLG, and computer vision. It features models like T5 and DialoGPT for language processing, ViT and ConvNeXT for visual tasks, and unique applications like 3D photo inpainting. Users can also find tutorials and projects on UI/UX with GPT2, making it suitable for those researching diverse ML domains. Explore and tailor these ML projects effortlessly.
Mamba-in-CV
This collection of Mamba-focused computer vision projects highlights recent developments including human activity recognition, anomaly detection, and autonomous driving. It explores the capabilities of state space models as alternatives to transformers, providing links to detailed papers and code. Ideal for researchers and practitioners interested in visual state space models.
mamba
The Mamba project provides an innovative state space model architecture designed for efficient handling of information-dense data such as language models, overcoming the shortcomings of earlier subquadratic models. Its architecture, focusing on hardware efficiency similar to FlashAttention, utilizes selective state space modeling for scalable solutions. Pretrained models are available on Hugging Face, and the `lm-evaluation-harness` library enables evaluations. Comprehensive resources include installation guides, usage instructions, and benchmarking scripts to support seamless integration and performance optimization.
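The standalone block can be exercised on random data much as in the project README (reproduced here from memory, so treat it as a sketch); it assumes the `mamba-ssm` package and a CUDA device.

```python
import torch
from mamba_ssm import Mamba

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim, device="cuda")

block = Mamba(
    d_model=dim,  # model dimension
    d_state=16,   # SSM state expansion factor
    d_conv=4,     # local convolution width
    expand=2,     # block expansion factor
).to("cuda")

y = block(x)          # same shape as the input: (batch, length, dim)
assert y.shape == x.shape
```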
LongNet
LongNet, an advanced Transformer variant, scales sequence lengths to 1 billion tokens without affecting performance on shorter sequences. Using innovative dilated attention, it maintains linear computational complexity and a logarithmic token dependency, suitable for distributed training of lengthy sequences. The model integrates with existing Transformer optimizations, delivering strong results in long-sequence and general language tasks. Explore the possibilities of managing vast sequences like entire corpora or the Internet with improved efficiency and expressivity.
caduceus
Learn about bi-directional equivariant methods in DNA sequence modeling, aiding tasks such as genomic prediction and classification. The project uses pre-trained models from HuggingFace, supporting processes like pretraining and fine-tuning. It is a valuable resource for genomic researchers and bioinformaticians. Access detailed guides for model deployment and experiments with Python scripts in advanced computing setups.
nlp-journey
Discover a wide array of deep learning and natural language processing resources, including key books, notable research papers, informative articles, and crucial GitHub repositories. Topics include transformer models, pre-training, text classification, and large language models. Ideal for developers, researchers, and enthusiasts to expand their knowledge of NLP developments.
Efficient-LLMs-Survey
The survey systematically reviews efficiency challenges and solutions for LLMs, offering a clear taxonomy across model-centric, data-centric, and system-level domains. Recognizing the computational demands of LLMs, it covers techniques such as model compression, quantization, parameter pruning, and efficient tuning. The overview aims to help researchers and practitioners advance LLM efficiency.
single-cell-transformer-papers
Examine the role of transformers in single-cell omics through a carefully curated list of notable models and their evaluations. The collection excludes models trained only on bulk data or models that are not primarily transformer-based, focusing on key applications in the single-cell field. Contributions through pull requests or issues are encouraged. Models such as Precious3GPT and LangCell illustrate approaches to scRNA-seq and spatial transcriptomics, with coverage of zero-shot tasks, cell type annotation, and other methodologies.
ML-Notebooks
Discover a diverse set of machine learning notebooks covering applications from neural networks to computer vision. The notebooks can be run on GitHub Codespaces, with clear setup guidance for topics including PyTorch and generative adversarial networks, making them useful for learning or quick prototyping.
annotated_deep_learning_paper_implementations
Discover a detailed set of annotated PyTorch implementations of neural networks and deep learning algorithms. The resource is continually updated and documented with readable notes, providing practical insight into models such as Transformers, GANs, and diffusion models, as well as reinforcement learning methods. It is suitable for developers interested in architectures and optimization strategies, and an essential repository for those wishing to broaden their deep learning acumen.
transformers
Access a wide range of pretrained transformer models suitable for various applications in text, vision, and audio, with easy integration using JAX, PyTorch, and TensorFlow. The Transformers library by Hugging Face offers tools for deploying and refining these models, promoting collaboration among developers and researchers. Benefit from reduced computational demands, flexible model configurations, and the ability to transition seamlessly across different frameworks. Applicable to tasks such as sentiment analysis, object detection, and speech recognition, these models support the development of contemporary AI solutions.
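For orientation, the `pipeline` API is the shortest path to most of these tasks; the example below runs sentiment analysis with the library's default checkpoint for that task.

```python
from transformers import pipeline

# A task-level pipeline downloads a sensible default model the first time it runs.
classifier = pipeline("sentiment-analysis")
print(classifier("Transformers make it easy to reuse pretrained models."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```

Swapping the task string (for example to "object-detection" or "automatic-speech-recognition") and optionally passing `model=` covers the vision and audio cases mentioned above.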
commented-transformers
Explore comprehensive implementations of Transformers in PyTorch, focusing on building them from scratch. The project features highly commented code for Bidirectional and Causal Attention layers and offers standalone implementations of models like GPT-2 and BERT, designed for seamless compilation. Perfect for those interested in the inner workings of attention mechanisms and transformer models.
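To give a flavor of what building attention from scratch involves, here is an independent minimal causal self-attention layer in PyTorch; it is not code from the repository, just the textbook pattern the repository documents in much greater depth.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Single-head causal self-attention: each position attends only to itself
    and to earlier positions (illustrative, not the repository's code)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused Q, K, V projection
        self.proj = nn.Linear(d_model, d_model)       # output projection

    def forward(self, x):                             # x: (batch, seq, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        future = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(future, float("-inf"))   # hide future tokens
        return self.proj(F.softmax(scores, dim=-1) @ v)

out = CausalSelfAttention(32)(torch.randn(2, 10, 32))   # -> (2, 10, 32)
```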
detoxify
Detoxify provides accurate toxic comment classification using PyTorch Lightning and Transformers, with models for multilingual and unbiased detection. The library identifies toxic content across various languages while minimizing bias, supporting researchers and content moderators. Discover how to train and deploy these models on diverse datasets to enhance online safety.
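A minimal inference sketch, assuming the package's documented pattern of constructing a model by name and calling `predict`:

```python
from detoxify import Detoxify

# Model weights are downloaded on first use; 'original', 'unbiased' and
# 'multilingual' are the variants the project describes.
scores = Detoxify("original").predict("You are a wonderful person.")
print(scores)   # dict of attribute scores such as 'toxicity', 'insult', 'threat', ...
```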
transformers-tutorials
Discover how transformer models like BERT have transformed NLP and how comprehensive tutorials guide fine-tuning for various tasks. This resource explains advanced NLP techniques using Hugging Face's Transformers for customizing applications in text classification and sentiment analysis. Suitable for those integrating deep learning in business, the tutorials present a practical approach to neural architectures.
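The fine-tuning recipe these tutorials walk through can be compressed into the standard `Trainer` loop; the sketch below is a generic text-classification setup (dataset, checkpoint, and hyperparameters are placeholder choices, not any specific tutorial's code).

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("imdb")                       # binary sentiment dataset

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
                  eval_dataset=tokenized["test"].select(range(500)),
                  tokenizer=tokenizer)               # enables dynamic padding
trainer.train()
```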
torchscale
TorchScale, a PyTorch library, enables researchers and developers to scale Transformers. It supports the development of new architectures for foundation models, improving stability, generality, capability, and efficiency in modeling. Key features include scaling Transformers to 1,000 layers, training sparse Mixture-of-Experts models, and achieving length extrapolation with new position embeddings. Recent innovations such as DeepNet, BitNet, RetNet, and LongNet enhance model stability and capacity across tasks in language, vision, and speech. TorchScale offers straightforward installation for easy integration.
former
Delve into basic transformer concepts through a streamlined PyTorch model built for educational use. The project offers clear implementations without the complexity of larger transformers, focusing on a single stack of transformer blocks. Installation is straightforward with pip or conda, and basic experiments such as IMDb classification can be run with customizable hyperparameters. It is well suited to hands-on exploration of self-attention mechanisms for Python 3.6+ users.
detr
Discover DETR, which casts object detection as a direct set prediction problem: a Transformer encoder-decoder predicts all objects in parallel, removing hand-designed components such as anchor generation and non-maximum suppression. Learn through PyTorch examples and explore its application in computer vision.
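Pretrained checkpoints are published through Torch Hub, so a minimal inference sketch looks roughly like the following (the image path and preprocessing values are illustrative; the hub entry point name follows the project's README as recalled here).

```python
import torch
from PIL import Image
import torchvision.transforms as T

# Load a pretrained DETR with a ResNet-50 backbone from Torch Hub.
model = torch.hub.load("facebookresearch/detr", "detr_resnet50", pretrained=True).eval()

preprocess = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),  # ImageNet stats
])

image = Image.open("example.jpg").convert("RGB")      # placeholder path
with torch.no_grad():
    outputs = model(preprocess(image).unsqueeze(0))

# 'pred_logits' holds per-query class scores; 'pred_boxes' holds normalized boxes.
print(outputs["pred_logits"].shape, outputs["pred_boxes"].shape)
```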
SAM-Med2D
Delve into SAM-Med2D, an expansive and varied dataset crafted for 2D medical image segmentation, including 4.6 million images and 19.7 million masks. Designed to refine models, it covers 10 data modalities and a multitude of anatomical structures. Through sophisticated enhancements to the Segment Anything Model (SAM), this initiative pioneers advancements in medical imaging segmentation, providing notable gains in accuracy and operational efficiency. Keep abreast of continuous updates and potential collaborations in propelling the field of medical AI forward.
ABigSurveyOfLLMs
This survey offers an expansive view of recent developments in large language models (LLMs) within artificial intelligence. It aggregates a significant number of research papers from diverse conferences and open-access resources, providing an extensive review of the LLM field. Topics covered include alignment, data management, societal implications, and applications in sectors like healthcare and education. Key challenges such as safety, misinformation, and efficiency are also examined. This compilation is intended to assist researchers and practitioners in gaining a rapid understanding of the field and in exploring emerging research pathways in LLMs.
fast-DiT
The project provides an improved PyTorch implementation for scalable diffusion models with transformers, focusing on optimizing training and memory efficiency. It features pre-trained class-conditional models on ImageNet (512x512, 256x256) and tools for both sampling and training. Enhancements like gradient checkpointing and mixed precision training lead to notable performance gains. Resources such as Hugging Face Space and Colab notebooks facilitate easy deployment and model training. Evaluation tools support metrics computation like FID and Inception Score for thorough analysis.
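Gradient checkpointing and mixed precision are general PyTorch techniques rather than anything specific to this codebase; the toy training step below (assuming a CUDA GPU) shows how the two are typically combined.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# A stack of small blocks stands in for a large transformer.
model = nn.Sequential(*[nn.Sequential(nn.Linear(512, 512), nn.GELU())
                        for _ in range(8)]).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

def forward_with_checkpointing(x):
    # Recompute each block's activations during backward to trade compute for memory.
    for block in model:
        x = checkpoint(block, x, use_reentrant=False)
    return x

x = torch.randn(64, 512, device="cuda")
with torch.cuda.amp.autocast(dtype=torch.float16):      # mixed-precision forward pass
    loss = forward_with_checkpointing(x).pow(2).mean()  # dummy objective

scaler.scale(loss).backward()    # scaled backward avoids fp16 underflow
scaler.step(optimizer)
scaler.update()
```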
bumblebee
Bumblebee provides pre-trained neural network models on top of Axon in Elixir, with integration to the Hugging Face Hub for easy access to model weights, enabling machine learning tasks with minimal code. It also integrates with Livebook, offering quick setup and execution of neural network workflows.
LLM-Finetuning
This guide provides insights into advanced techniques for efficiently fine-tuning large language models with tools like LoRA and Hugging Face. Featuring comprehensive tutorials on various methods such as PEFT, RLHF training, and transformer-based approaches, it offers clear, step-by-step guides for model enhancement—suitable for data scientists and AI researchers seeking to optimize machine learning processes and accuracy.
AnglE
Discover a framework for sentence embeddings using BERT and LLM models engineered for semantic textual similarity. It provides diverse training and inference options with loss methods like AnglE, CoSENT, and Espresso to improve performance. The framework includes numerous pre-trained models and can be seamlessly installed through PyPI, supporting both single and multi-GPU setups. Notable achievements feature acceptance in conferences such as ACL and NAACL. Utilize this tool for advanced semantic text comparison solutions.
SpanMarkerNER
SpanMarker provides a robust framework for Named Entity Recognition, using encoders such as BERT, RoBERTa, and ELECTRA. It integrates with the Hugging Face Transformers library, offering features like model management, hyperparameter tuning, and mixed precision training. SpanMarker enhances usability by supporting different annotation schemes and enables seamless access to the Hugging Face Hub, including a free API for fast deployment. It is suitable for developers aiming to train or utilize high-performance NER models on datasets like FewNERD and OntoNotes5.
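A minimal inference sketch, assuming the package's documented `SpanMarkerModel.from_pretrained(...).predict(...)` interface; the checkpoint name is one of the publicly listed FewNERD models and is used only as an example.

```python
from span_marker import SpanMarkerModel

# Load a published SpanMarker checkpoint and tag entities in one sentence.
model = SpanMarkerModel.from_pretrained(
    "tomaarsen/span-marker-bert-base-fewnerd-fine-super"
)
entities = model.predict("Amelia Earhart flew her Lockheed Vega 5B across the Atlantic.")
print(entities)   # list of dicts with fields such as the span text, label, and score
```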
course
Explore the capabilities of Transformers in natural language processing with this detailed course. Gain hands-on experience with Hugging Face tools, including 🤗 Transformers, 🤗 Datasets, 🤗 Tokenizers, and 🤗 Accelerate. The course is free and open-source, available in multiple languages, and welcomes translation contributions. Ideal for expanding knowledge of deep learning applications beyond NLP in a collaborative environment.
How-to-use-Transformers
Accompanying the 'Transformers Library Quick Start' tutorial, this code repository offers structured modules and datasets for practical NLP learning. Focused on core concepts of Transformer models, it covers applications like sentiment analysis, sequence labeling, and text summarization. The tutorial also delves into large language model integration, with ongoing updates to introduce the latest NLP advancements.
pytorch-sentiment-analysis
This series of tutorials offers a detailed guide on sequence classification for sentiment analysis utilizing PyTorch, covering Neural Bag of Words, Recurrent Neural Networks, Convolutional Neural Networks, and BERT transformers. It begins with foundational models and gradually advances in complexity and precision for movie review sentiment prediction. Instructions for environment setup and essential resources are provided, making it suitable for both newcomers and experienced practitioners of sentiment analysis in Python.
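As a taste of the first tutorial's starting point, here is an independent minimal neural bag-of-words classifier (not the tutorial's exact code): embed the tokens, average the embeddings, and classify.

```python
import torch
import torch.nn as nn

class NBoW(nn.Module):
    """Neural bag-of-words: average word embeddings, then a linear classifier."""
    def __init__(self, vocab_size: int, embed_dim: int, num_classes: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):                      # token_ids: (batch, seq_len)
        pooled = self.embedding(token_ids).mean(dim=1) # order-insensitive pooling
        return self.fc(pooled)

logits = NBoW(vocab_size=10_000, embed_dim=100, num_classes=2)(
    torch.randint(0, 10_000, (4, 20)))                 # -> (4, 2)
```

The later tutorials replace the mean-pooling step with recurrent, convolutional, and finally BERT-based encoders while keeping the same classification setup.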
Local-LLM-User-Guideline
This guide delves into the features and differences of Local Large Language Models (LLMs), emphasizing privacy management, versatility, and open-source community contributions. It compares online and local setups, considering privacy safeguarding, cost efficiency, and management aspects of solutions like GPT, LLama, and Mistral. The publication discusses viable scenarios for on-premises LLM application, such as environments with sensitive data, task diversity, and high-volume data handling. Community-driven development is promoted, while recognizing the difficulties of self-managing these systems. It's a crucial resource for those aiming to understand AI's changing landscape with an emphasis on independence and data protection.
exporters
Exporters facilitates the conversion of Hugging Face Transformers models to Core ML, ensuring deployment across Apple platforms like macOS and iOS. It offers ready-made configurations for models like BERT and GPT2, supports the ML Program format, and provides options for model optimization and quantization. The package underscores the importance of validation on macOS and suggests pre-optimization with Hugging Face's 'Optimum' for mobile use.
Flowformer
Flowformer tackles the quadratic complexity of traditional transformers by proposing a linear-complexity attention mechanism based on flow network theory. It efficiently processes long sequences of more than 4,000 tokens and applies across domains such as long-sequence modeling, vision, NLP, time series, and reinforcement learning. The Flow-Attention design allocates attention through competitive dynamics, is grounded in solid theoretical foundations, and reports strong performance across these tasks.
Depth-Anything-V2
Depth Anything V2 presents a refined depth estimation model offering improved detail accuracy and robustness compared to its predecessor. It outperforms SD-based models in speed, parameter efficiency, and precision. With extensive integration into Apple Core ML and Transformers and new pre-trained models, it provides broad usability. The release supports easy setup for image and video depth mapping through code or demo apps on platforms like Hugging Face. The open-source model is available under the Apache-2.0 license for smaller versions and CC-BY-NC-4.0 for others, promoting community engagement and accessibility.
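Through the Transformers integration, a depth map can be produced with the `depth-estimation` pipeline; the checkpoint id below is a recollection of the small Hugging Face-format release and should be verified, and the image path is a placeholder.

```python
from PIL import Image
from transformers import pipeline

# Monocular depth estimation with an (assumed) Depth Anything V2 checkpoint.
depth_estimator = pipeline("depth-estimation",
                           model="depth-anything/Depth-Anything-V2-Small-hf")
result = depth_estimator(Image.open("example.jpg"))
result["depth"].save("example_depth.png")   # PIL image of the predicted depth map
```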
machine-learning-list
The reading list systematically introduces fundamental and advanced machine learning concepts, with a particular focus on language models. It serves as a guide to key principles, deployment strategies, reasoning techniques, and AI's broader implications. Structured in tiers, it balances theory and practice, covering machine learning basics, transformers, training methods, and applications, with insights into AI safety and into economic and philosophical questions, making it useful for understanding and scaling machine learning models.
InfiniTransformer
InfiniTransformer provides an unofficial implementation of Infini-attention in PyTorch and Hugging Face Transformers, designed to give models such as Llama 3 and Gemma an effectively unbounded context. It offers two implementations: a model-wise version that requires overrides and custom training for drastically reduced memory use, and an attention-layer version that stays compatible with the standard HF Trainer. Key features include efficient memory use at very long sequence lengths and practical guidelines, with example training scripts for the MiniPile and WikiText datasets, making InfiniTransformer a practical option for scalable, long-context transformer models.
Transformers-Recipe
This guide collects a broad array of materials for understanding and implementing transformer models, from NLP to computer vision. It features overviews, concise technical insights, tutorials, and worked examples suitable for learners and professionals interested in transformers. Highlights include detailed illustrations, technical summaries, and important references such as the 'Attention Is All You Need' paper, along with practical notes on implementation via resources like the Hugging Face Transformers library.
kogpt
KoGPT by KakaoBrain is a Korean generative pre-trained transformer for tasks such as classification, search, summarization, and generation of Korean text. It has over 6 billion parameters across 28 layers and requires at least 32GB of GPU RAM for optimal functioning. The model is available in several precision formats, including float16, which reduces memory use. Users should be aware that it may generate sensitive content because it was trained on raw data. Learn more about its specifications for integration into AI applications.
rust-bert
Rust-bert offers Rust-native NLP model implementations supporting translation, summarization, sentiment analysis, and more, utilizing Hugging Face's Transformers via tch-rs and onnxruntime bindings. Includes multi-threaded tokenization and GPU inference with easy-to-use pipelines.
transformers-code
Discover tutorials on Transformers, from basic concepts to real-world NLP applications, including parameter fine-tuning and model training. Learn distributed training with Accelerate, using practical examples in text and chatbot solutions. Suitable for AI enthusiasts seeking to enhance skills on platforms like Bilibili and YouTube.
Autoformer
Autoformer is a forecasting model that extends Transformers with a deep decomposition architecture and an auto-correlation mechanism, reporting a 38% relative improvement in long-term forecasting on benchmarks covering areas such as energy and weather. It offers practical applications with a speed-optimized architecture and integrates with platforms such as Hugging Face.
insanely-fast-whisper
Achieve rapid audio transcription with Whisper and Flash Attention on supported devices. Using fp16 and batching optimizations, it transcribes 150 minutes of audio significantly faster than a naive setup. The tool performs automatic speech recognition with options for language detection and speaker diarization. Compatible with CUDA and Apple's MPS backend, it is simple to install and run from any directory.
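Under the hood this is roughly the Transformers ASR pipeline with fp16, batching, and Flash Attention 2; the sketch below reproduces that pattern directly in Python (the audio file name is a placeholder, and the flash-attention kwarg assumes a compatible GPU with `flash-attn` installed).

```python
import torch
from transformers import pipeline

# Roughly what the CLI configures: Whisper large-v3 in fp16 with Flash Attention 2.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device="cuda:0",
    model_kwargs={"attn_implementation": "flash_attention_2"},
)

result = asr("meeting.mp3", chunk_length_s=30, batch_size=24, return_timestamps=True)
print(result["text"])
```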
minRF
Discover a simplified implementation of scalable rectified flow models leveraging SD3 training techniques and LLaMA-DiT architecture. Ideal for beginners, this repository provides straightforward instructions for training on datasets such as MNIST and CIFAR, with further options for advanced exploration using ImageNet. Engage with rectified flow transformers easily, without requiring extensive prior knowledge, in a project that emphasizes accessibility and innovation in flow model methodologies.
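The rectified-flow objective itself is compact enough to show directly; the toy training step below is independent of the repository and simply regresses a small network onto the constant velocity between a data sample and a noise sample along their straight-line interpolation.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2 + 1, 64), nn.SiLU(), nn.Linear(64, 2))  # tiny velocity net

x0 = torch.randn(128, 2)                       # "data" batch (toy 2-D distribution)
x1 = torch.randn(128, 2)                       # Gaussian noise
t = torch.rand(128, 1)                         # interpolation times in [0, 1]

xt = (1 - t) * x0 + t * x1                     # straight-line interpolation
target_velocity = x1 - x0                      # constant velocity along that line

pred = model(torch.cat([xt, t], dim=-1))       # condition the network on (x_t, t)
loss = (pred - target_velocity).pow(2).mean()  # rectified-flow regression loss
loss.backward()
```

The repository scales this same idea up with the SD3-style training techniques and LLaMA-DiT backbone mentioned above for image datasets.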
Mamba_State_Space_Model_Paper_List
This resource provides a comprehensive list of papers dedicated to State-Space Models (SSMs) and their applications, emphasizing their potential as alternatives to Transformers in innovative network solutions. It encompasses detailed theoretical insights and practical applications, including visual object tracking, 3D pose estimation, and medical image processing. The collection boasts thorough surveys, theses references, and the latest research publications, serving as a valuable tool for academics and practitioners looking to keep pace with the evolving role of state-space modeling. It is regularly updated to reflect the latest progressions in the field.
manga-ocr
Manga OCR offers specialized recognition for Japanese manga, handling complex text scenarios such as vertical and horizontal text and low-quality images in one pass. It integrates smoothly with tools like ShareX for efficient text extraction, perfect for learners and enthusiasts alike.
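A minimal usage sketch, assuming the package's documented callable interface (the image path is a placeholder):

```python
from manga_ocr import MangaOcr

mocr = MangaOcr()            # downloads the recognition model on first run
text = mocr("panel.png")     # accepts an image path or a PIL image
print(text)
```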
CogView
CogView uses a 4 billion parameter transformer model for general text-to-image generation. It includes code releases and demos, with PB-relax and Sandwich-LN techniques for stable transformer training. While supporting multiple languages, CogView primarily uses Chinese text input with recommended English translations. It offers pretrained models, inference, and super-resolution features, along with detailed setup instructions for various environments, suitable for complex AI tasks, including both single and multi-node training.
audio-transformers-course
This open-source course offers a deep dive into using Transformers for audio and speech processing, provided by Hugging Face. It includes translations in multiple languages like English, Spanish, and French. Participants can contribute translations and engage with a global community via GitHub and Discord. Interactive Jupyter notebooks are available for practical learning. The course aims to make machine learning education accessible globally with well-structured chapters.
Awesome-LLM
Explore a curated collection of resources on large language models, featuring key papers, training frameworks, deployment tools, and educational materials. This repository offers insights into the methodologies and impacts of LLMs, providing access to important research, various applications, and publicly available LLM checkpoints and APIs. Gain an understanding of ChatGPT and related technologies through milestone papers, trending projects, and LLM evaluations. Suitable for both beginners and professionals, the collection covers the field from basic principles to recent advances in AI-powered language processing.