# Transformer

trax
Explore Trax, the deep learning library prioritizing code clarity and speed. Maintained by Google Brain, it features pre-trained models like Transformers and welcomes community contributions. Trax supports diverse environments from Python scripts to shell, operates on CPUs, GPUs, and TPUs, and integrates TensorFlow Datasets for data handling. It simplifies model training with functional pipelines, providing accessible high-performance deep learning solutions.
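As a small illustration of the combinator style Trax exposes, the sketch below builds a Transformer language model from `trax.models`; the hyperparameters shown are placeholders rather than a recommended configuration, and the keyword names follow the library's documented defaults.

```python
import trax

# Construct a Transformer language model from Trax's model zoo.
model = trax.models.TransformerLM(
    vocab_size=33300, d_model=512, d_ff=2048, n_layers=6, n_heads=8, mode="predict"
)
print(model)  # prints the layer structure of the combinator-based model
```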
AiLearning-Theory-Applying
The project offers an in-depth look into AI concepts ranging from basic to advanced levels, covering areas like machine learning, deep learning, and BERT-based natural language processing. It includes extensive tutorials and datasets, making it suitable for learners at different stages. The curriculum spans key areas such as foundational mathematics, machine learning competitions, the basics of deep learning, and a user-friendly Transformer guide. The materials are regularly refreshed to reflect the latest in AI development, providing a clear and thorough understanding of AI models.
mint
Discover a minimalistic PyTorch library implementing common Transformer architectures, ideal for model development from scratch. Engage with sequential tutorials featuring BERT, GPT, and additional models crafted to enhance understanding of Transformers. Utilize fast subword tokenization with HuggingFace tokenizers. The library supports pretraining on various dataset sizes using in-memory and out-of-memory techniques and includes fine-tuning capabilities. Experience features such as the BERT completer for masked string completion. A functional toolkit to support machine learning projects.
detrex
detrex is an open-source toolbox offering cutting-edge Transformer-based detection algorithms. It is built on Detectron2 and features a modular design for custom models and robust baselines. The project is user-friendly and lightweight, incorporating a LazyConfig system and a training engine. detrex supports models like Focus-DETR and SQR-DETR and requires PyTorch 1.10+. Regular updates and comprehensive tutorials enhance usability. Explore detrex's project page for detailed features, documentation, and training techniques.
Jamba
Discover Jamba, a versatile Hybrid Transformer-Mamba Language Model implemented in PyTorch. Designed for efficient language processing, it features customizable parameters like input dimensionality and model depth. Suitable for researchers and developers working with token-based input data, Jamba offers a straightforward installation via pip and easy training with the supplied train.py script. Explore its integration with PyTorch for enhanced language modeling tasks without unnecessary complexity.
llm-resource
This detailed guide offers a comprehensive collection of state-of-the-art resources in the field of Large Language Models (LLM), covering key topics such as algorithms, training processes including fine-tuning and alignment, as well as inference and data engineering. It explores model compression techniques, evaluation metrics, and prompt engineering, supported by diagrams and practical examples. The guide provides insights into models like Transformer, GPT, and MoE, while looking into future multimodal model developments. Including links to in-depth guides and code repositories, it is a crucial resource for staying informed in the rapidly evolving domains of AI and machine learning.
GNT
Generalizable NeRF Transformer (GNT) employs transformers to reconstruct Neural Radiance Fields efficiently. It utilizes a dual-stage architecture with attention mechanisms for scene representation and rendering, achieving results comparable to state-of-the-art methods. GNT demonstrates improvements in PSNR and capabilities in depth and occlusion inference, optimized for various datasets.
ai_and_memory_wall
Examine the memory footprint, parameter count, and FLOPs of state-of-the-art AI models in computer vision, NLP, and speech. Access detailed metrics for transformer and vision model training and inference, with historical memory breakdowns. This resource offers valuable data from the AI and Memory Wall study, aiding in optimizing model efficiency for contemporary applications.
awesome-DeepLearning
Serving as an online deep learning encyclopedia, this PaddlePaddle project provides a wide array of resources designed to simplify the mastery and application of deep learning technologies across real-world scenarios. Suitable for both beginners and seasoned developers, it includes online courses, instructional videos, practical programming books, and case studies tailored to industry needs. Diverse learning formats such as interactive tutorials and live broadcasts are available. Continuously updated to reflect the latest PaddlePaddle releases, it helps users stay informed about new algorithms and methodologies, facilitating the development of skilled AI professionals.
ProPainter
Discover the latest advancements in video inpainting through ProPainter's use of efficient propagation and transformer methods. This project features capabilities such as video completion and object removal, emphasizing memory-efficient processes across well-known platforms like Hugging Face and OpenXLab. Designed for research and educational purposes, ProPainter enhances training efficiency through pre-computed optical flow, marking it as a pivotal tool in video editing and AI research.
SwissArmyTransformer
Discover SwissArmyTransformer, a unified codebase for integrating model-agnostic components into Transformer-based models. Utilize DeepSpeed and model parallelism for efficient pretraining and finetuning of large-scale models with ease. Implement prefix-tuning in models such as GLM and GPT to boost performance with minimal effort. Leverage extensive training support on multiple GPUs or nodes, accommodating models like T5-10B and experimental ones like CogView2. SwissArmyTransformer offers a comprehensive environment for developing and optimizing Transformer variants designed for various AI tasks.
flatformer
FlatFormer enhances 3D point cloud transformer efficiency with flattened window attention, addressing the latency challenges of applications like autonomous driving. It reduces processing overhead by partitioning the point cloud into groups of equal size and applying self-attention within each group, achieving significant speedups over SST and CenterPoint while maintaining high accuracy on the Waymo Open Dataset. It delivers real-time performance on edge GPUs, surpassing traditional sparse convolutional methods in both speed and accuracy on large-scale benchmarks.
TATS
Discover an innovative method for generating long-form videos using Time-Agnostic VQGAN and Transformer models. This system generates extensive frames from brief training sequences and supports video creation from text or audio inputs, offering diverse output options. Recent findings reveal discrepancies between FVD metrics and human evaluations, providing new insights. It also includes guidelines for setup and usage across different datasets, making it an essential resource for industry professionals.
RWKV-LM
Leveraging a unique attention-free architecture, RWKV combines the strengths of RNNs and Transformers to deliver exceptional language model performance. It supports rapid inference, low VRAM usage, and efficient training. RWKV's parallelization capabilities facilitate GPT-style computation, making it adaptable for various AI applications such as text generation and image processing. This model is compatible with edge devices, ensuring resource efficiency and offering diverse training and fine-tuning options for tailored outputs across different data scales.
llm_interview_note
Explore a curated collection of large language model concepts and interview questions, particularly suited for resource-constrained scenarios. Discover 'tiny-llm-zh', a compact Chinese language model, alongside projects including llama and RAG systems for practical AI learning. Engage with resources on deep learning, machine learning, and recommendation systems.
pysentimiento
Pysentimiento is an open-source Python library for sentiment analysis and other social NLP tasks built on transformer models. It supports several languages, including Spanish, English, Italian, and Portuguese, for tasks such as hate speech detection, emotion detection, irony detection, and NER & POS tagging. Installable via pip, it provides quick sentiment and emotion predictions and includes preprocessing tools for social media content. Note that the models are trained on third-party datasets and are intended for non-commercial use. Comprehensive guides and Colab examples show how to apply it to multilingual social media analysis.
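As a rough illustration of the pip-installed workflow, here is a minimal sketch using pysentimiento's `create_analyzer` entry point; the example sentence and printed fields are illustrative, and output details may vary by version.

```python
from pysentimiento import create_analyzer

# Build an analyzer for Spanish sentiment; "emotion", "hate_speech", and "irony" are other tasks.
analyzer = create_analyzer(task="sentiment", lang="es")

# predict() returns the predicted label together with class probabilities.
result = analyzer.predict("Qué gran jugador es Messi")
print(result.output, result.probas)
```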
memit
This open-source project allows mass-editing of transformer model memory, enabling precise manipulations of stored facts. It features a user-friendly API for easy implementation and offers extensive tools for thorough evaluation. Developers can leverage the comprehensive documentation for straightforward setup and application, fostering wide adoption within the AI research community.
tr
A cutting-edge text recognition SDK engineered in C++ with Python interfaces, tailored for offline functionality on scanned documents. It emphasizes combining CRNN with Transformer models to improve multi-line text recognition and document comprehension. By turning images into sequences, it aims to transcend traditional OCR boundaries. The SDK accommodates multi-threading and incorporates a lightweight Transformer framework for contextual error correction. Optimal for handling curved texts and intricate document layouts, offering high adaptability and effectiveness.
TransMorph_Transformer_for_Medical_Image_Registration
Discover how Vision Transformers, including TransMorph, revolutionize medical image registration for enhanced precision. Evaluate different TransMorph variants and benchmark comparisons to gain insight into performance. Find guidance on Docker setups for brain MRI registration and access reproducible results from datasets like IXI and OASIS. Perfect for researchers interested in advanced transformer methods for unsupervised image alignment, equipped with pre-trained models and comprehensive training documentation.
nlp-paper
This directory provides a well-organized collection of important natural language processing (NLP) research papers, including significant topics like Transformer frameworks, BERT variations, transfer learning, text summarization, sentiment analysis, question answering, and machine translation. It features notable works such as 'Attention Is All You Need' and detailed investigations into BERT's functions. Covering downstream tasks like QA and dialogue systems, interpretable machine learning, and specialized applications, this collection is a valuable resource for researchers and developers exploring advancements and techniques influencing current NLP practices, with a focus on practical implications in machine learning.
Transformer-in-Vision
This project examines the growing significance of Transformer technology in a variety of computer vision applications. It serves as a comprehensive resource, aggregating recent studies from fields like robotics and image processing, and highlighting the essential role of Transformers in AI models. The project outlines innovations such as LLM-in-Vision, and delivers thorough surveys on complex topics, such as multi-modal pre-training and generative adversarial networks, providing readers with insights into this evolving field.
SDT
Utilizing the style-disentangled Transformer (SDT), this technology improves online handwriting generation by distinguishing writer and character styles. It advances beyond traditional RNN-based methods by capturing subtle handwriting style variations, enhancing imitation accuracy. SDT further extends capabilities to offline Chinese handwriting enhancement. Diverse scripts can be explored with preconfigured datasets and pre-trained models, facilitating immediate use. SDT provides a flexible approach to replicating personalized writing styles.
DINO
DINO, featuring improved de-noising anchors, enhances Detection Transformers for superior object detection capabilities. It excels in both universal and open-set detection and segmentation tasks, showcasing significant performance on COCO benchmarks with a compact model. Utilizing ResNet and Swin Transformer backbones, DINO promises quick convergence and precision. Innovative variants like Mask DINO and Stable-DINO offer straightforward training and adaptability across diverse detection scenarios. The model zoo provides access to the latest checkpoints, supporting extensive multi-scale training and inference.
recurrent-memory-transformer
The Recurrent Memory Transformer (RMT) enhances AI model performance by using memory-augmented segment-level transformers. Designed for Hugging Face models, it incorporates special tokens to optimize memory and sequence processing. Features include comprehensive training examples, gradient accumulation, and metric management. It supports tasks with extensive context requirements and is developed in partnership with DeepPavlov.ai, AIRI, and London Institute for Mathematical Sciences.
TSFpaper
The repository provides a comprehensive collection of over 300 papers on time series and spatio-temporal forecasting, categorized by model type. It is regularly updated with the latest studies from leading conferences, journals, and arXiv, supporting various kinds of forecasting such as univariate, multivariate, and spatio-temporal. It explains complex concepts and how deep learning affects model flexibility, and explores emerging subjects like irregular time series and recent innovations like the Mamba model. Contributions of relevant papers are welcome to further enrich this forecasting research resource.
Yi
Yi is an open-source project delivering advanced bilingual language models trained on extensive multilingual data. These models excel in linguistic understanding and reasoning and are noted for their performance on the AlpacaEval Leaderboard. Yi models, distinguished by unique architecture, offer stable integration within various AI frameworks, demonstrating superior precision in both English and Chinese contexts. Suitable for diverse applications such as coding, math, and creative work, these models are ideal for personal, academic, and commercial use.
whisper
Whisper, OpenAI's speech recognition system, uses a Transformer sequence-to-sequence model for multilingual transcription, speech translation, and language identification. Model sizes ranging from 'tiny' to 'turbo' let users trade accuracy for speed. It supports recent Python versions and can be used from Python code or the command line, giving developers robust pre-trained models for audio processing across many languages.
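For reference, a minimal sketch of the Python API described above (the checkpoint name and audio file are placeholders):

```python
import whisper

# Load one of the released checkpoints ("tiny", "base", "small", "medium", "large", "turbo").
model = whisper.load_model("turbo")

# transcribe() handles audio loading, language detection, and decoding in one call.
result = model.transcribe("audio.mp3")
print(result["text"])
```

The equivalent command-line invocation is `whisper audio.mp3 --model turbo`.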
llm-course
Discover key aspects of large language models including essential mathematics, Python, and neural networks in a structured course. Learn to implement and deploy LLM-based applications with advanced techniques. Access interactive tools such as HuggingChat and ChatGPT for enriched learning experiences and detailed practical notebooks. Benefit from a comprehensive resource collection for mastering LLM construction, fine-tuning, and optimization.
Deep-Learning-Experiments
This updated 2023-2024 version provides comprehensive notes and practical experiments on deep learning, covering topics such as supervised learning, CNNs, RNNs, transformers, and large language models. It includes downloadable resources, Jupyter notebooks, and guides on key programming tools like Python, PyTorch, and Docker, offering a complete educational experience for mastering modern machine learning methods.
YAYI2
This model, developed by Wenge Research, is a multilingual large language model pre-trained on over 2 trillion tokens. It is optimized for general and domain-specific use through fine-tuning on millions of instructions and reinforcement learning from human feedback to align with human values. The model improves language understanding, reasoning, and code generation, exceeding the performance of similarly sized open-source models. Read the detailed technical report and join the community in advancing the open-source pre-training ecosystem with this 30B-parameter model.
MST
This toolbox provides a robust solution for spectral compressive imaging, featuring over 15 diverse algorithms, including MST++. It offers comprehensive support for spectral reconstruction using innovative transformer techniques, demonstrated in the NTIRE 2022 Challenge. The repository includes advanced model-based methods like TwIST, GAP-TV, and DeSCI, alongside ongoing developments in low light enhancement, making it a valuable resource for those exploring cutting-edge spectral data recovery methods.
ecco
Ecco is a Python library designed for exploring and explaining Transformer models through interactive visualizations. It focuses on pre-trained models such as GPT2 and BERT, providing features like feature attribution, neuron activation capture, and activation space comparison within Jupyter notebooks. Built on PyTorch and Hugging Face's transformers, it helps visualize token predictions and neuron activation patterns, offering insights into the functions of NLP models.
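As a sketch of that notebook workflow, assuming ecco's `from_pretrained` wrapper around a Hugging Face checkpoint; the attribution call at the end has been renamed across ecco releases, so treat it as illustrative rather than definitive.

```python
import ecco

# Wrap a Hugging Face GPT-2 variant for interactive visualization.
lm = ecco.from_pretrained("distilgpt2")

# Generate a continuation while capturing the data needed for attribution views.
output = lm.generate("The transformer architecture relies on", generate=12, do_sample=False)

# Render a token-attribution visualization in a Jupyter cell (method name varies by version).
output.saliency()
```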
conformer
Discover how the Conformer model seamlessly integrates convolutional neural networks with transformers to enhance speech recognition. This method efficiently captures both local and global audio dependencies, offering improved accuracy over existing models. Built on PyTorch, it supports state-of-the-art performance and can be easily trained via OpenSpeech in Python environments. Highlights include straightforward installation, detailed usage guidance, and open-source contribution opportunities, adhering to PEP-8 standards.
PersFormer_3DLane
PersFormer offers an innovative transformation module for 3D lane detection that uses camera parameters to enhance accuracy. This Python-based solution integrates a 2D/3D anchor design with multi-task learning, showing improved performance over existing methods on OpenLane and ONCE datasets. Presented at ECCV 2022, the research paper is supported by comprehensive evaluations on arXiv and other platforms, highlighting its capabilities. PersFormer introduces a unique perspective transformer approach, establishing a new standard in 3D lane detection.
GLiNER
GLiNER offers a lightweight solution for identifying arbitrary entity types with a BERT-like transformer encoder. It is a practical alternative both to traditional NER models, which are limited to predefined entity sets, and to large language models, which are often too resource-intensive. GLiNER balances flexibility and efficiency across varied scenarios. Installation is straightforward, and pretrained models make entity prediction easy; example notebooks cover finetuning and model conversion for seamless integration in research and industry contexts.
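A minimal sketch of zero-shot entity prediction with a pretrained checkpoint (the model id and label set below are assumptions chosen for illustration):

```python
from gliner import GLiNER

# Load a pretrained GLiNER checkpoint from the Hugging Face Hub.
model = GLiNER.from_pretrained("urchade/gliner_base")

text = "Ada Lovelace collaborated with Charles Babbage in London."
labels = ["person", "location"]

# Entity types are supplied at inference time rather than fixed at training time.
for entity in model.predict_entities(text, labels):
    print(entity["text"], "=>", entity["label"])
```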
BertWithPretrained
Learn about implementing the BERT model with PyTorch for tasks such as text classification, question answering, and named entity recognition. The project offers insights into BERT's functionalities and related applications, aiding in the comprehension of transformer mechanisms. Suitable for developers and researchers employing BERT's pre-trained models in language processing.
HAT
The HAT project showcases a novel method for image restoration with emphasis on super-resolution. Utilizing advanced pixel activation, it enhances image quality on datasets like Set5, Set14, and Urban100, independent of ImageNet pretraining. The project includes GAN-based models tailored for sharper and more accurate results. Discover comprehensive performance insights through the available codes and pre-trained models, alongside straightforward testing and training guidance for practical application in real-world scenarios of image super-resolution.
poolformer
Discover the capabilities of the MetaFormer architecture in vision tasks through PoolFormer, which leverages simple pooling for token mixing to outperform advanced transformers and MLP models. This project emphasizes straightforward design while achieving high accuracy on datasets such as ImageNet-1K. Find comprehensive resources including implementations, training scripts, model evaluations, and downloadable pretrained models, along with visualization tools to explore activation patterns in models like PoolFormer, DeiT, and ResNet. Ideal for those interested in simplifying computer vision models without sacrificing performance.
NLP-Tutorials
Discover a structured collection of NLP models and methods, covering essentials from TF-IDF to modern Transformer and BERT approaches. This tutorial details fundamental NLP concepts including Word2Vec, Seq2Seq, and attention models, enhanced by practical code examples and illustrations. Suitable for those aiming to expand their expertise in NLP frameworks and practical applications. Learn about straightforward installation guides and efforts to streamline intricate NLP models via Keras and PyTorch.
lightseq
Explore a library that significantly accelerates sequence model training and inference, with CUDA-based support for models like BERT and GPT. LightSeq achieves up to 15x speedups over standard implementations using fp16 and int8 precision. Compatible with frameworks like Fairseq and Hugging Face Transformers, it provides efficient computation for machine translation, text generation, and other sequence tasks.
step_into_llm
Learn about AI model techniques and development with MindSpore open courses, tailored for both theoretical and practical insights. Join experts as they discuss large model applications, with access to downloadable materials and live broadcasts. Engage within the MindSpore community and take on skill-testing challenges. Stay updated for future sessions and enhance your AI knowledge using state-of-the-art resources.
Keras-TextClassification
The project offers multiple cutting-edge neural network models for efficient text classification, comprising top architectures such as BERT and XLNet. Users have the flexibility to utilize pre-trained models and embeddings or create custom solutions. With a user-friendly structure that supports various training and prediction scripts, the project simplifies the implementation process. Comprehensive documentation and examples assist in the swift adoption of these models in text analysis tasks.
LongRoPE
LongRoPE extends the context window of large language models past 2 million tokens using non-uniform positional embeddings and a 256k fine-tuning strategy. This method sustains performance across various context lengths, supporting in-context learning and long document summarization.
TextPruner
Learn efficient techniques to reduce the size and increase the speed of language models without retraining. TextPruner supports models like BERT and RoBERTa, maintaining performance in NLP tasks. Use it as a Python package or CLI with examples provided. Access continuous updates and research for various language applications.
Awesome-LLM-Large-Language-Models-Notes
Explore a detailed compilation of large language models (LLMs), organized by year, size, and name. This resource covers foundational and recent models such as Transformer, GPT, BERT, GPT-4, and BLOOM, with links to research papers and implementations. An essential guide for NLP research and applications, complete with insightful articles and the significance of HuggingFace for model deployment.
MambaVision
MambaVision is a hybrid vision backbone merging self-attention and mixer blocks, achieving leading performance in accuracy and throughput. Utilizing a symmetric path without SSM, this PyTorch model enhances global context processing. Available on Hugging Face and GitHub, MambaVision pre-trained models process images of any resolution, adhering to CC-BY-NC-SA-4.0 licensing. Suitable for tasks like classification, detection, and segmentation, it offers multi-scale features over 4 stages. Integration is seamless via pip or Hugging Face.
Qwen2
Qwen2.5 offers developers flexible multilingual and long-context support, improving application performance across diverse deployment scenarios. Enhanced fine-tuning options and detailed performance metrics help optimize projects.
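Qwen2.5 checkpoints load through the standard Hugging Face transformers interface; the sketch below assumes the instruct-tuned 7B checkpoint id, the accelerate package for device placement, and enough GPU memory for the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed checkpoint id; smaller variants also exist
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Chat-style prompting via the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize the transformer architecture in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```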
spreadsheet-is-all-you-need
Experience a novel way to learn about transformers in a spreadsheet setup, guided by nanoGPT principles, and now accessible in Excel for enhanced learning and discovery. This project offers a hands-on, visual, and interactive representation of the transformer architecture, allowing users to manipulate and understand components such as embedding, layer norm, and self attention. Based on Andrej Karpathy's NanoGPT, it simplifies character-based prediction with interactive features. Dive into the details with preconfigured matrices and see how spreadsheet software handles large-scale calculations.
TransBTS
This repository features TransBTS and TransBTSV2, leveraging transformer models for efficient multimodal brain tumor segmentation. With datasets like BraTS, LiTS, and KiTS, and implemented using Python and PyTorch, these tools streamline training and testing. TransBTS specializes in detailed brain tumor analysis, while TransBTSV2 enhances volumetric segmentation efficiency. Offering comprehensive resources including implementation details, data preprocessing, and scripts, this project supports advancements in medical imaging research.
fairseq
This toolkit facilitates state-of-the-art sequence model development for translation, summarization, and language modeling. It includes capabilities like multi-GPU and mixed precision training, and supports PyTorch integration. Additionally, it provides pre-trained models and implementations from influential studies. The toolkit is regularly updated and suitable for a wide range of datasets, serving as a valuable resource for researchers and developers.
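As one example of the pre-trained models it exposes, fairseq integrates with torch.hub; the sketch below assumes the WMT'19 English-German checkpoint published in fairseq's examples and that its tokenizer/BPE dependencies (sacremoses, fastBPE) are installed.

```python
import torch

# Download a pre-trained WMT'19 English-German transformer via torch.hub.
en2de = torch.hub.load(
    "pytorch/fairseq",
    "transformer.wmt19.en-de.single_model",
    tokenizer="moses",
    bpe="fastbpe",
)
en2de.eval()

# Translate a sentence with beam search.
print(en2de.translate("Machine learning is great!", beam=5))
```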