# language models

LoRA
LoRA employs low-rank matrix adaptations, reducing trainable parameters and optimizing task adaptation in large language models. This approach minimizes storage needs and adds no inference latency. The Python package integrates with PyTorch and the Hugging Face PEFT library, and performs on par with full fine-tuning on benchmarks like GLUE. LoRA adapts specific Transformer weights, such as the query and value projections, and works across models including RoBERTa, DeBERTa, and GPT-2. The techniques are packaged as 'loralib', installable via pip.
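A minimal sketch of how 'loralib' is typically wired in, following the patterns in its README; the rank and layer dimensions here are illustrative:

```python
import torch
import loralib as lora

# Replace a standard projection with a LoRA-augmented one
# (rank r=16 is illustrative; smaller ranks mean fewer trainable params).
layer = lora.Linear(768, 768, r=16)
model = torch.nn.Sequential(layer)

# Freeze everything except the low-rank A/B matrices before training.
lora.mark_only_lora_as_trainable(model)

# Checkpoints only need the LoRA weights, keeping storage small.
torch.save(lora.lora_state_dict(model), "lora_ckpt.pt")
```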
raptor
RAPTOR improves retrieval over long documents by recursively clustering and summarizing text chunks into a tree, letting queries draw on information at multiple levels of abstraction. It supports plugging in custom models for summarization, question answering, and embeddings, making it adaptable to different research requirements, and its open-source nature encourages continuous enhancement through community contributions.
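The quick-start pattern in the project's README looks roughly like the following; the file path and question are placeholders, and an OpenAI API key is expected by default for summarization and QA:

```python
from raptor import RetrievalAugmentation

# Build the recursive summary tree over a document (path is a placeholder).
RA = RetrievalAugmentation()
with open("demo/sample.txt") as f:
    RA.add_documents(f.read())

# Answers are retrieved from nodes at multiple levels of the tree.
print(RA.answer_question(question="How did Cinderella reach her happy ending?"))
```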
OpenPrompt
OpenPrompt is an open-source framework for prompt-learning that adapts pre-trained language models (PLMs) to diverse NLP tasks through textual templates and verbalizers. Key features include seamless integration with Hugging Face transformers and flexible prompting strategies for various applications. The project also posts updates on related efforts such as UltraChat for supervised instruction tuning. OpenPrompt offers a standardized platform for simplified and efficient NLP model deployment.
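The pipeline composes a PLM, a template, and a verbalizer; a condensed sketch following the README, with illustrative labels and template text:

```python
from openprompt.plms import load_plm
from openprompt.prompts import ManualTemplate, ManualVerbalizer
from openprompt import PromptForClassification

plm, tokenizer, model_config, WrapperClass = load_plm("bert", "bert-base-cased")

# The template wraps each input; {"mask"} marks where the PLM predicts.
template = ManualTemplate(
    text='{"placeholder":"text_a"} It was {"mask"}.',
    tokenizer=tokenizer,
)
# The verbalizer maps predicted words back to class labels.
verbalizer = ManualVerbalizer(
    classes=["negative", "positive"],
    label_words={"negative": ["terrible"], "positive": ["great"]},
    tokenizer=tokenizer,
)
model = PromptForClassification(template=template, plm=plm, verbalizer=verbalizer)
```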
cognee
Cognee provides a flexible and scalable ECL (Extract, Cognify, Load) pipeline designed to help developers manage data for AI applications. It facilitates the integration and retrieval of historical conversations, documents, and audio transcriptions, thereby reducing hallucinations, developer effort, and cost. Supporting a range of vector and graph stores alongside various LLMs, Cognee suits a wide range of data operations. Its modular design and user management features improve development efficiency while ensuring robust and secure data management.
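A rough sketch of the ECL flow; the add/cognify/search calls are paraphrased from the project's documentation and their exact signatures have changed between releases, so treat this as an assumption:

```python
import asyncio
import cognee

async def main():
    # Extract: ingest raw text (documents and transcriptions work similarly).
    await cognee.add("PEFT methods adapt large models with few trainable weights.")
    # Cognify: build the knowledge representation over the ingested data.
    await cognee.cognify()
    # Load/search: query the resulting store.
    for result in await cognee.search("What do PEFT methods do?"):
        print(result)

asyncio.run(main())
```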
llms-from-scratch-cn
This project provides a detailed, step-by-step guide to building large language models (LLMs) from scratch. Focused on both practical implementation and theory, it includes tutorials and code examples for understanding and building ChatGPT-like models. Targeted at those interested in natural language processing and AI, it emphasizes hands-on learning of LLM architecture, pre-training, and fine-tuning. Participants can also explore models such as ChatGLM, Llama, and RWKV to compare their mechanisms.
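The central building block in such from-scratch tutorials is scaled dot-product attention; a self-contained PyTorch version (not taken from the repo) looks like this:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # query-key similarity
    if mask is not None:                            # causal mask for autoregressive LMs
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ v            # weighted sum of values

q = k = v = torch.randn(1, 8, 16, 64)
mask = torch.tril(torch.ones(16, 16))               # lower-triangular causal mask
out = scaled_dot_product_attention(q, k, v, mask)   # shape (1, 8, 16, 64)
```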
ScaleLLM
ScaleLLM is a cutting-edge inference system for large language models that employs techniques such as tensor parallelism and exposes OpenAI-compatible APIs. It supports leading open-source models like Llama 3.1 and GPT-NeoX, targeting efficient production deployment through Flash Attention and Paged Attention. Active development is adding enhancements like CUDA Graph, Prefix Cache, and Speculative Decoding. It installs easily via PyPI and offers a customizable, flexible server for various tasks, well suited to performance and scalability needs.
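Because the server speaks the OpenAI protocol, any standard OpenAI client can talk to it; a sketch assuming a locally running instance (the base URL and port are assumptions, as is the loaded model name):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local ScaleLLM server.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # whichever model the server loaded
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```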
minimal-chat
MinimalChat is an open-source app supporting multiple language models like GPT-4 Omni, designed for voice interaction and mobile responsiveness. It offers local hosting, ensuring offline access and secure data storage. Features include customization, markdown support, and easy model switching, making it a versatile tool for development and practical applications.
promptlib
Explore the impact of refined prompt engineering on large models such as GPT-4 and ChatGPT. This project demonstrates how well-structured prompts improve natural language processing and leverage advanced language-model capabilities. It aims to develop tools for developers and knowledge workers, laying the groundwork for broader usage.
Recurrent-LLM
Learn how RecurrentGPT enables long-text generation by adding RNN-style recurrence on top of large language models like ChatGPT, working around the fixed context window of GPT models. Storing memory in natural language allows texts of arbitrary length to be generated while keeping the process interactive and interpretable. This positions RecurrentGPT as a key technology for next-generation writing systems and AI As Contents, enabling personalized user interactions.
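The recurrence is simulated in natural language rather than hidden states; a paraphrased sketch of the loop, where the prompt format and the `llm` callable are placeholders rather than the project's actual code:

```python
def recurrent_generate(llm, outline, steps=10):
    short_term = outline      # rolling natural-language summary (the "hidden state")
    long_term = []            # archive of generated paragraphs
    paragraph = ""
    for _ in range(steps):
        # Each step conditions only on the summary and the latest paragraph,
        # so context stays bounded no matter how long the full text grows.
        out = llm(
            f"Summary so far: {short_term}\n"
            f"Previous paragraph: {paragraph}\n"
            "Write the next paragraph, then an updated summary, "
            "separated by '---'."
        )
        paragraph, short_term = out.split("---", 1)
        long_term.append(paragraph)
    return "".join(long_term)
```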
MiniChain
Explore a small library that simplifies coding with large language models through streamlined prompt chaining. It supports applications such as retrieval-augmented QA and chat with memory in minimal code. Compatible with backends including OpenAI and Hugging Face, plus a Python code-execution backend, it features prompt visualization and integration with external data sources. MiniChain separates prompt text into Jinja templates and attaches prompts through simple function annotation, ideal for developers who want simplicity without losing functionality.
gpt-neox
This repository offers a robust platform for training large-scale autoregressive language models with advanced optimizations and extensive system compatibility. Utilizing NVIDIA's Megatron and DeepSpeed, it supports distributed training through ZeRO and 3D parallelism on various hardware environments like AWS and ORNL Summit. Widely adopted by academia and industry, it provides predefined configurations for popular model architectures and integrates seamlessly with the open-source ecosystem, including Hugging Face libraries and WandB. Recent updates introduce support for AMD GPUs, preference learning models, and improved Flash Attention, promoting continued advancements in large-scale model research.
direct-preference-optimization
This repository offers a robust implementation of Direct Preference Optimization (DPO), including conservative DPO and IPO, for aligning language models with human preferences without training a separate reward model. Compatible with HuggingFace models, it facilitates easy dataset integration and supports diverse GPU setups, covering both supervised fine-tuning and preference learning for scalable training.
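At its core, DPO trains the policy with a logistic loss over log-probability ratios against a frozen reference model; a self-contained sketch of that objective (β and the batch shapes are illustrative, not this repository's code):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: how far the policy has moved from the reference.
    chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the chosen response's margin above the rejected one.
    return -F.logsigmoid(chosen - rejected).mean()

# Per-sequence summed log-probs for a batch of 4 preference pairs.
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
```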
langsmith-sdk
LangSmith SDKs provide tools to debug, evaluate, and monitor language models and intelligent agents. Integrating seamlessly with LangChain's Python and JavaScript libraries, these SDKs support application tracing and performance analysis for any LLM application. Simplify workflows using LangSmith, from the developers of LangChain. Access detailed documentation and tutorials for best practices to fully leverage the LangSmith platform.
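Tracing a plain Python function is typically a one-decorator affair; a minimal sketch assuming the standard tracing environment variables and a valid API key:

```python
import os
from langsmith import traceable

os.environ["LANGCHAIN_TRACING_V2"] = "true"   # enable tracing
os.environ["LANGCHAIN_API_KEY"] = "..."       # your LangSmith key

@traceable  # inputs, outputs, and latency are recorded as a run
def answer(question: str) -> str:
    # Call your LLM of choice here; stubbed for the sketch.
    return f"echo: {question}"

answer("What does LangSmith trace?")
```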
h2o-llmstudio
H2O LLM Studio is a no-code platform for fine-tuning large language models through an intuitive GUI. It features cutting-edge techniques like Low-Rank Adaptation, exposes a wide range of hyperparameters, and tracks model performance through Neptune and W&B integrations. Recent releases add further training and optimization methods.
RWKV-Runner
The RWKV-Runner project simplifies the use of large language models through automation and a lightweight executable. It is compatible with the OpenAI API, transforming ChatGPT clients into RWKV clients. Notable features include easy model startup, adaptable VRAM settings, user-friendly interfaces, and multilingual support. Additional tools for model conversion, download management, LoRA finetuning, and example server deployments are also available, making it suitable for users seeking efficient model management across various platforms.
Aquila2
The Aquila2 series includes open-source language models like AquilaChat2, known for its advanced long-text processing, surpassing other models on various benchmarks. With options such as Aquila2-7B and Aquila2-34B, alongside the experimental Aquila2-70B-Expr, the project facilitates finetuning and quantization, accompanied by comprehensive deployment guides for platforms like Hugging Face and ModelHub. This project provides significant improvements in reasoning tasks and long-context comprehension, ideal for complex language application development, with regular updates promoting continuous progress.
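Loading one of the released checkpoints follows the usual Hugging Face pattern; a sketch in which the model id is one published variant and `trust_remote_code` is assumed to be required:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BAAI/AquilaChat2-7B"  # one of the released variants
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

# Translation of the prompt: "Briefly describe the challenges of long-text modeling."
inputs = tokenizer("请简要介绍长文本建模的难点。", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```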
phasellm
PhaseLLM is an open-source framework that simplifies integrating and evaluating large language models such as OpenAI's GPT-3.5, Anthropic's Claude, and Cohere's models. It offers standardized API interactions, evaluation tools, and automation to optimize model performance in applications like chatbots, with a focus on efficiency and cost-effectiveness for developers and data scientists.
chrome-ai
Chrome AI is a provider for the Vercel AI SDK that runs on Chrome's built-in Gemini Nano model, enabling text generation and embeddings directly in the browser with customizable settings for personalized AI experiences. Though still under development, it showcases what in-browser AI can do and requires a browser build with Gemini Nano plus WebGPU and WebAssembly support.
vercel-llm-api
This reverse-engineered API wrapper provides access to language models such as OpenAI's ChatGPT and Cohere's Command Nightly through the Vercel AI Playground. It supports fetching the list of available models, text generation, and chat-message customization. While it has limitations, such as hardcoded user agents and no authentication, the library simplifies model access without a subscription. Install with 'pip3 install vercel-llm-api'; usage requires only a simple Python client, and models such as Bloom and GPT-3.5 are available for flexible interaction.
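A sketch following the project's README; the model id and prompt are illustrative, and since the endpoints are reverse-engineered they may break without notice:

```python
import vercel_ai

client = vercel_ai.Client()
# Streamed generation; model ids follow a "provider:model" convention.
for chunk in client.generate("openai:gpt-3.5-turbo", "Summarize the GNU GPL v3."):
    print(chunk, end="", flush=True)
```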
laser
Layer-Selective Rank Reduction (LASER) enhances performance in language models by employing low-rank approximations of weight matrices. This technique optimizes reasoning tasks such as question-answering without further training by targeting specific layers and parameters. The project is under active development, focusing on refactoring for better flexibility and usability. It provides reproducible results across various models and benchmarks while encouraging community contributions and interaction. Core features include efficient hyperparameter tuning and adaptability for different language models.
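The underlying operation is ordinary truncated SVD applied to a chosen weight matrix; a standalone PyTorch sketch of that reduction (illustrative shapes, not the project's code):

```python
import torch

def low_rank_approx(weight: torch.Tensor, keep: float = 0.1) -> torch.Tensor:
    """Best rank-k approximation, keeping a fraction of the singular values."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    k = max(1, int(len(S) * keep))
    return U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]

W = torch.randn(4096, 11008)          # e.g. an MLP projection in a Transformer block
W_reduced = low_rank_approx(W, 0.05)  # swapped in for W at inference, no retraining
```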
chat_gpt_sdk
The library enhances Flutter by integrating OpenAI's GPT-3.5 and GPT-4 models, supporting text and chat completions, function calling, and image processing. It facilitates easy management of assistants, threads, and runs, and includes features like message and error handling, translation, and image generation. With a straightforward API, it allows for seamless interaction with OpenAI's language models, suitable for various Flutter applications.
kani
Kani is a flexible framework for chat-based language models, covering both hosted models like GPT and open-source ones like LLaMA. It offers robust customization for NLP researchers and developers, facilitating tool integration and function calling with comprehensive control. By managing chat memory and function execution efficiently, Kani allows seamless incorporation and quick iteration, free from hidden operations. It caters to applications ranging from academic studies to industry implementations, as a straightforward, adaptable alternative to more rigid frameworks.
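A minimal chat loop following the project's README; the API key and model name are placeholders:

```python
from kani import Kani, chat_in_terminal
from kani.engines.openai import OpenAIEngine

engine = OpenAIEngine("sk-...", model="gpt-4o-mini")  # key/model are placeholders
ai = Kani(engine, system_prompt="You are a helpful assistant.")

# Kani tracks chat memory between turns; this opens an interactive session.
chat_in_terminal(ai)
```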
llama3
Explore the enhanced capabilities of Llama 3 models, ranging from 8B to 70B parameters, available on Hugging Face. Access model weights, tools, and community scripts for responsible AI innovation, detailed across various repositories with guidance on safe use.
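Once access to the gated weights is granted, the Hugging Face checkpoints load like any other; a sketch with the 8B instruct variant, following the model card's pipeline usage:

```python
import torch
import transformers

pipe = transformers.pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # gated: requires approved access
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
out = pipe(
    [{"role": "user", "content": "Name three uses of small LLMs."}],
    max_new_tokens=128,
)
print(out[0]["generated_text"][-1])  # the assistant's reply turn
```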
machine-learning-list
The reading list systematically introduces fundamental and advanced machine learning concepts, especially focusing on language models. It serves as a guide to key principles, deployment strategies, reasoning techniques, and AI’s broader implications. Structured in tiers, it balances theory and practical application. Subjects include machine learning basics, transformers, training methods, and applications, with insights into AI safety, economic, and philosophical aspects — ideal for understanding and scaling machine learning models.
open-interpreter
Use LLMs to execute code locally via a terminal interface, supporting multiple languages for data analysis and content creation. Open Interpreter offers more flexibility than ChatGPT's Code Interpreter by accessing local libraries and the internet. Code runs only after user confirmation, balancing safety and efficiency. It integrates readily into development workflows and includes interactive demo capabilities.
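Programmatic use mirrors the terminal flow; by default each generated code block waits for approval before it runs:

```python
from interpreter import interpreter

interpreter.auto_run = False  # default: ask before executing generated code
interpreter.chat("Count the lines in every .py file in this directory.")
```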
open-instruct
Investigate instruction tuning of language models with leading-edge approaches on publicly available datasets. This project provides a unified codebase for training and evaluation, featuring modern enhancements like LoRA, QLoRA, and efficient parameter updates. Related research publications offer further insights. The repository contains datasets, evaluation scripts for key benchmarks, and models such as Tülu tailored to diverse datasets, supporting improved language-model outcomes through advanced fine-tuning practices and reliable evaluation techniques.
keras-llm-robot
Keras-llm-robot utilizes Langchain and Fastchat frameworks in a Streamlit UI for offline deployment of Hugging Face models, with features like model integration, multimodal support, and customizations including quantization and fine-tuning. It also offers tools for retrieval, speech, and image recognition, plus environment setup guides for multiple OSs, ideal for developers exploring AI model deployment.
mergekit
MergeKit offers an effective solution for merging pre-trained language models with support for algorithms like Linear, SLERP, and Task Arithmetic. It is suitable for resource-constrained settings, functioning on both CPU and GPU with low VRAM requirements. Features include lazy tensor loading and layer-based model assembly. Compatible with models like Llama, Mistral, and GPT-NeoX, it also provides an intuitive GUI on Arcee's platform and supports sharing on the Hugging Face Hub. A versatile YAML configuration enables custom merge strategies.
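A hedged example of that YAML for a simple linear merge; the two model names are placeholders for compatible fine-tunes of the same base architecture:

```yaml
# linear average of two fine-tunes of the same base architecture
models:
  - model: mistralai/Mistral-7B-Instruct-v0.2   # placeholder fine-tune A
    parameters:
      weight: 0.5
  - model: HuggingFaceH4/zephyr-7b-beta         # placeholder fine-tune B
    parameters:
      weight: 0.5
merge_method: linear
dtype: float16
```

The config is run with `mergekit-yaml config.yml ./merged-model`; SLERP and Task Arithmetic are selected the same way via `merge_method`.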
instruct-eval
InstructEval is a platform designed to evaluate instruction-tuned LLMs including Alpaca and Flan-T5, using benchmarks like MMLU and BBH. It supports many HuggingFace Transformer models, allows qualitative comparisons, and assesses generalization on tough tasks. With user-friendly scripts and detailed leaderboards, InstructEval shows model strengths. Additional datasets like Red-Eval and IMPACT enhance safety and writing assessments, providing researchers with in-depth performance insights.
floneum
Floneum facilitates local AI application development with Kalosm, a Rust interface for text, audio, and image model processing, offering quantization and acceleration. Floneum Editor allows intuitive design of AI workflows. Support includes models like Llama and Whisper, and tools for context extraction and web scraping. Engage with the community via Discord and GitHub.
datablations
Discover strategies for scaling language models in data-limited contexts. This repository includes experiments on data repetition and computational budgets, working with up to 900 billion tokens and models with 9 billion parameters. It offers a scaling law for computational efficiency, considering the decreasing utility of repeated tokens and excess parameters. Methods to address data limitations, such as code augmentation and filtering techniques including perplexity and deduplication, are explained. Access to over 400 training models and datasets is provided, supporting robust language model development in constrained environments.
bigscience
This workshop explores large language models with Megatron-GPT2 architecture through detailed trainings and experiments. It addresses model scaling, training dynamics, and instabilities, supported by extensive documentation and logs. Providing resources like code repositories and training scripts, the project fosters transparency and collaboration within the AI community, guiding toward future advancements in language models.
DeepInception
Large language models, while successful, face risks from adversarial jailbreaks affecting their safety. DeepInception offers a novel, less resource-intensive method, inspired by the Milgram experiment, to bypass usage controls via personification and nested scenes. This approach highlights vulnerabilities in various LLMs, underlining the need for enhanced safety measures.
PanelGPT
Explore a method for enhancing language-model reasoning through 'panel discussion' prompting. Inspired by expert panels at conferences, the approach encourages deliberation among simulated experts, improving results in zero-shot prompting contexts. Evaluations on the GSM8K dataset show it outperforming prompts such as Chain-of-Thought and Tree-of-Thought under the same zero-shot setting. The method extends to complex reasoning tasks, offering an efficient way to fold a collaborative discussion framework into AI capabilities.
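In practice the technique is a short zero-shot suffix appended to the question; a sketch in which the wording is paraphrased from the repository's prompt:

```python
PANEL_SUFFIX = (
    "3 experts are discussing the question with a panel discussion, trying to "
    "solve it step by step, and make sure the result is correct."
)

def panel_prompt(question: str) -> str:
    # Zero-shot: no exemplars, just the collaborative-discussion instruction.
    return f"{question}\n{PANEL_SUFFIX}"

print(panel_prompt("If 3 pens cost $1.50, how much do 7 pens cost?"))
```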
LLaMA-LoRA-Tuner
The tool facilitates LLaMA model evaluation and adjustment with low-rank adaptation (LoRA), featuring a 1-click setup on Google Colab for streamlined training, easy switching among primary base models like 'llama-7b-hf' and 'gpt4all-j', and compatibility with various dataset formats. Recent updates introduce a chat UI and demo mode for innovative model interaction, though the latest version lacks fine-tuning capability. It remains a valuable asset for researchers seeking a versatile and accessible model exploration tool.
genslm
GenSLMs utilizes large-scale language models to analyze SARS-CoV-2 evolution through sequence embeddings and synthetic sequence generation. Operating on supercomputers like Polaris and Perlmutter, it uses a hierarchical diffusion model for detailed genomic analysis, supporting efficient genome sequence modeling. The platform enhances research accuracy, serving as a robust tool for advancing virology studies.
SWE-bench
SWE-bench is a benchmark for testing language models' abilities to solve real-world GitHub software issues. It provides a containerized evaluation environment using Docker, ensuring repeatable assessments. Recent updates feature SWE-bench Verified, a collection of 500 engineer-confirmed solvable problems. Developed in collaboration with OpenAI, SWE-bench supports reproducible evaluations across different systems. Its resources are designed to help with model training, inference, and task creation, supporting NLP and machine learning applications in software engineering.
simple-evals
This repository provides a lightweight library for transparent evaluations of language models, emphasizing zero-shot and chain-of-thought methods. It includes benchmark results for models such as GPT-4, using tests like MMLU and HumanEval. The library favors simple, realistic instructions over complex prompting to better gauge real-world performance. While not actively maintained, it allows for updates such as bug fixes and new models. The setup supports OpenAI and Anthropic APIs for efficient, adaptable evaluations.
readme-ai
Boost development efficiency with an AI tool that automatically creates detailed README files, supporting numerous programming languages and customizable settings. Compatible with models like OpenAI and Google Gemini, it offers offline operation to meet various project documentation needs.
opencommit
OpenCommit streamlines version control by auto-generating meaningful commit messages using AI. It supports GPT-4, offers easy setup, GitMoji integration, and language customization, and is compatible with providers like OpenAI, Azure, and Ollama.
LLM-Kit
This open-source project provides a versatile WebUI toolkit designed to manage language model workflows effortlessly. Users can create custom models and applications without coding, in environments like Python and CUDA. The toolkit features robust modules, including APIs for prominent language models such as OpenAI and Baidu's Wenxin Yiyan. It supports functionalities including chat, image generation, dataset processing, and embedding models. Key features include role-play settings with memory and background libraries, and compatibility with large-scale models like ChatGLM and Phoenix-Chat. Operating under the AGPL-3.0 license, it encourages community involvement and shared development.
FastEdit
FastEdit provides a quick solution for injecting new information into large language models efficiently with just one command. It supports models such as GPT-J, LLaMA, and BLOOM, allowing for updated outputs. The tool requires Python 3.8+ and PyTorch 1.13.1+, leveraging Rank-One Model Editing for enhanced performance. Easy data preparation and installation enable effective model editing to maintain accuracy and relevance in multilingual contexts.
LLM-Zoo
This project details the release and characteristics of global large language models, providing a valuable resource for both open-source and closed-source LLMs developed after ChatGPT. It gathers essential data such as model sizes, supported languages, domains, and training datasets, alongside links to GitHub repositories, HuggingFace models, and academic publications. Regular updates keep users informed, with an invitation for contributions to enhance this dataset. Ideal for researchers and developers interested in the dynamics of natural language processing models.
GPT-Jailbreak
Access a repository with straightforward instructions for modifying language models such as GPT-3 and GPT-4. Discover how to personalize these models for enhanced AI functionality, without the need for installations, and contribute to improving their capabilities with community input.
LLM.swift
LLM.swift is a lightweight library offering interaction with large language models on macOS, iOS, watchOS, tvOS, and visionOS. It supports developers focusing on performance and ease of integration with Swift projects. With options for using HuggingFace models, it balances speed and stability. The library features customizable preprocess, postprocess, and update functions, providing precision and control for AI integration.
catalyst
Explore a fast and versatile C# Natural Language Processing library offering efficient non-destructive tokenization, flexible entity recognition, and reliable language detection. Catalyst facilitates FastText and StarSpace embeddings training, with readily available pre-trained models. Compatible with Windows, Linux, and macOS, it offers robust tools for semantic analysis. It suits projects requiring quick processing, aligning with .NET standard 2.0 for smooth pipeline integration.
helm
Stanford's CRFM-HELM project presents a framework for evaluating language models, including datasets such as NaturalQuestions and models like GPT-3. It expands evaluations beyond accuracy to metrics such as efficiency and bias, assesses robustness through perturbations, and offers access through a modular API and proxy server. The project also explores vision-language and text-to-image model evaluations with reliable findings. Comprehensive documentation supports effortless installation and use by researchers assessing language models.
codellama
Discover Code Llama, a family of code-specialized language models built on Llama 2, designed to facilitate coding with features like code infilling and zero-shot instruction following. Models are available in Python-specialized and general variants, ranging from 7B to 34B parameters and supporting contexts up to 100K tokens. The release suits both individuals and businesses, covering diverse use cases with essential safety measures, and provides initial inference code plus pretrained and fine-tuned weights.
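Infilling is exposed through a fill-in-the-middle token in the Hugging Face checkpoints; a sketch following the `transformers` documentation for the 7B base model:

```python
from transformers import LlamaForCausalLM, CodeLlamaTokenizer

model_id = "codellama/CodeLlama-7b-hf"
tokenizer = CodeLlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)

# <FILL_ME> marks the span to complete, using context on both sides.
prompt = 'def remove_non_ascii(s: str) -> str:\n    """ <FILL_ME>\n    return result'
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
generated = model.generate(input_ids, max_new_tokens=64)
filling = tokenizer.batch_decode(
    generated[:, input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(prompt.replace("<FILL_ME>", filling))
```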
OLMo-Eval
OLMo-Eval, an evaluation framework for language models, leverages task sets for metric computation on NLP tasks. Built with ai2-tango and ai2-catwalk, it offers adaptable evaluation and integrates with Google Sheets for reporting. Deployment is simple through command line, supporting diverse models and datasets, facilitating ongoing development and analysis. Suited for benchmarking a variety of language models on standard tasks.
EasyContext
This project demonstrates how established methods can expand language models to manage contexts as long as 1 million tokens using efficient strategies such as sequence parallelism, Deepspeed zero3 offload, and flash attention. It delivers comprehensive training scripts, supports various parallel approaches, and highlights significant improvements in both perplexity and 'needle-in-a-haystack' evaluations for Llama2 models.