# Large Language Model
LLMs-from-scratch
This comprehensive guide covers the entire process of building a GPT-like large language model from scratch, starting with the coding fundamentals. It provides step-by-step instructions with clear explanations and examples, making it a valuable resource for understanding model development, pretraining, and fine-tuning techniques. The guide parallels techniques used in technologies like ChatGPT and includes information on loading and refining larger pre-trained models. Access the official code repository for updates and additional resources.
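As a taste of what building such a model involves, here is a minimal sketch of a single causal self-attention head in PyTorch. It illustrates the core mechanism the guide builds up to; it is a generic sketch, not the book's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttentionHead(nn.Module):
    """One attention head with a causal mask, the core of a GPT block."""
    def __init__(self, embed_dim: int, head_dim: int, block_size: int):
        super().__init__()
        self.key = nn.Linear(embed_dim, head_dim, bias=False)
        self.query = nn.Linear(embed_dim, head_dim, bias=False)
        self.value = nn.Linear(embed_dim, head_dim, bias=False)
        # Lower-triangular mask so each token attends only to the past.
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        att = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)  # scaled dot-product
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        return att @ v  # weighted sum of value vectors

# Example: a batch of 4 sequences, 8 tokens each, 32-dim embeddings.
head = CausalSelfAttentionHead(embed_dim=32, head_dim=16, block_size=8)
out = head(torch.randn(4, 8, 32))
print(out.shape)  # torch.Size([4, 8, 16])
```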
gorilla
The Gorilla project facilitates the integration of over 1,600 APIs with large language models, deriving accurate API calls from natural language queries. It reduces call errors and broadens API support, pushing open-source language models toward greater reliability. Tools like GoEx ensure safe execution, while Gorilla OpenFunctions provides alternatives for function invocation. The project, licensed under Apache 2.0, encourages developers to integrate and expand its API ecosystem, promoting community involvement in enhancing API use within language models.
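Gorilla's core idea, mapping a natural-language query to a concrete API call, can be sketched roughly as below. The prompt format and the `query_llm` helper are hypothetical stand-ins for illustration, not Gorilla's actual models or interface.

```python
# Hypothetical sketch of "natural-language query -> API call" prompting.
# `query_llm` is a placeholder for any LLM client; Gorilla's real
# models and prompt formats differ.

def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def natural_language_to_api_call(user_query: str, api_docs: list[str]) -> str:
    # Show the model the documentation of candidate APIs, then ask for
    # a single well-formed call answering the user's request.
    docs = "\n".join(api_docs)
    prompt = (
        "You are an assistant that answers with exactly one API call.\n"
        f"Available APIs:\n{docs}\n\n"
        f"User request: {user_query}\n"
        "API call:"
    )
    return query_llm(prompt)

# Illustrative usage:
# natural_language_to_api_call(
#     "Translate 'hello' to French",
#     ["translate(text: str, target_lang: str) -> str"],
# )
```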
EmoLLM
Examine the role of AI-powered models in psychological counseling and how they improve understanding and support. EmoLLM offers a structured approach combining cognitive, emotional, and behavioral insights with social and physiological components. Stay updated with the latest model configurations and tools for advancing mental well-being, and access comprehensive resources for implementing and innovating AI-driven mental health strategies.
aici
AICI provides a framework for developers to create versatile and secure controllers to efficiently manage and refine the output of large language models (LLMs). It simplifies the development of controllers, offering compatibility with diverse inference engines like llama.cpp and HuggingFace Transformers. These controllers are implemented as WebAssembly modules that run concurrently with the LLM engine, ensuring minimal overhead and enhanced performance. Experience flexible experimentation and tailored solutions with upcoming multi-tenant deployments prioritizing security and efficiency.
searchGPT
searchGPT is an open-source search engine that uses retrieval-augmented generation (RAG) and large language models for real-time language processing. Key features include web and file content search with semantic engines like FAISS and PyTerrier. API integrations include OpenAI and GooseAI, complemented by a user-friendly interface. Access the demo at searchgpt-demo.herokuapp.com. Python 3.10.8 and the relevant API keys are required for setup. Contributions are welcome under the MIT License.
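The semantic-search half of such a system can be sketched with FAISS in a few lines. The random vectors below stand in for real sentence embeddings, which a system like searchGPT would obtain from an embedding model; the dimensionality is an assumption.

```python
import faiss
import numpy as np

dim = 384  # typical sentence-embedding dimensionality (assumption)
docs = ["doc one ...", "doc two ...", "doc three ..."]

# Stand-in embeddings; a real pipeline would embed `docs` with a model.
rng = np.random.default_rng(0)
doc_vecs = rng.standard_normal((len(docs), dim)).astype("float32")

index = faiss.IndexFlatL2(dim)  # exact L2 nearest-neighbour index
index.add(doc_vecs)

query_vec = rng.standard_normal((1, dim)).astype("float32")
distances, ids = index.search(query_vec, 2)  # two closest documents
print([docs[i] for i in ids[0]])
```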
BLIVA
BLIVA offers a streamlined approach to visual question answering on text-rich images, securing strong rankings in both perception and cognition tasks. With commercially usable and openly accessible model variants, BLIVA demonstrates high efficacy across multiple VQA benchmarks, providing precise answers on varied datasets.
azure-openai-samples
Explore the resources available for understanding GPT basics and its applications with Azure's offerings. Learn to integrate GPT with services such as Synapse Analytics for NLP and Business Process Automation. Access practical samples including serverless SQL and OpenAI-powered semantic search. Stay informed about the latest advancements including GPT-4 and contribute to the ongoing development. This is ideal for developers and organizations looking to leverage AI in diverse sectors such as chatbots, customer service, and content creation.
TransnormerLLM
TransNormerLLM's linear attention architecture offers improved accuracy and efficiency compared to conventional softmax attention. Pretrained on a corpus of 1.4 trillion tokens, it supports experimentation across languages and domains. The open-source release provides weights and extensive fine-tuning options for academic use, with base versions of 385M, 1B, and 7B parameters. Development is ongoing, with plans to expand its capabilities further.
BayLing
BayLing utilizes advanced language model capabilities to enhance cross-lingual communication by optimizing translation and instruction following. Suitable for deployment on consumer-grade GPUs, BayLing supports English and Chinese text creation, translation, and interaction. The latest version, BayLing-13B-v1.1, includes expanded Chinese linguistic knowledge, improving the evaluation and application of large language models in various scenarios. Try the online demo or local GUI for efficient cross-language translation and interaction.
Awesome-System-for-Machine-Learning
Discover a curated compilation of resources for machine learning systems, covering infrastructure elements like training and inference systems as well as specialized areas such as edge AI, federated learning, and reinforcement learning. Access useful tools, educational content, and video tutorials on ML, LLM, and GenAI, curated without vendor bias. The project is well maintained and invites community contributions to broaden its scope. Keep abreast of emerging trends via a newly launched website dedicated to AI infrastructure.
Baichuan-13B
Baichuan-13B is a scalable open-source language model with 13 billion parameters, excelling in Chinese and English benchmarks. It offers both pre-trained and aligned (chat) models for versatile applications. Supporting ALiBi positional encoding and bilingual text, it provides deployment solutions with int8 and int4 quantization, reducing resource needs. Available for academic use and for commercial use upon request, Baichuan-13B enables efficient inference in language modeling.
KnowLM
The KnowLM framework aids in the development of informed Large Language Models, emphasizing data processing, pre-training, fine-tuning, and knowledge enhancement. The model zoo includes adaptable models such as ZhiXi and OneKE for straightforward implementation. Core features entail instruction handling through EasyInstruct, knowledge modification with EasyEdit, and hallucination identification via EasyDetect. Regular updates in model weights ensure support for ongoing advancements accessible via HuggingFace, suitable for users focused on extracting information and knowledge.
CodeGen
Salesforce AI's CodeGen series offers scalable models optimized for program synthesis, with up to 16 billion parameters. The latest iterations, such as CodeGen2.5, deliver superior performance despite their smaller size and are available on the Hugging Face Hub. Supported by the Jaxformer library for streamlined data processing and training, the models are backed by peer-reviewed research.
Chinese-Mixtral
Discover Mixtral models tailored for Chinese, with an architecture suited to long-text processing. The collection includes a base model further pretrained on Chinese data and an Instruct model for interactive tasks. With a native 32K context length, extendable to 128K, the models suit tasks needing deep context, such as math reasoning and code generation. Open-source training and fine-tuning scripts let users adapt or develop custom models. Streamlined for integrations like transformers and llama.cpp, the models support quantization and deployment on local devices.
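Loading such a model with transformers follows the usual causal-LM pattern. A minimal sketch, where the model ID is a placeholder; check the repository for the actual published checkpoints:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/or/hub-id-of-chinese-mixtral"  # placeholder, see the repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit consumer GPUs
    device_map="auto",          # spread layers across available devices
)

# "Please write quicksort code:" in Chinese.
inputs = tokenizer("请写一段快速排序的代码：", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```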
LISA
LISA employs a large language model to enhance segmentation tasks, particularly in reasoning segmentation through complex and implicit queries. It uses a detailed benchmark of image-instruction pairs, encompassing extensive world knowledge to provide detailed answers and supports multi-turn dialogue. Demonstrating strong zero-shot learning ability, LISA performs well on datasets without reasoning data, and fine-tuning even with limited data boosts its performance. LISA achieved notable recognition at CVPR 2024. Discover LISA's efficiency through our online demo.
llama-node
Discover a versatile Node.js library that facilitates inferencing with numerous large language models such as LLaMA, Alpaca, and RWKV. Based on reliable tools like llama.cpp and rwkv.cpp, it offers support for various platforms including Darwin, Linux, and Win32. Featuring straightforward npm installation and broad platform compatibility, this library is ideal for developers seeking to incorporate AI models into their projects. Although in early development, the library promises regular updates and enhancements. Explore installation options, backend selections, and engage with a vibrant community for collaborative growth.
repochat
Repochat is a chatbot that leverages a large language model to enable conversation about GitHub repositories. It supports both local and cloud-based installation, providing flexibility for different requirements. Features like file chunking, embedding, and conversational memory offer deep engagement with repository content. Setup involves configuring a Python environment and integrating models such as CodeLlama.
Awesome-AGI-Agents
Discover an extensive collection of AI agent resources, featuring articles, videos, and innovative projects utilizing LLMs. This list includes insightful papers and advanced frameworks for building autonomous agents. Keep informed on the evolving landscape and find key initiatives like Auto-GPT and MetaGPT, along with tools from LangChain and AutoChain. Explore AI-driven solutions applicable to various platforms and sectors.
LLMBook-zh.github.io
The book provides a detailed overview of large language models, focusing on core principles, major technologies, and future uses. It examines the evolution of these models, emphasizing OpenAI's GPT advancements. Highlighting the need for transparency, it tackles challenges academic researchers face due to limited resources. Designed for students with deep learning expertise, it includes tools like LLMBox and YuLan for model development while encouraging community engagement through various channels.
llm-verified-with-monte-carlo-tree-search
Investigate verified code generation using Monte Carlo Tree Search and language models. The method runs a verifier at each step of generation, integrating tools such as Dafny, Coq, Lean, Scala, and Rust, which lets less capable models compete with stronger ones. Comprehensive setup and execution guides are included. Running on GPU and applicable across these languages, the approach synthesizes effective multi-step programs with precise verification, suited to developers focused on code validation and synthesis.
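The essence of the approach, expanding candidate program steps and letting a verifier prune wrong branches early, can be sketched as a simplified depth-first search. Real MCTS adds rollouts and UCT-style selection; the `propose_next_lines` and `verifies` helpers below are hypothetical stand-ins.

```python
def propose_next_lines(partial_program: str, k: int) -> list[str]:
    """Hypothetical stand-in for sampling k continuations from an LLM."""
    raise NotImplementedError

def verifies(partial_program: str) -> bool:
    """Hypothetical stand-in for a verifier check (e.g. Dafny/Coq/Lean)."""
    raise NotImplementedError

def verified_search(program: str = "", depth: int = 0,
                    max_depth: int = 20, branch: int = 4):
    # Verify every partial program so bad branches die early --
    # this is what lets weaker models stay competitive.
    if not verifies(program):
        return None
    if depth == max_depth:
        return program
    for line in propose_next_lines(program, k=branch):
        result = verified_search(program + line, depth + 1, max_depth, branch)
        if result is not None:
            return result
    return None
```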
LLM101n
This course provides a detailed guide for creating a Storytelling AI from the ground up. It covers Large Language Models (LLMs) and deep learning topics such as language modeling and transformer architecture through practical projects. Utilize Python, C, and CUDA to build an AI application akin to ChatGPT, suitable for learners with basic computer science knowledge. This comprehensive approach imparts practical AI development skills.
motif
Explore Motif's unique method of using a Large Language Model to define reward functions for AI agent training in NetHack. This method features a three-phase process: dataset annotation, reward training, and reinforcement learning, transforming LLM preferences into intrinsic agent motivation. Discover intuitive, human-aligned AI behaviors guided by customizable prompts and gain insights into Motif's capabilities for feedback-driven intrinsic rewards in reinforcement learning.
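The reward-training phase turns pairwise preferences into a scalar reward, typically via a Bradley-Terry-style objective. This is a minimal sketch of that standard loss over placeholder observation features, not Motif's exact code; the network shape and dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal reward model over observation feature vectors (dims assumed).
reward_net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(reward_net.parameters(), lr=1e-4)

def preference_loss(preferred: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: push r(preferred) above r(rejected)."""
    r_pref = reward_net(preferred)
    r_rej = reward_net(rejected)
    return -F.logsigmoid(r_pref - r_rej).mean()

# One update step on a fake batch of LLM-annotated pairs.
preferred = torch.randn(32, 128)   # observations the annotator preferred
rejected = torch.randn(32, 128)    # the ones it did not
loss = preference_loss(preferred, rejected)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```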
Vlogger
The project presents an innovative AI platform for generating detailed vlogs from user inputs, employing a Large Language Model in an oversight role. The approach divides vlog creation into distinct phases including scripting, acting, videography, and narration, utilizing tailored models to maintain narrative integrity and visual quality. Featuring the new ShowMaker model, it enhances the spatial-temporal alignment between script and visuals. Comprehensive evaluations demonstrate the platform's capability to produce coherent, extended vlogs, pushing forward zero-shot video generation benchmarks.
AcmeTrace
Discover the AcmeTrace public releases from Shanghai AI Lab, featuring workload data from March to August 2023. Intended for academic research, these datasets are analyzed in the NSDI '24 paper and include comprehensive insights into job submissions, GPU utilization, and resource usage in data centers. While GitHub offers a preview, full data files totaling 80GB can be accessed via HuggingFace. Understand the nuances of AI model development and deployment by exploring detailed schemas and visualization examples.
towhee
Towhee enhances unstructured data processing by leveraging LLM-based orchestration, converting text, images, audio, and video into efficient database-ready formats such as embeddings. It supports multiple data modalities and provides comprehensive models across CV, NLP, and additional fields. Offering prebuilt ETL pipelines and efficient backend support using Triton Inference Server, Towhee's Pythonic API allows for the easy development of custom data workflows. Streamline data operations for production environments with Towhee's adaptable and scalable technology.
Awesome-Multimodal-LLM
This article examines multimodal learning facilitated by large language models (LLMs), focusing on diverse modalities such as text, vision, and audio. It underscores the role of open-source, research-supportive LLM backbones like LLaMA, Alpaca, and Bloom and reviews various learning techniques including fine-tuning and in-context learning. Examples of models such as OpenFlamingo and MiniGPT-4 are discussed alongside evaluation methods like MultiInstruct and POPE. The article highlights key research advancements from 2021 to 2023, offering insights into projects enhancing LLM visual and language processing capabilities. It also provides resources and guidelines for contributors to encourage ongoing exploration and progress in LLM-guided multimodal learning.
rtp-llm
Created by Alibaba's Foundation Model Inference Team, the rtp-llm inference engine is engineered for high-performance acceleration of large language models across Alibaba platforms such as Taobao and Tmall. It features optimized CUDA kernels and broad hardware support, including AMD ROCm and Intel CPUs, and integrates seamlessly with HuggingFace models. The engine supports multi-machine, multi-GPU parallelism and introduces features like contextual prefix caches and speculative decoding, enhancing deployment efficiency on Linux with NVIDIA GPUs. Explore its proven reliability and broad usage in Alibaba's AI projects.
PowerInfer
PowerInfer, a high-speed inference engine for Large Language Models, leverages consumer-grade GPUs for enhanced performance. Utilizing activation locality and a hybrid CPU/GPU model, it optimizes resource demands while maintaining efficiency. PowerInfer offers up to 11 times faster performance than llama.cpp, generating an average of 13.20 tokens per second, with peaks of 29.08 tokens per second, nearly matching professional servers. This architecture incorporates adaptive predictors and sparse operators, facilitating integration, backward compatibility, and efficient deployment on models like Falcon-40B and Bamboo-7B.
llmflows
LLMFlows is a framework that enables the development of straightforward and well-structured large language model applications such as chatbots and question-answering systems. It focuses on providing transparent operations and full user control without hidden prompts. With minimal abstractions, LLMFlows supports features like customizable prompt templates, efficient LLM interaction structuring, and async operations for better performance. The framework also integrates with vector databases to facilitate efficient data handling. Its core principles of simplicity, explicitness, and transparency ensure ease in monitoring, maintenance, and debugging.
YAYI2
This model, developed by Wenge Research, is a multilingual large language model utilizing over 2 trillion tokens in pre-training. It is optimized for general and specialized uses with millions of fine-tuning instructions and human feedback reinforcement learning to align with human values. The model offers enhancements in language understanding, reasoning, and code generation, exceeding the performance of similar-sized open-source models. Discover more through the detailed technical report and join the community in advancing the open-source pre-training model ecosystem with this 30B parameter innovation.
graphologue
Graphologue transforms large language model text responses into interactive diagrams, making complex information easy to handle. The system extracts and visualizes key elements and relationships in real-time, enabling users to engage in flexible, graphical dialogues. This allows efficient information organization and comprehension, surpassing text-only interactions. Visit the official website for a live demo and details.
ShenNong-TCM-LLM
This project integrates traditional Chinese medicine knowledge into AI by developing a specialized large language model. The ShenNong-TCM model is trained on the ShenNong_TCM_Dataset and uses entity-centric self-instruct methods to enhance medical consultations. Built on the LLaMA foundation with LoRA tuning, it provides detailed advice in TCM contexts. Explore its performance improvements in TCM dialogue scenarios, with a focus on academic research and non-commercial use.
LLM4SE
Discover a wide array of models in software engineering alongside related academic papers, routinely updated through a specialized literature search engine. The collection features various code models, categorized by popularity and detailed statistical insights. It includes recent preprints and detailed analyses, offering essential resources for researchers and industry experts. Maintain an up-to-date knowledge base in the dynamic field of software engineering with this meticulously organized repository.
MiniGPT-4
MiniGPT-4 employs large language models to advance vision-language comprehension, creating a cohesive platform for a variety of tasks. It supports complex applications such as image captioning and multi-turn visual dialogue, showing marked gains on understanding tasks. Available in variants built on Vicuna V0 and Llama 2, it offers flexible use in research and practical projects. Discover its features through online demos and community-driven programs, broadening its application across multiple sectors.
Transformer-from-scratch
This demo provides a simple introduction to training a large language model with PyTorch, encapsulated in around 240 lines of code. Inspired by nanoGPT, it demonstrates training a 51M-parameter model on a 450 KB dataset. Suitable for beginners, the guide includes step-by-step instructions and additional materials for understanding transformer-based models. Explore hyperparameter optimization, visualize training outcomes, and generate text with the included examples, all designed for those learning language model architecture from the ground up.
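The heart of such a demo is a short next-token-prediction loop. This sketch uses a deliberately trivial model (an embedding plus a linear head, effectively a bigram model) to show the shape of the loop; the repo's actual architecture is a transformer.

```python
import torch
import torch.nn as nn

text = "hello world " * 200                     # toy corpus
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

vocab, embed = len(chars), 32
model = nn.Sequential(nn.Embedding(vocab, embed), nn.Linear(embed, vocab))
opt = torch.optim.AdamW(model.parameters(), lr=3e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    # Sample a random batch of (character, next-character) pairs.
    idx = torch.randint(0, len(data) - 1, (64,))
    x, y = data[idx], data[idx + 1]
    logits = model(x)                           # predict the next character
    loss = loss_fn(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 50 == 0:
        print(step, loss.item())
```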
A-Guide-to-Retrieval-Augmented-LLM
This guide provides a thorough exploration of Retrieval Augmented Large Language Models (LLMs), focusing on alleviating common issues such as hallucinations and outdated information. It examines how integrating LLMs with external retrieval techniques can enhance accuracy and address challenges related to data freshness. It also details core concepts, implementation strategies, and potential applications. By enhancing LLMs' abilities with long-tail knowledge and private data, and improving their source-traceability, this guide provides useful insights for developing efficient retrieval-augmented AI systems. It highlights key components such as data management, indexing, and retrieval processes.
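The retrieve-then-generate pattern at the core of the guide reduces to: embed the query, fetch the most similar chunks, and prepend them to the prompt. A minimal sketch, where `embed` and `generate` are hypothetical stand-ins for an embedding model and an LLM call:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for an embedding model."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM completion call."""
    raise NotImplementedError

def answer_with_retrieval(question: str, chunks: list[str], k: int = 3) -> str:
    # Rank indexed chunks by cosine similarity to the question.
    q = embed(question)
    chunk_vecs = [embed(c) for c in chunks]

    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    ranked = sorted(zip(chunks, chunk_vecs), key=lambda cv: cos(q, cv[1]),
                    reverse=True)
    top = [c for c, _ in ranked[:k]]

    # Ground the answer in the retrieved text to curb hallucination.
    context = "\n---\n".join(top)
    return generate(
        f"Answer using only the context below.\n\nContext:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```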
huozi
Huozi offers notable advancements in language processing with its sparse mixture of experts (SMoE) architecture, enabling efficient handling of extended contexts. Designed for use in both academic and industrial settings, it features enhancements such as multilingual knowledge integration and refined reasoning capabilities. The model's release comes with various checkpoints and broad platform support, allowing comprehensive deployment and performance acceleration across systems like Transformers and ModelScope.
llm_training_handbook
This handbook provides methodologies for engineers involved in large language model training, featuring scripts and commands to streamline problem-solving. Focus is placed on model parallelism, throughput maximization, tensor precision, and hyper-parameter tuning. Aimed at technical professionals, it acts as a valuable resource for enhancing training efficiency. For more conceptual insights, see the Large Language Model Training Playbook. The content is available under Attribution-ShareAlike 4.0 International and Apache License 2.0.
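One of the handbook's themes, tensor precision, can be illustrated with PyTorch's automatic mixed precision. A minimal sketch of an fp16 training step with loss scaling, assuming a CUDA device and a toy objective:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # rescales loss so fp16 grads don't underflow

for _ in range(10):
    x = torch.randn(32, 1024, device="cuda")
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(x).square().mean()  # toy objective in half precision
    scaler.scale(loss).backward()        # backward on the scaled loss
    scaler.step(optimizer)               # unscale grads, then optimizer step
    scaler.update()                      # adjust the scale factor
```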
Odyssey
Odyssey equips Minecraft agents with open-world skills using a language model. It includes a skill library, a refined LLaMA-3 model, and a benchmark system to enhance autonomy in various tasks. The available datasets and code promote further innovation in agent technology.
TigerBot
TigerBot is an advanced foundation model designed to bolster innovation in China through its sophisticated language model capabilities. Recent updates include support for longer context lengths, enhanced search functionalities, and the ability to engage third-party APIs using function calling. These improvements position TigerBot competitively against both previous models and other market leaders in Chinese and English assessments. The ongoing development aims to aid educational and scientific sectors by offering strong model infrastructures and API options, promoting comprehensive AI research and development.
Chat-Haruhi-Suzumiya
This project employs large language models to bring anime characters to life through realistic chat interactions, accurately imitating their voices, personalities, and narratives. Featuring zero-shot role-playing with the Haruhi Suzumiya model version 0.3, it supports 32 characters with a 54,000-entry dataset and includes continuous development for ChatHaruhi 2.0. This open-source project permits commercial use, bound by relevant licensing agreements, while adhering to character and API provider compliance.
HealthGPT
Learn about HealthGPT, an open-source iOS app using the Stanford Spezi framework designed for managing Apple Health data via natural language queries. Developed by Stanford Biodesign Digital Health, it provides a chat interface with speech-to-text and text-to-speech accessibility, supports local execution to enhance privacy, and integrates effectively with the Apple Health app. Users can interact with data such as sleep patterns and physical activities using advanced GPT-3.5 and GPT-4 models. Downloadable via TestFlight, HealthGPT is tailored for personalized data interaction while prioritizing extensibility and user privacy.
SECap
SECap provides insights into speech emotion captioning by leveraging large language models to enhance accuracy and relevance. The repository contains model code, scripts for training and testing, and a dataset of 600 audio files with emotion descriptions. It offers pretrained models and weights for inference and evaluation of description similarity with ground truth, serving as a comprehensive resource for emotion analysis research.
TSFpaper
The repository provides a comprehensive collection of over 300 papers on time series and spatio-temporal forecasting, categorized by model type. It is regularly updated with the latest studies from leading conferences, journals, and arXiv, supporting various kinds of forecasting such as univariate, multivariate, and spatio-temporal. It explains complex concepts and how deep learning affects model flexibility, and explores emerging subjects like irregular time series and recent innovations like the Mamba model. Contributions of relevant papers are welcome to further enrich this forecasting research resource.
Ranni
Ranni introduces a text-to-image generation pipeline that pairs a large language model for semantic comprehension and planning with a diffusion-based model for drawing. The two-phase system, an LLM planning component followed by the diffusion model, aligns images accurately with text prompts. Accepted as a CVPR 2024 oral paper, the package includes model weights such as a LoRA-finetuned LLaMA-2-7B and a fully finetuned SDv2.1. Users can explore image creation interactively through Gradio demos and apply continuous edits for targeted image changes.
Awesome-LLMOps
Explore a carefully curated list of top LLMOps tools, vital for developers working with Large Language Models and CV Foundation Models. The list includes resources for model training, fine-tuning, and serving, along with security and management tools for scalable deployment. It also offers frameworks that support efficient AI operations, aiding technological advancements. This guide is key for developers aiming to leverage the latest in AI model serving, monitoring, and optimization methods.
Feedback Email: [email protected]