# language model

## chatgpt-mirai-qq-bot
This versatile chatbot supports multiple AI models and messaging platforms, offering features such as image sending, keyword-triggered responses, and content moderation. It's compatible with platforms like Mirai, go-cqhttp, Telegram, Discord, and WeChat, and provides API services as an HTTP server. With notable support for AI models like ChatGPT, Bing Chat, and ChatGLM-6B, it's designed for diverse user needs including text RPGs and life assistance.
## Index-1.9B
Explore a language model with 1.9 billion parameters and 32K context support, designed for efficient long-document processing. It excels in multilingual translation, particularly among East Asian languages, and offers role-playing capabilities. Recent updates include adaptations for llama.cpp and Ollama, with open-source checkpoints available. Evaluation results demonstrate its competitiveness with larger models, making it ideal for chat, translation, and fine-tuning applications.
## dolly
Dolly, a large language model by Databricks with 12 billion parameters and based on EleutherAI's Pythia-12b, is commercially licensed. Fine-tuned on around 15,000 instruction-response pairs, Dolly excels in instruction adherence. Although not a state-of-the-art model, it targets accessibility and AI democratization. Challenges such as managing complex prompts and factual accuracy remain, with ongoing improvements. Available on Hugging Face, Dolly facilitates straightforward inference and training on diverse GPU configurations.
## agentlego
AgentLego is an open-source library providing versatile tool APIs that expand and enhance large language model (LLM) agents. It includes a variety of multimodal tools such as visual perception, image generation, and speech processing. These tools are easily integrable with custom interfaces and support remote access for computationally intensive applications. Integration is seamless with popular frameworks like LangChain, Transformers Agents, and Lagent. Explore these tools to boost the capabilities of your LLM-based projects.
## SkyText-Chinese-GPT3
Discover a versatile Chinese language model equipped for tasks such as chat, Q&A, translation, and content creation. Utilizing advanced Chinese encoding and comprehensive data cleaning, it excels in language processing and generation, from poetry to interview questions. Join the developer community and explore its features and real-world applications through live demos on the official platform.
## Awesome-ChatGPT-Prompts-CN
This guide presents a comprehensive overview of ChatGPT's capabilities across different domains, including software development, content creation, and assistance tasks. It provides effective strategies for using ChatGPT to answer queries and generate content, while offering tips for accessing the service from unsupported regions. Additionally, it highlights third-party projects for enhanced functionality. The guide aims to enhance user experience and maximize the impact of AI solutions.
## BERTweet
BERTweet is a pioneering language model pre-trained on a large scale for English Tweets using the RoBERTa method. It utilizes a comprehensive dataset of 850 million tweets, including 5 million related to COVID-19, to enhance performance in NLP tasks. The model can be used with frameworks like `transformers` and `fairseq`, offering pre-trained models such as `bertweet-base` and `bertweet-large`, suitable for deep learning applications. It features effective tweet normalization, facilitating refined text analysis and predictions, supporting research and practical usage.
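The tweet normalization BERTweet relies on maps user mentions and links to placeholder tokens before tokenization. The sketch below is a minimal, hedged illustration of those conventions in plain Python; the actual library applies this automatically when its tokenizer is loaded with normalization enabled, and its real normalizer handles more cases than this regex version.

```python
import re

def normalize_tweet(text: str) -> str:
    """Rough sketch of BERTweet-style tweet normalization:
    user mentions become @USER and links become HTTPURL.
    (The real tokenizer performs a richer version of this.)"""
    text = re.sub(r"@\w+", "@USER", text)            # mask user mentions
    text = re.sub(r"https?://\S+", "HTTPURL", text)  # mask URLs
    return text

print(normalize_tweet("@jane check this https://example.com #nlp"))
# → @USER check this HTTPURL #nlp
```

Normalizing in this way keeps the vocabulary free of millions of one-off handles and URLs, which is part of why the model transfers well across tweet corpora.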
## functionary
Functionary, a versatile language model, efficiently interprets and executes functions/plugins using JSON Schema Objects. Supporting both serial and parallel execution, it triggers only necessary functions. Highlights include 128k-context models and precise grammar sampling for function names. It seamlessly integrates with serverless systems and supports OpenAI-compatible requests, making it a reliable choice for developers focusing on accurate tool execution and multi-turn interactions.
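To make the function-calling flow concrete, here is a minimal sketch of the pattern such a model participates in: a tool is described with a JSON Schema object, the model emits a JSON call, and the host parses it and runs only that tool. All names (`get_weather`, the schema, the dispatch helper) are illustrative assumptions, not Functionary's actual API.

```python
import json

# Hypothetical tool described with a JSON Schema object, in the style
# function-calling models are prompted with (names are illustrative).
GET_WEATHER_SCHEMA = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real weather lookup

TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a model-emitted function call and run only that tool."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

print(dispatch('{"name": "get_weather", "arguments": {"city": "Hanoi"}}'))
# → Sunny in Hanoi
```

Features like grammar sampling constrain generation so the emitted `"name"` is always one of the registered tool names, which is what keeps a dispatcher like this safe from unknown-function errors.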
## UltraChat
Investigate an extensive multi-round dialogue dataset developed to enhance chat model training, providing large-scale and varied interactions to meet complex dialogue needs. It also covers language models such as UltraLM-13B, known for its conversational competence and leading rank among open-source models. Continuous updates and releases advance dialogue systems, enhancing AI's comprehension, reasoning, and creativity in diverse applications.
## ChatLM-mini-Chinese
The project focuses on training a compact 0.2B-parameter Chinese generative language model suitable for environments with limited computational resources. Training is feasible with just a 4GB GPU and 16GB of RAM, and the pipeline covers data cleaning, tokenizer training, SFT fine-tuning, and RLHF optimization using open-source datasets, assisted by Hugging Face libraries such as transformers and accelerate. The project also supports resuming interrupted training and fine-tuning for downstream tasks, with regular updates enhancing its utility for researchers exploring language model implementations at small scale.
## ChatRWKV
Discover the efficiency and scalability of RWKV-6, an RNN language model that competes with transformer models in quality and performance while reducing VRAM usage and increasing speed. ChatRWKV offers cutting-edge demos and multi-platform inference options, with development supported by Stability AI and EleutherAI. The project provides rich community resources and developer tools, including GPU optimization and various project backends, suitable for developers involved in innovative AI research or constructing effective chatbot systems.
## AnyGPT
AnyGPT is a versatile model handling speech, text, images, and music through discrete representations, enabling smooth conversions. Utilizing the AnyInstruct dataset, it supports tasks like text-to-image and text-to-speech and showcases advanced data compression within generative training. This approach unlocks new capabilities beyond traditional text-only models.
## VisualGLM-6B
VisualGLM-6B is a multi-modal dialogue language model supporting images, Chinese, and English. It is based on ChatGLM-6B and reaches 7.8 billion parameters in total once the visual module built on BLIP2-Qformer is included. The model achieves visual-linguistic interoperability and can be deployed on consumer GPUs through quantization. It is pre-trained on 330 million captioned images, optimizing alignment across languages while adhering to open-source protocols. Limitations include image specificity and potential model hallucinations, with improvements planned.
## rwkv.cpp
The project ports the RWKV language model architecture to ggml, supporting FP32, FP16, and quantized inference formats such as INT4, INT5, and INT8. Primarily CPU-focused, it includes both a C library and a Python wrapper, with optional cuBLAS support. It supports RWKV versions 5 and 6, providing competitive alternatives to Transformer models, especially for long contexts, and accommodates LoRA checkpoint integration, offering detailed performance metrics for efficient computation.
## Baichuan2
This open-source language model utilizes 2.6 trillion high-quality tokens to deliver top performance in Chinese, English, and multilingual benchmarks. With Baichuan2-13B-Chat v2, it excels in mathematical and logical reasoning. Available in 7B and 13B, both Base and Chat editions are offered for academic research and free commercial use upon official approval. Access detailed technical insights and download links for the latest versions.
## lmppl
LM-PPL is an efficient tool for calculating text perplexity with pre-trained language models such as GPT, BERT, and T5. It aids in assessing text fluency by computing ordinary perplexity for recurrent models, decoder perplexity for encoder-decoder models, and pseudo-perplexity for masked models. Suitable for applications such as sentiment analysis, LM-PPL helps select texts with lower perplexity, indicating a better model fit. Installable via pip, it offers a user-friendly way to leverage popular models for varied text evaluation needs.
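The quantity all of these variants estimate is the same: perplexity is the exponential of the mean negative log-likelihood of the tokens. A minimal, self-contained sketch of that calculation (the library itself wraps Hugging Face models and computes the per-token log-probabilities for you; the numbers below are made up for illustration):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over the tokens."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A fluent sentence gets higher per-token log-probs, hence lower perplexity.
fluent = [-1.2, -0.8, -1.0]   # hypothetical log-probs from a model
garbled = [-4.5, -5.1, -4.8]

assert perplexity(fluent) < perplexity(garbled)
print(round(perplexity(fluent), 2))  # → 2.72
```

Ranking candidate texts by this score, lower-is-better, is the selection strategy the description refers to.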
## awesome-ChatGPT-resource-zh
Find curated resources and guides for OpenAI's ChatGPT and GPT-3 in Chinese. Access community discussions, key research papers, and enhance interactions with prompt examples. Stay updated with China's AI models and insights from global technology firms like Google and Microsoft. Explore API tools and applications to expand ChatGPT usability across platforms.
## EduChat
EduChat is a sophisticated chatbot system designed for intelligent education, leveraging pre-trained large-scale language models fine-tuned with a variety of educational data. It offers services such as automated assignment grading, emotional support, tutoring, and exam guidance to improve personalized education. Developed by the EduNLP team at East China Normal University, the project focuses on aligning educational values and providing comprehensive educational tools. Its features cater to teachers, students, and parents, promoting fair and engaging education.
## Lemur
The Lemur project offers an open source language model that combines natural language understanding with coding capabilities, providing a strong foundation for language agents. By harmonizing language and coding, the model performs well across benchmarks, allowing agents to execute tasks effectively. Explore models such as OpenLemur/lemur-70b-v1 and OpenLemur/lemur-70b-chat-v1 for advanced applications and regular updates. Review integration options and deployment strategies within diverse interactive environments.
## DeepSeek-LLM
DeepSeek LLM offers a cutting-edge language model trained on a vast dataset of 2 trillion tokens in English and Chinese. It is open-source and available for research purposes, exceeding the capabilities of Llama2 70B Base in reasoning, coding, math, and understanding of the Chinese language. The 67B Chat model performs better than GPT-3.5 specifically in Chinese language proficiency. Focusing on comprehensive data richness and privacy, the project improves benchmarks in multi-choice questions and generalizes effectively, scoring highly on diverse evaluations like coding and math exams.
## Llama3-Chinese-Chat
Explore a refined language model crafted for Chinese-English use, featuring enhancements in roleplay, function calling, and math. With significantly larger datasets, it enhances performance and minimizes language mixing. Offered in multiple formats such as GGUF and BF16 for broad compatibility. Find detailed guidance for reproducing and integrating the model on Hugging Face and Ollama. Stay informed about the latest updates and enhancements in Llama3 releases.
## pixel
Explore an innovative language modeling approach with image-based text processing, removing fixed vocabulary limitations. This approach enables smooth language adaptation across different scripts. Pretrained with 3.2 billion words, this model surpasses BERT in handling non-Latin scripts. Utilizing components like a text renderer, encoder, and decoder, it reconstructs images at the pixel level, enhancing syntactic and semantic tasks. Access detailed pretraining and finetuning guidelines via Hugging Face for enhanced multilingual text processing.
## FastChat
FastChat is a platform for training, serving, and evaluating chatbots using large language models. It supports various models and enhances chatbot interactions with a distributed serving system and OpenAI-style APIs. FastChat powers Chatbot Arena, handling over 10 million requests and maintaining an LLM Elo leaderboard through extensive human feedback. Recent updates, including Vicuna v1.5 and the LMSYS-Chat-1M datasets, keep FastChat aligned with modern chatbot development needs.
## matmulfreellm
Discover the MatMul-Free LM, a groundbreaking architecture that removes matrix multiplication, optimized for the Transformers library. Leveraging efficient ternary weights, it outperforms traditional models such as Transformer++ in computational efficiency. This model ranges from 370M to 2.7B parameters, ensuring easy integration with PyTorch, Triton, and einops for seamless language model deployment.
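The core trick behind ternary weights can be shown in a few lines: when every weight is constrained to {-1, 0, +1}, a matrix product collapses into additions and subtractions of inputs, with no multiplications at all. The NumPy sketch below illustrates that idea only; the repository's actual implementation uses fused GPU kernels, not Python loops.

```python
import numpy as np

def ternary_matvec(W, x):
    """Matrix-vector product for ternary W in {-1, 0, +1}:
    each output is a sum/difference of selected inputs — no multiplies."""
    out = np.zeros(W.shape[0])
    for i, row in enumerate(W):
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

W = np.array([[1, 0, -1],
              [-1, 1, 1]])
x = np.array([2.0, 3.0, 5.0])

# Matches the ordinary matmul, but computed with adds/subtracts only.
assert np.allclose(ternary_matvec(W, x), W @ x)
print(ternary_matvec(W, x))  # → [-3.  6.]
```

Because addition is far cheaper than multiplication in hardware, this is where the architecture's efficiency gains over standard Transformer layers come from.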
## core
Cheshire Cat is a framework designed to facilitate the creation of bespoke AI solutions across different language models, akin to how WordPress or Django serve web developers. It encompasses features like an API-first architecture for seamless conversational layer integration, memory retention for contextual conversations, and plugin extensibility. Compatible with Docker and an intuitive admin interface, it supports major language models such as OpenAI and Google, making it a robust choice for AI development. Discover its capabilities and connect with a vibrant community for collaboration and growth.
## gritlm
Investigate an innovative approach using Generative Representational Instruction Tuning (GRIT) to seamlessly integrate generative and embedding tasks. GritLM models lead the field on the Massive Text Embedding Benchmark, surpassing competitors in generative performance and improving Retrieval-Augmented Generation by more than 60%. Access all necessary resources to replicate study methods and participate in AI developments on this platform. Explore models, code, and extensive materials available for free on GitHub for superior text processing solutions.
## ERNIE-SDK
ERNIE SDK includes ERNIE Bot Agent and ERNIE Bot, providing a solid AI model development platform. Featuring function-calling capabilities, ERNIE Bot Agent enables seamless tool orchestration and automatic scheduling, supporting flexible integration of tools, plugins, and knowledge bases. It offers pre-set tools and no-code interfaces, making AI application development accessible and straightforward for developers. ERNIE Bot offers backend support for robust AI functions like text generation, dialogue, semantic vectors, and AI sketching, enabling efficient use of AI capabilities.
## xlang-paper-reading
XLang advances language model agents to execute real-world tasks via language instructions, enhancing interactions in databases, web applications, and robotics. Explore our curated collection of papers on LLM integration, code generation, and more, all focusing on enhancing human-computer interaction through natural interfaces.
## ai00_server
AI00 RWKV Server is an inference API server for the RWKV language model that runs on Vulkan GPUs, eliminating the need for PyTorch or CUDA. Its compact design supports AMD and integrated graphics, and it aligns with the OpenAI ChatGPT API for applications like chatbots, text generation, translation, and Q&A. Open-source under the MIT license, it offers a streamlined LLM API experience.
## Telechat
TeleChat, a semantic large language model developed by China Telecom's AI unit, includes the open-source TeleChat-1B, 7B, and 12B models, trained on vast multilingual data. TeleChat-12B features improvements in structure and training that enhance performance in areas such as Q&A, coding, and mathematics. The models support advanced deep learning techniques and excel in reasoning, understanding, and long-text generation for a range of uses.
## openlogprobs
Openlogprobs is a Python package that recovers log-probabilities from language model APIs that expose only limited output, often restricted for security reasons and response-size limits. Using logit-bias adjustments and methods such as binary search, it efficiently extracts full probability vectors from APIs like OpenAI's. It offers 'topk', 'exact', and 'binary search' modes depending on API capabilities, making it useful for researchers and developers studying model outputs. It also underpins academic work such as 'Language Model Inversion'.
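The binary-search method can be sketched end to end: add a logit bias to one token, observe whether it becomes the top token, and binary-search the bias at which it flips; that bias equals the token's logit gap to the top token, from which the full distribution follows by softmax. The mock API below simulates hidden logits so the sketch is self-contained; it is not the package's real call signature.

```python
import math

# Hidden "model" logits that the API never reveals directly.
LOGITS = {"cat": 2.0, "dog": 1.0, "fish": -1.0}

def api_top_token(bias_token=None, bias=0.0):
    """Mock chat API: returns only the argmax token, optionally after
    adding a logit bias to one token (like OpenAI's logit_bias)."""
    biased = {t: l + (bias if t == bias_token else 0.0) for t, l in LOGITS.items()}
    return max(biased, key=biased.get)

def logit_gap(token, lo=0.0, hi=100.0, iters=50):
    """Binary-search the smallest bias making `token` the top token;
    that bias equals the token's logit gap to the current top token."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if api_top_token(token, mid) == token:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# Reconstruct the full log-probability vector from the recovered gaps.
gaps = {t: logit_gap(t) for t in LOGITS}   # gap to the top logit
rel = {t: -g for t, g in gaps.items()}     # logits relative to the top token
norm = math.log(sum(math.exp(v) for v in rel.values()))
logprobs = {t: v - norm for t, v in rel.items()}
print(logprobs)
```

Each recovered gap costs a handful of API calls, which is why the package also offers cheaper 'topk' and 'exact' modes when the API already returns partial log-probabilities.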
## AutoWebGLM
AutoWebGLM enhances web navigation with the ChatGLM3-6B model, featuring HTML simplification and hybrid AI-human training for better browsing comprehension. It employs reinforcement learning to optimize real-world tasks, supported by the AutoWebBench bilingual benchmark. Open evaluation tools offer robust frameworks for testing and improving the agent's efficiency in web interactions.
## mixture-of-experts
Discover the Pytorch implementation of Sparsely Gated Mixture of Experts intended to enhance language model capacity by increasing parameters without additional computation. This version adds features to the original TensorFlow model, supporting complex architectures such as hierarchical mixtures, and enables customization of expert networks with various activation functions and gating policies. Suitable for developers who wish to scale models effectively while maintaining performance, it includes setup and usage instructions for easy integration.
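The sparsely-gated mechanism itself is compact enough to sketch: a gating network scores every expert per input, only the top-k experts are evaluated, and their outputs are combined with renormalized gate weights. The NumPy toy below illustrates that routing idea under made-up shapes; the library's real PyTorch API, noisy gating, and load-balancing terms are richer than this.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d, k = 4, 8, 2

W_gate = rng.normal(size=(d, n_experts))               # gating network
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # expert weights

def moe_forward(x):
    """Route x through only the top-k experts by gate score."""
    scores = x @ W_gate
    top = np.argsort(scores)[-k:]                      # indices of top-k experts
    gates = np.exp(scores[top] - scores[top].max())
    gates /= gates.sum()                               # softmax over chosen experts
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

y = moe_forward(rng.normal(size=d))
print(y.shape)  # only 2 of the 4 experts were evaluated
```

Because the other experts are never run, parameter count grows with `n_experts` while per-token compute stays fixed by `k`, which is the capacity-without-extra-computation property described above.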
## Baichuan-7B
This open-source project introduces a commercially viable language model with 7 billion parameters based on the transformer architecture. It's optimized for both Chinese and English, demonstrating superior performance on benchmarks like C-Eval and MMLU. With 1.2 trillion tokens and a context length of 4096, the model employs advanced tokenization to enhance language compression efficiency and computational throughput. Compatible with Hugging Face and other platforms, this project provides a comprehensive training guide.
## MING
MING is an open-source Chinese medical model that provides detailed answers to medical inquiries and features intelligent diagnostic dialogue. Developed at Shanghai Jiao Tong University, it is available in sizes from 1.8B to 14B parameters and uses state-aware simulation and decoupled clinical alignment. Only a 15GB GPU is needed for straightforward setup and usage, ensuring smooth user interaction.
## Chinese-LLaMA-Alpaca-3
Based on the latest Meta Llama-3 model, this project provides an open-source platform for crafting high-efficiency Chinese language models. The releases feature the Llama-3-Chinese base and Llama-3-Chinese-Instruct models, extensively fine-tuned on Chinese datasets to boost semantic and instruction processing. Key enhancements include a larger vocabulary, extended context length, and efficient grouped-query attention. Users can train or fine-tune with provided scripts compatible with popular frameworks like transformers. Discover next-level Chinese NLP with these cutting-edge models.
## gpt-j-api
Explore the API for seamless interaction with the GPT-J model, supporting text generation and zero-shot classification. Integration is easy with Python and Bash, and no authentication is required for endpoints. Comprehensive documentation helps utilize features like multilingual classification and token control. The API server or Streamlit app can be deployed on various infrastructures such as TPU VMs, ensuring adaptability and growth. Access resources and community support for effective project development with advanced language model functionalities.
## reverse-engineering-assistant
ReVa enhances reverse engineering by integrating a disassembler-agnostic AI assistant with a tool-driven approach, providing small tools that complement large language models (LLMs). Its advanced capabilities include chain-of-reasoning techniques and schema-directed inputs. Supporting both OpenAI and Ollama models for online and local inference, users benefit from its seamless integration with Ghidra, offering interactive 'reva-chat' sessions. ReVa breaks down complex tasks into manageable actions, reducing hallucinations and providing detailed insights into LLM reasoning.
## chatgpt-translator
This open-source application leverages ChatGPT for automatic text translations, eliminating the need to specify source languages. Compatible with macOS, Windows, and Linux, it supports various languages and allows customization of shortcut keys and API domains. Ideal for developers and translators seeking a flexible text translation solution, contributions are welcome on GitHub.
## storm
Introducing an AI system that generates Wikipedia-like articles via extensive internet research and multi-perspective questioning. Suitable for academic and editorial use, it supports both individual and collaborative processes for effective knowledge curation, with over 70,000 users engaging in its feature previews.
## MultiPLY
MultiPLY is a multisensory embodied language model that interacts with 3D objects to gather sensory information like visual, audio, tactile, and thermal inputs. It integrates this data to strengthen the relationship between language, action, and perception by encoding scenes into object-centric representations. Sensory details become apparent through agent interactions utilizing specially designed tokens, enhancing language model capabilities for better 3D interaction fidelity.
## DeepSeek-V2
DeepSeek-V2 is an advanced MoE language model featuring efficient operation with only a fraction of its total parameters engaged, leading to a 42.5% reduction in training costs and a 93.3% decrease in KV cache. Pretrained on a vast dataset and fine-tuned for excellence, it delivers superior performance on diverse benchmarks, including English and Chinese, coding, and long-form dialogue tasks. Discover innovations in its architecture and utilize it through Chat, API platforms, or local deployment for enhanced productivity.
## ice
The Interactive Composition Explorer (ICE) is a Python library enabling language model program analysis through execution trace visualization. Key features include multiple recipe modes, browser-based debugging, and language model agent creation. ICE allows parallel execution and component recipe reuse for tasks such as question-answering. Note: The API is actively evolving, may undergo changes, and is compatible with Python 3.9 and above, requiring a virtual environment and WSL for Windows.
## llama3-Chinese-chat
The project presents llama3 in Chinese, aimed at fostering Chinese LLM learning and collaboration. It welcomes community contributions through PRs to develop datasets and improve model features, and offers tutorials on model localization and deployment using tools such as LM Studio. Featuring llama3.1 Chinese DPO and comprehensive documentation, it invites users to take part in model testing and content expansion, providing an interactive platform for AI practitioners.
## DeepSeek-MoE
DeepSeekMoE 16B, built on a Mixture-of-Experts architecture, matches the performance of models like LLaMA2 7B while using only about 40% of the computation. Its Base and Chat versions support English and Chinese and can be deployed on a single GPU without quantization. It is available under specific licensing terms for research and commercial applications.