Awesome Korean LLM
The Awesome Korean LLM repository is a curated list of open-source Large Language Models (LLMs) focused on Korean. This collection offers a comprehensive overview of models designed to support the development and deployment of Korean language processing tools.
Overview of Korean Open-Source LLMs
This list is a goldmine for developers and researchers interested in Korean Natural Language Processing (NLP). It includes a wide array of models that differ in terms of size, creator, base model, commercial usability, and available weights.
Featured Models:
- Polyglot-Ko:
- Sizes: 1.3B, 3.8B, 5.8B, 12.8B
- Creator: EleutherAI
- Base Model: GPT-NeoX
- Commercial Use: Allowed
- Weight Availability: Huggingface
- KoAlpaca:
- Sizes: 7B, 13B, 30B, 65B (LLaMA-based) / 5.8B, 12.8B (Polyglot-Ko-based)
- Creator: beomi
- Base Models: LLaMA / Polyglot-Ko
- Commercial Use: Not allowed for the LLaMA-based variants; allowed for the Polyglot-Ko-based variants
- Weight Availability: Huggingface
- KuLLM:
- Sizes: 5.8B, 12.8B
- Creator: Korea University NLP & AI Lab
- Base Model: Polyglot-Ko
- Commercial Use: Allowed
- Weight Availability: Huggingface
...and many more, including models like KORani, K(G)OAT, KoVicuna, and Kollama among others.
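The entries above can also be treated as structured data. Below is a minimal Python sketch of querying the list programmatically; the metadata is transcribed from the entries above, while the dictionary layout and the `commercially_usable` helper are illustrative and not part of the repository itself.

```python
# Illustrative metadata transcribed from the featured-models list above.
MODELS = [
    {"name": "Polyglot-Ko", "sizes": ["1.3B", "3.8B", "5.8B", "12.8B"],
     "base": "GPT-NeoX", "commercial": True},
    {"name": "KoAlpaca (Polyglot-Ko)", "sizes": ["5.8B", "12.8B"],
     "base": "Polyglot-Ko", "commercial": True},
    {"name": "KoAlpaca (LLaMA)", "sizes": ["7B", "13B", "30B", "65B"],
     "base": "LLaMA", "commercial": False},
    {"name": "KuLLM", "sizes": ["5.8B", "12.8B"],
     "base": "Polyglot-Ko", "commercial": True},
]

def commercially_usable(models):
    """Return the names of models whose license permits commercial use."""
    return [m["name"] for m in models if m["commercial"]]

print(commercially_usable(MODELS))
```

Since the weights are hosted on Hugging Face, any entry can then be loaded with the `transformers` library, e.g. `AutoModelForCausalLM.from_pretrained("EleutherAI/polyglot-ko-1.3b")` (model download required).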
Categories of Korean LLM Models
The models are categorized based on their underlying architecture, providing users with a clearer understanding of their technological foundation.
1. Polyglot-Ko Based Models:
- Models like KoAlpaca, KuLLM, and KORani are fine-tuned from Polyglot-Ko, whose pretraining on a large Korean corpus makes these models well suited to Korean language tasks.
2-1. Llama Based Models:
- This category includes models like Kollama and KORani, which build on Meta's LLaMA architecture.
2-2. Llama-2 Based Models:
- Models such as Llama-2-Ko offer multiple variants tailored to different computational needs and commercial-use requirements.
3. Other Models:
- This includes an assortment of models like KoVicuna, KoGPT2, and KoreanLM, each with its unique features and specialties in processing Korean text.
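Many of the instruction-tuned models above (KoAlpaca, KuLLM, KoVicuna) follow an Alpaca-style prompt format when queried. The sketch below shows the general pattern; the exact Korean field names are an assumption for illustration and may differ per model, so consult each model's card before use.

```python
def build_prompt(instruction: str, context: str = "") -> str:
    """Format an Alpaca-style instruction prompt.

    The Korean section headers below are assumed for illustration;
    actual fine-tuned models may use different templates.
    """
    if context:
        return f"### 질문: {instruction}\n\n### 맥락: {context}\n\n### 답변:"
    return f"### 질문: {instruction}\n\n### 답변:"

print(build_prompt("한국의 수도는 어디인가요?"))
```

Matching the template a model was fine-tuned with matters: instruction-tuned models typically degrade noticeably when prompted in a different format.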
Contributing to the Awesome Korean LLM
The project encourages contributions from anyone in the community who identifies errors or wants to add new Korean LLMs to the list. Contributions are facilitated through pull requests on the project's GitHub repository, fostering a collaborative and ever-evolving resource pool for Korean NLP enthusiasts and professionals.
Conclusion
The Awesome Korean LLM repository acts as a pivotal reference point for developers and researchers working with Korean text. By offering diverse model options along with information on their licensing and technical specifications, it significantly advances the field of Korean language AI. It's a must-visit for anyone looking to expand their NLP capabilities with a focus on the Korean language.