open-korean-instructions

Extensive Dataset Collection for Korean Language Model Training

Product Description

This repository collects Korean-language datasets for training language models, covering both translated and GPT-generated data. It includes datasets such as KoAlpaca and DeepL translations of ShareGPT, in both single-turn and multi-turn formats. Contributions of new data via pull request are encouraged. The collection is useful for building instruction-following models and draws on sources such as Wikipedia data, ethical Q&A, and language feedback.

Project Details