Introduction to MiNLP
MiNLP is Xiaomi's natural language processing (NLP) platform. It consists of modules covering lexical, syntactic, and semantic analysis, and is widely used across the company's operations.
MiNLP-Tokenizer
MiNLP-Tokenizer is a Chinese word segmentation tool that has been continuously optimized and refined through real-world use. Xiaomi released it as an open-source project in November 2020, so that a broader developer audience can use it and contribute to its development.
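To make the task concrete, here is a minimal sketch of what Chinese word segmentation does. It uses forward maximum matching against a tiny dictionary, a classic baseline chosen here only for illustration. It is not MiNLP-Tokenizer's algorithm or API, and the names `forward_max_match` and `TOY_VOCAB` are hypothetical.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right segmentation (illustrative baseline only).

    At each position, take the longest substring found in `vocab`;
    fall back to a single character when nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary; a real segmenter relies on far richer resources.
TOY_VOCAB = {"今天", "天气", "怎么样"}

print(forward_max_match("今天天气怎么样", TOY_VOCAB))
# ['今天', '天气', '怎么样']
```

A dictionary-only approach like this cannot resolve ambiguous boundaries or unseen words, which is the kind of problem a production segmenter has to address.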
Future Plans for Lexical Tools
Xiaomi planned to open-source its full suite of lexical tools, including part-of-speech tagging and named entity recognition, by the second quarter of 2021. From the third quarter of 2021, the company aimed to progressively open-source its syntactic analysis tools and some of its semantic analysis tools. The goal was to build a powerful, leading-edge NLP platform together with the developer community.
Structured Parsing with Duckling-Fork-Chinese
Xiaomi has also developed Duckling-Fork-Chinese, a fork of Facebook's Duckling for the Java Virtual Machine (JVM). The tool converts text into structured data and is used mainly to parse numbers and time expressions. Within Xiaomi's "Xiao Ai" environment, it is used extensively to interpret such expressions as structured data.
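As a rough illustration of turning text into structured data, the sketch below applies one hand-written rule that recognizes clock times written as "<hour>点" or "<hour>点<minute>分". It is a toy under stated assumptions, not the Duckling-Fork-Chinese API: the function `parse_times`, the rule `TIME_RULE`, and the output field names are all hypothetical.

```python
import re

# Hypothetical toy rule: "<hour>点" optionally followed by "<minute>分".
TIME_RULE = re.compile(r"(\d{1,2})点(?:(\d{1,2})分)?")


def parse_times(text):
    """Return one structured record per clock time found in `text`."""
    results = []
    for match in TIME_RULE.finditer(text):
        hour = int(match.group(1))
        minute = int(match.group(2)) if match.group(2) else 0
        # Discard matches that are not valid clock times.
        if hour < 24 and minute < 60:
            results.append({
                "dim": "time",            # kind of entity (illustrative name)
                "body": match.group(0),   # matched surface text
                "start": match.start(),   # character offsets in the input
                "end": match.end(),
                "value": {"hour": hour, "minute": minute},
            })
    return results


for record in parse_times("提醒我8点30分开会"):
    print(record)
```

A real rule-based parser combines many such rules and also handles Chinese numerals, relative dates, and ambiguity, none of which this single pattern attempts.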
Modules of MiNLP
- Chinese Word Segmentation: MiNLP-Tokenizer is currently available and can be explored through the MiNLP-Tokenizer portal.
- Part-of-Speech Tagging: Under development; availability to be announced.
- Named Entity Recognition: Coming soon.
- Dependency Parsing: Upcoming release.
- Structured Parsing: duckling-fork-chinese is available now.
MiNLP is an ambitious initiative by Xiaomi to advance the capabilities of natural language processing, offering a comprehensive suite of tools designed to assist developers in creating efficient, high-performance NLP applications. Through open-source collaboration, MiNLP not only enhances its own platform but also contributes to the global NLP ecosystem.