Named Entity Recognition for Chatbots
Chatbot NER (Named Entity Recognition) is an open-source framework that identifies entities in text messages for conversational AI applications. Developed by Haptik, it is intended to improve how chatbots interpret user messages, particularly in Indian languages. It supports entity recognition in English, Hindi, Gujarati, Marathi, Bengali, and Tamil, as well as in mixed forms of these languages.
Key Features
Multi-language Support
Chatbot NER is built with a focus on Indian languages, making it one of the few frameworks that support this degree of linguistic diversity in conversational contexts. It currently handles English and several Indian languages, including Hindi and Tamil, with ongoing efforts to include more regional dialects.
Entity Detection Capabilities
The framework detects a variety of entities, including times, dates, numbers, phone numbers, and email addresses. It combines common patterns with NLP (Natural Language Processing) techniques, so it can identify entities even when little training data is available, as is often the case in conversational AI.
Supported Entity Types and Languages
- Time: Identifies time expressions in a conversation, in all of the languages listed above.
- Date: Extracts date information, which is useful for setting reminders or scheduling.
- Numbers: Detects numerical information such as prices or quantities.
- Phone Numbers and Emails: Recognizes contact information to facilitate quick actions.
- Text: Custom text entity recognition using a database search or contextual models, though the latter is currently restricted to English.
- PNR Codes and Regex Patterns: Captures PNR (Passenger Name Record) codes or matches regular-expression patterns to extract specific information, mostly in English.
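To make the pattern-based entries in the list above more concrete, the following is a minimal sketch of regex-driven detection for emails and phone numbers. It is not the framework's own code: the function name `detect_contacts`, both regular expressions, and the assumption of 10-digit Indian mobile numbers with an optional +91 or 0 prefix are illustrative choices made for this example.

```python
import re

# Illustrative patterns only; they are assumptions for this sketch,
# not the rules used by Chatbot NER itself.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumes a 10-digit mobile number starting with 6-9, optionally prefixed
# by +91 or 0. The lookarounds reject digits embedded in longer numbers.
PHONE_RE = re.compile(r"(?<!\d)(?:\+91[\s-]?|0)?([6-9]\d{9})(?!\d)")


def detect_contacts(message: str) -> dict:
    """Return the email addresses and phone numbers found in a message."""
    return {
        "email": EMAIL_RE.findall(message),
        # findall returns only the captured 10-digit part, without the prefix.
        "phone_number": PHONE_RE.findall(message),
    }


if __name__ == "__main__":
    print(detect_contacts("Call me on +91 9876543210 or mail ravi@example.com"))
```

A production detector would need broader number formats and locale-specific rules; the sketch only shows why pattern matching works without any training data.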
Framework Architecture
Chatbot NER is organized into four primary categories:
- Numeral: Handles number-related entities such as budgets or sizes.
- Pattern: Uses regular expressions for detecting entities like phone numbers or email addresses.
- Temporal: Deals with time and date-related entities.
- Textual: Uses dictionaries and contextual models for identifying text-related entities such as cities or user locations.
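The four categories above can be pictured as a small registry that routes a message to one detector per category. The sketch below is an assumption-laden illustration, not the framework's actual module layout or API: every function name, the `DETECTORS` table, the toy `CITIES` dictionary, and the simplified rules (24-hour HH:MM times only, a basic email pattern) are hypothetical.

```python
import re
from typing import Callable, Dict, List

# Toy dictionary standing in for a database-backed list of values (hypothetical).
CITIES = {"mumbai", "delhi", "pune"}


def detect_numeral(message: str) -> List[str]:
    """Numeral: standalone integers or decimals, e.g. a budget or a size."""
    return re.findall(r"\b\d+(?:\.\d+)?\b", message)


def detect_pattern(message: str) -> List[str]:
    """Pattern: regex-defined entities; here, a simple email pattern."""
    return re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", message)


def detect_temporal(message: str) -> List[str]:
    """Temporal: 24-hour HH:MM times only, standing in for full time/date parsing."""
    return re.findall(r"\b(?:[01]?\d|2[0-3]):[0-5]\d\b", message)


def detect_textual(message: str) -> List[str]:
    """Textual: dictionary lookup of each word against a known list of values."""
    words = re.findall(r"[a-z]+", message.lower())
    return [word for word in words if word in CITIES]


# One detector per category; the caller asks for the category it needs.
DETECTORS: Dict[str, Callable[[str], List[str]]] = {
    "numeral": detect_numeral,
    "pattern": detect_pattern,
    "temporal": detect_temporal,
    "textual": detect_textual,
}


def detect(category: str, message: str) -> List[str]:
    """Route a message to the detector registered for the requested category."""
    detector = DETECTORS.get(category)
    if detector is None:
        raise ValueError(f"unknown category: {category}")
    return detector(message)


if __name__ == "__main__":
    message = "Book a cab at 18:30 from Pune"
    print(detect("temporal", message))
    print(detect("textual", message))
```

Keeping each category behind the same call signature means a new detector can be registered without changing the calling code, which is one plausible reason for grouping entities this way.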
Installation and API Usage
Docker-based installation instructions are available, which simplifies setting up Chatbot NER for various applications. The API is designed primarily for conversational AI systems, though it is general enough for broader use.
Contribution and Future Work
The Chatbot NER project is continuously evolving. Haptik encourages community involvement, inviting contributors to enhance the framework by adding training data or creating new detection patterns. Future plans include addressing architectural constraints to facilitate the integration of machine learning models and expanding the scope to incorporate new entities.
Contributors can refer to the project's guidelines on how to submit their contributions, which keeps collaboration consistent and benefits the broader user community.
Conclusion
Chatbot NER addresses the specific needs of conversational AI in linguistically diverse environments. Its entity detection capabilities, multi-language support, and open invitation for contributions help chatbots reach users across different language communities.