Rasa NLU GQ
Rasa NLU GQ is a project that extends Rasa NLU (Natural Language Understanding), a tool for interpreting human language. For example, it can take a sentence like:
"I'm looking for a Mexican restaurant in the center of town"
and extract structured data from it, such as:
intent: search_restaurant
entities:
- cuisine : Mexican
- location : center
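As a rough illustration, the structured result above could be represented in Python as a plain dictionary. This is only a sketch: the helper names `make_entity` and `build_parse_result` are hypothetical, and the `text`, `start`, and `end` fields are assumptions added for illustration, so the exact layout Rasa returns may differ.

```python
import json

TEXT = "I'm looking for a Mexican restaurant in the center of town"


def make_entity(text, name, value):
    """Build one entity record, locating the value's character span in the text."""
    start = text.find(value)
    return {"entity": name, "value": value, "start": start, "end": start + len(value)}


def build_parse_result(text, intent, entities):
    """Assemble a parse result in an illustrative (assumed) dictionary layout."""
    return {
        "text": text,
        "intent": {"name": intent},
        "entities": [make_entity(text, name, value) for name, value in entities],
    }


result = build_parse_result(
    TEXT,
    "search_restaurant",
    [("cuisine", "Mexican"), ("location", "center")],
)
print(json.dumps(result, indent=2))
```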
Introduction
Rasa NLU GQ is based on the latest version of Rasa. Most of the changes update components from the original rasa_nlu_gao project. Instead of making complex alterations to Rasa's core code, the components from the earlier version are now loaded as add-ons, so developers can stay on the latest Rasa release and keep receiving its updates.
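The pipeline configurations in this README refer to add-on components by a full module path, such as rasa_nlu_gao.extractors.bilstm_crf_entity_extractor.BilstmCRFEntityExtractor. The sketch below shows the general technique a loader could use to turn such a dotted path into a class. It is not Rasa's actual loader, and `load_class` is a hypothetical helper; it is demonstrated with a standard-library class so it runs without Rasa installed.

```python
import importlib


def load_class(dotted_path):
    """Import the module part of a dotted path and return the named attribute."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(
            f"expected 'package.module.ClassName', got {dotted_path!r}"
        )
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Stand-in for an add-on component path; any importable class works the same way.
ordered_dict_cls = load_class("collections.OrderedDict")
print(ordered_dict_cls.__name__)
```

Resolving components by path is what lets an add-on package live outside Rasa's own source tree: only the configuration needs to know where the class is.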
New Features
Rasa NLU GQ introduces several new features, enhancing the functionality of Rasa NLU:
- Entity Recognition Models: Two new entity recognition models have been added: bilstm+crf and idcnn+crf (dilated convolution). They are configured in the pipeline yml file as follows. The token_pattern regex is single-quoted so that YAML does not treat \b and \w as escape sequences.
    language: "zh"
    pipeline:
    - name: "JiebaTokenizer"
    - name: "CountVectorsFeaturizer"
      token_pattern: '(?u)\b\w+\b'
    - name: "EmbeddingIntentClassifier"
    - name: "rasa_nlu_gao.extractors.bilstm_crf_entity_extractor.BilstmCRFEntityExtractor"
      lr: 0.001
      char_dim: 100
      lstm_dim: 100
      batches_per_epoch: 10
      seg_dim: 20
      num_segs: 4
      batch_size: 200
      tag_schema: "iobes"
      model_type: "bilstm"
      clip: 5
      optimizer: "adam"
      dropout_keep: 0.5
      steps_check: 100
- Jieba Part-of-Speech Tagging Module: This module extracts person names, place names, organization names, and similar entities using the part-of-speech tags that Jieba supports. The tags to extract are listed in part_of_speech.
    language: "zh"
    pipeline:
    - name: "JiebaTokenizer"
    - name: "CRFEntityExtractor"
    - name: "rasa_nlu_gao.extractors.jieba_pseg_extractor.JiebaPsegExtractor"
      part_of_speech: ["nr", "ns", "nt"]
    - name: "CountVectorsFeaturizer"
      OOV_token: oov
      token_pattern: '(?u)\b\w+\b'
    - name: "EmbeddingIntentClassifier"
- Intent Editing Based on Entities: This component modifies the predicted intent based on the entities that were recognized.
    language: "zh"
    pipeline:
    - name: "JiebaTokenizer"
    - name: "CRFEntityExtractor"
    - name: "JiebaPsegExtractor"
    - name: "CountVectorsFeaturizer"
      OOV_token: oov
      token_pattern: '(?u)\b\w+\b'
    - name: "EmbeddingIntentClassifier"
    - name: "rasa_nlu_gao.classifiers.entity_edit_intent.EntityEditIntent"
      entity: ["nr"]
      intent: ["enter_data"]
      min_confidence: 0
- BERT Model for Word Vectors: Word vector features can be extracted with a BERT model:
    language: "zh"
    pipeline:
    - name: "JiebaTokenizer"
    - name: "rasa_nlu_gao.featurizers.bert_vectors_featurizer.BertVectorsFeaturizer"
      ip: '127.0.0.1'
      port: 5555
      port_out: 5556
      show_server_config: True
      timeout: 10000
    - name: "EmbeddingIntentClassifier"
    - name: "CRFEntityExtractor"
- Resource Allocation for CPU and GPU: The config_proto option controls how much CPU and GPU a component may use. This matters mainly for components built on TensorFlow.
    language: "zh"
    pipeline:
    - name: "JiebaTokenizer"
    - name: "CountVectorsFeaturizer"
      token_pattern: '(?u)\b\w+\b'
    - name: "EmbeddingIntentClassifier"
      config_proto: {
        "device_count": 4,
        "inter_op_parallelism_threads": 0,
        "intra_op_parallelism_threads": 0,
        "allow_growth": True
      }
    - name: "rasa_nlu_gao.extractors.bilstm_crf_entity_extractor.BilstmCRFEntityExtractor"
      config_proto: {
        "device_count": 4,
        "inter_op_parallelism_threads": 0,
        "intra_op_parallelism_threads": 0,
        "allow_growth": True
      }
- Embedding BERT Intent Classifier: An intent classifier that works on BERT features:
    language: "zh"
    pipeline:
    - name: "JiebaTokenizer"
    - name: "rasa_nlu_gao.featurizers.bert_vectors_featurizer.BertVectorsFeaturizer"
      ip: '127.0.0.1'
      port: 5555
      port_out: 5556
      show_server_config: True
      timeout: 10000
    - name: "rasa_nlu_gao.classifiers.embedding_bert_intent_classifier.EmbeddingBertIntentClassifier"
    - name: "CRFEntityExtractor"
- Advanced TensorFlow API Use: A classifier based on BERT embeddings that is built with TensorFlow's higher-level APIs: tf.estimator, tf.data, tf.example, and tf.saved_model.
    language: "zh"
    pipeline:
    - name: "JiebaTokenizer"
    - name: "rasa_nlu_gao.featurizers.bert_vectors_featurizer.BertVectorsFeaturizer"
      ip: '127.0.0.1'
      port: 5555
      port_out: 5556
      show_server_config: True
      timeout: 10000
    - name: "rasa_nlu_gao.classifiers.embedding_bert_intent_estimator_classifier.EmbeddingBertIntentEstimatorClassifier"
    - name: "SpacyNLP"
    - name: "CRFEntityExtractor"
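The entity recognition configuration in the first feature sets tag_schema: "iobes". Assuming this refers to the standard IOBES tagging scheme (Begin, Inside, Outside, End, Single), the sketch below shows how IOB2 tags map onto it. The function `iob_to_iobes` and the sample tags are hypothetical; the extractor's own conversion code may differ.

```python
def iob_to_iobes(tags):
    """Convert a sequence of IOB2 tags (B-X, I-X, O) to IOBES tags."""
    iobes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            iobes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The span continues only if the next tag is I- with the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            # A B- tag with no continuation is a single-token entity.
            iobes.append(tag if continues else f"S-{label}")
        elif prefix == "I":
            # An I- tag with no continuation closes the entity.
            iobes.append(tag if continues else f"E-{label}")
        else:
            raise ValueError(f"unexpected tag: {tag!r}")
    return iobes


tags = ["O", "B-cuisine", "O", "B-location", "I-location", "I-location"]
print(iob_to_iobes(tags))
# ['O', 'S-cuisine', 'O', 'B-location', 'I-location', 'E-location']
```

Marking span ends and single-token entities explicitly gives a CRF layer more informative transitions to learn than plain IOB tags do.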
For detailed examples, you can explore the rasa_chatbot_cn repository.
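To make the entity-based intent editing feature more concrete, here is a minimal sketch of one plausible reading of its entity, intent, and min_confidence options. Everything about the logic is an assumption, not the component's confirmed behavior: the sketch pairs entity[i] with intent[i], applies the override only when the classifier's confidence is at or below min_confidence, and gives the replacement intent a confidence of 1.0. The function `edit_intent` is hypothetical.

```python
def edit_intent(intent, entities, entity_names, intent_names, min_confidence):
    """Return a replacement intent if a listed entity is present, else the original.

    Assumptions: entity_names[i] maps to intent_names[i], and the override
    applies only when the classifier's confidence is at or below min_confidence.
    """
    if intent["confidence"] > min_confidence:
        return intent
    for found in entities:
        if found["entity"] in entity_names:
            index = entity_names.index(found["entity"])
            # Assumed: the edited intent is reported with full confidence.
            return {"name": intent_names[index], "confidence": 1.0}
    return intent


entities = [{"entity": "nr", "value": "张三"}]

low = {"name": "greet", "confidence": 0.0}
print(edit_intent(low, entities, ["nr"], ["enter_data"], 0)["name"])   # enter_data

high = {"name": "greet", "confidence": 0.9}
print(edit_intent(high, entities, ["nr"], ["enter_data"], 0)["name"])  # greet
```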
Quick Installation
To get started with Rasa NLU GQ, you can install the package with pip:
pip install rasa-nlu-gao
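After installing, one way to confirm that the package is visible to the current Python environment is to query its metadata. This is a small sketch using the standard library (Python 3.8 or later); `installed_version` is a hypothetical helper, and the distribution name is taken from the pip command above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("rasa-nlu-gao")
print(version if version else "rasa-nlu-gao is not installed")
```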
Additional Information
For more resources, you can check out the following external links: