rasa_nlu_gq - Improve Natural Language Understanding through Enhanced Component Integrations

Rasa NLU GQ

Rasa NLU GQ extends Rasa NLU (Natural Language Understanding), a tool for interpreting human language. For example, Rasa NLU can take a sentence like:

"I'm looking for a Mexican restaurant in the center of town"

and extract structured data from it, such as:

  intent: search_restaurant
  entities: 
    - cuisine : Mexican
    - location : center
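As a rough illustration of how such structured data might be consumed, the sketch below builds a result dictionary for the sentence above and collects the entities into a slot mapping. The dictionary layout and the `make_entity` helper are assumptions made for this example, not a guaranteed output format of the library.

```python
# Hypothetical result layout, modeled on the intent/entities example above.
text = "I'm looking for a Mexican restaurant in the center of town"


def make_entity(text, entity, value):
    """Build an entity record with character offsets located in the text."""
    start = text.find(value)
    if start < 0:
        raise ValueError(f"{value!r} does not occur in the text")
    return {
        "entity": entity,
        "value": value,
        "start": start,
        "end": start + len(value),  # end offset is exclusive
    }


result = {
    "text": text,
    "intent": {"name": "search_restaurant"},
    "entities": [
        make_entity(text, "cuisine", "Mexican"),
        make_entity(text, "location", "center"),
    ],
}

# Collapse the entity list into a simple name -> value mapping.
slots = {e["entity"]: e["value"] for e in result["entities"]}
print(result["intent"]["name"], slots)
```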

Introduction

Rasa NLU GQ is based mainly on the latest version of Rasa. Most of the changes update components from the original rasa_nlu_gao project: rather than making complex alterations to Rasa's core code, those components are now loaded as add-ons, so developers can stay on the latest Rasa version and keep receiving its updates.

New Features

Rasa NLU GQ introduces several new features, enhancing the functionality of Rasa NLU:

Entity Recognition Models: Two new entity recognition models have been added: bilstm+crf and idcnn+crf (dilated convolution). The following YAML configuration uses the bilstm variant, as set by model_type:

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "CountVectorsFeaturizer"
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"
  - name: "rasa_nlu_gao.extractors.bilstm_crf_entity_extractor.BilstmCRFEntityExtractor"
    lr: 0.001
    char_dim: 100
    lstm_dim: 100
    batches_per_epoch: 10
    seg_dim: 20
    num_segs: 4
    batch_size: 200
    tag_schema: "iobes"
    model_type: "bilstm"
    clip: 5
    optimizer: "adam"
    dropout_keep: 0.5
    steps_check: 100
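The tag_schema setting above names the IOBES labeling scheme (Begin, Inside, End, Single, Outside). The sketch below shows how entity spans could be turned into per-character IOBES tags. Tagging at the character level and the `iobes_tags` helper are assumptions made for illustration; the extractor's own preprocessing may differ.

```python
def iobes_tags(text, spans):
    """Label every character of `text` with an IOBES tag.

    `spans` is a list of (start, end, label) tuples with an exclusive
    end offset; spans are assumed not to overlap.
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if end - start == 1:
            # A one-character entity gets the Single tag.
            tags[start] = "S-" + label
            continue
        tags[start] = "B-" + label
        for i in range(start + 1, end - 1):
            tags[i] = "I-" + label
        tags[end - 1] = "E-" + label
    return tags


# "去上海市" with the place name "上海市" at characters 1..3.
print(iobes_tags("去上海市", [(1, 4, "ns")]))
```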

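Jieba Part-of-Speech Tagging Module: This module extracts entities such as person names, place names, and organization names from Jieba's part-of-speech tags. The part_of_speech list selects which tags to extract; in Jieba's tag set, nr marks person names, ns place names, and nt organizations.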

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "CRFEntityExtractor"
  - name: "rasa_nlu_gao.extractors.jieba_pseg_extractor.JiebaPsegExtractor"
    part_of_speech: ["nr", "ns", "nt"]
  - name: "CountVectorsFeaturizer"
    OOV_token: oov
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"
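To make the part_of_speech filter concrete, here is a minimal sketch of turning tagged words into entity records. In practice the (word, tag) pairs would come from `jieba.posseg.cut`; the pairs below are written by hand so the example runs without Jieba installed, and the `extract_by_pos` helper and the record layout are assumptions, not the extractor's actual code.

```python
def extract_by_pos(tagged_words, part_of_speech):
    """Keep the words whose tag is listed in `part_of_speech`.

    `tagged_words` is a sequence of (word, tag) pairs in sentence order.
    Offsets are computed by summing word lengths, which assumes the
    words concatenate back to the original sentence.
    """
    entities = []
    offset = 0
    for word, flag in tagged_words:
        if flag in part_of_speech:
            entities.append({
                "entity": flag,
                "value": word,
                "start": offset,
                "end": offset + len(word),
            })
        offset += len(word)
    return entities


# Hand-written tags for "小明在北京工作"; real Jieba output may differ.
tagged = [("小明", "nr"), ("在", "p"), ("北京", "ns"), ("工作", "v")]
entities = extract_by_pos(tagged, ["nr", "ns", "nt"])
print(entities)
```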

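Intent Editing Based on Entities: This component modifies the predicted intent dynamically, based on the entities that were recognized. The configuration below sets it up with the entity nr and the intent enter_data: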

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "CRFEntityExtractor"
  - name: "JiebaPsegExtractor"
  - name: "CountVectorsFeaturizer"
    OOV_token: oov
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"
  - name: "rasa_nlu_gao.classifiers.entity_edit_intent.EntityEditIntent"
    entity: ["nr"]
    intent: ["enter_data"]
    min_confidence: 0
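One plausible reading of this configuration is sketched below. It assumes that the entity and intent lists are paired by position, and that the override is applied only when the classifier's confidence is at least min_confidence. Both points are assumptions made for illustration, and `edit_intent` is a hypothetical helper, not the component's real implementation.

```python
def edit_intent(intent, entities, entity_types, intents, min_confidence=0.0):
    """Return the intent, replaced if a configured entity type was found.

    Assumptions: entity_types[i] maps to intents[i], and the override is
    skipped when the current confidence is below `min_confidence`.
    """
    confidence = intent.get("confidence", 0.0)
    if confidence < min_confidence:
        return intent
    mapping = dict(zip(entity_types, intents))
    for ent in entities:
        target = mapping.get(ent["entity"])
        if target is not None:
            # The first matching entity decides the new intent name.
            return {"name": target, "confidence": confidence}
    return intent


predicted = {"name": "greet", "confidence": 0.4}
found = [{"entity": "nr", "value": "小明"}]
edited = edit_intent(predicted, found, ["nr"], ["enter_data"], 0)
print(edited["name"])
```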

BERT Model for Word Vectors: Word vector features can be extracted with a BERT model. The ip, port, and port_out settings tell the featurizer where to reach the BERT server:

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "rasa_nlu_gao.featurizers.bert_vectors_featurizer.BertVectorsFeaturizer"
    ip: '127.0.0.1'
    port: 5555
    port_out: 5556
    show_server_config: True
    timeout: 10000
  - name: "EmbeddingIntentClassifier"
  - name: "CRFEntityExtractor"

Resource Allocation for CPU and GPU: A config_proto block lets you control CPU and GPU usage per component, which matters most for the components that run on TensorFlow.

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "CountVectorsFeaturizer"
    token_pattern: '(?u)\b\w+\b'
  - name: "EmbeddingIntentClassifier"
    config_proto: {
      "device_count": 4,
      "inter_op_parallelism_threads": 0,
      "intra_op_parallelism_threads": 0,
      "allow_growth": True
    }
  - name: "rasa_nlu_gao.extractors.bilstm_crf_entity_extractor.BilstmCRFEntityExtractor"
    config_proto: {
      "device_count": 4,
      "inter_op_parallelism_threads": 0,
      "intra_op_parallelism_threads": 0,
      "allow_growth": True
    }
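As a sketch of how such a config_proto block could be mapped onto TensorFlow's session configuration, the helper below converts the four keys into keyword arguments. Treating device_count as a CPU count, the default values, and both helper names are assumptions made for this example. In TensorFlow, a thread count of 0 lets the runtime choose a suitable value.

```python
def session_options(config_proto):
    """Translate a config_proto mapping into ConfigProto-style options."""
    return {
        # Assumption: the integer is interpreted as the number of CPU devices.
        "device_count": {"CPU": int(config_proto.get("device_count", 1))},
        "inter_op_parallelism_threads": int(
            config_proto.get("inter_op_parallelism_threads", 0)),
        "intra_op_parallelism_threads": int(
            config_proto.get("intra_op_parallelism_threads", 0)),
        "gpu_options": {
            "allow_growth": bool(config_proto.get("allow_growth", True)),
        },
    }


def build_config_proto(config_proto):
    """Create a TensorFlow ConfigProto; TensorFlow is imported only here."""
    import tensorflow as tf

    options = session_options(config_proto)
    gpu_options = options.pop("gpu_options")
    config = tf.compat.v1.ConfigProto(**options)
    # Grow GPU memory on demand instead of reserving all of it up front.
    config.gpu_options.allow_growth = gpu_options["allow_growth"]
    return config


options = session_options({
    "device_count": 4,
    "inter_op_parallelism_threads": 0,
    "intra_op_parallelism_threads": 0,
    "allow_growth": True,
})
print(options)
```

Only `build_config_proto` needs TensorFlow installed; `session_options` is plain Python and can be checked on its own.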

Embedding BERT Intent Classifier: An intent classifier built on BERT embeddings is also available:

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "rasa_nlu_gao.featurizers.bert_vectors_featurizer.BertVectorsFeaturizer"
    ip: '127.0.0.1'
    port: 5555
    port_out: 5556
    show_server_config: True
    timeout: 10000
  - name: "rasa_nlu_gao.classifiers.embedding_bert_intent_classifier.EmbeddingBertIntentClassifier"
  - name: "CRFEntityExtractor"

Advanced TensorFlow API Use: A second BERT-embedding classifier is built on TensorFlow's higher-level APIs: tf.estimator, tf.data, tf.train.Example, and tf.saved_model.

  language: "zh"

  pipeline:
  - name: "JiebaTokenizer"
  - name: "rasa_nlu_gao.featurizers.bert_vectors_featurizer.BertVectorsFeaturizer"
    ip: '127.0.0.1'
    port: 5555
    port_out: 5556
    show_server_config: True
    timeout: 10000
  - name: "rasa_nlu_gao.classifiers.embedding_bert_intent_estimator_classifier.EmbeddingBertIntentEstimatorClassifier"
  - name: "SpacyNLP"
  - name: "CRFEntityExtractor"

For detailed examples, see the rasa_chatbot_cn repository.

Quick Installation

To get started with Rasa NLU GQ, install the package with pip:

  pip install rasa-nlu-gao

Additional Information

For more resources, you can check out the following external links: