Introduction to the LTP Project
The Language Technology Platform (LTP) is a comprehensive toolset for processing Chinese natural language. It provides modules for tasks such as word segmentation, part-of-speech tagging, named entity recognition, and syntactic and semantic parsing, making it a powerful resource for researchers and developers in Natural Language Processing (NLP).
Citation
If you use LTP in your work, please cite the following paper:
@inproceedings{che-etal-2021-n,
    title = "N-{LTP}: An Open-source Neural Language Technology Platform for {C}hinese",
    author = "Che, Wanxiang and
        Feng, Yunlong and
        Qin, Libo and
        Liu, Ting",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    year = "2021",
    publisher = "Association for Computational Linguistics",
}
Key Features and Updates
Structural Changes
- LTP has been split into two parts, which simplifies both maintenance and training (see the sketch after this list).
- Legacy Models: rewritten in Rust to meet high-speed demands. They match the accuracy of earlier versions while running up to 17.17 times faster with multi-threading enabled. These models currently support only word segmentation, part-of-speech tagging, and named entity recognition.
- Deep Learning Models: built on PyTorch, these models support all six tasks: word segmentation, part-of-speech tagging, named entity recognition, semantic role labeling, dependency parsing, and semantic dependency parsing.
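To make the split concrete, here is a minimal Python sketch of loading each family; it assumes the "LTP/small" and "LTP/legacy" model names on the Huggingface Hub, and that the legacy pipeline exposes the same output fields as the neural one:
from ltp import LTP

# Deep learning model: supports all six tasks
neural = LTP("LTP/small")

# Legacy Rust-backed model: word segmentation, POS tagging, and NER only
legacy = LTP("LTP/legacy")

# The legacy pipeline accepts the same call style for its three tasks
print(legacy.pipeline(["他叫汤姆去拿外衣。"], tasks=["cws", "pos", "ner"]).cws)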
Other Improvements
- Enhanced training methodology, including training scripts that users can adapt to their own data.
- Configuration is managed with Hydra, making it easier to tweak model parameters for better performance.
- Performance-critical algorithms are implemented in Rust for faster processing.
- Models are hosted on the Huggingface Hub, enabling fast downloads; users can also upload their own models for inference (see the sketch after this list).
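Because a model name doubles as a Hub repository id, the standard huggingface_hub tooling also works; a sketch assuming the "LTP/small" repository:
from huggingface_hub import snapshot_download
from ltp import LTP

# Download the model repository once (later calls reuse the local cache),
# then load LTP from the resulting local path
path = snapshot_download("LTP/small")
ltp = LTP(path)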
New Features
- A Pipeline API that runs all requested tasks in a single prediction call (see the sketch after this list).
- Support for Huggingface-style model architectures, leveraging community-driven improvements.
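The Pipeline API runs only the tasks you request; a minimal sketch reusing the quickstart's "LTP/small" model:
from ltp import LTP

ltp = LTP("LTP/small")

# Ask for a subset of tasks; each result is a field on the returned output
out = ltp.pipeline(["他叫汤姆去拿外衣。"], tasks=["cws", "pos"])
print(out.cws)
print(out.pos)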
Quickstart Guide
Python Installation
To install LTP with pip, use one of the following approaches (the Tsinghua mirror speeds up downloads from mainland China):
# Option 1: Use Tsinghua source
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple torch transformers
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple ltp ltp-core ltp-extension
# Option 2: Set global source and install
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install torch transformers
pip install ltp ltp-core ltp-extension
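To verify the installation, you can query the installed package versions with the standard library (nothing LTP-specific is assumed here):
import importlib.metadata

# Print the installed version of each LTP distribution
for pkg in ("ltp", "ltp-core", "ltp-extension"):
    print(pkg, importlib.metadata.version(pkg))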
Usage Example
Here's a basic example showing how to run LTP's NLP tasks from Python:
import torch
from ltp import LTP

# Load the small model (downloaded from the Huggingface Hub on first use)
ltp = LTP("LTP/small")

# Move the model to the GPU if one is available
if torch.cuda.is_available():
    ltp.to("cuda")

# Register custom words so the segmenter keeps them intact
ltp.add_word("汤姆去", freq=2)
ltp.add_words(["外套", "外衣"], freq=2)

# Run all tasks: segmentation, POS, NER, SRL, dependency parsing,
# and semantic dependency parsing (tree and graph)
output = ltp.pipeline(["他叫汤姆去拿外衣。"], tasks=["cws", "pos", "ner", "srl", "dep", "sdp", "sdpg"])
print(output.cws)  # segmented words
print(output.pos)  # part-of-speech tags
print(output.ner)  # named entities
Rust Usage
For those who prefer Rust, the legacy models can be used directly through the ltp crate:
use std::fs::File;
use ltp::{CWSModel, POSModel, NERModel, ModelSerde, Format, Codec};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the word segmentation model (AVRO format, Deflate codec);
    // POSModel and NERModel are loaded the same way from their own files.
    let file = File::open("data/legacy-models/cws_model.bin")?;
    let cws: CWSModel = ModelSerde::load(file, Format::AVRO(Codec::Deflate))?;

    // Segment a sentence into words and print them
    let words = cws.predict("他叫汤姆去拿外衣。")?;
    println!("{:?}", words);
    Ok(())
}
Model Performance
LTP provides both deep learning and traditional models within a multi-task framework. The deep learning models come in several sizes (Base, Base1, Base2, Small, and Tiny) that trade accuracy for speed; for example, the largest Base-family model reaches up to 99.22% accuracy on word segmentation.
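Switching sizes is just a matter of the model name; a sketch assuming the size names map to Hub repositories the same way the quickstart's "LTP/small" does:
from ltp import LTP

# Larger models are more accurate; smaller ones load and run faster
ltp_base = LTP("LTP/base")
ltp_tiny = LTP("LTP/tiny")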
Downloading and Using Models
Models can be cloned from the Huggingface Hub with Git (over HTTP or SSH) or downloaded as compressed archives. A downloaded model is then loaded from its local path:
from ltp import LTP

# Load a model from a local directory instead of the Huggingface Hub
ltp = LTP("/path/to/base")
Building Wheel Packages
For those interested in building their own packages, the command is straightforward:
make bdist
Authors and Licensing
LTP is developed by a team led by Wanxiang Che and Yunlong Feng. It is open source and free for academic institutions; commercial use requires a licensing agreement. Work that builds on LTP should acknowledge the platform by citing the paper above.
For more specific licensing details or inquiries, contact [email protected].
Explore the innovative and versatile capabilities of LTP to enhance your Chinese NLP projects with high performance and accuracy!