Introduction to the LTP Project
The Language Technology Platform (LTP) is a comprehensive toolset for processing Chinese natural language. It provides modules for tasks such as word segmentation, part-of-speech tagging, named entity recognition, and syntactic and semantic parsing, making it a powerful resource for researchers and developers in Natural Language Processing (NLP).
Citation
If you use LTP in your work, please cite the following paper:
@inproceedings{che-etal-2021-n,
    title = "N-{LTP}: An Open-source Neural Language Technology Platform for {C}hinese",
    author = "Che, Wanxiang and
        Feng, Yunlong and
        Qin, Libo and
        Liu, Ting",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    year = "2021",
    publisher = "Association for Computational Linguistics",
}
Key Features and Updates
Structural Changes
- LTP has been split into two parts, which simplifies both maintenance and training (see the sketch after this list).
- Legacy Models: rewritten in Rust to meet high-speed demands. They match the accuracy of earlier versions while running up to 17.17 times faster with multi-threading enabled. These models currently support only word segmentation, part-of-speech tagging, and named entity recognition.
- Deep Learning Models: built on PyTorch, these models support all six tasks: word segmentation, part-of-speech tagging, named entity recognition, semantic role labeling, dependency parsing, and semantic dependency parsing.
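To make the split concrete, here is a minimal Python sketch of loading each family; it assumes the "LTP/small" and "LTP/legacy" model names on the Huggingface Hub, and that the legacy pipeline exposes the same output fields as the neural one:
from ltp import LTP

# Deep learning model: supports all six tasks
neural = LTP("LTP/small")

# Legacy Rust-backed model: word segmentation, POS tagging, and NER only
legacy = LTP("LTP/legacy")

# The legacy pipeline accepts the same call style for its three tasks
print(legacy.pipeline(["他叫汤姆去拿外衣。"], tasks=["cws", "pos", "ner"]).cws)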
Other Improvements
- Enhanced training methodology, including training scripts that users can adapt to their own data.
- Configuration is managed with Hydra, making it easier to tweak model parameters for better performance.
- Performance-critical algorithms are implemented in Rust for faster processing.
- Models are hosted on the Huggingface Hub, enabling fast downloads; users can also upload their own models for inference (see the sketch after this list).
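Because a model name doubles as a Hub repository id, the standard huggingface_hub tooling also works; a sketch assuming the "LTP/small" repository:
from huggingface_hub import snapshot_download
from ltp import LTP

# Download the model repository once (later calls reuse the local cache),
# then load LTP from the resulting local path
path = snapshot_download("LTP/small")
ltp = LTP(path)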
New Features
- A Pipeline API that runs all requested tasks in a single prediction call (see the sketch after this list).
- Support for Huggingface-style model architectures, leveraging community-driven improvements.
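The Pipeline API runs only the tasks you request; a minimal sketch reusing the quickstart's "LTP/small" model:
from ltp import LTP

ltp = LTP("LTP/small")

# Ask for a subset of tasks; each result is a field on the returned output
out = ltp.pipeline(["他叫汤姆去拿外衣。"], tasks=["cws", "pos"])
print(out.cws)
print(out.pos)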
Quickstart Guide
Python Installation
To install LTP with pip, use one of the following approaches (the Tsinghua mirror speeds up downloads from mainland China):
# Option 1: Use Tsinghua source
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple torch transformers
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple ltp ltp-core ltp-extension
# Option 2: Set global source and install
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install torch transformers
pip install ltp ltp-core ltp-extension
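To verify the installation, you can query the installed package versions with the standard library (nothing LTP-specific is assumed here):
import importlib.metadata

# Print the installed version of each LTP distribution
for pkg in ("ltp", "ltp-core", "ltp-extension"):
    print(pkg, importlib.metadata.version(pkg))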
Usage Example
Here's a basic example showing how to run LTP's NLP tasks from Python:
import torch
from ltp import LTP

# Load the small model (downloaded from the Huggingface Hub on first use)
ltp = LTP("LTP/small")

# Move the model to the GPU if one is available
if torch.cuda.is_available():
    ltp.to("cuda")

# Register custom words so the segmenter keeps them intact
ltp.add_word("汤姆去", freq=2)
ltp.add_words(["外套", "外衣"], freq=2)

# Run all tasks: segmentation, POS, NER, SRL, dependency parsing,
# and semantic dependency parsing (tree and graph)
output = ltp.pipeline(["他叫汤姆去拿外衣。"], tasks=["cws", "pos", "ner", "srl", "dep", "sdp", "sdpg"])
print(output.cws)  # segmented words
print(output.pos)  # part-of-speech tags
print(output.ner)  # named entities
Rust Usage
For those who prefer Rust, the legacy models can be used directly through the ltp crate:
use std::fs::File;
use ltp::{CWSModel, POSModel, NERModel, ModelSerde, Format, Codec};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the word segmentation model (AVRO format, Deflate codec);
    // POSModel and NERModel are loaded the same way from their own files.
    let file = File::open("data/legacy-models/cws_model.bin")?;
    let cws: CWSModel = ModelSerde::load(file, Format::AVRO(Codec::Deflate))?;

    // Segment a sentence into words and print them
    let words = cws.predict("他叫汤姆去拿外衣。")?;
    println!("{:?}", words);
    Ok(())
}
Model Performance
LTP provides both deep learning and traditional models within a multi-task framework. The deep learning models come in several sizes (Base, Base1, Base2, Small, and Tiny) that trade accuracy for speed; for example, the largest Base-family model reaches up to 99.22% accuracy on word segmentation.
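Switching sizes is just a matter of the model name; a sketch assuming the size names map to Hub repositories the same way the quickstart's "LTP/small" does:
from ltp import LTP

# Larger models are more accurate; smaller ones load and run faster
ltp_base = LTP("LTP/base")
ltp_tiny = LTP("LTP/tiny")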
Downloading and Using Models
Models can be cloned from the Huggingface Hub with Git (over HTTP or SSH) or downloaded as compressed archives. A downloaded model is then loaded from its local path:
from ltp import LTP

# Load a model from a local directory instead of the Huggingface Hub
ltp = LTP("/path/to/base")
Building Wheel Packages
For those interested in building their own packages, the command is straightforward:
make bdist
Authors and Licensing
LTP is developed by a team led by Wanxiang Che and Yunlong Feng. It is open source and free for academic institutions; commercial use requires a licensing agreement. Work that builds on LTP should acknowledge the platform by citing the paper above.
For more specific licensing details or inquiries, contact [email protected].
Explore the innovative and versatile capabilities of LTP to enhance your Chinese NLP projects with high performance and accuracy!