The Jieba Chinese Word Segmentation Implemented in Rust
Add it to your Cargo.toml:
[dependencies]
jieba-rs = "0.10"

Then you are good to go. If you are using Rust 2015, you also have to add `extern crate jieba_rs` to your crate root.
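use jieba_rs::Jieba;

fn main() {
    // Load the embedded default dictionary.
    let jieba = Jieba::new();
    // Segment the sentence; `false` turns off HMM-based detection of unknown words.
    let words = jieba.cut("我们中出了一个叛徒", false);
    assert_eq!(words, vec!["我们", "中", "出", "了", "一个", "叛徒"]);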
}

- `default-dict` feature enables the embedded dictionary; this feature is enabled by default
- `tfidf` feature enables the TF-IDF keyword extractor
- `textrank` feature enables the TextRank keyword extractor
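
Since `default-dict` is on by default, leaving out the embedded dictionary means turning default features off. A minimal sketch of that configuration, assuming you intend to load your own dictionary at runtime (the crate exposes `Jieba::with_dict` for this; check the docs for your version before relying on it):

```toml
[dependencies]
# Assumption: with default features off, the embedded dictionary (default-dict)
# is not compiled in, so the application must supply a dictionary at runtime.
jieba-rs = { version = "0.10", default-features = false }
```

Optional features can still be listed next to `default-features = false` if you need them.
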
[dependencies]
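jieba-rs = { version = "0.10", features = ["tfidf", "textrank"] }

Run the benchmarks with:

cargo bench --all-features

Posts on benchmarking and optimization:

- jieba-rs word segmentation performance optimization notes: a 2.4x speedup (in Chinese)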
- Making jieba-rs 2.4x faster
- Optimizing jieba-rs to be 33% faster than cppjieba
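- Benchmarking the optimized jieba-rs Chinese word segmentation performance (in Simplified Chinese)
- Benchmarking the optimized jieba-rs Chinese word segmentation performance (in Traditional Chinese)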

Bindings and related projects:

- @node-rs/jieba: NodeJS binding
- jieba-php: PHP binding
- rjieba-py: Python binding
- cang-jie: Chinese tokenizer for tantivy
- tantivy-jieba: an adapter that bridges between tantivy and jieba-rs
- jieba-wasm: the WebAssembly binding

This work is released under the MIT license. A copy of the license is provided in the LICENSE file.