Kanpyo

Kanpyo is a Japanese morphological analyzer written in Rust, inspired by ikawaha/Kagome.

Caution

This is a work in progress. The API may break without notice.

Installation

With Embedded Dictionary (Recommended)

The easiest way to install kanpyo is with the embedded dictionary. No additional setup is required.

cargo install kanpyo --features mecab-ipadic

or from git:

cargo install --git https://github.com/togatoga/kanpyo kanpyo --features mecab-ipadic

The dictionary is downloaded automatically from GitHub Releases during the build and embedded into the binary.

Without Embedded Dictionary

If you prefer a smaller binary size or want to use a custom dictionary:

cargo install kanpyo

You then need to build and install a dictionary manually. From a checkout of the repository:

cd kanpyo-dict
tar xvf resource/mecab-ipadic-2.7.0-20070801.tar.gz -C resource
cargo run --release --bin ipa-dict-builder -- --dict resource/mecab-ipadic-2.7.0-20070801

The dictionary is installed in one of the following directories, depending on the platform:

  • Linux: $HOME/.config/kanpyo/
  • macOS: $HOME/Library/Application Support/kanpyo/
  • Windows: %APPDATA%\kanpyo\
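To check where the dictionary should end up on a given machine, the table above can be sketched in Rust. This is a minimal illustration, not kanpyo's actual lookup code: `dict_dir` and `current_dict_dir` are hypothetical helpers, and the sketch assumes the paths are derived only from `HOME` (Linux, macOS) or `APPDATA` (Windows), exactly as listed.

```rust
use std::path::PathBuf;

/// Hypothetical helper: builds the dictionary directory listed in the README
/// from an OS name and a base directory (HOME on Linux/macOS, APPDATA on Windows).
fn dict_dir(os: &str, base: &str) -> Option<PathBuf> {
    let base = PathBuf::from(base);
    match os {
        "linux" => Some(base.join(".config").join("kanpyo")),
        "macos" => Some(
            base.join("Library")
                .join("Application Support")
                .join("kanpyo"),
        ),
        "windows" => Some(base.join("kanpyo")),
        // Platforms the README does not mention are left unresolved.
        _ => None,
    }
}

/// Resolves the directory for the current platform from environment variables.
/// Assumption: kanpyo itself may resolve the path differently.
fn current_dict_dir() -> Option<PathBuf> {
    let os = std::env::consts::OS;
    let var = if os == "windows" { "APPDATA" } else { "HOME" };
    let base = std::env::var(var).ok()?;
    dict_dir(os, &base)
}
```

Calling `current_dict_dir()` returns `None` when the relevant environment variable is unset or the platform is not one of the three listed.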

You're ready to use kanpyo!

Usage

kanpyo --help
Japanese Morphological Analyzer

Usage: kanpyo [COMMAND]

Commands:
  tokenize  Tokenize input text
  graphviz  Output lattice in Graphviz format
  help      Print this message or the help of the given subcommand(s)

Options:
  -h, --help     Print help
  -V, --version  Print version

Tokenize

kanpyo tokenize "すもももももももものうち"
すもも  名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も      助詞,係助詞,*,*,*,*,も,モ,モ
もも    名詞,一般,*,*,*,*,もも,モモ,モモ
も      助詞,係助詞,*,*,*,*,も,モ,モ
もも    名詞,一般,*,*,*,*,もも,モモ,モモ
の      助詞,連体化,*,*,*,*,の,ノ,ノ
うち    名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
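Each output line carries the surface form followed by comma-separated features, and `EOS` closes the sentence. If you want to consume this output from another program, a parser could look roughly like the sketch below. The `Token` struct and `parse_output` function are hypothetical names, and the sketch assumes the surface form and the feature list are separated by whitespace (a tab or spaces), which is how the output appears above.

```rust
/// One morpheme line of the output: surface form plus comma-separated features.
#[derive(Debug, PartialEq)]
struct Token {
    surface: String,
    features: Vec<String>,
}

/// Parses tokenizer output lines until the `EOS` marker.
/// Assumption: surface and features are separated by whitespace.
fn parse_output(text: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    for line in text.lines() {
        let line = line.trim();
        if line == "EOS" {
            break; // end of sentence
        }
        if line.is_empty() {
            continue;
        }
        // Split at the first whitespace: left is the surface, right the features.
        if let Some((surface, rest)) = line.split_once(char::is_whitespace) {
            tokens.push(Token {
                surface: surface.to_string(),
                features: rest.trim().split(',').map(str::to_string).collect(),
            });
        }
    }
    tokens
}
```

With the IPA dictionary output shown above, each token has nine features; the first is the part of speech (for example 名詞) and the last two are readings in katakana. The sketch stops at the first `EOS`, so it handles one sentence at a time.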

REPL mode

kanpyo
自然言語処理
自然    名詞,形容動詞語幹,*,*,*,*,自然,シゼン,シゼン
言語    名詞,一般,*,*,*,*,言語,ゲンゴ,ゲンゴ
処理    名詞,サ変接続,*,*,*,*,処理,ショリ,ショリ
EOS
形態素解析
形態素  名詞,一般,*,*,*,*,形態素,ケイタイソ,ケイタイソ
解析    名詞,サ変接続,*,*,*,*,解析,カイセキ,カイセキ
EOS

From piped standard input

echo "自然言語処理" | kanpyo
自然    名詞,形容動詞語幹,*,*,*,*,自然,シゼン,シゼン
言語    名詞,一般,*,*,*,*,言語,ゲンゴ,ゲンゴ
処理    名詞,サ変接続,*,*,*,*,処理,ショリ,ショリ
EOS
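Because kanpyo reads from standard input when run without a subcommand, it can also be driven from another Rust program through a pipe. The following is a sketch using only the standard library; `run_with_stdin` is a hypothetical helper, not part of kanpyo, and it assumes the `kanpyo` binary is on your `PATH` when you pass `"kanpyo"` as the program name.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Hypothetical helper: spawns `program`, feeds `input` to its standard input,
/// and returns whatever it printed to standard output.
fn run_with_stdin(program: &str, input: &str) -> std::io::Result<String> {
    let mut child = Command::new(program)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Write the text; the temporary stdin handle is dropped at the end of this
    // statement, which closes the pipe so the child sees end of input.
    child
        .stdin
        .take()
        .expect("stdin was configured as piped")
        .write_all(input.as_bytes())?;

    let output = child.wait_with_output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

For example, `run_with_stdin("kanpyo", "自然言語処理\n")` should return the same text as the `echo ... | kanpyo` invocation above. Writing all input before reading is fine for short texts; for large inputs, write from a separate thread to avoid filling the pipe buffer.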

Graphviz

Print the lattice in Graphviz format for debugging.

kanpyo graphviz "自然言語処理" | dot -Tpng -o lattice.png

TODO

  • Support various dictionaries (Sudachi, UniDic, NEologd, etc.)
  • Support server mode
  • Support search mode
  • Add tests for dictionary loading and tokenization