dmgerman/yomikun

Yomikun — Japanese Reading Assistant for Emacs

Yomikun brings Yomichan/Migaku-style reading features to Emacs for learners of Japanese.

  • Tokenize Japanese text using mecab (supports UniDic and IPAdic dictionaries)
  • Color-code words by grammatical type (nouns, verbs, particles, etc.)
  • Track learning status of words: known, unknown, learning, ignored
  • Look up definitions via tooltip (using myougiden)
  • Look up kanji information (using jamdict)
  • Detect compound terms
  • Jump to reference sites (jisho.org, kanjidamage.com)

See a demo here.

./screenshot.png

Quick Start

  1. Install the requirements (see below)
  2. Add to your Emacs config:
(use-package emacsql :straight t)
(use-package pos-tip :straight t)
(use-package yomikun
  :straight nil
  :load-path "~/.emacs.d/modules/yomikun"
  :config
  (setq yk-mecab-binary "/opt/homebrew/bin/mecab")   ;; or your mecab path
  (setq yk-mecab-dict-dir nil)                        ;; nil = use default dictionary
  (setq yk-mecab-dict-type 'ipadic)                   ;; or 'unidic
  (setq yk-db-status-file "~/jp-status.db")
  (setq yk-db-dict-file "~/dictionary.db"))
  3. Open a Japanese text file
  4. M-x yk-minor-mode: parses the buffer and activates keybindings

Requirements

Mecab

Yomikun uses mecab for morphological analysis. Two dictionary formats are supported:

IPAdic (easiest)

brew install mecab mecab-ipadic

Then set:

(setq yk-mecab-binary "mecab")
(setq yk-mecab-dict-type 'ipadic)

UniDic (more detailed tokenization)

Download the dictionary from the MecabUnidic releases. UniDic needs the Homebrew mecab binary together with the UniDic dictionary files, so point Yomikun at both:

(setq yk-mecab-binary "/opt/homebrew/bin/mecab")
(setq yk-mecab-dict-dir "/path/to/MecabUnidic/support")
(setq yk-mecab-dict-type 'unidic)

Verifying your setup

Run M-x yk-doctor to check that mecab, the dictionary, and the databases are all configured correctly.
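
If yk-doctor itself is not available yet (for example, the package failed to load), a rough manual check can be sketched as below. This is not how yk-doctor works internally; my/yk-quick-check is a hypothetical helper that only tests whether the paths stored in yk-mecab-binary, yk-db-status-file, and yk-db-dict-file can be found.

```elisp
;; Hypothetical helper, not part of Yomikun. It only checks that the
;; configured paths exist; `yk-doctor' remains the real diagnostic.
(defun my/yk-quick-check ()
  "Report whether the mecab binary and the database files can be found."
  (interactive)
  (let ((binary (bound-and-true-p yk-mecab-binary))
        (status (bound-and-true-p yk-db-status-file))
        (dict   (bound-and-true-p yk-db-dict-file)))
    (message "mecab: %s | status db: %s | dictionary db: %s"
             ;; `executable-find' searches `exec-path' for the binary.
             (if (and binary (executable-find binary)) "found" "missing")
             (if (and status (file-readable-p (expand-file-name status)))
                 "found" "missing")
             (if (and dict (file-readable-p (expand-file-name dict)))
                 "found" "missing"))))
```

Run it with M-x my/yk-quick-check. An unset variable is reported as "missing".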

Myougiden (dictionary lookups)

pip install myougiden

After installation, download the dictionary database as described on the myougiden page.

Verify:

myougiden --human お願い

Jamdict (kanji information)

pip install jamdict jamdict-data

After installation, verify the database location:

python3 -m jamdict info

Look for the Jamdict DB location line. If it says [OK], you’re set.

If JAMDICT_HOME shows [Missing], create the directory and config:

python3 -m jamdict config

The kanji-dict.py script (in ./other/kanji/) needs the database path. If jamdict installed the DB in a non-standard location (common with pip --user installs), edit the path in kanji-dict.py or set:

(setq yk-kanji-dict-command '("/path/to/kanji-dict.py"))

Verify it works:

kanji-dict.py 日本

Expected output: stroke count, grade, frequency, readings for each kanji.

Emacs packages

  • emacsql and emacsql-sqlite — SQLite database access (emacsql)
  • pos-tip — tooltip display near point

These packages are available from MELPA.

Status database

Copy one of the JLPT word lists from ./db/ as your starting database. The destination must match the value of yk-db-status-file (the Quick Start uses ~/jp-status.db):

cp db/tangoN5.db ~/jp-status.db

Or create an empty database with M-x yk-db-status-create.

Dictionary database

Decompress the quick-lookup dictionary:

bunzip2 -k db/dictionary.db.bz2
cp db/dictionary.db ~/dictionary.db

This database provides fast definitions for the cursor-sensor auto-help feature.

Usage

Basic workflow

  1. Open a Japanese text file
  2. M-x yk-minor-mode — automatically parses the buffer, finds compounds, and activates keybindings
  3. Navigate the text — unknown words are highlighted with a colored background
  4. Mark words as you learn them using the keybindings below
  5. Press RET on any word to see its dictionary definition

If the buffer has already been parsed, yk-minor-mode skips parsing and just activates the keybindings.
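
Steps 1 and 2 can be combined by enabling the mode automatically for files in a dedicated directory. The following is a minimal sketch, not something Yomikun ships: my/japanese-reading-dir and my/yk-maybe-enable are hypothetical names, the directory is a placeholder, and the code assumes yk-minor-mode accepts the usual numeric argument of an Emacs minor mode. Because processing is synchronous, large files will pause Emacs briefly when opened this way.

```elisp
;; Hypothetical names; adjust the directory to wherever you keep texts.
(defvar my/japanese-reading-dir (expand-file-name "~/japanese/")
  "Directory whose files should open with `yk-minor-mode' enabled.")

(defun my/yk-maybe-enable ()
  "Enable `yk-minor-mode' for files under `my/japanese-reading-dir'."
  (when (and buffer-file-name
             (fboundp 'yk-minor-mode)
             (string-prefix-p
              (file-name-as-directory my/japanese-reading-dir)
              (expand-file-name buffer-file-name)))
    ;; Assumption: the mode follows the standard minor-mode convention,
    ;; where a positive argument turns it on.
    (yk-minor-mode 1)))

;; `find-file-hook' runs after a file has been loaded into a buffer.
(add-hook 'find-file-hook #'my/yk-maybe-enable)
```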

Processing commands

| Command | Description |
|---|---|
| yk-minor-mode | Activate yomikun (auto-parses if needed) |
| yk-do-buffer | Parse the entire buffer |
| yk-do-region | Parse the selected region |
| yk-do-all-compounds | Find compound terms (run after parsing) |
| yk-verify-buffer | Verify parsing consistency |
| yk-doctor | Diagnose mecab/database configuration |

Minor mode keybindings

| Key | Command | Description |
|---|---|---|
| k | yk-mark-at-point-as-known | Mark word as known |
| u | yk-mark-at-point-as-unknown | Mark word as unknown |
| l | yk-mark-at-point-as-learning | Mark word as learning |
| i | yk-mark-at-point-as-ignored | Mark word as ignored |
| RET | yk-define-at-point | Show dictionary definition |
| n | yk-kanji-at-point | Show kanji information |
| j | yk-jisho-at-point | Look up on jisho.org |
| m | yk-kanji-damage-at-point | Look up on kanjidamage.com |
| p | yk-prop-at-point | Show morph properties (debug) |
| = | yk-mark-sentence-at-point | Select current sentence |
| x | yk-disable-mode | Exit yk-minor-mode |

Marking a word updates its status globally — all occurrences in the buffer change color immediately.

Color coding

Words are colored by grammatical type:

| Type | Japanese | Color |
|---|---|---|
| Noun | 名詞 | String face |
| Verb | 動詞 | Steel blue |
| Adjective | 形容詞 | Orange |
| Adverb | 副詞 | Purple |
| Particle | 助詞 | Dark grey |
| Auxiliary verb | 助動詞 | Magenta |

Unknown words additionally get a pink background. Learning words get a green background. Compound terms are underlined with a red wavy line.

Auto-help

When cursor-sensor-mode is active (enabled automatically by yk-minor-mode), moving the cursor onto an unknown word shows a quick dictionary definition in a tooltip. This uses the quick-lookup dictionary database for speed.

Configuration

All settings are available via M-x customize-group RET yomikun.

| Variable | Default | Description |
|---|---|---|
| yk-mecab-binary | "mecab" | Path to mecab executable |
| yk-mecab-dict-dir | nil | Mecab dictionary directory |
| yk-mecab-dict-type | nil (auto-detect) | 'unidic or 'ipadic |
| yk-db-status-file | "~/yk-status.db" | Path to word status database |
| yk-db-dict-file | nil | Path to quick-lookup dictionary |
| yk-dict-command | '("myougiden" "--human") | Dictionary lookup command |
| yk-kanji-dict-command | '("kanji-dict.py") | Kanji lookup command |
| yk-tooltip-timeout | 10 | Tooltip display time (seconds) |
| yk-max-tokens-to-process | 10000 | Safety limit for token processing |
| yk-debug | nil | Enable debug messages |
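
yk-dict-command and yk-kanji-dict-command are lists: a program name followed by its leading arguments. The sketch below shows one plausible way such a list could be turned into a process call, with the word appended as the final argument. It is an illustration under that assumption, not Yomikun's actual lookup code, and my/yk-demo-run-command is a hypothetical name.

```elisp
;; Hypothetical helper illustrating the command-list format only.
(defun my/yk-demo-run-command (command word)
  "Run COMMAND, a list of program and arguments, with WORD appended.
Return the program's output as a list of lines."
  ;; `process-lines' signals an error if the program exits non-zero.
  (apply #'process-lines
         (car command)
         (append (cdr command) (list word))))
```

With the default from the table, (my/yk-demo-run-command '("myougiden" "--human") "お願い") would run the same command as the verification step in the Myougiden section.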

Module structure

| File | Purpose |
|---|---|
| yomikun.el | Core: minor mode, faces, overlays, compounds |
| yomikun-mecab.el | Mecab: dictionary registry, parsing, diagnostics |
| yomikun-db.el | Database: status tracking, dictionary, memoization |
| yomikun-dict.el | Dictionary: lookups, tooltips, external commands |

Word status

Yomikun tracks four learning states for each word:

| Status | Meaning | Visual |
|---|---|---|
| unknown | New word, not yet studied | Colored background (pink) |
| learning | Currently studying | Colored background (green) |
| known | Fully learned | Text color only |
| ignored | Skip (names, particles you know, etc.) | Grey background |

Words not in the database are treated as unknown.

The status database is a SQLite file with primary key (morph, mtype, surface). Pre-populated JLPT word lists (N3, N4, N5) are available in ./db/.
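
The lookup rule described above (a composite key, with missing entries treated as unknown) can be illustrated with a small in-memory model. This is a sketch only: Yomikun stores statuses in SQLite, the my/yk-demo-* names are hypothetical, and the reading of the key columns (morph as the dictionary form, mtype as the part of speech, surface as the text as written) is an assumption.

```elisp
;; Illustrative stand-in for the status database; not Yomikun's code.
(defvar my/yk-demo-statuses (make-hash-table :test #'equal)
  "Hash table keyed by a list (MORPH MTYPE SURFACE).")

(defun my/yk-demo-set-status (morph mtype surface status)
  "Record STATUS for the word identified by MORPH, MTYPE and SURFACE."
  (puthash (list morph mtype surface) status my/yk-demo-statuses))

(defun my/yk-demo-status (morph mtype surface)
  "Return the stored status, or `unknown' when the word has no entry."
  (gethash (list morph mtype surface) my/yk-demo-statuses 'unknown))

;; A stored entry is returned as recorded.
(my/yk-demo-set-status "食べる" "動詞" "食べ" 'known)
(my/yk-demo-status "食べる" "動詞" "食べ")

;; A word with no entry falls back to `unknown'.
(my/yk-demo-status "飲む" "動詞" "飲ん")
```

Because all three fields form the key, the same dictionary form with a different surface text counts as a separate entry in this model.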

Limitations

  • Tooltip width may not perfectly match content on macOS Retina displays (pos-tip limitation)
  • Processing is synchronous — large files may take a few seconds
  • Compound detection requires the dictionary database
  • Tested on macOS; should work on Linux

Running tests

emacs --batch \
  -L . -L tests \
  -L /path/to/straight/build/buttercup \
  -l buttercup -f buttercup-run-discover

Tests require Buttercup. Integration tests require a working mecab installation.
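
For orientation, a minimal Buttercup spec looks like the sketch below. The file name tests/test-example.el is hypothetical (buttercup-run-discover picks up files named like test-*.el), and the second spec assumes the package provides a feature named yomikun, as the use-package form in the Quick Start suggests.

```elisp
;; tests/test-example.el (hypothetical file name)
(require 'buttercup)

(describe "A minimal example suite"
  (it "compares values with the :to-equal matcher"
    (expect (+ 1 2) :to-equal 3))

  ;; Assumption: yomikun.el ends with (provide 'yomikun).
  (it "can load the package from the load path"
    (expect (require 'yomikun nil t) :to-be-truthy)))
```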
