yohane

Yohane takes a song and its lyrics, optionally extracts the vocals, splits the syllables, and computes a forced alignment to generate karaoke in an Aegisub subtitle file (.ass).
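To make the final step concrete, here is a minimal sketch of how aligned syllables could be written as one karaoke line in the .ass format. It is not Yohane's actual code: the function names `ass_time` and `karaoke_line` and the `(text, start, end)` tuple layout are assumptions made for illustration. It relies only on the standard ASS conventions: timestamps are written as `H:MM:SS.cc`, and each `{\k<n>}` tag gives a syllable's duration in centiseconds.

```python
def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp (H:MM:SS.cc, centisecond precision)."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360_000)
    m, rem = divmod(rem, 6_000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def karaoke_line(syllables, style="Default"):
    """Build one ASS Dialogue line from (text, start_s, end_s) tuples.

    Hypothetical helper: the tuple layout is an assumption, not Yohane's API.
    """
    start = syllables[0][1]
    end = syllables[-1][2]
    parts = []
    cursor = round(start * 100)  # current position in centiseconds
    for text, syl_start, syl_end in syllables:
        s_cs, e_cs = round(syl_start * 100), round(syl_end * 100)
        if s_cs > cursor:
            # Silence between syllables becomes an empty \k tag.
            parts.append(f"{{\\k{s_cs - cursor}}}")
        parts.append(f"{{\\k{e_cs - s_cs}}}{text}")
        cursor = e_cs
    text = "".join(parts)
    return f"Dialogue: 0,{ass_time(start)},{ass_time(end)},{style},,0,0,0,,{text}"


line = karaoke_line([("yo", 1.0, 1.5), ("ha", 1.5, 1.8), ("ne", 2.0, 2.6)])
print(line)
```

The 0.2 s pause before the last syllable is emitted as an empty `{\k20}` tag, so the syllable durations still add up to the line's total duration.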

Getting Started

Google Colab (Free GPU)

Open the provided notebook in Google Colab to run Yohane on one of its free T4 GPUs:

Open In Colab

Hugging Face Space (CPU-only)

Open In HF Spaces

Local Environment

With uv (PyPI)

Requirement: uv

uvx --from git+https://github.com/Japan7/yohane.git[cli] --python 3.14 yohane --help

With pixi (Conda)

Requirement: pixi

git clone https://github.com/Japan7/yohane.git
cd yohane/
pixi run yohane --help

Caveats

  • Yohane's syllable splitting is currently optimized only for Japanese lyrics
  • Syllables at the end of lines are often shortened (Workaround: use our finetuned model)
  • Forced alignment can't deal with overlapping vocals
  • The output is not fully accurate; you should still check and edit the result!
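To illustrate why syllable splitting depends on the language, here is a rough sketch of a splitter for romanized Japanese. It is an assumption-laden illustration, not Yohane's implementation: the function name `split_romaji` and the regular expression are hypothetical. The rule encoded here is that a syllable is an optional run of consonants followed by one vowel, and that an "n" not followed by a vowel or "y" stands on its own.

```python
import re

# Hypothetical pattern: a standalone "n", or consonants followed by one vowel.
_SYLLABLE = re.compile(r"n(?![aeiouy])|[^aeiou\s]*[aeiou]")


def split_romaji(word: str) -> list[str]:
    """Split a romanized Japanese word into rough syllable units."""
    return _SYLLABLE.findall(word.lower())


print(split_romaji("yohane"))
print(split_romaji("konnichiwa"))
```

Text that does not fit this pattern, such as a word ending in a consonant other than "n", loses its trailing characters, which hints at why lyrics in other languages need different rules.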

Recommended workflow

  1. Get the song and its lyrics
  2. Use the yohane notebook or the CLI locally to generate the karaoke file

In Aegisub:

  1. Load the .ass and the video
  2. Replace the Default style with your own
  3. Normalization during processing lowercases the lines and removes special characters: use the original lines, kept as comments, to fix the timed lines
  4. Subtitle > Select Lines… > check Comments and Set selection > OK, then delete the selected lines
  5. Listen to each line and fix its End time
  6. Add a 1s karaoke lead-in to every line
  7. Iterate over each line in karaoke mode and merge/fix syllable timings
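The lead-in step can also be scripted. The sketch below shows one common way to do it, assuming each line is a standard ASS `Dialogue:` event: move the start time 1 s earlier and prepend an empty `{\k100}` tag so the existing syllable timings keep their positions. The helper names (`parse_ass_time`, `format_ass_time`, `add_lead_in`) are hypothetical and are not part of Yohane or Aegisub.

```python
def parse_ass_time(stamp: str) -> int:
    """Convert an ASS timestamp (H:MM:SS.cc) to centiseconds."""
    h, m, s = stamp.split(":")
    return round((int(h) * 3600 + int(m) * 60 + float(s)) * 100)


def format_ass_time(cs: int) -> str:
    """Convert centiseconds back to an ASS timestamp."""
    h, rem = divmod(cs, 360_000)
    m, rem = divmod(rem, 6_000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def add_lead_in(event: str, lead_in_cs: int = 100) -> str:
    """Shift the start of a Dialogue event earlier and pad it with an empty \\k tag."""
    prefix, rest = event.split(": ", 1)
    # Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    fields = rest.split(",", 9)
    start = parse_ass_time(fields[1])
    lead = min(lead_in_cs, start)  # a line cannot start before 0:00:00.00
    fields[1] = format_ass_time(start - lead)
    if lead > 0:
        fields[9] = f"{{\\k{lead}}}" + fields[9]
    return f"{prefix}: " + ",".join(fields)


event = "Dialogue: 0,0:00:05.00,0:00:07.00,Default,,0,0,0,,{\\k50}yo{\\k30}ha"
print(add_lead_in(event))
```

Limiting the split to nine commas keeps any commas inside the lyric text intact, since the text is always the last field of the event.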

Sample

References