JiKen

JiKen is a simple kanji quiz that uses statistics and machine learning to quickly and accurately predict a user's knowledge level. I always found it tedious to get a good read on my current kanji level while studying, since existing tests either take forever or are terribly inaccurate, so I made this.

The name JiKen is a bit of a play on words/kanji. It could be read as 字検 (letter test, similar to 漢検 the infamous official kanji test) or 事件 (incident). I left it in romaji for the ambiguity. KTest was just a working title, and kind of lame.

Host/Location

https://jiken.fly.dev/

Hosted on Fly, with PlanetScale for MySQL and Redis Labs for sessions.

Math

The first thing to know, to understand why this works so well, is that kanji usage (and recognition) is not flat or random: it is roughly normally distributed and follows Zipf's law. This allows us to make reasonably sensible predictions of people's knowledge using a sigmoid function.
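As a rough illustration of the sigmoid idea, here is a minimal sketch, not the actual JiKen code. It assumes kanji are sorted from most to least commonly known, and the names `p_known`, `expected_known`, `midpoint`, and `slope` are made up for this example.

```python
import math

def p_known(rank, midpoint, slope):
    """Assumed model: chance that a user knows the kanji at a difficulty rank.

    `midpoint` is the rank where the chance is 50%; `slope` controls how
    quickly knowledge drops off past that point. Both are hypothetical names.
    """
    # Clamp the exponent so math.exp cannot overflow for extreme inputs.
    z = max(-60.0, min(60.0, slope * (rank - midpoint)))
    return 1.0 / (1.0 + math.exp(z))

def expected_known(midpoint, slope, total):
    """Estimated number of known kanji: the sum of per-kanji probabilities."""
    return sum(p_known(r, midpoint, slope) for r in range(1, total + 1))

if __name__ == "__main__":
    # A user whose knowledge tapers off around the 1000th kanji.
    print(round(expected_known(midpoint=1000, slope=0.01, total=2000)))
```

Summing the curve over every rank is what turns a two-parameter fit into a single "you know about N kanji" number.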

There are two main algorithms worth noting.

One predicts how many kanji you know (the graph) based on your answers. It is a regression fitted with Nelder-Mead and custom regularization: a lot of weight on the initial weights (a safe assumption until data is collected), L2 regularization (to avoid traps), and some penalty on change between questions to give users a smooth experience. I also do bias correction as per https://cs.nyu.edu/~mohri/pub/bias.pdf, since the questions selected are not random. No formal tuning methods were used; everything was done by hand until it felt good (the tuning target was to meet user expectations rather than simply to be mathematically accurate).
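The shape of such a fit could look roughly like the sketch below. This is not the JiKen implementation: it assumes SciPy's `minimize` with `method="Nelder-Mead"`, a two-parameter sigmoid, and made-up penalty weights (`prior_w`, `l2_w`, `smooth_w`) and starting values (`PRIOR`). The bias correction step is left out entirely.

```python
import math
from scipy.optimize import minimize

# Assumed starting guess: midpoint in thousands of kanji, and slope.
PRIOR = (1.0, 3.0)

def p_known(x, midpoint, slope):
    """Sigmoid chance of knowing the kanji at scaled rank x (rank / 1000)."""
    z = max(-60.0, min(60.0, slope * (x - midpoint)))
    return 1.0 / (1.0 + math.exp(z))

def loss(theta, answers, previous, prior_w=0.5, l2_w=0.01, smooth_w=0.2):
    """Negative log-likelihood plus three hand-weighted penalties."""
    midpoint, slope = theta
    nll = 0.0
    for rank, known in answers:
        p = p_known(rank / 1000.0, midpoint, slope)
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # keep log() finite
        nll -= math.log(p if known else 1.0 - p)
    # Stay close to the initial weights until enough data is collected.
    prior_pen = sum((t - p0) ** 2 for t, p0 in zip(theta, PRIOR))
    # Plain L2 penalty on the parameters.
    l2_pen = sum(t ** 2 for t in theta)
    # Discourage big jumps from the estimate shown after the last question.
    smooth_pen = sum((t - q) ** 2 for t, q in zip(theta, previous))
    return nll + prior_w * prior_pen + l2_w * l2_pen + smooth_w * smooth_pen

def fit(answers, previous=PRIOR):
    """Refit after each answer, starting from the previous estimate."""
    result = minimize(loss, x0=list(previous), args=(answers, previous),
                      method="Nelder-Mead")
    return tuple(result.x)

if __name__ == "__main__":
    # (difficulty rank, answered correctly?) pairs for one quiz session.
    answers = [(500, True), (1500, True), (2500, True),
               (3500, False), (3000, True), (3200, False)]
    midpoint, slope = fit(answers)
    print(f"estimated midpoint: about {midpoint * 1000:.0f} kanji")
```

Nelder-Mead needs no gradients, which makes it easy to keep bolting hand-tuned penalty terms onto the loss without redoing any derivatives.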

The other algorithm ranks the difficulty of every kanji for future testing. If 100 people know "馬" but don't know "鹿", then the algorithm will shuffle the ranks around so that "鹿" is ranked lower and "馬" higher. This is called a learning-to-rank algorithm: https://mlexplained.com/2019/05/27/learning-to-rank-explained-with-code/. Of course, this was again made more complicated by the biased sample selection.
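To make the 馬/鹿 example concrete, here is a small pairwise sketch in the spirit of Elo or Bradley-Terry updates. It is an assumption about how such a ranking could work, not the code JiKen runs: `update_difficulty`, the `scores` table, and the learning rate `lr` are all hypothetical, and it ignores the sampling bias mentioned above. A higher score here means a harder kanji.

```python
import math
from collections import defaultdict

def update_difficulty(scores, known, unknown, lr=0.1):
    """One pairwise update: `unknown` should end up harder than `known`."""
    # Modelled probability that the pair is already ordered correctly.
    diff = scores[unknown] - scores[known]
    p_ordered = 1.0 / (1.0 + math.exp(-diff))
    # The more surprising the observation, the bigger the correction.
    step = lr * (1.0 - p_ordered)
    scores[known] -= step
    scores[unknown] += step

# Every kanji starts at the same difficulty score of 0.0.
scores = defaultdict(float)

# 100 users who all knew 馬 but missed 鹿.
for _ in range(100):
    update_difficulty(scores, known="馬", unknown="鹿")

ranking = sorted(scores, key=scores.get)  # easiest first

if __name__ == "__main__":
    print(ranking)
```

Because each step shrinks as the pair settles into the right order, repeated agreement moves the scores less and less instead of pushing them apart forever.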

Built With

Flask
SQLAlchemy (MySQL)
Redis (for sessions/buffering)
APScheduler
Bootstrap
Chart.js
Genanki https://github.com/kerrickstaley/genanki (to generate Anki files)
DataTables (for the missed word list)

Contact/Bugs

You can report bugs here or contact me via Reddit, Twitter (#jiken), or e-mail.

Licensing/Contribute

Shoot me a message if you want to do something with this code.

Acknowledgments

Huge credit to the KANJIDIC team for the initial list of kanji and definitions.

More Info

Open alpha reddit thread: https://www.reddit.com/r/LearnJapanese/comments/eq380w/made_an_app_that_tests_your_kanji_level_in_30/
