Ambiwlans/JiKen

JiKen

JiKen is a simple kanji quiz that uses statistics and machine learning to accurately and quickly predict a user's knowledge level. I always found it tedious to get a good read of my current kanji level while studying: existing tests either take forever or are terribly inaccurate, so I made this.

The name JiKen is a bit of a play on words/kanji. It could be read as 字検 (letter test, similar to 漢検 the infamous official kanji test) or 事件 (incident). I left it in romaji for the ambiguity. KTest was just a working title, and kind of lame.

Host/Location

  • Using Fly as a webhost, PlanetScale for MySQL, Redislabs for sessions.

Math

The first thing to know to understand why this works so well is that kanji usage (and recognition) is not flat/random but has a relatively normal distribution and follows Zipf's law. This allows us to make relatively sensible predictions of people's knowledge using a sigmoid function.
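As a rough illustration of the sigmoid idea, the sketch below models the chance that a user recognises the kanji at a given difficulty rank, then sums those chances to estimate how many kanji the user knows. The function names and the two parameters (`midpoint`, `steepness`) are assumptions made for this example, not names taken from the JiKen code.

```python
import math

def p_known(rank, midpoint, steepness):
    """Probability of recognising the kanji at a given difficulty rank.

    Close to 1 for easy (low-rank) kanji, 0.5 at the midpoint,
    and close to 0 for hard (high-rank) kanji.
    """
    return 1.0 / (1.0 + math.exp(steepness * (rank - midpoint)))

def estimated_known(total, midpoint, steepness):
    """Expected number of known kanji among ranks 1..total."""
    return sum(p_known(r, midpoint, steepness) for r in range(1, total + 1))

# Hypothetical user whose knowledge drops off around rank 1000.
print(round(estimated_known(3000, midpoint=1000, steepness=0.005)))
```

With a curve like this, a handful of answers near the midpoint says a lot about the whole range, which is why a short quiz can still give a usable estimate.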

There are two main algorithms worth noting.

One predicts how many kanji you know (the graph) based on your answers. This is a regression fitted with Nelder-Mead, using custom regularization: a lot of weight on the initial weights (a safe assumption until data is collected), L2 regularization (to avoid traps), and some penalty on change between questions to give users a smooth experience. I also do bias correction as per https://cs.nyu.edu/~mohri/pub/bias.pdf since the questions selected are not random. No formal tuning methods were used; everything was done by hand until it felt good (the tuning target was to meet user expectations rather than simply being mathematically accurate).
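The shape of that objective can be sketched as follows. This is a minimal illustration, not the JiKen implementation: the penalty weights, the parameter scaling (`midpoint / 1000`, `steepness * 1000`), and the bare-bones Nelder-Mead routine (reflection, expansion, inside contraction, shrink) are all assumptions made for the example, and the bias correction step is left out.

```python
import math

def p_known(rank, midpoint, steepness):
    """Sigmoid probability of knowing the kanji at a difficulty rank."""
    z = max(-60.0, min(60.0, steepness * (rank - midpoint)))  # avoid overflow
    return 1.0 / (1.0 + math.exp(z))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def make_loss(answers, prior, prev, w_prior=1.0, w_l2=1e-3, w_smooth=0.5):
    """Build the objective; params = (midpoint / 1000, steepness * 1000)."""
    def loss(params):
        midpoint, steepness = params[0] * 1000.0, params[1] / 1000.0
        nll = 0.0  # negative log-likelihood of the observed answers
        for rank, correct in answers:
            p = p_known(rank, midpoint, steepness)
            nll -= math.log((p if correct else 1.0 - p) + 1e-12)
        return (nll
                + w_prior * sq_dist(params, prior)      # stay near the initial weights
                + w_l2 * sq_dist(params, (0.0, 0.0))    # plain L2 shrinkage
                + w_smooth * sq_dist(params, prev))     # limit jumps between questions
    return loss

def nelder_mead(f, x0, step=0.1, iters=200):
    """Simplified Nelder-Mead: reflect, expand, contract, or shrink."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        point = list(x0)
        point[i] += step
        simplex.append(point)
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        reflect = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflect) < f(best):
            expand = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expand if f(expand) < f(reflect) else reflect
        elif f(reflect) < f(simplex[-2]):
            simplex[-1] = reflect
        else:
            contract = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contract) < f(worst):
                simplex[-1] = contract
            else:  # shrink every point halfway toward the best one
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                    for p in simplex[1:]]
    return min(simplex, key=f)

# Hypothetical quiz: the user knows every kanji up to rank 1500 and none beyond.
answers = [(rank, rank <= 1500) for rank in range(200, 3001, 200)]
prior = (1.0, 5.0)  # initial guess: midpoint 1000, steepness 0.005
loss = make_loss(answers, prior, prev=prior)
fitted = nelder_mead(loss, prior)
print("estimated midpoint rank:", round(fitted[0] * 1000))
```

Because the prior penalty has a fixed weight while the likelihood term grows with every answer, the starting guess dominates early and fades as data comes in. In a live quiz, `prev` would be the fit from the previous question, so the smoothness term keeps the graph from jumping around.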

The other algorithm ranks the difficulty of every kanji for future testing. If 100 people know "馬" but don't know "鹿", then the algorithm will shuffle the ranks around so that "鹿" is ranked lower (harder) and "馬" higher (easier). This is called a learning-to-rank algorithm: https://mlexplained.com/2019/05/27/learning-to-rank-explained-with-code/. Of course, this was again made more complicated by having biased sample selection.
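One common way to do this kind of re-ranking is a pairwise logistic update, sketched below. It is only an illustration of the general technique: the `difficulty` scores, the `pairwise_update` helper, and the learning rate are hypothetical, and the correction for biased sample selection is not shown.

```python
import math

def pairwise_update(difficulty, known, unknown, lr=0.1):
    """One gradient step on the pairwise loss log(1 + exp(-(d_unknown - d_known))).

    The kanji the user missed should end up with a higher difficulty score
    than the kanji the same user knew.
    """
    margin = difficulty[unknown] - difficulty[known]
    step = lr / (1.0 + math.exp(margin))  # large when the order is wrong, small once it is right
    difficulty[unknown] += step
    difficulty[known] -= step

# 100 users each know "馬" but not "鹿".
difficulty = {"馬": 0.0, "鹿": 0.0}
for _ in range(100):
    pairwise_update(difficulty, known="馬", unknown="鹿")

ranking = sorted(difficulty, key=difficulty.get)  # easiest first
print(ranking)
```

The step size shrinks as the two scores separate, so a pair that is already well ordered barely moves while a pair that contradicts the current ranking gets a strong correction.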

Built With

  • Flask
  • SQLAlchemy (MySQL)
  • Redis (for sessions/buffering)
  • APScheduler
  • Bootstrap
  • Chart.js
  • Genanki https://github.com/kerrickstaley/genanki (to generate Anki files)
  • DataTables (for the missed word list)

Contact/Bugs

You can report bugs here or contact me via Reddit, Twitter (#jiken), or e-mail.

Licensing/Contribute

Shoot me a message if you want to do something with this code.

Acknowledgments

More Info
