Tags: Mingu113/KanjiDictVN
feat: Huge rework
- Total rework of the fetching engine:
+ Instead of a pool of 8 threads processing the work sequentially,
the whole Kanji bank is divided into 8 (nearly) equal chunks, and
each thread downloads the pages in its own chunk sequentially.
+ Instead of relying on web.archive.org as a proxy and fetching data
as old as 2018, we now get the latest data from the original
website through a Tor proxy.
+ New hvdic parsing logic. The messy code is replaced with an
object-oriented approach. This allows type-safe scraping of the
dictionary, as well as serializing the whole hvdic as JSON or
another format for future use.
+ The old WebArchiveClient is still kept as a useful reference (no
time or enthusiasm to turn it into a separate NuGet package yet).
- Refreshed hvcache with the new pages obtained by this method.
- A new out_vn folder is built.
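
The chunked fetching scheme described in this commit can be sketched
roughly as follows. This is a minimal illustration in Python, not the
project's actual code: the names split_into_chunks, download_chunk,
download_all and the fetch callback are all hypothetical, and the real
engine may split or schedule the work differently.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def download_chunk(chunk, fetch):
    """Fetch every item of one chunk strictly in order (one worker per chunk)."""
    return [fetch(item) for item in chunk]


def download_all(items, fetch, workers=8):
    """Hypothetical driver: one thread per chunk, results returned in input order."""
    chunks = split_into_chunks(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = list(pool.map(lambda c: download_chunk(c, fetch), chunks))
    return [page for chunk_pages in per_chunk for page in chunk_pages]
```

Here fetch stands in for whatever downloads a single page. Because the
chunks are contiguous and pool.map preserves order, the combined result
keeps the order of the original Kanji list.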
fix: Read multiple meanings from same source