
Tags: Mingu113/KanjiDictVN


trungnt2910.hannom.20230110-134950.kanjidic2.zip


Verified: this commit was signed with the committer’s verified signature.
trungnt2910 Trung Nguyen
feat: Huge rework

- Complete rework of the fetching engine:
  + Instead of a pool of 8 threads processing the work sequentially,
    the whole Kanji bank is divided into 8 (nearly) equal chunks, and
    each thread sequentially downloads every page in its own chunk.
  + Instead of relying on web.archive.org as a proxy and fetching data
    as old as 2018, we now get the latest data directly from the
    original website through a Tor proxy.
  + New hvdic parsing logic. The messy code is replaced with an
    object-oriented approach. This allows type-safe scraping of the
    dictionary, as well as serializing the whole hvdic as JSON or
    another format for future use.
  + The old WebArchiveClient is still kept as a useful reference (I
    don't have the time or enthusiasm to turn it into a separate NuGet
    package yet).
- Refreshed hvcache with the new pages fetched this way.
- Built a new out_vn folder.
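The chunked fetching strategy from the first bullet can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the project's actual code (which appears to be .NET, given the NuGet mention); `split_into_chunks`, `fetch_page`, `process_chunk`, `fetch_all`, and the placeholder page strings are all hypothetical, and no real network or Tor access is performed.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n):
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        size = q + (1 if i < r else 0)  # the first r chunks get one extra item
        chunks.append(items[start:start + size])
        start += size
    return chunks


def fetch_page(kanji):
    # Hypothetical stand-in for downloading one hvdic page (e.g. via a Tor proxy).
    return f"page for {kanji}"


def process_chunk(chunk):
    # Each worker walks its own chunk sequentially, one page at a time.
    return [fetch_page(k) for k in chunk]


def fetch_all(kanji_bank, num_threads=8):
    chunks = split_into_chunks(kanji_bank, num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # One task per chunk; map() returns results in chunk order.
        results = list(pool.map(process_chunk, chunks))
    # Flatten back into the original Kanji order.
    return [page for chunk_result in results for page in chunk_result]
```

Compared with submitting every page as a separate task to a shared pool, giving each thread one contiguous chunk keeps the per-thread work predictable and the combined output in the original order.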

trungnt2910.hannom.20230109-152759.kanjidic2


Verified: this commit was signed with the committer’s verified signature.
trungnt2910 Trung Nguyen
fix: Read multiple meanings from same source

trungnt2910.hannom.20230109-132817.kanjidic2


Verified: this commit was signed with the committer’s verified signature.
trungnt2910 Trung Nguyen
feat: Initial commit