More than enough Hanzi (MteH)

A curated set of ~4,540 simplified Chinese characters for advanced language learners.

The MteH corpus is designed as an "endgame corpus" for advanced students: if you learn these characters, you're practically "done for life" studying simplified Chinese characters (congratulations!). There are, of course, more simplified Chinese characters than this (in proper nouns, scientific terms, chengyu, Chinese history, online usernames, etc.), but at some point you have to draw the line and say "this is my endgame".

Currently, MteH focuses entirely on simplified Chinese characters, especially those you’ll encounter in mainland China and in HSK exams.

- MteH corpus (v0.1.3) (plain text)
- Handwriting practice (PDFs to print out)
- Extra characters (good to know, but not part of MteH)
  - Repeated-component characters
  - Periodic table of the elements
  - Province abbreviations
  - Characters/words using or related to 虫 (insects; lower life forms)
  - Characters/words using or related to 鸟 (birds)
  - Characters/words using or related to 鱼 (fish)
  - Characters/words using or related to 木 (trees; wood)
  - Rare characters in the wild

There is also an Anki deck, which should already work but is best thought of as a work in progress. (On a computer, AnkiDraw lets you handwrite; on AnkiDroid, the built-in whiteboard feature enables handwriting.)

Summary

The MteH corpus is built to minimize "missing" characters: any characters not included are extremely rare or niche. Version 0.1.3 merges the following corpora:

#	Corpus	#chars	#used	Source / Reference
1	HSK 1.0	2,866	2,866	pre-2010, 11 levels
2	HSK 2.0	2,663	2,663	post-2010, 6 levels
3	HSK 3.0	3,000	3,000	2021 version 3.0 standards, 9 levels
4	HSK 3.1	3,088	3,088	2025 version 3.0 standards, 9 levels
5	TOCFL	3,027*	3,009	Taiwan's TOCFL 3100 + 33 traditional chars
6	K-5	1,817	1,812	K-5 word frequency
7	通用规范汉字表	3,500	3,495	Ministry of Education (2013)
8	现代汉语常用字表	3,500	3,491	Ministry of Education (1988)
9	primary school	2,468	2,467	China primary schools (2016)
10	Singapore	1,655	1,655	Singapore primary schools (2015)
11	Heisig	3,018	3,017	Heisig & Richardson, Remembering Simplified Hanzi I–II
12	Hoenig	2,177	2,159	Learn & Remember 2,178 Characters and Their Meanings
13	Jun Da	4,485*	4,254	modern Chinese corpus
14	SUBTLEX	4,462*	4,184	film and TV subtitle corpus
15	Tsai	4,329*	3,975	Usenet newsgroups (1993-1994)
16	Wikipedia	3,476*	3,221	Chinese Wikipedia
17	classical	1,968*	1,867	prior to the end of the Han dynasty
18	THUOCL	3,421*	3,222	mostly Sogou webpages
19	Leeds	4,230*	4,073	Internet corpus
20	BLCU	4,445*	4,089	"balanced", written Chinese
21	LWC	4,130*	3,961	Sina Weibo
22	food	1,182	1,101	food-related terms
23	species	4,086	3,211	species names
24	Chinese surnames	1,745	1,566	1,807 Chinese surnames
25	Chinese names	2,269	1,989	1,200,000 Chinese names
26	city-geo	1,277	1,133	mainland China city terms
27	company	4,363*	3,645	company proper nouns
28	med-orgs	4,826	3,731	medical organizations
29	chengyu convention	2,226	2,172	characters in "chengyu convention" chengyu
30	Xinhua	5,357	4,081	Xinhua chengyu and xiehouyu

Corpora marked * required extra extraction steps (documented in their respective READMEs), such as selecting the top-N words or characters and converting traditional characters to simplified.
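
As a rough illustration of the top-N selection step, the Python sketch below keeps the N most frequent entries of a word-frequency list and splits them into their distinct characters. The function names and toy data are hypothetical and not taken from the MteH scripts; the traditional-to-simplified conversion step is omitted, since it needs a conversion table such as OpenCC's.

```python
from collections import Counter


def top_n_words(freq: dict[str, int], n: int) -> list[str]:
    """Return the n highest-frequency entries; equal counts keep their input order."""
    return [word for word, _ in Counter(freq).most_common(n)]


def chars_of(words: list[str]) -> set[str]:
    """Collect the distinct characters that make up the selected words."""
    return {ch for word in words for ch in word}


if __name__ == "__main__":
    # Toy frequency list (hypothetical data, not from any real corpus).
    freq = {"的": 100, "我们": 80, "中国": 60, "鳄鱼": 1}
    selected = top_n_words(freq, 3)
    print(selected)  # ['的', '我们', '中国']
    # Distinct characters of the selected words, in code-point order.
    print(sorted(chars_of(selected), key=ord))
```

Selecting words first and then splitting them means a rare character can still enter the set if it appears in a frequent word.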

Characters are listed in Unicode order (excluding variants), with visually or structurally related forms grouped together as much as possible.
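
A minimal sketch of how a corpus could be compared against MteH, assuming #chars is the number of distinct characters in a corpus and #used is how many of them also appear in MteH. It pulls Hanzi out of arbitrary text (so it does not depend on the exact layout of mteh.txt), counts the overlap, and lists leftovers in plain code-point order, without the grouping of related forms described above. Only the CJK Unified Ideographs block and Extension A are recognized; Extension B and later would need extra ranges. All function names are hypothetical.

```python
from pathlib import Path


def is_hanzi(ch: str) -> bool:
    """True for code points in CJK Unified Ideographs or CJK Extension A."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def extract_hanzi(text: str) -> set[str]:
    """Distinct Hanzi in a string; pinyin, punctuation, and whitespace are ignored."""
    return {ch for ch in text if is_hanzi(ch)}


def load_hanzi(path: str) -> set[str]:
    """Read a UTF-8 text file (e.g. a local copy of mteh.txt) and extract its Hanzi."""
    return extract_hanzi(Path(path).read_text(encoding="utf-8"))


def coverage(corpus: set[str], mteh: set[str]) -> tuple[int, int, list[str]]:
    """Return (#chars, #used, missing) for one corpus; missing is in code-point order."""
    missing = sorted(corpus - mteh, key=ord)
    return len(corpus), len(corpus & mteh), missing


if __name__ == "__main__":
    mteh = extract_hanzi("一 丁 七 万 丈 三")  # toy stand-in for load_hanzi("mteh.txt")
    corpus = extract_hanzi("三七一龘")
    print(coverage(corpus, mteh))  # (4, 3, ['龘'])
```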

MteH also incorporates:

- Character structure data and character drawings from Make Me a Hanzi and cjkvi-ids
- Frequency data from Jun Da’s modern corpus
- Images from Pexels, Wikimedia, etc.

Statistics and debug reports: missing chars; corpus histogram; debug; modifications; syllables.
