Skip to content

Dictionary of pairs of Korean word and IPA crawled from Wiktionary (Korean edition)

License

MIT, CC-BY-SA-4.0 licenses found

Licenses found

MIT
LICENSE
CC-BY-SA-4.0
LICENSE-CC-BY-SA
Notifications You must be signed in to change notification settings

uniglot/korean-word-ipa-dictionary

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Korean Word-IPA Dictionary

1. Getting List of Word Entries

From the latest Kowiktionary dump, I got the list of every word in main namespace. After getting this list, I filtered out all entries which are not written in Hangul, and stored Korean word entries in the file kodict_entry.txt.

2. Crawling

By running crawl.py simultaneously on 11 subsets of kodict_entry.txt, which consist of 6000 words (except the last one), I extracted IPA information, forming a word-IPA dictionary for Korean language. After the crawling processes are all completed, I appended the results in alphabetical order, and deleted entries with no extracted IPA.

3. Converting IPA to X-SAMPA

From any word-IPA dictionary files, you can convert it to word-X-SAMPA dictionary.

from convert import Converter

conv = Converter()
conv.subst_dict(<NAME_OF_DICT>)

4. Licenses

You can make use of the results of scripts (i.e., .dict files and kodict_entry.txt file) under CC BY-SA. You can use the scripts under MIT License.

About

Dictionary of pairs of Korean word and IPA crawled from Wiktionary (Korean edition)

Topics

Resources

License

MIT, CC-BY-SA-4.0 licenses found

Licenses found

MIT
LICENSE
CC-BY-SA-4.0
LICENSE-CC-BY-SA

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages