Japanese Shinjitai (日本新字體) #494
b593869 to 8cc894b
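What is a "non-priority simplified conventional form" (非簡慣優先的簡慣字體)?

Japan's 《表外漢字字體表》 of 2000 lists the standard printed forms (印刷標準字體) of characters, and some characters are additionally given a simplified conventional form (簡易慣用字體, abbreviated 簡慣字體). For most characters the standard printed form is the standard and the simplified conventional form is an acceptable variant. The three characters 「曽」「痩」「麺」 are the exception: for them the simplified conventional form takes priority, and the 2010 《改定常用漢字表》 also added these three as standard characters. A "non-priority simplified conventional form" is therefore the simplified conventional form of a character for which that form does not take priority, such as 「醤」 and 「鹸」. Because such forms are only acceptable variants and not standard forms, the default conversion scheme does not convert to them. The extended scheme t2jpx, whose logic is to "use Shinjitai and analogized forms as much as possible", does convert to them.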
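The split between the default and the extended scheme described above can be sketched as two layered character tables. This is only an illustration: the table contents are a handful of pairs assumed from the examples in the comment, the names `DEFAULT_VARIANTS`, `EXTENDED_VARIANTS`, `convert`, `t2jp` and `t2jpx` are hypothetical Python helpers, and none of this is OpenCC's actual implementation.

```python
# Hypothetical excerpts, assumed from the examples in the comment above.
# The real dictionary files contain many more entries.
DEFAULT_VARIANTS = {"曾": "曽", "瘦": "痩", "麵": "麺"}   # simplified form has priority
EXTENDED_VARIANTS = {"醬": "醤", "鹼": "鹸"}              # acceptable variants only


def convert(text, *tables):
    """Apply character-level mappings; later tables override earlier ones."""
    merged = {}
    for table in tables:
        merged.update(table)
    return "".join(merged.get(ch, ch) for ch in text)


def t2jp(text):
    """Default scheme (sketch): only the standard forms are produced."""
    return convert(text, DEFAULT_VARIANTS)


def t2jpx(text):
    """Extended scheme (sketch): also produce the non-priority simplified forms."""
    return convert(text, DEFAULT_VARIANTS, EXTENDED_VARIANTS)
```

With these tables, `t2jp` changes 麵 but leaves 醬 alone, while `t2jpx` changes both.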
3eff2be to 9b0ec92
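- Remove the erroneous 「遥=>遙」
- Exclude characters in the Unicode compatibility area
- ref: https://ja.wikipedia.org/wiki/%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97

- 《表外漢字字體列表》 mostly uses Kangxi Dictionary glyph forms and, for a few characters, Japanese Shinjitai forms. Some of these also differ from the OpenCC standard forms, so conversions must be added for them.
- ref: https://zh.wikipedia.org/wiki/%E8%A1%A8%E5%A4%96%E6%BC%A2%E5%AD%97%E5%AD%97%E9%AB%94%E5%88%97%E8%A1%A8
- ref: https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E8%A1%A8%E5%A4%96%E6%BC%A2%E5%AD%97%E5%AD%97%E4%BD%93%E8%A1%A8%E3%81%AE%E6%BC%A2%E5%AD%97%E4%B8%80%E8%A6%A7

- 《表外漢字字體列表》 treats the standard printed forms (印刷標準字體) as primary and the simplified conventional forms (簡易慣用字體) as merely acceptable, so the forced conversion to the latter is dropped and moved to JPVariantsEx.txt. (The exceptions are 「曽」「痩」「麺」, which are explicitly given simplified-form priority and are included in 《改定常用漢字表》.)
- The default conversion scheme t2jp does not include JPVariantsEx.txt; a new t2jpx scheme that includes the extended conversions is added. jp2t includes the reverse of the extended conversions.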
@BYVoid Any chance of seeing something like this merged? I understand that you don't think non-BMP extended Shinjitai (擴張新字體) should be part of the @danny0838 Have you been using a fork in the meantime?
I am now developing StarCC, the next generation of OpenCC. @danny0838 Could you make a PR there? We can work together on this project.
@ayaka14732 We are overloaded and probably won't be able to handle the cross-project compatibility shortly. You can port them from our project sts-lib, though.
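* Restore the old-form (舊字體) to new-form (新字體) conversions from 《常用漢字表》
  - Exclude characters in the Unicode compatibility area
  - ref: https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
* Add the old-form to new-form conversions that 《改定常用漢字表》 adds on top of 《常用漢字表》
  - ref: https://zh.wikipedia.org/wiki/%E8%88%8A%E5%AD%97%E9%AB%94
* Convert the variant characters in 《人名用漢字表》 to the Japanese standard characters
  - Remove the erroneous 「遥=>遙」
  - Exclude characters in the Unicode compatibility area
  - ref: https://ja.wikipedia.org/wiki/%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97
* Remove the conversions for characters that already exist in 《常用漢字表》
* Fix the jp2t one-to-many mappings so that they convert correctly to the standard characters
* Change the old forms in JPShinjitaiCharacters.txt back to the glyph forms in 「常用漢字表」
  Also delete the "one-to-many" entries in JPShinjitaiCharacters.txt that have no effect on conversion
  Drop 「両 -> 輛」, which has no counterpart in the table
* Correct 2 entries in JPVariants.txt

---------

Co-authored-by: Danny Lin <danny0838@gmail.com>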
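The "exclude characters in the Unicode compatibility area" step can be sketched as a code-point range filter. This is a minimal sketch that assumes the phrase refers to the CJK Compatibility Ideographs block (U+F900 to U+FAFF) and its supplement (U+2F800 to U+2FA1F). The helper names and the sample pairs are hypothetical, and the commit does not say how the filtering was actually done.

```python
# Assumption: "Unicode compatibility area" means these two blocks.
COMPAT_RANGES = (
    (0xF900, 0xFAFF),      # CJK Compatibility Ideographs
    (0x2F800, 0x2FA1F),    # CJK Compatibility Ideographs Supplement
)


def in_compat_area(ch):
    """Return True if the character's code point lies in a compatibility block."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in COMPAT_RANGES)


def drop_compat_entries(pairs):
    """Keep only (old_form, new_form) pairs with no compatibility characters."""
    return [
        (old, new)
        for old, new in pairs
        if not (in_compat_area(old) or in_compat_area(new))
    ]


# Hypothetical sample. Escapes are used so that the compatibility character
# survives any Unicode normalization applied to this text.
SAMPLE = [
    ("\uFA45", "\u6D77"),  # compatibility variant of 海: dropped
    ("擧", "挙"),           # ordinary old/new pair: kept
]
FILTERED = drop_compat_entries(SAMPLE)
```

A range check is used here in place of a normalization test because it is easy to audit against the block charts. Note that it also drops the few unified ideographs that happen to be encoded inside the compatibility block.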
Merge the character-level Japanese variant data into JPShinjitaiCharacters.txt and remove the legacy JPVariants.txt source file. The reverse t2jp dictionary is now generated from JPShinjitaiCharacters.txt at build time, so CMake, Bazel, and GYP only maintain one authoritative character mapping source.

Replay this refactor on top of upstream master after 2de1f38, which imported the JPVariants.txt and JPShinjitaiCharacters.txt updates from danny0838's #494. The new upstream mappings are preserved by folding them into JPShinjitaiCharacters.txt in the jp2t direction, including the 常用漢字 and 人名用漢字 additions such as 併/倂, 挙/擧, 渋/澁, 闘/鬭, 鶏/鷄, and 麺/麵.

Keep the ambiguity fixes from the original refactor: remove the 両 -> 輛 candidate, keep 弁 to 辨/辯/瓣, and do not regenerate the 庄 -> 莊 or 棱 -> 棱 jp2t entries. This leaves 莊 and 棱 to reverse to the preferred 荘 and 稜 candidates when t2jp is generated.

Behavior was checked with scripts/compile_to_inline_config.py against the latest upstream base: t2jp has zero inline entry and conversion diffs, while jp2t only intentionally differs for 庄 (莊 -> 庄) plus the behavior-neutral removal of the 棱 self-mapping.
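Generating the reverse t2jp table from a single jp2t source can be sketched as follows. This is an illustration only: it assumes each source line holds a Japanese form, a tab, and one or more space-separated traditional forms; the sample data is a few pairs quoted from the message above; and `parse_entries` and `reverse_entries` are hypothetical helpers, not the project's build scripts.

```python
from collections import defaultdict


def parse_entries(text):
    """Parse 'key<TAB>value1 value2 ...' lines into {key: [values]}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, values = line.partition("\t")
        entries[key] = values.split()
    return entries


def reverse_entries(forward):
    """Invert {jp: [trad, ...]} into {trad: [jp, ...]}, keeping source order."""
    reverse = defaultdict(list)
    for jp_form, trad_forms in forward.items():
        for trad in trad_forms:
            if jp_form not in reverse[trad]:
                reverse[trad].append(jp_form)
    return dict(reverse)


# Hypothetical excerpt in the jp2t direction.
JP2T_SAMPLE = "弁\t辨 辯 瓣\n荘\t莊\n麺\t麵\n"
T2JP = reverse_entries(parse_entries(JP2T_SAMPLE))

# If a '庄 -> 莊' entry were present as well, 莊 would gain a second candidate.
T2JP_AMBIGUOUS = reverse_entries(parse_entries(JP2T_SAMPLE + "庄\t莊\n"))
```

In this sketch, leaving the 庄 entry out of the source is what keeps 莊 mapped to the single candidate 荘 in the generated table, which mirrors the ambiguity fix described in the message.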
See #1303 (comment); this has been partially cherry-picked and merged.
…figs, with limited behavior cleanup (#1302)

* Refactor Japanese Shinjitai dictionaries on latest JP base
* Update CLI config descriptions for t2jp.json and jp2t.json