日本新字體 (Japanese shinjitai) #494

Open
danny0838 wants to merge 14 commits into BYVoid:master from danny0838:日本新字體

danny0838 commented Jul 10, 2020

Contributor
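  • Like t2s, the t2jp scheme covers both "OpenCC standard forms to Japanese shinjitai" and "Japanese kyūjitai to Japanese shinjitai". Besides the OpenCC-standard-to-shinjitai conversions, this PR therefore also adds the kyūjitai-to-shinjitai mappings from the 《常用漢字表》 (Jōyō kanji table) and related tables.
  • Extended conversions move to JPVariantsEx.txt. They include the simplified customary forms (簡易慣用字體) from the 《表外漢字字體表》 that are not designated as preferred (those designated as preferred go directly into JPVariants.txt), plus the extended shinjitai (擴張新字體). Extended shinjitai that do not conflict with the Japanese standard are converted directly; those that conflict are not converted by default and serve only as a second candidate. The default scheme t2jp does not include JPVariantsEx.txt, and a new t2jpx scheme that includes the extended conversions is added. jp2t includes the reverse conversion of JPVariantsEx.txt.
  • The extended shinjitai list mainly follows #371 (「新增日本新字體。」), with a few characters added.
  • 「龝」 is a variant of 「秋」 rather than an OpenCC standard character. It is still kept, because t2jpx also covers kyūjitai-to-shinjitai conversion and the source text is not necessarily written in strict OpenCC standard forms.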
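
The layering described in the list above can be sketched roughly as follows. This is a minimal illustration, not OpenCC's implementation: the two tables are tiny hand-written samples, `convert`, `t2jp` and `t2jpx` are hypothetical helpers, and 甲/乙 are placeholder characters standing in for an extended form that conflicts with the Japanese standard.

```python
# Minimal sketch of the t2jp / t2jpx layering (not OpenCC's actual code).
# Each table maps a source character to an ordered list of candidates;
# the first candidate is the one used for conversion.

# Sample of what JPVariants.txt would hold (default conversions).
JP_VARIANTS = {
    "麵": ["麺"],  # preferred simplified customary form, converted by default
}

# Sample of what JPVariantsEx.txt would hold (extended conversions).
JP_VARIANTS_EX = {
    "醬": ["醤"],        # no conflict with the standard: converted directly
    "甲": ["甲", "乙"],  # placeholder pair: the conflicting form is only a second candidate
}


def convert(text, *tables):
    """Convert character by character, using the first table that has an entry."""
    out = []
    for ch in text:
        for table in tables:
            if ch in table:
                out.append(table[ch][0])  # always take the first candidate
                break
        else:
            out.append(ch)  # no table has an entry: keep the character
    return "".join(out)


def t2jp(text):
    """Default scheme: standard table only."""
    return convert(text, JP_VARIANTS)


def t2jpx(text):
    """Extended scheme: standard table first, then the extended table."""
    return convert(text, JP_VARIANTS, JP_VARIANTS_EX)
```

With these sample tables, `t2jp` leaves 醬 untouched while `t2jpx` converts it, and the placeholder 甲 is unchanged in both because its first candidate is itself.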

danny0838 force-pushed the 日本新字體 branch 18 times, most recently from b593869 to 8cc894b, July 12, 2020 14:05

BYVoid commented Jul 13, 2020

Owner

What does 「非簡慣優先的簡慣字體」 (simplified customary forms that are not designated as preferred) mean?

danny0838 commented Jul 13, 2020

Contributor Author

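Quoted: "What does 「非簡慣優先的簡慣字體」 mean?"

Japan's 《表外漢字字體表》 of 2000 lists standard printed forms (印刷標準字體), and some characters additionally have a simplified customary form (簡易慣用字體, or 簡慣字體 for short).

For most characters the standard printed form is the standard and the simplified customary form is an acceptable variant. The three characters 「曽」「痩」「麺」 are exceptions: their simplified customary forms take precedence, and the 2010 《改定常用漢字表》 also added these three as standard characters.

「非簡慣優先的簡慣字體」 therefore means the simplified customary forms of characters for which that form is not the preferred one, such as 「醤」 and 「鹸」. Because they are only acceptable variants rather than standard forms, the default scheme does not convert to them. The extended scheme t2jpx, whose logic is "use shinjitai and analogically derived forms as much as possible", does convert to them.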
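
As a rough illustration of this rule, the sketch below routes simplified customary forms into either the default or the extended dictionary. The `ENTRIES` sample, the `split_entries` helper and the tab-separated output lines are assumptions made for illustration; they are not the PR's actual tooling, and the standard printed forms listed on the left are supplied here as the usual counterparts of the five characters named above.

```python
# Each entry: (standard printed form, simplified customary form, customary form preferred?)
# Sample data only; the real tables are far larger.
ENTRIES = [
    ("曾", "曽", True),
    ("瘦", "痩", True),
    ("麵", "麺", True),
    ("醬", "醤", False),
    ("鹼", "鹸", False),
]


def split_entries(entries):
    """Route preferred forms to the default list and the rest to the extended list."""
    default_lines, extended_lines = [], []
    for standard, customary, preferred in entries:
        line = f"{standard}\t{customary}"  # source and target separated by a tab
        if preferred:
            default_lines.append(line)   # would go to JPVariants.txt
        else:
            extended_lines.append(line)  # would go to JPVariantsEx.txt
    return default_lines, extended_lines


default_lines, extended_lines = split_entries(ENTRIES)
```

Only the three preferred forms end up in the default list, so a scheme built from that list alone never produces 醤 or 鹸.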

danny0838 force-pushed the 日本新字體 branch 2 times, most recently from 3eff2be to 9b0ec92, July 16, 2020 12:28
danny0838 and others added 6 commits July 21, 2020 21:14
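- The 《表外漢字字體表》 treats the standard printed forms as primary and the simplified customary forms as merely acceptable, so forced conversion to the latter is dropped and those entries move to JPVariantsEx.txt. (Exceptions: 「曽」「痩」「麺」, which are explicitly designated as preferred and are included in the 《改定常用漢字表》.)
- The default scheme t2jp does not include JPVariantsEx.txt; a t2jpx scheme that includes the extended conversions is added. jp2t includes reversing the extended conversions.
- 「龝」 is a variant of 「秋」; it is kept so that kyūjitai-to-shinjitai conversion is fully supported.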
@maxmellen

@BYVoid Any chance of seeing something like this merged?

I understand that you don't think non-BPM 擴張新字體 should be part of the t2jp preset, but I think having this separate t2jpx preset is a great way to separate "converting to the Japanese Standard" and "using Japanese shorthands as much as possible".
Either way, I would love to see a tool that lets me use 日本擴張新字體 as an alternate simplification scheme to 大陸簡化字.

@danny0838 Have you been using a fork in the meantime?

@ayaka14732

Collaborator

I am now developing StarCC, the next generation of OpenCC.

@danny0838 Could you make a PR there? We can work together on this project.

@danny0838

Contributor Author

@ayaka14732 We are overloaded and probably won't be able to handle the cross-project compatibility shortly. You can port them from our project sts-lib, though.

frankslin added a commit that referenced this pull request Jun 11, 2026
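* Restore the kyūjitai-to-shinjitai conversions from the 《常用漢字表》

- Exclude characters in the Unicode compatibility block
- ref: https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7

* Add the kyūjitai-to-shinjitai conversions that the 《改定常用漢字表》 adds on top of the 《常用漢字表》

- ref: https://zh.wikipedia.org/wiki/%E8%88%8A%E5%AD%97%E9%AB%94

* Convert variant characters in the 《人名用漢字表》 to Japanese standard characters

- Remove the erroneous 「遥=>遙」
- Exclude characters in the Unicode compatibility block
- ref: https://ja.wikipedia.org/wiki/%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97

* Cancel the conversion of characters that already exist in the 《常用漢字表》

* Fix jp2t one-to-many mappings so that they convert correctly to standard characters

* Change the kyūjitai in JPShinjitaiCharacters.txt back to the glyph forms in the 「常用漢字表」

Also remove the "one-to-many" entries in JPShinjitaiCharacters.txt that have no effect on conversion
Remove 「両 -> 輛」, which has no counterpart in the table

* Correct 2 JPVariants.txt entries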

---------

Co-authored-by: Danny Lin <danny0838@gmail.com>
frankslin added a commit that referenced this pull request Jun 11, 2026
Merge the character-level Japanese variant data into JPShinjitaiCharacters.txt and remove the legacy JPVariants.txt source file. The reverse t2jp dictionary is now generated from JPShinjitaiCharacters.txt at build time, so CMake, Bazel, and GYP only maintain one authoritative character mapping source.

Replay this refactor on top of upstream master after 2de1f38, which imported the JPVariants.txt and JPShinjitaiCharacters.txt updates from danny0838's #494. The new upstream mappings are preserved by folding them into JPShinjitaiCharacters.txt in the jp2t direction, including the 常用漢字 and 人名用漢字 additions such as 併/倂, 挙/擧, 渋/澁, 闘/鬭, 鶏/鷄, and 麺/麵.

Keep the ambiguity fixes from the original refactor: remove the 両 -> 輛 candidate, keep 弁 to 辨/辯/瓣, and do not regenerate the 庄 -> 莊 or 棱 -> 棱 jp2t entries. This leaves 莊 and 棱 to reverse to the preferred 荘 and 稜 candidates when t2jp is generated.

Behavior was checked with scripts/compile_to_inline_config.py against the latest upstream base: t2jp has zero inline entry and conversion diffs, while jp2t only intentionally differs for 庄 (莊 -> 庄) plus the behavior-neutral removal of the 棱 self-mapping.
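
The build-time generation described in this commit message can be pictured as inverting a jp2t-direction table. The sketch below is only an illustration under that assumption: `reverse_mapping` and the sample table are hypothetical, and the repository's actual build script may work differently.

```python
def reverse_mapping(jp2t):
    """Invert {shinjitai: [traditional candidates]} into {traditional: [shinjitai candidates]}."""
    t2jp = {}
    for shinjitai, candidates in jp2t.items():
        for traditional in candidates:
            # Candidates keep the order in which their shinjitai keys appear.
            t2jp.setdefault(traditional, []).append(shinjitai)
    return t2jp


# Sample jp2t-direction entries taken from the pairs named in the commit message.
JP2T_SAMPLE = {
    "弁": ["辨", "辯", "瓣"],  # one shinjitai with several traditional candidates
    "荘": ["莊"],
    "稜": ["棱"],
    "麺": ["麵"],
}

T2JP_SAMPLE = reverse_mapping(JP2T_SAMPLE)
```

Because the sample has no 庄 -> 莊 entry, 莊 reverses only to 荘, which mirrors the ambiguity fix described above.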
@frankslin

Collaborator

#1303 (comment): partially cherry-picked and merged.

frankslin added a commit that referenced this pull request Jun 11, 2026
…figs, with limited behavior cleanup (#1302)

* Refactor Japanese Shinjitai dictionaries on latest JP base

Merge the character-level Japanese variant data into JPShinjitaiCharacters.txt and remove the legacy JPVariants.txt source file. The reverse t2jp dictionary is now generated from JPShinjitaiCharacters.txt at build time, so CMake, Bazel, and GYP only maintain one authoritative character mapping source.

Replay this refactor on top of upstream master after 2de1f38, which imported the JPVariants.txt and JPShinjitaiCharacters.txt updates from danny0838's #494. The new upstream mappings are preserved by folding them into JPShinjitaiCharacters.txt in the jp2t direction, including the 常用漢字 and 人名用漢字 additions such as 併/倂, 挙/擧, 渋/澁, 闘/鬭, 鶏/鷄, and 麺/麵.

Keep the ambiguity fixes from the original refactor: remove the 両 -> 輛 candidate, keep 弁 to 辨/辯/瓣, and do not regenerate the 庄 -> 莊 or 棱 -> 棱 jp2t entries. This leaves 莊 and 棱 to reverse to the preferred 荘 and 稜 candidates when t2jp is generated.

Behavior was checked with scripts/compile_to_inline_config.py against the latest upstream base: t2jp has zero inline entry and conversion diffs, while jp2t only intentionally differs for 庄 (莊 -> 庄) plus the behavior-neutral removal of the 棱 self-mapping.

* Update CLI config descriptions for t2jp.json and jp2t.json

6 participants