Japanese shinjitai (日本新字體) by danny0838 · Pull Request #494 · BYVoid/OpenCC

danny0838 · 2020-07-10T15:59:18Z

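The t2jp scheme follows t2s in covering both "OpenCC standard forms to Japanese shinjitai" and "Japanese kyūjitai to Japanese shinjitai". In addition to the OpenCC-standard-to-shinjitai conversion, this PR therefore also adds the kyūjitai-to-shinjitai mappings from the Jōyō Kanji list (《常用漢字表》) and related lists.
Extended conversions are moved to JPVariantsEx.txt. They include the simplified conventional forms (簡易慣用字體) from the Hyōgai Kanji Jitai-hyō (《表外漢字字體表》) that do not take priority (those that do take priority go directly into JPVariants.txt), as well as extended shinjitai (擴張新字體). Extended shinjitai that do not conflict with the Japanese standard are converted directly; those that do conflict are not converted by default and are listed only as a second candidate. The default scheme t2jp does not include JPVariantsEx.txt, and a new t2jpx scheme that includes the extended conversions is added. jp2t includes the reverse conversion of JPVariantsEx.txt.
The extended shinjitai list is largely carried over from #371 (新增日本新字體。, "Add Japanese shinjitai"), with a few characters added.
「龝」 is a variant of 「秋」 rather than an OpenCC standard character. It is kept anyway, because t2jpx also covers kyūjitai-to-shinjitai conversion and the source text is not necessarily written in strict OpenCC standard characters.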
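The layering described above can be pictured with a small sketch. It assumes a plain-text dictionary layout of one entry per line (key, a tab, then space-separated candidates) and a converter that always emits the first candidate. The helper names `parse_dict`, `layer`, and `convert` are hypothetical, the entries are a toy subset, and `X`/`x` is a placeholder for a form that conflicts with the Japanese standard, not a real entry from the PR's data files.

```python
def parse_dict(text):
    """Parse 'key<TAB>cand1 cand2 ...' lines into {key: [candidates]}."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, candidates = line.split("\t", 1)
        table[key] = candidates.split(" ")
    return table


def layer(*tables):
    """Merge tables; later tables override earlier ones for the same key."""
    merged = {}
    for table in tables:
        merged.update(table)
    return merged


def convert(text, table):
    """Character-level conversion that always picks the first candidate."""
    return "".join(table.get(ch, [ch])[0] for ch in text)


# Toy stand-in for JPVariants.txt: forms that are standard in Japan.
JP_VARIANTS = parse_dict("麵\t麺\n曾\t曽\n")

# Toy stand-in for JPVariantsEx.txt: extended conversions.
# "X" is a placeholder for a conflicting form: its first candidate is
# itself, so it stays unchanged and "x" is only a second candidate.
JP_VARIANTS_EX = parse_dict("臍\t𦜝\n醬\t醤\nX\tX x\n")

T2JP = JP_VARIANTS                          # default scheme
T2JPX = layer(JP_VARIANTS, JP_VARIANTS_EX)  # default plus extended

print(convert("麵醬X", T2JP))   # 麺醬X
print(convert("麵醬X", T2JPX))  # 麺醤X
```

Keeping the extended table separate means the default scheme never sees the non-standard forms, while the extended scheme only has to add one more layer.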

BYVoid · 2020-07-13T06:01:24Z

What does 「非簡慣優先的簡慣字體」 ("simplified conventional forms that do not take priority") mean?

danny0838 · 2020-07-13T06:18:21Z

What does 「非簡慣優先的簡慣字體」 ("simplified conventional forms that do not take priority") mean?

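Japan's 2000 Hyōgai Kanji Jitai-hyō (《表外漢字字體表》) lists the printed standard forms, and for some characters it also gives a simplified conventional form (簡易慣用字體, 簡慣字體 for short).

For most characters the printed standard form is the standard and the simplified conventional form is an acceptable variant. The three characters 「曽」「痩」「麺」 are exceptions: their simplified conventional forms take priority, and the 2010 revised Jōyō Kanji list (《改定常用漢字表》) also adopted these three as standard characters.

The "simplified conventional forms that do not take priority" are therefore the simplified conventional forms of characters whose simplified form is not the preferred one, such as 「醤」 and 「鹸」. Because they are only "acceptable variants" rather than standard forms, the default scheme does not convert to them. The extended scheme t2jpx, whose logic is to "use shinjitai and analogically simplified forms as much as possible", does convert to them.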
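The routing rule in this explanation can be sketched as follows. The record type `HyogaiEntry` and the function `split_entries` are hypothetical names, and the printed-standard counterparts paired with each simplified form are assumptions based on the usual pairings; only the five simplified forms themselves are named in the comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyogaiEntry:
    printed_standard: str          # printed standard form
    simplified_conventional: str   # simplified conventional form
    simplified_has_priority: bool  # True only for the three exceptions


ENTRIES = [
    HyogaiEntry("曾", "曽", True),
    HyogaiEntry("瘦", "痩", True),
    HyogaiEntry("麵", "麺", True),
    HyogaiEntry("醬", "醤", False),
    HyogaiEntry("鹼", "鹸", False),
]


def split_entries(entries):
    """Return (default, extended) mappings.

    Priority forms go to the default table (JPVariants.txt in the PR);
    the remaining simplified forms go to the extended table
    (JPVariantsEx.txt in the PR).
    """
    default, extended = {}, {}
    for entry in entries:
        target = default if entry.simplified_has_priority else extended
        target[entry.printed_standard] = entry.simplified_conventional
    return default, extended


default_map, extended_map = split_entries(ENTRIES)
print(default_map)   # {'曾': '曽', '瘦': '痩', '麵': '麺'}
print(extended_map)  # {'醬': '醤', '鹼': '鹸'}
```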

- The Hyōgai Kanji Jitai-hyō (《表外漢字字體列表》) mostly uses Kangxi Dictionary forms and occasionally Japanese shinjitai forms; some of these differ from the OpenCC standard forms, so conversions for them need to be added.
- ref: https://zh.wikipedia.org/wiki/%E8%A1%A8%E5%A4%96%E6%BC%A2%E5%AD%97%E5%AD%97%E9%AB%94%E5%88%97%E8%A1%A8
- ref: https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E8%A1%A8%E5%A4%96%E6%BC%A2%E5%AD%97%E5%AD%97%E4%BD%93%E8%A1%A8%E3%81%AE%E6%BC%A2%E5%AD%97%E4%B8%80%E8%A6%A7

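- The Hyōgai Kanji Jitai-hyō (《表外漢字字體列表》) treats the printed standard forms as primary and the simplified conventional forms as merely acceptable, so the forced conversion is dropped and these entries are moved to JPVariantsEx.txt. (The exceptions are 「曽」「痩」「麺」, which are explicitly given priority and are included in the revised Jōyō Kanji list, 《改定常用漢字表》.)
- The default scheme t2jp does not include JPVariantsEx.txt; a t2jpx scheme that includes the extended conversions is added. jp2t includes reversing the extended conversions.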

maxmellen · 2022-05-03T09:35:25Z

@BYVoid Any chance of seeing something like this merged?

I understand that you don't think non-BMP 擴張新字體 should be part of the t2jp preset, but I think having this separate t2jpx preset is a great way to separate "converting to the Japanese Standard" and "using Japanese shorthands as much as possible".
Either way, I would love to see a tool that lets me use 日本擴張新字體 as an alternate simplification scheme to 大陸簡化字.

@danny0838 Have you been using a fork in the meantime?

ayaka14732 · 2022-05-03T12:42:26Z

I am now developing StarCC, the next generation of OpenCC.

@danny0838 Could you make a PR there? We can work together on this project.

danny0838 · 2022-05-05T10:41:23Z

@ayaka14732 We are overloaded and probably won't be able to handle the cross-project compatibility shortly. You can port them from our project sts-lib, though.

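* Restore the kyūjitai-to-shinjitai conversions from the Jōyō Kanji list (《常用漢字表》)
  - Excludes characters in the Unicode compatibility block
  - ref: https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
* Add the kyūjitai-to-shinjitai conversions that the revised Jōyō Kanji list (《改定常用漢字表》) adds to the Jōyō Kanji list
  - ref: https://zh.wikipedia.org/wiki/%E8%88%8A%E5%AD%97%E9%AB%94
* Convert variant characters in the Jinmeiyō Kanji list (《人名用漢字表》) to Japanese standard characters
  - Remove the incorrect 「遥=>遙」
  - Excludes characters in the Unicode compatibility block
  - ref: https://ja.wikipedia.org/wiki/%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97
* Drop conversions of characters already present in the Jōyō Kanji list
* Fix jp2t one-to-many mappings so that they convert correctly to standard characters
* Restore the kyūjitai in JPShinjitaiCharacters.txt to the forms given in the Jōyō Kanji list; also delete the one-to-many entries in JPShinjitaiCharacters.txt that do not affect conversion, and remove 「両 -> 輛」, which has no counterpart in the list
* Correct 2 entries in JPVariants.txt

---------

Co-authored-by: Danny Lin <danny0838@gmail.com>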

Merge the character-level Japanese variant data into JPShinjitaiCharacters.txt and remove the legacy JPVariants.txt source file. The reverse t2jp dictionary is now generated from JPShinjitaiCharacters.txt at build time, so CMake, Bazel, and GYP only maintain one authoritative character mapping source.

Replay this refactor on top of upstream master after 2de1f38, which imported the JPVariants.txt and JPShinjitaiCharacters.txt updates from danny0838's #494. The new upstream mappings are preserved by folding them into JPShinjitaiCharacters.txt in the jp2t direction, including the 常用漢字 and 人名用漢字 additions such as 併/倂, 挙/擧, 渋/澁, 闘/鬭, 鶏/鷄, and 麺/麵.

Keep the ambiguity fixes from the original refactor: remove the 両 -> 輛 candidate, keep 弁 to 辨/辯/瓣, and do not regenerate the 庄 -> 莊 or 棱 -> 棱 jp2t entries. This leaves 莊 and 棱 to reverse to the preferred 荘 and 稜 candidates when t2jp is generated.

Behavior was checked with scripts/compile_to_inline_config.py against the latest upstream base: t2jp has zero inline entry and conversion diffs, while jp2t only intentionally differs for 庄 (莊 -> 庄) plus the behavior-neutral removal of the 棱 self-mapping.
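Generating the reverse dictionary from a single jp2t source can be sketched roughly as below. This is not the project's build step: `parse_dict` and `reverse_dict` are hypothetical helpers, the file layout (key, tab, space-separated candidates) is assumed, and the entries are a toy subset built from characters named in the commit message.

```python
def parse_dict(text):
    """Parse 'key<TAB>cand1 cand2 ...' lines into {key: [candidates]}."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, candidates = line.split("\t", 1)
        table[key] = candidates.split(" ")
    return table


def reverse_dict(forward):
    """Invert {key: [values]} into {value: [keys]}, keeping source order.

    Identity pairs (key == value) are skipped, since a self-mapping does
    not change the converted text.
    """
    reverse = {}
    for key, values in forward.items():
        for value in values:
            if value == key:
                continue
            candidates = reverse.setdefault(value, [])
            if key not in candidates:
                candidates.append(key)
    return reverse


# Toy jp2t source: shinjitai -> traditional candidates.
JP2T = parse_dict("弁\t辨 辯 瓣\n荘\t莊\n麺\t麵\n")

# Each traditional form reverses to its shinjitai.
T2JP = reverse_dict(JP2T)
print(T2JP["辨"], T2JP["辯"], T2JP["瓣"])  # ['弁'] ['弁'] ['弁']
print(T2JP["莊"])                          # ['荘']
```

If the source also carried a 庄 -> 莊 entry, 莊 would reverse to two competing candidates (荘 and 庄), which is the kind of ambiguity the commit message says it avoids by not regenerating that entry.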

frankslin · 2026-06-11T02:32:35Z

See #1303 (comment); partially cherry-picked and merged.

…figs, with limited behavior cleanup (#1302)

* Refactor Japanese Shinjitai dictionaries on latest JP base
* Update CLI config descriptions for t2jp.json and jp2t.json

danny0838 force-pushed the 日本新字體 branch 18 times, most recently from b593869 to 8cc894b, July 12, 2020 14:05

danny0838 force-pushed the 日本新字體 branch 2 times, most recently from 3eff2be to 9b0ec92, July 16, 2020 12:28

danny0838 added 8 commits July 21, 2020 21:14

Restore the kyūjitai-to-shinjitai conversions from the Jōyō Kanji list (《常用漢字表》)

ea2dbb0

- Excludes characters in the Unicode compatibility block
- ref: https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7
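Excluding compatibility-block characters from a mapping list could look like the sketch below. The helper names `is_compat_ideograph` and `drop_compat_pairs` are hypothetical, and the test for "compatibility character" (inside a CJK Compatibility Ideographs block and carrying a canonical decomposition) is one reasonable reading of the commit note, not necessarily the rule the author applied.

```python
import unicodedata


def is_compat_ideograph(ch):
    """True for CJK compatibility ideographs.

    Checks the two CJK Compatibility Ideographs blocks. A few code points
    in U+F900..U+FAFF are ordinary unified ideographs; they have no
    canonical decomposition, so they are not treated as compatibility forms.
    """
    cp = ord(ch)
    in_block = 0xF900 <= cp <= 0xFAFF or 0x2F800 <= cp <= 0x2FA1F
    return in_block and unicodedata.decomposition(ch) != ""


def drop_compat_pairs(pairs):
    """Keep only (old, new) pairs in which neither side is a compatibility form."""
    return [
        (old, new)
        for old, new in pairs
        if not (is_compat_ideograph(old) or is_compat_ideograph(new))
    ]


PAIRS = [
    ("擧", "挙"),       # ordinary kyūjitai/shinjitai pair: kept
    ("\uFA45", "海"),   # U+FA45 is a compatibility form of 海: dropped
]

print(drop_compat_pairs(PAIRS))  # [('擧', '挙')]
```

Writing the compatibility character as an escape avoids it being silently normalized to the unified form by an editor or tool along the way.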

Add the kyūjitai-to-shinjitai conversions that the revised Jōyō Kanji list (《改定常用漢字表》) adds to the Jōyō Kanji list (《常用漢字表》)

9f4620a

- ref: https://zh.wikipedia.org/wiki/%E8%88%8A%E5%AD%97%E9%AB%94

Convert variant characters in the Jinmeiyō Kanji list (《人名用漢字表》) to Japanese standard characters

d0717e4

- Remove the incorrect 「遥=>遙」
- Excludes characters in the Unicode compatibility block
- ref: https://ja.wikipedia.org/wiki/%E4%BA%BA%E5%90%8D%E7%94%A8%E6%BC%A2%E5%AD%97

Add conversions from standard forms to the forms prescribed by the Jōyō Kanji list (《常用漢字表》)

e869d89

- ref: https://ja.wiktionary.org/wiki/%E4%BB%98%E9%8C%B2:%E5%B8%B8%E7%94%A8%E6%BC%A2%E5%AD%97%E3%81%AE%E4%B8%80%E8%A6%A7

Convert extended shinjitai that JIS once included but no longer uses to Japanese standard forms

76a0e8f

- ref: https://ja.wikipedia.org/wiki/%E6%8B%A1%E5%BC%B5%E6%96%B0%E5%AD%97%E4%BD%93

Drop conversions of characters already present in the Jōyō Kanji list (《常用漢字表》)

6ea3a10

Fix jp2t one-to-many mappings so that they convert correctly to standard characters

d60214e

danny0838 and others added 6 commits July 21, 2020 21:14

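Add Japanese shinjitai. This includes glyph forms that are the sole standard in Japan (e.g. 「粤」 rather than 「粵」) and extended shinjitai (e.g. 「𦜝」, whose standard form is 「臍」).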

4f38383

Deletion of four sets of nonstandard or inaccurate mapping pairs「龝穐」…

782255b

…「卒卆」「貮貳」「蕟𫈴」.

Move extended shinjitai to JPVariantsEx.txt

dca5e9b

Add some more extended shinjitai

2906f2d

- 「龝」 is a variant of 「秋」; it is kept so that kyūjitai-to-shinjitai conversion is fully supported

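Extended shinjitai that conflict with the Japanese standard are no longer used by default. Stop converting extended shinjitai back to Japanese standard forms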

0a583e9

danny0838 force-pushed the 日本新字體 branch from 9b0ec92 to 0a583e9, July 21, 2020 13:18

danny0838 mentioned this pull request Jun 13, 2022

Add conversion: 驒→騨 #694

Merged

danny0838 mentioned this pull request Jun 1, 2026

Fix and convert more Japanese shinjitai vocabulary #1265

Merged

This was referenced Jun 9, 2026

Correct several errors in JPVariants.txt #1294

Closed

Discussion of the policy for t2jp/jp2t #1298

Closed

frankslin mentioned this pull request Jun 11, 2026

Cherrypick 5 of the first 8 commits in danny0838's #494 #1303

Merged

frankslin mentioned this pull request Jun 11, 2026

Refactor Japanese Shinjitai dictionaries, naming conventions, and configs, with limited behavior cleanup #1302

Merged

danny0838 wants to merge 14 commits into BYVoid:master from danny0838:日本新字體

6 participants
