Japanese (draft)

Updated 20 September, 2021

This page gathers basic information about the Japanese writing system, which includes the use of Kanji, Hiragana, Katakana and Latin scripts. It aims (generally) to provide an overview of the orthography and typographic features, and (specifically) to advise how to write Japanese using Unicode.

Sample (Japanese)

第1条 すべての人間は、生まれながらにして自由であり、かつ、尊厳と権利とについて平等である。人間は、理性と良心とを授けられており、互いに同胞の精神をもって行動しなければならない。

第2条 すべて人は、人種、皮膚の色、性、言語、宗教、政治上その他の意見、国民的もしくは社会的出身、財産、門地その他の地位又はこれに類するいかなる自由による差別をも受けることなく、この宣言に掲げるすべての権利と自由とを享有することができる。 さらに、個人の属する国又は地域が独立国であると、信託統治地域であると、非自治地域であると、又は他のなんらかの主権制限の下にあるとを問わず、その国又は地域の政治上、管轄上又は国際上の地位に基ずくいかなる差別もしてはならない。

Usage & history

The 'Japanese orthography' described here is a mixture of four scripts that are used together in any Japanese text: Han (kanji), Hiragana, Katakana, and Latin.

日本語 nihoŋgo Japanese

"Kanji characters were introduced to Japan around the 3rd century, it is thought from Korea. Until the 7th or 8th century, the Japanese language was written exclusively in these Chinese characters. Initially these were used phonetically to represent similar-sounding Japanese syllables, regardless of their meaning in written Chinese. However, the process of writing Japanese solely in kanji was laborious; each symbol consisted of a number of strokes and only represented one syllable. Two simplified forms of writing began to emerge around the 7th century. The modern hiragana script developed from a simplified cursive style originally developed by women, who were discouraged from learning kanji, and katakana was developed by Buddhist scholars who wrote only one element of each kanji symbol as a form of shorthand." [s]

Sources: Scriptsource, Wikipedia

Basic features

Four scripts are used, mixed together, to write Japanese: kanji (Han), katakana, hiragana, and Latin. Essentially, Japanese writing is a mixture of an ideographic and a syllabic script. Non-Latin letters typically represent a spoken syllable. See the table to the right for a brief overview of features for the modern Japanese orthography. The character count reflects a typical set of characters needed for everyday reading and writing: there are thousands more kanji characters that could be added for other purposes.

Kanji characters are mostly derived from the Chinese Han script. They are used for word roots.

The term kana covers two syllabaries that are used with kanji characters (see Han) to write Japanese. See the table to the right for a brief overview of features, taken from the Script Comparison Table.

One syllabary is hiragana, the other katakana. In both cases, the repertoire includes 5 independent vowel sounds and one nasal sound; the rest are consonant+vowel combinations. There are a small number of additional characters with particular functions, such as a katakana lengthening mark, and a few small characters for representing medial glides.

Text can be written horizontally or vertically. The visual forms of characters don't interact.

Character index

Letters

See:

  1. kanji
  2. lists

CJK compatibility characters

逸逸謁謁禍禍悔悔海海慨慨喝喝褐褐漢漢祈祈既既器器響響勤勤謹謹穀穀殺殺祉祉視視者者煮煮臭臭祝祝暑暑署署諸諸祥祥神神節節祖祖僧僧層層贈贈贈贈嘆嘆著著懲懲塚塚都都突突難難梅梅繁繁卑卑碑碑賓賓頻頻敏敏侮侮福福塀塀勉勉墨墨免免欄欄隆隆虜虜類類練練朗朗廊廊

Kana

あアいイうウえエおオかカきキくクけケこコさサしシすスせセそソたタちチつツてテとトなナにニぬヌねネのノはハひヒふフへヘほホまマみミむムめメもモやヤゆユよヨらラりリるルれレろロわワをヲんン␣がガぎギぐグげゲごゴざザじジずズぜゼぞゾだダぢヂづヅでデどドばバびビぶブべベぼボヴヷヺ␣ぱパぴピぷプぺペぽポ␣ぁァぃィぅゥぇェぉォゃャゅュょョゎヮっッー

Half-width katakana

・ヲァィゥェォャュョッーアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワン゙゚

Other

ゐ␣ゑ␣ヸ␣ヹ␣ー␣ヽ␣ヾ␣じ␣ず␣ぢ␣づ

Combining marks

゙␣゚

Punctuation

!␣"␣%␣(␣)␣*␣,␣.␣/␣:␣;␣?␣@␣[␣\␣]␣{␣}␣§␣¶␣‐␣—␣―␣‖␣‘␣’␣“␣”␣†␣‡␣‥␣…␣‰␣′␣″␣※␣‼␣⁇␣⁈␣⁉␣⸺␣〃␣〈␣〉␣《␣》␣「␣」␣『␣』␣【␣】␣〔␣〕␣〖␣〗␣〜␣〝␣〟␣゠␣・␣!␣"␣#␣%␣&␣'␣(␣)␣*␣,␣-␣.␣/␣:␣;␣?␣@␣[␣\␣]␣{␣}
・␣、␣、␣。␣。␣「␣」␣゙␣゚␣゛␣゜

Symbols

See: lists

゛␣゜

Phonology

These are sounds for the standard Japanese language.

Phones in a lighter colour are non-native or allophones.

Vowel sounds

Plain vowels

i u e o a

Consonant sounds

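Manner | labial | dental | alveolar | post-alveolar | palatal | velar | uvular | glottal
stop | p b | t d | | | | k ɡ | |
affricate | | t͡s d͡z | | t͡ɕ d͡ʑ | | | |
fricative | ɸ | | s z | ɕ ʑ | ç | | | h
nasal | m | | n | | ɲ | ŋ | ɴ |
approximant | w | | | | j | | |
trill/flap | | | r | | | | |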

Structure

The Japanese language, unlike many neighbouring languages, uses polysyllabic words, and has no tones. It is an agglutinative language, and doesn't use spaces or other characters to separate words.

Wikipedia [w,#Sentences,_phrases_and_words] has a nicely written summary of the structural characteristics:

Text (文章 bunshō) is composed of sentences (文 bun), which are in turn composed of phrases (文節 bunsetsu), which are its smallest coherent components. Like Chinese and classical Korean, written Japanese does not typically demarcate words with spaces; its agglutinative nature further makes the concept of a word rather different from words in English. The reader identifies word divisions by semantic cues and a knowledge of phrase structure. Phrases have a single meaning-bearing word, followed by a string of suffixes, auxiliary verbs and particles to modify its meaning and designate its grammatical role. In the following example, phrases are indicated by vertical bars:

太陽が|東の|空に|昇る。
taiyō ga | higashi no | sora ni | noboru
sun SUBJECT | east POSSESSIVE | sky LOCATIVE | rise
The sun rises in the eastern sky.

Romanised Japanese text may add spaces between bunsetsu phrases, with hyphens separating suffixes (eg. higashi-no), or may also separate the suffixes using spaces (ie. higashi no).

Characters

Kanji characters

Kanji characters are mostly derived from the Chinese Han script. They are commonly used for word roots and compound words.

静電気

The compound word static electricity (seidenki), written with kanji.

Kanji characters are primarily constructed from characters that each represent a phonetic symbol. Some have pictographic origins that are still evident, whereas others have a more complicated structure.

In reforms in the mid 20th century, the Japanese repertoire was standardised on around 2,000 characters; however, standardised computer character sets support a few thousand more.

The Jōyō kanji character set (常用漢字) is intended as a literacy baseline for those who have completed compulsory education, as well as a list of permitted characters and readings for use in official government documents. [wjy]

A second list, called Jinmeiyō kanji (人名用漢字), is a supplementary list of 863 characters that can legally be used in registered personal names in Japan. [wj]

The number of characters in these lists changes from time to time. The Wikipedia articles for Jōyō and Jinmeiyō kanji provide useful timelines indicating changes over the years.

In addition to the basic lists, there are a number of variants and traditional forms that need to be considered. Note also that the jōyō character set only provides a baseline for the educational process.

Kanji characters in the jōyō and jinmeiyō lists in 2020:
Jōyō kanji 亜 哀 挨 愛 曖 悪 握 圧 扱 宛 嵐 安 案 暗 以 衣 位 囲 医 依 委 威 為 畏 胃 尉 異 移 萎 偉 椅 彙 意 違 維 慰 遺 緯 域 育 一 壱 逸 茨 芋 引 印 因 咽 姻 員 院 淫 陰 飲 隠 韻 右 宇 羽 雨 唄 鬱 畝 浦 運 雲 永 泳 英 映 栄 営 詠 影 鋭 衛 易 疫 益 液 駅 悦 越 謁 閲 円 延 沿 炎 怨 宴 媛 援 園 煙 猿 遠 鉛 塩 演 縁 艶 汚 王 凹 央 応 往 押 旺 欧 殴 桜 翁 奥 横 岡 屋 億 憶 臆 虞 乙 俺 卸 音 恩 温 穏 下 化 火 加 可 仮 何 花 佳 価 果 河 苛 科 架 夏 家 荷 華 菓 貨 渦 過 嫁 暇 禍 靴 寡 歌 箇 稼 課 蚊 牙 瓦 我 画 芽 賀 雅 餓 介 回 灰 会 快 戒 改 怪 拐 悔 海 界 皆 械 絵 開 階 塊 楷 解 潰 壊 懐 諧 貝 外 劾 害 崖 涯 街 慨 蓋 該 概 骸 垣 柿 各 角 拡 革 格 核 殻 郭 覚 較 隔 閣 確 獲 嚇 穫 学 岳 楽 額 顎 掛 潟 括 活 喝 渇 割 葛 滑 褐 轄 且 株 釜 鎌 刈 干 刊 甘 汗 缶 完 肝 官 冠 巻 看 陥 乾 勘 患 貫 寒 喚 堪 換 敢 棺 款 間 閑 勧 寛 幹 感 漢 慣 管 関 歓 監 緩 憾 還 館 環 簡 観 韓 艦 鑑 丸 含 岸 岩 玩 眼 頑 顔 願 企 伎 危 机 気 岐 希 忌 汽 奇 祈 季 紀 軌 既 記 起 飢 鬼 帰 基 寄 規 亀 喜 幾 揮 期 棋 貴 棄 毀 旗 器 畿 輝 機 騎 技 宜 偽 欺 義 疑 儀 戯 擬 犠 議 菊 吉 喫 詰 却 客 脚 逆 虐 九 久 及 弓 丘 旧 休 吸 朽 臼 求 究 泣 急 級 糾 宮 救 球 給 嗅 窮 牛 去 巨 居 拒 拠 挙 虚 許 距 魚 御 漁 凶 共 叫 狂 京 享 供 協 況 峡 挟 狭 恐 恭 胸 脅 強 教 郷 境 橋 矯 鏡 競 響 驚 仰 暁 業 凝 曲 局 極 玉 巾 斤 均 近 金 菌 勤 琴 筋 僅 禁 緊 錦 謹 襟 吟 銀 区 句 苦 駆 具 惧 愚 空 偶 遇 隅 串 屈 掘 窟 熊 繰 君 訓 勲 薫 軍 郡 群 兄 刑 形 系 径 茎 係 型 契 計 恵 啓 掲 渓 経 蛍 敬 景 軽 傾 携 継 詣 慶 憬 稽 憩 警 鶏 芸 迎 鯨 隙 劇 撃 激 桁 欠 穴 血 決 結 傑 潔 月 犬 件 見 券 肩 建 研 県 倹 兼 剣 拳 軒 健 険 圏 堅 検 嫌 献 絹 遣 権 憲 賢 謙 鍵 繭 顕 験 懸 元 幻 玄 言 弦 限 原 現 舷 減 源 厳 己 戸 古 呼 固 股 虎 孤 弧 故 枯 個 庫 湖 雇 誇 鼓 錮 顧 五 互 午 呉 後 娯 悟 碁 語 誤 護 口 工 公 勾 孔 功 巧 広 甲 交 光 向 后 好 江 考 行 坑 孝 抗 攻 更 効 幸 拘 肯 侯 厚 恒 洪 皇 紅 荒 郊 香 候 校 耕 航 貢 降 高 康 控 梗 黄 喉 慌 港 硬 絞 項 溝 鉱 構 綱 酵 稿 興 衡 鋼 講 購 乞 号 合 拷 剛 傲 豪 克 告 谷 刻 国 黒 穀 酷 獄 骨 駒 込 頃 今 困 昆 恨 根 婚 混 痕 紺 魂 墾 懇 左 佐 沙 査 砂 唆 差 詐 鎖 座 挫 才 再 災 妻 采 砕 宰 栽 彩 採 済 祭 斎 細 菜 最 裁 債 催 塞 歳 載 際 埼 在 材 剤 財 罪 崎 作 削 昨 柵 索 策 酢 搾 錯 咲 冊 札 刷 刹 拶 殺 察 撮 擦 雑 皿 三 山 参 桟 蚕 惨 産 傘 散 算 酸 賛 残 斬 暫 士 子 支 止 氏 仕 史 司 四 市 矢 旨 死 糸 至 伺 志 私 使 刺 始 姉 枝 祉 肢 姿 思 指 施 師 恣 紙 脂 視 紫 詞 歯 嗣 試 詩 資 飼 誌 雌 摯 賜 諮 示 字 寺 次 耳 自 似 児 事 侍 治 持 時 滋 慈 辞 磁 餌 璽 鹿 式 識 軸 七 𠮟 失 室 疾 執 湿 嫉 漆 質 実 芝 写 社 車 舎 者 射 捨 赦 斜 煮 遮 謝 邪 蛇 尺 借 酌 釈 爵 若 弱 寂 手 主 守 朱 取 狩 首 殊 珠 酒 腫 種 趣 寿 受 呪 授 需 儒 樹 収 囚 州 舟 秀 周 宗 拾 秋 臭 修 袖 終 羞 習 週 就 衆 集 愁 酬 醜 蹴 襲 十 汁 充 住 柔 重 従 渋 銃 獣 縦 叔 祝 宿 淑 粛 縮 塾 熟 出 述 術 俊 春 瞬 旬 巡 盾 准 殉 純 循 順 準 潤 遵 処 初 所 書 庶 暑 署 緒 諸 女 如 助 序 叙 徐 除 小 升 少 召 匠 床 抄 肖 尚 招 承 昇 松 沼 昭 宵 将 消 症 祥 称 笑 唱 商 渉 章 紹 訟 勝 掌 晶 焼 焦 硝 粧 詔 
証 象 傷 奨 照 詳 彰 障 憧 衝 賞 償 礁 鐘 上 丈 冗 条 状 乗 城 浄 剰 常 情 場 畳 蒸 縄 壌 嬢 錠 譲 醸 色 拭 食 植 殖 飾 触 嘱 織 職 辱 尻 心 申 伸 臣 芯 身 辛 侵 信 津 神 唇 娠 振 浸 真 針 深 紳 進 森 診 寝 慎 新 審 震 薪 親 人 刃 仁 尽 迅 甚 陣 尋 腎 須 図 水 吹 垂 炊 帥 粋 衰 推 酔 遂 睡 穂 随 髄 枢 崇 数 据 杉 裾 寸 瀬 是 井 世 正 生 成 西 声 制 姓 征 性 青 斉 政 星 牲 省 凄 逝 清 盛 婿 晴 勢 聖 誠 精 製 誓 静 請 整 醒 税 夕 斥 石 赤 昔 析 席 脊 隻 惜 戚 責 跡 積 績 籍 切 折 拙 窃 接 設 雪 摂 節 説 舌 絶 千 川 仙 占 先 宣 専 泉 浅 洗 染 扇 栓 旋 船 戦 煎 羨 腺 詮 践 箋 銭 潜 線 遷 選 薦 繊 鮮 全 前 善 然 禅 漸 膳 繕 狙 阻 祖 租 素 措 粗 組 疎 訴 塑 遡 礎 双 壮 早 争 走 奏 相 荘 草 送 倉 捜 挿 桑 巣 掃 曹 曽 爽 窓 創 喪 痩 葬 装 僧 想 層 総 遭 槽 踪 操 燥 霜 騒 藻 造 像 増 憎 蔵 贈 臓 即 束 足 促 則 息 捉 速 側 測 俗 族 属 賊 続 卒 率 存 村 孫 尊 損 遜 他 多 汰 打 妥 唾 堕 惰 駄 太 対 体 耐 待 怠 胎 退 帯 泰 堆 袋 逮 替 貸 隊 滞 態 戴 大 代 台 第 題 滝 宅 択 沢 卓 拓 託 濯 諾 濁 但 達 脱 奪 棚 誰 丹 旦 担 単 炭 胆 探 淡 短 嘆 端 綻 誕 鍛 団 男 段 断 弾 暖 談 壇 地 池 知 値 恥 致 遅 痴 稚 置 緻 竹 畜 逐 蓄 築 秩 窒 茶 着 嫡 中 仲 虫 沖 宙 忠 抽 注 昼 柱 衷 酎 鋳 駐 著 貯 丁 弔 庁 兆 町 長 挑 帳 張 彫 眺 釣 頂 鳥 朝 貼 超 腸 跳 徴 嘲 潮 澄 調 聴 懲 直 勅 捗 沈 珍 朕 陳 賃 鎮 追 椎 墜 通 痛 塚 漬 坪 爪 鶴 低 呈 廷 弟 定 底 抵 邸 亭 貞 帝 訂 庭 逓 停 偵 堤 提 程 艇 締 諦 泥 的 笛 摘 滴 適 敵 溺 迭 哲 鉄 徹 撤 天 典 店 点 展 添 転 塡 田 伝 殿 電 斗 吐 妬 徒 途 都 渡 塗 賭 土 奴 努 度 怒 刀 冬 灯 当 投 豆 東 到 逃 倒 凍 唐 島 桃 討 透 党 悼 盗 陶 塔 搭 棟 湯 痘 登 答 等 筒 統 稲 踏 糖 頭 謄 藤 闘 騰 同 洞 胴 動 堂 童 道 働 銅 導 瞳 峠 匿 特 得 督 徳 篤 毒 独 読 栃 凸 突 届 屯 豚 頓 貪 鈍 曇 丼 那 奈 内 梨 謎 鍋 南 軟 難 二 尼 弐 匂 肉 虹 日 入 乳 尿 任 妊 忍 認 寧 熱 年 念 捻 粘 燃 悩 納 能 脳 農 濃 把 波 派 破 覇 馬 婆 罵 拝 杯 背 肺 俳 配 排 敗 廃 輩 売 倍 梅 培 陪 媒 買 賠 白 伯 拍 泊 迫 剝 舶 博 薄 麦 漠 縛 爆 箱 箸 畑 肌 八 鉢 発 髪 伐 抜 罰 閥 反 半 氾 犯 帆 汎 伴 判 坂 阪 板 版 班 畔 般 販 斑 飯 搬 煩 頒 範 繁 藩 晩 番 蛮 盤 比 皮 妃 否 批 彼 披 肥 非 卑 飛 疲 秘 被 悲 扉 費 碑 罷 避 尾 眉 美 備 微 鼻 膝 肘 匹 必 泌 筆 姫 百 氷 表 俵 票 評 漂 標 苗 秒 病 描 猫 品 浜 貧 賓 頻 敏 瓶 不 夫 父 付 布 扶 府 怖 阜 附 訃 負 赴 浮 婦 符 富 普 腐 敷 膚 賦 譜 侮 武 部 舞 封 風 伏 服 副 幅 復 福 腹 複 覆 払 沸 仏 物 粉 紛 雰 噴 墳 憤 奮 分 文 聞 丙 平 兵 併 並 柄 陛 閉 塀 幣 弊 蔽 餅 米 壁 璧 癖 別 蔑 片 辺 返 変 偏 遍 編 弁 辛 便 勉 歩 保 哺 捕 補 舗 母 募 墓 慕 暮 簿 方 包 芳 邦 奉 宝 抱 放 法 泡 胞 俸 倣 峰 砲 崩 訪 報 蜂 豊 飽 褒 縫 亡 乏 忙 坊 妨 忘 防 房 肪 某 冒 剖 紡 望 傍 帽 棒 貿 貌 暴 膨 謀 頰 北 木 朴 牧 睦 僕 墨 撲 没 勃 堀 本 奔 翻 凡 盆 麻 摩 磨 魔 毎 妹 枚 昧 埋 幕 膜 枕 又 末 抹 万 満 慢 漫 未 味 魅 岬 密 蜜 脈 妙 民 眠 矛 務 無 夢 霧 娘 名 命 明 迷 冥 盟 銘 鳴 滅 免 面 綿 麺 茂 模 毛 妄 盲 耗 猛 網 目 黙 門 紋 問 冶 夜 野 弥 厄 役 約 訳 薬 躍 闇 由 油 喩 愉 諭 輸 癒 唯 友 有 勇 幽 悠 郵 湧 猶 裕 遊 雄 誘 憂 融 優 与 予 余 誉 預 幼 用 羊 
妖 洋 要 容 庸 揚 揺 葉 陽 溶 腰 様 瘍 踊 窯 養 擁 謡 曜 抑 沃 浴 欲 翌 翼 拉 裸 羅 来 雷 頼 絡 落 酪 辣 乱 卵 覧 濫 藍 欄 吏 利 里 理 痢 裏 履 璃 離 陸 立 律 慄 略 柳 流 留 竜 粒 隆 硫 侶 旅 虜 慮 了 両 良 料 涼 猟 陵 量 僚 領 寮 療 瞭 糧 力 緑 林 厘 倫 輪 隣 臨 瑠 涙 累 塁 類 令 礼 冷 励 戻 例 鈴 零 霊 隷 齢 麗 暦 歴 列 劣 烈 裂 恋 連 廉 練 錬 呂 炉 賂 路 露 老 労 弄 郎 朗 浪 廊 楼 漏 籠 六 録 麓 論 和 話 賄 脇 惑 枠 湾 腕 2,138
Alternates 剥 叱 填 頬 4
Jōyō traditional variant forms (not including 61 compatibility forms that normalise to other characters) 亞 惡 壓 圍 醫 爲 壹 隱 榮 營 衞 驛 圓 鹽 緣 艷 應 歐 毆 櫻 奧 橫 溫 穩 假 價 畫 會 繪 壞 懷 槪 擴 殼 覺 學 嶽 樂 渴 罐 卷 陷 勸 寬 關 歡 觀 氣 歸 龜 僞 戲 犧 舊 據 擧 虛 峽 挾 狹 鄕 曉 區 驅 勳 薰 徑 莖 惠 揭 溪 經 螢 輕 繼 鷄 藝 擊 缺 硏 縣 儉 劍 險 圈 檢 獻 權 顯 驗 嚴 廣 效 恆 黃 鑛 號 國 黑 碎 濟 齋 劑 雜 參 棧 蠶 慘 贊 殘 絲 齒 兒 辭 濕 實 寫 舍 釋 壽 收 從 澁 獸 縱 肅 處 緖 敍 將 稱 涉 燒 證 奬 條 狀 乘 淨 剩 疊 繩 壤 孃 讓 釀 觸 囑 眞 寢 愼 盡 圖 粹 醉 穗 隨 髓 樞 數 瀨 聲 齊 靜 竊 攝 絕 專 淺 戰 踐 錢 潛 纖 禪 雙 壯 爭 莊 搜 插 巢 曾 瘦 裝 總 騷 增 藏 臟 卽 屬 續 墮 對 體 帶 滯 臺 瀧 擇 澤 擔 單 膽 團 斷 彈 遲 癡 蟲 晝 鑄 廳 徵 聽 敕 鎭 遞 鐵 點 轉 傳 燈 當 黨 盜 稻 鬭 德 獨 讀 屆 貳 惱 腦 霸 拜 廢 賣 麥 發 髮 拔 晚 蠻 祕 濱 甁 拂 佛 倂 竝 餠 邊 變 辨 瓣 辯 步 寶 豐 襃 沒 飜 每 萬 滿 麵 默 彌 譯 藥 與 豫 餘 譽 搖 樣 謠 來 賴 亂 覽 龍 兩 獵 綠 淚 壘 禮 勵 戾 靈 齡 曆 歷 戀 鍊 爐 勞 郞 樓 錄 灣 305
The compatibility forms 逸 謁 禍 悔 海 慨 喝 褐 漢 祈 既 器 響 勤 謹 穀 殺 祉 視 者 煮 臭 祝 暑 署 諸 祥 神 節 祖 僧 層 贈 贈 嘆 著 懲 塚 都 突 難 梅 繁 卑 碑 賓 頻 敏 侮 福 塀 勉 墨 免 欄 隆 虜 類 練 朗 廊 61
Jinmeiyō kanji 丑 丞 乃 之 乎 也 云 亘 些 亦 亥 亨 亮 仔 伊 伍 伽 佃 佑 伶 侃 侑 俄 俠 俣 俐 倭 俱 倦 倖 偲 傭 儲 允 兎 兜 其 冴 凌 凜 凧 凪 凰 凱 函 劉 劫 勁 勺 勿 匁 匡 廿 卜 卯 卿 厨 厩 叉 叡 叢 叶 只 吾 吞 吻 哉 哨 啄 哩 喬 喧 喰 喋 嘩 嘉 嘗 噌 噂 圃 圭 坐 尭 坦 埴 堰 堺 堵 塙 壕 壬 夷 奄 奎 套 娃 姪 姥 娩 嬉 孟 宏 宋 宕 宥 寅 寓 寵 尖 尤 屑 峨 峻 崚 嵯 嵩 嶺 巌 巫 已 巳 巴 巷 巽 帖 幌 幡 庄 庇 庚 庵 廟 廻 弘 弛 彗 彦 彪 彬 徠 忽 怜 恢 恰 恕 悌 惟 惚 悉 惇 惹 惺 惣 慧 憐 戊 或 戟 托 按 挺 挽 掬 捲 捷 捺 捧 掠 揃 摑 摺 撒 撰 撞 播 撫 擢 孜 敦 斐 斡 斧 斯 於 旭 昂 昊 昏 昌 昴 晏 晃 晒 晋 晟 晦 晨 智 暉 暢 曙 曝 曳 朋 朔 杏 杖 杜 李 杭 杵 杷 枇 柑 柴 柘 柊 柏 柾 柚 桧 栞 桔 桂 栖 桐 栗 梧 梓 梢 梛 梯 桶 梶 椛 梁 棲 椋 椀 楯 楚 楕 椿 楠 楓 椰 楢 楊 榎 樺 榊 榛 槙 槍 槌 樫 槻 樟 樋 橘 樽 橙 檎 檀 櫂 櫛 櫓 欣 欽 歎 此 殆 毅 毘 毬 汀 汝 汐 汲 沌 沓 沫 洸 洲 洵 洛 浩 浬 淵 淳 渚 淀 淋 渥 渾 湘 湊 湛 溢 滉 溜 漱 漕 漣 澪 濡 瀕 灘 灸 灼 烏 焰 焚 煌 煤 煉 熙 燕 燎 燦 燭 燿 爾 牒 牟 牡 牽 犀 狼 猪 獅 玖 珂 珈 珊 珀 玲 琢 琉 瑛 琥 琶 琵 琳 瑚 瑞 瑶 瑳 瓜 瓢 甥 甫 畠 畢 疋 疏 皐 皓 眸 瞥 矩 砦 砥 砧 硯 碓 碗 碩 碧 磐 磯 祇 祢 祐 祷 禄 禎 禽 禾 秦 秤 稀 稔 稟 稜 穣 穹 穿 窄 窪 窺 竣 竪 竺 竿 笈 笹 笙 笠 筈 筑 箕 箔 篇 篠 簞 簾 籾 粥 粟 糊 紘 紗 紐 絃 紬 絆 絢 綺 綜 綴 緋 綾 綸 縞 徽 繫 繡 纂 纏 羚 翔 翠 耀 而 耶 耽 聡 肇 肋 肴 胤 胡 脩 腔 脹 膏 臥 舜 舵 芥 芹 芭 芙 芦 苑 茄 苔 苺 茅 茉 茸 茜 莞 荻 莫 莉 菅 菫 菖 萄 菩 萌 萊 菱 葦 葵 萱 葺 萩 董 葡 蓑 蒔 蒐 蒼 蒲 蒙 蓉 蓮 蔭 蔣 蔦 蓬 蔓 蕎 蕨 蕉 蕃 蕪 薙 蕾 蕗 藁 薩 蘇 蘭 蝦 蝶 螺 蟬 蟹 蠟 衿 袈 袴 裡 裟 裳 襖 訊 訣 註 詢 詫 誼 諏 諄 諒 謂 諺 讃 豹 貰 賑 赳 跨 蹄 蹟 輔 輯 輿 轟 辰 辻 迂 迄 辿 迪 迦 這 逞 逗 逢 遥 遁 遼 邑 祁 郁 鄭 酉 醇 醐 醍 醬 釉 釘 釧 銑 鋒 鋸 錘 錐 錆 錫 鍬 鎧 閃 閏 閤 阿 陀 隈 隼 雀 雁 雛 雫 霞 靖 鞄 鞍 鞘 鞠 鞭 頁 頌 頗 顚 颯 饗 馨 馴 馳 駕 駿 驍 魁 魯 鮎 鯉 鯛 鰯 鱒 鱗 鳩 鳶 鳳 鴨 鴻 鵜 鵬 鷗 鷲 鷺 鷹 麒 麟 麿 黎 黛 鼎 633
Jinmeiyō variants 亙 凛 巖 堯 晄 檜 槇 渚 猪 琢 禰 祐 禱 祿 禎 穰 萠 遙 18

CJK compatibility characters

The Jōyō traditional forms include 61 kanji shapes that Unicode includes in the CJK Compatibility Ideographs block. Normalisation operations (which in some systems may happen automatically, or during things such as cut & paste) convert them to characters in the main CJK block. This makes them unstable, and best avoided. The following list shows the compatibility character shape to the left, and the normalised shape to the right.

逸逸␣謁謁␣禍禍␣悔悔␣海海␣慨慨␣喝喝␣褐褐␣漢漢␣祈祈␣既既␣器器␣響響␣勤勤␣謹謹␣穀穀␣殺殺␣祉祉␣視視␣者者␣煮煮␣臭臭␣祝祝␣暑暑␣署署␣諸諸␣祥祥␣神神␣節節␣祖祖␣僧僧␣層層␣贈贈␣贈贈␣嘆嘆␣著著␣懲懲␣塚塚␣都都␣突突␣難難␣梅梅␣繁繁␣卑卑␣碑碑␣賓賓␣頻頻␣敏敏␣侮侮␣福福␣塀塀␣勉勉␣墨墨␣免免␣欄欄␣隆隆␣虜虜␣類類␣練練␣朗朗␣廊廊
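The instability described above can be checked programmatically. The following is a minimal sketch, assuming Python's standard `unicodedata` module; the function name `find_compatibility_ideographs` is hypothetical and chosen only for illustration. It reports characters from the CJK Compatibility Ideographs block that NFC normalisation would silently replace.

```python
import unicodedata


def find_compatibility_ideographs(text):
    """Return (index, original, normalised) for each compatibility
    ideograph that NFC normalisation would replace."""
    hits = []
    for index, ch in enumerate(text):
        normalised = unicodedata.normalize("NFC", ch)
        name = unicodedata.name(ch, "")
        if normalised != ch and name.startswith("CJK COMPATIBILITY IDEOGRAPH"):
            hits.append((index, ch, normalised))
    return hits


# U+FA47 is the compatibility form of U+6F22; U+5B57 is an ordinary kanji.
sample = "\uFA47\u5B57"
for index, original, normalised in find_compatibility_ideographs(sample):
    print(f"{index}: U+{ord(original):04X} -> U+{ord(normalised):04X}")
```

A check like this could be run before text is stored or compared, so that the loss of the traditional shape is at least detected rather than happening unnoticed.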

Kana syllabaries

Japanese uses two syllabaries: hiragana and katakana.

Katakana characters are used for foreign loan words, such as the word 'text'.

テキスト

The word text (tekisuto), written with katakana syllables.

Hiragana is used for indigenous Japanese words, such as the verb 'to be'.

です

The word for to be (desu), written with hiragana syllables.

It is also used for grammatical endings after a word root written using kanji characters.

集まります

The word for to collect (atsumarimasu), with the verb root (atsu) written using a kanji character, and the remainder in hiragana expressing the grammatical present tense.

The basic syllabary includes 5 independent vowel sounds, one nasal sound, and the rest are consonant+vowel combinations. In these lists we show hiragana (first) and katakana (second) together.

あア␣いイ␣うウ␣えエ␣おオ␣かカ␣きキ␣くク␣けケ␣こコ␣さサ␣しシ␣すス␣せセ␣そソ␣たタ␣ちチ␣つツ␣てテ␣とト␣なナ␣にニ␣ぬヌ␣ねネ␣のノ␣はハ␣ひヒ␣ふフ␣へヘ␣ほホ␣まマ␣みミ␣むム␣めメ␣もモ␣やヤ␣ゆユ␣よヨ␣らラ␣りリ␣るル␣れレ␣ろロ␣わワ␣をヲ␣んン

Voiced consonants are indicated by attaching a dakuten mark (looks like a quote mark) to the unvoiced shape. Unicode provides precomposed code points for every combination of syllable+dakuten.

がガ␣ぎギ␣ぐグ␣げゲ␣ごゴ␣ざザ␣じジ␣ずズ␣ぜゼ␣ぞゾ␣だダ␣ぢヂ␣づヅ␣でデ␣どド␣ばバ␣びビ␣ぶブ␣べベ␣ぼボ␣ヴ␣ヷ␣ヺ

The ‘p’ sound is indicated in a similar way by the use of a han-dakuten (half-dakuten).

ぱパ␣ぴピ␣ぷプ␣ぺペ␣ぽポ

The Unicode hiragana block does contain code points for dakuten combining marks and modifiers, but these are not used in normal text.
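The relationship between the precomposed kana and the combining marks can be illustrated with Unicode normalisation. This is a small sketch using Python's standard `unicodedata` module; the helper name `to_precomposed` is an assumption made for this example.

```python
import unicodedata

precomposed = "\u304C"        # HIRAGANA LETTER GA
decomposed = "\u304B\u3099"   # HIRAGANA LETTER KA + combining dakuten (U+3099)


def to_precomposed(text):
    """Recompose kana + combining (han)dakuten into single code points (NFC)."""
    return unicodedata.normalize("NFC", text)


# NFD splits the precomposed letter into base + combining mark,
# and NFC puts the two back together.
print(unicodedata.normalize("NFD", precomposed) == decomposed)
print(to_precomposed(decomposed) == precomposed)
print(len(decomposed), len(to_precomposed(decomposed)))
```

Normalising input to NFC is one way to make sure that text containing the combining marks ends up using the precomposed code points that normal Japanese text expects.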

The basic set is completed by a number of small forms used for medial glides, foreign sounds, and gemination, and a vowel lengthener.

ぁァ␣ぃィ␣ぅゥ␣ぇェ␣ぉォ␣ゃャ␣ゅュ␣ょョ␣ゎヮ␣っッ␣ー

Small versions of や, ゆ, and よ are used to form syllables such as きゃ kya kja, きゅ kyu kju, and きょ kyo kjo.

[U+30C3 KATAKANA LETTER SMALL TU] is used to lengthen a consonant sound.

[U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK] is used to elongate vowel sounds. This elongation is phonemically significant. It is used predominantly with katakana, but occasionally also with hiragana. [uk,720]

Yotsugana

Over time, certain voiced sounds have merged in several important dialects, as shown in fig_yotsugana.

Tokyo (standard): ぢ and じ d͡ʑi~ʑi; づ and ず d͡zɯᵝ~zɯᵝ
South Tohoku: ぢ, じ, づ, and ず all d͡zɯᵝ
Kōchi (Hata, Tosa): ぢ di~d͡zi; じ ʑi; づ dɯᵝ~d͡zɯᵝ; ず zɯᵝ
Kagoshima: ぢ d͡ʑi; じ ʑi; づ d͡zɯᵝ; ず zɯᵝ
Okinawa: ぢ, じ, づ, and ず all d͡ʑi
Yotsugana pronunciation around Japan. (Source: [wy].)

The orthographic reform shortly after World War 2 recommended the use of only [U+3058 HIRAGANA LETTER ZI] and [U+305A HIRAGANA LETTER ZU], except in circumstances where an unvoiced sound has become voiced because of [wy]:

  1. compounding (rendaku), eg. 神無月 kannad͡zuki October (lunar month), which combines かん, な, and つき, would be written in hiragana as かんなづき
  2. repetition, eg. t͡sud͡zuki continuation is written in hiragana as つづき

Archaic characters

A number of characters in the kana blocks are no longer used in modern text.

These characters were dropped by an orthographic reform shortly after World War 2.

ゐヰ␣ゑヱ␣ヸ␣ヹ

Halfwidth katakana

Unicode has a set of halfwidth katakana forms for legacy encoding roundtrips. In principle, these characters should not be used. The normal, fullsized characters should be used instead.

・␣ヲ␣ァ␣ィ␣ゥ␣ェ␣ォ␣ャ␣ュ␣ョ␣ッ␣ー␣ア␣イ␣ウ␣エ␣オ␣カ␣キ␣ク␣ケ␣コ␣サ␣シ␣ス␣セ␣ソ␣タ␣チ␣ツ␣テ␣ト␣ナ␣ニ␣ヌ␣ネ␣ノ␣ハ␣ヒ␣フ␣ヘ␣ホ␣マ␣ミ␣ム␣メ␣モ␣ヤ␣ユ␣ヨ␣ラ␣リ␣ル␣レ␣ロ␣ワ␣ン␣゙␣゚

Text direction

Text can be written horizontally, left to right, or vertically with lines progressing from right to left. Vertically set text is still common in Japan; most novels, newspapers and magazines are set vertically. [j,#h-note-15]

Sometimes, vertically set text may contain sections or items that are set horizontally. For example, in newspapers, headings are normally set horizontally above the body of an article which is set vertically, and captions are usually horizontal.

fig_mixed_direction shows pages from a magazine that mix directions on the page.

Vertically set pages with mixed direction text.

Older horizontally set texts in Japanese also ran right to left.

If your browser supports vertical text, you can change the direction of the text sample here.

第7条 すべての人は、法の下において平等であり、また、いかなる差別もなしに法の平等な保護を受ける権利を有する。すべての人は、この宣言に違反するいかなる差別に対しても、また、そのような差別をそそのかすいかなる行為に対しても、平等な保護を受ける権利を有する。

It should be noted, however, that different conventions are applied for horizontal and vertical text, for example in terms of characters used and treatment of embedded romaji and numerals. Apart from the question of what gets rotated and what does not, the two writing modes may show different preferences for emphasis marks, brackets, numbers, and so forth. This means that it is not usually appropriate to simply switch the direction of the text without making additional changes.

In vertical text (only) decisions have to be made about how to present embedded romaji text and numbers. Romaji typically runs down the page, with proportionally-spaced characters rotated 90° to the right. However, acronyms are often written using upright, fullwidth characters.

Romaji text with characters rotated (left), and an acronym with upright letters (right).

Numbers, and sometimes text, may also run horizontally within a vertical line. This is most common with double-digit numbers, such as in dates. The width of the horizontal text should not normally exceed the width of the surrounding vertical text (ie. it should fit in the width of a character space). This is referred to as tate chu yoko.

Numbers arranged horizontally within a vertical line.
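As a rough illustration of how an application might pick out candidates for tate chu yoko, the sketch below finds runs of one or two digits in a string. It assumes that only ASCII digits are considered and that longer runs are left for other treatment; both the rule and the function name `tate_chu_yoko_candidates` are assumptions made for this example, not requirements taken from a specification.

```python
import re

# One or two ASCII digits that are not part of a longer digit run.
_SHORT_DIGIT_RUN = re.compile(r"(?<![0-9])[0-9]{1,2}(?![0-9])")


def tate_chu_yoko_candidates(text):
    """Return (offset, digits) for each short digit run that could be
    set horizontally within a vertical line."""
    return [(m.start(), m.group()) for m in _SHORT_DIGIT_RUN.finditer(text)]


print(tate_chu_yoko_candidates("12月25日"))
print(tate_chu_yoko_candidates("2021年9月"))
```

A layout engine would then need to check that each candidate actually fits within the width of one character frame before combining it.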

Glyph shaping & positioning

This section brings together information about the following topics: writing styles; cursive text; context-based shaping; context-based positioning; baselines, line height, etc.; font styles; case & other character transforms.

You can experiment with examples using the Japanese character app.

The Japanese scripts are not cursive, and involve no context-based shaping or positioning.

The orthography has no case distinction.

By default, all kanji, hiragana, katakana, and punctuation characters are drawn inside a character frame that is square and the same size for all characters. The box containing the actual symbol is called the letter face, and there should be some space left between the letter face and the character frame. There may be variations, particularly for small kana, punctuation, etc., in the size of the letter face.

Because of the regularity of the character frame size, it can be used to measure the size of the text area or other parts of a page (horizontally or vertically).

Character frame and letter face.

In principle, Japanese characters are set solid, ie. with no space between the character frames. However, text alignment and justification can make adjustments to the placement of characters in the direction of the line flow. See justification and letterspace.

Context-based positioning

Characters such as small kana and punctuation occupy different locations within the character frame in horizontal and vertical text.

Positioning of small ょ and the full stop in horizontal and vertical character frames.

fig_small_kana shows how in horizontal text small kana are centred horizontally in the character frame but are vertically below centre; in vertical text they are centred vertically, but aligned right.

The full stop also switches from bottom-left in horizontal text, to top-right in vertical.

These are differences that cannot be produced by rotating glyphs, but require special glyphs in the font which are applied when the directional context is detected.

Baselines & inline alignment

The standard baseline for kanji and kana characters is slightly lower than the alphabetic baseline used for Latin characters. Mixed script text needs to align baselines correctly.

Japanese characters have no ascenders or descenders, but occupy the square space described earlier.

Font styles

tbd

Transforming characters

Japanese kanji and kana form a monocameral orthography, and no transforms are needed to convert between different forms of a given letter.

However, transforms may be applied to convert between half-width and full-width kana characters.
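One possible way to convert halfwidth katakana to their fullwidth equivalents is to apply NFKC normalisation only to runs of halfwidth katakana, leaving other text (such as fullwidth Latin letters) untouched. This is a sketch under that assumption; the function name `to_fullwidth_katakana` and the choice of the range U+FF65 to U+FF9F are illustrative.

```python
import re
import unicodedata

# Halfwidth katakana middle dot, letters, and (han)dakuten: U+FF65..U+FF9F.
_HALFWIDTH_KATAKANA_RUN = re.compile("[\uFF65-\uFF9F]+")


def to_fullwidth_katakana(text):
    """Replace halfwidth katakana with fullwidth forms, composing a
    following halfwidth dakuten or handakuten with its base letter."""
    return _HALFWIDTH_KATAKANA_RUN.sub(
        lambda match: unicodedata.normalize("NFKC", match.group()), text
    )


halfwidth = "\uFF83\uFF77\uFF7D\uFF84"   # halfwidth spelling of tekisuto
print(to_fullwidth_katakana(halfwidth))
```

Normalising whole runs rather than single characters matters here, because a halfwidth letter and its separate voicing mark only compose into one fullwidth letter when they are normalised together.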

Punctuation & inline features

Grapheme boundaries

Since combining marks are not used in normal text, graphemes correspond to individual characters for kanji and kana.

Unicode grapheme clusters can be applied to Japanese text without problems. There are no special issues related to operations that use grapheme clusters as their basic unit of text.
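A quick way to test whether a given string satisfies this one-character-per-grapheme expectation is to look for combining marks. The sketch below is only a rough check using Python's standard `unicodedata` module, not an implementation of Unicode grapheme cluster segmentation; the function name `code_points_are_graphemes` is hypothetical.

```python
import unicodedata


def code_points_are_graphemes(text):
    """Return True if the text contains no mark characters (general
    category M*), so each code point can be treated as one grapheme."""
    return all(not unicodedata.category(ch).startswith("M") for ch in text)


precomposed = "\u304C\u304E"    # two precomposed voiced hiragana
decomposed = "\u304B\u3099"     # base letter followed by combining dakuten

print(code_points_are_graphemes(precomposed))
print(code_points_are_graphemes(decomposed))
print(code_points_are_graphemes(unicodedata.normalize("NFC", decomposed)))
```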

Word boundaries

Japanese rarely uses spaces. In the sample text there are gaps around punctuation, but these are produced by a lack of 'ink' in parts of the square character glyphs.

The following example illustrates this: only four characters make up the sequence, and none are spaces.

い。(こ

Gaps of this kind may also be reduced during justification and line alignment.

In general, word boundaries are not important for line-wrapping; however, text such as headings may occasionally be wrapped at word boundaries in order to better balance the text.

Word boundaries are identified when users select screen text, eg. by double-clicking inside a word. Heuristics and dictionaries are needed to identify the boundaries of words in such situations. Note, also, that words in Japanese are very often a mixture of kanji characters followed by hiragana. The word boundary detection needs to treat the various scripts as a unified orthography.
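To show why a change of script cannot be treated as a word boundary in itself, here is a deliberately naive sketch that only starts a new chunk when kanji or katakana follows hiragana. It is not a real segmenter: as noted above, practical word selection needs dictionaries and heuristics. The names `script_of` and `naive_chunks` are hypothetical, and scripts are detected from Unicode character names via the standard `unicodedata` module.

```python
import unicodedata


def script_of(ch):
    """Classify a character by the start of its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith(("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH")):
        return "kanji"
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    return "other"


def naive_chunks(text):
    """Split only where kanji or katakana follows hiragana, so that a
    kanji root stays attached to its hiragana ending."""
    chunks, current, previous = [], "", None
    for ch in text:
        script = script_of(ch)
        if current and previous == "hiragana" and script in ("kanji", "katakana"):
            chunks.append(current)
            current = ""
        current += ch
        previous = script
    if current:
        chunks.append(current)
    return chunks


print(naive_chunks("太陽が東の空に昇る"))
print(naive_chunks("集まります"))
```

For the example sentence used earlier on this page the chunks happen to coincide with the bunsetsu phrases, but the rule fails easily, for instance on words written entirely in hiragana.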

Phrase & section boundaries

Japanese uses the following separators at the sentence level and below. [j,#differences_in_vertical_and_horizontal_composition_in_use_of_punctuation_marks] Some of the punctuation looks like that for Latin (eg. parentheses, commas, and full stops), but the width of the punctuation is likely to include significant amounts of white space, so that punctuation characters occupy the same space as han characters.

Category | Character | Horizontal text | Vertical text
phrase | [U+FF0C FULLWIDTH COMMA] | Bottom left | Bottom right
phrase | [U+3001 IDEOGRAPHIC COMMA] | Bottom left | Bottom right
phrase | [U+FF1A FULLWIDTH COLON] | Bottom right | Bottom right
phrase | [U+FF1B FULLWIDTH SEMICOLON] | Bottom right | Bottom right
sentence | [U+3002 IDEOGRAPHIC FULL STOP] | Bottom left | Bottom right
sentence | [U+FF0E FULLWIDTH FULL STOP] | Bottom left | Bottom right
exclamation | [U+FF01 FULLWIDTH EXCLAMATION MARK] | Bottom right | Right
question | [U+FF1F FULLWIDTH QUESTION MARK] | Bottom right | Right

[U+3001 IDEOGRAPHIC COMMA] and [U+3002 IDEOGRAPHIC FULL STOP] are the norm for vertical text; however, two alternative conventions are applied to horizontal text. Especially in books that mix Japanese and western text, such as books on science and technology, they may be replaced by [U+FF0C FULLWIDTH COMMA] and [U+FF0E FULLWIDTH FULL STOP]. Often, however, the ideographic full stop is retained, since it is more visible and looks better (this convention has been adopted for Japanese official publications). [j,#differences_in_vertical_and_horizontal_composition_in_use_of_punctuation_marks]

As the table shows, these punctuation marks require dedicated glyphs in the font, and cannot be achieved by simply rotating the glyph.
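The two horizontal-text conventions for commas and full stops described above can be sketched as a simple character mapping. This is an illustrative example only; the function name `western_style_punctuation` and its `keep_ideographic_full_stop` parameter are assumptions, and real text conversion would normally be an editorial decision rather than an automatic step.

```python
def western_style_punctuation(text, keep_ideographic_full_stop=True):
    """Replace the ideographic comma with the fullwidth comma and,
    optionally, the ideographic full stop with the fullwidth full stop."""
    table = {0x3001: 0xFF0C}           # IDEOGRAPHIC COMMA -> FULLWIDTH COMMA
    if not keep_ideographic_full_stop:
        table[0x3002] = 0xFF0E         # IDEOGRAPHIC FULL STOP -> FULLWIDTH FULL STOP
    return text.translate(table)


sentence = "人間は\u3001自由である\u3002"
print(western_style_punctuation(sentence))
print(western_style_punctuation(sentence, keep_ideographic_full_stop=False))
```

The default keeps the ideographic full stop, which corresponds to the convention the text says has been adopted for official publications.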

Japanese also uses the following doubled exclamation/question marks. They remain upright in vertical text.

[U+203C DOUBLE EXCLAMATION MARK]
[U+2047 DOUBLE QUESTION MARK]
[U+2048 QUESTION EXCLAMATION MARK] 
[U+2049 EXCLAMATION QUESTION MARK] 

Other punctuation used to separate phrases or items includes:

Character | Horizontal text | Vertical text
[U+2E3A TWO-EM DASH] | Bottom right | Bottom right
—— [U+2014 EM DASH + U+2014 EM DASH] | Bottom right | Bottom right

If EM DASH characters are used, they are used in pairs.

Parentheses & brackets

For general parentheses and bracketing in text, Japanese uses:

Opening | Closing | Horizontal text | Vertical text
[U+FF08 FULLWIDTH LEFT PARENTHESIS] | [U+FF09 FULLWIDTH RIGHT PARENTHESIS] | Right aligned, left aligned | Bottom aligned, top aligned
[U+FF3B FULLWIDTH LEFT SQUARE BRACKET] | [U+FF3D FULLWIDTH RIGHT SQUARE BRACKET] | Right aligned, left aligned | -
[U+3014 LEFT TORTOISE SHELL BRACKET] | [U+3015 RIGHT TORTOISE SHELL BRACKET] | - | Bottom aligned, top aligned

[U+3014 LEFT TORTOISE SHELL BRACKET] and its closing partner are the vertical equivalent of [U+FF3B FULLWIDTH LEFT SQUARE BRACKET], which is used in horizontal text.

Although there are a number of other bracket characters (listed just below), they are less commonly used.

【␣】␣〖␣〗␣{␣}

Quotations

Japanese uses different quote marks for horizontal and vertical writing. The default quote marks are:

Opening | Closing | Horizontal text | Vertical text
[U+201C LEFT DOUBLE QUOTATION MARK] | [U+201D RIGHT DOUBLE QUOTATION MARK] | Top-right aligned, top-left aligned | -
[U+300C LEFT CORNER BRACKET] | [U+300D RIGHT CORNER BRACKET] | Top-right aligned, top-left aligned | Bottom-right aligned, top-left aligned
[U+301D REVERSED DOUBLE PRIME QUOTATION MARK] | [U+301F LOW DOUBLE PRIME QUOTATION MARK] | - | Top-right aligned, top-left aligned

When an additional quote is embedded within the first, the quote marks are:

Opening | Closing | Horizontal text | Vertical text
[U+2018 LEFT SINGLE QUOTATION MARK] | [U+2019 RIGHT SINGLE QUOTATION MARK] | Top-right aligned, bottom-left aligned | -
[U+300E LEFT WHITE CORNER BRACKET] | [U+300F RIGHT WHITE CORNER BRACKET] | - | Bottom-right aligned, top-left aligned

Emphasis

Japanese sometimes uses katakana characters to create visual emphasis. [uk,720]

Japanese Layout Requirements lists the following ways of showing emphasis in Japanese.

  1. Select a different typeface (eg. a Gothic font in Mincho text).
  2. Use [U+300C LEFT CORNER BRACKET] and [U+300D RIGHT CORNER BRACKET] or [U+3008 LEFT ANGLE BRACKET] and [U+3009 RIGHT ANGLE BRACKET].
  3. Change the colour.
  4. Underline.
  5. Boten marks (also known as emphasis marks).

(1) and (2) are popular approaches. (5) is not as common, but is a traditional approach with some value attached.

Different boten marks are used in horizontal and vertical text. Typically, bullets are used above characters in horizontal text, and sesame dots are used to the right of characters in vertical text. [j,#composition_of_emphasis_dots]

The boten mark is centre-aligned with the base characters, and doesn't normally appear alongside full stops, commas, or brackets. [j,#composition_of_emphasis_dots]

Boten marks used for emphasis in horizontal and vertical text.

Embedded text in other languages would have boten marks displayed on the same side as for Japanese.

Abbreviation, ellipsis & repetition

tbd

Inline notes & annotations

Japanese has a few ways of representing inline notes and annotations.

Ruby

Various ways of arranging inter-linear annotations alongside text fall under the rubric of ruby (named from the British print size originally used for the annotations). These include mono-ruby, jukugo-ruby, and group-ruby, and they are described in detail below.

Ruby is commonly used to indicate the pronunciation of ideographic characters used in Japanese, as it cannot usually be guessed and so can pose difficulties for those learning the language. For these cases, mono-ruby is most commonly used; however, a variant, jukugo-ruby, is sometimes applied to compound nouns (which are called jukugo in Japanese). Where sequences of kanji characters do not have the same pronunciation as the sum of their parts (called jukuji), group ruby is used to represent the sound, [j,#h-note-109] eg. 田舎 (いなか inaka) countryside or 今日 (きょう kyō) today.

Ruby annotations are also used to provide brief indications of the meaning of words or characters. These annotations typically use the group-ruby approach. The most typical example of this is attaching ruby text to a kanji compound word to indicate a corresponding loan word in katakana (see fig_group_ruby). [j,#id221] Group ruby is also used to indicate the reading or the meaning of a Western word used in base text, or where a synonymous Western word in Latin characters is attached as a ruby annotation to a Japanese word (see Figure 112).

The rest of this section describes features that are generally common to all forms of ruby, before we move on to examine the differences in following subsections.

All annotations appear within the standard inter-line space for the page, and don't create extra line height if they only appear on a single line. The inter-line space is usually set at an appropriate size to accommodate annotations.

Unlike Chinese, it is common to find annotations applied just to specific words, rather than annotating the whole text.

Ruby annotations normally appear above horizontal lines of text, and to the right of vertical lines. Occasionally, both phonetic and semantic annotations are applied to the same base text, in which case the annotations appear on both sides of the base. A typical scenario in these cases would be to have mono-ruby above/right of the base, and group-ruby below/left. [j,#choice_of_sides_for_ruby_with_respect_to_base_characters]

東南(とうなん)(たつみ)の方角
Double-sided ruby.

The character frame of kana annotations is usually half that of the base character. Occasionally, annotations are compressed in one direction (depending on the direction of writing) so that 3 fit over a single base character. [j,#fig2_3_10] In large text (12pt or more), such as headings, the size of the annotation may be less than half that of the base. [j,#fig2_3_11]

Mono-ruby

Usually applied to kanji base characters, each base character is associated individually with an annotation.

Annotations are normally centred over the base character in horizontal text, and aligned with the middle of the base character in vertical text (called nakatsuki). An alternative, used only in vertical text, is to align the annotation with the top of the character frame of the base character (katatsuki) [j,#id227], as in the right-hand example in fig_jukugo.

推/すい/理/り/小/しょう/説/せつ (detective novel) 推/すい/理/り/小/しょう/説/せつ (detective novel) vertical
Mono-ruby for 推理小説 detective novel, in horizontal and vertical text (colouring added for illustrative purposes).

Since the annotation characters are usually 1/2 the size of the base characters, 3-character annotations require more space than the underlying kanji. Internally to the sequence, this will produce a gap between the base characters, since annotations cannot overlap (see fig_mono_ruby).

At either end of the sequence, either a gap is opened up between the base character with the long annotation and its neighbour (see fig_overhang), or the annotation may overhang the neighbouring base characters. Simpler implementations produce gaps, but allow annotations to overhang any blank parts of adjacent fullwidth punctuation characters. More sophisticated applications may allow overlap of kana or other characters, though never kanjij,#232, but may also have to deal with more complicated algorithms, such as balancing space on either side of the ruby sequence, or deciding what can and cannot be overlapped, and to what extent.j,#id229 j,#adjustments_of_ruby_with_length_longer_than_that_of_the_base_characters


Alternative ways of dealing with potential overhang either side of the ruby sequence.

At line start or line end, long annotations do not protrude past the line edge – meaning that there will be a gap between the base character and the line edge.

Gaps produced at line end and line start by wide annotations.

Lines can be broken in the middle of a sequence of mono-ruby annotations, since an associated base and annotation are kept together.

Group ruby

Group ruby applies when the base is a sequence of characters, mapped to a single annotation. The base can be a sequence of either kanji or other characters, as can the annotation.

When the annotation is shorter than the base, and the annotation is composed of kana or kanji characters, they are typically spread out with two units of equal spacing between each character and one at either end. The end space should never exceed half the width of a base character.j,#positioning_of_groupruby_with_respect_to_base_characters

When the base is shorter than the annotation, the inverse applies.

顧客/クライアント 模型/モデル
模型 mokei model and 顧客 kokjaku client with katakana group-ruby annotations indicating loan word alternatives.
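
The spacing rule described above for a short kana or kanji annotation can be sketched as a small calculation. This is only an illustration, not an implementation taken from JLReq: the function name, the half-size annotation default, and the way spare space is moved between characters once the end space reaches its half-character limit are all assumptions made for the example.

```python
from fractions import Fraction

def group_ruby_spacing(base_chars, ruby_chars, ruby_scale=Fraction(1, 2)):
    """Return (end_space, between_space) in units of one base character width.

    Hypothetical helper: assumes the annotation is shorter than the base,
    and that annotation characters are ruby_scale times the base size.
    """
    spare = Fraction(base_chars) - ruby_chars * ruby_scale
    if spare <= 0:
        # Annotation is not shorter than the base: nothing to distribute.
        return Fraction(0), Fraction(0)
    if ruby_chars == 1:
        # A single annotation character is simply centred.
        return spare / 2, Fraction(0)
    # One unit at each end and two units between each pair of characters
    # gives 2 * ruby_chars units in total.
    unit = spare / (2 * ruby_chars)
    # Assumption: cap the end space at half a base character and give
    # whatever is left to the gaps between annotation characters.
    end = min(unit, Fraction(1, 2))
    between = (spare - 2 * end) / (ruby_chars - 1)
    return end, between

# Two half-size kana over three base characters.
print(group_ruby_spacing(3, 2))
```

Exact fractions are used so that the 1:2:1 proportions stay visible in the result rather than being hidden by floating-point rounding.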

If the annotation or the base is not kanji or kana, the text is set solid and centred relative to the other component (see fig_latin_ruby).

編集者/editor editor/エデイター
Group-ruby involving non-Japanese text (the right-hand example is 編集者 henʃuːʃa editor ).

Overhang behaviour is the same as described for mono-ruby, as is the handling at line ends when the annotation is longer than the base.

Unlike a sequence of mono-ruby, there is no line-break opportunity inside a group-ruby.

Jukugo-ruby

Where compound nouns (jukugo) occur, special rules for arrangement of annotation characters (so-called jukugo-ruby) can make it appear that they are evenly distributed across the word (see the lefthand example in fig_jukugo), but there are rules about how much and what type of overhang are allowed, which sometimes lead to gaps (see the righthand example of fig_jukugo).

橋頭堡/きようとうほ (beachhead) 思春期/ししゅんき (puberty)
Two examples of distributed annotations in jukugo-ruby. On the right, a gap appears in the annotation because of the rules about overhang.

An important feature of jukugo-ruby is that where the full compound noun doesn't fit at the end of a line the base characters wrap one-by-one in the normal way, taking with them the appropriate annotations. The annotation for a single base character is never split across a line break.

It is up to the author whether a word that is actually a sequence of 2 compound nouns is treated as a single jukugo-ruby, or as two separate ones.

There are numerous options for overhang and arrangement of jukugo-ruby annotations. They are discussed in detail in JLReq.

Inline ruby

Where text sizes are too small for ruby characters to be easily read, the ruby annotation is typically rendered after the base text, in parentheses.

Inline annotations should normally correspond to full words, even if the sequence of base characters would otherwise be represented using mono-ruby. For example, the word 東京 Tokyo should be displayed inline as 東京(とうきょう) and not 東(とう)京(きょう).
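
As a rough sketch of that word-level fallback, the following joins per-character base and reading pairs into a single inline annotation. The function name and the pair-based input format are hypothetical, and the parentheses are passed as parameters because the choice of parenthesis characters is left open here.

```python
def inline_ruby(pairs, open_paren="(", close_paren=")"):
    """Render mono-ruby (base, reading) pairs as one word-level inline annotation."""
    base = "".join(b for b, _ in pairs)
    reading = "".join(r for _, r in pairs)
    return f"{base}{open_paren}{reading}{close_paren}"

# Mono-ruby data for the word Tokyo, rendered as a whole word.
print(inline_ruby([("東", "とう"), ("京", "きょう")]))
```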

Warichu

Warichu is a method of adding notes right alongside the relevant text, used particularly in study guides, travel guides, reference books, encyclopedias and manuals. It is generally only used in vertical text, although it is occasionally used in horizontal text for study guides and encyclopedias.

The note is usually surrounded by parentheses (or rarely just spaces), and the text of the note is half the size of the main text and arranged in two parallel lines. The two parallel lines are usually set with no inter-line spacing.

Two examples of warichu.

The warichu lines should be as close to equal in length as possible, given the normal wrapping rules, and if there is a difference, the initial line (right side) should be the longer.
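
A minimal sketch of that balancing rule follows. It assumes the note fits within one line of the main text, and it ignores the normal wrapping rules, which a real implementation would also have to apply. The function name is hypothetical.

```python
def split_warichu(note):
    """Split a warichu note into two lines of near-equal length.

    If the lengths differ, the first line gets the extra character.
    """
    first_len = (len(note) + 1) // 2  # round up so the first line is longer
    return note[:first_len], note[first_len:]

first, second = split_warichu("あいうえお")
print(first, second)
```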

In the rare event that the warichu text breaks across more than one line (see fig_warichu on the right), both lines of the warichu on the first line of the main text should be read completely before continuing to the remainder of the note. The characters in memory follow the normal reading sequence (and use normal characters, too), but the application needs to rearrange the visual order around the line break.

Other inline ranges

tbd

Other punctuation

CLDR 31 lists the following punctuation characters for Japanese. First the fullwidth forms of normal characters.

-␣,␣、␣;␣:␣!␣?␣.␣'␣"␣(␣)␣[␣]␣{␣}␣@␣*␣/␣\␣&␣#␣%

Then the halfwidth forms.

・␣、␣、␣。␣。␣「␣」

And finally, the other punctuation.

-␣‐␣—␣―␣〜␣,␣、␣;␣:␣!␣?␣.␣‥␣…␣。␣‘␣’␣"␣“␣”␣(␣)␣[␣]␣{␣}␣〈␣〉␣《␣》␣「␣」␣『␣』␣【␣】␣〔␣〕␣‖␣§␣¶␣@␣*␣/␣\␣&␣#␣%␣‰␣†␣‡␣′␣″␣〃␣※
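
The fullwidth and halfwidth forms are encoded in Unicode as compatibility variants of other characters. The sketch below uses Python's standard unicodedata module to report the width class of a character and to map such forms to their ordinary counterparts with NFKC normalization. The helper names are hypothetical, and NFKC also changes other compatibility characters, so it is not a width-only conversion.

```python
import unicodedata

def width_class(ch):
    """Return the East Asian Width class, e.g. 'F' (fullwidth) or 'H' (halfwidth)."""
    return unicodedata.east_asian_width(ch)

def fold_width(text):
    """Map fullwidth and halfwidth forms to their ordinary counterparts.

    Uses NFKC, which folds other compatibility characters as well.
    """
    return unicodedata.normalize("NFKC", text)

fullwidth = "\uFF01\uFF1F\uFF08\uFF09"        # fullwidth ! ? ( )
halfwidth = "\uFF61\uFF62\uFF63\uFF64\uFF65"  # halfwidth CJK punctuation
print(fold_width(fullwidth), fold_width(halfwidth))
```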

The katakana block contains two additional punctuation marks.

・␣゠

[U+30FB KATAKANA MIDDLE DOT] is used to separate words when writing non-Japanese phrases.uk,720

[U+30A0 KATAKANA-HIRAGANA DOUBLE HYPHEN] is a delimiter occasionally used in analyzed Katakana or Hiragana textual material. 

The hiragana block contains some combining and modifier characters used to represent dakuten and han-dakuten for compatibility with older systems.

゙␣゚␣゛␣゜
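
To illustrate how the combining marks relate to ordinary precomposed kana, this sketch uses NFC normalization from Python's standard unicodedata module. The helper name is an assumption made for the example.

```python
import unicodedata

COMBINING_DAKUTEN = "\u3099"
COMBINING_HANDAKUTEN = "\u309A"

def compose(kana, mark):
    """Combine a kana with a combining (han-)dakuten into its precomposed form."""
    return unicodedata.normalize("NFC", kana + mark)

# U+304B plus combining dakuten composes to U+304C.
print(compose("\u304B", COMBINING_DAKUTEN))
# U+306F plus combining han-dakuten composes to U+3071.
print(compose("\u306F", COMBINING_HANDAKUTEN))
```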

The kana blocks each have two marks that are used to indicate repetition of a syllable – one for syllables with unvoiced consonants and another for voiced. The table below shows the hiragana first, then the katakana. In both cases there is a character for repetition of ordinary syllables, and one for repetition of syllables with dakuten.

ゝ␣ゞ␣ヽ␣ヾ
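
A simplified sketch of how these marks could be expanded in text processing is shown here. It assumes that the plain mark repeats the preceding kana without dakuten and that the voiced mark repeats it with dakuten. Real-world usage may have more nuance, and the function name is hypothetical.

```python
import unicodedata

DAKUTEN = "\u3099"
HANDAKUTEN = "\u309A"
PLAIN_MARKS = {"ゝ", "ヽ"}   # repeat as an unvoiced syllable
VOICED_MARKS = {"ゞ", "ヾ"}  # repeat as a voiced syllable

def expand_iteration_marks(text):
    """Replace kana iteration marks with the syllable they stand for."""
    out = []
    for ch in text:
        if out and (ch in PLAIN_MARKS or ch in VOICED_MARKS):
            # Decompose the previous kana and drop any (han-)dakuten.
            decomposed = unicodedata.normalize("NFD", out[-1])
            plain = "".join(c for c in decomposed if c not in (DAKUTEN, HANDAKUTEN))
            if ch in VOICED_MARKS:
                # Re-add dakuten and recompose where a precomposed form exists.
                plain = unicodedata.normalize("NFC", plain + DAKUTEN)
            out.append(plain)
        else:
            out.append(ch)
    return "".join(out)

print(expand_iteration_marks("こゝろ"))
print(expand_iteration_marks("いすゞ"))
```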

Unicode also has [U+3000 IDEOGRAPHIC SPACE] for occasions where it is needed.

Line & paragraph layout

Line breaking & hyphenation

Lines are normally wrapped between characters – word boundaries usually have no significance for the wrapping. However, occasionally there is a preference to wrap text at word boundaries, eg. to better balance headings.

Japanese line breaking should also take into account a few rules (kinsoku rules) that dictate which characters cannot appear at the end or start of a line.
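
The following is a deliberately small sketch of character-based wrapping with kinsoku-style restrictions. The two character sets are short illustrative assumptions, not the full sets used in practice, and the strategy shown (moving the break point earlier so that the preceding character is carried to the next line) is only one of the possible adjustments.

```python
NO_LINE_START = set("、。」』）")  # illustrative subset: must not begin a line
NO_LINE_END = set("「『（")        # illustrative subset: must not end a line

def wrap(text, width):
    """Break text into lines of at most `width` characters.

    A break is moved earlier when it would put a prohibited character
    at the start or end of a line.
    """
    lines = []
    i = 0
    while i < len(text):
        end = min(i + width, len(text))
        # Back up while the break is prohibited, keeping at least one
        # character on the line so the loop always makes progress.
        while (end < len(text) and end - i > 1
               and (text[end] in NO_LINE_START or text[end - 1] in NO_LINE_END)):
            end -= 1
        lines.append(text[i:end])
        i = end
    return lines

print(wrap("あいうえ、おかき", 4))
```

In the example, the comma would otherwise start the second line, so the break moves back by one character and the first line is left one character short.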

Kanji characters have the ID character property.

Hyphenation

There is no hyphenation at line-breaks for Japanese text.

Text alignment & justification

The preferred arrangement of characters on a line is solid set, ie. each character frame immediately follows the previous one, each with the same width. In principle, in books where the width of the text area on a page is set by counting characters and fixed, paragraphs composed of kanji and kana characters don't need to be justified. Lines break as soon as the line is full of characters, and the whole paragraph has grid lines vertically and horizontally between the characters.

However, a number of factors may create a need for justification from time to time. One such would be punctuation that pulls the last character of the previous line with it to the next line, so that it doesn't begin a line on its own. Another would be web-based text where windows can be stretched, resulting in a situation where the width of a line no longer exactly corresponds to the sum of the width of all the characters on that line. Other situations include lines where proportionally-spaced romaji text breaks the grid effect.

Japanese justifies text using a complex set of rules which adjust the space between characters on a line. Some characters are adjusted before others.

In situations where a set of lines each contains self-contained text, the line content may be stretched to fit the line width, for example in table cells. In this case it is typical to set the first and last characters at the line start and end, respectively, and then apply equal amounts of spacing between all remaining characters. This can result in large gaps, including lines where only two characters are arranged at opposite line ends with nothing between. See fig_distributed_spacing.

Distributed spacing example.
Evenly distributed spaces across a line in a table. Source j,#id25
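
The distributed arrangement can be sketched as a simple position calculation. The function name and the assumption of equal-width character frames measured in arbitrary units are choices made for this example only.

```python
def distribute(n_chars, line_width, char_width=1.0):
    """Return the start offset of each character frame on the line.

    The first frame sits at the line start, the last ends at the line end,
    and the leftover space is shared equally between the frames.
    """
    if n_chars <= 1:
        return [0.0] * n_chars
    gap = (line_width - n_chars * char_width) / (n_chars - 1)
    return [i * (char_width + gap) for i in range(n_chars)]

# Three characters, then two, across a line ten characters wide.
print(distribute(3, 10.0))
print(distribute(2, 10.0))
```

With only two characters, the whole of the leftover space falls into a single gap, which is the extreme case mentioned above.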

The following sample text can be used to check how a browser justifies Japanese text.

すべての人は、法の下において平等であり、また、いかなる差別もなしに法の平等な保護を受ける権利を有する。すべての人は、この宣言に違反するいかなる差別に対しても、また、そのような差別をそそのかすいかなる行為に対しても、平等な保護を受ける権利を有する。

Letter spacing

Letter-spacing is used to achieve balance between items with large and small numbers of characters, such as headings, running heads, and captions. When expanding text, equal amounts of space are added between the character frames of the item with the smaller number of characters.

Examples of headings where letter spacing has been applied.

Reducing inter-character spacing. Although solid set text is normally best for readability, in large print sizes, such as for magazine headings, it may be desirable to reduce the distance between certain characters. This is typically done by reducing the distance between adjacent letter faces.

Sometimes, text may also be kerned by overlapping the character frames by a regular amount across a whole line.

Counters, lists, etc.

You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.

Japanese text uses a number of different counter styles. Some of the more common include full-width European numbers, which in vertical text stand upright. Unicode has various sets of numbers that can be useful here.

For the dotted-decimal numeric style Unicode provides precomposed characters from 1 to 20.

⒈␣⒉␣⒊␣⒋␣⒌␣⒍␣⒎␣⒏␣⒐␣⒑␣⒒␣⒓␣⒔␣⒕␣⒖␣⒗␣⒘␣⒙␣⒚␣⒛

For the circled-decimal numeric style Unicode provides characters from 0 to 50.

⓪␣①␣②␣③␣④␣⑤␣⑥␣⑦␣⑧␣⑨␣⑩␣⑪␣⑫␣⑬␣⑭␣⑮␣⑯␣⑰␣⑱␣⑲␣⑳␣㉑␣㉒␣㉓␣㉔␣㉕␣㉖␣㉗␣㉘␣㉙␣㉚␣㉛␣㉜␣㉝␣㉞␣㉟␣㊱␣㊲␣㊳␣㊴␣㊵␣㊶␣㊷␣㊸␣㊹␣㊺␣㊻␣㊼␣㊽␣㊾␣㊿

The Japanese orthography also uses kanji or kana characters to create 1 fixed, 4 alphabetic, and 2 additive styles.

Fixed

The circled-katakana fixed style uses the following letters. The suffix is a space, and the numbers run from 1 to 47.

㋐␣㋑␣㋒␣㋓␣㋔␣㋕␣㋖␣㋗␣㋘␣㋙␣㋚␣㋛␣㋜␣㋝␣㋞␣㋟␣㋠␣㋡␣㋢␣㋣␣㋤␣㋥␣㋦␣㋧␣㋨␣㋩␣㋪␣㋫␣㋬␣㋭␣㋮␣㋯␣㋰␣㋱␣㋲␣㋳␣㋴␣㋵␣㋶␣㋷␣㋸␣㋹␣㋺␣㋻␣㋼␣㋽␣㋾

Alphabetic

The alphabetic styles all use [U+3001 IDEOGRAPHIC COMMA] as a suffix (with no following space). The iroha style ordering is based on the order of characters in a pangram poem dating from the Heian era (794–1185).

The hiragana alphabetic style uses these 48 hiragana characters in the order in which they are typically arranged.

あ␣い␣う␣え␣お␣か␣き␣く␣け␣こ␣さ␣し␣す␣せ␣そ␣た␣ち␣つ␣て␣と␣な␣に␣ぬ␣ね␣の␣は␣ひ␣ふ␣へ␣ほ␣ま␣み␣む␣め␣も␣や␣ゆ␣よ␣ら␣り␣る␣れ␣ろ␣わ␣ゐ␣ゑ␣を␣ん

Examples:

あ␣い␣う␣え␣さ␣に␣む␣わ␣いそ␣えほ␣かゐ␣けし

The hiragana-iroha alphabetic style uses 47 hiragana characters in the order shown just below.

い␣ろ␣は␣に␣ほ␣へ␣と␣ち␣り␣ぬ␣る␣を␣わ␣か␣よ␣た␣れ␣そ␣つ␣ね␣な␣ら␣む␣う␣ゐ␣の␣お␣く␣や␣ま␣け␣ふ␣こ␣え␣て␣あ␣さ␣き␣ゆ␣め␣み␣し␣ゑ␣ひ␣も␣せ␣す

Examples:

い␣ろ␣は␣に␣る␣ら␣こ␣ひ␣ろれ␣にえ␣とに␣りな

The katakana alphabetic style uses these 48 katakana characters in the order in which they are typically arranged.

ア␣イ␣ウ␣エ␣オ␣カ␣キ␣ク␣ケ␣コ␣サ␣シ␣ス␣セ␣ソ␣タ␣チ␣ツ␣テ␣ト␣ナ␣ニ␣ヌ␣ネ␣ノ␣ハ␣ヒ␣フ␣ヘ␣ホ␣マ␣ミ␣ム␣メ␣モ␣ヤ␣ユ␣ヨ␣ラ␣リ␣ル␣レ␣ロ␣ワ␣ヰ␣ヱ␣ヲ␣ン

Examples:

ア␣イ␣ウ␣エ␣サ␣ニ␣ム␣ワ␣イソ␣エホ␣カヰ␣ケシ

The katakana-iroha alphabetic style uses 47 katakana characters in the order shown just below.

イ␣ロ␣ハ␣ニ␣ホ␣ヘ␣ト␣チ␣リ␣ヌ␣ル␣ヲ␣ワ␣カ␣ヨ␣タ␣レ␣ソ␣ツ␣ネ␣ナ␣ラ␣ム␣ウ␣ヰ␣ノ␣オ␣ク␣ヤ␣マ␣ケ␣フ␣コ␣エ␣テ␣ア␣サ␣キ␣ユ␣メ␣ミ␣シ␣ヱ␣ヒ␣モ␣セ␣ス

Examples:

イ␣ロ␣ハ␣ニ␣ル␣ラ␣コ␣ヒ␣ロレ␣ニエ␣トニ␣リナ
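
The examples for the alphabetic styles can be reproduced with a bijective base-N conversion of the kind used for CSS alphabetic counter styles. The sketch below is illustrative only: the function and constant names are made up, only the two hiragana symbol sets listed above are included, and the suffix is left out.

```python
HIRAGANA = ("あいうえおかきくけこさしすせそたちつてとなにぬねの"
            "はひふへほまみむめもやゆよらりるれろわゐゑをん")
HIRAGANA_IROHA = ("いろはにほへとちりぬるをわかよたれそつねならむ"
                  "うゐのおくやまけふこえてあさきゆめみしゑひもせす")

def alphabetic_counter(value, symbols):
    """Convert a positive integer to an alphabetic counter representation."""
    if value < 1:
        raise ValueError("alphabetic styles are only defined for values >= 1")
    n = len(symbols)
    out = []
    while value > 0:
        value -= 1                       # shift to 0-based for this position
        out.append(symbols[value % n])   # least significant symbol first
        value //= n
    return "".join(reversed(out))

for number in (1, 11, 44, 111, 444):
    print(number, alphabetic_counter(number, HIRAGANA),
          alphabetic_counter(number, HIRAGANA_IROHA))
```

Once the symbols run out, the counter continues with two-character sequences, which is why 111 and 444 in the example rows above are written with two kana.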

Additive

The Japanese additive styles have a range -9,999 to 9,999 and use kanji characters only. The suffix is [U+3001 IDEOGRAPHIC COMMA] (with no following space), and negative numbers are preceded by マイナス.

The japanese-informal additive style uses these letters.

九千␣八千␣七千␣六千␣五千␣四千␣三千␣二千␣千␣九百␣八百␣七百␣六百␣五百␣四百␣三百␣二百␣百␣九十␣八十␣七十␣六十␣五十␣四十␣三十␣二十␣十␣九␣八␣七␣六␣五␣四␣三␣二␣一␣〇

Examples:

一␣二␣三␣四␣十一␣二十二␣三十三␣四十四␣百十一␣二百二十二␣三百三十三␣四百四十四

The japanese-formal additive style uses these letters.

九阡␣八阡␣七阡␣六阡␣伍阡␣四阡␣参阡␣弐阡␣壱阡␣九百␣八百␣七百␣六百␣伍百␣四百␣参百␣弐百␣壱百␣九拾␣八拾␣七拾␣六拾␣伍拾␣四拾␣参拾␣弐拾␣壱拾␣九␣八␣七␣六␣伍␣四␣参␣弐␣壱␣零

Examples:

壱␣弐␣参␣四␣壱拾壱␣弐拾弐␣参拾参␣四拾四␣壱百壱拾壱␣弐百弐拾弐␣参百参拾参␣四百四拾四
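
The additive styles can be sketched with a greedy algorithm of the kind used for CSS additive counter styles: walk the weighted symbols from largest to smallest and emit each one that fits. The code below builds the japanese-informal symbol table shown above programmatically. The function names are hypothetical, and the suffix is left out.

```python
DIGITS = "一二三四五六七八九"

def informal_symbols():
    """Build (weight, symbol) pairs for the informal style, largest first."""
    table = []
    for weight, unit in ((1000, "千"), (100, "百"), (10, "十"), (1, "")):
        for d in range(9, 0, -1):
            # 1000, 100 and 10 are written without a leading digit for one.
            symbol = unit if (d == 1 and unit) else DIGITS[d - 1] + unit
            table.append((d * weight, symbol))
    table.append((0, "〇"))
    return table

def additive_counter(value, table, negative_prefix="マイナス"):
    """Convert an integer in the range -9999..9999 using an additive table."""
    if not -9999 <= value <= 9999:
        raise ValueError("value is outside the range of the style")
    if value == 0:
        return next(symbol for weight, symbol in table if weight == 0)
    remaining = abs(value)
    parts = []
    for weight, symbol in table:
        if weight == 0 or weight > remaining:
            continue
        count = remaining // weight
        parts.append(symbol * count)
        remaining -= weight * count
        if remaining == 0:
            break
    return (negative_prefix if value < 0 else "") + "".join(parts)

for number in (4, 22, 111, 444):
    print(number, additive_counter(number, informal_symbols()))
```

The formal style would work the same way with its own symbol table substituted.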

Prefixes and suffixes

The most common suffix is [U+3001 IDEOGRAPHIC COMMA]. The fixed style has no prefix, and its only suffix is a space.

Examples:

一、 二、 三、 四、 五、
あ、 い、 う、 え、 お、
Separator for Japanese list counters.

Styling initials

Large paragraph-initial characters can easily be found in Japanese content. The character typically fills a box that is the height (or width, in vertically-set text) of 2-4 lines.

An enlarged initial character at the beginning of a paragraph.

Page & book layout

This section is for any features that are specific to Japanese and that relate to the following topics: general page layout & progression; grids & tables; notes, footnotes, etc; forms & user interaction; page numbering, running headers, etc.

General page layout & progression

Book binding & reading direction

Books, magazines, et cetera, that are vertically set have the front cover on the right, and pages turn to the right as you read.

Horizontal books are bound on the left, and vertical on the right.

Page layout

Rather than specifying margins and then filling the space between with the body of the text, Japanese text areas will usually be defined by specifying the width and height of the text area as a number of characters, and then determining the size of the margins based on what remains of the page size. This is possible because Japanese characters are drawn in square character frames, all the same size.

In fact, the calculations also include an inter-line space. This inter-line space must be set for the whole page at a size that is large enough to accommodate any ruby annotations or other items that may protrude into the line gap. Therefore, the line height doesn't change for individual lines that have ruby annotations.
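
As a rough illustration of this character-count approach, the sketch below derives the text area from the number of characters per line, the number of lines, the character size, and the inter-line space, and then reports what is left over for margins. All names and the sample numbers are assumptions for the example, and units are arbitrary (millimetres in the usage line).

```python
def text_area(chars_per_line, lines_per_page, char_size, line_gap, vertical=False):
    """Return (width, height) of the text area.

    Line length is a whole number of character frames. The block extent is
    the line frames plus the inter-line space between them.
    """
    line_length = chars_per_line * char_size
    block_extent = lines_per_page * char_size + (lines_per_page - 1) * line_gap
    if vertical:
        # Vertical lines run top to bottom and stack horizontally.
        return block_extent, line_length
    return line_length, block_extent

def remaining_for_margins(page_width, page_height, area):
    """Return the total horizontal and vertical space left for margins."""
    width, height = area
    return page_width - width, page_height - height

area = text_area(35, 28, char_size=3, line_gap=2)
print(area, remaining_for_margins(148, 210, area))
```

How the remaining space is divided between the individual margins is a separate design decision that the sketch does not attempt to model.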

Defining layout of the text area in this way creates a virtual grid, to which some things snap. For example, headings may be indented by a given number of character spaces, and are centred on a given number of lines in the grid. Page headers and footers may also correspond to aspects of the text area grid for positioning.

Column layout

Columns in vertically set text run horizontally from right to left.

Columns run horizontally in vertically-set text.

The title for this content runs horizontally across the top of the columns. This is a common approach. Note that although the columns are read RTL, the heading is LTR.

Page numbering, running headers, etc

Page headers and footers typically run horizontally on vertically set pages.

Example of a page format in vertical writing mode. Source j,#elements_of_page_formats.

Character lists

Version 13.0 of the Unicode Standard has the following blocks dedicated to the Japanese script (numbers in lists are non-ASCII only):

Apart from ASCII characters, the Japanese orthography described here uses 2,136 characters (and 11 more, used infrequently) from the following Unicode blocks:

Languages using the Japanese scripts

According to ScriptSource, the Japanese scripts are used for the following languages:

References