Kanji, Hiragana, Katakana, & Latin orthography notes

Updated 25 April, 2024

This page brings together basic information about the Japanese writing system, which includes the use of Kanji, Hiragana, Katakana and Latin scripts, and its use for the Japanese language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Japanese using Unicode.

Referencing this document

Richard Ishida, Japanese (Kanji, Hiragana, Katakana, & Latin) Orthography Notes, 25-Apr-2024, https://r12a.github.io/scripts/jpan/ja



第1条 すべての人間は、生まれながらにして自由であり、かつ、尊厳と権利とについて平等である。人間は、理性と良心とを授けられており、互いに同胞の精神をもって行動しなければならない。

第2条 すべて人は、人種、皮膚の色、性、言語、宗教、政治上その他の意見、国民的もしくは社会的出身、財産、門地その他の地位又はこれに類するいかなる事由による差別をも受けることなく、この宣言に掲げるすべての権利と自由とを享有することができる。 さらに、個人の属する国又は地域が独立国であると、信託統治地域であると、非自治地域であると、又は他のなんらかの主権制限の下にあるとを問わず、その国又は地域の政治上、管轄上又は国際上の地位に基づくいかなる差別もしてはならない。

Usage & history

The 'Japanese orthography' described here is a mixture of 4 scripts, all of which may be used together in any Japanese text: Han (kanji), Hiragana, Katakana, and Latin.


"Kanji characters were introduced to Japan around the 3rd century, it is thought from Korea. Until the 7th or 8th century, the Japanese language was written exclusively in these Chinese characters. Initially these were used phonetically to represent similar-sounding Japanese syllables, regardless of their meaning in written Chinese. However, the process of writing Japanese solely in kanji was laborious; each symbol consisted of a number of strokes and only represented one syllable. Two simplified forms of writing began to emerge around the 9th century. The modern hiragana script developed from a simplified cursive style originally developed by women, who were discouraged from learning kanji, and katakana was developed by Buddhist scholars who wrote only one element of each kanji symbol as a form of shorthand."

Sources: Scriptsource, Wikipedia

Basic features

Four scripts are used, mixed together, to write Japanese: kanji (han), katakana, hiragana, and Latin. Essentially, Japanese writing is a mixture of an ideographic and a syllabic script. Non-Latin letters typically represent a spoken syllable. See the table to the right for a brief overview of features for the modern Japanese orthography. The character count reflects a typical set of characters needed for everyday reading and writing: there are thousands more kanji characters that could be added for other purposes.

Text can be written horizontally or vertically. The visual forms of characters don't interact, but rotated and alternative glyph forms are needed to enable the switch between directions.

Words are not separated by spaces or any other character. There is no case distinction.


Kanji characters are mostly derived from the Chinese Han script. They are used for word roots.

The term kana covers two syllabaries that are used with kanji characters (see Han) to write Japanese. See the table to the right for a brief overview of features, taken from the Script Comparison Table.

One syllabary is hiragana, the other katakana. In both cases, the repertoire includes 5 independent vowel sounds, one nasal sound, and the rest are consonant+vowel combinations. There are a small number of additional characters with particular functions, such as a katakana lengthening mark, and a few small characters for representing medial glides.

The Latin (romaji) characters and much of the punctuation corresponding to the ASCII range are available in fullwidth sizes that match the dimensions of the kanji and kana.


These are sounds for the standard Japanese language.


Phones in a lighter colour are non-native or allophones.

Vowel sounds

Plain vowels

i u e o a

Consonant sounds

manner       labial  alveolar  post-alveolar  palatal  velar  uvular  glottal
stop         p b     t d       -              -        k ɡ    -       -
affricate    -       t͡s d͡z     t͡ɕ d͡ʑ          -        -      -       -
fricative    ɸ       s z       ɕ ʑ            ç        -      -       h
nasal        m       n         -              ɲ        ŋ      ɴ       -
approximant  w       -         -              j        -      -       -
trill/flap   -       r         -              -        -      -       -


Japanese is not a tonal language.


The Japanese language, unlike many neighbouring languages, uses polysyllabic words, and has no tones. It is an agglutinative language, and doesn't use spaces or other characters to separate words.

Wikipedia has a nicely written summary of the structural characteristics:

Text (文章 bunshō) is composed of sentences (文 bun), which are in turn composed of phrases (文節 bunsetsu), which are its smallest coherent components. Like Chinese and classical Korean, written Japanese does not typically demarcate words with spaces; its agglutinative nature further makes the concept of a word rather different from words in English. The reader identifies word divisions by semantic cues and a knowledge of phrase structure. Phrases have a single meaning-bearing word, followed by a string of suffixes, auxiliary verbs and particles to modify its meaning and designate its grammatical role. In the following example, phrases are indicated by vertical bars:

taiyō ga | higashi no | sora ni | noboru
sun SUBJECT | east POSSESSIVE | sky LOCATIVE | rise
The sun rises in the eastern sky.

Romanised Japanese text may add spaces between bunsetsu phrases, with hyphens separating suffixes (eg. higashi-no), or may also separate the suffixes using spaces (ie. higashi no).

It is common for Japanese words to repeat morphemes, such as mukashi mukashi ('once upon a time'). Often, this introduces a feature called rendaku, whereby the initial consonant of the repeated sound changes, such as in hitobito ('people').


Kanji characters

Kanji characters are mostly derived from the Chinese Han script. They are commonly used for word roots and compound words.

The compound word static electricity (seidenki), written with kanji.


Kanji characters are logographic: each one typically represents a word or morpheme rather than a single sound. Some have pictographic origins that are still evident, whereas others have a more complicated structure.

In reforms in the mid 20th century, the Japanese repertoire was standardised on around 2,000 core characters; however, standardised computer character sets support a few thousand more.

The Jōyō kanji character set (常用漢字) is intended as a literacy baseline for those who have completed compulsory education, as well as a list of permitted characters and readings for use in official government documents.

A second list, called Jinmeiyō kanji (人名用漢字), is a supplementary list of 863 characters that can legally be used in registered personal names in Japan.

The number of characters in these lists changes from time to time. The Wikipedia articles for Jōyō and Jinmeiyō kanji provide useful timelines indicating changes over the years.

In addition to the basic lists, there are a number of variants and traditional forms that need to be considered. Note also that the jōyō character set only provides a baseline for the educational process. School leavers still have more characters to learn to achieve a working competency.

Kanji characters in the jōyō and jinmeiyō lists, as of 2020.
Jōyō kanji 亜 哀 挨 愛 曖 悪 握 圧 扱 宛 嵐 安 案 暗 以 衣 位 囲 医 依 委 威 為 畏 胃 尉 異 移 萎 偉 椅 彙 意 違 維 慰 遺 緯 域 育 一 壱 逸 茨 芋 引 印 因 咽 姻 員 院 淫 陰 飲 隠 韻 右 宇 羽 雨 唄 鬱 畝 浦 運 雲 永 泳 英 映 栄 営 詠 影 鋭 衛 易 疫 益 液 駅 悦 越 謁 閲 円 延 沿 炎 怨 宴 媛 援 園 煙 猿 遠 鉛 塩 演 縁 艶 汚 王 凹 央 応 往 押 旺 欧 殴 桜 翁 奥 横 岡 屋 億 憶 臆 虞 乙 俺 卸 音 恩 温 穏 下 化 火 加 可 仮 何 花 佳 価 果 河 苛 科 架 夏 家 荷 華 菓 貨 渦 過 嫁 暇 禍 靴 寡 歌 箇 稼 課 蚊 牙 瓦 我 画 芽 賀 雅 餓 介 回 灰 会 快 戒 改 怪 拐 悔 海 界 皆 械 絵 開 階 塊 楷 解 潰 壊 懐 諧 貝 外 劾 害 崖 涯 街 慨 蓋 該 概 骸 垣 柿 各 角 拡 革 格 核 殻 郭 覚 較 隔 閣 確 獲 嚇 穫 学 岳 楽 額 顎 掛 潟 括 活 喝 渇 割 葛 滑 褐 轄 且 株 釜 鎌 刈 干 刊 甘 汗 缶 完 肝 官 冠 巻 看 陥 乾 勘 患 貫 寒 喚 堪 換 敢 棺 款 間 閑 勧 寛 幹 感 漢 慣 管 関 歓 監 緩 憾 還 館 環 簡 観 韓 艦 鑑 丸 含 岸 岩 玩 眼 頑 顔 願 企 伎 危 机 気 岐 希 忌 汽 奇 祈 季 紀 軌 既 記 起 飢 鬼 帰 基 寄 規 亀 喜 幾 揮 期 棋 貴 棄 毀 旗 器 畿 輝 機 騎 技 宜 偽 欺 義 疑 儀 戯 擬 犠 議 菊 吉 喫 詰 却 客 脚 逆 虐 九 久 及 弓 丘 旧 休 吸 朽 臼 求 究 泣 急 級 糾 宮 救 球 給 嗅 窮 牛 去 巨 居 拒 拠 挙 虚 許 距 魚 御 漁 凶 共 叫 狂 京 享 供 協 況 峡 挟 狭 恐 恭 胸 脅 強 教 郷 境 橋 矯 鏡 競 響 驚 仰 暁 業 凝 曲 局 極 玉 巾 斤 均 近 金 菌 勤 琴 筋 僅 禁 緊 錦 謹 襟 吟 銀 区 句 苦 駆 具 惧 愚 空 偶 遇 隅 串 屈 掘 窟 熊 繰 君 訓 勲 薫 軍 郡 群 兄 刑 形 系 径 茎 係 型 契 計 恵 啓 掲 渓 経 蛍 敬 景 軽 傾 携 継 詣 慶 憬 稽 憩 警 鶏 芸 迎 鯨 隙 劇 撃 激 桁 欠 穴 血 決 結 傑 潔 月 犬 件 見 券 肩 建 研 県 倹 兼 剣 拳 軒 健 険 圏 堅 検 嫌 献 絹 遣 権 憲 賢 謙 鍵 繭 顕 験 懸 元 幻 玄 言 弦 限 原 現 舷 減 源 厳 己 戸 古 呼 固 股 虎 孤 弧 故 枯 個 庫 湖 雇 誇 鼓 錮 顧 五 互 午 呉 後 娯 悟 碁 語 誤 護 口 工 公 勾 孔 功 巧 広 甲 交 光 向 后 好 江 考 行 坑 孝 抗 攻 更 効 幸 拘 肯 侯 厚 恒 洪 皇 紅 荒 郊 香 候 校 耕 航 貢 降 高 康 控 梗 黄 喉 慌 港 硬 絞 項 溝 鉱 構 綱 酵 稿 興 衡 鋼 講 購 乞 号 合 拷 剛 傲 豪 克 告 谷 刻 国 黒 穀 酷 獄 骨 駒 込 頃 今 困 昆 恨 根 婚 混 痕 紺 魂 墾 懇 左 佐 沙 査 砂 唆 差 詐 鎖 座 挫 才 再 災 妻 采 砕 宰 栽 彩 採 済 祭 斎 細 菜 最 裁 債 催 塞 歳 載 際 埼 在 材 剤 財 罪 崎 作 削 昨 柵 索 策 酢 搾 錯 咲 冊 札 刷 刹 拶 殺 察 撮 擦 雑 皿 三 山 参 桟 蚕 惨 産 傘 散 算 酸 賛 残 斬 暫 士 子 支 止 氏 仕 史 司 四 市 矢 旨 死 糸 至 伺 志 私 使 刺 始 姉 枝 祉 肢 姿 思 指 施 師 恣 紙 脂 視 紫 詞 歯 嗣 試 詩 資 飼 誌 雌 摯 賜 諮 示 字 寺 次 耳 自 似 児 事 侍 治 持 時 滋 慈 辞 磁 餌 璽 鹿 式 識 軸 七 𠮟 失 室 疾 執 湿 嫉 漆 質 実 芝 写 社 車 舎 者 射 捨 赦 斜 煮 遮 謝 邪 蛇 尺 借 酌 釈 爵 若 弱 寂 手 主 守 朱 取 狩 首 殊 珠 酒 腫 種 趣 寿 受 呪 授 需 儒 樹 収 囚 州 舟 秀 周 宗 拾 秋 臭 修 袖 終 羞 習 週 就 衆 集 愁 酬 醜 蹴 襲 十 汁 充 住 柔 重 従 渋 銃 獣 縦 叔 祝 宿 淑 粛 縮 塾 熟 出 述 術 俊 春 瞬 旬 巡 盾 准 殉 純 循 順 準 潤 遵 処 初 所 書 庶 暑 署 緒 諸 女 如 助 序 叙 徐 除 小 升 少 召 匠 床 抄 肖 尚 招 承 昇 松 沼 昭 宵 将 消 症 祥 称 笑 唱 商 渉 章 紹 訟 勝 掌 晶 焼 焦 硝 粧 詔 
証 象 傷 奨 照 詳 彰 障 憧 衝 賞 償 礁 鐘 上 丈 冗 条 状 乗 城 浄 剰 常 情 場 畳 蒸 縄 壌 嬢 錠 譲 醸 色 拭 食 植 殖 飾 触 嘱 織 職 辱 尻 心 申 伸 臣 芯 身 辛 侵 信 津 神 唇 娠 振 浸 真 針 深 紳 進 森 診 寝 慎 新 審 震 薪 親 人 刃 仁 尽 迅 甚 陣 尋 腎 須 図 水 吹 垂 炊 帥 粋 衰 推 酔 遂 睡 穂 随 髄 枢 崇 数 据 杉 裾 寸 瀬 是 井 世 正 生 成 西 声 制 姓 征 性 青 斉 政 星 牲 省 凄 逝 清 盛 婿 晴 勢 聖 誠 精 製 誓 静 請 整 醒 税 夕 斥 石 赤 昔 析 席 脊 隻 惜 戚 責 跡 積 績 籍 切 折 拙 窃 接 設 雪 摂 節 説 舌 絶 千 川 仙 占 先 宣 専 泉 浅 洗 染 扇 栓 旋 船 戦 煎 羨 腺 詮 践 箋 銭 潜 線 遷 選 薦 繊 鮮 全 前 善 然 禅 漸 膳 繕 狙 阻 祖 租 素 措 粗 組 疎 訴 塑 遡 礎 双 壮 早 争 走 奏 相 荘 草 送 倉 捜 挿 桑 巣 掃 曹 曽 爽 窓 創 喪 痩 葬 装 僧 想 層 総 遭 槽 踪 操 燥 霜 騒 藻 造 像 増 憎 蔵 贈 臓 即 束 足 促 則 息 捉 速 側 測 俗 族 属 賊 続 卒 率 存 村 孫 尊 損 遜 他 多 汰 打 妥 唾 堕 惰 駄 太 対 体 耐 待 怠 胎 退 帯 泰 堆 袋 逮 替 貸 隊 滞 態 戴 大 代 台 第 題 滝 宅 択 沢 卓 拓 託 濯 諾 濁 但 達 脱 奪 棚 誰 丹 旦 担 単 炭 胆 探 淡 短 嘆 端 綻 誕 鍛 団 男 段 断 弾 暖 談 壇 地 池 知 値 恥 致 遅 痴 稚 置 緻 竹 畜 逐 蓄 築 秩 窒 茶 着 嫡 中 仲 虫 沖 宙 忠 抽 注 昼 柱 衷 酎 鋳 駐 著 貯 丁 弔 庁 兆 町 長 挑 帳 張 彫 眺 釣 頂 鳥 朝 貼 超 腸 跳 徴 嘲 潮 澄 調 聴 懲 直 勅 捗 沈 珍 朕 陳 賃 鎮 追 椎 墜 通 痛 塚 漬 坪 爪 鶴 低 呈 廷 弟 定 底 抵 邸 亭 貞 帝 訂 庭 逓 停 偵 堤 提 程 艇 締 諦 泥 的 笛 摘 滴 適 敵 溺 迭 哲 鉄 徹 撤 天 典 店 点 展 添 転 塡 田 伝 殿 電 斗 吐 妬 徒 途 都 渡 塗 賭 土 奴 努 度 怒 刀 冬 灯 当 投 豆 東 到 逃 倒 凍 唐 島 桃 討 透 党 悼 盗 陶 塔 搭 棟 湯 痘 登 答 等 筒 統 稲 踏 糖 頭 謄 藤 闘 騰 同 洞 胴 動 堂 童 道 働 銅 導 瞳 峠 匿 特 得 督 徳 篤 毒 独 読 栃 凸 突 届 屯 豚 頓 貪 鈍 曇 丼 那 奈 内 梨 謎 鍋 南 軟 難 二 尼 弐 匂 肉 虹 日 入 乳 尿 任 妊 忍 認 寧 熱 年 念 捻 粘 燃 悩 納 能 脳 農 濃 把 波 派 破 覇 馬 婆 罵 拝 杯 背 肺 俳 配 排 敗 廃 輩 売 倍 梅 培 陪 媒 買 賠 白 伯 拍 泊 迫 剝 舶 博 薄 麦 漠 縛 爆 箱 箸 畑 肌 八 鉢 発 髪 伐 抜 罰 閥 反 半 氾 犯 帆 汎 伴 判 坂 阪 板 版 班 畔 般 販 斑 飯 搬 煩 頒 範 繁 藩 晩 番 蛮 盤 比 皮 妃 否 批 彼 披 肥 非 卑 飛 疲 秘 被 悲 扉 費 碑 罷 避 尾 眉 美 備 微 鼻 膝 肘 匹 必 泌 筆 姫 百 氷 表 俵 票 評 漂 標 苗 秒 病 描 猫 品 浜 貧 賓 頻 敏 瓶 不 夫 父 付 布 扶 府 怖 阜 附 訃 負 赴 浮 婦 符 富 普 腐 敷 膚 賦 譜 侮 武 部 舞 封 風 伏 服 副 幅 復 福 腹 複 覆 払 沸 仏 物 粉 紛 雰 噴 墳 憤 奮 分 文 聞 丙 平 兵 併 並 柄 陛 閉 塀 幣 弊 蔽 餅 米 壁 璧 癖 別 蔑 片 辺 返 変 偏 遍 編 弁 辛 便 勉 歩 保 哺 捕 補 舗 母 募 墓 慕 暮 簿 方 包 芳 邦 奉 宝 抱 放 法 泡 胞 俸 倣 峰 砲 崩 訪 報 蜂 豊 飽 褒 縫 亡 乏 忙 坊 妨 忘 防 房 肪 某 冒 剖 紡 望 傍 帽 棒 貿 貌 暴 膨 謀 頰 北 木 朴 牧 睦 僕 墨 撲 没 勃 堀 本 奔 翻 凡 盆 麻 摩 磨 魔 毎 妹 枚 昧 埋 幕 膜 枕 又 末 抹 万 満 慢 漫 未 味 魅 岬 密 蜜 脈 妙 民 眠 矛 務 無 夢 霧 娘 名 命 明 迷 冥 盟 銘 鳴 滅 免 面 綿 麺 茂 模 毛 妄 盲 耗 猛 網 目 黙 門 紋 問 冶 夜 野 弥 厄 役 約 訳 薬 躍 闇 由 油 喩 愉 諭 輸 癒 唯 友 有 勇 幽 悠 郵 湧 猶 裕 遊 雄 誘 憂 融 優 与 予 余 誉 預 幼 用 羊 
妖 洋 要 容 庸 揚 揺 葉 陽 溶 腰 様 瘍 踊 窯 養 擁 謡 曜 抑 沃 浴 欲 翌 翼 拉 裸 羅 来 雷 頼 絡 落 酪 辣 乱 卵 覧 濫 藍 欄 吏 利 里 理 痢 裏 履 璃 離 陸 立 律 慄 略 柳 流 留 竜 粒 隆 硫 侶 旅 虜 慮 了 両 良 料 涼 猟 陵 量 僚 領 寮 療 瞭 糧 力 緑 林 厘 倫 輪 隣 臨 瑠 涙 累 塁 類 令 礼 冷 励 戻 例 鈴 零 霊 隷 齢 麗 暦 歴 列 劣 烈 裂 恋 連 廉 練 錬 呂 炉 賂 路 露 老 労 弄 郎 朗 浪 廊 楼 漏 籠 六 録 麓 論 和 話 賄 脇 惑 枠 湾 腕 2,136
Alternates 剥 叱 填 頬 4
Jōyō traditional variant forms (not including 61 compatibility forms that normalise to other characters) 亞 惡 壓 圍 醫 爲 壹 隱 榮 營 衞 驛 圓 鹽 緣 艷 應 歐 毆 櫻 奧 橫 溫 穩 假 價 畫 會 繪 壞 懷 槪 擴 殼 覺 學 嶽 樂 渴 罐 卷 陷 勸 寬 關 歡 觀 氣 歸 龜 僞 戲 犧 舊 據 擧 虛 峽 挾 狹 鄕 曉 區 驅 勳 薰 徑 莖 惠 揭 溪 經 螢 輕 繼 鷄 藝 擊 缺 硏 縣 儉 劍 險 圈 檢 獻 權 顯 驗 嚴 廣 效 恆 黃 鑛 號 國 黑 碎 濟 齋 劑 雜 參 棧 蠶 慘 贊 殘 絲 齒 兒 辭 濕 實 寫 舍 釋 壽 收 從 澁 獸 縱 肅 處 緖 敍 將 稱 涉 燒 證 奬 條 狀 乘 淨 剩 疊 繩 壤 孃 讓 釀 觸 囑 眞 寢 愼 盡 圖 粹 醉 穗 隨 髓 樞 數 瀨 聲 齊 靜 竊 攝 絕 專 淺 戰 踐 錢 潛 纖 禪 雙 壯 爭 莊 搜 插 巢 曾 瘦 裝 總 騷 增 藏 臟 卽 屬 續 墮 對 體 帶 滯 臺 瀧 擇 澤 擔 單 膽 團 斷 彈 遲 癡 蟲 晝 鑄 廳 徵 聽 敕 鎭 遞 鐵 點 轉 傳 燈 當 黨 盜 稻 鬭 德 獨 讀 屆 貳 惱 腦 霸 拜 廢 賣 麥 發 髮 拔 晚 蠻 祕 濱 甁 拂 佛 倂 竝 餠 邊 變 辨 瓣 辯 步 寶 豐 襃 沒 飜 每 萬 滿 麵 默 彌 譯 藥 與 豫 餘 譽 搖 樣 謠 來 賴 亂 覽 龍 兩 獵 綠 淚 壘 禮 勵 戾 靈 齡 曆 歷 戀 鍊 爐 勞 郞 樓 錄 灣 305
The compatibility forms 逸 謁 禍 悔 海 慨 喝 褐 漢 祈 既 器 響 勤 謹 穀 殺 祉 視 者 煮 臭 祝 暑 署 諸 祥 神 節 祖 僧 層 贈 贈 嘆 著 懲 塚 都 突 難 梅 繁 卑 碑 賓 頻 敏 侮 福 塀 勉 墨 免 欄 隆 虜 類 練 朗 廊 61
Jinmeiyō kanji 丑 丞 乃 之 乎 也 云 亘 些 亦 亥 亨 亮 仔 伊 伍 伽 佃 佑 伶 侃 侑 俄 俠 俣 俐 倭 俱 倦 倖 偲 傭 儲 允 兎 兜 其 冴 凌 凜 凧 凪 凰 凱 函 劉 劫 勁 勺 勿 匁 匡 廿 卜 卯 卿 厨 厩 叉 叡 叢 叶 只 吾 吞 吻 哉 哨 啄 哩 喬 喧 喰 喋 嘩 嘉 嘗 噌 噂 圃 圭 坐 尭 坦 埴 堰 堺 堵 塙 壕 壬 夷 奄 奎 套 娃 姪 姥 娩 嬉 孟 宏 宋 宕 宥 寅 寓 寵 尖 尤 屑 峨 峻 崚 嵯 嵩 嶺 巌 巫 已 巳 巴 巷 巽 帖 幌 幡 庄 庇 庚 庵 廟 廻 弘 弛 彗 彦 彪 彬 徠 忽 怜 恢 恰 恕 悌 惟 惚 悉 惇 惹 惺 惣 慧 憐 戊 或 戟 托 按 挺 挽 掬 捲 捷 捺 捧 掠 揃 摑 摺 撒 撰 撞 播 撫 擢 孜 敦 斐 斡 斧 斯 於 旭 昂 昊 昏 昌 昴 晏 晃 晒 晋 晟 晦 晨 智 暉 暢 曙 曝 曳 朋 朔 杏 杖 杜 李 杭 杵 杷 枇 柑 柴 柘 柊 柏 柾 柚 桧 栞 桔 桂 栖 桐 栗 梧 梓 梢 梛 梯 桶 梶 椛 梁 棲 椋 椀 楯 楚 楕 椿 楠 楓 椰 楢 楊 榎 樺 榊 榛 槙 槍 槌 樫 槻 樟 樋 橘 樽 橙 檎 檀 櫂 櫛 櫓 欣 欽 歎 此 殆 毅 毘 毬 汀 汝 汐 汲 沌 沓 沫 洸 洲 洵 洛 浩 浬 淵 淳 渚 淀 淋 渥 渾 湘 湊 湛 溢 滉 溜 漱 漕 漣 澪 濡 瀕 灘 灸 灼 烏 焰 焚 煌 煤 煉 熙 燕 燎 燦 燭 燿 爾 牒 牟 牡 牽 犀 狼 猪 獅 玖 珂 珈 珊 珀 玲 琢 琉 瑛 琥 琶 琵 琳 瑚 瑞 瑶 瑳 瓜 瓢 甥 甫 畠 畢 疋 疏 皐 皓 眸 瞥 矩 砦 砥 砧 硯 碓 碗 碩 碧 磐 磯 祇 祢 祐 祷 禄 禎 禽 禾 秦 秤 稀 稔 稟 稜 穣 穹 穿 窄 窪 窺 竣 竪 竺 竿 笈 笹 笙 笠 筈 筑 箕 箔 篇 篠 簞 簾 籾 粥 粟 糊 紘 紗 紐 絃 紬 絆 絢 綺 綜 綴 緋 綾 綸 縞 徽 繫 繡 纂 纏 羚 翔 翠 耀 而 耶 耽 聡 肇 肋 肴 胤 胡 脩 腔 脹 膏 臥 舜 舵 芥 芹 芭 芙 芦 苑 茄 苔 苺 茅 茉 茸 茜 莞 荻 莫 莉 菅 菫 菖 萄 菩 萌 萊 菱 葦 葵 萱 葺 萩 董 葡 蓑 蒔 蒐 蒼 蒲 蒙 蓉 蓮 蔭 蔣 蔦 蓬 蔓 蕎 蕨 蕉 蕃 蕪 薙 蕾 蕗 藁 薩 蘇 蘭 蝦 蝶 螺 蟬 蟹 蠟 衿 袈 袴 裡 裟 裳 襖 訊 訣 註 詢 詫 誼 諏 諄 諒 謂 諺 讃 豹 貰 賑 赳 跨 蹄 蹟 輔 輯 輿 轟 辰 辻 迂 迄 辿 迪 迦 這 逞 逗 逢 遥 遁 遼 邑 祁 郁 鄭 酉 醇 醐 醍 醬 釉 釘 釧 銑 鋒 鋸 錘 錐 錆 錫 鍬 鎧 閃 閏 閤 阿 陀 隈 隼 雀 雁 雛 雫 霞 靖 鞄 鞍 鞘 鞠 鞭 頁 頌 頗 顚 颯 饗 馨 馴 馳 駕 駿 驍 魁 魯 鮎 鯉 鯛 鰯 鱒 鱗 鳩 鳶 鳳 鴨 鴻 鵜 鵬 鷗 鷲 鷺 鷹 麒 麟 麿 黎 黛 鼎 633
Jinmeiyō variants 亙 凛 巖 堯 晄 檜 槇 渚 猪 琢 禰 祐 禱 祿 禎 穰 萠 遙 18

CJK compatibility characters

The Jōyō traditional forms include 61 kanji shapes that Unicode includes in the CJK Compatibility Ideographs block. Normalisation operations (which in some systems may happen automatically, or during things such as cut & paste) convert them to characters in the main CJK block. This makes them unstable, and best avoided. The following list shows the compatibility character shape to the left, and the normalised shape to the right.
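The instability described above can be checked with a short script. The following is only a sketch in Python, using the standard unicodedata module; the helper name find_unstable_kanji and the sample string are invented for illustration. It reports characters from the CJK Compatibility Ideographs block that NFC normalisation would silently replace.

```python
import unicodedata


def find_unstable_kanji(text):
    """Return (original, normalised) pairs for compatibility ideographs
    that NFC normalisation would replace with a different character."""
    pairs = []
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith("CJK COMPATIBILITY IDEOGRAPH"):
            normalised = unicodedata.normalize("NFC", ch)
            # A few compatibility ideographs have no decomposition,
            # so only report those that actually change.
            if normalised != ch:
                pairs.append((ch, normalised))
    return pairs


# Hypothetical sample: U+FA30 is the compatibility form of U+4FAE.
sample = "\uFA30\u8FB1"
for original, normalised in find_unstable_kanji(sample):
    print(f"U+{ord(original):04X} -> U+{ord(normalised):04X}")
```

A check of this kind could be run over text before it is stored, so that any compatibility forms are noticed rather than changed silently by a later normalisation step.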


Kana syllabaries

Japanese uses two syllabaries: hiragana and katakana. The vowel sounds u and i are often elided between non-voiced consonants, or at the end of a word.

Katakana characters are typically used for foreign loan words and names, such as the word 'text'. They are also used for things such as scientific names of plants and animals, onomatopoeic sounds, telegrams, and some female names.

The word 'text', written with katakana syllables.


Hiragana is used for indigenous Japanese words, such as the verb 'to be'.

The word for 'to be', desu, written with hiragana syllables.


It is also used for grammatical endings after a word root written using kanji characters.

The word for 'to collect', atsumarimasu, with the verb root atsu written using a kanji character, and the remainder in hiragana expressing the grammatical present tense.


The basic syllabary includes 5 independent vowel sounds, one nasal sound, and the rest are consonant+vowel combinations. In these lists we show hiragana (first) and katakana (second) together.


Voiced consonants are indicated by attaching a dakuten mark (looks like a quote mark) to the unvoiced shape. Unicode provides precomposed code points for every combination of syllable+dakuten.


The ‘p’ sound is indicated in a similar way by the use of a han-dakuten (half-dakuten).


The Unicode hiragana block does contain separate code points for dakuten combining marks and modifiers, but these are not normally used in text. However, if Unicode NFD normalisation is applied to text, the dakuten and han-dakuten are split from the base and the combining marks are used.
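The effect of normalisation on the dakuten and han-dakuten can be seen in a few lines of Python. This is a minimal sketch using the standard unicodedata module; the describe helper is a hypothetical name used only for this illustration.

```python
import unicodedata


def describe(text):
    """List each code point in the text with its Unicode name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]


precomposed = "\u304C"  # hiragana GA as a single precomposed code point
decomposed = unicodedata.normalize("NFD", precomposed)
recomposed = unicodedata.normalize("NFC", decomposed)

print(describe(precomposed))
print(describe(decomposed))   # base KA followed by the combining dakuten U+3099
print(recomposed == precomposed)
```

Comparing strings only after applying the same normalisation form to both sides avoids mismatches between precomposed and decomposed kana.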


Long vowels


Various strategies are used to represent long vowels, and they tend to differ between hiragana and katakana. This elongation is phonemically significant.

In hiragana, the long vowels , , , and are written by adding a corresponding vowel.





In words of Chinese origin, long e may be written 'ei'.


The long o is usually written 'ou', but is sometimes written 'oo'.



In katakana, long vowels are indicated using ー. This character is used predominantly with katakana, but occasionally also with hiragana.




In a few exceptions, katakana uses a similar approach to hiragana.



The more common grammatical particles are spelled in an idiosyncratic way. The topic marker wa is written using は. The object marker o is written using を. And the location marker e is written using へ.

Small kana

The basic set of kana syllables is completed by a number of small forms used for medial glides, foreign sounds, and gemination, and a vowel lengthener.


Small versions of や, ゆ, and よ are used to form syllables such as きゃ kya kʲa, きゅ kyu kʲu, and きょ kyo kʲo.

っ and ッ are used to lengthen a following consonant sound.


It is also used to represent a glottal stop in a broken-off word.


The small vowel syllables shown above are typically used for transcribing unusual sounds, such as lengthening a preceding vowel, or transliterating foreign sounds, without creating a new syllable.






Over time, certain voiced sounds have merged in several important dialects, as shown in the table below.

Tokyo (standard): ぢ and じ both d͡ʑi~ʑi; づ and ず both d͡zɯᵝ~zɯᵝ
South Tohoku: all four d͡zɯᵝ
Kōchi (Hata, Tosa): ぢ di~d͡zi; じ ʑi; づ dɯᵝ~d͡zɯᵝ; ず zɯᵝ
Kagoshima: ぢ d͡ʑi; じ ʑi; づ d͡zɯᵝ; ず zɯᵝ
Yotsugana pronunciation around Japan. (Source: Wikipedia.)

The orthographic reform shortly after World War 2 recommended the use of only じ and ず, except in circumstances where an unvoiced sound has become voiced because of:

  1. compounding (rendaku), eg. 神無月, which combines かん, な, and つき, would be written in hiragana as かんなづき
  2. repetition, eg. 続く is written in hiragana as つづく

Archaic characters

A number of characters in the kana blocks are no longer used in modern text, except in counter styles (see lists).

These characters were dropped by an orthographic reform shortly after World War 2.


Halfwidth katakana

Unicode has a set of halfwidth katakana forms for legacy encoding roundtrips. In principle, these characters should not be used. The normal, fullsized characters should be used instead.
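Where legacy data already contains halfwidth katakana, it can be converted to the normal forms with compatibility normalisation. The sketch below is one possible approach in Python, not a prescribed method: the function name is hypothetical, and it deliberately applies NFKC only to runs in the halfwidth katakana range (U+FF61 to U+FF9F) so that other compatibility characters, such as fullwidth Latin letters, are left untouched.

```python
import re
import unicodedata

# Halfwidth CJK punctuation and halfwidth katakana, including the
# halfwidth voiced and semi-voiced sound marks.
HALFWIDTH_RUN = re.compile("[\uFF61-\uFF9F]+")


def to_fullsize_katakana(text):
    """Replace halfwidth katakana runs with their normal-width equivalents."""
    return HALFWIDTH_RUN.sub(
        lambda match: unicodedata.normalize("NFKC", match.group()), text
    )


# Halfwidth KA followed by the halfwidth dakuten becomes a single GA.
print(to_fullsize_katakana("\uFF76\uFF9E"))
```

Normalising a whole run at once matters here, because the halfwidth voiced sound mark is a separate character that needs to be composed with the preceding syllable.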


Text direction

Text can be written horizontally, left to right, or vertically with lines progressing from right to left. Vertically set text is still common in Japan; most novels, newspapers and magazines are set vertically.

Sometimes, vertically set text may contain sections or items that are set horizontally. For example, in newspapers, headings are normally set horizontally above the body of an article which is set vertically, and captions are usually horizontal.

The figure below shows pages from a magazine that mix directions on the page.

Vertically set pages with mixed direction text.

Older horizontally set texts in Japanese also ran right to left.

Different conventions are applied for horizontal and vertical text, for example in terms of characters used and treatment of embedded romaji and numerals. Apart from the question of what gets rotated and what does not, the two writing modes may show different preferences for emphasis marks, brackets, numbers, and so forth. This means that it is not usually appropriate to simply switch the direction of the text without making additional changes.

In vertical text (only) decisions have to be made about how to present embedded romaji text and numbers. Romaji typically runs down the page, with proportionally-spaced characters rotated 90° to the right. However, acronyms are often written using upright, fullwidth characters.

の editor は のGNPは
Romaji text with characters rotated (left), and an acronym with upright letters (right).

Numbers, and sometimes text, may also run horizontally within a vertical line. This is most common with double-digit numbers, such as in dates. The width of the horizontal text should not normally exceed the width of the surrounding vertical text (ie. it should fit in the width of a character space). This is referred to as tate chu yoko.

Numbers arranged horizontally within a vertical line.

Glyph shaping & positioning

Experiment with examples using the Japanese character app.

The Japanese scripts are not cursive, and when using precomposed kana (which is the norm) involve no context-based shaping or positioning.

The orthography has no case distinction.

By default, all kanji, hiragana, katakana, and punctuation characters are drawn inside a character frame that is square and the same size for all characters. The box containing the actual symbol is called the letter face, and there should be some space left between the letter face and the character frame. There may be variations, particularly for small kana, punctuation, etc., in the size of the letter face.

Because of the regularity of the character frame size, it can be used to measure the size of the text area or other parts of a page (horizontally or vertically).

Character frame and letter face.

In principle, Japanese characters are set solid, ie. with no space between the character frames. However, text alignment and justification can make adjustments to the placement of characters in the direction of the line flow. See justification and letterspace.

Font styles

The kanji characters are derived from Han characters originally used in Chinese. Many of the Japanese and Chinese characters are unified to the same code point in the Unicode repertoire; however, over time small but systematic, language-related changes have appeared in the glyph shapes of some characters compared to their Chinese equivalents. It is important to choose fonts that present the user with the correct glyphs. The figure below provides some examples.

The same code points, displayed with a Japanese font (top) and Chinese font (bottom).

Besides the need to choose fallback fonts that match the language of the text, Japanese also has some recognisable font styles. Two well-known font styles are often called Mincho and Gothic. The former has strokes with fine gradations of stroke width, whereas the latter has darker strokes with little gradation. For fallback on the Web, these styles are usually equated with serif and sans-serif, respectively, although serifs are not actually involved.

尊厳と権利とについて平等である 尊厳と権利とについて平等である
The same text displayed using the Hiragino Mincho Pro font (top) and the Hiragino Kaku Gothic Pro font (bottom).

Another useful type of font style relates to the endings of Gothic font strokes, which can be flat or rounded.

A typical Gothic font has strokes with squared-off endings (top), but sometimes a font with rounded stroke endings is preferred (bottom).

Context-based shaping & positioning

Horizontal vs. vertical transformation. Characters such as small kana and punctuation occupy different locations within the character frame in horizontal and vertical text.

Positioning of small ょ and the full stop in horizontal and vertical character frames.

The figure above shows how, in horizontal text, small kana are centred horizontally in the character frame but are vertically below centre; in vertical text they are centred vertically, but aligned right.

The full stop also switches from bottom-left in horizontal text, to top-right in vertical.

These are differences that cannot be produced by rotating glyphs, but require special glyphs in the font which are applied when the directional context is detected.

Positioning of decomposed diacritics. When kana use a dakuten or han-dakuten there can be significant overlap with the base character. See the figure below.

Syllabic characters where the dakuten or han-dakuten overlaps the base.

This overlap needs to be unchanged whether the diacritics are part of a single glyph or are separate code points in decomposed text. In the latter case, careful positioning of the diacritics is required.

Shaping of punctuation. Many punctuation marks need to have different shapes for Japanese and non-Japanese text. Often these differences are due to the fact that punctuation for Japanese is based on the em-box, rather than the Latin baseline, cap-height, or x-height. A description of many such differences can be found in Ken Lunde's Proposal to add standardized variation sequences.

Transforming characters

Japanese kanji and kana is a monocameral orthography, and no transforms are needed to convert between different case forms for a given letter. However, romaji characters are cased.

Other transforms may be applied to convert between half-width and full-width characters. This can be useful for converting to and from fullwidth Latin and punctuation, and is sometimes useful for converting small kana characters to full-sized versions.

The latter transformation is common for ruby text (see inlinenotes), where small kana are converted visually to full-sized to aid with readability of the text, given that ruby text is written in small character sizes.

To achieve this in web pages, use the text-transform CSS property in your style sheet with the following values (see the CSS Text specification, https://www.w3.org/TR/css-text-3/#transforming).

full-width (Not yet supported by browsers.)
Transforms all ASCII characters into fullwidth forms.
full-size-kana (Not yet supported by browsers.)
Converts all small Kana characters to the equivalent full-size Kana.

Eg. the following converts the visual appearance (only) of small kana in ruby text to full-size characters, except in headings (where the characters are larger):
rt { font-size: 50%; text-transform: full-size-kana; }
:is(h1, h2, h3, h4) rt { text-transform: none; /* unset for large text*/ }
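Where the CSS values are not available, a similar result can be approximated by changing the text itself. The following Python sketch is an assumption-laden illustration rather than an implementation of the CSS specification: the function names are invented, the small kana table covers only the common small forms, and the fullwidth conversion handles only the printable ASCII range (spaces are left unchanged). Unlike the CSS approach, it changes the underlying characters, not just their appearance.

```python
# Common small kana and their full-size equivalents (partial table).
SMALL_TO_FULL = str.maketrans(
    "ぁぃぅぇぉゃゅょっゎァィゥェォャュョッヮ",
    "あいうえおやゆよつわアイウエオヤユヨツワ",
)


def full_size_kana(text):
    """Replace small kana with full-size kana, roughly like full-size-kana."""
    return text.translate(SMALL_TO_FULL)


def full_width(text):
    """Map printable ASCII (U+0021 to U+007E) to the fullwidth forms block."""
    out = []
    for ch in text:
        code_point = ord(ch)
        if 0x21 <= code_point <= 0x7E:
            # Fullwidth forms sit at a fixed offset from their ASCII originals.
            out.append(chr(code_point + 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


print(full_size_kana("きょう"))
print(full_width("GNP"))
```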

Typographic units

Word boundaries

Japanese rarely uses spaces. In the sample text there are gaps around punctuation, but these are produced by a lack of 'ink' in parts of the square character glyphs.

You can verify this by clicking on this example. The character list popup shows that only four characters make up this sequence, and none are spaces.


Gaps of this kind may also be reduced during justification and line alignment.

In general, word boundaries are not important for line-wrapping, however occasionally text such as headings may be wrapped at word boundaries in order to better balance the text.

Word boundaries are identified when users select screen text, eg. by double-clicking inside a word. Heuristics and dictionaries are needed to identify the boundaries of words in such situations. Note, also, that words in Japanese are very often a mixture of kanji characters followed by hiragana. The word boundary detection needs to treat the various scripts as a unified orthography.
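To illustrate why the scripts need to be treated as a unified orthography, here is a deliberately naive sketch in Python. It is not how real segmenters work (they rely on dictionaries and statistical models), and the function names and code point ranges are assumptions made for this example. It starts a new chunk only where hiragana is followed by kanji or katakana, or where text switches to or from other characters such as punctuation, so that a kanji root and its hiragana ending stay together.

```python
def script_of(ch):
    """Classify a character by rough code point range."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF or ch == "々":
        return "kanji"
    return "other"


def naive_chunks(text):
    """Split text into rough phrase-like chunks using script transitions."""
    chunks = []
    current = ""
    prev = None
    for ch in text:
        script = script_of(ch)
        boundary = (
            (script == "other") != (prev == "other")
            or (prev == "hiragana" and script in ("kanji", "katakana"))
        )
        if current and boundary:
            chunks.append(current)
            current = ""
        current += ch
        prev = script
    if current:
        chunks.append(current)
    return chunks


print(naive_chunks("人間は理性と良心とを授けられており"))
```

This heuristic fails whenever a word written in hiragana follows another hiragana sequence, which is one reason practical word selection needs dictionary support.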


Since there are no combining marks or decompositions in typical Japanese text, graphemes correspond to individual characters for kanji and kana.

The only 2 combining characters listed in this page are U+3099 and U+309A (which are rarely used), and they are not used together. They therefore simply follow the Unicode rule that a combining character follows the base character it is attached to.

Unicode grapheme clusters can therefore be applied to Japanese text without problems. There are no special issues related to operations that use grapheme clusters as their basic unit of text.
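The difference between counting code points and counting grapheme clusters only shows up in decomposed text. The Python sketch below uses a simplification that is adequate for the kana case described here, but is not a full implementation of the Unicode segmentation rules: it attaches any character with a non-zero combining class to the preceding character. The function name simple_clusters is hypothetical.

```python
import unicodedata


def simple_clusters(text):
    """Group each combining mark with the base character before it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


decomposed = unicodedata.normalize("NFD", "がき")
print(len(decomposed))                   # 3 code points
print(len(simple_clusters(decomposed)))  # 2 user-perceived characters
```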

Codepoint order

The ordering of codepoints in a Japanese grapheme is generally not relevant, because graphemes are usually single, syllabic code points. When combining characters are used, there is usually just one.

Punctuation & inline features

Phrase & section boundaries

Japanese uses the following separators at the sentence level and below. Some of the punctuation looks like that for Latin (eg. parentheses, commas, and full stops), but the width of the punctuation is likely to include significant amounts of white space, so that punctuation characters occupy the same space as han characters.

    H V
phrase Bottom left Bottom right
Bottom left Bottom right
Bottom right Bottom right
Bottom right Bottom right
sentence Bottom left Bottom right
Bottom left Bottom right
exclamation Bottom right Right
question Bottom right Right

、 and 。 are the norm for vertical text; however, two alternative conventions are applied to horizontal text. Especially in books that mix Japanese and western text, such as books on science and technology, they may be replaced by ， and ．. Often, however, the ideographic full stop is retained, since it is more visible and looks better (this convention has been adopted for Japanese official publications).

As the table shows, these punctuation marks require dedicated glyphs in the font, and cannot be achieved by simply rotating the glyph.

Japanese also uses the following doubled exclamation/question marks. They remain upright in vertical text.

Other punctuation used to separate phrases or items includes:

  H V
Bottom right Bottom right
—— Bottom right Bottom right

If EM DASH characters are used, they are used in pairs.

Bracketed text

For general parentheses and bracketing in text, Japanese uses:

    H V
Right aligned Left aligned Bottom aligned
Top aligned
Right aligned Left aligned -
- Bottom aligned
Top aligned

and its closing partner are the vertical equivalent of , which is used in horizontal text.

Although there are a number of other bracket characters (listed just below), they are less commonly used.


Quotations & citations

Japanese uses different quote marks for horizontal and vertical writing. The default quote marks are:

    H V
Top-right aligned Top-left aligned -
Top-right aligned Top-left aligned Bottom-right aligned
Top-left aligned
- Top-right aligned Top-left aligned

When an additional quote is embedded within the first, the quote marks are:

    H V
Top-right aligned Bottom-left aligned -
- Bottom-right aligned
Top-left aligned


Japanese sometimes uses katakana characters to create visual emphasis.

もうダメだ moː dame da it's too late!

Japanese Layout Requirements lists the following ways of showing emphasis in Japanese.

  1. Select a different typeface (eg. a Gothic font in Mincho text).
  2. Use and or and .
  3. Change the colour.
  4. Underline.
  5. Boten marks (also known as emphasis marks).

Note that this list doesn't include italicisation or bolding of text. (1) and (2) are popular approaches. (5) is not as common, but is a traditional approach with some value attached.

Different boten marks are used in horizontal and vertical text. Typically, bullets are used above characters in horizontal text, and sesame dots are used to the right of characters in vertical text.

The boten mark is centre-aligned with the base characters in horizontal text, and middle-aligned in vertical text, and doesn't normally appear alongside full stops, commas, or brackets.

Boten marks used for emphasis in horizontal and vertical text. (source)

Embedded text in other languages would have boten marks displayed on the same side as for Japanese.

Abbreviation, ellipsis & repetition




Japanese has a number of logograms used as abbreviations.

ヶ is a reasonably common shorthand for the character 箇, which is a counter for months, places or provisions. It is pronounced ka or ko, and is not related to the larger kana ケ, which is pronounced ke. See also Wikipedia, https://en.wikipedia.org/wiki/Small_ke.



〆 is primarily used as a short form of ʃime from the verb 閉める. For example, it can be used in place of the word 締切.@Wiktionary,https://en.wiktionary.org/wiki/〆#Japanese


ゟ is a ligature of the word より.

is a shorthand for the word .

〼 is derived from a semi-pictogram for a small wooden measuring box called masu. It then came to be used as shorthand for the present-tense verb ending, which has the same sound.



Japanese has a number of iteration marks that repeat the previous syllable or word. The repeated sound may differ slightly due to rendaku sound changes.

々 and 〻 are used to repeat kanji characters; the former for horizontal text, and the latter (rare) for vertical text.

人々 島〻

There are separate marks for hiragana and katakana, and within that division there are separate marks for syllables that begin with a voiced stop. ゝ and ゞ are used to denote hiragana ordinary and voiced stop repetitions, respectively. The katakana equivalents are ヽ and ヾ.

かゝし たゞし バナヽ
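
As a rough illustration of how these marks resolve, the following Python sketch expands each iteration mark into the character it repeats. It is a simplification under stated assumptions: every mark directly follows the character it repeats, rendaku sound changes are ignored, and the function name expand_iteration_marks is hypothetical.

```python
import unicodedata

DAKUTEN = "\u3099"      # combining voiced sound mark
HANDAKUTEN = "\u309A"   # combining semi-voiced sound mark


def _strip_voicing(ch):
    """Return the kana without any dakuten or han-dakuten."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if c not in (DAKUTEN, HANDAKUTEN))


def expand_iteration_marks(text):
    """Replace iteration marks with the character they repeat (hypothetical helper)."""
    out = []
    for ch in text:
        if ch in "々〻" and out:
            # Kanji repetition: copy the previous character as-is.
            out.append(out[-1])
        elif ch in "ゝヽ" and out:
            # Ordinary kana repetition: repeat without voicing.
            out.append(_strip_voicing(out[-1]))
        elif ch in "ゞヾ" and out:
            # Voiced repetition: repeat the previous kana with a dakuten.
            voiced = _strip_voicing(out[-1]) + DAKUTEN
            out.append(unicodedata.normalize("NFC", voiced))
        else:
            out.append(ch)
    return "".join(out)
```

For example, expand_iteration_marks("たゞし") yields ただし. The result is an orthographic expansion only; 人々 becomes 人人 even though the second syllable is voiced when read aloud.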

It is not common, but it is possible to find horizontal text that repeats the iteration mark in order to repeat multiple characters.


See also @Wikipedia,https://en.wikipedia.org/wiki/Iteration_mark#Japanese.


Japanese also has a set of graphemes to indicate repetition of multiple characters, although they are mostly obsolete these days.@Wikipedia,https://en.wikipedia.org/wiki/Iteration_mark#Japanese They are only used in vertical text, and they take up 2 character spaces.

The Unicode Standard also provides half forms, which can be combined to span the 2 character distance.

いろ〳〵 離れ〲 わざ〱 まあ〱
Iterators for multiple syllables in vertical text. The last item uses 2 half glyphs.

Inline notes & annotations

Japanese has a few ways of representing inline notes and annotations.


Various ways of arranging inter-linear annotations alongside text fall under the rubric of ruby (named from the British print size originally used for the annotations). These include mono-ruby, jukugo-ruby, and group-ruby, and they are described in detail below.

Ruby is commonly used to indicate the pronunciation of ideographic characters used in Japanese, as it cannot usually be guessed and so can pose difficulties for those learning the language. For these cases, mono-ruby is most commonly used; however, a variant, jukugo-ruby, is sometimes applied to compound nouns (which are called jukugo in Japanese).

Where sequences of kanji characters do not have the same pronunciation as the sum of their parts (called jukuji), such as the following two words, group ruby is used to represent the sound.j,#h-note-109


田舎 今日

Ruby annotations are also used to provide brief indications of the meaning of words or characters. These annotations typically use the group-ruby approach. The most typical example of this is attaching ruby text to a kanji compound word to indicate a corresponding loan word in katakana (see fig_group_ruby).j,#id221 Group ruby is also used to indicate the reading or the meaning of a Western word used in base text, or where a synonymous Western word in Latin characters is attached as a ruby annotation to a Japanese word (see Figure 112).

The rest of this section describes features that are generally common to all forms of ruby, before we move on to examine the differences in following subsections.

All annotations appear within the standard inter-line space for the page, and don't create extra line height if they only appear on a single line. The inter-line space is usually set at an appropriate size to accommodate annotations.

Unlike Chinese, it is common to find annotations applied just to specific words, rather than annotating the whole text.

Ruby annotations normally appear above horizontal lines of text, and to the right of vertical lines. Occasionally, both phonetic and semantic annotations are applied to the same base text, in which case the annotations appear on both sides of the base. A typical scenario in these cases would be to have mono-ruby above/right of the base, and group-ruby below/left.j,#choice_of_sides_for_ruby_with_respect_to_base_characters

Double-sided ruby.

The character frame of kana annotations is usually half that of the base character. Occasionally, annotations are compressed in one direction (depending on the direction of writing) so that 3 fit over a single base character.j,#fig2_3_10 In large text (12pt or more), such as headings, the size of the annotation may be less than half that of the base.j,#fig2_3_11


Mono-ruby is usually applied to kanji base characters; each base character is associated individually with an annotation.

Annotations are normally centred over the base character in horizontal text, and aligned with the middle of the base character in vertical text (called nakatsuki). An alternative, used only in vertical text, is to align the annotation with the top of the character frame of the base character (katatsuki)j,#id227, as in the righthand example in fig_jukugo.

推/すい/理/り/小/しょう/説/せつ (detective novel) 推/すい/理/り/小/しょう/説/せつ (detective novel) vertical
Mono-ruby for detective novel, in horizontal and vertical text (colouring added for illustrative purposes).


Since the annotation characters are usually 1/2 the size of the base characters, 3-character annotations require more space than the underlying kanji. Internally to the sequence, this will produce a gap between the base characters, since annotations cannot overlap (see fig_mono_ruby).

At either end of the sequence, either a gap is opened up between the base character with the long annotation and its neighbour (see fig_overhang), or the annotation may overhang the neighbouring base characters. Simpler implementations produce gaps, but allow annotations to overhang any blank parts of adjacent fullwidth punctuation characters. More sophisticated applications may allow overlap of kana or other characters, though never kanjij,#232, but may also have to deal with more complicated algorithms, such as balancing space on either side of the ruby sequence, or deciding what can and cannot be overlapped, and to what extent.j,#id229 j,#adjustments_of_ruby_with_length_longer_than_that_of_the_base_characters

Alternative ways of dealing with potential overhang either side of the ruby sequence.

At line start or line end, long annotations do not protrude past the line edge – meaning that there will be a gap between the base character and the line edge.

Gaps produced at line end and line start by wide annotations.

Lines can be broken in the middle of a sequence of mono-ruby annotations, since an associated base and annotation are kept together.

Group ruby

Applies when the base is a sequence of characters, mapped to a single annotation. The base can be a sequence of either kanji or other characters, as can be the annotation.

When the annotation is shorter than the base, and the annotation is composed of kana or kanji characters, they are typically spread out with two units of equal spacing between each character and one at either end. The end space should never exceed half the width of a base character.j,#positioning_of_groupruby_with_respect_to_base_characters

When the base is shorter than the annotation, the inverse applies.
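
The spacing rule for a short annotation can be sketched as follows. This is a minimal Python illustration, not a layout engine: it assumes a horizontal line, base characters 1em wide, and annotation characters half that size, and the name group_ruby_offsets and its parameters are hypothetical.

```python
def group_ruby_offsets(base_chars, ruby_chars, ruby_size=0.5, max_end=0.5):
    """Return the start offset, in base-character ems, of each annotation character.

    Free space is shared out as one unit at each end and two units between
    neighbouring annotation characters; the end space is capped at max_end.
    """
    free = base_chars - ruby_chars * ruby_size
    if free < 0:
        raise ValueError("annotation is longer than the base")
    if ruby_chars == 1:
        # A single annotation character is simply centred.
        return [free / 2]
    unit = free / (2 * ruby_chars)
    end = min(unit, max_end)
    # Whatever the ends do not use is divided among the inner gaps.
    between = (free - 2 * end) / (ruby_chars - 1)
    return [end + i * (ruby_size + between) for i in range(ruby_chars)]
```

With a 3-character base and a 2-character annotation, group_ruby_offsets(3, 2) places the annotation characters at 0.5em and 2.0em, leaving half an em at each end and one em between them.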

顧客/クライアント 模型/モデル
模型 mokei model and 顧客 kokjaku client with katakana group-ruby annotations indicating loan word alternatives.

If the annotation or the base is not kanji or kana, the text is set solid and centred relative to the other component (see fig_latin_ruby).

編集者/editor editor/エデイター
Group-ruby involving non-Japanese text.

編集者 henʃuːʃa editor

Overhang behaviour is the same as described for mono-ruby, as is the handling at line ends when the annotation is longer than the base.

Unlike a sequence of mono-ruby, there is no line-break opportunity inside a group-ruby.


Where compound nouns (jukugo) occur, special rules for arrangement of annotation characters (so-called jukugo-ruby) can make it appear that they are evenly distributed across the word (see the lefthand example in fig_jukugo), but there are rules about how much and what type of overhang are allowed, which sometimes lead to gaps (see the righthand example of fig_jukugo).

橋頭堡/きようとうほ (beachhead) 思春期/ししゅんき (puberty)
Two examples of distributed annotations in jukugo-ruby. On the right, a gap appears in the annotation because of the rules about overhang.

An important feature of jukugo-ruby is that where the full compound noun doesn't fit at the end of a line the base characters wrap one-by-one in the normal way, taking with them the appropriate annotations. The annotation for a single base character is never split across a line break.

It is up to the author whether a word that is actually a sequence of 2 compound nouns is treated as a single jukugo ruby, or as two separate ones.

There are numerous options for overhang and arrangement of jukugo-ruby annotations. They are discussed in detail in JLReq.

Inline ruby

Where text sizes are too small for ruby characters to be easily read, the ruby annotation is typically rendered after the base text, in parentheses.

Inline annotations should normally correspond to full words, even if the sequence of base characters would otherwise be represented using mono-ruby. For example, the word 東京 should be displayed inline as 東京(とうきょう) and not 東(とう)京(きょう).
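
A small Python sketch of that fallback, assuming the mono-ruby data is available as (base, reading) pairs for one word. The function name inline_ruby is hypothetical.

```python
def inline_ruby(pairs):
    """Join per-character (base, reading) pairs into one word-level inline annotation."""
    base = "".join(b for b, _ in pairs)
    reading = "".join(r for _, r in pairs)
    # One pair of parentheses for the whole word, not one per character.
    return f"{base}({reading})"
```

For instance, inline_ruby([("東", "とう"), ("京", "きょう")]) gives 東京(とうきょう).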


Warichu is a method of adding notes right alongside the relevant text, used particularly in study guides, travel guides, reference books, encyclopedias and manuals. It is generally only used in vertical text, although it is occasionally used in horizontal text for study guides and encyclopedias.

The note is usually surrounded by parentheses (or rarely just spaces), and the text of the note is half the size of the main text and arranged in two parallel lines. The two parallel lines are usually set with no inter-line spacing.

Two examples of warichu.

The warichu lines should be as close to equal in length as possible, given the normal wrapping rules, and if there is a difference, the initial line (right side) should be the longer.
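
As a minimal sketch of the balancing rule, the Python function below splits a note into two lines of near-equal length, giving any extra character to the first line. It deliberately ignores the normal wrapping rules mentioned above, and the name split_warichu is hypothetical.

```python
def split_warichu(note):
    """Split a warichu note into (first_line, second_line), first line never shorter."""
    first_len = (len(note) + 1) // 2  # round up so the first line gets the odd character
    return note[:first_len], note[first_len:]
```

A real implementation would then move the split point as needed so that neither line starts or ends with a prohibited character.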

In the rare event that the warichu text breaks across more than one line (see fig_warichu on the right), both lines of the warichu on the first line of the main text should be read completely before continuing to the remainder of the note. The characters in memory follow the normal reading sequence (and use normal characters, too), but the application needs to rearrange the visual order around the line break.

Other inline features

Other punctuation

CLDR 31 lists the following punctuation characters for Japanese. First the fullwidth forms of normal characters.


Then the halfwidth forms.


And finally, the other punctuation.


The katakana block contains two additional punctuation marks.


・ is used to separate words when writing non-Japanese phrases.uk,720

゠ is a delimiter occasionally used in analyzed Katakana or Hiragana textual material.

The hiragana block contains some combining and modifier characters used to represent dakuten and han-dakuten for compatibility with older systems.


The kana blocks each have two marks that are used to indicate repetition of a syllable – one for syllables with unvoiced consonants and another for voiced. The table below shows the hiragana first, then the katakana. In both cases there is a character for repetition of ordinary syllables, and one for repetition of syllables with dakuten.


Unicode also has U+3000 IDEOGRAPHIC SPACE for occasions where it is needed.

Line & paragraph layout

Line breaking & hyphenation

Lines are normally wrapped between characters – word boundaries usually have no significance for the wrapping. However, occasionally there is a preference to wrap text at word boundaries, eg. to better balance headings.

Line start/end rules

Kanji characters have the ID line-break property, which means that lines can ordinarily break before and after and between pairs of ideographic characters. Note that this class also includes characters other than Han ideographs.

Kinsoku rules. Japanese should also take into account a few rules (called kinsoku rules) which dictate what characters cannot appear at the end or start of a line. The set of characters affected by these rules varies slightly from application to application, but the two lists below show examples of the kinds of punctuation involved.

’ ” ) 》 〉 】 〗 〕 ] }
。 . 、 , ・ : ; ? ! ヽ ヾ ゝ ゞ 々 % º

Characters typically not permitted at the start of a line.

‘ “ ( [ 〔 《 〈 【 {
¥ $ £

Characters typically not permitted at the end of a line.


There are a number of ways to handle these characters:

  1. Wrap the previous character to the next line with the punctuation.

  2. Leave the punctuation character protruding into the margin (if there is one).

  3. Ignore the kinsoku rules.
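
The first option can be sketched in Python as follows. This is an illustrative simplification: it treats every character as one cell wide, uses only the example character sets listed above, and falls back to ignoring the rules when no better break exists. The name wrap_kinsoku is hypothetical.

```python
# Example sets taken from the lists above; real applications vary.
NO_LINE_START = set("’”)》〉】〗〕]}。.、,・:;?!ヽヾゝゞ々%º")
NO_LINE_END = set("‘“([〔《〈【{¥$£")


def wrap_kinsoku(text, width):
    """Break text into lines of at most `width` characters, honouring kinsoku rules."""
    lines = []
    start = 0
    while len(text) - start > width:
        end = start + width
        # Pull the break back while it would strand a prohibited character
        # at the start of the next line or at the end of this one.
        while end > start + 1 and (
            text[end] in NO_LINE_START or text[end - 1] in NO_LINE_END
        ):
            end -= 1
        lines.append(text[start:end])
        start = end
    lines.append(text[start:])
    return lines
```

For example, wrapping あいうえお、かきくけこ at 5 characters moves お to the second line so that the comma does not start a line. The short first line is what justification then has to fill.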


Where a gap appears at the end of a line, full justification is usually restored by adding space across the line (see justification).

Small kana. These kinsoku rules may also be used to prevent small kana characters appearing alone at the start of a line. However, this is much more likely to reflect the preferences of the author. For example, the rule may be ignored in narrow newspaper columns.

Kinsoku rules used to prevent a small kana character appearing alone at the start of a line.


There is no hyphenation at line-breaks for Japanese text.

Text alignment & justification

The preferred arrangement of characters on a line is solid set, ie. each character frame immediately follows the previous one, each with the same width. In principle, in books where the width of the text area on a page is set by counting characters and fixed, paragraphs composed of kanji and kana characters don't need to be justified. Lines break as soon as the line is full of characters, and the whole paragraph has grid lines vertically and horizontally between the characters.

However, a number of factors may make justification necessary from time to time. One such would be punctuation that pulls the last character of the previous line with it to the next line, so that it doesn't begin a line on its own. Another would be web-based text where windows can be stretched, resulting in a situation where the width of a line no longer exactly corresponds to the sum of the width of all the characters on that line. Other situations include lines where proportionally-spaced romaji text breaks the grid effect.

Japanese justifies text using a complex set of rules which adjust the space between characters on a line. Some characters are adjusted before others. Typically in character-based justification, rules are applied to different types of character in successive waves. For example, the algorithm may attempt to reduce the spacing around punctuation first, and only when more adjustment is needed turn to adjusting the spacing between ideographs.

In situations where a set of lines each contains self-contained text, the line content may be stretched to fit the line width, for example in table cells. In this case it is typical to set the first and last characters at the line start and end, respectively, and then apply equal amounts of spacing between all remaining characters. This can result in large gaps, including lines where the two characters are arranged at opposite line ends with nothing between. See fig_distributed_spacing.

Distributed spacing example.
Evenly distributed spaces across a line in a table. Sourcej,#id25
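
The distributed arrangement described above amounts to simple arithmetic, sketched here in Python. It assumes every character frame is 1em wide and the line width is given in ems; the name distribute is hypothetical.

```python
def distribute(chars, line_width):
    """Return the start position (in ems) of each character frame on the line."""
    n = len(chars)
    if n > line_width:
        raise ValueError("text is wider than the line")
    if n == 0:
        return []
    if n == 1:
        return [0.0]
    # First frame sits at the line start, last frame ends at the line end,
    # and the leftover space is shared equally between neighbours.
    gap = (line_width - n) / (n - 1)
    return [i * (1 + gap) for i in range(n)]
```

On a 6em line, distribute("北海道", 6) returns [0.0, 2.5, 5.0], while a two-character cell such as 東京 ends up with its characters at opposite ends of the line.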

Text spacing

This section looks at ways in which spacing is applied between characters over and above that which is introduced during justification.

Inter-character letter-spacing

Letter-spacing is used to achieve balance between items with large and small numbers of characters, such as headings, running heads, and captions. When expanding text, equal amounts of space are added between the character frames of the item with the smaller number of characters.

Examples of headings where letter spacing has been applied.

Reducing inter-character spacing. Although solid set text is normally best for readability, in large print sizes, such as for magazine headings, it may be desirable to reduce the distance between certain characters. This is typically done by reducing the distance between adjacent letter faces.

Sometimes, text may also be kerned by overlapping the character frames by a regular amount across a whole line.

Spacing around alphabetic or numeric phrases

When a run of romaji or ASCII numerals appears in text, it is often set off from the surrounding kanji/kana letters by a small space.

The amount of spacing can vary. JLREQj,#id209 suggests a ¼em space, but sometimes other spaces may be appropriate, such as ⅙em.gh

A Japanese paragraph where text-spacing has and has not been applied. (source)

Such spacing is not needed when the phrase is followed or preceded by punctuation that already has built in space. It also doesn't appear at the line start/end.

To achieve this in web pages use the text-autospace CSS property in your style sheet; don't use space characters. For full details of the options available see the CSS spec.

By default, the browser should insert a gap automatically between runs of ideographs and runs of non-ideographic letters or numerals. The size of the gap depends on the browser; however, the CSS spec suggests 1/8 of the width of an ideographic character.
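
To make the default behaviour concrete, the Python sketch below finds the positions in a string where such a gap would fall. It is only an approximation for illustration: the character classes are simplified to a few kana and kanji ranges and ASCII letters and digits, whereas the CSS spec defines them more precisely, and the function names are hypothetical.

```python
def is_ideographic(ch):
    """Simplified test: hiragana, katakana, or a common CJK unified ideograph."""
    return "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff"


def is_latin_alnum(ch):
    """Simplified test: ASCII letter or digit."""
    return ch.isascii() and ch.isalnum()


def autospace_boundaries(text):
    """Return indexes i where a gap would fall between text[i-1] and text[i]."""
    return [
        i
        for i in range(1, len(text))
        if (is_ideographic(text[i - 1]) and is_latin_alnum(text[i]))
        or (is_latin_alnum(text[i - 1]) and is_ideographic(text[i]))
    ]
```

For 東京2024年の夏 this reports boundaries before 2 and before 年. No boundary is reported next to punctuation or at either end of the string, matching the exceptions described above.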

There are 2 ways in which CSS can add gaps. If the text already contains gaps produced using ordinary space characters the CSS will, by default, only add gaps where there are no spaces. If, on the other hand, you want to reduce the width of those space-based gaps, or apply even spacing throughout, then use the replace value. text-autospace:ideograph-alpha ideograph-numeric replace; will remove any space characters and replace them with a standard width gap, while also creating gaps where the space character hadn't been used.

To remove all synthesised gaps (but leave any manually-typed space characters in place) use text-autospace:no-autospace.

The other values can be used to tweak the results as follows.

normal (Not yet supported by browsers.)§
Restores the default setting of the browser, ie. it creates a gap between runs of ideographs and runs of non-ideographic letters or numerals, but only inserts a gap where an ordinary space has not been used.
ideograph-alpha or ideograph-numeric (Not yet supported by browsers.)§
These 2 values can be used individually to exclude the other option.
auto (Not yet supported by browsers.)§
Gives over control of autospace behaviour to the browser, assuming that the browser has implemented special rules that differ from the normal case. Note that if you use this value your text may look different from browser to browser.

Most of the time you will probably want to use the following:
text-autospace: ideograph-alpha ideograph-numeric replace;

Spacing around punctuation

Punctuation such as full stop, comma, parentheses, etc. normally has built-in space associated with it because the ink takes up only a part of the em square. However, in some situations, the blank space is not appropriate.

When text is arranged on a strict grid pattern, none of this space removal applies.

Sequences of punctuation. One such example is when multiple punctuation marks appear side by side. fig_text_space_adjacent shows how space can be removed between a fullwidth comma and fullwidth bracket to reduce large blank spaces. It shouldn't be necessary to use halfwidth characters for this; you should use normal characters and the application should remove the appropriate amount of space automatically.

Space has been removed on the left to make the text more readable.

It is not yet possible to control this in web pages, but the CSS Text spec proposes a way forward using the text-spacing CSS property§. The relevant property values are:

trim-adjacent (Not yet supported by browsers.)
Collapses spacing between punctuation glyphs.
space-adjacent (Not yet supported by browsers.)
Keeps the space before fullwidth opening punctuation when not at the start of the line. Keeps the space after fullwidth closing punctuation when not at the end of the line.

Eg. the following collapses spaces between punctuation marks:
text-spacing: trim-adjacent;

Line-initial punctuation. Similarly, space may be removed from punctuation at the start or the end of a line. If we use a bracket as an example, the ink of the bracket should be flush with the line start when that bracket occurs inside a paragraph. Where paragraphs are separated by a blank line, the bracket at the start of the first line should also be flush with the left edge of the text.

It is common, however, to have no blank line before a Japanese paragraph, but instead indent the paragraph's first line. Usually this indent is the width of one fullwidth character. If the line begins with a punctuation such as a bracket, the empty space that usually precedes a fullwidth bracket is still dropped, but the line is set so that the glyph hangs into the indent (which, visually, looks like it is preceded by a half-width space). fig_text_space_para shows examples of this.

Space is removed from a bracket at the start of a line, but not at the beginning of a paragraph. (source)

It is not yet possible to achieve this in web pages, but the CSS Text spec proposes a way forward using the text-spacing CSS property§.

A typical way of setting styling for indented paragraphs would therefore include something like this

p {
  margin: 0;
  text-indent: 1em;
  text-spacing: trim-start;
  hanging-punctuation: first;
}

The relevant property values are:

text-spacing: trim-start (Not yet supported by browsers.)
Sets fullwidth opening punctuation flush (ie. removes the leading space from fullwidth glyphs) at the start of each line.
text-spacing: space-start (Not yet supported by browsers.)
Keeps the space before all fullwidth opening punctuation at the beginning of every line (ie. full-width glyphs).
text-indent: 1em
Indents the first line of a paragraph by 1 character space.
hanging-punctuation: first (Not yet supported by browsers.)
An opening bracket or quote at the start of the first formatted line of an element hangs. This applies to all characters in the Unicode categories Ps, Pf, Pi plus the ASCII quote marks ' U+0027 APOSTROPHE and " U+0022 QUOTATION MARK.

In some cases, the paragraph-start line indentation has been achieved by adding a fullwidth bracket at the start of the paragraph (rather than indentation), while removing the leading space from other brackets in the paragraph. Indentations for lines that don't begin with a bracket-like punctuation will typically use an ideographic space character rather than styling to create the indent (because line indentation doesn't behave differently depending on whether a line starts with a bracket). This approach is not recommended, because it impedes the ability of authors to change behaviour simply through changing the styling, but to provide a workaround for legacy text in this situation, CSS proposes another value:

space-first (Not yet supported by browsers, and not recommended for normal use.)
Behaves as space-start on the first line of the block container and each line after a forced line break, but as trim-start on all other lines.

Line-final punctuation. It is often useful to remove trailing space from a fullwidth punctuation glyph if it allows that character to fit at the end of a line (rather than wrapping it to the next line).

Again, it is not yet possible to achieve this in web pages, but the CSS Text spec proposes a way forward using the text-spacing CSS property§. The relevant property values are:

allow-end (Not yet supported by browsers.)
Removes trailing space from fullwidth closing punctuation at the end of each line if it does not otherwise fit prior to justification; otherwise sets the punctuation with full-width glyphs.
trim-end (Not yet supported by browsers.)
Removes trailing space from fullwidth closing punctuation at the end of every line.
space-end (Not yet supported by browsers.)
Keeps the space after all fullwidth closing punctuation at the end of every line (ie. full-width glyphs).

Baselines, line height, etc.

The standard baseline for kanji and kana characters is slightly lower than the alphabetic baseline used for Latin characters. Mixed script text needs to align baselines correctly.

fig_baselines shows metrics for the Hiragino Mincho Pro font. In this font the maximum height of the Japanese letters reaches slightly higher than the Latin ascenders, but not as low as the Latin descenders.

Font metrics for text in the Hiragino Mincho Pro font.

Japanese characters have no ascenders or descenders, but occupy the square space described earlier. Some characters use more of the square space than others, as can be seen in fig_baselines.

Counters, lists, etc.

You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.

Japanese text uses a number of different counter styles. Some of the more common include full-width European numbers, which in vertical text stand upright. Unicode has various sets of numbers that can be useful here.

For the dotted-decimal numeric style Unicode provides precomposed characters from 1 to 20.


For the circled-decimal numeric style Unicode provides characters from 1 to 50.
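
The precomposed characters for both styles sit in a few contiguous Unicode ranges, so they can be looked up by arithmetic. The Python sketch below assumes the ranges U+2488 to U+249B for dotted numbers, and U+2460 to U+2473, U+3251 to U+325F and U+32B1 to U+32BF for circled numbers. The function names are hypothetical.

```python
def dotted_decimal(n):
    """Return the precomposed 'number + full stop' character for 1..20."""
    if 1 <= n <= 20:
        return chr(0x2488 + n - 1)
    raise ValueError("no precomposed dotted number outside 1..20")


def circled_decimal(n):
    """Return the precomposed circled number character for 1..50."""
    if 1 <= n <= 20:
        return chr(0x2460 + n - 1)
    if 21 <= n <= 35:
        return chr(0x3251 + n - 21)
    if 36 <= n <= 50:
        return chr(0x32B1 + n - 36)
    raise ValueError("no precomposed circled number outside 1..50")
```

Beyond these limits a list would need to fall back to another counter style.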


The Japanese orthography also uses kanji or kana characters to create 1 fixed, 4 alphabetic, and 2 additive styles.


The circled-katakana fixed style uses the following letters. The suffix is a space, and the numbers run from 1 to 47.



The alphabetic styles all use 、 as a suffix (with no following space). The iroha style ordering is based on the order of characters in a pangram poem dating from the Heian era (794–1185).
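
An alphabetic counter works like a numbering system with no zero: once the symbols run out, it continues with two-symbol combinations. The Python sketch below illustrates this with a 48-character hiragana sequence. The symbol list is an assumption taken from the hiragana style in CSS Counter Styles, the suffix is assumed to be 、, and the name hiragana_counter is hypothetical.

```python
# Assumed symbol order, following the CSS Counter Styles 'hiragana' style.
HIRAGANA = (
    "あいうえおかきくけこさしすせそたちつてと"
    "なにぬねのはひふへほまみむめもやゆよ"
    "らりるれろわゐゑをん"
)


def hiragana_counter(n, suffix="、"):
    """Return the alphabetic-system counter text for a positive integer n."""
    if n < 1:
        raise ValueError("alphabetic counters start at 1")
    base = len(HIRAGANA)
    out = ""
    while n > 0:
        n -= 1  # shift to zero-based so the last symbol maps correctly
        out = HIRAGANA[n % base] + out
        n //= base
    return out + suffix
```

Item 48 is ん、 and item 49 wraps around to ああ、. The same function works for the other alphabetic styles if the symbol string is swapped.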

The hiragana alphabetic style uses these 48 hiragana characters in the order in which they are typically arranged.




The hiragana-iroha alphabetic style uses 47 hiragana characters in the order shown just below.




The katakana alphabetic style uses these 48 katakana characters in the order in which they are typically arranged.




The katakana-iroha alphabetic style uses 47 katakana characters in the order shown just below.





The Japanese additive styles have a range -9,999 to 9,999 and use kanji characters only. The suffix is 、, and negative numbers are preceded by マイナス.
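
A minimal Python sketch of the informal additive style follows. The symbol set (〇 to 九, with 十, 百 and 千 as units, and no leading 一 before a unit) is an assumption based on the japanese-informal definition in CSS Counter Styles, and the function name is hypothetical. The suffix is left to the caller.

```python
DIGITS = "〇一二三四五六七八九"
UNITS = [(1000, "千"), (100, "百"), (10, "十")]


def japanese_informal(n):
    """Return the kanji counter text for an integer in -9999..9999."""
    if not -9999 <= n <= 9999:
        raise ValueError("outside the range of this counter style")
    if n == 0:
        return DIGITS[0]
    sign = "マイナス" if n < 0 else ""
    n = abs(n)
    parts = []
    for value, unit in UNITS:
        digit, n = divmod(n, value)
        if digit:
            # A multiplier of one is left implicit, eg. 百 rather than 一百.
            parts.append(("" if digit == 1 else DIGITS[digit]) + unit)
    if n:
        parts.append(DIGITS[n])
    return sign + "".join(parts)
```

For example, japanese_informal(2024) gives 二千二十四 and japanese_informal(-5) gives マイナス五.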

The japanese-informal additive style uses these letters.




The japanese-formal additive style uses these letters.




Prefixes and suffixes

The most common suffix is 、. The fixed styles have no prefix/suffix.


一、 二、 三、 四、 五、
あ、 い、 う、 え、 お、
Separator for Japanese list counters.

Styling initials

Large paragraph-initial characters can easily be found in Japanese content. The character typically fills a box that is the height (or width, in vertically-set text) of 2-4 lines.

An enlarged initial character at the beginning of a paragraph.

Page & book layout

General page layout & progression

Book binding & reading direction

Books, magazines, et cetera, that are vertically set have the front cover on the right, and pages turn to the right as you read.

Horizontal books are bound on the left, and vertical on the right.

Page layout

Rather than specifying margins and then filling the space between with the body of the text, Japanese text areas will usually be defined by specifying the width and height of the text area as a number of characters, and then determining the size of the margins based on what remains of the page size. This is possible because Japanese characters are drawn in square character frames, all the same size.

In fact, the calculations also include an inter-line space. This inter-line space must be set for the whole page at a size that is large enough to accommodate any ruby annotations or other items that may protrude into the line gap. Therefore, the line height doesn't change for individual lines that have ruby annotations.
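
The calculation can be sketched in Python as below. It assumes horizontally set text, square character frames, and a single inter-line gap value; for vertical text the two axes would swap. All names and the sample figures are hypothetical.

```python
def text_area(chars_per_line, lines, char_size, line_gap):
    """Return (width, height) of the text area, in the same unit as char_size."""
    width = chars_per_line * char_size
    # Each line is one character frame tall, with a gap between adjacent lines.
    height = lines * char_size + (lines - 1) * line_gap
    return width, height


def remaining_space(page_width, page_height, chars_per_line, lines, char_size, line_gap):
    """Return the total horizontal and vertical space left over for margins."""
    width, height = text_area(chars_per_line, lines, char_size, line_gap)
    return page_width - width, page_height - height
```

With 40 characters per line, 30 lines, 9pt characters and a 6pt line gap, the text area is 360pt by 444pt; on a 420pt by 595pt page that leaves 60pt and 151pt to share between the margins.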

Defining layout of the text area in this way creates a virtual grid, to which some things snap. For example, headings may be indented by a given number of character spaces, and are centred on a given number of lines in the grid. Page headers and footers may also correspond to aspects of the text area grid for positioning.

Column layout

Columns in vertically set text run horizontally from right to left.

Columns run horizontally in vertically-set text.

The title for this content runs horizontally across the top of the columns. This is a common approach. Note that although the columns are read RTL, the heading is LTR.

Notes, footnotes, etc

See inlinenotes for purely inline annotations, such as ruby or warichu. This section is about annotation systems that separate the reference marks and the content of the notes.

※ can be used in text to set up a footnote reference, and in the footnotes themselves. It can be followed by a number when there are multiple notes, eg. ※1, ※2, etc.

Wikipedia provides the following example.

Footnote reference mark.

Forms & user interaction

Form controls on Web pages should be rotated 90 degrees clockwise, compared to the form controls for Western languages.

The following figures show examples of what is expected. Major browsers don't fully support forms with this orientation at the time of writing.


Text entry form controls.

A select control closed (right) and then open while the user makes a choice (left).


Meter, progress, and button elements (right to left).

Page numbering, running headers, etc

Page headers and footers typically run horizontally on vertically set pages.

Example of a page format in vertical writing mode. Source j,#elements_of_page_formats.

Character lists

Version 13.0 of the Unicode Standard has the following blocks dedicated to the Japanese script (numbers in lists are non-ASCII only):

Apart from ASCII characters, the Japanese orthography described here uses 2,136 characters (and 11 more, used infrequently) from the following Unicode blocks: