Updated 13 August, 2024
This page brings together basic information about the Japanese writing system, which includes the use of Kanji, Hiragana, Katakana and Latin scripts, and its use for the Japanese language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Japanese using Unicode.
Richard Ishida, Japanese (Kanji, Hiragana, Katakana, & Latin) Orthography Notes, 13-Aug-2024, https://r12a.github.io/scripts/jpan/ja
第1条 すべての人間は、生まれながらにして自由であり、かつ、尊厳と権利とについて平等である。人間は、理性と良心とを授けられており、互いに同胞の精神をもって行動しなければならない。
第2条 すべて人は、人種、皮膚の色、性、言語、宗教、政治上その他の意見、国民的もしくは社会的出身、財産、門地その他の地位又はこれに類するいかなる事由による差別をも受けることなく、この宣言に掲げるすべての権利と自由とを享有することができる。 さらに、個人の属する国又は地域が独立国であると、信託統治地域であると、非自治地域であると、又は他のなんらかの主権制限の下にあるとを問わず、その国又は地域の政治上、管轄上又は国際上の地位に基ずくいかなる差別もしてはならない。
The 'Japanese orthography' described here is a mixture of 4 scripts which are all used together for any Japanese text: Han (kanji), Hiragana, Katakana, and Latin. Together they are what is used to write the Japanese language.
日本語
"Kanji characters were introduced to Japan around the 3rd century, it is thought from Korea. Until the 7th or 8th century, the Japanese language was written exclusively in these Chinese characters. Initially these were used phonetically to represent similar-sounding Japanese syllables, regardless of their meaning in written Chinese. However, the process of writing Japanese solely in kanji was laborious; each symbol consisted of a number of strokes and only represented one syllable. Two simplified forms of writing began to emerge around the 7th century. The modern hiragana script developed from a simplified cursive style originally developed by women, who were discouraged from learning kanji, and katakana was developed by Buddhist scholars who wrote only one element of each kanji symbol as a form of shorthand."
Sources: Scriptsource, Wikipedia
Four scripts are used, mixed together to write Japanese: kanji (han), katakana, hiragana, and Latin. Essentially, Japanese writing is a mixture of an ideographic and a syllabic script. Non-Latin letters typically represent a spoken syllable. See the table to the right for a brief overview of features for the modern Japanese orthography. The character count reflects a typical set of characters needed for everyday reading and writing: there are thousands more kanji characters that could be added for other purposes.
Text can be written horizontally or vertically. The visual forms of characters don't interact, but rotated and alternative glyph forms are needed to enable the switch between directions.
Words are not separated by spaces or any other character. There is no case distinction.
Kanji characters are mostly derived from the Chinese Han script. They are used for word roots.
The term kana covers two syllabaries that are used with kanji characters (see Han) to write Japanese. See the table to the right for a brief overview of features, taken from the Script Comparison Table.
One syllabary is hiragana, the other katakana. In both cases, the repertoire includes 5 independent vowel sounds, one nasal sound, and the rest are consonant+vowel combinations. There are a small number of additional characters with particular functions, such as a katakana lengthening mark, and a few small characters for representing medial glides.
The Latin (romaji) characters and much of the punctuation corresponding to the ASCII range are available in fullwidth sizes that match the dimensions of the kanji and kana.
See: lists
These are sounds for the standard Japanese language.
Phones in a lighter colour are non-native or allophones.
| | labial | alveolar | post-alveolar | palatal | velar | uvular | glottal |
|---|---|---|---|---|---|---|---|
| stop | p b | t d | | | k ɡ | | |
| affricate | | t͡s d͡z | t͡ɕ d͡ʑ | | | | |
| fricative | ɸ | s z | ɕ ʑ | ç | | | h |
| nasal | m | n | | ɲ | ŋ | ɴ | |
| approximant | w | | | j | | | |
| trill/flap | | r | | | | | |
Japanese is not a tonal language.
The Japanese language, unlike many neighbouring languages, uses polysyllabic words, and has no tones. It is an agglutinative language, and doesn't use spaces or other characters to separate words.
Wikipedia has a nicely written summary of the structural characteristics:
Text (文章 bunshō) is composed of sentences (文 bun), which are in turn composed of phrases (文節 bunsetsu), which are its smallest coherent components. Like Chinese and classical Korean, written Japanese does not typically demarcate words with spaces; its agglutinative nature further makes the concept of a word rather different from words in English. The reader identifies word divisions by semantic cues and a knowledge of phrase structure. Phrases have a single meaning-bearing word, followed by a string of suffixes, auxiliary verbs and particles to modify its meaning and designate its grammatical role. In the following example, phrases are indicated by vertical bars:
- 太陽が|東の|空に|昇る。
- taiyō ga | higashi no | sora ni | noboru
- sun SUBJECT | east POSSESSIVE | sky LOCATIVE | rise
- The sun rises in the eastern sky.
Romanised Japanese text may add spaces between bunsetsu phrases, with hyphens separating suffixes (eg. higashi-no), or may also separate the suffixes using spaces (ie. higashi no).
It is common for Japanese words to repeat morphemes, such as mukashi mukashi once upon a time. Often, this introduces a feature called rendaku, whereby the initial consonant of the repeated sound changes, such as in hitobito people.
Kanji characters are mostly derived from the Chinese Han script. They are commonly used for word roots and compound words.
Kanji characters are primarily constructed from characters that each represent a phonetic symbol. Some have pictographic origins that are still evident, whereas others have a more complicated structure.
In reforms in the mid 20th century, the Japanese repertoire was standardised on around 2,000 core characters; however, standardised computer character sets support a few thousand more.
The Jōyō kanji character set (常用漢字) is intended as a literacy baseline for those who have completed compulsory education, as well as a list of permitted characters and readings for use in official government documents.
A second list, called Jinmeiyō kanji (人名用漢字), is a supplementary list of 863 characters that can legally be used in registered personal names in Japan.
The number of characters in these lists changes from time to time. The Wikipedia articles for Jōyō and Jinmeiyō kanji provide useful timelines indicating changes over the years.
In addition to the basic lists, there are a number of variants and traditional forms that need to be considered. Note also that the jōyō character set only provides a baseline for the educational process. School leavers still have more characters to learn to achieve a working competency.
Jōyō kanji | 亜 哀 挨 愛 曖 悪 握 圧 扱 宛 嵐 安 案 暗 以 衣 位 囲 医 依 委 威 為 畏 胃 尉 異 移 萎 偉 椅 彙 意 違 維 慰 遺 緯 域 育 一 壱 逸 茨 芋 引 印 因 咽 姻 員 院 淫 陰 飲 隠 韻 右 宇 羽 雨 唄 鬱 畝 浦 運 雲 永 泳 英 映 栄 営 詠 影 鋭 衛 易 疫 益 液 駅 悦 越 謁 閲 円 延 沿 炎 怨 宴 媛 援 園 煙 猿 遠 鉛 塩 演 縁 艶 汚 王 凹 央 応 往 押 旺 欧 殴 桜 翁 奥 横 岡 屋 億 憶 臆 虞 乙 俺 卸 音 恩 温 穏 下 化 火 加 可 仮 何 花 佳 価 果 河 苛 科 架 夏 家 荷 華 菓 貨 渦 過 嫁 暇 禍 靴 寡 歌 箇 稼 課 蚊 牙 瓦 我 画 芽 賀 雅 餓 介 回 灰 会 快 戒 改 怪 拐 悔 海 界 皆 械 絵 開 階 塊 楷 解 潰 壊 懐 諧 貝 外 劾 害 崖 涯 街 慨 蓋 該 概 骸 垣 柿 各 角 拡 革 格 核 殻 郭 覚 較 隔 閣 確 獲 嚇 穫 学 岳 楽 額 顎 掛 潟 括 活 喝 渇 割 葛 滑 褐 轄 且 株 釜 鎌 刈 干 刊 甘 汗 缶 完 肝 官 冠 巻 看 陥 乾 勘 患 貫 寒 喚 堪 換 敢 棺 款 間 閑 勧 寛 幹 感 漢 慣 管 関 歓 監 緩 憾 還 館 環 簡 観 韓 艦 鑑 丸 含 岸 岩 玩 眼 頑 顔 願 企 伎 危 机 気 岐 希 忌 汽 奇 祈 季 紀 軌 既 記 起 飢 鬼 帰 基 寄 規 亀 喜 幾 揮 期 棋 貴 棄 毀 旗 器 畿 輝 機 騎 技 宜 偽 欺 義 疑 儀 戯 擬 犠 議 菊 吉 喫 詰 却 客 脚 逆 虐 九 久 及 弓 丘 旧 休 吸 朽 臼 求 究 泣 急 級 糾 宮 救 球 給 嗅 窮 牛 去 巨 居 拒 拠 挙 虚 許 距 魚 御 漁 凶 共 叫 狂 京 享 供 協 況 峡 挟 狭 恐 恭 胸 脅 強 教 郷 境 橋 矯 鏡 競 響 驚 仰 暁 業 凝 曲 局 極 玉 巾 斤 均 近 金 菌 勤 琴 筋 僅 禁 緊 錦 謹 襟 吟 銀 区 句 苦 駆 具 惧 愚 空 偶 遇 隅 串 屈 掘 窟 熊 繰 君 訓 勲 薫 軍 郡 群 兄 刑 形 系 径 茎 係 型 契 計 恵 啓 掲 渓 経 蛍 敬 景 軽 傾 携 継 詣 慶 憬 稽 憩 警 鶏 芸 迎 鯨 隙 劇 撃 激 桁 欠 穴 血 決 結 傑 潔 月 犬 件 見 券 肩 建 研 県 倹 兼 剣 拳 軒 健 険 圏 堅 検 嫌 献 絹 遣 権 憲 賢 謙 鍵 繭 顕 験 懸 元 幻 玄 言 弦 限 原 現 舷 減 源 厳 己 戸 古 呼 固 股 虎 孤 弧 故 枯 個 庫 湖 雇 誇 鼓 錮 顧 五 互 午 呉 後 娯 悟 碁 語 誤 護 口 工 公 勾 孔 功 巧 広 甲 交 光 向 后 好 江 考 行 坑 孝 抗 攻 更 効 幸 拘 肯 侯 厚 恒 洪 皇 紅 荒 郊 香 候 校 耕 航 貢 降 高 康 控 梗 黄 喉 慌 港 硬 絞 項 溝 鉱 構 綱 酵 稿 興 衡 鋼 講 購 乞 号 合 拷 剛 傲 豪 克 告 谷 刻 国 黒 穀 酷 獄 骨 駒 込 頃 今 困 昆 恨 根 婚 混 痕 紺 魂 墾 懇 左 佐 沙 査 砂 唆 差 詐 鎖 座 挫 才 再 災 妻 采 砕 宰 栽 彩 採 済 祭 斎 細 菜 最 裁 債 催 塞 歳 載 際 埼 在 材 剤 財 罪 崎 作 削 昨 柵 索 策 酢 搾 錯 咲 冊 札 刷 刹 拶 殺 察 撮 擦 雑 皿 三 山 参 桟 蚕 惨 産 傘 散 算 酸 賛 残 斬 暫 士 子 支 止 氏 仕 史 司 四 市 矢 旨 死 糸 至 伺 志 私 使 刺 始 姉 枝 祉 肢 姿 思 指 施 師 恣 紙 脂 視 紫 詞 歯 嗣 試 詩 資 飼 誌 雌 摯 賜 諮 示 字 寺 次 耳 自 似 児 事 侍 治 持 時 滋 慈 辞 磁 餌 璽 鹿 式 識 軸 七 𠮟 失 室 疾 執 湿 嫉 漆 質 実 芝 写 社 車 舎 者 射 捨 赦 斜 煮 遮 謝 邪 蛇 尺 借 酌 釈 爵 若 弱 寂 手 主 守 朱 取 狩 首 殊 珠 酒 腫 種 趣 寿 受 呪 授 需 儒 樹 収 囚 州 舟 秀 周 宗 拾 秋 臭 修 袖 終 羞 習 週 就 衆 集 愁 酬 醜 蹴 襲 十 汁 充 住 柔 重 従 渋 銃 獣 縦 叔 祝 宿 淑 粛 縮 塾 熟 出 述 術 俊 春 瞬 旬 巡 盾 准 殉 純 循 順 準 潤 遵 処 初 所 書 庶 暑 署 緒 諸 女 如 助 序 叙 徐 除 小 升 少 召 匠 床 抄 肖 尚 招 承 昇 松 沼 昭 宵 将 消 症 祥 称 笑 唱 商 渉 章 紹 訟 勝 掌 晶 焼 焦 硝 粧 
詔 証 象 傷 奨 照 詳 彰 障 憧 衝 賞 償 礁 鐘 上 丈 冗 条 状 乗 城 浄 剰 常 情 場 畳 蒸 縄 壌 嬢 錠 譲 醸 色 拭 食 植 殖 飾 触 嘱 織 職 辱 尻 心 申 伸 臣 芯 身 辛 侵 信 津 神 唇 娠 振 浸 真 針 深 紳 進 森 診 寝 慎 新 審 震 薪 親 人 刃 仁 尽 迅 甚 陣 尋 腎 須 図 水 吹 垂 炊 帥 粋 衰 推 酔 遂 睡 穂 随 髄 枢 崇 数 据 杉 裾 寸 瀬 是 井 世 正 生 成 西 声 制 姓 征 性 青 斉 政 星 牲 省 凄 逝 清 盛 婿 晴 勢 聖 誠 精 製 誓 静 請 整 醒 税 夕 斥 石 赤 昔 析 席 脊 隻 惜 戚 責 跡 積 績 籍 切 折 拙 窃 接 設 雪 摂 節 説 舌 絶 千 川 仙 占 先 宣 専 泉 浅 洗 染 扇 栓 旋 船 戦 煎 羨 腺 詮 践 箋 銭 潜 線 遷 選 薦 繊 鮮 全 前 善 然 禅 漸 膳 繕 狙 阻 祖 租 素 措 粗 組 疎 訴 塑 遡 礎 双 壮 早 争 走 奏 相 荘 草 送 倉 捜 挿 桑 巣 掃 曹 曽 爽 窓 創 喪 痩 葬 装 僧 想 層 総 遭 槽 踪 操 燥 霜 騒 藻 造 像 増 憎 蔵 贈 臓 即 束 足 促 則 息 捉 速 側 測 俗 族 属 賊 続 卒 率 存 村 孫 尊 損 遜 他 多 汰 打 妥 唾 堕 惰 駄 太 対 体 耐 待 怠 胎 退 帯 泰 堆 袋 逮 替 貸 隊 滞 態 戴 大 代 台 第 題 滝 宅 択 沢 卓 拓 託 濯 諾 濁 但 達 脱 奪 棚 誰 丹 旦 担 単 炭 胆 探 淡 短 嘆 端 綻 誕 鍛 団 男 段 断 弾 暖 談 壇 地 池 知 値 恥 致 遅 痴 稚 置 緻 竹 畜 逐 蓄 築 秩 窒 茶 着 嫡 中 仲 虫 沖 宙 忠 抽 注 昼 柱 衷 酎 鋳 駐 著 貯 丁 弔 庁 兆 町 長 挑 帳 張 彫 眺 釣 頂 鳥 朝 貼 超 腸 跳 徴 嘲 潮 澄 調 聴 懲 直 勅 捗 沈 珍 朕 陳 賃 鎮 追 椎 墜 通 痛 塚 漬 坪 爪 鶴 低 呈 廷 弟 定 底 抵 邸 亭 貞 帝 訂 庭 逓 停 偵 堤 提 程 艇 締 諦 泥 的 笛 摘 滴 適 敵 溺 迭 哲 鉄 徹 撤 天 典 店 点 展 添 転 塡 田 伝 殿 電 斗 吐 妬 徒 途 都 渡 塗 賭 土 奴 努 度 怒 刀 冬 灯 当 投 豆 東 到 逃 倒 凍 唐 島 桃 討 透 党 悼 盗 陶 塔 搭 棟 湯 痘 登 答 等 筒 統 稲 踏 糖 頭 謄 藤 闘 騰 同 洞 胴 動 堂 童 道 働 銅 導 瞳 峠 匿 特 得 督 徳 篤 毒 独 読 栃 凸 突 届 屯 豚 頓 貪 鈍 曇 丼 那 奈 内 梨 謎 鍋 南 軟 難 二 尼 弐 匂 肉 虹 日 入 乳 尿 任 妊 忍 認 寧 熱 年 念 捻 粘 燃 悩 納 能 脳 農 濃 把 波 派 破 覇 馬 婆 罵 拝 杯 背 肺 俳 配 排 敗 廃 輩 売 倍 梅 培 陪 媒 買 賠 白 伯 拍 泊 迫 剝 舶 博 薄 麦 漠 縛 爆 箱 箸 畑 肌 八 鉢 発 髪 伐 抜 罰 閥 反 半 氾 犯 帆 汎 伴 判 坂 阪 板 版 班 畔 般 販 斑 飯 搬 煩 頒 範 繁 藩 晩 番 蛮 盤 比 皮 妃 否 批 彼 披 肥 非 卑 飛 疲 秘 被 悲 扉 費 碑 罷 避 尾 眉 美 備 微 鼻 膝 肘 匹 必 泌 筆 姫 百 氷 表 俵 票 評 漂 標 苗 秒 病 描 猫 品 浜 貧 賓 頻 敏 瓶 不 夫 父 付 布 扶 府 怖 阜 附 訃 負 赴 浮 婦 符 富 普 腐 敷 膚 賦 譜 侮 武 部 舞 封 風 伏 服 副 幅 復 福 腹 複 覆 払 沸 仏 物 粉 紛 雰 噴 墳 憤 奮 分 文 聞 丙 平 兵 併 並 柄 陛 閉 塀 幣 弊 蔽 餅 米 壁 璧 癖 別 蔑 片 辺 返 変 偏 遍 編 弁 辛 便 勉 歩 保 哺 捕 補 舗 母 募 墓 慕 暮 簿 方 包 芳 邦 奉 宝 抱 放 法 泡 胞 俸 倣 峰 砲 崩 訪 報 蜂 豊 飽 褒 縫 亡 乏 忙 坊 妨 忘 防 房 肪 某 冒 剖 紡 望 傍 帽 棒 貿 貌 暴 膨 謀 頰 北 木 朴 牧 睦 僕 墨 撲 没 勃 堀 本 奔 翻 凡 盆 麻 摩 磨 魔 毎 妹 枚 昧 埋 幕 膜 枕 又 末 抹 万 満 慢 漫 未 味 魅 岬 密 蜜 脈 妙 民 眠 矛 務 無 夢 霧 娘 名 命 明 迷 冥 盟 銘 鳴 滅 免 面 綿 麺 茂 模 毛 妄 盲 耗 猛 網 目 黙 門 紋 問 冶 夜 野 弥 厄 役 約 訳 薬 躍 闇 由 油 喩 愉 諭 輸 癒 唯 友 有 勇 幽 悠 郵 湧 猶 裕 遊 雄 誘 憂 融 優 与 予 余 誉 預 幼 用 
羊 妖 洋 要 容 庸 揚 揺 葉 陽 溶 腰 様 瘍 踊 窯 養 擁 謡 曜 抑 沃 浴 欲 翌 翼 拉 裸 羅 来 雷 頼 絡 落 酪 辣 乱 卵 覧 濫 藍 欄 吏 利 里 理 痢 裏 履 璃 離 陸 立 律 慄 略 柳 流 留 竜 粒 隆 硫 侶 旅 虜 慮 了 両 良 料 涼 猟 陵 量 僚 領 寮 療 瞭 糧 力 緑 林 厘 倫 輪 隣 臨 瑠 涙 累 塁 類 令 礼 冷 励 戻 例 鈴 零 霊 隷 齢 麗 暦 歴 列 劣 烈 裂 恋 連 廉 練 錬 呂 炉 賂 路 露 老 労 弄 郎 朗 浪 廊 楼 漏 籠 六 録 麓 論 和 話 賄 脇 惑 枠 湾 腕 | 2,138 |
---|---|---|
Alternates | 剥 叱 填 頬 | 4 |
Jōyō traditional variant forms (not including 61 compatibility forms that normalise to other characters) | 亞 惡 壓 圍 醫 爲 壹 隱 榮 營 衞 驛 圓 鹽 緣 艷 應 歐 毆 櫻 奧 橫 溫 穩 假 價 畫 會 繪 壞 懷 槪 擴 殼 覺 學 嶽 樂 渴 罐 卷 陷 勸 寬 關 歡 觀 氣 歸 龜 僞 戲 犧 舊 據 擧 虛 峽 挾 狹 鄕 曉 區 驅 勳 薰 徑 莖 惠 揭 溪 經 螢 輕 繼 鷄 藝 擊 缺 硏 縣 儉 劍 險 圈 檢 獻 權 顯 驗 嚴 廣 效 恆 黃 鑛 號 國 黑 碎 濟 齋 劑 雜 參 棧 蠶 慘 贊 殘 絲 齒 兒 辭 濕 實 寫 舍 釋 壽 收 從 澁 獸 縱 肅 處 緖 敍 將 稱 涉 燒 證 奬 條 狀 乘 淨 剩 疊 繩 壤 孃 讓 釀 觸 囑 眞 寢 愼 盡 圖 粹 醉 穗 隨 髓 樞 數 瀨 聲 齊 靜 竊 攝 絕 專 淺 戰 踐 錢 潛 纖 禪 雙 壯 爭 莊 搜 插 巢 曾 瘦 裝 總 騷 增 藏 臟 卽 屬 續 墮 對 體 帶 滯 臺 瀧 擇 澤 擔 單 膽 團 斷 彈 遲 癡 蟲 晝 鑄 廳 徵 聽 敕 鎭 遞 鐵 點 轉 傳 燈 當 黨 盜 稻 鬭 德 獨 讀 屆 貳 惱 腦 霸 拜 廢 賣 麥 發 髮 拔 晚 蠻 祕 濱 甁 拂 佛 倂 竝 餠 邊 變 辨 瓣 辯 步 寶 豐 襃 沒 飜 每 萬 滿 麵 默 彌 譯 藥 與 豫 餘 譽 搖 樣 謠 來 賴 亂 覽 龍 兩 獵 綠 淚 壘 禮 勵 戾 靈 齡 曆 歷 戀 鍊 爐 勞 郞 樓 錄 灣 | 305 |
The compatibility forms | 逸 謁 禍 悔 海 慨 喝 褐 漢 祈 既 器 響 勤 謹 穀 殺 祉 視 者 煮 臭 祝 暑 署 諸 祥 神 節 祖 僧 層 贈 贈 嘆 著 懲 塚 都 突 難 梅 繁 卑 碑 賓 頻 敏 侮 福 塀 勉 墨 免 欄 隆 虜 類 練 朗 廊 | 61 |
Jinmeiyō kanji | 丑 丞 乃 之 乎 也 云 亘 些 亦 亥 亨 亮 仔 伊 伍 伽 佃 佑 伶 侃 侑 俄 俠 俣 俐 倭 俱 倦 倖 偲 傭 儲 允 兎 兜 其 冴 凌 凜 凧 凪 凰 凱 函 劉 劫 勁 勺 勿 匁 匡 廿 卜 卯 卿 厨 厩 叉 叡 叢 叶 只 吾 吞 吻 哉 哨 啄 哩 喬 喧 喰 喋 嘩 嘉 嘗 噌 噂 圃 圭 坐 尭 坦 埴 堰 堺 堵 塙 壕 壬 夷 奄 奎 套 娃 姪 姥 娩 嬉 孟 宏 宋 宕 宥 寅 寓 寵 尖 尤 屑 峨 峻 崚 嵯 嵩 嶺 巌 巫 已 巳 巴 巷 巽 帖 幌 幡 庄 庇 庚 庵 廟 廻 弘 弛 彗 彦 彪 彬 徠 忽 怜 恢 恰 恕 悌 惟 惚 悉 惇 惹 惺 惣 慧 憐 戊 或 戟 托 按 挺 挽 掬 捲 捷 捺 捧 掠 揃 摑 摺 撒 撰 撞 播 撫 擢 孜 敦 斐 斡 斧 斯 於 旭 昂 昊 昏 昌 昴 晏 晃 晒 晋 晟 晦 晨 智 暉 暢 曙 曝 曳 朋 朔 杏 杖 杜 李 杭 杵 杷 枇 柑 柴 柘 柊 柏 柾 柚 桧 栞 桔 桂 栖 桐 栗 梧 梓 梢 梛 梯 桶 梶 椛 梁 棲 椋 椀 楯 楚 楕 椿 楠 楓 椰 楢 楊 榎 樺 榊 榛 槙 槍 槌 樫 槻 樟 樋 橘 樽 橙 檎 檀 櫂 櫛 櫓 欣 欽 歎 此 殆 毅 毘 毬 汀 汝 汐 汲 沌 沓 沫 洸 洲 洵 洛 浩 浬 淵 淳 渚 淀 淋 渥 渾 湘 湊 湛 溢 滉 溜 漱 漕 漣 澪 濡 瀕 灘 灸 灼 烏 焰 焚 煌 煤 煉 熙 燕 燎 燦 燭 燿 爾 牒 牟 牡 牽 犀 狼 猪 獅 玖 珂 珈 珊 珀 玲 琢 琉 瑛 琥 琶 琵 琳 瑚 瑞 瑶 瑳 瓜 瓢 甥 甫 畠 畢 疋 疏 皐 皓 眸 瞥 矩 砦 砥 砧 硯 碓 碗 碩 碧 磐 磯 祇 祢 祐 祷 禄 禎 禽 禾 秦 秤 稀 稔 稟 稜 穣 穹 穿 窄 窪 窺 竣 竪 竺 竿 笈 笹 笙 笠 筈 筑 箕 箔 篇 篠 簞 簾 籾 粥 粟 糊 紘 紗 紐 絃 紬 絆 絢 綺 綜 綴 緋 綾 綸 縞 徽 繫 繡 纂 纏 羚 翔 翠 耀 而 耶 耽 聡 肇 肋 肴 胤 胡 脩 腔 脹 膏 臥 舜 舵 芥 芹 芭 芙 芦 苑 茄 苔 苺 茅 茉 茸 茜 莞 荻 莫 莉 菅 菫 菖 萄 菩 萌 萊 菱 葦 葵 萱 葺 萩 董 葡 蓑 蒔 蒐 蒼 蒲 蒙 蓉 蓮 蔭 蔣 蔦 蓬 蔓 蕎 蕨 蕉 蕃 蕪 薙 蕾 蕗 藁 薩 蘇 蘭 蝦 蝶 螺 蟬 蟹 蠟 衿 袈 袴 裡 裟 裳 襖 訊 訣 註 詢 詫 誼 諏 諄 諒 謂 諺 讃 豹 貰 賑 赳 跨 蹄 蹟 輔 輯 輿 轟 辰 辻 迂 迄 辿 迪 迦 這 逞 逗 逢 遥 遁 遼 邑 祁 郁 鄭 酉 醇 醐 醍 醬 釉 釘 釧 銑 鋒 鋸 錘 錐 錆 錫 鍬 鎧 閃 閏 閤 阿 陀 隈 隼 雀 雁 雛 雫 霞 靖 鞄 鞍 鞘 鞠 鞭 頁 頌 頗 顚 颯 饗 馨 馴 馳 駕 駿 驍 魁 魯 鮎 鯉 鯛 鰯 鱒 鱗 鳩 鳶 鳳 鴨 鴻 鵜 鵬 鷗 鷲 鷺 鷹 麒 麟 麿 黎 黛 鼎 | 633 |
Jinmeiyō variants | 亙 凛 巖 堯 晄 檜 槇 渚 猪 琢 禰 祐 禱 祿 禎 穰 萠 遙 | 18 |
The Jōyō traditional forms include 60 kanji shapes that Unicode includes in the CJK Compatibility Ideographs block. Normalisation operations (which in some systems may happen automatically, or during things such as cut & paste) convert them to characters in the main CJK block. This makes them unstable, and best avoided. The following list shows the compatibility character shape to the left, and the normalised shape to the right.
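The instability described above can be observed with a short Python sketch. It assumes U+FA30 as a sample compatibility ideograph (a traditional form of 侮); the helper name `unstable_ideographs` is hypothetical and only illustrates one way such characters might be detected before they are stored.

```python
import unicodedata

# U+FA30 is a CJK compatibility ideograph; it has a canonical (singleton)
# decomposition, so every Unicode normalisation form replaces it.
compat = "\uFA30"

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalised = unicodedata.normalize(form, compat)
    print(form, f"U+{ord(compat):04X} -> U+{ord(normalised):04X}")


def unstable_ideographs(text):
    """Return characters in the CJK Compatibility Ideographs block
    (U+F900..U+FAFF) that normalisation would change.
    Hypothetical helper, for illustration only."""
    return [
        ch for ch in text
        if 0xF900 <= ord(ch) <= 0xFAFF
        and unicodedata.normalize("NFC", ch) != ch
    ]
```

Checking against the normalised form, rather than the block range alone, avoids flagging the few characters in that block that are not compatibility characters.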
Japanese uses two syllabaries: hiragana and katakana. The vowel sounds u and i are often elided between non-voiced consonants, or at the end of a word.
Katakana characters are typically used for foreign loan words and names, such as the word 'text'. They are also used for things such as scientific names of plants and animals, onomatopoeic sounds, telegrams, and some female names.
Hiragana is used for indigenous Japanese words, such as the verb 'to be'.
It is also used for grammatical endings after a word root written using kanji characters.
The basic syllabary includes 5 independent vowel sounds, one nasal sound, and the rest are consonant+vowel combinations. In these lists we show hiragana (first) and katakana (second) together.
Voiced consonants are indicated by attaching a dakuten mark (looks like a quote mark) to the unvoiced shape. Unicode provides precomposed code points for every combination of syllable+dakuten.
The ‘p’ sound is indicated in a similar way by the use of a han-dakuten (half-dakuten).
The Unicode hiragana block does contain separate code points for dakuten combining marks and modifiers, but these are not normally used in text. However, if Unicode NFD normalisation is applied to text, the dakuten and han-dakuten are split from the base and the combining marks are used.
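As a minimal illustration of that behaviour, the following Python sketch applies NFD to two precomposed kana and then recomposes them with NFC. The sample syllables (が and ぱ) are chosen here purely for illustration.

```python
import unicodedata

precomposed = "\u304C\u3071"  # が (ga) and ぱ (pa) as single code points

decomposed = unicodedata.normalize("NFD", precomposed)

# Each syllable is now a base kana followed by a combining mark:
# U+3099 (combining dakuten) or U+309A (combining han-dakuten).
for ch in decomposed:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# NFC puts the precomposed code points back.
recomposed = unicodedata.normalize("NFC", decomposed)
```

Comparing strings after normalising both sides to the same form avoids mismatches between precomposed and decomposed kana.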
Various strategies are used to represent long vowels, and they tend to differ between hiragana and katakana. This elongation is phonemically significant.
In hiragana, the long vowels aː, iː, uː, and eː are written by adding a corresponding vowel.
おかあさん
おにいさん
すうがく
おねえさん
In words of Chinese origin eː may be written 'ei'.
ていねい
The long oː is usually written 'ou', but is sometimes written 'oo'.
おはよう
おおきい
In katakana, long vowels are indicated using ー. This character is used predominantly with katakana, but occasionally also with hiragana.
ビール
ボール
エスカレーター
In a few exceptions, katakana uses a similar approach to hiragana.
スペイン
The more common grammatical particles are spelled in an idiosyncratic way. The topic marker wa is written using は. The object marker o is written using を. And the location marker e is written using へ.
The basic set of kana syllables is completed by a number of small forms used for medial glides, foreign sounds, and gemination, and a vowel lengthener.
Small versions of や, ゆ, and よ are used to form syllables such as きゃ kya kʲa, きゅ kyu kʲu, and きょ kyo kʲo.
っ and ッ are used to lengthen a following consonant sound.
ちょっと
It is also used to represent a glottal stop in a broken-off word.
あっ
The small vowel syllables shown above are typically used for transcribing unusual sounds, such as lengthening a preceding vowel, or transliterating foreign sounds, without creating a new syllable.
ふぁん
シフォン
ティー
はぁぁ
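Over time, certain voiced sounds have merged in several important dialects, as shown in the following table. A value repeated across adjacent columns indicates kana that are pronounced the same way in that dialect.
| | ぢ | じ | づ | ず |
|---|---|---|---|---|
| Tokyo (standard) | d͡ʑi~ʑi | d͡ʑi~ʑi | d͡zɯᵝ~zɯᵝ | d͡zɯᵝ~zɯᵝ |
| South Tohoku | d͡zɯᵝ | d͡zɯᵝ | d͡zɯᵝ | d͡zɯᵝ |
| Kōchi (Hata, Tosa) | di~d͡zi | ʑi | dɯᵝ~d͡zɯᵝ | zɯᵝ |
| Kagoshima | d͡ʑi | ʑi | d͡zɯᵝ | zɯᵝ |
| Okinawa | d͡ʑi | d͡ʑi | d͡ʑi | d͡ʑi |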
The orthographic reform shortly after World War 2 recommended the use of only じ and ず, except in circumstances where an unvoiced sound has become voiced because of:
A number of characters in the kana blocks are no longer used in modern text, except in counter styles (see lists).
These characters were dropped by an orthographic reform shortly after World War 2.
Unicode has a set of halfwidth katakana forms for legacy encoding roundtrips. In principle, these characters should not be used. The normal, fullsized characters should be used instead.
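One way legacy halfwidth katakana might be converted to the normal forms is Unicode compatibility normalisation. The Python sketch below uses NFKC on a halfwidth spelling of ビール; note, as an important caveat, that NFKC also changes other compatibility characters (for example, fullwidth Latin letters become ASCII), so it may be too aggressive for some content.

```python
import unicodedata

# Halfwidth ﾋ + halfwidth voiced sound mark + halfwidth prolonged
# sound mark + halfwidth ﾙ
halfwidth = "\uFF8B\uFF9E\uFF70\uFF99"

# NFKC maps each halfwidth form to its normal counterpart, then composes
# the kana with its voiced sound mark into a single code point.
fullsize = unicodedata.normalize("NFKC", halfwidth)

print([f"U+{ord(ch):04X}" for ch in halfwidth])
print([f"U+{ord(ch):04X}" for ch in fullsize])
```

The result is three code points (ビ, ー, ル) rather than four, because the separate halfwidth voicing mark is absorbed into the precomposed kana.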
Text can be written horizontally, left to right, or vertically with lines progressing from right to left. Vertically set text is still common in Japan; most novels, newspapers and magazines are set vertically.
Sometimes, vertically set text may contain sections or items that are set horizontally. For example, in newspapers, headings are normally set horizontally above the body of an article which is set vertically, and captions are usually horizontal.
fig_mixed_direction shows pages from a magazine that mix directions on the page.
Older horizontally set texts in Japanese also ran right to left.
Different conventions are applied for horizontal and vertical text, for example in terms of characters used and treatment of embedded romaji and numerals. Apart from the question of what gets rotated and what does not, the two writing modes may show different preferences for emphasis marks, brackets, numbers, and so forth. This means that it is not usually appropriate to simply switch the direction of the text without making additional changes.
In vertical text (only) decisions have to be made about how to present embedded romaji text and numbers. Romaji typically runs down the page, with proportionally-spaced characters rotated 90º to the right. However, acronyms are often written using upright, fullwidth characters.
Numbers, and sometimes text, may also run horizontally within a vertical line. This is most common with double-digit numbers, such as in dates. The width of the horizontal text should not normally exceed the width of the surrounding vertical text (ie. it should fit in the width of a character space). This is referred to as tate-chū-yoko (縦中横).
Experiment with examples using the Japanese character app.
The Japanese scripts are not cursive, and when using precomposed kana (which is the norm) involve no context-based shaping or positioning.
The orthography has no case distinction.
By default, all kanji, hiragana, katakana, and punctuation characters are drawn inside a character frame that is square and the same size for all characters. The box containing the actual symbol is called the letter face, and there should be some space left between the letter face and the character frame. There may be variations, particularly for small kana, punctuation, etc., in the size of the letter face.
Because of the regularity of the character frame size, it can be used to measure the size of the text area or other parts of a page (horizontally or vertically).
In principle, Japanese characters are set solid, ie. with no space between the character frames. However, text alignment and justification can make adjustments to the placement of characters in the direction of the line flow. See justification and letterspace.
The kanji characters are derived from Han characters originally used in Chinese. Many of the Japanese and Chinese characters are unified to the same code point in the Unicode repertoire; however, over time small but systematic, language-related changes have appeared in the glyph shapes of some characters compared to their Chinese equivalents. It is important to choose fonts that present the user with the correct glyphs. fig_ja_zh_fonts provides some examples.
Besides the need to choose fallback fonts that match the language of the text, Japanese also has some recognisable font styles. Two well-known font styles are often called Mincho and Gothic. The former has strokes with fine gradations of stroke width, whereas the latter has darker strokes with little gradation. For fallback on the Web, these styles are usually equated with serif and sans-serif, respectively, although serifs are not actually involved.
Another useful type of font style relates to the endings of Gothic font strokes, which can be flat or rounded.
Horizontal vs. vertical transformation. Characters such as small kana and punctuation occupy different locations within the character frame in horizontal and vertical text.
fig_small_kana shows how in horizontal text small kana are centred horizontally in the character frame but are vertically below centre; in vertical text they are centred vertically, but aligned right.
The full stop also switches from bottom-left in horizontal text, to top-right in vertical.
These are differences that cannot be produced by rotating glyphs, but require special glyphs in the font which are applied when the directional context is detected.
Positioning of decomposed diacritics. When kana use a dakuten or han-dakuten there can be significant overlap with the base character. See fig_kerning_nfd.
This overlap needs to be unchanged whether the diacritics are part of a single glyph or are separate code points in decomposed text. In the latter case, careful positioning of the diacritics is required.
Shaping of punctuation. Many punctuation marks need to have different shapes for Japanese and non-Japanese text. Often these differences are due to the fact that punctuation for Japanese is based on the em-box, rather than the Latin baseline, cap-height, or x-height. A description of many such differences can be found in Ken Lunde's Proposal to add standardized variation sequences.
Japanese kanji and kana is a monocameral orthography, and no transforms are needed to convert between different case forms for a given letter. However, romaji characters are cased.
Other transforms may be applied to convert between half-width and full-width characters. This can be useful for converting to and from fullwidth Latin and punctuation, and is sometimes useful for converting small kana characters to full-sized versions.
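The relationship between ASCII and the fullwidth forms is regular, which makes such a transform easy to sketch. The Python below is an illustrative assumption rather than a reference implementation: the function names are hypothetical, and it relies only on the fact that U+FF01..U+FF5E mirror U+0021..U+007E at a fixed offset, with U+3000 (ideographic space) standing in for the ASCII space.

```python
OFFSET = 0xFEE0  # distance between U+0021..U+007E and U+FF01..U+FF5E


def to_fullwidth(text):
    """Map printable ASCII to fullwidth forms (hypothetical helper)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp == 0x20:
            out.append("\u3000")          # ideographic space
        elif 0x21 <= cp <= 0x7E:
            out.append(chr(cp + OFFSET))  # fullwidth counterpart
        else:
            out.append(ch)                # leave kanji, kana, etc. alone
    return "".join(out)


def to_halfwidth(text):
    """Inverse mapping: fullwidth forms back to ASCII."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp == 0x3000:
            out.append(" ")
        elif 0xFF01 <= cp <= 0xFF5E:
            out.append(chr(cp - OFFSET))
        else:
            out.append(ch)
    return "".join(out)
```

Unlike compatibility normalisation, this explicit mapping touches only the Latin and punctuation range and leaves kana untouched.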
The latter transformation is common for ruby text (see inlinenotes), where small kana are converted visually to full-sized to aid with readability of the text, given that ruby text is written in small character sizes.
To achieve this in web pages, use the text-transform CSS property (CSS Text specification, https://www.w3.org/TR/css-text-3/#transforming) in your style sheet with the full-size-kana value.
Eg. the following converts the visual appearance (only) of small kana in ruby text to full-size kana, except in headings (where the characters are larger):

    rt { font-size: 50%; text-transform: full-size-kana; }
    :is(h1, h2, h3, h4) rt { text-transform: none; /* unset for large text */ }
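Outside a browser, a similar small-to-full-size conversion can be approximated with a translation table. The Python sketch below is an assumption-laden illustration: the function name is hypothetical, the table covers only the common small kana, and, unlike the CSS approach, it changes the underlying text rather than just its appearance.

```python
# Common small kana and their full-size counterparts (partial list,
# chosen for illustration; rarer small kana are not covered).
SMALL = "ぁぃぅぇぉっゃゅょゎァィゥェォッャュョヮ"
FULL = "あいうえおつやゆよわアイウエオツヤユヨワ"

SMALL_TO_FULL = str.maketrans(SMALL, FULL)


def to_full_size_kana(text):
    """Replace small kana with full-size kana (hypothetical helper)."""
    return text.translate(SMALL_TO_FULL)


print(to_full_size_kana("きゃ"))
print(to_full_size_kana("ティー"))
```

Because the change is destructive, it is probably best applied only to a copy of the ruby text used for display.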
Japanese rarely uses spaces. In the sample text there are gaps around punctuation, but these are produced by a lack of 'ink' in parts of the square character glyphs.
This can be verified with the following example: the sequence is made up of only four characters, and none of them is a space.
い。(こ
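The same check can be made programmatically. This Python sketch lists the code points of the sample and tests for space separators; it assumes that the parenthesis in the sample is the fullwidth form, U+FF08.

```python
import unicodedata

# い 。 （ こ  (the parenthesis is assumed to be FULLWIDTH LEFT PARENTHESIS)
sample = "\u3044\u3002\uFF08\u3053"

for ch in sample:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

# "Zs" is the Unicode general category for space separators.
has_space = any(unicodedata.category(ch) == "Zs" for ch in sample)
print("contains a space:", has_space)
```

The visible gaps therefore come from blank areas inside the glyphs of the punctuation characters, not from separate space characters.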
Gaps of this kind may also be reduced during justification and line alignment.
In general, word boundaries are not important for line-wrapping; however, occasionally text such as headings may be wrapped at word boundaries in order to better balance the text.
Word boundaries are identified when users select screen text, eg. by double-clicking inside a word. Heuristics and dictionaries are needed to identify the boundaries of words in such situations. Note, also, that words in Japanese are very often a mixture of kanji characters followed by hiragana. The word boundary detection needs to treat the various scripts as a unified orthography.
Since there are no combining marks or decompositions in typical Japanese text, graphemes correspond to individual characters for kanji and kana.
The only 2 combining characters listed in this page are 3099 and 309A (which are rarely used), and are not used together. Therefore they simply follow the Unicode rule of combining characters following the base character they are attached to.
Unicode grapheme clusters can therefore be applied to Japanese text without problems. There are no special issues related to operations that use grapheme clusters as their basic unit of text.
The ordering of codepoints in a Japanese grapheme is generally not relevant, because graphemes are usually single, syllabic code points. When combining characters are used, there is usually just one.
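To make the point concrete, here is a deliberately simplified Python sketch of cluster segmentation. It is not the full Unicode grapheme cluster algorithm: as an assumption for this illustration, it only attaches characters with a non-zero combining class to the preceding base, which is enough to cover decomposed kana.

```python
import unicodedata


def simple_clusters(text):
    """Group each combining mark with the preceding base character.
    A simplification for illustration, not a full implementation of
    Unicode text segmentation."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch   # mark follows its base
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


precomposed = "\u304C\u304D"        # が き
decomposed = "\u304B\u3099\u304D"   # か + combining dakuten, き

print(len(simple_clusters(precomposed)), len(simple_clusters(decomposed)))
```

Both spellings yield two clusters, even though the decomposed string contains three code points.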
Japanese uses the following separators at the sentence level and below. Some of the punctuation looks like that for Latin (eg. parentheses, commas, and full stops), but the width of the punctuation is likely to include significant amounts of white space, so that punctuation characters occupy the same space as han characters.
H | V | ||
---|---|---|---|
phrase | , | ||
、 | |||
: | |||
; | |||
sentence | 。 | ||
. | |||
exclamation | ! | ||
question | ? |
、 and 。 are the norm for vertical text; however, two alternative conventions are applied to horizontal text. Especially in books that mix Japanese and Western text, such as books on science and technology, they may be replaced by , and .. Often, however, the ideographic full stop is retained, since it is more visible and looks better (this convention has been adopted for Japanese official publications).
As the table shows, these punctuation marks require dedicated glyphs in the font, and cannot be achieved by simply rotating the glyph.
Japanese also uses the following doubled exclamation/question marks. They remain upright in vertical text.
‼ |
⁇ |
⁈ |
⁉ |
Other punctuation used to separate phrases or items includes:
H | V | |
---|---|---|
⸺ | ||
—— |
If EM DASH characters are used, they are used in pairs.
For general parentheses and bracketing in text, Japanese uses:
H | V | ||
---|---|---|---|
( | ) | ||
[ | ] | - | |
〔 | 〕 | - |
〔 and its closing partner are the vertical equivalent of [, which is used in horizontal text.
Although there are a number of other bracket characters (listed just below), they are less commonly used.
Japanese uses different quote marks for horizontal and vertical writing. The default quote marks are:
H | V | ||
---|---|---|---|
“ | ” | - | |
「 | 」 | ||
〝 | 〟 | - |
When an additional quote is embedded within the first, the quote marks are:
H | V | ||
---|---|---|---|
‘ | ’ | - | |
『 | 』 | - |
Japanese sometimes uses katakana characters to create visual emphasis.
もうダメだ moː dame da it's too late!
Japanese Layout Requirements lists the following ways of showing emphasis in Japanese.
Note that this list doesn't include italicisation or bolding of text. (1) and (2) are popular approaches. (5) is not as common, but is a traditional approach with some value attached.
Different boten marks are used in horizontal and vertical text. Typically, bullets are used above characters in horizontal text, and sesame dots are used to the right of characters in vertical text.
The boten mark is centre-aligned with the base characters in horizontal text, and middle-aligned in vertical text, and doesn't normally appear alongside full stops, commas, or brackets.
Embedded text in other languages would have boten marks displayed on the same side as for Japanese.
Japanese has a number of logograms used as abbreviations.
ヶ is a reasonably common shorthand for the character 箇, which is a counter for months, places or provisions. It is pronounced ka or ko, and is not related to the larger kana ケ, which is pronounced ke. See also Wikipedia, https://en.wikipedia.org/wiki/Small_ke.
三ヶ月
3ヶ
〆 is primarily used as a short form of ʃime from the verb 閉める. For example, it can be used as follows in place of the word 締切 (see Wiktionary, https://en.wiktionary.org/wiki/〆#Japanese).
〆切
ゟ is a ligature of the word より.
ヿ is a shorthand for the word 事.
〼 is derived from a semi-pictogram for a small wooden measuring box called masu. It then moved on to represent a shorthand for the grammatical ending for the present tense verb, which has the same sound.
Japanese has a number of iteration marks that repeat the previous syllable or word. The repeated sound may differ slightly due to rendaku sound changes.
々 and 〻 are used to repeat kanji characters; the former for horizontal text, and the latter (rare) for vertical text.
人々 島〻
There are separate marks for hiragana and katakana, and within that division there are separate marks for syllables that begin with a voiced stop. ゝ and ゞ are used to denote hiragana ordinary and voiced stop repetitions, respectively. The katakana equivalents are ヽ and ヾ.
かゝし たゞし バナヽ
It is not common, but it is possible to find horizontal text that repeats the iteration mark in order to repeat multiple characters.
馬鹿々々しい
See also Wikipedia, https://en.wikipedia.org/wiki/Iteration_mark#Japanese.
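The behaviour of the iteration marks can be sketched in Python. This is an illustrative approximation under stated assumptions: the function name is hypothetical; a run of n 々 marks is assumed to repeat the preceding n characters; the plain kana marks repeat the previous kana unchanged; and rendaku sound changes in kanji readings are not modelled, since they do not affect the written characters.

```python
import unicodedata

VOICED_MARK = "\u3099"                    # combining dakuten
KANJI_MARK = "\u3005"                     # 々
PLAIN_KANA_MARKS = {"\u309D", "\u30FD"}   # ゝ ヽ
VOICED_KANA_MARKS = {"\u309E", "\u30FE"}  # ゞ ヾ


def expand_iteration_marks(text):
    """Replace iteration marks with the characters they stand for
    (hypothetical helper, simplified for illustration)."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == KANJI_MARK and out:
            # Count consecutive 々 and repeat that many preceding characters.
            run = 1
            while i + run < len(text) and text[i + run] == KANJI_MARK:
                run += 1
            n = min(run, len(out))
            out.extend(out[-n:])
            i += n
            continue
        if ch in PLAIN_KANA_MARKS and out:
            out.append(out[-1])
        elif ch in VOICED_KANA_MARKS and out:
            # Take the unvoiced base of the previous kana and add a dakuten.
            base = unicodedata.normalize("NFD", out[-1])[0]
            out.append(unicodedata.normalize("NFC", base + VOICED_MARK))
        else:
            out.append(ch)
        i += 1
    return "".join(out)


for word in ("人々", "かゝし", "たゞし", "バナヽ", "馬鹿々々しい"):
    print(word, expand_iteration_marks(word))
```

An expansion like this may be useful for search or sorting, where text written with and without iteration marks should match.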
Japanese also has a set of graphemes to indicate repetition of multiple characters, although they are mostly obsolete these days (see Wikipedia, https://en.wikipedia.org/wiki/Iteration_mark#Japanese). They are only used in vertical text, and they take up 2 character spaces.
The Unicode Standard also provides half forms, which can be combined to span the 2 character distance.
Japanese has a few ways of representing inline notes and annotations.
Various ways of arranging inter-linear annotations alongside text fall under the rubric of ruby (named from the British print size originally used for the annotations). These include mono-ruby, jukugo-ruby, and group-ruby, and they are described in detail below.
Ruby is commonly used to indicate the pronunciation of ideographic characters used in Japanese, as it cannot usually be guessed and so can pose difficulties for those learning the language. For these cases, mono-ruby is most commonly used; however, a variant, jukugo-ruby, is sometimes applied to compound nouns (which are called jukugo in Japanese).
Where sequences of kanji characters do not have the same pronunciation as the sum of their parts (called jukujikun), such as the following two words, group ruby is used to represent the sound.
田舎 今日
Ruby annotations are also used to provide brief indications of the meaning of words or characters. These annotations typically use the group-ruby approach. The most typical example of this is attaching ruby text to a kanji compound word to indicate a corresponding loan word in katakana (see fig_group_ruby). Group ruby is also used to indicate the reading or the meaning of a Western word used in base text, or where a synonymous Western word in Latin characters is attached as a ruby annotation to a Japanese word (see Figure 112).
The rest of this section describes features that are generally common to all forms of ruby, before we move on to examine the differences in following subsections.
All annotations appear within the standard inter-line space for the page, and don't create extra line height if they only appear on a single line. The inter-line space is usually set at an appropriate size to accommodate annotations.
Unlike Chinese, it is common to find annotations applied just to specific words, rather than annotating the whole text.
Ruby annotations normally appear above horizontal lines of text, and to the right of vertical lines. Occasionally, both phonetic and semantic annotations are applied to the same base text, in which case the annotations appear on both sides of the base. A typical scenario in these cases would be to have mono-ruby above/right of the base, and group-ruby below/left (JLReq, #choice_of_sides_for_ruby_with_respect_to_base_characters).
The character frame of kana annotations is usually half that of the base character. Occasionally, annotations are compressed in one direction (depending on the direction of writing) so that 3 fit over a single base character (JLReq, #fig2_3_10). In large text (12pt or more), such as headings, the size of the annotation may be less than half that of the base (JLReq, #fig2_3_11).
Mono-ruby is usually applied to kanji base characters; each base character is associated individually with an annotation.
Annotations are normally centred over the base character in horizontal text, and aligned with the middle of the base character in vertical text (called nakatsuki). An alternative, used only in vertical text, is to align the annotation with the top of the character frame of the base character (katatsuki) (JLReq, #id227), as in the righthand example in fig_jukugo.
Since the annotation characters are usually 1/2 the size of the base characters, 3-character annotations require more space than the underlying kanji. Internally to the sequence, this will produce a gap between the base characters, since annotations cannot overlap (see fig_mono_ruby).
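The arithmetic behind this can be sketched in a few lines of Python. The function name and parameters are hypothetical; widths are expressed in em of the base font, and the default scale of 0.5 reflects the usual half-size annotation described above.

```python
def mono_ruby_overflow(ruby_chars: int, ruby_scale: float = 0.5,
                       base_width: float = 1.0) -> float:
    """Return how much wider (in base-font em) the annotation is than
    the single base character it annotates; 0.0 if it fits."""
    ruby_width = ruby_chars * ruby_scale
    return max(0.0, ruby_width - base_width)


# A 3-character annotation at half size is 1.5em wide, so it needs
# 0.5em more than its 1em base character.
print(mono_ruby_overflow(3))  # 0.5
```

One- and two-character annotations fit within the base character, which is why gaps only start to appear with readings of three or more kana.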
At either end of the sequence, either a gap is opened up between the base character with the long annotation and its neighbour (see fig_overhang), or the annotation may overhang the neighbouring base characters. Simpler implementations produce gaps, but allow annotations to overhang any blank parts of adjacent fullwidth punctuation characters. More sophisticated applications may allow overlap of kana or other characters, though never kanji (JLReq, #232), but may also have to deal with more complicated algorithms, such as balancing space on either side of the ruby sequence, or deciding what can and cannot be overlapped, and to what extent (JLReq, #id229, #adjustments_of_ruby_with_length_longer_than_that_of_the_base_characters).
At line start or line end, long annotations do not protrude past the line edge – meaning that there will be a gap between the base character and the line edge.
Lines can be broken in the middle of a sequence of mono-ruby annotations, since an associated base and annotation are kept together.
Group-ruby applies when the base is a sequence of characters, mapped to a single annotation. The base can be a sequence of either kanji or other characters, as can be the annotation.
When the annotation is shorter than the base, and the annotation is composed of kana or kanji characters, they are typically spread out with two units of equal spacing between each character and one at either end. The end space should never exceed half the width of a base character (JLReq, #positioning_of_groupruby_with_respect_to_base_characters).
When the base is shorter than the annotation, the inverse applies.
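A minimal Python sketch of the spacing rule for an annotation that is shorter than its base follows. The function name and parameters are hypothetical, widths are in em of the base font, and one detail is an assumption rather than something stated above: when the end space hits the half-character cap, the sketch hands the leftover space to the gaps between the annotation characters.

```python
def group_ruby_spacing(base_chars: int, ruby_chars: int,
                       ruby_scale: float = 0.5, max_end: float = 0.5):
    """Return (end_space, between_space) in base-font em for a group-ruby
    annotation that is shorter than its base.

    Space is shared as one unit at each end and two units between each
    pair of annotation characters, with the end space capped at max_end.
    """
    free = base_chars * 1.0 - ruby_chars * ruby_scale
    if free < 0:
        raise ValueError("annotation is longer than the base")
    if ruby_chars == 1:
        # A single character has no internal gaps; simply centre it.
        return free / 2, 0.0
    unit = free / (2 * ruby_chars)      # 2 end units + 2 * (n - 1) inner units
    end = min(unit, max_end)
    # Assumption: space removed by the cap is redistributed between characters.
    between = (free - 2 * end) / (ruby_chars - 1)
    return end, between


print(group_ruby_spacing(3, 4))  # (0.125, 0.25)
```

For the inverse case, where the base is shorter than the annotation, the same calculation could be applied with the roles of base and annotation swapped.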
If the annotation or the base is not kanji or kana, the text is set solid and centred relative to the other component (see fig_latin_ruby).
Overhang behaviour is the same as described for mono-ruby, as is the handling at line ends when the annotation is longer than the base.
Unlike a sequence of mono-ruby, there is no line-break opportunity inside a group-ruby.
Where compound nouns (jukugo) occur, special rules for arrangement of annotation characters (so-called jukugo-ruby) can make it appear that they are evenly distributed across the word (see the lefthand example in fig_jukugo), but there are rules about how much and what type of overhang are allowed, which sometimes lead to gaps (see the righthand example of fig_jukugo).
An important feature of jukugo-ruby is that where the full compound noun doesn't fit at the end of a line the base characters wrap one-by-one in the normal way, taking with them the appropriate annotations. The annotation for a single base character is never split across a line break.
It is up to the author whether a word that is actually a sequence of 2 compound nouns is treated as a single jukugo-ruby, or as two separate ones.
There are numerous options for overhang and arrangement of jukugo-ruby annotations. They are discussed in detail in JLReq.
Where text sizes are too small for ruby characters to be easily read, the ruby annotation is typically rendered after the base text, in parentheses.
Inline annotations should normally correspond to full words, even if the sequence of base characters would otherwise be represented using mono-ruby. For example, the word 東京 should be displayed inline as 東京(とうきょう) and not 東(とう)京(きょう).
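The following Python sketch shows one way to produce this fallback from mono-ruby data. The function name and the list-of-pairs input format are hypothetical; the point is only that the bases and readings are each joined before the parentheses are added, so that the annotation applies to the whole word.

```python
def inline_ruby(pairs):
    """Turn [(base, reading), ...] for one word into 'base(reading)'."""
    base = "".join(b for b, _ in pairs)
    reading = "".join(r for _, r in pairs)
    return f"{base}({reading})"


print(inline_ruby([("東", "とう"), ("京", "きょう")]))  # 東京(とうきょう)
```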
Warichu is a method of adding notes right alongside the relevant text, used particularly in study guides, travel guides, reference books, encyclopedias and manuals. It is generally only used in vertical text, although it is occasionally used in horizontal text for study guides and encyclopedias.
The note is usually surrounded by parentheses (or rarely just spaces), and the text of the note is half the size of the main text and arranged in two parallel lines. The two parallel lines are usually set with no inter-line spacing.
The warichu lines should be as close to equal in length as possible, given the normal wrapping rules, and if there is a difference, the initial line (right side) should be the longer.
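As a simplified illustration, the Python sketch below splits a note into two warichu lines of near-equal length, giving the extra character to the first line when the length is odd. The function name is hypothetical, and the sketch deliberately ignores the normal wrapping rules mentioned above, which a real implementation would need to apply when choosing the split point.

```python
def split_warichu(note: str):
    """Split a note into (first_line, second_line), first line never shorter."""
    first_len = (len(note) + 1) // 2  # round up so the first line is the longer
    return note[:first_len], note[first_len:]


print(split_warichu("あいうえお"))  # ('あいう', 'えお')
```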
In the rare event that the warichu text breaks across more than one line (see fig_warichu on the right), both lines of the warichu on the first line of the main text should be read completely before continuing to the remainder of the note. The characters in memory follow the normal reading sequence (and use normal characters, too), but the application needs to rearrange the visual order around the line break.
Underlines may be used to emphasise words or phrases. (Emphasis can also be indicated in other ways, such as using dots alongside the line – see Emphasis. This section focuses on the practical mechanics of underlining.)
When lines or other text decoration are used, they normally appear below horizontal text, and to the left of vertical text.
If a line of Japanese text contains some text in another language and orthography, the position of any text decoration should follow the Japanese conventions.
Observation: This section needs to be edited. Some punctuation marks should be discussed in other sections, and more explanations are needed for those items that remain.
CLDR 31 lists the following punctuation characters for Japanese. First the fullwidth forms of normal characters.
Then the halfwidth forms.
And finally, the other punctuation.
The katakana block contains two additional punctuation marks.
・ is used to separate words when writing non-Japanese phrases.
゠ is a delimiter occasionally used in analyzed Katakana or Hiragana textual material.
The hiragana block contains some combining and modifier characters used to represent dakuten and han-dakuten for compatibility with older systems.
The kana blocks each have two marks that are used to indicate repetition of a syllable – one for syllables with unvoiced consonants and another for voiced. The table below shows the hiragana first, then the katakana. In both cases there is a character for repetition of ordinary syllables, and one for repetition of syllables with dakuten.
Unicode also has U+3000 IDEOGRAPHIC SPACE for occasions where it is needed.
Lines are normally wrapped between characters – word boundaries usually have no significance for the wrapping. However, occasionally there is a preference to wrap text at word boundaries, eg. to better balance headings.
Kanji characters have the ID line-break property, which means that lines can ordinarily break before and after and between pairs of ideographic characters. Note that this class also includes characters other than Han ideographs.
Kinsoku rules. Japanese text should also take into account a few rules (called kinsoku rules) which dictate which characters cannot appear at the end or start of a line. The set of characters affected by these rules varies slightly from application to application, but fig_kinsoku_start and fig_kinsoku_start show examples of the kinds of punctuation involved.
There are a number of ways to handle these characters:
Wrap the previous character to the next line with the punctuation.
Leave the punctuation character protruding into the margin (if there is one).
Ignore the kinsoku rules.
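The first of these strategies can be sketched in Python as follows. This is a minimal illustration, not a complete line breaker: the function name is hypothetical, the two character sets are small illustrative samples rather than an authoritative kinsoku list, every character is assumed to be one cell wide, and if no acceptable break can be found the rules are simply ignored for that line.

```python
# Illustrative samples only; real applications use longer, configurable sets.
NO_LINE_START = set("、。，．）」』！？")
NO_LINE_END = set("（「『")


def wrap_kinsoku(text: str, width: int):
    """Wrap text at `width` characters per line, pulling characters down to
    the next line where needed to avoid a prohibited line start or line end."""
    lines = []
    start = 0
    while len(text) - start > width:
        end = start + width
        # Move the break point back while it would leave a prohibited
        # character at the start of the next line or the end of this one.
        while end > start + 1 and (text[end] in NO_LINE_START
                                   or text[end - 1] in NO_LINE_END):
            end -= 1
        lines.append(text[start:end])
        start = end
    lines.append(text[start:])
    return lines


print(wrap_kinsoku("あいうえお。かきく", 5))  # ['あいうえ', 'お。かきく']
```

The first line in the example ends up one character short, which is the kind of gap that justification is then expected to absorb.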
Where a gap appears at the end of a line, full justification is usually restored by adding space across the line (see justification).
Small kana. These kinsoku rules may also be used to prevent small kana characters appearing alone at the start of a line. However, this is much more likely to reflect the preferences of the author. For example, the rule may be ignored in narrow newspaper columns.
There is no hyphenation at line-breaks for Japanese text.
The preferred arrangement of characters on a line is solid set, ie. each character frame immediately follows the previous one, each with the same width. In principle, in books where the width of the text area on a page is set by counting characters and fixed, paragraphs composed of kanji and kana characters don't need to be justified. Lines break as soon as the line is full of characters, and the whole paragraph has grid lines vertically and horizontally between the characters.
However, a number of factors may create a need for justification from time to time. One such would be punctuation that pulls the last character of the previous line with it to the next line, so that it doesn't begin a line on its own. Another would be web-based text where windows can be stretched, resulting in a situation where the width of a line no longer exactly corresponds to the sum of the width of all the characters on that line. Other situations include lines where proportionally-spaced romaji text breaks the grid effect.
Japanese justifies text using a complex set of rules which adjust the space between characters on a line. Some characters are adjusted before others. Typically in character-based justification, rules are applied to different types of character in successive waves. For example, the algorithm may attempt to reduce the spacing around punctuation first, and only when more adjustment is needed turn to adjusting the spacing between ideographs.
In situations where a set of lines each contains self-contained text, the line content may be stretched to fit the line width, for example in table cells. In this case it is typical to set the first and last characters at the line start and end, respectively, and then apply equal amounts of spacing between all remaining characters. This can result in large gaps, including lines where two characters are arranged at opposite line ends with nothing between. See fig_distributed_spacing.
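The spacing for this kind of distributed line can be worked out with simple arithmetic, as in the Python sketch below. The function name and parameters are hypothetical, and all characters are assumed to share the same width, with measurements in em.

```python
def distributed_gap(n_chars: int, line_width: float,
                    char_width: float = 1.0) -> float:
    """Return the equal gap placed between adjacent characters so that the
    first and last characters sit at the line start and end."""
    if n_chars < 2:
        raise ValueError("need at least two characters to distribute")
    free = line_width - n_chars * char_width
    if free < 0:
        raise ValueError("text is wider than the line")
    return free / (n_chars - 1)


# Two characters in a 10em cell end up 8em apart.
print(distributed_gap(2, 10))  # 8.0
```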
It is common for the start of a paragraph in Japanese text to be indicated by indenting the first line, rather than adding inter-paragraph leading. The indentation is generally one full character width, although there are complications when a line begins with certain characters, such as brackets or parentheses.
This section looks at ways in which spacing is applied between characters over and above that which is introduced during basic justification. That said, the text spacing techniques described here may also be used or folded in when creating a fully justified paragraph.
Letter-spacing is used to achieve balance between items with large and small numbers of characters, such as headings, running heads, and captions. When expanding text, equal amounts of space are added between the character frames of the item with the smaller number of characters.
Reducing inter-character spacing. Although solid set text is normally best for readability, in large print sizes, such as for magazine headings, it may be desirable to reduce the distance between certain characters. This is typically done by reducing the distance between adjacent letter faces.
Sometimes, text may also be kerned by overlapping the character frames by a regular amount across a whole line.
When a run of romaji or ASCII numerals appears in text, it is often set off from the surrounding kanji/kana letters by a small space.
The amount of spacing can vary. JLReq (#id209) suggests a ¼em space, but sometimes other spaces may be appropriate, such as ⅙em.
Such spacing is not needed when the phrase is followed or preceded by punctuation that already has built in space. It also doesn't appear at the line start/end.
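To make the rule concrete, here is a small Python sketch that reports where such gaps would fall in a string. It is only an approximation under stated assumptions: the function names are hypothetical, 'Japanese letter' is reduced to a few Unicode ranges (hiragana, katakana, the prolonged sound mark, and the main CJK Unified Ideographs block), and 'romaji or ASCII numerals' is reduced to ASCII letters and digits. Because punctuation belongs to neither class, no gap is reported next to it, and no gap is ever reported at the very start or end of the text.

```python
def is_japanese_letter(ch: str) -> bool:
    cp = ord(ch)
    return (0x3041 <= cp <= 0x3096      # hiragana letters
            or 0x30A1 <= cp <= 0x30FA   # katakana letters
            or cp == 0x30FC             # prolonged sound mark
            or 0x4E00 <= cp <= 0x9FFF)  # CJK Unified Ideographs (main block)


def is_ascii_alnum(ch: str) -> bool:
    return ch.isascii() and ch.isalnum()


def autospace_boundaries(text: str):
    """Return the indexes i such that a gap belongs between text[i-1] and text[i]."""
    gaps = []
    for i in range(1, len(text)):
        a, b = text[i - 1], text[i]
        if ((is_japanese_letter(a) and is_ascii_alnum(b))
                or (is_ascii_alnum(a) and is_japanese_letter(b))):
            gaps.append(i)
    return gaps


print(autospace_boundaries("日本語でHTMLを書く"))  # [4, 8]
```

The sketch returns positions rather than inserting space characters, since the gap is a matter of presentation and is better left to the layout engine.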
To achieve this in web pages use the text-autospace CSS property in your style sheet; don't use space characters. For full details of the options available see the CSS spec.
By default, the browser should insert a gap automatically between runs of ideographs and runs of non-ideographic letters or numerals. The size of the gap depends on the browser; however, the CSS spec suggests 1/8 of the width of an ideographic character.
There are 2 ways in which CSS can add gaps. If the text already contains gaps produced using ordinary space characters, the CSS will, by default, only add gaps where there are no spaces. If, on the other hand, you want to reduce the width of those space-based gaps, or apply even spacing throughout, then use the replace value. text-autospace:ideograph-alpha ideograph-numeric replace; will remove any space characters and replace them with a standard width gap, while also creating gaps where the space character hadn't been used.
To remove all synthesised gaps (but leave any manually-typed space characters in place) use text-autospace:no-autospace.
The other values can be used to tweak the results as follows.
Most of the time you will probably want to use the following:
text-autospace: ideograph-alpha ideograph-numeric replace;
Punctuation such as full stop, comma, parentheses, etc. normally has built-in space associated with it because the ink takes up only a part of the em square. However, in some situations, the blank space is not appropriate.
When text is arranged on a strict grid pattern, none of this space removal applies.
Sequences of punctuation. One such example is when multiple punctuation marks appear side by side. fig_text_space_adjacent shows how space can be removed between a fullwidth comma and fullwidth bracket to reduce large blank spaces. It shouldn't be necessary to use halfwidth characters for this; you should use normal characters and the application should remove the appropriate amount of space automatically.
It is not yet possible to control this in web pages, but the CSS Text spec proposes a way forward using the text-spacing CSS property. The relevant property values are:
Eg. the following collapses spaces between punctuation marks:
text-spacing: trim-adjacent;
Line-initial punctuation. Similarly, space may be removed from punctuation at the start or the end of a line. If we use a bracket as an example, the ink of the bracket should be flush with the line start when that bracket occurs inside a paragraph. Where paragraphs are separated by a blank line, the bracket at the start of the first line should also be flush with the left edge of the text.
It is common, however, to have no blank line before a Japanese paragraph, but instead indent the paragraph's first line. Usually this indent is the width of one fullwidth character. If the line begins with a punctuation such as a bracket, the empty space that usually precedes a fullwidth bracket is still dropped, but the line is set so that the glyph hangs into the indent (which, visually, looks like it is preceded by a half-width space). fig_text_space_para shows examples of this.
It is not yet possible to achieve this in web pages, but the CSS Text spec proposes a way forward using the text-spacing CSS property.
A typical way of setting styling for indented paragraphs would therefore include something like this:
p { margin: 0; text-indent: 1em; text-spacing: trim-start; hanging-punctuation: first; }
The relevant property values are:
In some cases, the paragraph-start line indentation has been achieved by adding a fullwidth bracket at the start of the paragraph (rather than indentation), while removing the leading space from other brackets in the paragraph. Indentations for lines that don't begin with a bracket-like punctuation will typically use an ideographic space character rather than styling to create the indent (because line indentation doesn't behave differently depending on whether a line starts with a bracket). This approach is not recommended, because it impedes the ability of authors to change behaviour simply through changing the styling, but to provide a workaround for legacy text in this situation, CSS proposes another value, which behaves as space-start on the first line of the block container and each line after a forced line break, but as trim-start on all other lines.
Line-final punctuation. It is often useful to remove trailing space from a fullwidth punctuation glyph if it allows that character to fit at the end of a line (rather than wrapping it to the next line).
Again, it is not yet possible to achieve this in web pages, but the CSS Text spec proposes a way forward using the text-spacing CSS property. The relevant property values are:
The standard baseline for kanji and kana characters is slightly lower than the alphabetic baseline used for Latin characters. Mixed script text needs to align baselines correctly.
fig_baselines shows metrics for the Hiragino Mincho Pro font. In this font the maximum height of the Japanese letters reaches slightly higher than the Latin ascenders, but not as low as the Latin descenders.
Japanese characters have no ascenders or descenders, but occupy the square space described earlier. Some characters use more of the square space than others, as can be seen in fig_baselines.
You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.
Japanese text uses a number of different counter styles. Some of the more common include full-width European numbers, which in vertical text stand upright. Unicode has various sets of numbers that can be useful here.
For the dotted-decimal numeric style Unicode provides precomposed characters from 1 to 20.
For the circled-decimal numeric style Unicode provides characters from 1 to 50.
The Japanese orthography also uses kanji or kana characters to create 1 fixed, 4 alphabetic, and 2 additive styles.
The circled-katakana fixed style uses the following letters. The suffix is a space, and the numbers run from 1 to 47.
The alphabetic styles all use 、 as a suffix (with no following space). The iroha style ordering is based on the order of characters in a pangram poem dating from the Heian era (794–1185).
The hiragana alphabetic style uses these 48 hiragana characters in the order in which they are typically arranged.
Examples:
The hiragana-iroha alphabetic style uses 47 hiragana characters in the order shown just below.
Examples:
The katakana alphabetic style uses these 48 katakana characters in the order in which they are typically arranged.
Examples:
The katakana-iroha alphabetic style uses 47 katakana characters in the order shown just below.
Examples:
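As a rough illustration of how an alphabetic counter style turns a number into a marker, the Python sketch below applies the usual bijective numbering scheme to the hiragana style: once the last symbol is used, the sequence continues with two-symbol markers. The function name is hypothetical, and the 48-character symbol list is an assumption based on the CSS predefined hiragana counter style, so it should be checked against the list given for this style. The other alphabetic styles would work the same way with their own symbol lists.

```python
# Assumed symbol list (48 characters) for the hiragana alphabetic style.
HIRAGANA = ("あいうえおかきくけこさしすせそたちつてと"
            "なにぬねのはひふへほまみむめもやゆよ"
            "らりるれろわゐゑをん")


def alphabetic_counter(value: int, symbols: str = HIRAGANA,
                       suffix: str = "、") -> str:
    """Return the marker for a positive counter value, followed by the suffix."""
    if value < 1:
        raise ValueError("alphabetic styles are only defined for values >= 1")
    n = len(symbols)
    out = ""
    while value > 0:
        value -= 1                     # shift to 0-based before taking the digit
        out = symbols[value % n] + out
        value //= n
    return out + suffix


print(alphabetic_counter(1), alphabetic_counter(48), alphabetic_counter(49))
# あ、 ん、 ああ、
```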
The Japanese additive styles have a range -9,999 to 9,999 and use kanji characters only. The suffix is 、, and negative numbers are preceded by マイナス.
The japanese-informal additive style uses these letters.
Examples:
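The following Python sketch shows how an additive style of this kind might build a marker, using the range and the negative prefix described above. The function name is hypothetical, and the digit and unit characters are an assumption based on the CSS predefined japanese-informal counter style, in which 10, 100 and 1000 are written without a leading 一. The suffix is left out so that only the number itself is shown.

```python
DIGITS = "〇一二三四五六七八九"                      # assumed digit characters, 0-9
UNITS = [(1000, "千"), (100, "百"), (10, "十")]      # assumed unit characters


def japanese_informal(n: int) -> str:
    """Return the kanji representation of n for -9999 <= n <= 9999."""
    if not -9999 <= n <= 9999:
        raise ValueError("outside the range of the style")
    if n == 0:
        return DIGITS[0]
    sign = "マイナス" if n < 0 else ""
    n = abs(n)
    parts = []
    for value, unit in UNITS:
        d, n = divmod(n, value)
        if d == 1:
            parts.append(unit)              # e.g. 千 rather than 一千
        elif d > 1:
            parts.append(DIGITS[d] + unit)
    if n:
        parts.append(DIGITS[n])             # remaining units digit
    return sign + "".join(parts)


print(japanese_informal(2024))  # 二千二十四
```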
The japanese-formal additive style uses these letters.
Examples:
The most common suffix is 、. The fixed styles have no prefix/suffix.
Examples:
Large paragraph-initial characters can easily be found in Japanese content. The character typically fills a box that is the height (or width, in vertically-set text) of 2-4 lines.
Books, magazines, et cetera, that are vertically set have the front cover on the right, and pages turn to the right as you read.
Rather than specifying margins and then filling the space between with the body of the text, Japanese text areas will usually be defined by specifying the width and height of the text area as a number of characters, and then determining the size of the margins based on what remains of the page size. This is possible because Japanese characters are drawn in square character frames, all the same size.
In fact, the calculations also include an inter-line space. This inter-line space must be set for the whole page at a size that is large enough to accommodate any ruby annotations or other items that may protrude into the line gap. Therefore, the line height doesn't change for individual lines that have ruby annotations.
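The calculation can be sketched in Python as follows. The function name, parameter names and the example numbers are all hypothetical; the sketch assumes square character frames of a single size and a single inter-line space for the whole page, as described above, with no space added after the last line.

```python
def text_area(chars_per_line: int, lines: int,
              char_size: float, line_gap: float):
    """Return (line_length, block_length) of the text area, in the same
    unit as char_size (e.g. points)."""
    line_length = chars_per_line * char_size
    block_length = lines * char_size + (lines - 1) * line_gap
    return line_length, block_length


# Hypothetical page: 40 characters per line, 30 lines, 9pt characters, 6pt gap.
line_length, block_length = text_area(40, 30, 9, 6)
print(line_length, block_length)  # 360 444
```

The margins are then whatever remains when these two lengths are subtracted from the corresponding page dimensions.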
Defining layout of the text area in this way creates a virtual grid, to which some things snap. For example, headings may be indented by a given number of character spaces, and are centred on a given number of lines in the grid. Page headers and footers may also correspond to aspects of the text area grid for positioning.
Columns in vertically set text run horizontally from right to left.
The title for this content runs horizontally across the top of the columns. This is a common approach. Note that although the columns are read RTL, the heading is LTR.
See inlinenotes for purely inline annotations, such as ruby or warichu. This section is about annotation systems that separate the reference marks and the content of the notes.
※ can be used in text to set up a footnote reference, and in the footnotes themselves. It can be followed by a number when there are multiple notes, eg. ※1, ※2, etc.
Wikipedia provides the following example.
Form controls on Web pages should be rotated 90 degrees clockwise, compared to the form controls for Western languages.
The following figures show examples of what is expected. Major browsers don't fully support forms with this orientation at the time of writing.
Page headers and footers typically run horizontally on vertically set pages.
Version 13.0 of the Unicode Standard has the following blocks dedicated to the Japanese script (numbers in lists are non-ASCII only):
Apart from ASCII characters, the Japanese orthography described here uses 2,136 characters (and 11 more, used infrequently) from the following Unicode blocks: