Japanese orthography summary

Characters

Kanji characters

Kanji characters are mostly derived from the Chinese Han script. They are commonly used for word roots and compound words.

静電気 — The compound word static electricity (seidenki), written with kanji.

Most kanji characters are built up from smaller components, many of which hint at the meaning or the pronunciation of the character. Some have pictographic origins that are still evident, whereas others have a more complicated structure.

In reforms in the mid 20th century, the Japanese repertoire was standardised on around 2,000 core characters; however, standardised computer character sets support several thousand more.

The Jōyō kanji character set (常用漢字) is intended as a literacy baseline for those who have completed compulsory education, as well as a list of permitted characters and readings for use in official government documents.wjy

A second list, called Jinmeiyō kanji (人名用漢字), is a supplementary list of 863 characters that can legally be used in registered personal names in Japan.wj

The number of characters in these lists changes from time to time. The Wikipedia articles for Jōyō and Jinmeiyō kanji provide useful timelines indicating changes over the years.

In addition to the basic lists, there are a number of variants and traditional forms that need to be considered. Note also that the jōyō character set only provides a baseline for the educational process. School leavers still have more characters to learn to achieve a working competency.

Kanji characters in the jōyō and jinmeiyō lists (as of 2020).

Jōyō kanji	亜哀挨愛曖悪握圧扱宛嵐安案暗以衣位囲医依委威為畏胃尉異移萎偉椅彙意違維慰遺緯域育一壱逸茨芋引印因咽姻員院淫陰飲隠韻右宇羽雨唄鬱畝浦運雲永泳英映栄営詠影鋭衛易疫益液駅悦越謁閲円延沿炎怨宴媛援園煙猿遠鉛塩演縁艶汚王凹央応往押旺欧殴桜翁奥横岡屋億憶臆虞乙俺卸音恩温穏下化火加可仮何花佳価果河苛科架夏家荷華菓貨渦過嫁暇禍靴寡歌箇稼課蚊牙瓦我画芽賀雅餓介回灰会快戒改怪拐悔海界皆械絵開階塊楷解潰壊懐諧貝外劾害崖涯街慨蓋該概骸垣柿各角拡革格核殻郭覚較隔閣確獲嚇穫学岳楽額顎掛潟括活喝渇割葛滑褐轄且株釜鎌刈干刊甘汗缶完肝官冠巻看陥乾勘患貫寒喚堪換敢棺款間閑勧寛幹感漢慣管関歓監緩憾還館環簡観韓艦鑑丸含岸岩玩眼頑顔願企伎危机気岐希忌汽奇祈季紀軌既記起飢鬼帰基寄規亀喜幾揮期棋貴棄毀旗器畿輝機騎技宜偽欺義疑儀戯擬犠議菊吉喫詰却客脚逆虐九久及弓丘旧休吸朽臼求究泣急級糾宮救球給嗅窮牛去巨居拒拠挙虚許距魚御漁凶共叫狂京享供協況峡挟狭恐恭胸脅強教郷境橋矯鏡競響驚仰暁業凝曲局極玉巾斤均近金菌勤琴筋僅禁緊錦謹襟吟銀区句苦駆具惧愚空偶遇隅串屈掘窟熊繰君訓勲薫軍郡群兄刑形系径茎係型契計恵啓掲渓経蛍敬景軽傾携継詣慶憬稽憩警鶏芸迎鯨隙劇撃激桁欠穴血決結傑潔月犬件見券肩建研県倹兼剣拳軒健険圏堅検嫌献絹遣権憲賢謙鍵繭顕験懸元幻玄言弦限原現舷減源厳己戸古呼固股虎孤弧故枯個庫湖雇誇鼓錮顧五互午呉後娯悟碁語誤護口工公勾孔功巧広甲交光向后好江考行坑孝抗攻更効幸拘肯侯厚恒洪皇紅荒郊香候校耕航貢降高康控梗黄喉慌港硬絞項溝鉱構綱酵稿興衡鋼講購乞号合拷剛傲豪克告谷刻国黒穀酷獄骨駒込頃今困昆恨根婚混痕紺魂墾懇左佐沙査砂唆差詐鎖座挫才再災妻采砕宰栽彩採済祭斎細菜最裁債催塞歳載際埼在材剤財罪崎作削昨柵索策酢搾錯咲冊札刷刹拶殺察撮擦雑皿三山参桟蚕惨産傘散算酸賛残斬暫士子支止氏仕史司四市矢旨死糸至伺志私使刺始姉枝祉肢姿思指施師恣紙脂視紫詞歯嗣試詩資飼誌雌摯賜諮示字寺次耳自似児事侍治持時滋慈辞磁餌璽鹿式識軸七𠮟失室疾執湿嫉漆質実芝写社車舎者射捨赦斜煮遮謝邪蛇尺借酌釈爵若弱寂手主守朱取狩首殊珠酒腫種趣寿受呪授需儒樹収囚州舟秀周宗拾秋臭修袖終羞習週就衆集愁酬醜蹴襲十汁充住柔重従渋銃獣縦叔祝宿淑粛縮塾熟出述術俊春瞬旬巡盾准殉純循順準潤遵処初所書庶暑署緒諸女如助序叙徐除小升少召匠床抄肖尚招承昇松沼昭宵将消症祥称笑唱商渉章紹訟勝掌晶焼焦硝粧詔証象傷奨照詳彰障憧衝賞償礁鐘上丈冗条状乗城浄剰常情場畳蒸縄壌嬢錠譲醸色拭食植殖飾触嘱織職辱尻心申伸臣芯身辛侵信津神唇娠振浸真針深紳進森診寝慎新審震薪親人刃仁尽迅甚陣尋腎須図水吹垂炊帥粋衰推酔遂睡穂随髄枢崇数据杉裾寸瀬是井世正生成西声制姓征性青斉政星牲省凄逝清盛婿晴勢聖誠精製誓静請整醒税夕斥石赤昔析席脊隻惜戚責跡積績籍切折拙窃接設雪摂節説舌絶千川仙占先宣専泉浅洗染扇栓旋船戦煎羨腺詮践箋銭潜線遷選薦繊鮮全前善然禅漸膳繕狙阻祖租素措粗組疎訴塑遡礎双壮早争走奏相荘草送倉捜挿桑巣掃曹曽爽窓創喪痩葬装僧想層総遭槽踪操燥霜騒藻造像増憎蔵贈臓即束足促則息捉速側測俗族属賊続卒率存村孫尊損遜他多汰打妥唾堕惰駄太対体耐待怠胎退帯泰堆袋逮替貸隊滞態戴大代台第題滝宅択沢卓拓託濯諾濁但達脱奪棚誰丹旦担単炭胆探淡短嘆端綻誕鍛団男段断弾暖談壇地池知値恥致遅痴稚置緻竹畜逐蓄築秩窒茶着嫡中仲虫沖宙忠抽注昼柱衷酎鋳駐著貯丁弔庁兆町長挑帳張彫眺釣頂鳥朝貼超腸跳徴嘲潮澄調聴懲直勅捗沈珍朕陳賃鎮追椎墜通痛塚漬坪爪鶴低呈廷弟定底抵邸亭貞帝訂庭逓停偵堤提程艇締諦泥的笛摘滴適敵溺迭哲鉄徹撤天典店点展添転塡田伝殿電斗吐妬徒途都渡塗賭土奴努度怒刀冬灯当投豆東到逃倒凍唐島桃討透党悼盗陶塔搭棟湯痘登答等筒統稲踏糖頭謄藤闘騰同洞胴動堂童道働銅導瞳峠匿特得督徳篤毒独読栃凸突届屯豚頓貪鈍曇丼那奈内梨謎鍋南軟難二尼弐匂肉虹日入乳尿任妊忍認寧熱年念捻粘燃悩納能脳農濃把波派破覇馬婆罵拝杯背肺俳配排敗廃輩売倍梅培陪媒買賠白伯拍泊迫剝舶博薄麦漠縛爆箱箸畑肌八鉢発髪伐抜罰閥反半氾犯帆汎伴判坂阪板版班畔般販斑飯搬煩頒範繁藩晩番蛮盤比皮妃否批彼披肥非卑飛疲秘被悲扉費碑罷避尾眉美備微鼻膝肘匹必泌筆姫百氷表俵票評漂標苗秒病描猫品浜貧賓頻敏瓶不夫父付布扶府怖阜附訃負赴浮婦符富普腐敷膚賦譜侮武部舞封風伏服副幅復福腹複覆払沸仏物粉紛雰噴墳憤奮分文聞丙平兵併並柄陛閉塀幣弊蔽餅米壁璧癖別蔑片辺返変偏遍編弁辛便勉歩保哺捕補舗母募墓慕暮簿方包芳邦奉宝抱放法泡胞俸倣峰砲崩訪報蜂豊飽褒縫亡乏忙坊妨忘防房肪某冒剖紡望傍帽棒貿貌暴膨謀頰北木朴牧睦僕墨撲没勃堀本奔翻凡盆麻摩磨魔毎妹枚昧埋幕膜枕又末抹万満慢漫未味魅岬密蜜脈妙民眠矛務無夢霧娘名命明迷冥盟銘鳴滅免面綿麺茂模毛妄盲耗猛網目黙門紋問冶夜野弥厄役約訳薬躍闇由油喩愉諭輸癒唯友有勇幽悠郵湧猶裕遊雄誘憂融優与予余
誉預幼用羊妖洋要容庸揚揺葉陽溶腰様瘍踊窯養擁謡曜抑沃浴欲翌翼拉裸羅来雷頼絡落酪辣乱卵覧濫藍欄吏利里理痢裏履璃離陸立律慄略柳流留竜粒隆硫侶旅虜慮了両良料涼猟陵量僚領寮療瞭糧力緑林厘倫輪隣臨瑠涙累塁類令礼冷励戻例鈴零霊隷齢麗暦歴列劣烈裂恋連廉練錬呂炉賂路露老労弄郎朗浪廊楼漏籠六録麓論和話賄脇惑枠湾腕	2,138
Alternates	剥叱填頬	4
Jōyō traditional variant forms (not including 61 compatibility forms that normalise to other characters)	亞惡壓圍醫爲壹隱榮營衞驛圓鹽緣艷應歐毆櫻奧橫溫穩假價畫會繪壞懷槪擴殼覺學嶽樂渴罐卷陷勸寬關歡觀氣歸龜僞戲犧舊據擧虛峽挾狹鄕曉區驅勳薰徑莖惠揭溪經螢輕繼鷄藝擊缺硏縣儉劍險圈檢獻權顯驗嚴廣效恆黃鑛號國黑碎濟齋劑雜參棧蠶慘贊殘絲齒兒辭濕實寫舍釋壽收從澁獸縱肅處緖敍將稱涉燒證奬條狀乘淨剩疊繩壤孃讓釀觸囑眞寢愼盡圖粹醉穗隨髓樞數瀨聲齊靜竊攝絕專淺戰踐錢潛纖禪雙壯爭莊搜插巢曾瘦裝總騷增藏臟卽屬續墮對體帶滯臺瀧擇澤擔單膽團斷彈遲癡蟲晝鑄廳徵聽敕鎭遞鐵點轉傳燈當黨盜稻鬭德獨讀屆貳惱腦霸拜廢賣麥發髮拔晚蠻祕濱甁拂佛倂竝餠邊變辨瓣辯步寶豐襃沒飜每萬滿麵默彌譯藥與豫餘譽搖樣謠來賴亂覽龍兩獵綠淚壘禮勵戾靈齡曆歷戀鍊爐勞郞樓錄灣	305
The compatibility forms	逸謁禍悔海慨喝褐漢祈既器響勤謹穀殺祉視者煮臭祝暑署諸祥神節祖僧層贈贈嘆著懲塚都突難梅繁卑碑賓頻敏侮福塀勉墨免欄隆虜類練朗廊	61
Jinmeiyō kanji	丑丞乃之乎也云亘些亦亥亨亮仔伊伍伽佃佑伶侃侑俄俠俣俐倭俱倦倖偲傭儲允兎兜其冴凌凜凧凪凰凱函劉劫勁勺勿匁匡廿卜卯卿厨厩叉叡叢叶只吾吞吻哉哨啄哩喬喧喰喋嘩嘉嘗噌噂圃圭坐尭坦埴堰堺堵塙壕壬夷奄奎套娃姪姥娩嬉孟宏宋宕宥寅寓寵尖尤屑峨峻崚嵯嵩嶺巌巫已巳巴巷巽帖幌幡庄庇庚庵廟廻弘弛彗彦彪彬徠忽怜恢恰恕悌惟惚悉惇惹惺惣慧憐戊或戟托按挺挽掬捲捷捺捧掠揃摑摺撒撰撞播撫擢孜敦斐斡斧斯於旭昂昊昏昌昴晏晃晒晋晟晦晨智暉暢曙曝曳朋朔杏杖杜李杭杵杷枇柑柴柘柊柏柾柚桧栞桔桂栖桐栗梧梓梢梛梯桶梶椛梁棲椋椀楯楚楕椿楠楓椰楢楊榎樺榊榛槙槍槌樫槻樟樋橘樽橙檎檀櫂櫛櫓欣欽歎此殆毅毘毬汀汝汐汲沌沓沫洸洲洵洛浩浬淵淳渚淀淋渥渾湘湊湛溢滉溜漱漕漣澪濡瀕灘灸灼烏焰焚煌煤煉熙燕燎燦燭燿爾牒牟牡牽犀狼猪獅玖珂珈珊珀玲琢琉瑛琥琶琵琳瑚瑞瑶瑳瓜瓢甥甫畠畢疋疏皐皓眸瞥矩砦砥砧硯碓碗碩碧磐磯祇祢祐祷禄禎禽禾秦秤稀稔稟稜穣穹穿窄窪窺竣竪竺竿笈笹笙笠筈筑箕箔篇篠簞簾籾粥粟糊紘紗紐絃紬絆絢綺綜綴緋綾綸縞徽繫繡纂纏羚翔翠耀而耶耽聡肇肋肴胤胡脩腔脹膏臥舜舵芥芹芭芙芦苑茄苔苺茅茉茸茜莞荻莫莉菅菫菖萄菩萌萊菱葦葵萱葺萩董葡蓑蒔蒐蒼蒲蒙蓉蓮蔭蔣蔦蓬蔓蕎蕨蕉蕃蕪薙蕾蕗藁薩蘇蘭蝦蝶螺蟬蟹蠟衿袈袴裡裟裳襖訊訣註詢詫誼諏諄諒謂諺讃豹貰賑赳跨蹄蹟輔輯輿轟辰辻迂迄辿迪迦這逞逗逢遥遁遼邑祁郁鄭酉醇醐醍醬釉釘釧銑鋒鋸錘錐錆錫鍬鎧閃閏閤阿陀隈隼雀雁雛雫霞靖鞄鞍鞘鞠鞭頁頌頗顚颯饗馨馴馳駕駿驍魁魯鮎鯉鯛鰯鱒鱗鳩鳶鳳鴨鴻鵜鵬鷗鷲鷺鷹麒麟麿黎黛鼎	633
Jinmeiyō variants	亙凛巖堯晄檜槇渚猪琢禰祐禱祿禎穰萠遙	18

CJK compatibility characters

The Jōyō traditional forms include 60 kanji shapes that Unicode encodes in the CJK Compatibility Ideographs block. Normalisation operations (which in some systems may happen automatically, or during operations such as copy and paste) convert them to characters in the main CJK block. This makes them unstable, so they are best avoided. The following list shows the compatibility character shape on the left, and the normalised shape on the right.

逸逸,謁謁,禍禍,悔悔,海海,慨慨,喝喝,褐褐,漢漢,祈祈,既既,器器,響響,勤勤,謹謹,穀穀,殺殺,祉祉,視視,者者,煮煮,臭臭,祝祝,暑暑,署署,諸諸,祥祥,神神,節節,祖祖,僧僧,層層,贈贈,嘆嘆,著著,懲懲,塚塚,都都,突突,難難,梅梅,繁繁,卑卑,碑碑,賓賓,頻頻,敏敏,侮侮,福福,塀塀,勉勉,墨墨,免免,欄欄,隆隆,虜虜,類類,練練,朗朗,廊廊
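To find such characters in existing text before they are silently changed, one can compare each character in the CJK Compatibility Ideographs block with its NFC form. The following is a minimal Python sketch using only the standard unicodedata module; the function name unstable_ideographs is hypothetical. Note that a few characters in that block, such as U+FA0E, are unified ideographs with no decomposition and are therefore stable.

```python
import unicodedata

def unstable_ideographs(text: str) -> list[tuple[int, str, str]]:
    """Return (index, original, normalised) for each CJK compatibility
    ideograph in text that NFC normalisation would replace."""
    hits = []
    for i, ch in enumerate(text):
        # CJK Compatibility Ideographs block: U+F900..U+FAFF.
        if 0xF900 <= ord(ch) <= 0xFAFF:
            nfc = unicodedata.normalize("NFC", ch)
            # Some characters in this block (eg. U+FA0E) are unified
            # ideographs with no decomposition, so NFC leaves them alone.
            if nfc != ch:
                hits.append((i, ch, nfc))
    return hits

for i, orig, norm in unstable_ideographs("\u4FAE\uFA30"):
    print(f"index {i}: U+{ord(orig):04X} -> U+{ord(norm):04X}")
# index 1: U+FA30 -> U+4FAE
```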

Kana syllabaries

Japanese uses two syllabaries: hiragana and katakana. The vowels u and i are often devoiced or elided between voiceless consonants, or at the end of a word.

Katakana characters are typically used for foreign loan words and names, such as the word 'text'. They are also used for things such as scientific names of plants and animals, onomatopoeic sounds, telegrams, and some female names.

テキスト — The word text, written with katakana syllables.

Hiragana is used for indigenous Japanese words, such as the verb 'to be'.

です (desu), the word for 'to be', written with hiragana syllables.

It is also used for grammatical endings after a word root written using kanji characters.

集まります — The word for 'to collect', atsumarimasu, with the verb root atsu written using a kanji character, and the remainder in hiragana expressing the grammatical present tense.

The basic syllabary includes 5 independent vowels and one nasal; the remaining syllables are consonant+vowel combinations. In these lists, hiragana (first) and katakana (second) are shown together.

あア,いイ,うウ,えエ,おオ,かカ,きキ,くク,けケ,こコ,さサ,しシ,すス,せセ,そソ,たタ,ちチ,つツ,てテ,とト,なナ,にニ,ぬヌ,ねネ,のノ,はハ,ひヒ,ふフ,へヘ,ほホ,まマ,みミ,むム,めメ,もモ,やヤ,ゆユ,よヨ,らラ,りリ,るル,れレ,ろロ,わワ,をヲ,んン
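Each hiragana/katakana pair in the list above sits at a fixed distance in Unicode: for ぁ..ゖ (U+3041..U+3096) and the iteration marks ゝゞ, the katakana is 0x60 code points after the hiragana. A simple conversion can therefore be sketched as follows. This is a minimal Python sketch with hypothetical function names; katakana with no hiragana counterpart (eg. ヷ, ヺ) and the long vowel mark ー are left unchanged.

```python
# In Unicode, katakana = hiragana + 0x60 for ぁ..ゖ (U+3041..U+3096) and ゝゞ.
_HIRA_TO_KATA = {cp: cp + 0x60 for cp in [*range(0x3041, 0x3097), 0x309D, 0x309E]}
_KATA_TO_HIRA = {kata: hira for hira, kata in _HIRA_TO_KATA.items()}

def hira_to_kata(text: str) -> str:
    return text.translate(_HIRA_TO_KATA)

def kata_to_hira(text: str) -> str:
    # ー, ・ and katakana without a hiragana twin (eg. ヷ) pass through unchanged.
    return text.translate(_KATA_TO_HIRA)

print(hira_to_kata("ひらがな"))  # ヒラガナ
print(kata_to_hira("ビール"))    # びーる
```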

Voiced consonants are indicated by attaching a dakuten mark (which looks like a double quote mark) to the voiceless shape. Unicode provides precomposed code points for all the standard combinations of syllable+dakuten.

がガ,ぎギ,ぐグ,げゲ,ごゴ,ざザ,じジ,ずズ,ぜゼ,ぞゾ,だダ,ぢヂ,づヅ,でデ,どド,ばバ,びビ,ぶブ,べベ,ぼボ,ヴ,ヷ,ヺ

The ‘p’ sound is indicated in a similar way by the use of a han-dakuten (half-dakuten).

ぱパ,ぴピ,ぷプ,ぺペ,ぽポ

The Unicode hiragana block does contain separate code points for dakuten combining marks and modifiers, but these are not normally used in text. However, if Unicode NFD normalisation is applied to text, the dakuten and han-dakuten are split from the base and the combining marks are used.

゙,゚,゛,゜
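The effect of NFD on precomposed kana can be checked with Python's standard unicodedata module. In this sketch (the helper name code_points is only illustrative), ペ and ギ each split into a base kana followed by a combining mark, and NFC puts them back together.

```python
import unicodedata

def code_points(s: str) -> list[str]:
    return [f"U+{ord(c):04X}" for c in s]

word = "ペンギン"                                 # precomposed, as normally typed
decomposed = unicodedata.normalize("NFD", word)
print(code_points(word))        # ['U+30DA', 'U+30F3', 'U+30AE', 'U+30F3']
print(code_points(decomposed))  # ['U+30D8', 'U+309A', 'U+30F3', 'U+30AD', 'U+3099', 'U+30F3']
# NFC recombines each base + combining mark, restoring the original string.
assert unicodedata.normalize("NFC", decomposed) == word
```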

Long vowels

あ,い,う,え,お,ー

Various strategies are used to represent long vowels, and they tend to differ between hiragana and katakana. This elongation is phonemically significant.

In hiragana, the long vowels aː, iː, uː, and eː are written by adding a corresponding vowel.

eg.

おかあさん

おにいさん

すうがく

おねえさん

In words of Chinese origin eː may be written 'ei'.

eg.

ていねい

The long oː is usually written 'ou', but is sometimes written 'oo'.

eg.

おはよう

おおきい

In katakana, long vowels are indicated using ー. This character is used predominantly with katakana, but occasionally also with hiragana.uk,720

eg.

ビール

ボール

エスカレーター

In a few exceptional cases, katakana uses a similar approach to hiragana.

eg.

スペイン

Particles

Several of the more common grammatical particles have idiosyncratic spellings. The topic marker wa is written using は, the object marker o is written using を, and the direction marker e is written using へ.

Small kana

The basic set of kana syllables is completed by a number of small forms used for medial glides, foreign sounds, gemination, and vowel lengthening.

ぁァ,ぃィ,ぅゥ,ぇェ,ぉォ,ゃャ,ゅュ,ょョ,ゎヮ,っッ

Small versions of や, ゆ, and よ are used to form syllables such as きゃ (kya, kʲa), きゅ (kyu, kʲu), and きょ (kyo, kʲo).

っ and ッ are used to lengthen a following consonant sound.

eg.

ちょっと

It is also used to represent a glottal stop in a broken-off word.

eg.

あっ

The small vowel syllables shown above are typically used for transcribing unusual sounds, such as lengthening a preceding vowel, or transliterating foreign sounds, without creating a new syllable.

eg.

ふぁん

シフォン

ティー

はぁぁ
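When processing kana text it is often useful to keep a small glide or small vowel together with the kana it modifies. The following is a minimal Python sketch under the assumption that only the small vowels and small ゃゅょゎ attach to the preceding kana; っ, ん and ー are kept as separate units (as in mora counting), and the function name kana_units is hypothetical.

```python
# Small vowels and small ya/yu/yo/wa attach to the preceding kana.
ATTACHING_SMALL_KANA = set("ぁぃぅぇぉゃゅょゎァィゥェォャュョヮ")

def kana_units(text: str) -> list[str]:
    """Group kana so that small glides/vowels stay with the kana they modify."""
    units: list[str] = []
    for ch in text:
        if ch in ATTACHING_SMALL_KANA and units:
            units[-1] += ch      # eg. き + ょ -> きょ, フ + ォ -> フォ
        else:
            units.append(ch)     # っ, ん, ー and ordinary kana start a new unit
    return units

print(kana_units("ちょっと"))  # ['ちょ', 'っ', 'と']
print(kana_units("シフォン"))  # ['シ', 'フォ', 'ン']
```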

Yotsugana

Over time, certain voiced sounds have merged in several important dialects, as shown in fig_yotsugana.

	ぢ	じ	づ	ず
Tokyo (standard)	d͡ʑi~ʑi	d͡ʑi~ʑi	d͡zɯᵝ~zɯᵝ	d͡zɯᵝ~zɯᵝ
South Tohoku	d͡zɯᵝ	d͡zɯᵝ	d͡zɯᵝ	d͡zɯᵝ
Kōchi (Hata, Tosa)	di~d͡zi	ʑi	dɯᵝ~d͡zɯᵝ	zɯᵝ
Kagoshima	d͡ʑi	ʑi	d͡zɯᵝ	zɯᵝ
Okinawa	d͡ʑi	d͡ʑi	d͡ʑi	d͡ʑi

Yotsugana pronunciation around Japan. (source wy.)

The orthographic reform shortly after World War 2 recommended the use of only じ and ず, except in circumstances where an unvoiced sound has become voiced because of:wy

compounding (rendaku), eg. 神無月 which combines かん, な, and つき, would be written in hiragana as かんなづき
repetition, eg. 続く is written in hiragana as つづく

Archaic characters

A number of characters in the kana blocks are no longer used in modern text, except in counter styles (see lists).

These characters were dropped by an orthographic reform shortly after World War 2.

ゐヰ,ゑヱ,ヸ,ヹ

Halfwidth katakana

Unicode has a set of halfwidth katakana forms for round-tripping legacy encodings. In principle, these characters should not be used; the normal, fullwidth characters should be used instead.

･,ｦ,ｧ,ｨ,ｩ,ｪ,ｫ,ｬ,ｭ,ｮ,ｯ,ｰ,ｱ,ｲ,ｳ,ｴ,ｵ,ｶ,ｷ,ｸ,ｹ,ｺ,ｻ,ｼ,ｽ,ｾ,ｿ,ﾀ,ﾁ,ﾂ,ﾃ,ﾄ,ﾅ,ﾆ,ﾇ,ﾈ,ﾉ,ﾊ,ﾋ,ﾌ,ﾍ,ﾎ,ﾏ,ﾐ,ﾑ,ﾒ,ﾓ,ﾔ,ﾕ,ﾖ,ﾗ,ﾘ,ﾙ,ﾚ,ﾛ,ﾜ,ﾝ,ﾞ,ﾟ
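Where legacy data contains halfwidth katakana, it can be widened to the normal characters with NFKC normalisation. Applying NFKC to a whole document would also change other things (fullwidth Latin, compatibility ideographs, etc.), so this Python sketch, whose function name is hypothetical, normalises only runs in the halfwidth range U+FF61..U+FF9F. NFKC also recombines a halfwidth base with a following ﾞ or ﾟ into a single precomposed kana.

```python
import re
import unicodedata

# Halfwidth CJK punctuation and halfwidth katakana: U+FF61..U+FF9F.
_HALFWIDTH_RUN = re.compile("[\uFF61-\uFF9F]+")

def widen_halfwidth_kana(text: str) -> str:
    """NFKC-normalise only the halfwidth kana runs, leaving other text untouched."""
    return _HALFWIDTH_RUN.sub(lambda m: unicodedata.normalize("NFKC", m.group()), text)

print(widen_halfwidth_kana("ｶﾀｶﾅ ﾊﾟﾝ ＡＢＣ"))  # カタカナ パン ＡＢＣ
```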

Glyph shaping & positioning

Experiment with examples using the Japanese character app.

The Japanese scripts are not cursive, and when using precomposed kana (which is the norm) involve no context-based shaping or positioning.

The orthography has no case distinction.

By default, all kanji, hiragana, katakana, and punctuation characters are drawn inside a character frame that is square and the same size for all characters. The box containing the actual symbol is called the letter face, and there should be some space left between the letter face and the character frame. There may be variations, particularly for small kana, punctuation, etc., in the size of the letter face.

Because of the regularity of the character frame size, it can be used to measure the size of the text area or other parts of a page (horizontally or vertically).

In principle, Japanese characters are set solid, ie. with no space between the character frames. However, text alignment and justification can make adjustments to the placement of characters in the direction of the line flow. See justification and letterspace.

Font styles

The kanji characters are derived from Han characters originally used in Chinese. Many Japanese and Chinese characters are unified to the same code point in Unicode; however, over time, small but systematic language-related differences have appeared in the glyph shapes of some characters compared to their Chinese equivalents. It is important to choose fonts that present the user with the correct glyphs. fig_ja_zh_fonts provides some examples.

The same code points, displayed with a Japanese font (top) and Chinese font (bottom).

Besides the need to choose fallback fonts that match the language of the text, Japanese also has some recognisable font styles. Two well-known font styles are often called Mincho and Gothic. The former has strokes with fine gradations of stroke width, whereas the latter has darker strokes with little gradation. For fallback on the Web, these styles are usually equated with serif and sans-serif, respectively, although serifs are not actually involved.

尊厳と権利とについて平等である — The same text displayed using the Hiragino Mincho Pro font (top) and the Hiragino Kaku Gothic Pro font (bottom).

Another useful type of font style relates to the endings of Gothic font strokes, which can be flat or rounded.

すべての人間は — A typical Gothic font has strokes with squared-off endings (top), but sometimes a font with rounded stroke endings is preferred (bottom).

Context-based shaping & positioning

Horizontal vs. vertical transformation. Characters such as small kana and punctuation occupy different locations within the character frame in horizontal and vertical text.

Positioning of small ょ and the full stop in horizontal and vertical character frames.

fig_small_kana shows how in horizontal text small kana are centred horizontally in the character frame but are vertically below centre; in vertical text they are centred vertically, but aligned right.

The full stop also switches from bottom-left in horizontal text, to top-right in vertical.

These are differences that cannot be produced by rotating glyphs, but require special glyphs in the font which are applied when the directional context is detected.

Positioning of decomposed diacritics. When kana use a dakuten or han-dakuten there can be significant overlap with the base character. See fig_kerning_nfd.

デドヴペバ — Syllabic characters where the dakuten or han-dakuten overlaps the base.

This overlap should look the same whether the diacritics are part of a single precomposed glyph or are separate code points in decomposed text. In the latter case, careful positioning of the diacritics is required.

Shaping of punctuation. Many punctuation marks need to have different shapes for Japanese and non-Japanese text. Often these differences are due to the fact that punctuation for Japanese is based on the em-box, rather than the Latin baseline, cap-height, or x-height. A description of many such differences can be found in Ken Lunde's Proposal to add standardized variation sequences.

Transforming characters

Japanese kanji and kana form a monocameral orthography, so no transforms are needed to convert between different case forms of a letter. However, romaji (Latin) text is cased.

Other transforms may be applied to convert between halfwidth and fullwidth characters, which is useful for converting to and from fullwidth Latin and punctuation, or to convert small kana characters to their full-size equivalents.

The latter transformation is common for ruby text (see inlinenotes), where small kana are converted visually to full-sized to aid with readability of the text, given that ruby text is written in small character sizes.

To achieve this in web pages use the text-transform CSS property@CSS Text specification,https://www.w3.org/TR/css-text-3/#transforming in your style sheet with the following values.

full-width (Browser support is limited.)@CSS Text specification,https://www.w3.org/TR/css-text-3/#transforming: Transforms all ASCII characters into fullwidth forms.
full-size-kana (Browser support is limited.)@CSS Text specification,https://www.w3.org/TR/css-text-3/#transforming: Converts all small Kana characters to the equivalent full-size Kana.

Eg. the following converts the visual appearance (only) of small kana in ruby text to full-size kana, except in headings (where the characters are larger):
rt { font-size: 50%; text-transform: full-size-kana; }
:is(h1, h2, h3, h4) rt { text-transform: none; /* unset for large text */ }
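The CSS values above only change the rendering. Where the underlying text itself needs converting (eg. for search or indexing), a rough Python equivalent might look like this. It is a sketch, not the CSS algorithm: to_full_width covers printable ASCII only, and to_full_size_kana maps only the small kana in the basic Hiragana and Katakana blocks (not, for example, ゕ, ヶ, or the Katakana Phonetic Extensions). Both function names are hypothetical.

```python
# Printable ASCII U+0021..U+007E has fullwidth forms at U+FF01..U+FF5E;
# the ASCII space is mapped to U+3000 IDEOGRAPHIC SPACE.
_FULL_WIDTH = {cp: cp + 0xFEE0 for cp in range(0x21, 0x7F)}
_FULL_WIDTH[0x20] = 0x3000

# Partial table: small kana in the basic Hiragana and Katakana blocks only.
_SMALL_KANA = "ぁぃぅぇぉっゃゅょゎァィゥェォッャュョヮ"
_FULL_KANA = "あいうえおつやゆよわアイウエオツヤユヨワ"
_FULL_SIZE = str.maketrans(_SMALL_KANA, _FULL_KANA)

def to_full_width(text: str) -> str:
    return text.translate(_FULL_WIDTH)

def to_full_size_kana(text: str) -> str:
    return text.translate(_FULL_SIZE)

print(to_full_width("CSS 3"))         # ＣＳＳ　３
print(to_full_size_kana("ちょっと"))  # ちよつと
```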

Punctuation & inline features

Phrase & section boundaries

Japanese uses the following separators at the sentence level and below.j,#differences_in_vertical_and_horizontal_composition_in_use_of_punctuation_marks Some of the punctuation looks like that for Latin (eg. parentheses, commas, and full stops), but the width of the punctuation is likely to include significant amounts of white space, so that punctuation characters occupy the same space as han characters.

		H	V
phrase	，
	、
	：
	；
sentence	。
sentence	．
exclamation	！
question	？

、 and 。 are the norm for vertical text, but two alternative conventions apply to horizontal text. Especially in books that mix Japanese and Western text, such as books on science and technology, 、 and 。 may be replaced by ， and ． respectively. Often, however, the ideographic full stop 。 is retained alongside ， since it is more visible and looks better (this convention has been adopted for Japanese official publications).j,#differences_in_vertical_and_horizontal_composition_in_use_of_punctuation_marks

As the table shows, these punctuation marks require dedicated glyphs in the font, and cannot be achieved by simply rotating the glyph.

Japanese also uses the following doubled exclamation/question marks. They remain upright in vertical text.

‼
⁇
⁈
⁉

Other punctuation used to separate phrases or items includes:

	H	V
⸺
——

If EM DASH characters are used, they are used in pairs.

Bracketed text

For general parentheses and bracketing in text, Japanese uses:

		H	V
（	）
［	］		-
〔	〕	-

〔 and its closing partner are the vertical equivalent of ［, which is used in horizontal text.

Although there are a number of other bracket characters (listed just below), they are less commonly used.

【,】,〖,〗,｛,｝

Quotations & citations

Japanese uses different quote marks for horizontal and vertical writing. The default quote marks are:

		H	V
“	”		-
「	」
〝	〟	-

When an additional quote is embedded within the first, the quote marks are:

		H	V
‘	’		-
『	』	-

Emphasis

Japanese sometimes uses katakana characters to create visual emphasis. uk,720

もうダメだ moː dame da it's too late!

Japanese Layout Requirements lists the following ways of showing emphasis in Japanese.

1. Select a different typeface (eg. a Gothic font in Mincho text).
2. Use 「 and 」 or 〈 and 〉.
3. Change the colour.
4. Underline. (See also text_decoration)
5. Boten marks (also known as emphasis marks).

Note that this list doesn't include italicisation or bolding of text. (1) and (2) are popular approaches. (5) is not as common, but is a traditional approach with some value attached.

Different boten marks are used in horizontal and vertical text. Typically, bullets are used above characters in horizontal text, and sesame dots are used to the right of characters in vertical text.j,#composition_of_emphasis_dots

The boten mark is centre-aligned with the base characters in horizontal text, and middle-aligned in vertical text, and doesn't normally appear alongside full stops, commas, or brackets.j,#composition_of_emphasis_dots

Horizontal text and boten marks. — Boten marks used for emphasis in horizontal and vertical text. (source)

Embedded text in other languages would have boten marks displayed on the same side as for Japanese.

Abbreviation, ellipsis & repetition

tbd

Abbreviation

ヶ,〆,ゟ,ヿ,〼

Japanese has a number of logograms used as abbreviations.

ヶ is a reasonably common shorthand for the character 箇, which is a counter for months, places or provisions. It is pronounced ka or ko, and is not related to the larger kana ケ, which is pronounced ke. See also @Wikipedia,https://en.wikipedia.org/wiki/Small_ke.

三ヶ月

３ヶ

〆 is primarily used as a short form of ʃime from the verb 閉める. For example, it can be used as follows in place of the word 締切.@Wiktionary,https://en.wiktionary.org/wiki/〆#Japanese

〆切

ゟ is a ligature of the word より.

ヿ is a shorthand for the word 事.

〼 is derived from a semi-pictogram for a small wooden measuring box called masu. It later came to be used as a shorthand for the polite present-tense verb ending ます (masu), which has the same sound.

Repetition

々,〻,ゝ,ゞ,ヽ,ヾ

Japanese has a number of iteration marks that repeat the previous syllable or word. The repeated sound may differ slightly due to rendaku sound changes.

々 and 〻 are used to repeat kanji characters; the former is used in both horizontal and vertical text, and the latter (rare) only in vertical text.

人々, 島〻

There are separate marks for hiragana and katakana, and within each there are separate marks for ordinary syllables and for syllables whose consonant is voiced (ie. written with a dakuten). ゝ and ゞ denote ordinary and voiced repetition in hiragana, respectively. The katakana equivalents are ヽ and ヾ.

かゝし, たゞし, バナヽ

It is not common, but it is possible to find horizontal text that repeats the iteration mark in order to repeat multiple characters.

馬鹿々々しい

See also @Wikipedia,https://en.wikipedia.org/wiki/Iteration_mark#Japanese.
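Expanding iteration marks into the characters they stand for (eg. before searching or sorting) can be sketched as follows. This Python sketch assumes that a run of k 々 or 〻 marks repeats the k preceding characters, as in 馬鹿々々しい, and that ゝ/ヽ repeat the previous kana unchanged while ゞ/ヾ repeat it with a dakuten. Treating ゝ after an already voiced kana as an unchanged repeat is an assumption; rendaku in pronunciation is ignored, and the multi-character marks 〱..〵 are not handled. The function names are hypothetical.

```python
import unicodedata

DAKUTEN = "\u3099"       # combining voiced sound mark
HANDAKUTEN = "\u309A"    # combining semi-voiced sound mark
KANJI_MARKS = "々〻"

def _voiced(kana: str) -> str:
    """Return kana with a dakuten, recomposed where Unicode has a precomposed form."""
    bare = unicodedata.normalize("NFD", kana).replace(DAKUTEN, "").replace(HANDAKUTEN, "")
    return unicodedata.normalize("NFC", bare + DAKUTEN)

def expand_iteration_marks(text: str) -> str:
    out: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in KANJI_MARKS:
            # A run of k marks repeats the k preceding characters (馬鹿々々 -> 馬鹿馬鹿).
            j = i
            while j < len(text) and text[j] in KANJI_MARKS:
                j += 1
            k = j - i
            out.extend(out[-k:] if len(out) >= k else text[i:j])
            i = j
            continue
        if ch in "ゝヽ" and out:
            out.append(out[-1])            # assumption: repeat unchanged
        elif ch in "ゞヾ" and out:
            out.append(_voiced(out[-1]))   # repeat with dakuten
        else:
            out.append(ch)
        i += 1
    return "".join(out)

print(expand_iteration_marks("人々 馬鹿々々しい いすゞ バナヽ"))
# 人人 馬鹿馬鹿しい いすず バナナ
```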

〱,〲,〳,〴,〵

Japanese also has a set of graphemes to indicate repetition of multiple characters, although they are mostly obsolete these days.@Wikipedia,https://en.wikipedia.org/wiki/Iteration_mark#Japanese They are only used in vertical text, and they take up 2 character spaces.

The Unicode Standard also provides half forms, which can be combined to span the 2 character distance.

いろ〳〵 — An iterator for multiple syllables in vertical text, made from 2 half glyphs.

離れ〲 — An iterator for multiple syllables, with dakuten, in vertical text.

Inline notes & annotations

Japanese has a few ways of representing inline notes and annotations.

Ruby

Various ways of arranging interlinear annotations alongside text fall under the rubric of ruby (named after the British type size originally used for the annotations). These include mono-ruby, jukugo-ruby, and group-ruby, and they are described in detail below.

Ruby is commonly used to indicate the pronunciation of the ideographic characters used in Japanese, since the reading often cannot be guessed and so can pose difficulties for those learning the language. In these cases mono-ruby is most common; however, a variant, jukugo-ruby, is sometimes applied to compound nouns (which are called jukugo in Japanese).

Where sequences of kanji characters do not have the same pronunciation as the sum of their parts (called jukuji), such as the following two words, group ruby is used to represent the sound.j,#h-note-109


田舎 (inaka, countryside), 今日 (kyō, today)

Ruby annotations are also used to provide brief indications of the meaning of words or characters. These annotations typically use the group-ruby approach. The most typical example of this is attaching ruby text to a kanji compound word to indicate a corresponding loan word in katakana (see fig_group_ruby).j,#id221 Group ruby is also used to indicate the reading or the meaning of a Western word used in base text, or where a synonymous Western word in Latin characters is attached as a ruby annotation to a Japanese word (see Figure 112).

The rest of this section describes features that are generally common to all forms of ruby, before we move on to examine the differences in following subsections.

All annotations appear within the standard inter-line space for the page, and don't create extra line height if they only appear on a single line. The inter-line space is usually set at an appropriate size to accommodate annotations.

Unlike Chinese, it is common to find annotations applied just to specific words, rather than annotating the whole text.

Ruby annotations normally appear above horizontal lines of text, and to the right of vertical lines. Occasionally, both phonetic and semantic annotations are applied to the same base text, in which case the annotations appear on both sides of the base. A typical scenario in these cases would be to have mono-ruby above/right of the base, and group-ruby below/left.j,#choice_of_sides_for_ruby_with_respect_to_base_characters

The character frame of kana annotations is usually half the size of that of the base character. Occasionally, annotations are compressed in one direction (depending on the direction of writing) so that 3 fit over a single base character.j,#fig2_3_10 In large text (12pt or more), such as headings, the annotation may be less than half the size of the base.j,#fig2_3_11

Mono-ruby

Mono-ruby is usually applied to kanji base characters; each base character is associated individually with its own annotation.

Annotations are normally centred over the base character in horizontal text, and centred on the middle of the base character in vertical text (called nakatsuki). An alternative, used only in vertical text, is to align the annotation with the top of the character frame of the base character (katatsuki)j,#id227, as in the righthand example in fig_jukugo.

推/すい/理/り/小/しょう/説/せつ (detective novel), horizontal

推/すい/理/り/小/しょう/説/せつ (detective novel), vertical — Mono-ruby for detective novel, in horizontal and vertical text (colouring added for illustrative purposes).

Since the annotation characters are usually half the size of the base characters, a 3-character annotation requires more space than the underlying kanji. Within the sequence, this produces a gap between the base characters, since annotations cannot overlap (see fig_mono_ruby).

At either end of the sequence, either a gap is opened up between the base character with the long annotation and its neighbour (see fig_overhang), or the annotation may overhang the neighbouring base characters. Simpler implementations produce gaps, but allow annotations to overhang any blank parts of adjacent fullwidth punctuation characters. More sophisticated applications may allow overlap of kana or other characters, though never kanjij,#232, but may also have to deal more complicated algorithms, such as balancing space on either side of the ruby sequence, or deciding what can and cannot be overlapped, and to what extent.j,#id229 j,#adjustments_of_ruby_with_length_longer_than_that_of_the_base_characters

Alternative ways of dealing with potential overhang either side of the ruby sequence.
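The simpler "open a gap" strategy can be sketched as arithmetic on advance widths. This minimal Python sketch assumes every character occupies a square em at its own size, annotations are set at half size, and no overhang onto neighbouring characters is allowed; the function name is hypothetical. For 推理小説, only 小 (annotated with the 3-character しょう) needs more than 1 em, so a quarter em opens on each side of it.

```python
def mono_ruby_advances(pairs: list[tuple[str, str]],
                       ruby_scale: float = 0.5) -> list[tuple[float, float]]:
    """For each (base, annotation) pair return (advance, base_offset) in em.

    advance is the horizontal space the pair takes; base_offset is the space
    opened before the base character so it stays centred under its annotation.
    """
    layout = []
    for base, ruby in pairs:
        base_w = float(len(base))          # one em per base character
        ruby_w = len(ruby) * ruby_scale    # annotation characters at reduced size
        advance = max(base_w, ruby_w)      # annotations may not overlap, so widen
        layout.append((advance, (advance - base_w) / 2))
    return layout

pairs = [("推", "すい"), ("理", "り"), ("小", "しょう"), ("説", "せつ")]
layout = mono_ruby_advances(pairs)
print(layout)                     # [(1.0, 0.0), (1.0, 0.0), (1.5, 0.25), (1.0, 0.0)]
print(sum(a for a, _ in layout))  # 4.5
```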

At line start or line end, long annotations do not protrude past the line edge – meaning that there will be a gap between the base character and the line edge.

Gaps produced at line end and line start by wide annotations.

Lines can be broken in the middle of a sequence of mono-ruby annotations, since an associated base and annotation are kept together.

Group ruby

Group ruby applies when the base is a sequence of characters mapped to a single annotation. The base can be a sequence of either kanji or other characters, as can the annotation.

When the annotation is shorter than the base, and the annotation is composed of kana or kanji characters, they are typically spread out with two units of equal spacing between each character and one at either end. The end space should never exceed half the width of a base character.j,#positioning_of_groupruby_with_respect_to_base_characters

When the base is shorter than the annotation, the inverse applies.
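The 1:2:…:2:1 distribution described above for a short kana or kanji annotation can be expressed as a small calculation. In this Python sketch (hypothetical function name), offsets are measured in ems of the base text from the start of the base. A single annotation character is simply centred, which is an assumption, and the rule that the end space never exceeds half a base character is applied by giving the surplus to the inner gaps.

```python
def spread_group_ruby(base_len: int, ruby_len: int,
                      ruby_scale: float = 0.5) -> list[float]:
    """Offsets (in em of the base size) of each annotation character when a
    kana/kanji annotation is shorter than its base."""
    if ruby_len < 1:
        raise ValueError("empty annotation")
    extra = base_len - ruby_len * ruby_scale
    if extra <= 0:
        raise ValueError("annotation is not shorter than the base")
    if ruby_len == 1:
        return [extra / 2]              # assumption: a lone character is centred
    unit = extra / (2 * ruby_len)       # 1 unit at each end + 2 per inner gap = 2n units
    end = min(unit, 0.5)                # end space never exceeds half a base character
    gap = (extra - 2 * end) / (ruby_len - 1)
    return [end + i * (ruby_scale + gap) for i in range(ruby_len)]

print(spread_group_ruby(4, 2))  # [0.5, 3.0]
```

For 模型/モデル (2 base characters, 3 annotation characters) the ends get 1/12 em and each inner gap 1/6 em, so the annotation exactly fills the 2 em base.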

顧客/クライアント

模型/モデル — 模型 mokei model and 顧客 kokyaku client with katakana group-ruby annotations indicating loan word alternatives.

If the annotation or the base is not kanji or kana, the text is set solid and centred relative to the other component (see fig_latin_ruby).

編集者/editor

editor/エデイター — Group-ruby involving non-Japanese text.

Overhang behaviour is the same as described for mono-ruby, as is the handling at line ends when the annotation is longer than the base.

Unlike a sequence of mono-ruby, there is no line-break opportunity inside a group-ruby.

Jukugo-ruby

Where compound nouns (jukugo) occur, special rules for arrangement of annotation characters (so-called jukugo-ruby) can make it appear that they are evenly distributed across the word (see the lefthand example in fig_jukugo), but there are rules about how much and what type of overhang are allowed, which sometimes lead to gaps (see the righthand example of fig_jukugo).

橋頭堡/きようとうほ (beachhead)

思春期/ししゅんき (puberty) — Two examples of distributed annotations in jukugo-ruby. On the right, a gap appears in the annotation because of the rules about overhang.

An important feature of jukugo-ruby is that where the full compound noun doesn't fit at the end of a line the base characters wrap one-by-one in the normal way, taking with them the appropriate annotations. The annotation for a single base character is never split across a line break.

It is up to the author whether a word that is actually a sequence of 2 compound nouns is treated as a single jukugo-ruby, or as two separate ones.

There are numerous options for overhang and arrangement of jukugo-ruby annotations. They are discussed in detail in JLReq.

Inline ruby

Where text sizes are too small for ruby characters to be easily read, the ruby annotation is typically rendered after the base text, in parentheses.

Inline annotations should normally correspond to full words, even if the sequence of base characters would otherwise be represented using mono-ruby. For example, the word 東京 should be displayed inline as 東京（とうきょう） and not as 東（とう）京（きょう）.
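A minimal Python sketch of generating this inline fallback from word-level annotations follows. The data layout (an annotated word given as a list of (base, reading) pairs, with unannotated text as plain strings) and the function name are assumptions for illustration.

```python
from typing import Union

Segment = Union[str, list[tuple[str, str]]]

def inline_ruby(segments: list[Segment]) -> str:
    parts = []
    for seg in segments:
        if isinstance(seg, str):
            parts.append(seg)                     # unannotated text
        else:
            base = "".join(b for b, _ in seg)     # whole word, eg. 東京
            reading = "".join(r for _, r in seg)  # readings joined per word
            parts.append(f"{base}（{reading}）")   # fullwidth parentheses
    return "".join(parts)

print(inline_ruby(["私は", [("東", "とう"), ("京", "きょう")], "に行く"]))
# 私は東京（とうきょう）に行く
```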

Warichu

Warichu is a method of adding notes right alongside the relevant text, used particularly in study guides, travel guides, reference books, encyclopedias and manuals. It is generally only used in vertical text, although it is occasionally used in horizontal text for study guides and encyclopedias.

The note is usually surrounded by parentheses (or rarely just spaces), and the text of the note is half the size of the main text and arranged in two parallel lines. The two parallel lines are usually set with no inter-line spacing.

The warichu lines should be as close to equal in length as possible, given the normal wrapping rules, and if there is a difference, the initial line (right side) should be the longer.

In the rare event that the warichu text breaks across more than one line (see fig_warichu on the right), both lines of the warichu on the first line of the main text should be read completely before continuing to the remainder of the note. The characters in memory follow the normal reading sequence (and use normal characters, too), but the application needs to rearrange the visual order around the line break.
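The two rules above (balance the lines, with the first line longer; fill both lines on the first main line before continuing) can be sketched as follows. This Python sketch ignores the normal wrapping rules such as kinsoku, assumes the remainder of a broken note fits on the next main line, and uses a hypothetical first_capacity parameter for the number of half-size characters available per warichu line on the first main line.

```python
import math
from typing import Optional

def _balance(note: str) -> tuple[str, str]:
    # The first (right-hand, in vertical text) line takes any extra character.
    cut = math.ceil(len(note) / 2)
    return note[:cut], note[cut:]

def split_warichu(note: str, first_capacity: Optional[int] = None) -> list[tuple[str, str]]:
    if first_capacity is None or len(note) <= 2 * first_capacity:
        return [_balance(note)]
    head, rest = note[: 2 * first_capacity], note[2 * first_capacity:]
    # Both warichu lines on the first main line are filled completely, in
    # reading order, before the note continues on the next main line.
    return [(head[:first_capacity], head[first_capacity:]), _balance(rest)]

print(split_warichu("あいうえお"))  # [('あいう', 'えお')]
```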

Text decoration & other inline features

Text decoration

Underlines may be used to emphasise words or phrases. (Emphasis can also be indicated in other ways, such as using dots alongside the line – see Emphasis. This section focuses on the practical mechanics of underlining.)

When lines or other text decoration are used, they normally appear below horizontal text, and to the right of vertical text (unlike Chinese, where the line is to the left of vertical text).

The line should occur immediately outside the boundary of the character box.

If a line of Japanese text contains some text in another language and orthography, the position of any text decoration should follow the Japanese conventions.

Other punctuation

Observation: This section needs to be edited. Some punctuation marks should be discussed in other sections, and more explanations are needed for those items that remain.

CLDR 31 lists the following punctuation characters for Japanese. First the fullwidth forms of normal characters.

－,，,、,；,：,！,？,．,＇,＂,（,）,［,］,｛,｝,＠,＊,／,＼,＆,＃,％

Then the halfwidth forms.

･,、,､,。,｡,｢,｣

And finally, the other punctuation.

-,‐,—,―,〜,,,、,;,:,!,?,.,‥,…,。,‘,’,",“,”,(,),[,],{,},〈,〉,《,》,「,」,『,』,【,】,〔,〕,‖,§,¶,@,*,/,\,&,#,%,‰,†,‡,′,″,〃,※

The katakana block contains two additional punctuation marks.

・,゠

・ is used to separate words when writing non-Japanese phrases.uk,720

゠ is a delimiter occasionally used in analyzed Katakana or Hiragana textual material.

The hiragana block contains some combining and modifier characters used to represent dakuten and han-dakuten for compatibility with older systems.

゙,゚,゛,゜
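The difference between the combining and the spacing forms becomes visible under Unicode normalization. This short sketch uses Python's standard `unicodedata` module. Under NFC, the combining mark U+3099 composes with a preceding kana into the precomposed voiced kana. The spacing U+309B has no canonical composition and is left alone. The helper name `compose` is this sketch's own.

```python
import unicodedata

COMBINING_DAKUTEN = "\u3099"     # COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
COMBINING_HANDAKUTEN = "\u309a"  # COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
SPACING_DAKUTEN = "\u309b"       # KATAKANA-HIRAGANA VOICED SOUND MARK (spacing form)

def compose(text: str) -> str:
    """Return the canonically composed (NFC) form of text."""
    return unicodedata.normalize("NFC", text)

# か + combining dakuten composes to the single character が.
voiced = compose("か" + COMBINING_DAKUTEN)
# は + combining handakuten composes to ぱ.
semi_voiced = compose("は" + COMBINING_HANDAKUTEN)
# か + spacing dakuten has no canonical composition, so two characters remain.
spacing = compose("か" + SPACING_DAKUTEN)
# NFD goes the other way and splits が into か + U+3099.
decomposed = unicodedata.normalize("NFD", "が")

print(voiced, len(voiced), semi_voiced, len(spacing))
```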

The kana blocks each have two iteration marks for repeating the preceding syllable. One repeats it without dakuten (unvoiced), and the other repeats it with dakuten (voiced). The list below shows the hiragana marks first, then the katakana.

ゝ,ゞ,ヽ,ヾ
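The following is a minimal sketch of expanding these marks into the syllables they stand for. It assumes, as described above, that ゝ and ヽ repeat the previous kana without dakuten, and that ゞ and ヾ repeat it with dakuten. Unicode normalization is used to remove or add the voicing mark. The sketch ignores the multi-kana iteration marks used in vertical text. The function names are hypothetical.

```python
import unicodedata

VOICING_MARKS = {"\u3099", "\u309a"}  # combining dakuten and handakuten

def strip_voicing(kana: str) -> str:
    """Remove dakuten/handakuten from a single kana, e.g. ず -> す."""
    base = "".join(c for c in unicodedata.normalize("NFD", kana) if c not in VOICING_MARKS)
    return unicodedata.normalize("NFC", base)

def expand_iteration_marks(text: str) -> str:
    """Replace ゝゞヽヾ with the kana they repeat (simplified model)."""
    out = []
    for ch in text:
        if ch in "ゝヽゞヾ" and out:
            base = strip_voicing(out[-1])
            if ch in "ゞヾ":
                # Voiced iteration mark: add a combining dakuten and recompose.
                base = unicodedata.normalize("NFC", base + "\u3099")
            out.append(base)
        else:
            # Ordinary characters, or a mark with nothing before it, are kept as-is.
            out.append(ch)
    return "".join(out)

print(expand_iteration_marks("こゝろ"), expand_iteration_marks("いすゞ"))
```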

Unicode also has U+3000 IDEOGRAPHIC SPACE for occasions where a fullwidth space is needed.

Line & paragraph layout

Line breaking & hyphenation

Lines are normally wrapped between characters – word boundaries usually have no significance for the wrapping. However, occasionally there is a preference to wrap text at word boundaries, eg. to better balance headings.

Line start/end rules

Kanji characters have the ID line-break property. This means that a line can ordinarily break before or after any of them, including between two ideographic characters. Note that this class also includes characters other than Han ideographs.

Kinsoku rules. Japanese line breaking should also respect a set of rules, called kinsoku rules, which dictate which characters cannot appear at the start or end of a line. The set of characters affected by these rules varies slightly from application to application. The two figures below show examples of the kinds of punctuation involved.

’ ” ）》〉】〗〕］｝
。．、，・：；？！ヽヾゝゞ々 % º

Characters typically not permitted at the start of a line.

‘ “ （［〔《〈【｛
¥ $ £

Characters typically not permitted at the end of a line.


There are a number of ways to handle a line break that would otherwise violate these rules:

Wrap the previous character to the next line with the punctuation.
Leave the punctuation character protruding into the margin (if there is one).
Ignore the kinsoku rules.

Where a gap appears at the end of a line, full justification is usually restored by adding space across the line (see justification).

Small kana. These kinsoku rules may also be used to prevent small kana characters from appearing at the start of a line. However, whether this is done depends much more on the preferences of the author or publisher. For example, the rule may be ignored in narrow newspaper columns.

人間は、理性と良心とを授けられており互いに同胞の精神をもって行動しなければならない — Kinsoku rules used to prevent a small kana character appearing alone at the start of a line.

Hyphenation

There is no hyphenation at line-breaks for Japanese text.

Text alignment & justification

The preferred arrangement of characters on a line is solid set, ie. each character frame immediately follows the previous one, and all frames have the same width. In principle, in books where the line length is fixed at a whole number of characters, paragraphs composed of kanji and kana characters don't need to be justified. Lines break as soon as they are full, and the characters of the whole paragraph line up on a grid, both vertically and horizontally.

However, a number of factors may make justification necessary from time to time:

- Kinsoku rules. Punctuation that must not start a line takes the last character of the previous line with it, so that it doesn't begin a line on its own. This leaves the previous line one character short.
- Web-based text. Windows can be resized, so the width of a line no longer exactly equals the sum of the widths of the characters on it.
- Romaji text. A line may contain proportionally-spaced romaji text, which breaks the grid.

Japanese justifies text using a complex set of rules which adjust the space between characters on a line. Some characters are adjusted before others. Typically in character-based justification, rules are applied to different types of character in successive waves. For example, the algorithm may attempt to reduce the spacing around punctuation first, and only when more adjustment is needed turn to adjusting the spacing between ideographs.
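The wave-based approach can be illustrated with a small Python sketch. Each wave is only drawn on once all earlier waves have reached their limits. The wave names, their per-opportunity caps and the example numbers are hypothetical. Real rule sets, such as those in JLREQ, distinguish many more character classes and priorities.

```python
from typing import Dict, List, Tuple

def adjust_in_waves(amount: float, waves: List[Tuple[str, int, float]]) -> Dict[str, float]:
    """Spread `amount` em of adjustment over successive waves.

    Each wave is (name, number_of_opportunities, max_em_per_opportunity).
    Returns the adjustment applied at each opportunity, per wave.
    """
    per_opportunity = {}
    remaining = amount
    for name, count, cap in waves:
        used = min(remaining, count * cap)
        per_opportunity[name] = used / count if count else 0.0
        remaining -= used
    if remaining > 1e-9:
        raise ValueError("the line cannot absorb this adjustment")
    return per_opportunity

# Hypothetical line: 2 punctuation marks with 0.5em of compressible blank each,
# then 10 gaps between ideographs that may each shrink by up to 0.1em.
waves = [("punctuation", 2, 0.5), ("ideographs", 10, 0.1)]
print(adjust_in_waves(0.75, waves))  # punctuation alone absorbs this
print(adjust_in_waves(1.5, waves))   # punctuation is exhausted; ideographs take the rest
```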

In situations where each line in a set contains self-contained text, such as in table cells, the line content may be stretched to fit the line width. In this case it is typical to place the first and last characters at the line start and end, respectively, and then distribute the remaining space equally between all adjacent characters. This can result in large gaps, including lines where two characters sit at opposite ends of the line with nothing between them. See fig_distributed_spacing.

Distributed spacing example. — Evenly distributed spaces across a line in a table. Source`j,#id25`
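The arithmetic behind distributed spacing is simple. With n characters of width w on a line of width W, each of the n - 1 gaps receives (W - nw)/(n - 1). Here is a minimal Python sketch; the function name is hypothetical, and all widths are in em.

```python
from typing import List

def distributed_offsets(n_chars: int, line_width: float, char_width: float = 1.0) -> List[float]:
    """Start offset of each character when the first and last characters sit
    at the line edges and the remaining space is shared equally between
    every pair of adjacent characters."""
    if n_chars < 2:
        # Assumption: a single character stays at the line start (not covered above).
        return [0.0] * n_chars
    gap = (line_width - n_chars * char_width) / (n_chars - 1)
    return [i * (char_width + gap) for i in range(n_chars)]

# Two characters in a 10em cell end up at opposite ends of the line.
print(distributed_offsets(2, 10))
print(distributed_offsets(4, 10))
```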

Paragraph alignment

It is common for the start of a paragraph in Japanese text to be indicated by indenting the first line, rather than adding inter-paragraph leading. The indentation is generally one full character width, although there are complications when a line begins with certain characters, such as brackets or parentheses.

Text spacing

This section looks at ways in which spacing is applied between characters over and above that which is introduced during basic justification. That said, the text spacing techniques described here may also be used or folded in when creating a fully justified paragraph.

Inter-character letter-spacing

Letter-spacing is used to achieve balance between items with large and small numbers of characters, such as headings, running heads, and captions. When expanding text, equal amounts of space are added between the character frames of the item with the smaller number of characters.

Examples of headings where letter spacing has been applied.

Reducing inter-character spacing. Although solid set text is normally best for readability, in large print sizes, such as for magazine headings, it may be desirable to reduce the distance between certain characters. This is typically done by reducing the distance between adjacent letter faces.

Sometimes, text may also be kerned by overlapping the character frames by a regular amount across a whole line.

To create inter-character spacing in HTML use the following CSS styling, with the appropriate value.


letter-spacing: 0.1em;

A negative value can be used to reduce the inter-character spacing.

Spacing around alphabetic or numeric phrases

When a run of romaji or ASCII numerals appears in text, it is often set off from the surrounding kanji/kana letters by a small gap.

The amount of spacing can vary. JLREQj,#id209 suggests a ¼em space, but sometimes other spaces may be appropriate, such as ⅙em.gh

A Japanese paragraph where text-spacing has and has not been applied. (source)

Such spacing is not needed when the phrase is followed or preceded by punctuation that already has built-in space. Nor is it added at the start or end of a line.

To achieve this in web pages use the text-autospace CSS property in your style sheet; don't use space characters. For full details of the options available see the CSS spec.

By default, the browser should automatically insert a gap between runs of ideographs and runs of non-ideographic letters or numerals. The size of the gap depends on the browser; however, the CSS spec suggests 1/8 of the width of an ideographic character.

There are 2 ways in which CSS can add gaps. If the text already contains gaps produced using ordinary space characters, the CSS will, by default, only add gaps where there are no spaces. If, on the other hand, you want to reduce the width of those space-based gaps, or apply even spacing throughout, then use the replace value. text-autospace:ideograph-alpha ideograph-numeric replace; will remove the space characters at these boundaries and replace them with a standard-width gap. It will also create gaps where no space character had been used.
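The boundary detection that text-autospace performs can be sketched in Python. This is not the browser or CSS algorithm. The character classes are a simplifying assumption: hiragana, katakana and the basic CJK Unified Ideographs block count as ideographs, and ASCII letters and digits count as alphanumerics. A visible marker stands in for the rendered gap. The hypothetical `replace` flag mimics the replace value by first dropping ordinary spaces at such boundaries.

```python
import re

# Simplified character classes (an assumption for this sketch, not the CSS definitions).
IDEOGRAPH = "\u3040-\u30ff\u4e00-\u9fff"  # hiragana, katakana, basic CJK ideographs
ALNUM = "A-Za-z0-9"                       # ASCII letters and digits

# A boundary is a position with an ideograph on one side and a letter/digit on the other.
BOUNDARY = re.compile(f"(?<=[{IDEOGRAPH}])(?=[{ALNUM}])|(?<=[{ALNUM}])(?=[{IDEOGRAPH}])")
# Ordinary spaces sitting between an ideograph and a letter/digit.
SPACED_BOUNDARY = re.compile(f"(?<=[{IDEOGRAPH}]) +(?=[{ALNUM}])|(?<=[{ALNUM}]) +(?=[{IDEOGRAPH}])")

def autospace(text: str, gap: str = "|", replace: bool = False) -> str:
    """Insert `gap` wherever an autospace gap would be drawn."""
    if replace:
        # Like the replace value: drop the spaces so a standard gap is used instead.
        text = SPACED_BOUNDARY.sub("", text)
    # By default an existing space blocks the gap, because BOUNDARY only
    # matches where the two runs are directly adjacent.
    return BOUNDARY.sub(gap, text)

print(autospace("第3章"))
print(autospace("Unicode 文字"))
print(autospace("Unicode 文字", replace=True))
```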

To remove all synthesised gaps (but leave any manually-typed space characters in place) use text-autospace:no-autospace.

The other values can be used to tweak the results as follows.

normal (Not yet supported by browsers.)§: Restores the default setting of the browser, ie. it creates a gap between runs of ideographs and runs of non-ideographic letters or numerals, but only where an ordinary space has not been used.
ideograph-alpha or ideograph-numeric (Not yet supported by browsers.)§: Either value can be used on its own, so that gaps are only inserted next to letters, or only next to numerals, respectively.
auto (Not yet supported by browsers.)§: Gives over control of autospace behaviour to the browser, assuming that the browser has implemented special rules that differ from the normal case. Note that if you use this value your text may look different from browser to browser.

Most of the time you will probably want to use the following:
text-autospace: ideograph-alpha ideograph-numeric replace;

Spacing around punctuation

Punctuation such as full stop, comma, parentheses, etc. normally has built-in space associated with it because the ink takes up only a part of the em square. However, in some situations, the blank space is not appropriate.

When text is arranged on a strict grid pattern, none of this space removal applies.

Sequences of punctuation. One such example is when multiple punctuation marks appear side by side. fig_text_space_adjacent shows how space can be removed between a fullwidth comma and fullwidth bracket to reduce large blank spaces. It shouldn't be necessary to use halfwidth characters for this; you should use normal characters and the application should remove the appropriate amount of space automatically.

Space has been removed on the left to make the text more readable.

It is not yet possible to control this in web pages, but the CSS Text spec proposes a way forward using the text-spacing CSS property^§. The relevant property values are:

trim-adjacent (Not yet supported by browsers.): Collapses spacing between punctuation glyphs.
space-adjacent (Not yet supported by browsers.): Keeps the space before fullwidth opening punctuation when not at the start of the line. Keeps the space after fullwidth closing punctuation when not at the end of the line.

Eg. the following collapses spaces between punctuation marks:
text-spacing: trim-adjacent;
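The effect of collapsing adjacent blanks on line length can be shown with a simplified model in Python. This is not the CSS trim-adjacent definition. The model handles only the case described above, where the blank after one mark meets the blank before the next, and keeps just one of them. JLREQ and the CSS spec cover further cases, such as consecutive opening brackets, that this sketch ignores. The blank values and the function name are illustrative assumptions.

```python
# Blank space (em) before and after the ink of a few fullwidth marks, each set
# in a 1em box. Simplified values; any other character has no blank.
BLANKS = {
    "、": (0.0, 0.5), "。": (0.0, 0.5),  # comma, full stop: blank after the ink
    "」": (0.0, 0.5), "）": (0.0, 0.5),  # closing brackets: blank after the ink
    "「": (0.5, 0.0), "（": (0.5, 0.0),  # opening brackets: blank before the ink
}

def advance(text: str, trim_adjacent: bool = True) -> float:
    """Total advance in em of `text`, with every character in a 1em box."""
    total = float(len(text))
    if trim_adjacent:
        for left, right in zip(text, text[1:]):
            after = BLANKS.get(left, (0.0, 0.0))[1]
            before = BLANKS.get(right, (0.0, 0.0))[0]
            # Where two blank halves meet, keep only one of them.
            total -= min(after, before)
    return total

print(advance("、「"), advance("、「", trim_adjacent=False))
```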

Line-initial punctuation. Similarly, space may be removed from punctuation at the start or the end of a line. If we use a bracket as an example, the ink of the bracket should be flush with the line start when that bracket occurs inside a paragraph. Where paragraphs are separated by a blank line, the bracket at the start of the first line should also be flush with the left edge of the text.

It is common, however, to have no blank line before a Japanese paragraph, but instead to indent the paragraph's first line. Usually this indent is the width of one fullwidth character. If the line begins with a punctuation mark such as a bracket, the empty space that usually precedes a fullwidth bracket is still dropped, but the line is set so that the glyph hangs into the indent. Visually, the bracket then looks as if it is preceded by a half-width space. fig_text_space_para shows examples of this.

Space is removed from a bracket at the start of a line, but not at the beginning of a paragraph. (source)

It is not yet possible to achieve this in web pages, but the CSS Text spec proposes a way forward using the text-spacing CSS property^§.

A typical way of setting styling for indented paragraphs would therefore include something like this:

p {
  margin: 0;
  text-indent: 1em;
  text-spacing: trim-start;
  hanging-punctuation: first;
  }

The relevant property values are:

text-spacing: trim-start (Not yet supported by browsers.): Sets fullwidth opening punctuation flush (ie. removes the leading space from fullwidth glyphs) at the start of each line.
text-spacing: space-start (Not yet supported by browsers.): Keeps the space before all fullwidth opening punctuation at the beginning of every line (ie. full-width glyphs).
text-indent: 1em: Indents the first line of a paragraph by 1 character space.
hanging-punctuation: first (Not yet supported by browsers.): An opening bracket or quote at the start of the first formatted line of an element hangs. This applies to all characters in the Unicode categories Ps, Pf, Pi plus the ASCII quote marks ' U+0027 APOSTROPHE and " U+0022 QUOTATION MARK.
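The character test quoted above for hanging-punctuation: first can be checked with Python's standard `unicodedata` module. The hypothetical `can_hang_first` function only decides whether a character is eligible to hang. Whether it actually hangs also depends on its position at the start of the element's first formatted line, which this sketch does not model.

```python
import unicodedata

# General categories named in the description above: open punctuation (Ps),
# initial quote (Pi) and final quote (Pf).
HANGING_CATEGORIES = {"Ps", "Pi", "Pf"}
ASCII_QUOTES = {"'", '"'}  # U+0027 APOSTROPHE, U+0022 QUOTATION MARK

def can_hang_first(ch: str) -> bool:
    """True if `ch` is eligible to hang under hanging-punctuation: first."""
    return unicodedata.category(ch) in HANGING_CATEGORIES or ch in ASCII_QUOTES

for ch in "「（“」あ":
    print(ch, unicodedata.category(ch), can_hang_first(ch))
```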

In some cases, paragraphs that begin with an opening bracket have been indented by keeping the full-width form of that bracket (with its leading space) rather than by indentation, while the leading space is removed from other brackets in the paragraph. Paragraphs that don't begin with a bracket-like punctuation mark will then typically be indented with an ideographic space character rather than with styling, because styled line indentation cannot behave differently depending on whether a line starts with a bracket. This approach is not recommended, because it prevents authors from changing the behaviour simply by changing the styling. However, to provide a workaround for legacy text in this situation, CSS proposes another value:

space-first (Not yet supported by browsers, and not recommended for normal use.): Behaves as space-start on the first line of the block container and on each line after a forced line break, but as trim-start on all other lines.

Line-final punctuation. It is often useful to remove trailing space from a fullwidth punctuation glyph if it allows that character to fit at the end of a line (rather than wrapping it to the next line).

Again, it is not yet possible to achieve this in web pages, but the CSS Text spec proposes a way forward using the text-spacing CSS property^§. The relevant property values are:

allow-end (Not yet supported by browsers.): Removes trailing space from fullwidth closing punctuation at the end of each line if it does not otherwise fit prior to justification; otherwise sets the punctuation with full-width glyphs.
trim-end (Not yet supported by browsers.): Removes trailing space from fullwidth closing punctuation at the end of every line.
space-end (Not yet supported by browsers.): Keeps the space after all fullwidth closing punctuation at the end of every line (ie. full-width glyphs).
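The difference between these three values can be expressed as a small decision function. This is a simplified model: the closing mark is 1em wide with a trailing 0.5em blank, and the width of the rest of the line is known. The function name and mode strings mirror the property values above, but they are a hypothetical illustration, not a CSS API.

```python
from typing import Optional

def closing_mark_width(before_em: float, line_em: float, mode: str) -> Optional[float]:
    """Width (em) at which a fullwidth closing mark is set at the end of a line,
    or None if it does not fit and must wrap to the next line.

    `before_em` is the width of the line content before the mark.
    """
    full, trimmed = 1.0, 0.5
    if mode == "space-end":
        width = full          # always keep the trailing blank
    elif mode == "trim-end":
        width = trimmed       # always drop the trailing blank
    elif mode == "allow-end":
        # Drop the blank only if that is what lets the mark fit.
        width = full if before_em + full <= line_em else trimmed
    else:
        raise ValueError(f"unknown mode: {mode}")
    return width if before_em + width <= line_em else None

print(closing_mark_width(19.5, 20, "allow-end"), closing_mark_width(19.5, 20, "space-end"))
```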

Baselines, line height, etc.

The standard baseline for kanji and kana characters is slightly lower than the alphabetic baseline used for Latin characters. Mixed script text needs to align baselines correctly.

fig_baselines shows metrics for the Hiragino Mincho Pro font. In this font the Japanese letters reach slightly higher than the Latin ascenders, but do not extend as low as the Latin descenders.

qhx国的性もと — Font metrics for text in the Hiragino Mincho Pro font.

Japanese characters have no ascenders or descenders, but occupy the square space described earlier. Some characters use more of the square space than others, as can be seen in fig_baselines.

Counters, lists, etc.

You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.

Japanese uses kanji numerals, hiragana and katakana letters, and European digits for list counters. In modern text, European digits appear to be becoming more common.@,https://github.com/w3c/i18n-drafts/issues/621#issuecomment-2502734413

Kanji or kana characters are commonly used to create 1 fixed, 4 alphabetic, and 2 additive styles.

Vertical counters

Vertically-set lists have a number of interesting layout possibilities. Much of the following information comes from Toshi Kobayashi@,https://github.com/w3c/i18n-drafts/issues/621#issuecomment-2502734413, and Taro Yamamoto@,https://github.com/w3c/i18n-drafts/issues/620 of the W3C Japanese Layout task force.

Vertically-set lists using kanji typically have 、 below the counter. If multiple kanji characters are required, the counter is likely to extend downwards, pushing the start of the actual text lower.

Vertically-set texts using European digits may have parentheses on either side, may be encircled, or may be followed by an ASCII full stop (not a Han full stop) on the same line. The full stop may also appear below the number, but full stops are no longer very common in counters. Another common pattern is to have parentheses above and below the digit(s). The digits are usually upright.

The list itself tends to be indented, and the counter may also have an additional indentation. Also, there is typically a gap after (below) the counter.

For examples of these patterns, along with guidance on how to create them in CSS, see the article How to make list markers stand upright in vertical text.

Fixed

Fixed counter styles have a finite length because they are based on a limited set of Unicode characters. In vertical text these counters are upright by default, whereas counters made from ASCII digits need to be rotated so that they stand upright. Full-width European digits, which also stand upright in vertical text, are among the more common choices. Unicode has various sets of numbers that can be useful here.

For the circled-decimal numeric style Unicode provides characters for 0 to 50.

⓪,①,②,③,④,⑤,⑥,⑦,⑧,⑨,⑩,⑪,⑫,⑬,⑭,⑮,⑯,⑰,⑱,⑲,⑳,㉑,㉒,㉓,㉔,㉕,㉖,㉗,㉘,㉙,㉚,㉛,㉜,㉝,㉞,㉟,㊱,㊲,㊳,㊴,㊵,㊶,㊷,㊸,㊹,㊺,㊻,㊼,㊽,㊾,㊿

For the dotted-decimal numeric style Unicode provides precomposed characters from 1 to 20.

⒈,⒉,⒊,⒋,⒌,⒍,⒎,⒏,⒐,⒑,⒒,⒓,⒔,⒕,⒖,⒗,⒘,⒙,⒚,⒛

The circled-katakana fixed style uses the following letters. The suffix is a space, and the numbers run from 1 to 47.

㋐,㋑,㋒,㋓,㋔,㋕,㋖,㋗,㋘,㋙,㋚,㋛,㋜,㋝,㋞,㋟,㋠,㋡,㋢,㋣,㋤,㋥,㋦,㋧,㋨,㋩,㋪,㋫,㋬,㋭,㋮,㋯,㋰,㋱,㋲,㋳,㋴,㋵,㋶,㋷,㋸,㋹,㋺,㋻,㋼,㋽,㋾
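Because the fixed styles draw on a few contiguous Unicode ranges, each can be generated from code point arithmetic. The following is a minimal sketch with hypothetical function names. Outside each range it falls back to plain decimal digits. That fallback is an assumption of this sketch, since CSS lets each counter style name its own fallback.

```python
def circled_decimal(n: int) -> str:
    """circled-decimal: ⓪ and ①-㊿, spread over three Unicode ranges."""
    if n == 0:
        return "\u24ea"              # ⓪ CIRCLED DIGIT ZERO
    if 1 <= n <= 20:
        return chr(0x2460 + n - 1)   # ① U+2460 .. ⑳ U+2473
    if 21 <= n <= 35:
        return chr(0x3251 + n - 21)  # ㉑ U+3251 .. ㉟ U+325F
    if 36 <= n <= 50:
        return chr(0x32B1 + n - 36)  # ㊱ U+32B1 .. ㊿ U+32BF
    return str(n)                    # outside the fixed range: fall back

def dotted_decimal(n: int) -> str:
    """dotted-decimal: precomposed ⒈ U+2488 .. ⒛ U+249B."""
    return chr(0x2488 + n - 1) if 1 <= n <= 20 else str(n)

def circled_katakana(n: int) -> str:
    """circled-katakana: ㋐ U+32D0 .. ㋾ U+32FE (47 letters)."""
    return chr(0x32D0 + n - 1) if 1 <= n <= 47 else str(n)

print(" ".join(circled_decimal(n) for n in (0, 1, 20, 21, 36, 50, 51)))
```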

Alphabetic

The alphabetic styles all use 、 as a suffix (with no following space).

The hiragana alphabetic style uses these 48 hiragana characters in the order in which they are typically arranged.

あ,い,う,え,お,か,き,く,け,こ,さ,し,す,せ,そ,た,ち,つ,て,と,な,に,ぬ,ね,の,は,ひ,ふ,へ,ほ,ま,み,む,め,も,や,ゆ,よ,ら,り,る,れ,ろ,わ,ゐ,ゑ,を,ん

Examples:

あ,い,う,え,さ,に,む,わ,いそ,えほ,かゐ,けし

The hiragana-iroha alphabetic style uses 47 hiragana characters in the order shown just below. The iroha style ordering is based on the order of characters in a pangram poem dating from the Heian era (794–1185).

い,ろ,は,に,ほ,へ,と,ち,り,ぬ,る,を,わ,か,よ,た,れ,そ,つ,ね,な,ら,む,う,ゐ,の,お,く,や,ま,け,ふ,こ,え,て,あ,さ,き,ゆ,め,み,し,ゑ,ひ,も,せ,す

Examples:

い,ろ,は,に,る,ら,こ,ひ,ろれ,にえ,とに,りな

The katakana alphabetic style uses these 48 katakana characters in the order in which they are typically arranged.

ア,イ,ウ,エ,オ,カ,キ,ク,ケ,コ,サ,シ,ス,セ,ソ,タ,チ,ツ,テ,ト,ナ,ニ,ヌ,ネ,ノ,ハ,ヒ,フ,ヘ,ホ,マ,ミ,ム,メ,モ,ヤ,ユ,ヨ,ラ,リ,ル,レ,ロ,ワ,ヰ,ヱ,ヲ,ン

Examples:

ア,イ,ウ,エ,サ,ニ,ム,ワ,イソ,エホ,カヰ,ケシ

The katakana-iroha alphabetic style uses 47 katakana characters in the order shown just below.

イ,ロ,ハ,ニ,ホ,ヘ,ト,チ,リ,ヌ,ル,ヲ,ワ,カ,ヨ,タ,レ,ソ,ツ,ネ,ナ,ラ,ム,ウ,ヰ,ノ,オ,ク,ヤ,マ,ケ,フ,コ,エ,テ,ア,サ,キ,ユ,メ,ミ,シ,ヱ,ヒ,モ,セ,ス

Examples:

イ,ロ,ハ,ニ,ル,ラ,コ,ヒ,ロレ,ニエ,トニ,リナ
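The alphabetic system amounts to bijective base-N numbering, where N is the number of symbols: 48 for the hiragana and katakana lists, and 47 for iroha. In the sketch below, the function name and the way the katakana strings are built are this sketch's own. The hiragana symbols are copied from the lists above, and the katakana sets are derived by shifting each hiragana code point by 0x60. With these tables the function reproduces the examples above, which correspond to 1, 2, 3, 4, 11, 22, 33, 44, 111, 222, 333 and 444.

```python
HIRAGANA = "あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわゐゑをん"
HIRAGANA_IROHA = "いろはにほへとちりぬるをわかよたれそつねならむうゐのおくやまけふこえてあさきゆめみしゑひもせす"
# Each hiragana letter sits 0x60 code points below its katakana counterpart.
KATAKANA = "".join(chr(ord(c) + 0x60) for c in HIRAGANA)
KATAKANA_IROHA = "".join(chr(ord(c) + 0x60) for c in HIRAGANA_IROHA)

def alphabetic(n: int, symbols: str) -> str:
    """Alphabetic counter: 1 -> first symbol, N + 1 -> two symbols, and so on."""
    if n < 1:
        return str(n)  # alphabetic counters start at 1; fall back to digits
    out = []
    while n > 0:
        n -= 1  # shift to 0-based so there is no "zero" symbol
        out.append(symbols[n % len(symbols)])
        n //= len(symbols)
    return "".join(reversed(out))

# A list marker adds the 、 suffix described above.
print(alphabetic(111, HIRAGANA) + "、")
```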

Additive

The Japanese additive styles have a range -9,999 to 9,999 and use kanji characters only. The suffix is 、, and negative numbers are preceded by マイナス.

The japanese-informal additive style uses these characters.

九千,八千,七千,六千,五千,四千,三千,二千,千,九百,八百,七百,六百,五百,四百,三百,二百,百,九十,八十,七十,六十,五十,四十,三十,二十,十,九,八,七,六,五,四,三,二,一,〇

Examples:

一,二,三,四,十一,二十二,三十三,四十四,百十一,二百二十二,三百三十三,四百四十四

The japanese-formal additive style uses these characters.

九阡,八阡,七阡,六阡,伍阡,四阡,参阡,弐阡,壱阡,九百,八百,七百,六百,伍百,四百,参百,弐百,壱百,九拾,八拾,七拾,六拾,伍拾,四拾,参拾,弐拾,壱拾,九,八,七,六,伍,四,参,弐,壱,零

Examples:

壱,弐,参,四,壱拾壱,弐拾弐,参拾参,四拾四,壱百壱拾壱,弐百弐拾弐,参百参拾参,四百四拾四
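The additive system takes each weight, in descending order, as many times as it fits. In the sketch below the symbol tables are generated from the digit and unit characters rather than typed out, which is a choice of this sketch. They produce exactly the lists above. The names are hypothetical. Out-of-range values simply fall back to ASCII digits here, which is a simplification: the CSS definitions fall back to another counter style.

```python
from typing import List, Tuple

Table = List[Tuple[int, str]]

def build_table(digits: str, units: Tuple[str, str, str], omit_one: bool) -> Table:
    """(weight, symbol) pairs from 9000 down to 0, as in the lists above.

    `digits` holds the symbols for 0-9; `units` those for 1000, 100 and 10.
    With omit_one, 1000/100/10 are written with the unit alone (千, not 一千).
    """
    table = []
    for power, unit in zip((1000, 100, 10), units):
        for d in range(9, 0, -1):
            prefix = "" if d == 1 and omit_one else digits[d]
            table.append((d * power, prefix + unit))
    table.extend((d, digits[d]) for d in range(9, -1, -1))
    return table

JAPANESE_INFORMAL = build_table("〇一二三四五六七八九", ("千", "百", "十"), omit_one=True)
JAPANESE_FORMAL = build_table("零壱弐参四伍六七八九", ("阡", "百", "拾"), omit_one=False)

def additive(n: int, table: Table, negative: str = "マイナス") -> str:
    """Additive counter representation for the range -9999..9999."""
    if not -9999 <= n <= 9999:
        return str(n)  # simplification: plain digits outside the range
    if n < 0:
        return negative + additive(-n, table, negative)
    if n == 0:
        return next(symbol for weight, symbol in table if weight == 0)
    parts = []
    for weight, symbol in table:
        if weight == 0:
            break  # the zero symbol is only used for the value 0 itself
        count, n = divmod(n, weight)
        parts.append(symbol * count)
    return "".join(parts)

# A list marker adds the 、 suffix described above.
print(additive(2024, JAPANESE_INFORMAL) + "、")
```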

Styling initials

Large paragraph-initial characters can easily be found in Japanese content. The character typically fills a box that is the height (or width, in vertically-set text) of 2-4 lines.

An enlarged initial character at the beginning of a paragraph.

Notes, footnotes, etc

See Inline notes & annotations for purely inline annotations, such as ruby or warichu. This section is about annotation systems that separate the reference marks from the content of the notes.

※ can be used in text to set up a footnote reference, and in the footnotes themselves. It can be followed by a number when there are multiple notes, eg. ※1, ※2, etc.

Wikipedia provides the following example.

・・・、動物の性にはオスとメスがある※。・・・\・・・。\※ただし両性の動物も存在する。 — Footnote reference mark.

	labial	alveolar	post- alveolar	palatal	velar	uvular	glottal
stop	p b	t d			k ɡ
affricate		t͡s d͡z	t͡ɕ d͡ʑ
fricative	ɸ	s z	ɕ ʑ	ç			h
nasal	m	n		ɲ	ŋ	ɴ
approximant	w			j
trill/flap		r
