Tibetan orthography notes

Sample

དོན་ཚན་དང་པོ། འགྲོ་བ་མིའི་རིགས་རྒྱུད་ཡོངས་ལ་སྐྱེས་ཙམ་ཉིད་ནས་ཆེ་མཐོངས་དང༌། ཐོབ་ཐངགི་རང་དབང་འདྲ་མཉམ་དུ་ཡོད་ལ། ཁོང་ཚོར་རང་བྱུང་གི་བློ་རྩལ་དང་བསམ་ཚུལ་བཟང་པོ་འདོན་པའི་འོས་བབས་ཀྱང་ཡོད། དེ་བཞིན་ཕན་ཚུན་གཅིག་གིས་གཅིག་ལ་བུ་སྤུན་གྱི་འདུ་ཤེས་འཛིན་པའི་བྱ་སྤྱོད་ཀྱང་ལག་ལེན་བསྟར་དགོས་པ་ཡིན༎

དོན་ཚན་གཉིས་པ། སྐྱེ་བོ་རེ་རེར་གསལ་བསྒྲགས་འདི་ནང་བཀོད་པའི་ཐོབ་ཐང་དང་རང་དབང་སྟེ། མི་རིགས་དང། ཤ་མདོག། ཕོ་མོ། སྐད་ཡིག། ཆོས་ལུགས། སྲིད་དོན་བཅས་སམ། འདོད་ཚུལ་གཞནདག་དང༌། རྒྱལ་ཁབ་དང་སྤྱི་ཚོགས་ཀྱི་འབྱུང་ཁུངས་། མཁར་དབང༌། རིགས་རྒྱུད། དེ་མིན་གནས་ཚུལ་འདི་རིགས་གང་ཡང་རུང་བར་དབྱེ་འབྱེད་མེད པའི་ཐོབ་དབང་ཡོད༎ ད་དུང་རྒྱལ་ཁབ་བམ། ས་གནས་གང་ཞིག་རང་བཙན་ཡིན་པ་དང༌། ལྟ་རྟོགས་འོག་ཏུ་གནས་པ། རང་གཞུང་རང་སྐྱོང་མ་ཡིན་པ། གཞན་དག་བདག་དབང་ཚད་འཛིན་ཡོད་པ་བཅས་ཇི་ལྟར་ཡང་དེ་དག་གི་སྐྱེ་བོ་གང་ཞིག་སྐིད་དོན་དང། ཁྲིམས་དབང། རྒྱལ་སྤྱིའི་གནས་སྟངས་བཅས་ཀྱི་ཐོག་ཏུ་ཁྱད་པརམི་དབྱེ་བ་ཡིན༎

Source: Unicode UDHR, articles 1 & 2

Usage & history

Origins of the Tibetan script, 6thC – today.

Phoenician

└ Aramaic

└ Brahmi

└ Gupta

└ Tibetan

+ Sharada

+ Siddham

+ Kalinga

+ Bhaiksuki

The Tibetan script is used for writing the Tibetan, Dzongkha, Ladakhi and Sikkimese languages, spoken in Tibet, Bhutan, Nepal and India. It is also used for transcribing religious Sanskrit texts. Language speakers number around 6,000,000.

བོད་སྐད

ལྷ་སའི་སྐད

བོད་ཡིག

While the exact origin of the script is not clear (other than that it is a derivative of the Brahmi script), tradition says that it was developed by Thonmi Sambhota after a visit to India in the mid-7th century to study the art of writing.

The creation of the Tibetan alphabet is attributed to Thonmi Sambhota of the mid-7th century. Tradition holds that Thonmi Sambhota, a minister of Songtsen Gampo (569-649), was sent to India to study the art of writing, and upon his return introduced the alphabet. The form of the letters is based on an Indic alphabet of that period.

More information: Wikipedia

Basic features

The Tibetan script is an abugida, ie. each consonant contains an inherent vowel sound. See the table to the right for a brief overview of features for the modern Tibetan orthography.

Tibetan can be written using two different styles: དབུ་ཅན dbu can with a head, the block style of the Tibetan script used in print, pronounced u.cen; and དབུ་མེད dbu med headless, the cursive style of the Tibetan script used in shorthand and calligraphy, pronounced u.me. This page concentrates on the former. Pronunciations are based on the central, Lhasa dialect.

Historically, Tibetan text was written on loose-leaf sheets called pechas (དཔེ་ཆ). Some of the characters used and formatting approaches are different in books and pechas.

Tibetan text runs left to right in horizontal lines. Word boundaries are not indicated. However, Tibetan words are made up of one or more units called tsheg-bar which are roughly equivalent to phonological syllables. The tsheg-bar units are separated using ་ (tsheg). Line breaks occur after the tsheg, and never inside a tsheg-bar, and it is preferable to avoid breaking a line in the middle of the word.
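
As a rough illustration of the line-breaking rule just described, the sketch below wraps a string so that breaks fall only after U+0F0B and never inside a tsheg-bar. The function name `wrap_tibetan` is hypothetical, and measuring width in code points is an assumption made for brevity; a real layout engine measures rendered glyphs and would also try to keep multi-syllable words together.

```python
TSHEG = "\u0F0B"  # TIBETAN MARK INTERSYLLABIC TSHEG


def wrap_tibetan(text, max_len):
    """Greedy wrap that only breaks after a tsheg, never inside a tsheg-bar.

    max_len counts code points, a crude stand-in for measured line width.
    """
    # Split into chunks that each end with a tsheg (the final chunk may not).
    chunks = []
    current = ""
    for ch in text:
        current += ch
        if ch == TSHEG:
            chunks.append(current)
            current = ""
    if current:
        chunks.append(current)

    lines = []
    line = ""
    for chunk in chunks:
        # Start a new line when the chunk would overflow a non-empty line.
        if line and len(line) + len(chunk) > max_len:
            lines.append(line)
            line = ""
        line += chunk
    if line:
        lines.append(line)
    return lines
```

A chunk longer than `max_len` is left on a line of its own rather than being split, which matches the rule that a tsheg-bar is never broken.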

The tsheg-bar units are composed of structural elements that include a root surrounded by vowel signs and consonants used as prefixes, subscripts, superscripts, suffixes, and secondary suffixes. A common scenario includes a stack and additional consonants to either side of the root consonant. These may indicate syllable-final consonant sounds, but more often than not they qualify or modify the root value, and are not associated with their nominal sound value. The actual pronunciation of Tibetan is usually much simpler than a typical romanisation would suggest. For example, the word གདུགས is transcribed as gdugs.

རྒྱུད

To write the sounds of the standard Lhasa dialect, Tibetan uses 28 consonant letters (plus 20 subjoined forms). Tibetan has 3 different code points for each of the stop and affricate sounds, to reflect combinations of tone and voicing. Two of the fricatives also have separate code points to indicate high and low tones. 6 more letters are used to write Sanskrit.

A distinguishing feature of Tibetan is the set of separate code points for the subjoined consonants in stacks, used to create consonant stacks. Of the 77 combining characters in the Tibetan block, 48 represent subjoined consonant forms. Unlike many other Indic scripts, the modern Tibetan orthography doesn't use a virama to create stacks.
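
One way to inspect the combining characters in the Tibetan block is to query the Unicode character database. The sketch below assumes Python's `unicodedata` module, treats any character with a general category starting with `M` as combining, and identifies subjoined forms by their character names. The function name is hypothetical, and the counts it returns depend on the Unicode version bundled with the interpreter, so they may not match the figures quoted above.

```python
import unicodedata

TIBETAN_BLOCK = range(0x0F00, 0x1000)  # U+0F00..U+0FFF


def tibetan_combining_summary():
    """Return (combining_count, subjoined_count) for the Tibetan block."""
    combining = 0
    subjoined = 0
    for cp in TIBETAN_BLOCK:
        ch = chr(cp)
        name = unicodedata.name(ch, "")
        if not name:
            continue  # unassigned in this interpreter's Unicode database
        if unicodedata.category(ch).startswith("M"):
            combining += 1
            if name.startswith("TIBETAN SUBJOINED"):
                subjoined += 1
    return combining, subjoined


print(tibetan_combining_summary())
```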

Tibetan is an abugida with one inherent vowel, pronounced a. When writing the Lhasa dialect, other post-consonant vowels are represented using 4 vowel signs, all combining marks. Some vowel sounds are written using a combination of vowel sign plus a root suffix.

There are no pre-base, circumgraph, or multipart vowels in the Tibetan used to write the Lhasa dialect (though there are when writing in Sanskrit).

Standalone vowels are written by adding vowel signs to either འ [U+0F60 TIBETAN LETTER -A] or ཨ [U+0F68 TIBETAN LETTER A] depending on the tone.

Sanskrit vowels written in Tibetan use additional vowel signs and combining marks, some of which represent diphthongs, and some of which form circumgraphs or multipart characters, depending on the encoding.

Tone is indicated by the choice of root character and/or its associated prefixes and superscripts.

Modern Tibetan writing uses few punctuation marks or symbols, but the Tibetan script block in Unicode contains many of these.

Tibetan has its own set of numbers.

Character index

Letters

Basic consonants

པ␣ཕ␣བ␣ཏ␣ཐ␣ད␣ཀ␣ཁ␣ག␣ཙ␣ཚ␣ཛ␣ཅ␣ཆ␣ཇ␣ས␣ཟ␣ཤ␣ཞ␣ཧ␣མ␣ན␣ཉ␣ང␣ཝ␣ར␣ལ␣ཡ

Vowels

འ␣ཨ

Other

ༀ

Extended consonants

ཪ␣ཊ␣ཋ␣ཌ␣ཎ␣ཥ

Other Tibetan block characters mentioned here

ྈ␣ྉ␣ྊ␣ྋ␣ྌ

ཫ␣ཬ␣གྷ␣ཌྷ␣དྷ␣བྷ␣ཛྷ␣ཀྵ

Combining marks

Subjoined consonants

ྤ␣ྥ␣ྦ␣ྟ␣ྠ␣ྡ␣ྐ␣ྑ␣ྒ␣ྩ␣ྪ␣ྫ␣ྕ␣ྖ␣ྗ␣ྶ␣ྯ␣ྴ␣ྮ␣ྷ␣ྨ␣ྣ␣ྙ␣ྔ␣ྭ␣ྺ␣ྲ␣ླ␣ྱ␣ྻ

Vowels

ི␣ུ␣ེ␣ོ␣ྰ␣ྸ

Sanskrit additions mentioned here

ྀ␣ཻ␣ཽ␣ྼ␣ྺ␣ྻ␣ྚ␣ྛ␣ྜ␣ྞ␣ྵ␣ཿ␣ཾ␣ྃ␣྄␣༹

Other Tibetan block characters mentioned here

༵␣༷␣༾␣༿␣ཱ␣ཷ␣ཹ

ཱི␣ཱྀ␣ཱུ␣ཱི␣ཱྀ␣ཱུ␣ྲྀ␣ླྀ␣ཷ␣ཹ␣ྒྷ␣ྜྷ␣ྡྷ␣ྦྷ␣ྫྷ␣ྐྵ

Numbers

༠␣༡␣༢␣༣␣༤␣༥␣༦␣༧␣༨␣༩

Other Tibetan block characters mentioned here

༪␣༫␣༬␣༭␣༮␣༯␣༰␣༱␣༲␣༳

Punctuation

་␣༌␣།␣༎␣༈␣༄␣༅␣༼␣༽␣༑␣༔␣྅␣〈␣〉␣《␣》

Other Tibetan block characters mentioned here

༑␣༔␣྅

Symbols

༁␣༂␣༃␣༴␣༶␣༸␣྾␣྿

Other

To be investigated

!␣%␣(␣)␣,␣:␣;␣?␣[␣]␣§␣ʼ␣͏␣༆␣༇␣༉␣༊␣༏␣༐␣༒␣༓␣༺␣༻␣ྂ␣྇␣ྍ␣ྎ␣ྏ␣࿄␣࿅␣࿆␣࿇␣࿈␣࿉␣࿊␣࿋␣࿌␣࿐␣࿑␣࿒␣࿓␣࿔␣࿕␣࿖␣࿗␣࿘␣࿙␣࿚␣␣‌␣‍␣‑␣–␣—␣‘␣’␣“␣”␣†␣‡␣…␣‰␣′␣″␣⁠

Phonology

These sounds are for the modern Lhasa dialect of Tibetan.

Phones in a lighter colour are non-native or allophones. Source Wikipedia.

Vowel sounds

Plain vowels

Complex vowels

ɑi ɑu

Consonant sounds

	labial	dental	alveolar	post- alveolar	retroflex	palatal	velar	glottal
stops	p b	t d			ʈ	c ɟ	k ɡ	ʔ
	pʰ	tʰ			ʈʰ	cʰ	kʰ
affricates		t͡s d͡z		t͡ɕ d͡ʑ	ʈ͡ʂ d͡ʐ
		t͡sʰ		t͡ɕʰ	ʈ͡ʂʰ
fricatives			s		ʂ	ɕ		h
nasals	m		n			ɲ	ŋ
other	w		l ɹ	ɹ̥ l̥		j ʎ

The sounds ʈ~ʈ͡ʂ and ʈʰ~ʈ͡ʂʰ appear to be alternative versions of the same phoneme. Wiktionary IPA transcriptions for Lhasa Tibetan use the affricate sound only.

Voiced phones occur when the syllable has a low tone. However, the dialect of the upper social strata in Lhasa does not use voiced stops and affricates.wl

Tone

Lhasa Tibetan is commonly said to have 2 tones, but in fact there are 4; the 'high' tone can have 2 distinct contours, as can the 'low' tone.

High tones	˥˥	á
High tones	˥˨	â
Low tones	˩˨	à
Low tones	˩˧˨	a᷈

Tones in Lhasa Tibetan.

Vowels

Vowel summary table

This table summarises only basic vowel to character assignments.

ⓘ represents the inherent vowel. Ⓢ represents one of 4 root suffix letters.

	post-consonant	standalone
	ི␣ུⓈ␣ ␣ུ	འི␣འུ␣␣ཨི␣ཨུ
	ེ␣ོⓈ␣ ␣ོ	འེ␣འོ␣␣ཨེ␣ཨོ
	ⓘⓈ
	ⓘ	འ␣ཨ

Inherent vowel

ཀ ka U+0F40 TIBETAN LETTER KA

a following a consonant is not written, but is seen as an inherent part of the consonant letter, so Ca is written by simply using the consonant letter, eg.

པ་ན་མ

Consonants with no following vowel

The inherent vowel is not normally pronounced at the end of a syllable which ends with a coda. These syllable-final sounds are written using one of 7 suffixes (see suffix).

Vowels after consonants

Post-consonant vowels are written using 4 vowel signs, all combining marks. Some vowel sounds are written using a combination of vowel sign plus a root suffix.

There are no pre-base, circumgraph, or multipart vowels in the Tibetan used to write the Lhasa dialect (though there are when writing in Sanskrit).

None of the vowel signs are spacing marks, ie. they don't consume horizontal space when added to a base consonant.

All vowel signs are typed and stored after a consonant or stack. The glyph rendering system takes care of the positioning at display time. Stacks are treated as indivisible units when it comes to rendering vowel signs.
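
The storage order can be checked by listing the code points of a syllable. This is a minimal sketch using Python's `unicodedata` module; the helper name `describe` is an assumption for illustration.

```python
import unicodedata


def describe(text):
    """List (code point, character name) pairs in storage order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# KA + VOWEL SIGN I: the vowel sign is stored after the consonant.
ki = "\u0F40\u0F72"

# SA + SUBJOINED PA + SUBJOINED YA + VOWEL SIGN I:
# the vowel sign follows the whole stack, not an individual member of it.
spyi = "\u0F66\u0FA4\u0FB1\u0F72"

for sample in (ki, spyi):
    for code, name in describe(sample):
        print(code, name)
    print()
```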

Vowel signs

ཀི ki U+0F40 TIBETAN LETTER KA + U+0F72 TIBETAN VOWEL SIGN I

Four of the post-consonant vowels are written using combining marks (vowel signs).

ི␣ུ␣ེ␣ོ

For example:

རི་མོ

དབུ་མེད

ས␣ྤ␣ྱ␣ི␣ར␣་

Vowel sign position relative to a stack of consonants.

Vowels using root suffixes

A number of phonemes in the Lhasa Tibetan vowel repertoire are typically produced by a combination of one of the vowel signs listed above plus a suffix, as shown in the table below.

In the last row of the table '–' represents the inherent vowel.

Vowel	is produced by	Examples
y	ུད␣ུས␣ུན␣ུལ	བདུད དངུལ ལོ་རྒྱུས བརྩོན་འགྲུས
ø	ོད␣ོས␣ོན␣ོལ	པོད བོན རོལ ཆོས
ɛ	–ད␣–ས␣–ན␣–ལ	བཞད ནས བལ ཆབ་གདན

Vowel length

In normal spoken pronunciation, a lengthening of the vowel is frequently substituted for the sounds r and l when they occur at the end of a syllable.wl

གཟའ་འཁོར

ཆོལ་ཁ

Silent suffixes such as ད and ས can also lengthen the vowel while changing its quality.

དད་པ

གཞས

ཱ

A subjoined 'a-chung is used to express long vowels in loan words (Tibetan doesn't have them natively), such as those borrowed from Chinese, Hindi and Mongolian. For example,

ཤྲཱི

ཡང་ལཱ

ཏཱ་ལའི་བླ་མ

The Unicode Standard recommends the use of 0F71 for this, rather than 0FB0.

U+0FB0 TIBETAN SUBJOINED LETTER -A ( a-chung ) should be used only in the very rare cases where a full-sized subjoined a-chung letter is required. The small vowel lengthening a-chung encoded as U+0F71 TIBETAN VOWEL SIGN AA is far more frequently used in Tibetan text, and it is therefore recommended that implementations treat this character (rather than U+0FB0) as the normal subjoined a-chung.
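
Since U+0FB0 is expected only in rare cases, a text-checking tool might simply flag its occurrences for review. The following is a minimal sketch under that assumption; the function name is hypothetical, and it deliberately does not replace anything automatically, because the full-sized form may be intended.

```python
VOWEL_SIGN_AA = "\u0F71"     # TIBETAN VOWEL SIGN AA (small lengthening a-chung)
SUBJOINED_ACHUNG = "\u0FB0"  # TIBETAN SUBJOINED LETTER -A (full-sized form)


def find_subjoined_achung(text):
    """Return the indexes where U+0FB0 occurs, for manual review."""
    return [i for i, ch in enumerate(text) if ch == SUBJOINED_ACHUNG]
```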

Nasalisation

Syllables that used to end in n in classical Tibetan are still written with the suffix ན, but nowadays the sound is realised as a nasalisation of the vowel.

ཀུན

ཉན

Standalone vowels

0F68 (called ཨ་ཆེན ạ␣ʧʰen␣ a chen) and 0F60 (called འ་ཆུང à␣ʧʰuŋ␣ 'a chung) represent the phoneme a. In the Lhasa dialect, they have a high and a low tone, respectively.

Other standalone vowels are written by attaching vowel signs to one or other of these, depending on the tone needed.

འི␣ཨི␣འུ␣ཨུ␣ ␣འེ␣ཨེ␣འོ␣ཨོ␣ ␣འ␣ཨ
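
The pattern above can be sketched as a small lookup: choose the carrier letter by tone, then append the vowel sign. The function and table names are assumptions for illustration, and the tone assignment follows the Lhasa description given here.

```python
A_CHEN = "\u0F68"   # TIBETAN LETTER A, high tone carrier
A_CHUNG = "\u0F60"  # TIBETAN LETTER -A, low tone carrier

VOWEL_SIGNS = {
    "a": "",        # inherent vowel, nothing is written
    "i": "\u0F72",  # TIBETAN VOWEL SIGN I
    "u": "\u0F74",  # TIBETAN VOWEL SIGN U
    "e": "\u0F7A",  # TIBETAN VOWEL SIGN E
    "o": "\u0F7C",  # TIBETAN VOWEL SIGN O
}


def standalone_vowel(vowel, high_tone):
    """Build the written form of a standalone vowel for the given tone."""
    carrier = A_CHEN if high_tone else A_CHUNG
    return carrier + VOWEL_SIGNS[vowel]
```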

Other uses of A-chung

Nasals

'A-chung can also represent a nasal, so the following alternative spellings may occur.

མཚམས or འཚམས mtshams boundary

མཐུན or འཐུན mtʰun agreement

'A-chung may also nasalise the juncture of two morphemes, eg.

དགེ་འདུན

Diphthongs

Other than loanwords, Tibetan only allows diphthongs in diminutive expressions. 'A-chung is used to write these, as in the following: མི མེའུ

Another example:

རྡོ རྡེའུ

Inherent vowel locator

Finally, 'a-chung can be used to disambiguate the location of an inherent vowel in a syllable. The following sequence is interpreted as CVC.

དག

To express CCV add 'a-chung, eg.

དགའ

Tones

Some Tibetan tone distinctions are reflected in the orthography by the choice of consonant letter. For example, consonant letters such as the following carry a particular default tone. This applies to stops, affricates, and some fricatives.

ཀ␣ཁ␣ག␣ང

However, the tone of these letters can be modified by additional letters in the tsheg-bar. For example, a prefix or superscript will typically change the default low tone of a nasal to a high tone. (See prefix and suffix.)

Otherwise, there are no diacritics or other ways of marking tone in the orthography (nor is tone marked in the Wylie transcriptions).

Sanskrit vowels

This section examines additional vowel signs used for transcription of Sanskrit text or foreign words, principally from Chinese and Mongolian.

Vowel signs

Circumgraphs & multipart vowels

Sanskrit extensions add vowels that involve more than one glyph per base. In 2 cases these glyphs are positioned both above and below the base or stack.

ཤྲཱི་

The Unicode Tibetan block provides precomposed characters for each of these vowels, but discourages their use.

ཱི␣ཱྀ␣ཱུ

Instead, the Standard recommends the use of decomposed sequences as follows. The decomposed sequence is produced after normalisation by both NFC and NFD methods.

ཱི␣ཱྀ␣ཱུ

Whichever approach is used, the vowel signs must be typed and stored after the consonant characters they surround. In the case of decomposed vowel signs, the order is also important and must be as shown above.
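
The normalisation behaviour described above can be observed with Python's `unicodedata.normalize`. In this sketch the mapping table is written out by hand from the Unicode decompositions of the three precomposed vowel signs, and the variable name is an assumption.

```python
import unicodedata

# Precomposed code point -> decomposed sequence.
COMPOUND_VOWELS = {
    "\u0F73": "\u0F71\u0F72",  # VOWEL SIGN II          -> AA + I
    "\u0F75": "\u0F71\u0F74",  # VOWEL SIGN UU          -> AA + U
    "\u0F81": "\u0F71\u0F80",  # VOWEL SIGN REVERSED II -> AA + REVERSED I
}

for precomposed, expected in COMPOUND_VOWELS.items():
    for form in ("NFC", "NFD"):
        result = unicodedata.normalize(form, precomposed)
        points = [f"U+{ord(c):04X}" for c in result]
        print(f"U+{ord(precomposed):04X}", form, points, result == expected)
```

Because both forms yield the decomposed sequence, text that has passed through either normalisation no longer contains the precomposed code points.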

Other vowel signs

In addition to the vowel signs just mentioned, the following combining marks are used to indicate additional vowel sounds for the Sanskrit repertoire.

ྀ␣ཻ␣ཽ

ཱ [U+0F71 TIBETAN VOWEL SIGN AA] is used here to lengthen vowels.

Consonants

Consonant summary table

The following table summarises only the nominal consonant to character assignments. Note that the dialect of the upper social strata of Lhasa doesn't include voiced sounds, and simply de-aspirates the other sound where two are shown. For additional details and sounds see consonant_mappings.

This table only shows the unmodified syllable onset values, using sounds from the Lhasa dialect. It doesn't include letters used for writing Sanskrit other than those in the right column, which are typically used to write foreign proper nouns. There's a need for more research to determine what others should be listed, and the exact IPA equivalents for each.

	native sounds	loan words
Stops	པ␣ཏ␣ཀ	ཊ␣ཋ␣ཌ
	ཕ␣ཐ␣ཁ
	བ␣ད␣ག
Affricates	ཙ␣ཅ
	ཚ␣ཆ
	ཛ␣ཇ
Fricatives	ས␣ཤ␣ར␣ཧ
Fricatives	ཟ␣ཞ
Nasals	མ␣ན␣ཉ␣ང	ཎ
Other	ཝ␣ལ␣ཡ

For additional details see consonant_mappings.

Basic consonants

Native Lhasa Tibetan words use 28 full-sized consonant letters and 20 subjoined consonant letters. Tibetan has distinct stop and affricate letters for consonants that are high tone and unaspirated, high tone and aspirated, and low tone and aspirated. The sound values of these are modified, however, by the other characters within a written syllable. The sound matrix provides a useful framework for understanding these effects, and the various modifications are described in the sections that follow.

The lists below show a basic and a subjoined version of each consonant, however not all subjoined forms are used for writing the Tibetan language. Subjoined letters that are not used are marked as such. A few subjoined letters used as subscripts have IPA fields left blank because they show too much variation. Whereas the table just above takes you from sounds to letters, the following simply lists the basic consonant letters.

པ␣ཕ␣བ␣ཏ␣ཐ␣ད␣ཅ␣ཆ␣ཇ␣ཀ␣ཁ␣ག␣ཙ␣ཚ␣ཛ␣ས␣ཟ␣ཤ␣ཞ␣ཧ␣མ␣ན␣ཉ␣ང␣ཝ␣ར␣ལ␣ཡ

ྤ␣ྥ␣ྦ␣ྟ␣ྠ␣ྡ␣ྕ␣ྖ␣ྗ␣ྐ␣ྑ␣ྒ␣ྩ␣ྪ␣ྫ␣ྶ␣ྯ␣ྴ␣ྮ␣ྷ␣ྨ␣ྣ␣ྙ␣ྔ␣ྭ␣ྲ␣ླ␣ྱ

The following diagram shows characters in all of the syllabic positions, and lists the consonant characters that can appear in each of the non-root locations.

འགྲེམས་སྟོན་

Sound matrix

Nineteen of the consonants are typically arranged in 4x5 matrix form that is useful for understanding pronunciation, and particularly for the effects of non-root consonants on the syllable.

ཀ␣ཁ␣ག␣ང

ཅ␣ཆ␣ཇ␣ཉ

ཏ␣ཐ␣ད␣ན

པ␣ཕ␣བ␣མ

ཙ␣ཚ␣ཛ

The arrangement of columns is as follows:

Column 1: unaspirated & high tone.
Column 2: aspirated & high tone.
Column 3: aspirated & low tone.
Column 4: nasals, all low tone.
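
For use in code, the matrix can be held as rows of letters with a lookup that returns the column number. This is a minimal sketch; the names `SOUND_MATRIX`, `COLUMN_PROPERTIES` and `matrix_column` are assumptions for illustration.

```python
# Rows of the matrix; each row lists columns 1-4 (the last row has no nasal).
SOUND_MATRIX = [
    "\u0F40\u0F41\u0F42\u0F44",  # ka  kha  ga  nga
    "\u0F45\u0F46\u0F47\u0F49",  # ca  cha  ja  nya
    "\u0F4F\u0F50\u0F51\u0F53",  # ta  tha  da  na
    "\u0F54\u0F55\u0F56\u0F58",  # pa  pha  ba  ma
    "\u0F59\u0F5A\u0F5B",        # tsa tsha dza
]

# Default (articulation, tone) per column, as listed above.
COLUMN_PROPERTIES = {
    1: ("unaspirated", "high"),
    2: ("aspirated", "high"),
    3: ("aspirated", "low"),
    4: ("nasal", "low"),
}


def matrix_column(letter):
    """Return the 1-based column of a letter, or None if it is not in the matrix."""
    for row in SOUND_MATRIX:
        index = row.find(letter)
        if len(letter) == 1 and index != -1:
            return index + 1
    return None
```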

Consonant stacking

A stack has a consonant character at the top (although it may be slightly squeezed or adapted in shape), and one or more special subjoined consonant characters beneath it.

The topmost consonant in a stack always uses the standard character from the Unicode Tibetan block regardless of whether it is a root consonant or not, and consonants below it always use a character from the subjoined range.

fig_stack shows an example from the Unicode Standard: a word containing a stack of three consonants.

སྤྱིར་ — A word containing a stack of 3 consonants and a vowel sign.

Unlike other Brahmi-derived scripts, no virama is used for native Tibetan text. Instead, each consonant has a full form and a subjoined form. The subjoined forms are combining characters. Avoiding the virama makes sense because the virama is not used by Tibetans, and the approach taken makes it easier to create the large number of stacks contained in Tibetan text.

Tibetan uses the word 'head' to refer to either the top-most consonant (ie. spatially) or the root consonant of a syllable, which may be a subjoined consonant. We therefore avoid this term here, and say 'root' or 'topmost'.

The following list shows the order in which characters should be typed, and stored in memory, for a set of stacked characters.

Standard consonant shape
First subjoined consonant
Any other subjoined consonants, in order of descent
Subjoined vowel 'a-chung
Standard or compound vowel sign, or virama (for transliterations)
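
The ordering above can be sketched as a small stack builder. It relies on an observation about the code chart layout, namely that each basic consonant in U+0F40..U+0F69 has its subjoined counterpart 0x50 code points higher; the helper names are hypothetical, and the optional subjoined a-chung step is left out for brevity.

```python
import unicodedata

SUBJOINED_OFFSET = 0x50  # e.g. U+0F40 KA -> U+0F90 SUBJOINED KA


def to_subjoined(consonant):
    """Map a full-form consonant (U+0F40..U+0F69) to its subjoined code point."""
    cp = ord(consonant)
    if not 0x0F40 <= cp <= 0x0F69:
        raise ValueError(f"not a basic Tibetan consonant: U+{cp:04X}")
    sub = chr(cp + SUBJOINED_OFFSET)
    # Guard against gaps in the code chart (e.g. U+0F48 is unassigned).
    if not unicodedata.name(sub, "").startswith("TIBETAN SUBJOINED LETTER"):
        raise ValueError(f"no subjoined form for U+{cp:04X}")
    return sub


def build_stack(consonants, vowel_sign=""):
    """Top consonant in full form, the rest subjoined in order of descent,
    then the vowel sign."""
    top, *below = consonants
    return top + "".join(to_subjoined(c) for c in below) + vowel_sign
```

For instance, `build_stack("\u0F66\u0F54\u0F61", "\u0F72")` yields SA, SUBJOINED PA, SUBJOINED YA, VOWEL SIGN I, the stack in the word shown in fig_stack.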

In transliterated text consonants are sometimes stacked in ways that are not allowed in native Tibetan text.

The root consonant

The pronunciation of Tibetan syllables is typically much simpler than the orthography, which involves patterns of consonants that reduce ambiguity and can affect pronunciation and tone.

The primary consonant in a tsheg-bar is called the root consonant or radical (མིང་བཞི), and the other consonants in the syllable (normally up to a maximum of 6) annotate or modify it. The following rules help identify the root:

a consonant with a vowel is always the root, unless it is the phrase connector འི, and letters with superscripts or subscripts are root consonants.

གཡུ་མདོག júm.tôː turquoise

དཔྱ་ཁྲལ t͡ɕá.ʈ͡ʂʰɛː tribute
in a 2-consonant syllable with no vowel, the first consonant is always the root

ནང naŋ inside, room
in a 3-consonant syllable where the last consonant is not ས, the second consonant is likely to be the root.

དམག maː˥˩ war
in a 4-consonant syllable, the second consonant is always the root.

གཟིགས si᷈ to see, look (hon.)
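
The rules above can be approximated in code. The sketch below is a heuristic only: it groups each full-form letter with the combining marks stored after it (a "slot"), then applies the rules in order and returns the index of the slot that holds the root. The function names are hypothetical, cases the rules do not settle return `None`, and for a stack with a superscript the root letter is the subjoined consonant inside the returned slot.

```python
def split_slots(tsheg_bar):
    """Group each full-form letter with the combining marks stored after it."""
    slots = []
    for ch in tsheg_bar:
        cp = ord(ch)
        if 0x0F40 <= cp <= 0x0F6C:
            slots.append(ch)  # a new base letter starts a slot
        elif slots and (0x0F71 <= cp <= 0x0F84 or 0x0F8D <= cp <= 0x0FBC):
            slots[-1] += ch   # vowel sign or subjoined letter
    return slots


def guess_root_slot(tsheg_bar):
    """Return the index of the slot holding the root, or None if undecided."""
    slots = split_slots(tsheg_bar)
    connector = "\u0F60\u0F72"  # the phrase connector (-A + VOWEL SIGN I)
    # Rule 1: a slot with a vowel sign, superscript or subscript holds the root.
    for i, slot in enumerate(slots):
        if len(slot) > 1 and not (i > 0 and slot == connector):
            return i
    # The remaining rules apply to syllables with no vowel sign or stack.
    if len(slots) in (1, 2):
        return 0
    if len(slots) == 3 and slots[-1] != "\u0F66":  # last letter is not SA
        return 1
    if len(slots) == 4:
        return 1
    return None
```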

Prefixes

ག␣ད␣བ␣མ␣འ

The 5 characters that appear in the prefix location are not pronounced, but may de-aspirate or give a higher tone value to certain root characters. Each prefix character can only be used with a specified set of root characters.

The effect varies according to the position of the character in the matrix described earlier:
ཀཅཏཔཙ column 1 characters are unaffected, pronunciation-wise, and prefixes serve to distinguish similar sounding words;
ཁཆཐཕཚ column 2 characters only infrequently have prefixes, and pronunciation is also unaffected;
གཇདབཛ column 3 letters are de-aspirated / voiced (but still low tone);
ངཉནམཡ column 4 letters and ཡ are given a high tone.

Prefixes can also appear before other letters that are not included in the matrix.

The consonant 0F42 may occur before 11 root characters, 0F51 before 6, 0F56 before 10, 0F58 before 11, and 0F60 before 10, eg.

འཁོར་ལོ

བསམ་བློ

The prefix letters མ and འ also introduce prenasalisation of column 3 letters in Old Tibetan pronunciation. The nasal sound is homorganic with the following consonant.

Superscripts

ར␣ལ␣ས

The three characters that appear in the superscript location raise the tone pitch of the syllable, but are not pronounced themselves. Each superscript character can only be used with a specified set of root characters. The effect varies according to the position of the character in the matrix described earlier:
ཀཅཏཔཙ column 1 characters are unaffected;
ཁཆཐཕཚ column 2 characters don't have superscripts;
གཇདབཛ column 3 letters are de-aspirated / voiced (but still low tone);
ངཉནམ column 4 letters receive a high tone.

The following table shows the combinations that are generally used in this orthography, and the pronunciations. The left-most column shows the default pronunciations for the letters supporting the superscripts. Note that the superscript glyph is a normal character, and the bigger character below it is actually a subjoined code point.

Grey characters indicate combinations that exist, but that don't change their pronunciation. Gaps and empty cells indicate that the combination doesn't occur in the Tibetan orthography we are describing.

	ར	ལ	ས
ཀ␣ཁ␣ག␣ང	རྐ␣རྒ␣རྔ	ལྐ␣ལྒ␣ལྔ	སྐ␣སྒ␣སྔ
ཅ␣ཆ␣ཇ␣ཉ	␣རྗ␣རྙ	ལྕ␣ལྗ␣	␣ ␣སྙ
ཏ␣ཐ␣ད␣ན	རྟ␣རྡ␣རྣ	ལྟ␣ལྡ	སྟ␣སྡ␣སྣ
པ␣ཕ␣བ␣མ	␣རྦ␣རྨ	ལྤ␣ལྦ	སྤ␣སྦ␣སྨ
ཙ␣ཚ␣ཛ	རྩ␣རྫ		སྩ
ཧ		ལྷ

ལྷ produces the heavily aspirated sound ʰlá.

ར has a shape slightly different from its nominal shape in all combinations except རྙ and རླ.

You should always use the normal RA character for the superscript. The font will adjust the shape where needed.

Subscripts

ྱ␣ྲ␣ླ␣ྭ

The four characters that can appear in the subscript location are also each combined with a particular subset of root characters and have different effects. The table below shows the resulting sounds.

Note that three of the subscripts have shapes that are significantly different from the nominal shape of the character they represent.

	ྱ	ྲ
ཀ␣ཁ␣ག␣ང	ཀྱ␣ཁྱ␣གྱ	ཀྲ␣ཁྲ␣གྲ
ཅ␣ཆ␣ཇ␣ཉ
ཏ␣ཐ␣ད␣ན		ཏྲ␣ཐྲ␣དྲ
པ␣ཕ␣བ␣མ	པྱ␣ཕྱ␣བྱ␣མྱ	པྲ␣ཕྲ␣བྲ␣མྲ
ཙ␣ཚ␣ཛ
ས␣ཟ␣ཤ␣ཞ␣ཧ		སྲ␣ ␣ ␣ཧྲ
ར␣ལ

	ླ	ྭ
ཀ␣ཁ␣ག␣ང	ཀླ␣ ␣གླ	ཀྭ␣ཁྭ␣གྭ
ཅ␣ཆ␣ཇ␣ཉ		␣ ␣ ␣ཉྭ
ཏ␣ཐ␣ད␣ན		␣ ␣དྭ
པ␣ཕ␣བ␣མ	བླ
ཙ␣ཚ␣ཛ		ཙྭ␣ཚྭ
ས␣ཟ␣ཤ␣ཞ␣ཧ	སླ␣ཟླ	␣ཟྭ␣ཤྭ␣ཞྭ␣ཧྭ
ར␣ལ	རླ	རྭ␣ལྭ

0FB1 functions as a medial -j- after k(ʰ). When combined with bilabial consonants, however, the pronunciation is mapped to the set of palatal consonants.

ྲ mostly produces retroflex ʈ(ʰ).

ླ, with one exception, produces l. ཟླ produces ta or ⁿda.

ྭ doesn't affect the pronunciation, but may be used to indicate which is the root consonant when there is a prefix and second suffix but no vowel sign.

Uniquely, ྭ can also appear as a sub-subscript, as in the following word which is transcribed grwa pa.

གྲྭ་པ

Suffixes

བ␣ད␣ག␣ས␣མ␣ན␣ང␣ར␣ལ␣འ

Characters in the suffix position can have the following effects:

Consonant sound

བ ག མ ང usually add a sound.

ན usually nasalises the vowel.

ར ལ may produce a sound but are often silent.

ད ས འ are usually silent.

Vowel modification

ད ས ན ལ modify the vowel quality as follows: a→ɛ, u→y, o→ø. The vowel quality doesn't change for i and e.
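
The vowel changes just listed amount to a small lookup. This is a minimal sketch that takes a plain vowel and a suffix letter; the names are assumptions for illustration, and it ignores vowel length and nasalisation.

```python
# Suffixes that change vowel quality: DA, SA, NA, LA.
MODIFYING_SUFFIXES = {"\u0F51", "\u0F66", "\u0F53", "\u0F63"}

# Vowels that change; i and e keep their quality.
MODIFIED = {"a": "ɛ", "u": "y", "o": "ø"}


def modified_vowel(vowel, suffix):
    """Return the vowel quality after applying the effect of the suffix."""
    if suffix in MODIFYING_SUFFIXES:
        return MODIFIED.get(vowel, vowel)
    return vowel
```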

When a final consonant is not pronounced as a consonant or nasalisation, the modified vowels are generally long.

Observation: Roach draws a comparison between ད ས wherein the former produces very short, abrupt vowels and the latter long, but this is not generally reflected in the Wiktionary IPA transcriptions.

ར ལ, when not pronounced, generally produce long vowels, whether modified or not.

The following are illustrative examples.

ག མ ར adding a sound.

གླག

གསུམ

པིར
ན nasalising the vowel.

བདུན

བསྟན་འཛིན
ད ས changing the vowel quality.

དྲོད་ཚད

ཆོས

འཁྲུད

འ is always silent, and serves to disambiguate the spelling (see achung_inherent), unless it has a vowel sign attached (for example, see achung_diphthong), eg. compare the following with the example in point 1 above:

གཟའ

འཇའ

Secondary suffixes

ས␣ད

Only two characters can appear in the secondary suffix location, according to Tibetan grammar, 0F66 and 0F51, and the latter is no longer officially found in modern Tibetan. A character in this position adds no sound, nor does it affect the sounds in the rest of the syllable, eg. བསྒྲུབས གྱུརད

Irregular pronunciations

Most consonants translate to the same basic sound unless they are modified by surrounding letters as mentioned above. In some cases, however, the pronunciation of a consonant is irregular. In particular, b is sometimes pronounced w, eg.

རེ་བ

དབང་ཆ

When it has a u vowel and this transformation applies, the sound becomes simply u, eg.

དབུ་མེད

And some words have an additional nasalisation which is not shown, eg.

ད་ལྟ

Extensions for non-native sounds

The section sanskritC describes extensions to the basic set of consonants for use with Sanskrit and Chinese transliterations. Here we list some of those that are also used in modern Tibetan for writing proper nouns and other loan words from non-Tibetan sources. (The pronunciations shown are approximate.)

ཊ␣ཋ␣ཌ␣ཥ␣ཎ

མོ་ཊ

ཋེག་ཟ་སི

ཁ་ཎ་ཌ

བྷ␣དྷ␣ཌྷ␣གྷ␣ཛྷ

ཧུང་གྷ་རི

དྷེའི་ལ་ཝར

བྷར་མ

Finals

Nominally, 6 of the suffixes produce syllable codas, and another produces vowel nasalisation.

བ␣ག␣མ␣ན␣ང␣ར␣ལ

དེབ

གཏམ

ཀླུང

In practice, in modern Tibetan, syllable-final k is typically pronounced ˀ or k̚ in very formal speech.wl

གཟིག

Syllable-final r and l are often realised as a lengthening of the previous vowel, rather than as consonants.wl

ཏིལ

ཤར་བ

རྒྱལ་ཁབ

ཉལ

A syllable-final n is realised as vowel nasalisation.wl

ཉན

གྱོན

Consonant length

Tibetan doesn't allow geminated or lengthened consonant sounds. If a syllable ends with the same sound as the next syllable begins with, the coda of the first syllable is dropped.wl For example, compare the initial syllables in the following:

ཞབས

ཞབས་པད

Consonant sounds to characters

This section maps Lhasa Tibetan consonant sounds to common graphemes in the Tibetan orthography.

The inherent vowel is included for consonant onsets to indicate the associated tone. ⓟ indicates a prefix. Extended characters for foreign words still need to be added (pending accurate IPA information).

pá

པ

པད་མ

ལྤ

ལྤགས་པ

སྤ

སྤོས

pà~bà

རྦ

ལྦ

ལྦུ་བ

སྦ

སྦི་ཅི་ལི

-p

-བ

ཤུབས

pʰá

ཕ

ཕོ

pʰà

བ

བ་ཕྱུགས

ⓟབ

འབབ

See pà~bà.

tá

ཏ

ཏིལ

ⓟཏ

གཏམ

རྟ

ལྟ

ལྟ་བ

སྟ

སྟོང

tà~dà

ⓟད

གདུགས

རྟ

རྡོ

ལྟ

ལྟུང

སྟ

སྡུར

ཟླ

ཟླ་བ

See tà~dà.

tʰá

ཐ

ཐང

ⓟཐ

འཐུམ

tʰà

ཐ

ཐང

དྭ

དྭངས་ལྕགས

ʈʰá

ཐྲ

ཨོ་སི་ཐྲི་ཡ

cá~t͡ɕá

ལྕ

ལྕགས

cá~kjá

ཀྱ

ཀྱི

cʰá~kʰjá

ཁྱ

ཁྱིམ

cʰà~kʰjà

གྱ

འགྱོད་པ

ká

ཀ

ⓟཀ

བཀོད

ཀྭ

རྐ

རྐང་པ

ལྐ

སྐ

སྐར་མ

kà~ɡà

ⓟག

བགོད

རྒ

ལྒ

སྒོ

སྒ

-k

-ག

གཟིག

See kʰà~ɡà.

kʰá

ཁ

ཁ་པར

ཁྭ

kʰà~ɡà

ག

ག་པར

གྭ

གླ in transcriptions of foreign words.

ཧུང་གྷ་རི

ʔá

ཨ

ཨང་གྲངས

ⓟབ

དབུ་མེད

t͡sá

ཙ

ཙི་ཙི

ⓟཙ

གཙང

རྩ

རྩེ

སྩ

རྩི་ཤིང

ཙྭ

t͡sà~d͡zà

ⓟཛ

འཛིན

རྫ

རྫོང

t͡sʰá

ཚ

ⓟཚ

མཚན

ཚྭ

t͡sʰà~d͡zà

ཛ

ཛ་ཏི

ཚྭ

t͡ɕá

ཅ

ཅི

ⓟཅ

གཅིག

པྱ

དཔྱ་ཁྲལ

t͡ɕà~ɟà

ⓟཇ

གཅིག

རྗ

རྗོད

ལྗ

ལྗགས

t͡ɕʰá

ཆ

ཆང

ⓟཆ

མཆོད

ཕྱ

ཕྱག

བྱ

ཕྱག

t͡ɕʰà~d͡ʑà

ཇ

ʈ͡ʂá

བྲ

འབྲས

ʈ͡ʂ~ʈá

ཀྲ

ཀྲུང་གོ

ཊ

མོ་ཊ

ʈ͡ʂà~ʈʰà

དྲ

དྲི་མ

གྲ

གྲང་མོ

ʈ͡ʂʰá~ʈʰá

ཕྲ

ཕྲུ་གུ

ཁྲ

ཁྲོམ

གྲྭ

གྲྭ་པ

d͡z

See t͡sà~d͡zà and t͡sʰà~d͡zà.

d͡ʑ

See t͡ɕʰà~d͡ʑà.

ཧྥ

ཧྥིན་ལན

sá

ས

སོ

ⓟས

གསའ

སྲ

སྲོག

sà~zà

ཟ

See sà~zà.

ɕá~ʃá

ཤ

ཤིང

ⓟཤ

གཤེན

ཤྭ

ɕà~ʑà

ཞ

ཞལ

ⓟཞ

གཞིས་ཀ

ཞྭ

See ɕà~ʑà.

ʐà~rà

ར

རི

Sanskrit extensions

Many of the extra consonants (and other characters) in the Unicode Tibetan script block are used for transliteration of other languages, principally Sanskrit and Chinese. These include the retroflex and voiced aspirated consonants. A couple of characters are extensions for Balti.

Retroflex consonants

ཊ␣ཋ␣ཌ␣ཎ␣ཥ

ྚ␣ྛ␣ྜ␣ྞ␣ྵ

The retroflex consonants, which are reversed versions of Tibetan consonant shapes, may be used in modern Tibetan to distinguish loan words from sequences of Tibetan syllables. For example,

ཁ་ཎ་ཌ

མོ་ཊ

Aspirated consonants

Consonant stacks are normally used to represent aspirated sounds and the Sanskrit diphone kʃ.

གྷ␣ཌྷ␣དྷ␣བྷ␣ཛྷ␣ཀྵ

ྒྷ␣ྜྷ␣ྡྷ␣ྦྷ␣ྫྷ␣ྐྵ

A set of precomposed characters exists for representing these sounds, but they are decomposed under both Unicode normalisation forms, so using the decomposed stacks is likely to make the text more consistent.

གྷ␣ཌྷ␣དྷ␣བྷ␣ཛྷ␣ཀྵ

ྒྷ␣ྜྷ␣ྡྷ␣ྦྷ␣ྫྷ␣ྐྵ

The bottom line here is that aspirated consonants are normally written by simply adding a subjoined HA below a consonant, and the diphone kʃ is produced in a similar way.
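
The effect of normalisation on the precomposed letters can be checked with Python's `unicodedata.normalize`. In this sketch the table is written out by hand from the Unicode decompositions, and the variable name is an assumption.

```python
import unicodedata

# Precomposed letter -> base consonant + subjoined consonant.
PRECOMPOSED = {
    "\u0F43": "\u0F42\u0FB7",  # GHA  -> GA  + SUBJOINED HA
    "\u0F4D": "\u0F4C\u0FB7",  # DDHA -> DDA + SUBJOINED HA
    "\u0F52": "\u0F51\u0FB7",  # DHA  -> DA  + SUBJOINED HA
    "\u0F57": "\u0F56\u0FB7",  # BHA  -> BA  + SUBJOINED HA
    "\u0F5C": "\u0F5B\u0FB7",  # DZHA -> DZA + SUBJOINED HA
    "\u0F69": "\u0F40\u0FB5",  # KSSA -> KA  + SUBJOINED SSA
}

for letter, stack in PRECOMPOSED.items():
    nfc = unicodedata.normalize("NFC", letter)
    nfd = unicodedata.normalize("NFD", letter)
    print(f"U+{ord(letter):04X}", nfc == stack, nfd == stack)
```

Comparing strings after normalising both sides therefore treats the precomposed letter and the decomposed stack as equal.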

Fixed form letters

ཪ␣ྼ␣ྺ␣ྻ

0F62 at the top of a stack usually has a reduced form, eg. རྐ rka. For transliterations it is sometimes desirable to retain the full form of RA where in Tibetan words it would be reduced. To do this use 0F6A instead of the normal RA, but only where the normal RA would not produce the full form anyway, ie. do not use it for combinations such as རྙ rnya, which has the full form already.

There are also fixed form variants of subjoined YA and WA.

Other Sanskrit characters

ཿ␣ཾ␣ྃ␣྄␣྅

0F7F ( nam chay ) is the visarga, and 0F7E ( ngaro ) is the anusvara.

Tsa-'phru mark

0F39 is an integral part of the three consonants 0F59, 0F5A, and 0F5B. Although those consonants are not decomposable, this mark has been abstracted and may be used in combinations such as ཕ༹ or with other consonants to make new letters for use in transliteration and transcription of other languages. For example, in modern literary Tibetan, it is one of the ways used to transcribe the Chinese “fa” and “va” sounds not represented by the normal Tibetan consonants.

It is also used to represent tsa, tsha, and dza in abbreviations.

This code point should be used immediately after the consonant it modifies, even if that consonant is followed by a subjoined consonant.

Balti

Two characters are provided for use with Balti.

ཫ␣ཬ

Other features

Sanskrit vocalics

Tibetan vocalics are used only for transcription of Sanskrit.

As for the vowels described earlier, there are deprecated precomposed characters, and equivalent decomposed sequences. The precomposed characters are all circumgraphs, and the decomposed sequences are all multipart vowels.

These are the precomposed code points. The R and L vowels are decomposed in NFC, but the RR and LL vowel code points are not, nor do they decompose in NFD.

ྲྀ␣ླྀ␣ཷ␣ཹ

The Unicode Standard discourages the use of the above precomposed forms (strongly discouraging the last two), and recommends the following sequences instead.

ྲྀ␣ླྀ␣ྲཱྀ␣ཱླྀ
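
The difference in normalisation behaviour between the R/L and RR/LL vocalics can be inspected with Python's `unicodedata.normalize`. This is a minimal sketch; the helper names are assumptions for illustration.

```python
import unicodedata

VOCALICS = {
    "R": "\u0F76",   # TIBETAN VOWEL SIGN VOCALIC R
    "RR": "\u0F77",  # TIBETAN VOWEL SIGN VOCALIC RR
    "L": "\u0F78",   # TIBETAN VOWEL SIGN VOCALIC L
    "LL": "\u0F79",  # TIBETAN VOWEL SIGN VOCALIC LL
}


def code_points(text):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def canonical_forms(ch):
    """Return the NFC and NFD results for one character as code point strings."""
    return {form: code_points(unicodedata.normalize(form, ch)) for form in ("NFC", "NFD")}


for label, ch in VOCALICS.items():
    print(label, code_points(ch), canonical_forms(ch))
```

Since the RR and LL code points survive both canonical forms, an application that wants the recommended sequences has to replace them explicitly rather than rely on NFC or NFD.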

Other letters

The Unicode Tibetan block contains the following additional characters with the general property of letter.

ༀ

ྈ␣ྉ␣ྊ␣ྋ␣ྌ

Numbers

Tibetan has its own set of numbers.

Observation: My Chinese publication, however, uses European digits.

༠␣༡␣༢␣༣␣༤␣༥␣༦␣༧␣༨␣༩

0F3E and 0F3F are paired characters used in combination with digits.

Half-numbers

By some interpretations, the following shapes each have the value of 0.5 less than the number within which it appears. Used only in some traditional contexts, they appear as the last digit of a multidigit number, eg. ༤༬ represents 42.5. These are very rarely used, however, and other uses have been postulated. For more information see Numbers that Don't Add Up : Tibetan Half Digits, by Andrew West.

༳␣༪␣༫␣༬␣༭␣༮␣༯␣༰␣༱␣༲
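
Under the interpretation described above, a number ending in a half digit can be evaluated using the numeric values that the Unicode character database assigns to these characters. The sketch below assumes Python's `unicodedata` module and that interpretation only; the function name is hypothetical, and other readings of the half digits would need different logic.

```python
import unicodedata


def is_digit(ch):
    return 0x0F20 <= ord(ch) <= 0x0F29  # TIBETAN DIGIT ZERO..NINE


def is_half_digit(ch):
    return 0x0F2A <= ord(ch) <= 0x0F33  # TIBETAN DIGIT HALF ONE..HALF ZERO


def tibetan_number(text):
    """Parse Tibetan digits, allowing a half digit in the final position."""
    if not text:
        raise ValueError("empty string")
    *head, last = text
    total = 0
    for ch in head:
        if not is_digit(ch):
            raise ValueError(f"unexpected character U+{ord(ch):04X}")
        total = total * 10 + unicodedata.digit(ch)
    if is_digit(last):
        return total * 10 + unicodedata.digit(last)
    if is_half_digit(last):
        # The database gives each half digit a fractional value, e.g. 5/2.
        return total * 10 + unicodedata.numeric(last)
    raise ValueError(f"unexpected character U+{ord(last):04X}")
```

With this reading, the example ༤༬ (U+0F24 U+0F2C) evaluates to 4 × 10 + 2.5 = 42.5.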

Text direction

Tibetan text normally runs left to right in horizontal lines.

Occasionally, Tibetan text may occur in vertically-set text. In this case, syllables are kept together as an unbreakable unit.

藏传佛教活佛转世专题展 བོད་བརྒྱད་ནང་བསྟན་སྤྲུལ་སྐུའི་ལམ་སྲོལ་སྐོར་གྱི་བཤམས་སྟོན། — Tibetan syllables that are not broken in vertically-set text.

Glyph shaping & positioning

You can experiment with examples using the Tibetan picker.

Context-based shaping & positioning

Tibetan requires many rules to position glyphs correctly, and also to shape characters according to context.

Glyphs in Tibetan script sometimes need to be adapted to suit the context in which the character is used. A particularly prevalent example is the letter ར [U+0F62 TIBETAN LETTER RA]. When used at the top of a stack it has an abbreviated form, as in the first example below.

The second example below shows what a normal RA looks like. This is the same underlying character. The shape is determined by rules in the font.

རྐང་པ

རང

Combining characters need to be placed in different positions, according to the context. The example below shows the same vowel sign displayed at different heights, according to what is stacked above it.

གྲུ་གསུམ

Letterform slopes, weights, & italics

Tibetan writing never had bold or italic effects until the Chinese introduced bold style for books after the invasion of Tibet.dt,34

Duff describes some Western publications that slant Tibetan text in books, but points out that a more natural slant direction for Tibetan would be the opposite to that of Western italics.dt,34

Typographic units

Word & syllable boundaries

Word boundaries are not indicated by the Tibetan orthography. However, phonetic syllables, each represented by a sequence of letters known as a tsheg-bar, are separated by ་ [U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG].

གློག་བརྙན་ཁང

ཡོངས་ཁྱབ་གསལ་བསྒྲགས་
འགྲོ་བ་མིའི་ཐོབ་ཐང༌། — This figure shows the use of the tsheg-bar across a whole sentence. There is no indication of word boundaries.
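Because the tsheg is the only reliable delimiter, a first step in most text processing is to split a run of text into tsheg-bars. The sketch below does this with a regular expression; the function name is illustrative, and it deliberately does nothing about word boundaries, which the orthography does not mark.

```python
import re

TSHEG = "\u0f0b"        # TIBETAN MARK INTERSYLLABIC TSHEG
TSHEG_BSTAR = "\u0f0c"  # TIBETAN MARK DELIMITER TSHEG BSTAR (non-breaking)


def split_tsheg_bars(text: str) -> list[str]:
    """Split text into tsheg-bars, dropping the tsheg separators."""
    parts = re.split(f"[{TSHEG}{TSHEG_BSTAR}]", text)
    # A trailing tsheg produces an empty final part, so filter those out.
    return [part for part in parts if part]


# Three tsheg-bars separated by two tshegs.
print(split_tsheg_bars("གློག་བརྙན་ཁང"))
```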

Graphemes

tbd

Punctuation & inline features

Phrase & section boundaries

་␣༌␣།␣༑␣༎␣༈␣྾␣༄␣༅

Tibetan uses native punctuation marks.

syllable	U+0F0B U+0F0C
text divisions	U+0F0D U+0020 U+0F0E, and U+0F11 after a single syllable wrapped to a new line
other	U+0F08 U+0FBE
head marks	U+0F04 U+0F05

Text divisions

Key divisions of the text include ‘expressions’ (brjod-pa) and ‘topics’ (don-tshan). They do not necessarily equate to English phrases, sentences and paragraphs.

Divisions are most commonly marked using ། [U+0F0D TIBETAN MARK SHAD] (transliterated as shad but pronounced ʃe, and so transcribed here as shay) followed by a space. After the space, which can vary in width, another shay will often appear before the text continues (see the examples of shad and double shad below).

Other divisions (eg. headlines, verses, and paragraphs) may be terminated with what we will refer to here as a double-shay. This may be written using ༎ or using two single shays with no intervening space.

In fact, the choice of whether to use a single shay or double shay is somewhat fluid. In some texts it is possible to find a double-shay followed by a space and another double-shay (see below). There are also other variant forms of the single shad for use in specialised contexts.

དུང་དང་འོ་མར་འགྲན་པའི་ལྷག་བསམ་མཐུ། །དམན་ཡང་དཀར་པོའི་བྱས་འབྲས་ཅུང་ཟད་ཅིག །བློ་དང་འདུན་པ་བཟང་བའི་རང་རིགས་ཀུན། །རྒྱལ་ཁའི་འཕྲིན་བཟང་ལས་དོན་འགྲུབ་ཕྱིར་འབད།།

Examples of shad and double shad.

A phrase that ends with the root consonant ཀ [U+0F40 TIBETAN LETTER KA] or ག [U+0F42 TIBETAN LETTER GA] will normally swallow up a shad that immediately follows it, even if there is a vowel sign. For example, where you might expect to see two shads, you might see ཀུ ། and སྐུ །. However, the shad is not omitted if these characters have a subscript consonant, eg. གྲུ། །.

རྩོམ་པ་པོ། ལྡོང་ཕྲུག པར་སྐྲུན་ཁང་། གངས་ཅན་པ་ཚོང་འཕྲིན་ཇུས་འགོད་ཞབས་ཞུ་ལྟེ་གནས་ནས་དཔར།

GA swallowing up a shad.

GA swallowing up the first of two shads.
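The rule about KA and GA swallowing a shad can be approximated by looking at the last consonant of the tsheg-bar. The following is a simplified heuristic, not a full analysis of Tibetan syllable structure: it assumes that the final consonant code point (after any vowel signs) is the one that matters, and the set names and function name are invented for this sketch.

```python
# KA and GA, as base letters and as subjoined letters (eg. under a head letter).
KA_GA = {"\u0f40", "\u0f42", "\u0f90", "\u0f92"}
# Vowel signs U+0F71..U+0F7D plus U+0F80 (reversed I).
VOWEL_SIGNS = {chr(cp) for cp in range(0x0F71, 0x0F7E)} | {"\u0f80"}


def shad_is_swallowed(tsheg_bar: str) -> bool:
    """Guess whether a shad directly after this tsheg-bar would be omitted."""
    stripped = tsheg_bar.rstrip("".join(VOWEL_SIGNS))
    # If a subscript consonant hangs below KA/GA, it is the last consonant
    # code point, so the test fails and the shad is kept.
    return bool(stripped) and stripped[-1] in KA_GA


print(shad_is_swallowed("\u0f40\u0f74"))        # KA + u
print(shad_is_swallowed("\u0f42\u0fb2\u0f74"))  # GA + subjoined RA + u
```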

When a phrase ends with shad+space+shad the space between the shad marks is normally reduced in Tibetan pechas, down to 1/4 or 1/3 of the normal width, or made to fit the space available. Some space is retained to avoid the appearance of a double-shad.dt,39

A shay should never normally appear at the start of a line, so the combination of shay+space+shay should not be broken when it occurs at the end of a line (see also justification).

Boundaries between chapters or significant sections may also be represented by a double-shad followed by 5-6 spaces and another double-shad.dt,38

དང་འཇམ་པའི་དབྱངས་དང་པ་བྱང་ཆུབ་སེམས་དཔའ་ཡིན༎ ༎སང་རྒྱས་བཅོམ་ལྔན་འདས་དགྲ་བཅོམ་པ་་་་་་

Double shads separated by several spaces between chapters or significant sections.

༎ [U+0F0E TIBETAN MARK NYIS SHAD] can be used for the double-shad.

Observation: In a Chinese magazine publication I have, most articles contain no double shad as a delimiter. (The text is formatted in paragraphs.) I did find a double shad at the very end of one of the articles, and it was used at the end of each line on a page containing some verse-formatted folk literature. The same appears to apply for large parts of the Bhutanese newspapers I have, however there are other pages with plenty of double shads - some at the end of paragraphs, some inside paragraphs.

༈ [U+0F08 TIBETAN MARK SBRUL SHAD] is used to separate texts that are equivalent to topics and subtopics, such as the start of a smaller text, the start of a prayer, a chapter boundary, or to mark the beginning and end of insertions into text in pechas.

This drul-shad is usually surrounded on both sides by the equivalent of about three non-breaking spaces (though no rule is specified).dt,35 The drul-shad should not appear at the beginning of a new line and the whole structure of spacing-plus-shad needs to be kept together.

For ༑ [U+0F11 TIBETAN MARK RIN CHEN SPUNGS SHAD] see the section Line breaks and rin chen spungs shad.

྾ [U+0FBE TIBETAN KU RU KHA] (often repeated three times) indicates a refrain.

Tsek and section boundaries

The tsheg is not used before a shad, except after ང [U+0F44 TIBETAN LETTER NGA]. For example, note the end of the three sections in the following example:

དོན་ཚན་དང་པོ། འགྲོ་བ་མིའི་རིགས་རྒྱུད་ཡོངས་ལ་སྐྱེས་ཙམ་ཉིད་ནས་ཆེ་མཐོངས་དང༌། ཐོབ་ཐངགི་རང་དབང་འདྲ་མཉམ་དུ་ཡོད་ལ།

Examples of tsheg not being used before shad, and of U+0F0C being used between NGA and shad.

So that line-breaking keeps the NGA + tsheg + shad together, ༌ [U+0F0C TIBETAN MARK DELIMITER TSHEG BSTAR] should be used between NGA and a shad. This is a non-breaking version of the tsheg (the word 'delimiter' in the name is a misnomer).

སུ་གང་གིས་གང་ཞིག་གི་རྒྱལ་ཁབ་ཐོབ་ཐང་བཙན་ཤེད་ཀྱིས་འཕྲོག་པ་དང༌། ཡང་ན་དེའི་རྒྱལ་ཁབ་བརྗེ་བསྒྱུར་གྱི་ཐོབ་ཐང་བཀག་འགོག་བྱེདཔའི་རིགས་མི་ཆོག །

Example of TSHEG BSTAR being used between NGA and shad.
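Content that was typed with an ordinary tsheg between NGA and shad can be repaired mechanically. The following is a minimal sketch using a regular expression; the function name is hypothetical, and the pattern only covers the base letter NGA, optionally carrying vowel signs.

```python
import re

NGA = "\u0f44"          # TIBETAN LETTER NGA
TSHEG = "\u0f0b"        # TIBETAN MARK INTERSYLLABIC TSHEG
TSHEG_BSTAR = "\u0f0c"  # TIBETAN MARK DELIMITER TSHEG BSTAR
SHAD = "\u0f0d"         # TIBETAN MARK SHAD

# NGA, any vowel signs, then a plain tsheg that is directly followed by a shad.
_NGA_TSHEG_SHAD = re.compile(f"({NGA}[\u0f71-\u0f7d\u0f80]*){TSHEG}(?={SHAD})")


def protect_nga_tsheg(text: str) -> str:
    """Replace tsheg with the non-breaking tsheg bstar between NGA and shad."""
    return _NGA_TSHEG_SHAD.sub(lambda m: m.group(1) + TSHEG_BSTAR, text)


fixed = protect_nga_tsheg("\u0f51\u0f44\u0f0b\u0f0d")
print([f"U+{ord(ch):04X}" for ch in fixed])
```

Tshegs elsewhere in the text are left alone, so the function can be run over a whole document.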

White space

Space is used as a punctuation mark in Tibetan, to separate meaning in sections. It should not appear at the start of a line.dt,37

Spaces in Tibetan text are usually wider than spaces in English text, and typically only occur after one of the following:

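། [U+0F0D TIBETAN MARK SHAD]
༑ [U+0F11 TIBETAN MARK RIN CHEN SPUNGS SHAD]
༔ [U+0F14 TIBETAN MARK GTER TSHEG]
ཿ [U+0F7F TIBETAN SIGN RNAM BCAD]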

However, numbers and embedded Western text are surrounded by smaller spaces, eg.

ལོ་ ༢༠༠༡ ཤིང་བྱ་ཟླ་ ༩ ཚེས་ ༥ ཉིན་

So that line-breaks work correctly, U+00A0 NO-BREAK SPACE should be used for spaces when they appear after ཀ [U+0F40 TIBETAN LETTER KA] or ག [U+0F42 TIBETAN LETTER GA], or between 2 shad or double-shad characters. It should also be used for spacing around ༈ [U+0F08 TIBETAN MARK SBRUL SHAD].
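These three cases can be applied to existing text with a few regular expressions. This is only a sketch of the advice above: the rule list and function name are assumptions, and the first rule also treats subjoined KA and GA (as in སྐུ) like the base letters.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE

_NBSP_RULES = [
    # Spaces after KA or GA (base or subjoined), with or without a vowel sign.
    re.compile(
        "(?<=[\u0f40\u0f42\u0f90\u0f92]) +"
        "|(?<=[\u0f40\u0f42\u0f90\u0f92][\u0f71-\u0f7d\u0f80]) +"
    ),
    # Spaces between two shads or double shads.
    re.compile("(?<=[\u0f0d\u0f0e]) +(?=[\u0f0d\u0f0e])"),
    # Spacing on either side of sbrul shad.
    re.compile(" +(?=\u0f08)|(?<=\u0f08) +"),
]


def use_no_break_spaces(text: str) -> str:
    """Swap ordinary spaces for U+00A0 where a line break is not wanted."""
    for rule in _NBSP_RULES:
        # Replace every space in the matched run, keeping the run length.
        text = rule.sub(lambda m: NBSP * len(m.group()), text)
    return text


result = use_no_break_spaces("\u0f0d \u0f0d")
print([f"U+{ord(ch):04X}" for ch in result])
```

A space after a shad that is followed by a consonant is left as U+0020, so a line can still wrap there.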

Except for special situations, such as the use of sbrul shad, it is recommended to use a single space where gaps appear, and to stretch that space where necessary.dt,35

Head marks

In traditional, loose-leaf Tibetan pechas a head mark or yig-mgo (yig go) is used at the beginning of the front of the folio so that you can tell which is the front.

Head marks are also used in both pechas and books to indicate the start of a headline or the start of the first paragraph in a longer text.

The example below shows a common head mark, ༄ [U+0F04 TIBETAN MARK INITIAL YIG MGO MDUN MA], and the extension character ༅ [U+0F05 TIBETAN MARK CLOSING YIG MGO SGAB MA]. A head mark can be written alone, or can be followed by as many as three closing marks; head marks are also followed by two shads.

༄༅༎ ཡོངས་ཁྱབ་གསལ་བསྒྲགས་འགྲོ་བ་མིའི་ཐོབ་ཐང༌།

Example of use of head marks at the start of the Universal Declaration of Human Rights.

Head marks differ from text to text. The Unicode Standard provides a number of characters to give some basic coverage, but may not meet all needs.

Three less common head marks, used in Nyingmapa and Bonpo literature, are also represented in the Tibetan block, namely:

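༁ [U+0F01 TIBETAN MARK GTER YIG MGO TRUNCATED A]
༂ [U+0F02 TIBETAN MARK GTER YIG MGO -UM RNAM BCAD MA]
༃ [U+0F03 TIBETAN MARK GTER YIG MGO -UM GTER TSHEG MA]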

Bracketed text

༼␣༽

Tibetan commonly uses native marks to insert parenthetical information into text.

	start	end
standard	༼ [U+0F3C TIBETAN MARK ANG KHANG GYON]	༽ [U+0F3D TIBETAN MARK ANG KHANG GYAS]

༼ [U+0F3C TIBETAN MARK ANG KHANG GYON] and ༽ [U+0F3D TIBETAN MARK ANG KHANG GYAS] are paired punctuation used to form a roof over one or more digits or words. The right-hand character can also be used much like a single parenthesis in list counters.

Observation: Looking at various articles in Wikipedia it is possible to find various sets of paired punctuation.

རྩིས་དཔོན་ལུང་ཤར་ལ་མ་དགའ་བའི་གླེང་ཕྱོགས་ཆེན་པོ་བྱུང་འདུག་ཀྱང་དེའི་དུས་ཏཱ་ལའི་བླ་མའི་གཟིགས་བསྐྱོང་ཆེན་པོ་ཡོད་སྟབས་སུས་་ཀྱང་གནོད་འཚེ་གཏོང་ཐུབ་མི་འདུག དེ་རྗེས་ལུང་ཤར་མཆོག་ལ་〈རྩིས་དཔོན་〉དང་〈འབབ་ཞིབ་〉དོ་དམ་ཡིན་མུས་ཐོག་ཏྭ་ལའི་བླ་མས་〈དམག་སྤྱིའི་འཚོ་འཛིན་〉ཞེས་དམག་སྤྱི་ཁང་གི་ལས་འགན་ཡང་གནང་བ་རེད།

Observation: 〈 [U+3008 LEFT ANGLE BRACKET] and 〉 [U+3009 RIGHT ANGLE BRACKET].

བཀའ་ཤག་གི་ཕྲ་མའི་ཐོད་དུ་དེ་སྔ་ཆིང་གོང་མས་གནང་བའི་པྭན་ཞིག་ཡོད་པ་དེའི་ཐོག་བོད་ཡིག་དབུ་ཅན་གྱིས་《ཕྱི་ནང་བསྟན་སྲིད་ཡོངས་ ལ་བསོད་ནམས་ཀྱི་དགེ་ཚོགས་ལྷག་པར་སྤེལ་བ་》ཞེས་འཁོད་ཡོད་པ་ དེ་ལ་བལྟས་ནས་བཀའ་ཤག་གི་ཕྲ་མའི་འོག་ཏུ་ཡི་གེ་ཞིག་སྦྱར་བའི་ནང་ཕྱི་ནང་སྐྱེ་འགྲོ་ཡོངས་ལ་འབྲུ་དངུལ་གྱི་སྟོན་མོ་ལྷག་པར་སྤེལ་བ་ཞེས་བྲིས་ཡོད་པ་རེད།

Observation: 《 [U+300A LEFT DOUBLE ANGLE BRACKET] and 》 [U+300B RIGHT DOUBLE ANGLE BRACKET].

དེ་རྗེས་བཀའ་བློན་དང་བོད་དམག་གི་སྤྱི་ཁྱབ་མདའ་དཔོན(དམག་སྤྱི)བྱས་ཏེ་དམག་ཁྲིམས་དམ་འདོམས་ཚ་ནན་ཐལ་དྲགས་པ་དང་།

Observation: ( [U+0028 LEFT PARENTHESIS] and ) [U+0029 RIGHT PARENTHESIS].

སྐར་རྩིས་སྐོར་ལ་སྔོན་མེད་ཀྱི་ལེགས་བཤད་ཀུན་བཏུས་༼བེེ་དཀར་༽དང་དེའི་༼བུ་དཔེ་༽དྲིས་ལན་༼གཡའ་སེལ་༽སོགས་དང༌།

Observation: ༼ [U+0F3C TIBETAN MARK ANG KHANG GYON] and ༽ [U+0F3D TIBETAN MARK ANG KHANG GYAS].

Emphasis

◌༵

◌༵ [U+0F35 TIBETAN MARK NGAS BZUNG NYI ZLA] can be used to indicate emphasis.

བསོ༵ད་ནམ༵ས་ — Use of colour and diacritics to emphasise text.

If entered as a combining character it can be added after the vowel sign in a stack. But the use of this mark is not straightforward, since it attaches to a syllable rather than a character and therefore to place it correctly the application needs to take syllable boundary positions into account. If the syllable has an even number of stacks/consonants, the diacritic will be displayed between the 2 in the middle, not underneath a particular consonant/stack.

Application software has to ignore this character for text processing, such as search and collation.

Alternative methods of emphasis include use of a different colour, or the use of the prefix ༸ [U+0F38 TIBETAN MARK CHE MGO].

Modern texts appear to use bold type for emphasis.

Abbreviation, ellipsis & repetition

Repetition

༴

༴ [U+0F34 TIBETAN MARK BSDUS RTAGS] means 'etc', or 'ditto', and is used after the first few tsheg-bar of a recurring phrase.

Inline notes & annotations

tbd

◌༷

◌༷ [U+0F37 TIBETAN MARK NGAS BZUNG SGOR RTAGS] can be used in interspersed commentaries to tag the root text that is being commented on. An alternative is to set the tsheg-bar being commented on in large type and the commentary in small type.

Application software has to ignore this character for text processing, such as search and collation.

སྐུ༷་གསུ༷ང་ཐུག༷ས། — Marks being used to identify root text.
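One simple way for an application to ignore U+0F35 and U+0F37 during search is to strip them from both the text and the query before comparing. This is a minimal sketch with invented function names; a real collation implementation would more likely treat the marks as ignorable in its collation rules.

```python
# Translation table that deletes the two annotation marks.
IGNORABLE_MARKS = str.maketrans("", "", "\u0f35\u0f37")


def search_key(text: str) -> str:
    """Return text with the emphasis and root-text marks removed."""
    return text.translate(IGNORABLE_MARKS)


def contains(haystack: str, needle: str) -> bool:
    """Substring search that ignores U+0F35 and U+0F37 on both sides."""
    return search_key(needle) in search_key(haystack)


# A syllable carrying U+0F37 still matches the plain form of the syllable.
print(contains("\u0f66\u0f90\u0f74\u0f37\u0f0b", "\u0f66\u0f90\u0f74"))  # True
```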

Line & paragraph layout

Line breaking & hyphenation

Tibetan never breaks inside a syllable, and has no hyphenation.

Normally, lines break after ་, and don't break after spaces. If a word is composed of multiple syllables, it is also preferable to avoid breaking a line in the middle of the word.

However, line breaks should not occur after a tsheg when it is between ང (with or without a vowel sign) and །. Applications should be able to handle this if they encounter a normal tsheg, but content authors are advised to use ༌ instead, to be on the safe side.

Line breaks are also possible after:

། - as long as the next line starts with a consonant (ie. not a second shad).
༔
ཿ (visarga) (there is never a tsheg after this character, eg. ཨོཾ་ཨཱཿཧཱུྃ་)
༴ or ྾dt,37

A line must never start with a shad, space, or other punctuation sign.dt,37

A line that ends with a shad plus space followed by a consonant can wrap after the shad and discard the space. But a line that ends with one of the following must not lose the space and must not be broken either side of the space:

ཀ or ག followed by a space (in which case a shad is not used).
a shad followed by a space then another shad.

This should be straightforward if content authors use U+00A0 NO-BREAK SPACE for the latter cases.
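The rules in this section can be combined into a rough break-opportunity finder. The sketch below is deliberately simplified and is not the Unicode line-breaking algorithm: it works on code points only, does not try to keep multi-syllable words together, and the names `BREAK_AFTER`, `NO_LINE_START` and `break_opportunities` are invented for illustration.

```python
TSHEG, TSHEG_BSTAR = "\u0f0b", "\u0f0c"
SHAD = "\u0f0d"
SPACES = {" ", "\u00a0"}
# Characters after which a break is allowed: tsheg, shad, gter tsheg,
# visarga, bsdus rtags and ku ru kha.
BREAK_AFTER = {TSHEG, SHAD, "\u0f14", "\u0f7f", "\u0f34", "\u0fbe"}
# Characters that must never start a line: shads, tshegs and spaces.
NO_LINE_START = {SHAD, "\u0f0e", "\u0f11", TSHEG, TSHEG_BSTAR} | SPACES


def break_opportunities(text: str) -> list[int]:
    """Return indexes i where a line may break between text[i-1] and text[i]."""
    breaks = []
    for i in range(1, len(text)):
        prev, nxt = text[i - 1], text[i]
        if nxt in NO_LINE_START:
            continue
        if prev in BREAK_AFTER:
            breaks.append(i)
        elif prev == " " and i >= 2 and text[i - 2] == SHAD:
            # shad + space + consonant: wrap here and let the space hang.
            breaks.append(i)
    return breaks


print(break_opportunities("\u0f56\u0f0b\u0f58\u0f0d \u0f56"))  # [2, 5]
```

Note that no break is reported inside NGA + tsheg + shad or after KA/GA + space, even when ordinary tsheg and space characters were typed, which matches the expectation that applications cope with both forms.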

Line breaks and rin chen spungs shad

In Tibetan, especially in pechas, it is considered a special case if the last syllable of an expression that is terminated by a shad or a double-shad breaks onto a new line. In that case ། may be replaced by ༑ [U+0F11 TIBETAN MARK RIN CHEN SPUNGS SHAD]. This change serves as an optical indication that there is a left-over syllable at the beginning of the line that actually belongs to the preceding line.

ས་དང་པོར་སྦྱིན་པའི་གཏམ་མཛད་དོ། །སྙིང་རྗེ་དམན་ཞིང་་་་་་་་་་སེམས་
ཅན༑ །གཉས་པ་རྒྱལ་བའི་ — Rin chen spungs shad in use. Source du40.

This behaviour varies in the following cases:

When a line starts with a tsheg-bar containing 2 vowel signs (ie. a diphthong), such as ལེའུ། །, no rin chen spungs shad would be used, since le'u is pronounced as two syllables.
At the end of a topic the rules say that only one shad should be converted, ie. ༑ །, however it is moderately common to see both converted, ie. ༑ ༑.

Some printed books do not use rin chen spungs shad replacements, however the majority of books seem to apply the same rules as are used with pechas.

In an environment where the width or content of the page can change, such as a web browser, this feature poses a problem. In printed or written texts where the layout is fixed, a content author would typically only insert rin chen spungs shad once the line breaks have been established, and would not expect the text to be changed after that. On the Web, resizing a window or displaying on different devices will reflow the content, and only after that process is it apparent which instances of shad need to be converted. Applications need to be able to automatically switch between the two styles of shad in real time, as a syllable moves on or off a new line because the page is resized or the preceding content is modified.
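To make the reflow problem concrete, the sketch below recomputes the substitution from a list of already-wrapped lines. It is an illustration under several assumptions: line wrapping has been done elsewhere, a syllable counts as left over only when the previous line ends in a tsheg, only the first shad is converted, and all names are hypothetical.

```python
import re

SHAD = "\u0f0d"                  # TIBETAN MARK SHAD
RIN_CHEN_SPUNGS_SHAD = "\u0f11"  # TIBETAN MARK RIN CHEN SPUNGS SHAD
TSHEGS = ("\u0f0b", "\u0f0c")

# A single tsheg-bar at the start of a line, directly followed by a shad.
# U+0F0B..U+0F11 covers the tshegs and shads that cannot be inside it.
_LEFTOVER = re.compile("^([^\u0f0b-\u0f11\\s]+)\u0f0d")


def count_vowel_signs(tsheg_bar: str) -> int:
    return sum(1 for ch in tsheg_bar if "\u0f71" <= ch <= "\u0f7d" or ch == "\u0f80")


def apply_rin_chen_spungs_shad(lines: list[str]) -> list[str]:
    """Recompute shad substitution for a freshly wrapped list of lines."""
    result = []
    for i, line in enumerate(lines):
        # Start from plain shads so that reflowing the text is repeatable.
        line = line.replace(RIN_CHEN_SPUNGS_SHAD, SHAD)
        match = _LEFTOVER.match(line)
        continued = i > 0 and lines[i - 1].rstrip().endswith(TSHEGS)
        # Skip tsheg-bars with two vowel signs (eg. le'u), as described above.
        if match and continued and count_vowel_signs(match.group(1)) < 2:
            line = match.group(1) + RIN_CHEN_SPUNGS_SHAD + line[match.end():]
        result.append(line)
    return result
```

Calling the function again after every reflow, on the newly wrapped lines, gives the real-time switching behaviour described above, because any earlier substitutions are undone before the rule is reapplied.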

The Unicode Standard adds: Not only is rin-chen-spungs-shad used as the replacement for the shad but a whole class of “ornamental shads” are used for the same purpose. All are scribal variants on a rin-chen-spungs-shad, which is correctly written with three dots above it.

Text alignment & justification

There are two alternative methods of justification.

Method 1: Inter-character spacing

Spacing between all characters should be adapted equally. Note that the width of the white-space character should not be changed significantly, so Tibetan texts use the non-breaking space mentioned above, which doesn't change width on justification.

Method 2: tsheg padding

When writing pechas by hand, but also in some publications, authors add small spaces across the text to get the line end as near as possible to the right margin. Where space remains at the margin, it may be left as is, if it is short. Otherwise, the remaining space will be filled with tshegs to make the line as flush as possible with the right margin (there will usually still be a slight raggedness to the right edge of the text).

A page of a booklet showing tsheg padding.

There are a couple of detailed rules about the use of tsheg padding. Justifying tshegs are almost always used when the line ends in a tsheg. If, however, the line ends in a shad, there are a number of alternatives.

If the line ends with a single shad the shad is followed by spaces. Tsheg padding is never applied after spaces. (See examples in the figure above.)

If the line ends in a double shad (with space between), it is unusual (though possible) to add tsheg padding. Instead, the space between the shads is stretched or narrowed. (See examples in the figure below.) The same applies if the second shad was removed because it was preceded by a KA or GA.

Booklet pages showing double shad usage at the end of a line.
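The basic idea of tsheg padding can be sketched as follows. This is a toy model only: real layout needs glyph advance widths from the font, whereas `rough_width` here just counts characters that are not non-spacing marks, and the `short_gap` threshold and function names are assumptions.

```python
import unicodedata

TSHEG = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG


def rough_width(line: str) -> int:
    """Very rough width: count characters that are not non-spacing marks."""
    return sum(1 for ch in line if unicodedata.category(ch) != "Mn")


def pad_with_tshegs(line: str, target_width: int, short_gap: int = 1) -> str:
    """Fill the end of a line with tshegs, if it ends in a tsheg."""
    gap = target_width - rough_width(line)
    # A short gap is left as it is, and lines ending in a shad or a space
    # are handled by adjusting spaces rather than by adding tshegs.
    if gap <= short_gap or not line.endswith(TSHEG):
        return line
    return line + TSHEG * gap


padded = pad_with_tshegs("\u0f56\u0f0b", 5)
print(rough_width(padded))  # 5
```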

Baselines, line height, etc.

Tibetan uses a hanging baseline, which tends to fall between the ascender and x-height of Latin text. When text in smaller annotations or larger heading text is mixed with normal text, the letter-heads of all characters should align to the same height.

Tibetan places vowel marks above base characters, and can also add combining characters below the line. In addition, the stacking of consonants, which can have a vowel sign below, further extends the text height downwards. The complexity of the glyph clusters means that the vertical resolution needed for clearly readable Tibetan text is higher than for English, or most Latin text.

To give an approximate idea, the figure below compares Latin and Tibetan glyphs from the Noto Serif font.

Hhqxམ་ནཐུདུརྡེམོཀདཤྲཱིསྤྱིསྒྲུཧཱུྃ།༼ — Font metrics for Latin text compared with Tibetan glyphs in the Noto Serif Tibetan font.

fig_baselines_other shows similar comparisons for the Microsoft Himalaya and Tibetan Machine Uni fonts.

Counters, lists, etc.

You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.

The modern Tibetan orthography uses a numeric style with native digits.

Numeric

The tibetan numeric style is decimal-based and uses these digits.rmcs

༠␣༡␣༢␣༣␣༤␣༥␣༦␣༧␣༨␣༩

Examples:

༡␣༢␣༣␣༤␣༡༡␣༢༢␣༣༣␣༤༤␣༡༡༡␣༢༢༢␣༣༣༣␣༤༤༤
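Because the style is purely decimal, a counter can be produced by formatting the number in base 10 and swapping each ASCII digit for its Tibetan counterpart. The following is a minimal sketch, with an invented function name, of what a renderer does for this style.

```python
# Map ASCII digits 0-9 to TIBETAN DIGIT ZERO..NINE (U+0F20..U+0F29).
TIBETAN_DIGITS = str.maketrans(
    "0123456789", "".join(chr(0x0F20 + d) for d in range(10))
)


def tibetan_counter(n: int) -> str:
    """Format a non-negative integer using Tibetan digits."""
    if n < 0:
        raise ValueError("counter values must not be negative")
    return str(n).translate(TIBETAN_DIGITS)


print(" ".join(tibetan_counter(n) for n in (1, 2, 3, 4, 11, 22, 111, 444)))
```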

Prefixes and suffixes

༽ can be used much like a single parenthesis in list counters.

Sub- or superscript counters

Duff describes a form of numbered list where the numbers are placed above or below the main text, which he says is often used.dt,41

Page & book layout

General page layout & progression

Pechas

In pechas, Tibetan text is written inside a visible box which defines the margin of the page. In more recent publications this box may be invisible. Modern publications also use paragraphs. The initial line of a new paragraph may be indented.

Traditional pechas only have 2 sizes of text: ཡིག་ཆེན་ yig␣ʧʰen␣ larger, and ཡིག་ཆུང་ yig␣ʧʰuŋ␣ smaller, where the larger is the standard size. The smaller text needs to be readable, and so doesn't usually go below 20pt; the larger text is likely to be around 27-30pt.dt,34

Titles are usually written on a title page, using the 'larger' size. However, on shorter pechas they may be written on the same page as the text using the 'smaller' size.dt,34

Small size text is also used to write annotations, in a similar way to the use of italics or footnotes in the West.dt,34

Notes, footnotes, etc.

See Inline notes & annotations for purely inline annotations, such as ruby or warichu. This section is about annotation systems that separate the reference marks and the content of the notes.

༶ and ྿ are used to indicate where text should be inserted within other text or as references to footnotes and marginal notes.