Updated 28 February, 2019 • tags scriptnotes, tibetan
This page provides basic information about the Tibetan script. It is not authoritative, peer-reviewed information – these are just notes I have gathered or copied from various places as I learned. For character-specific details follow the links to the Tibetan character notes.
For similar information related to other scripts, see the Script comparison table.
དོན་ཚན་དང་པོ། འགྲོ་བ་མིའི་རིགས་རྒྱུད་ཡོངས་ལ་སྐྱེས་ཙམ་ཉིད་ནས་ཆེ་མཐོངས་དང༌། ཐོབ་ཐངགི་རང་དབང་འདྲ་མཉམ་དུ་ཡོད་ལ། ཁོང་ཚོར་རང་བྱུང་གི་བློ་རྩལ་དང་བསམ་ཚུལ་བཟང་པོ་འདོན་པའི་འོས་བབས་ཀྱང་ཡོད། དེ་བཞིན་ཕན་ཚུན་གཅིག་གིས་གཅིག་ལ་བུ་སྤུན་གྱི་འདུ་ཤེས་འཛིན་པའི་བྱ་སྤྱོད་ཀྱང་ལག་ལེན་བསྟར་དགོས་པ་ཡིན༎
དོན་ཚན་གཉིས་པ། སྐྱེ་བོ་རེ་རེར་གསལ་བསྒྲགས་འདི་ནང་བཀོད་པའི་ཐོབ་ཐང་དང་རང་དབང་སྟེ། མི་རིགས་དང། ཤ་མདོག། ཕོ་མོ། སྐད་ཡིག། ཆོས་ལུགས། སྲིད་དོན་བཅས་སམ། འདོད་ཚུལ་གཞནདག་དང༌། རྒྱལ་ཁབ་དང་སྤྱི་ཚོགས་ཀྱི་འབྱུང་ཁུངས་། མཁར་དབང༌། རིགས་རྒྱུད། དེ་མིན་གནས་ཚུལ་འདི་རིགས་གང་ཡང་རུང་བར་དབྱེ་འབྱེད་མེད པའི་ཐོབ་དབང་ཡོད༎ ད་དུང་རྒྱལ་ཁབ་བམ། ས་གནས་གང་ཞིག་རང་བཙན་ཡིན་པ་དང༌། ལྟ་རྟོགས་འོག་ཏུ་གནས་པ། རང་གཞུང་རང་སྐྱོང་མ་ཡིན་པ། གཞན་དག་བདག་དབང་ཚད་འཛིན་ཡོད་པ་བཅས་ཇི་ལྟར་ཡང་དེ་དག་གི་སྐྱེ་བོ་གང་ཞིག་སྐིད་དོན་དང། ཁྲིམས་དབང། རྒྱལ་སྤྱིའི་གནས་སྟངས་བཅས་ཀྱི་ཐོག་ཏུ་ཁྱད་པརམི་དབྱེ་བ་ཡིན༎
The Tibetan script is used for writing the Tibetan, Dzongkha, Ladakhi and Sikkimese languages, spoken in Tibet, Bhutan, Nepal and India. It is also used for transcribing religious Sanskrit texts. The exact origin of the script is not clear; Tibetan Buddhism traditionally ascribes its creation to Minister Thon mi Sambhota in Northeast India, but Bon Po religious tradition cites Iranian or Central Asian origins. What is generally agreed upon is that it is ultimately derived from the Brahmi script, as evidenced by its syllabic structure, its use of diacritics to modify the vowel in a syllable, and its typically Brahmic canonical arrangement of the letters in phonological groups.
There are a number of different styles of writing Tibetan, which can be grouped into two main variants: dbu-can 'with a head', which is the most commonly used and is the less cursive of the two, and dbu-med 'headless', which includes the relatively careful dpe-yig 'book writing' and the rapid 'khyug-yig 'running writing'.
The Tibetan alphabet is an abugida used to write the Tibetic languages such as Tibetan, as well as Dzongkha, Sikkimese, Ladakhi, and sometimes Balti. The printed form of the alphabet is called uchen script while the hand-written cursive form used in everyday writing is called umê script.
The alphabet is very closely linked to a broad ethnic Tibetan identity, spanning across areas in Tibet, Bhutan, India, Nepal. The Tibetan alphabet is of Indic origin and it is ancestral to the Limbu alphabet, the Lepcha alphabet, and the multilingual 'Phags-pa script. ...
The creation of the Tibetan alphabet is attributed to Thonmi Sambhota of the mid-7th century. Tradition holds that Thonmi Sambhota, a minister of Songtsen Gampo (569-649), was sent to India to study the art of writing, and upon his return introduced the alphabet. The form of the letters is based on an Indic alphabet of that period.
Three orthographic standardizations were developed. The most important, an official orthography aimed to facilitate the translation of Buddhist scriptures, emerged during the early 9th century. Standard orthography has not altered since then, while the spoken language has changed by, for example, losing complex consonant clusters. As a result, in all modern Tibetan dialects, in particular in the Standard Tibetan of Lhasa, there is a great divergence between current spelling (which still reflects the 9th-century spoken Tibetan) and current pronunciation. This divergence is the basis of an argument in favour of spelling reform, to write Tibetan as it is pronounced, for example, writing Kagyu instead of Bka'-rgyud. In contrast, the pronunciation of the Balti, Ladakhi and Burig languages adheres more closely to the archaic spelling.
Tibetan is an abugida, ie. consonants carry an inherent vowel sound a that is overridden using vowel signs. See the table to the right for a brief overview of features, taken from the Script Comparison Table.
Text runs horizontally from left to right.
There are various different Tibetan scripts, of two basic types: དབུ་ཅན་ dbu␣ʧn␣ (dbu can), pronounced uchen (with a head), and དབུ་མེད་ dbu␣med␣, pronounced ume (headless). This page concentrates on the former. Pronunciations are based on the central, Lhasa dialect.
Traditional Tibetan text was written on pechas (དཔེ་ཆ་ dpe␣ʧʰ␣ (dpe cha)), loose-leaf sheets. Some of the characters used and formatting approaches are different in books and pechas.
One of the key distinguishing features of Tibetan is the set of separate code points for subjoined consonants, used where a syllable has multiple consonants. Of the 77 combining characters in the Tibetan block, 48 represent subjoined consonant forms. There is no virama for normal Tibetan writing.
Most Tibetan syllables contain stacked consonants or vowel signs, and very often they contain both together.
The Tibetan script characters in Unicode 10.0 are contained in a single block (not counting shared characters, such as punctuation):
Follow these links for information about characters used by languages associated with this script. The numbers in parentheses are for non-ASCII characters.
For character-specific details see Tibetan character notes.
Tibetan text is written horizontally, left to right.
Consonants carry an inherent vowel usually transcribed as a. So ཀ is pronounced ka.
The inherent vowel is not normally pronounced at the end of a word.
Word-internally, consonants with no intervening vowel are stacked (see clusters).
To produce a different vowel than the inherent one, Tibetan attaches vowel-signs to the preceding consonant, eg. ཀི ki.
Standard Tibetan has only 5 vowels, including the inherent vowel, and so only 4 vowel-signs.
Characters that produce vowel signs are all combining characters.
All vowel-signs are typed and stored after the base consonant or stacked consonant. The font takes care of the glyph positioning. In the example སྤྱིར་ sp̰y̰ir␣ ʧí general the vowel sign that appears above the stack is typed after the three consonants that make up the stack.
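As a rough illustration of this storage order, the following Python sketch lists the code points of the example word using the standard `unicodedata` module. The helper name `describe` is just something I made up for the illustration.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs in the order they are stored."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The example word without its trailing tsheg:
# SA + subjoined PA + subjoined YA + vowel sign I + RA
word = "\u0F66\u0FA4\u0FB1\u0F72\u0F62"

for code, name in describe(word):
    print(code, name)
```

The vowel sign is the fourth item in the list, after all three consonants of the stack, even though it is displayed above the stack.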
See also vocalics.
For transcription of Sanskrit text or foreign words, principally from Chinese and Mongolian, the Unicode block provides an additional set of code points.
ཱ [U+0F71 TIBETAN VOWEL SIGN AA] is used to lengthen vowels. The 3 last items in the list above are discouraged by the Unicode Standard, and decomposed in Unicode Normalisation Form C (NFC). Instead you should use the following pairs.
The vowel-signs must always be typed and stored after the consonant characters they surround, and in left to right order.
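The NFC behaviour of the discouraged long vowel signs can be checked with a few lines of Python. This is only a sketch: I am assuming that the three discouraged items are U+0F73, U+0F75 and U+0F81, and the name `nfc_sequence` is my own.

```python
import unicodedata

# Assumption: these are the three precomposed long vowel signs that are discouraged.
PRECOMPOSED = ["\u0F73", "\u0F75", "\u0F81"]

def nfc_sequence(ch):
    """Return the code points of `ch` after normalisation to NFC."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", ch)]

for ch in PRECOMPOSED:
    print(f"U+{ord(ch):04X}", "->", " ".join(nfc_sequence(ch)))
```

In each case the result starts with U+0F71 TIBETAN VOWEL SIGN AA, followed by the short vowel sign.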
The following list shows where vowel-signs are positioned around a base consonant to produce vowels, and how many instances of that pattern there are. The figure after the + sign represents combinations of Unicode characters.
Standalone vowels can be written by attaching vowel-signs to འ [U+0F60 TIBETAN LETTER -A] or ཨ [U+0F68 TIBETAN LETTER A]. See achung.
The phonological realisation for འ [U+0F60 TIBETAN LETTER -A] (called འ་ཆུང་ à␣ʧʰuŋ␣ ('a chung)) and ཨ [U+0F68 TIBETAN LETTER A] (called ཨ་ཆེན་ ạ␣ʧʰen␣ (a chen)) is a. In the Lhasa dialect, the former has a low and the latter a high tone.
Both 'a-chung and a-chen can be used with vowel signs, in which case the a sound is replaced by that of the vowel.
'A-chung can also represent a nasal, so མཚམས་ mʦʰms␣ (mtshams) boundary and མཐུན་ mtʰun␣ agreement are often written འཚམས་ and འཐུན་.
'A-chung may also nasalise the juncture of two morphemes, as in དགེ་འདུན་ dge␣àdun␣ (dge 'dun), pronounced ɡenyn.
Other than loanwords, Tibetan only allows diphthongs in diminutive expressions. 'A-chung is used to write these, as in the following: མི་ mi person → མེའུ་ meàu␣ (me'u) dwarf; རྡོ་ rd̰o␣ stone → རྡེའུ་ rd̰eàu␣ (rde'u) pebble.
A subjoined 'a-chung is used to express long vowels in loan words (Tibetan doesn't have them natively), such as those borrowed from Chinese, Hindi and Mongolian. For example, ཏཱ་བླ་མ་ tà␣bl̰␣m␣ (tā bla ma) grand lama (ta from Chinese), and ཤྲཱི་ ʃr̰ài␣ (śrī) wealth from Sanskrit. For this purpose you should use ཱ [U+0F71 TIBETAN VOWEL SIGN AA], and not ྰ [U+0FB0 TIBETAN SUBJOINED LETTER -A].
The Unicode Standard says of SUBJOINED LETTER -A:
U+0FB0 TIBETAN SUBJOINED LETTER -A ( a-chung ) should be used only in the very rare cases where a full-sized subjoined a-chung letter is required. The small vowel lengthening a-chung encoded as U+0F71 TIBETAN VOWEL SIGN AA is far more frequently used in Tibetan text, and it is therefore recommended that implementations treat this character (rather than U+0FB0) as the normal subjoined a-chung.
Finally, 'a-chung can be used to disambiguate the location of an inherent vowel in a syllable. The sequence དག་ dg␣ dàg I is interpreted as CVC. To express CCV add 'a-chung, eg. དགའ་ dgà␣ gà virtue.
Tibetan vocalics are used only for transcription of Sanskrit.
The following precomposed code points represent the vocalic vowels.
However, the Unicode Standard discourages the use of these precomposed forms (strongly discouraging the last two), and recommends the following sequences instead.
The R and L vowels are decomposed in NFC, but the RR and LL vowel code points are not, nor do they decompose in NFD.
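The difference between the two pairs can be seen by normalising the four precomposed vocalics. This is a small sketch using Python's `unicodedata` module; the dictionary and function names are mine.

```python
import unicodedata

VOCALICS = {
    "R": "\u0F76",   # TIBETAN VOWEL SIGN VOCALIC R
    "RR": "\u0F77",  # TIBETAN VOWEL SIGN VOCALIC RR
    "L": "\u0F78",   # TIBETAN VOWEL SIGN VOCALIC L
    "LL": "\u0F79",  # TIBETAN VOWEL SIGN VOCALIC LL
}

def normalised(ch, form):
    """Return the code points of `ch` after normalisation to `form`."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize(form, ch)]

for label, ch in VOCALICS.items():
    print(label, "NFC:", normalised(ch, "NFC"), "NFD:", normalised(ch, "NFD"))
```

The R and L code points come out as a subjoined consonant plus a vowel sign in both forms, whereas RR and LL are returned unchanged.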
Native Tibetan words use 30 consonants, but the Tibetan block contains many more (see transliteration).
There is a basic and a subjoined version of each consonant. These are shown in pairs in this list.
A stack has a consonant character at the top (although it may actually be slightly squeezed or adapted slightly in shape), and one or more special subjoined consonant characters beneath it.
The topmost consonant in a stack always uses the standard character from the Unicode Tibetan block regardless of whether it is a root consonant or not, and consonants below it always use a character from the subjoined range.
See this example from the Unicode Standard of the word སྤྱིར་ sp̰y̰ir␣ ʧí general, which shows a stack with three consonants.
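The rule for building a stack can be sketched in Python as follows. The sketch relies on the layout of the Tibetan block, where each subjoined letter in the range U+0F90 to U+0FB9 sits exactly 0x50 above its full-form counterpart. The function names are hypothetical, and the fixed-form letters are deliberately left out.

```python
SUBJOINED_OFFSET = 0x50  # eg. U+0F40 KA -> U+0F90 SUBJOINED KA

def to_subjoined(ch):
    """Map a full-form consonant to its subjoined code point."""
    cp = ord(ch)
    # U+0F48 is unassigned; letters above U+0F69 do not follow the simple offset.
    if not (0x0F40 <= cp <= 0x0F69) or cp == 0x0F48:
        raise ValueError(f"no simple subjoined mapping for U+{cp:04X}")
    return chr(cp + SUBJOINED_OFFSET)

def build_stack(consonants):
    """Keep the topmost consonant as is, and subjoin all the others."""
    top, *rest = consonants
    return top + "".join(to_subjoined(c) for c in rest)

# SA, PA, YA -> the three-consonant stack in the example word above
stack = build_stack("\u0F66\u0F54\u0F61")
print([f"U+{ord(c):04X}" for c in stack])
```

Any vowel sign is then added after the whole stack.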
Unlike Indic scripts, there is no virama (or halant) used for native Tibetan text. Instead, there is just a full and a subjoined form of each consonant. The subjoined forms are combining characters. Avoiding the virama makes sense because the virama is not used by Tibetans, and the approach taken makes it easier to create the large number of stacks contained in Tibetan text.
Tibetan uses the word 'head' to refer to either the top-most consonant (ie. spatially) or the root consonant of a syllable, which may be a subjoined consonant. We therefore avoid this term here, and say 'root' or 'topmost'.
The following list shows the order in which characters should be typed, and stored in memory, for a set of stacked characters.
In transliterated text consonants are sometimes stacked in ways that are not allowed in native Tibetan text.
Where used, the character ༹ [U+0F39 TIBETAN MARK TSA -PHRU] occurs immediately after the consonant it modifies.
The pronunciation of Tibetan words is typically much simpler than the orthography, which involves patterns of consonants. These reduce ambiguity and can affect pronunciation and tone.
The primary consonant is called the root consonant (or radical), and the other consonants in the syllable (which normally has up to 6 consonants in total) annotate or modify it. The following rules help identify the root:
a consonant with a vowel is always the root, unless it is the phrase connector འི, and letters with superscripts or subscripts are root consonants.
in a 2-consonant syllable with no vowel, the first consonant is always the root
in a 3-consonant syllable where the last consonant is not ས [U+0F66 TIBETAN LETTER SA], the second consonant is likely to be the root.
in a 4-consonant syllable, the second consonant is always the root.
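The rules above can be turned into a rough heuristic. The Python sketch below is my own reading of them, not a complete analysis of Tibetan syllable structure: it groups a syllable into 'slots' (a full-form consonant plus anything subjoined or attached to it) and returns the index of the slot that holds the root, or `None` where the rules give no answer. All names are hypothetical.

```python
def split_slots(syllable):
    """Group a syllable into slots: base consonant + subjoined letters + vowel signs."""
    slots = []
    for ch in syllable:
        cp = ord(ch)
        if 0x0F40 <= cp <= 0x0F6C:                      # full-form consonant
            slots.append({"base": ch, "subjoined": "", "vowels": ""})
        elif slots and 0x0F90 <= cp <= 0x0FBC:          # subjoined consonant
            slots[-1]["subjoined"] += ch
        elif slots and (0x0F71 <= cp <= 0x0F7D or cp in (0x0F80, 0x0F81)):
            slots[-1]["vowels"] += ch                   # vowel sign
    return slots

def guess_root_index(syllable):
    slots = split_slots(syllable)
    # A slot with a vowel sign or a stack holds the root, unless it is the connector 'i.
    for i, slot in enumerate(slots):
        is_connector = (slot["base"] == "\u0F60" and slot["vowels"] == "\u0F72"
                        and not slot["subjoined"])
        if (slot["vowels"] or slot["subjoined"]) and not is_connector:
            return i
    count = len(slots)
    if count == 1:
        return 0          # trivial case, not one of the listed rules
    if count == 2:
        return 0          # two consonants, no vowel: the first is the root
    if count == 3 and slots[-1]["base"] != "\u0F66":
        return 1          # three consonants not ending in SA: probably the second
    if count == 4:
        return 1          # four consonants: the second
    return None           # the rules above do not decide this case

print(guess_root_index("\u0F56\u0F66\u0F51"))  # bsad: the root is the second consonant
```

Note that the function points at the slot containing the root. Where that slot is a stack, working out which letter in the stack is the root needs the superscript and subscript information described below.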
The following diagram shows characters in all of the syllabic positions, and lists the characters that can appear in each of the non-root locations. The word is འགྲེམས་སྟོན་ àgr̰ems␣st̰on␣ ɖɹem-ton exhibition.
Characters in the prefix position are not pronounced, but de-aspirate aspirated root characters and give a higher tone value to nasal root characters. The consonant ག [U+0F42 TIBETAN LETTER GA] may occur before 11 root characters, ད [U+0F51 TIBETAN LETTER DA] before 6, བ [U+0F56 TIBETAN LETTER BA] before 10, མ [U+0F58 TIBETAN LETTER MA] before 11, and འ [U+0F60 TIBETAN LETTER -A] before 10, eg. འཁོར་ལོ་ àkʰor␣lo␣ kor-lo wheel, བསད་ bsd␣ (bsad) sɛ́ killed.
Characters in the suffix position have one of the following effects:
add their own sound ( ག ང བ མ འ ར ) , eg. དག་ dg␣ dàg I.
modify the root's vowel value ( ད ས ), eg. བསད་ bsd␣ (bsad) sɛ́ killed.
both of the above ( ན ལ ), eg. བདུན་ bdun␣ dỳn seven.
Only two characters can appear in the secondary suffix location, according to Tibetan grammar, ས [U+0F66 TIBETAN LETTER SA] and ད [U+0F51 TIBETAN LETTER DA], and the latter is no longer officially found in modern Tibetan. A character in this position adds no sound and nor does it affect the sounds in the rest of the syllable, eg. བསྒྲུབས་ bsg̰r̰ubs␣ ɖɹúb established, and གྱུརད་ gy̰urd␣ kjùr became.
The three characters that appear in the superscript location raise the tone pitch of the syllable, but are not pronounced themselves. Each superscript character can only be used with a specified set of root characters.
Note that RA has a shape slightly different from its nominal shape in all combinations except རྙ and རླ. You should still use the normal RA character for the superscript. The font will make the needed adjustments of shape.
The four characters that can appear in the subscript location are also each combined with a particular subset of root characters and have different effects.
Note that three of the subscripts have shapes that are significantly different from the nominal shape of the character they represent.
Uniquely, WA can also appear as a sub-subscript as in གྲྭ་ gr̰w̰␣ (grwa).
Most consonants translate to the same basic sound unless they are modified by surrounding letters as mentioned above. In some cases, however, the pronunciation of a consonant is irregular. In particular, b is sometimes pronounced w, eg. རེ་བ་ re␣b␣ re-wa hope, དབང་ཆ་ dbŋ␣ʧʰ␣ wang-ʧa power, and some words have an additional nasalisation which is not shown, eg. ད་ལྟ་ d␣lt̰␣ dan-ta now.
Many of the extra consonants (and other characters) are used for transliteration of other languages, principally Sanskrit and Chinese. These include the retroflex and voiced aspirated consonants. A couple of characters are extensions for Balti.
ར [U+0F62 TIBETAN LETTER RA] at the top of a stack usually has a reduced form, eg. རྐ rka. For transliterations it is sometimes desirable to retain the full form of RA where in Tibetan words it would be reduced.
To do this use ཪ [U+0F6A TIBETAN LETTER FIXED-FORM RA] instead of the normal RA, but only where the normal RA would not produce the full form anyway, ie. do not use eg. རྙ rnya, which has the full form already.
There are also fixed form variants of subjoined YA and WA.
A set of precomposed characters exists for representing aspirated sounds, and the Sanskrit diphone kʃ. (The default webfont renders the subjoined forms as if they were two separate code points.)
These characters are decomposed under Normalization Form C, and the Unicode Standard recommends that these letters should always be represented by those decomposed forms.
The retroflex consonants, which are reversed versions of Tibetan consonant shapes, are often used to distinguish loan words from sequences of Tibetan syllables. For example, ཁ་ཎ་ཌ་ kʰ␣ɳ␣ɖ␣ (kha-ṇa-ḍa) Canada, མོ་ཊ་ mo␣ʈ␣ (mo-ṭa) car.
ཿ [U+0F7F TIBETAN SIGN RNAM BCAD] ( nam chay ) is the visarga, and ཾ [U+0F7E TIBETAN SIGN RJES SU NGA RO] ( ngaro ) is the anusvara.
༹ [U+0F39 TIBETAN MARK TSA -PHRU] is an integral part of the three consonants ཙ [U+0F59 TIBETAN LETTER TSA], ཚ [U+0F5A TIBETAN LETTER TSHA] , and ཛ [U+0F5B TIBETAN LETTER DZA]. Although those consonants are not decomposable, this mark has been abstracted and may by itself be applied to ཕ [U+0F55 TIBETAN LETTER PHA] (ie. ཕ༹) and other consonants to make new letters for use in transliteration and transcription of other languages. For example, in modern literary Tibetan, it is one of the ways used to transcribe the Chinese “fa” and “va” sounds not represented by the normal Tibetan consonants.
It is also used to represent tsa, tsha, and dza in abbreviations.
This code point should be used immediately after the consonant it modifies, even if that consonant is followed by a subjoined consonant.
Two characters are provided for use with Balti.
The following are characters with the general Unicode property of mark, excluding those combining marks which are described in the sections relating to subjoined consonants and vowel signs.
These are characters with the general Unicode property of punctuation. They are described elsewhere on this page.
These characters have the general Unicode property of symbol. They are described elsewhere on this page.
☸ [U+2638 WHEEL OF DHARMA] which occurs sometimes in Tibetan texts is encoded in the Miscellaneous Symbols block.
Tibetan has its own set of numbers. My Chinese publication, however, uses European digits.
༾ [U+0F3E TIBETAN SIGN YAR TSHES] and ༿ [U+0F3F TIBETAN SIGN MAR TSHES] are paired characters used in combination with digits.
By some interpretations, the following shapes each have the value of 0.5 less than the number within which it appears. Used only in some traditional contexts, they appear as the last digit of a multidigit number, eg. ༤༬ represents 42.5. These are very rarely used, however, and other uses have been postulated. For more information see Numbers that Don't Add Up : Tibetan Half Digits, by Andrew West.
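Under that interpretation, the value of a number can be computed from the numeric values that the Unicode Character Database assigns to the digits and half digits. The following Python sketch shows the arithmetic; the function name is my own, and it says nothing about whether this is the right reading of the half digits.

```python
import unicodedata

def tibetan_number(text):
    """Interpret a run of Tibetan digits, allowing a half digit in last position."""
    value = 0.0
    for ch in text:
        # unicodedata.numeric() gives eg. 4.0 for U+0F24 and 2.5 for U+0F2C
        value = value * 10 + unicodedata.numeric(ch)
    return value

print(tibetan_number("\u0F24\u0F2C"))        # DIGIT FOUR + DIGIT HALF THREE
print(tibetan_number("\u0F22\u0F20\u0F20\u0F21"))  # ordinary digits
```

For numbers that contain only ordinary Tibetan digits, Python's built-in `int()` also works directly on the string.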
Tibetan requires many rules to position glyphs correctly, and also to shape characters according to context.
Combining characters need to be placed in different positions, according to the context. The example below shows the same vowel sign displayed at different heights, according to what stacks above it.
Glyphs in Tibetan script need to be adapted sometimes to suit the context in which the character is used. A particularly prevalent example is that of the letter ར [U+0F62 TIBETAN LETTER RA]. When used at the top of a stack it has an abbreviated form, as shown by the grey highlight in the example below on the left.
The example on the right shows what a normal RA looks like. This is the same underlying character. The shape is determined by rules in the font.
In pechas, Tibetan text is written inside a visible box which defines the margin of the page. In more recent publications this box may be invisible. Modern publications also use paragraphs. The initial line of a new paragraph may be indented.
Word boundaries within a section are not indicated. Only 'syllables', known as tsheg-bar tsek bar, are separated by the tsek character, ་ [U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG].
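Since the tsek is the only regular delimiter, a first approximation to segmenting text is simply to split on it. This Python sketch treats both the ordinary tsheg and the non-breaking U+0F0C as delimiters. It is a simplification: shads and spaces are not handled, and the function name is hypothetical.

```python
import re

TSHEG = "\u0F0B"        # TIBETAN MARK INTERSYLLABIC TSHEG
TSHEG_BSTAR = "\u0F0C"  # TIBETAN MARK DELIMITER TSHEG BSTAR

def tsheg_bars(text):
    """Split text into tsheg-bar, dropping empty pieces."""
    return [part for part in re.split(f"[{TSHEG}{TSHEG_BSTAR}]", text) if part]

# dge 'dun, followed by a tsheg
print(tsheg_bars("\u0F51\u0F42\u0F7A\u0F0B\u0F60\u0F51\u0F74\u0F53\u0F0B"))
```

Finding word boundaries is a different problem, and needs a dictionary or similar.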
Key divisions of the text are sections (or expressions (brjod-pa)) and topics (don-tshan), which do not necessarily equate to English phrases, sentences and paragraphs.
Sections normally end with a shay, ། [U+0F0D TIBETAN MARK SHAD], followed by a space. Topics (eg. headlines, verses, and longer paragraphs) are often terminated or separated with shay+space+shay.
Unicode provides ༎ [U+0F0E TIBETAN MARK NYIS SHAD] as a means of regularising the spacing between the two shad marks, which tends to be slightly bigger than a normal space. The space between the shad marks can be stretched during justification, however, and it's not clear to me how that would work when using NYIS SHAD.
In a Chinese magazine publication I have, most articles contain no double shay as a delimiter. (The text is formatted in paragraphs.) I did find a double shay at the very end of one of the articles, and it was used at the end of each line on a page containing some verse-formatted folk literature. The same appears to apply for large parts of the Bhutanese newspapers I have, however there are other pages with plenty of double shays - some at the end of paragraphs, some inside paragraphs.
A line that ends with the root consonant ཀ [U+0F40 TIBETAN LETTER KA] or ག [U+0F42 TIBETAN LETTER GA] will normally swallow up the shay that immediately follows it, even if there is a vowel sign. For example, where you might expect to see a double shay, you might see ཀུ ། and སྐུ །. However, the shad is not omitted if these characters have a subscript, eg. གྲུ། །.
The tsek is not used before a shay, except after ང [U+0F44 TIBETAN LETTER NGA]. For example, note the end of the three sections in this example:
Users may use an ordinary TSHEG between NGA and SHAD, but Unicode also provides a special non-breaking character that can be used instead, ༌ [U+0F0C TIBETAN MARK DELIMITER TSHEG BSTAR]. The word 'delimiter' in the name is a misnomer.
Whitespace in Tibetan text should use U+00A0 NO-BREAK SPACE. Spaces in Tibetan text are usually wider than spaces in English text, and typically only occur after one of the following: །, ༑, ༔ or ཿ. However, numbers and embedded Western text are surrounded by smaller spaces, eg. ལོ་ ༢༠༠༡ ཤིང་བྱ་ཟླ་ ༩ ཚེས་ ༥ ཉིན་. Looks like this is also something that the application needs to take care of.
༈ [U+0F08 TIBETAN MARK SBRUL SHAD] is used to separate texts that are equivalent to topics and subtopics, such as the start of a smaller text, the start of a prayer, a chapter boundary, or to mark the beginning and end of insertions into text in pechas.
This drul-shay is usually surrounded on both sides by the equivalent of about three non-breaking spaces (though no rule is specified). The drul-shay should not appear at the beginning of a new line and the whole structure of spacing-plus- shay needs to be kept together.
྾ [U+0FBE TIBETAN KU RU KHA] (often repeated three times) indicates a refrain.
༼ [U+0F3C TIBETAN MARK ANG KHANG GYON] and ༽ [U+0F3D TIBETAN MARK ANG KHANG GYAS] are paired punctuation used to form a roof over one or more digits or words. The right-hand character can also be used much like a single parenthesis in list counters.
༾ [U+0F3E TIBETAN SIGN YAR TSHES] and ༿ [U+0F3F TIBETAN SIGN MAR TSHES] are also paired characters used in combination with digits.
In traditional, loose-leaf Tibetan pechas a head mark or yig-mgo (yig go) is used at the beginning of the front of the folio so that you can tell which is the front.
Head marks are also used in both pechas and books to indicate the start of a headline or the start of the first paragraph in a longer text.
Head marks differ from text to text. The Unicode Standard provides a number of characters to give some basic coverage, but may not meet all needs.
A common head mark is ༄ [U+0F04 TIBETAN MARK INITIAL YIG MGO MDUN MA], and there is also the extension character ༅ [U+0F05 TIBETAN MARK CLOSING YIG MGO SGAB MA]. A head mark can be written alone, or can be followed by as many as three closing marks; head marks are also followed by two shads, eg.༄༅། །.
Three less common head marks, used in Nyingmapa and Bonpo literature, are also represented in the Tibetan block, namely:
༁ [U+0F01 TIBETAN MARK GTER YIG MGO TRUNCATED A]
༂ [U+0F02 TIBETAN MARK GTER YIG MGO -UM RNAM BCAD MA]
༃ [U+0F03 TIBETAN MARK GTER YIG MGO -UM GTER TSHEG MA]
༵ [U+0F35 TIBETAN MARK NGAS BZUNG NYI ZLA] and ༷ [U+0F37 TIBETAN MARK NGAS BZUNG SGOR RTAGS] can be used to create a similar effect to underlining or to mark emphasis.
The use of these marks is not straightforward, since they attach to a syllable rather than a character and therefore to place them correctly the application needs to take syllable boundary positions into account. If entered as combining characters they can be added after the vowel-sign in a stack.
Application software has to ignore these characters for text processing, such as search and collation.
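A minimal way to do that for search is to strip the two marks from both the text and the search string before comparing. The Python sketch below shows the idea; the function names are mine, and a real implementation would more likely handle this in the collation or search tailoring.

```python
EMPHASIS_MARKS = {"\u0F35", "\u0F37"}  # NGAS BZUNG NYI ZLA, NGAS BZUNG SGOR RTAGS

def strip_emphasis(text):
    """Remove the emphasis marks so that they do not interfere with matching."""
    return "".join(ch for ch in text if ch not in EMPHASIS_MARKS)

def contains(haystack, needle):
    """Substring search that ignores emphasis marks on either side."""
    return strip_emphasis(needle) in strip_emphasis(haystack)

emphasised = "\u0F40\u0F74\u0F35\u0F0B"  # KA + vowel sign U + emphasis mark + tsheg
print(contains(emphasised, "\u0F40\u0F74\u0F0B"))
```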
Alternative methods of emphasis include use of a different colour, or the use of the prefix ༸ [U+0F38 TIBETAN MARK CHE MGO].
These characters may also be used in interspersed commentaries to tag the root text that is being commented on. An alternative is to set the tsek-bar being commented on in large type and the commentary in small type.
Modern texts appear to use bolding on text.
༴ [U+0F34 TIBETAN MARK BSDUS RTAGS] means 'etc.', and is used after the first few tsek-bar of a recurring phrase.
Normally, Tibetan only breaks after the tsek, and doesn't break after spaces.
Tibetan never breaks inside a syllable, and has no hyphenation. If a word is composed of multiple syllables, it is also preferable to avoid breaking a line in the middle of the word.
Line breaks do not occur after a tsek when it follows ང [U+0F44 TIBETAN LETTER NGA] (with or without a vowel sign) and precedes a shay, ། [U+0F0D TIBETAN MARK SHAD]. The Unicode Standard also talks of other instances where Tibetan grammatical rules do not permit a break, but it isn't clear what those are.
If the character after NGA is an ordinary INTER-SYLLABIC TSHEG, then applications need to ensure that lines do not break between the TSHEG and the SHAD. Text is likely to be more portable if content authors use the TSHEG BSTAR in these locations, instead of the normal TSHEG.
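Content could be converted to use the TSHEG BSTAR with a simple substitution. The following Python sketch replaces an ordinary tsheg with U+0F0C only where it sits between NGA (with an optional vowel sign) and a SHAD. The pattern and function name are my own, and other contexts where a break is not allowed are not covered.

```python
import re

# NGA, optional vowel signs, then a TSHEG that is immediately followed by a SHAD
NGA_TSHEG_SHAD = re.compile("(\u0F44[\u0F71-\u0F7D\u0F80\u0F81]*)\u0F0B(?=\u0F0D)")

def use_tsheg_bstar(text):
    """Swap the ordinary tsheg for the non-breaking U+0F0C between NGA and SHAD."""
    return NGA_TSHEG_SHAD.sub("\\1\u0F0C", text)

before = "\u0F51\u0F44\u0F0B\u0F0D"   # DA NGA TSHEG SHAD
after = use_tsheg_bstar(before)
print([f"U+{ord(c):04X}" for c in after])
```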
Line breaks are also possible after:
A line must never start with a shad.
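Putting the basic rules together, the break opportunities in a string can be approximated as follows. This Python sketch only covers what is described above: breaks after an ordinary tsheg, none after spaces or after U+0F0C, and none immediately before a shad. The names are hypothetical, and only SHAD and NYIS SHAD are treated as shads here.

```python
TSHEG = "\u0F0B"
SHADS = {"\u0F0D", "\u0F0E"}  # SHAD and NYIS SHAD only, as a simplification

def break_opportunities(text):
    """Return each index i where a line may break between text[i-1] and text[i]."""
    points = []
    for i in range(1, len(text)):
        # Break after an ordinary tsheg, unless that would start the line with a shad.
        if text[i - 1] == TSHEG and text[i] not in SHADS:
            points.append(i)
    return points

# DA NGA TSHEG SHAD, space, YA + vowel sign O, DA, TSHEG, PA
sample = "\u0F51\u0F44\u0F0B\u0F0D \u0F61\u0F7C\u0F51\u0F0B\u0F54"
print(break_opportunities(sample))
```

In the sample, the only opportunity is after the second tsheg. The tsheg between NGA and SHAD is skipped, and so is the space.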
Line breaks and rin chen spungs shad. In Tibetan, especially in pechas, it is considered a special case if the last syllable of an expression that is terminated by a shay breaks onto a new line. In that case the shay or double shay is replaced by rin chen spungs shad, ༑ [U+0F11 TIBETAN MARK RIN CHEN SPUNGS SHAD]. At the end of a topic the rules say that only one shay should be converted, ie. ༑ །, however it is moderately popular to convert both, ie. ༑ ༑. This change serves as an optical indication that there is a left-over syllable at the beginning of the line that actually belongs to the preceding line.
This varies in the following cases:
In an environment where the width or content of the page can change, this feature poses a problem for the content author. The application needs to be able to automatically switch between the two styles of shad as a syllable moves on or off a new line when the page is resized or when preceding content is modified.
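As a sketch of what such an application might do after it has broken the text into lines, the following Python function checks whether a line starts with a single left-over syllable that is directly followed by a shad, and if so converts that first shad. It assumes line breaking has already happened, it ignores the varying cases mentioned above, and the names are my own.

```python
import re

RIN_CHEN_SPUNGS_SHAD = "\u0F11"

# One tsheg-bar at the start of the line (no tsheg, shad or whitespace inside it),
# immediately followed by an ordinary SHAD.
ORPHAN = re.compile("^([^\u0F0B\u0F0C\u0F0D\u0F11\\s]+)\u0F0D")

def mark_orphans(lines):
    """Convert the first shad after a left-over syllable on every line but the first."""
    result = lines[:1]
    for line in lines[1:]:
        result.append(ORPHAN.sub("\\1" + RIN_CHEN_SPUNGS_SHAD, line, count=1))
    return result

lines = ["\u0F40\u0F0B", "\u0F61\u0F7C\u0F51\u0F0D \u0F51\u0F7A\u0F0B"]
print(mark_orphans(lines))
```

When the page is resized, the application would need to restore the ordinary shads, break the lines again, and then reapply the check.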
The Unicode Standard adds: "Not only is rin-chen-spungs-shad used as the replacement for the shay but a whole class of “ornamental shays ” are used for the same purpose. All are scribal variants on a rin-chen-spungs-shad, which is correctly written with three dots above it."
There appear to be two alternative methods of justification.
Method 1: inter-character spacing. Spacing between all characters should be adapted equally. Note that the width of the white-space character should not be changed significantly, so Tibetan texts use the non-breaking space mentioned above, which doesn't change width on justification.
Method 2: tsek padding. While hand writing, authors add small spaces across the text to get the line end as near as possible to the right margin. Where space remains at the margin, it may be left as is, if it is short. Otherwise, the remaining space will be filled with tseks to make the line as flush as possible with the right margin (there will usually still be a slight raggedness to the right edge of the text).
There are a couple of detailed rules about the use of tsek padding. Justifying tseks are almost always used when the line ends in a tsek. If, however, the line ends in a shay, there are a number of alternatives.
If the line ends with a single shay the shay is followed by spaces. Tsek padding is never applied after spaces. (See examples in the figure above.)
If the line ends in a double shay (with space between), it is unusual (though possible) to add tsek padding. Instead, the space between the shays is stretched or narrowed. (See examples in the figure below.) The same applies if the second shay was removed because it was preceded by a KA or GA.
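The first two rules can be expressed very simply. In the Python sketch below I assume that the layout engine has already worked out how many tseks would fit in the space remaining on the line (the hypothetical `free_slots` parameter). Stretching the space between double shays is not covered.

```python
TSHEG = "\u0F0B"

def pad_with_tsheg(line, free_slots):
    """Append justifying tshegs, but only to a line that already ends in a tsheg."""
    if free_slots <= 0 or not line.endswith(TSHEG):
        return line  # eg. the line ends in a shad followed by spaces: no padding
    return line + TSHEG * free_slots

print(pad_with_tsheg("\u0F40\u0F0B", 3))        # padded with three extra tshegs
print(pad_with_tsheg("\u0F40\u0F0D\u00A0", 3))  # left unchanged
```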
Use the control below to see how your browser justifies the text sample here. The gaps are all no-break spaces.
སྐྱེ་བོ་ཐམས་ཅད་ཁྲིམས་ཀྱི་མདུན་སར་འདྲ་མཉམ་ཡིན་པ་དང༌། ཁྲིམས་ཀྱི་ཐོག་ནས་དབྱེ་འབྱེད་མེད་པར་འདྲ་མཉམ་གྱི་རྒྱབ་གཉེར་སྲུང་སྐྱོབ་བྱེདདགོས་པའི་ཐོབ་དབང་ཡོད། ཚང་མས་གསལ་བསྒྲགས་ཀྱི་སྙིང་དོན་འདི་ལས་འགལ་བའི་དབྱེ་འབྱེད་ཀྱི་གནས་ཚུལ་ལ་ངོ་རྒོལ་གང་ཡང་བྱེད་པ་དང༌། འདི་ལྟ་བུའི་དབྱེ་འབྱེད་ཀྱི་ངན་འགུལ་ལའང་ངོ་རྒོལ་གང་ཡང་བྱེད་པ་བཅས་ལ་རྒྱབ་གཉེར་བྱེད་དགོས་པའི་ཐོབ་དབང་འདྲ་མཉམ་དུ་ཡོད༎
༽ [U+0F3D TIBETAN MARK ANG KHANG GYAS] can be used much like a single parenthesis in list counters.
When text in smaller annotations or larger heading text is mixed with normal text, the letter-heads of all characters should align to the same height.
༶ [U+0F36 TIBETAN MARK CARET -DZUD RTAGS BZHI MIG CAN] and ྿ [U+0FBF TIBETAN KU RU KHA BZHI MIG CAN] are used to indicate where text should be inserted within other text or as references to footnotes and marginal notes.