Tibetan

Updated Tue 21 Oct 2014 • tags tibetan, scriptnotes

This page provides information about the characteristics of the script used to write Tibetan. It is not intended to be exhaustively scientific – merely to give a basic idea of the essential features of the script. For more detail see Tibetan Script Notes.

Sample (Tibetan)

དོན་ཚན་དང་པོ། (Article 1) འགྲོ་བ་མིའི་རིགས་རྒྱུད་ཡོངས་ལ་སྐྱེས་ཙམ་ཉིད་ནས་ཆེ་མཐོངས་དང༌། ཐོབ་ཐངགི་རང་དབང་འདྲ་མཉམ་དུ་ཡོད་ལ། ཁོང་ཚོར་རང་བྱུང་གི་བློ་རྩལ་དང་བསམ་ཚུལ་བཟང་པོ་འདོན་པའི་འོས་བབས་ཀྱང་ཡོད། དེ་བཞིན་ཕན་ཚུན་གཅིག་གིས་གཅིག་ལ་བུ་སྤུན་གྱི་འདུ་ཤེས་འཛིན་པའི་བྱ་སྤྱོད་ཀྱང་ལག་ལེན་བསྟར་དགོས་པ་ཡིན༎

Script name

Tibetan

Script type abugida
Number of characters 211
Case distinction? no
Combining characters 77
Multiple combining characters yes
Context-based positioning yes
Contextual shaping yes
Cursive script no
Many more glyphs than characters? no
Text direction ltr
Baseline high
Space is word separator no
Wraps at syllable
Justification character/tsek-padding
Non-ASCII digits? yes
Other height

Click on the orange text in the features list (right column) to see examples and notes. Click on highlighted text in the Sample section to see the characters. Click on the vertical blue bar, bottom right, to change font settings.

Context-based positioning

Combining characters need to be placed in different positions, according to the context.

The highlighted example shows the same vowel sign displayed at different heights, according to what stacks above it.

Multiple combining characters

Most Tibetan syllables contain stacked consonants or vowel signs, and very often they contain both together. An example is the text below, which shows a syllable with an initial stack of three consonants plus a vowel sign.

Click on the highlighted text in the Sample section to see the composition.

Contextual shaping

Glyps in Tibetan script need to be adapted sometimes to suit the context in which the character is used. A particularly prevalent example is that of the letter RA. When used at the top of a stack it has an abbreviated form, as shown by the grey highlight in the example below on the left.

The example on the right shows what a normal RA looks like. This is the same underlying character. The shape is determined by rules in the font.

Click on the highlighted text in the Sample section to see the characters that make up these syllables.

Line height

Tibetan script stacks consonants, and sometimes adds vowel signs above and below the stack. This means that the vertical resolution needed for clearly readable Tibetan text is higher than for, say, Latin text.

Here is an example of a syllable that contains three stacked consonant, plus a vowel sign above. (In theory, Tibetan can stack even more consonants than this.)

Click on the highlighted text in the Sample section to see the composition.

Spaces

Whitespace in Tibetan text should use U+00A0 NO-BREAK SPACE. Spaces in Tibetan text are usually wider than spaces in English text, and typically only occur after one of the following: , , or ཿ. However, numbers and embedded Western text are surrounded by smaller spaces, eg. ལོ་ ༢༠༠༡ ཤིང་བྱ་ཟླ་ ༩ ཚེས་ ༥ ཉིན་.

Tibetan uses the tsek character like a space, but it separates syllables rather than words.

Text wrapping

Tibetan indicates syllable boundaries using the tsek character, and doesn't use spaces to delimit words. Normally, Tibetan only breaks after the tsek, and doesn't break after spaces. There are, however, some additional rules which prevent breaking after a tsek in some circumstances, and allow breaks after other characters in some more. (See more details.)

As you change the width of the browser window the highlighted text above should break at the following points if your browser supports Tibetan wrapping:

Justification

Tibetan justifies text in one of two ways: either by adjusting inter-character spacing, or by tsek padding.

For the first method, spacing between all characters should be adapted equally. Note that the width of the white-space character should not be changed significantly, so Tibetan texts use the non-breaking space mentioned above, which doesn't change width on justification.

Using the second method, authors add small spaces across the text to get the line end as near as possible to the right margin. Where space remains at the margin, it may be left as is, if it is short. Otherwise, the remaining space will be filled with tseks to make the line as flush as possible with the right margin (there will usually still be a slight raggedness to the right edge of the text). There are additional rules involved for some characters.

A page of a booklet showing tsek padding.

Use the following control to see how this browser justifies Tibetan.

Character list

The Tibetan script characters in Unicode 7.0 are contained in a single block:

The following is an incomplete list of languages and the number of characters they use, per version 26 of CLDR's lists of characters (exemplarCharacters).

Digits

These are characters that represent digits in the Tibetan block:

0 1 2 3 4 5 6 7 8 9
༠ ༡ ༢ ༣ ༤ ༥ ༦ ༧ ༨ ༩

 

First published 21 Oct 2014. This version 2015-02-16 10:40 GMT.  •  Copyright r12a@w3.org. Licence CC-By.