Thai

Updated 15 Oct 2017 • tags thai, scriptnotes

This page provides information about the characteristics of the script used to write Thai, as described in the Writing Systems Tutorial. It is not intended to be exhaustively scientific – merely to give a basic idea of the essential features of the script.

Click on the orange text in the table to the right to see more details about that aspect of the script. Click on red text in the main sample area to see a list of code points for that text.

Sample (Thai)

ณ ยามที่โลกต้องการเอ่ยถ้อยคำใดๆ โลกจะใช้เพียง Unicode เราจึงขอเชิญชวนท่านรีบลงทะเบียนงาน International Unicode Conference ครั้งที่ 10 ซึ่งจะจัดให้มีขึ้น ณ เมือง Mainz ประเทศเยอรมัน ในระหว่างวันที่ 10-12 มีนาคม ค.ศ. 1997 เสียแต่บัดนี้ โดยในงานประชุมดังกล่าว ท่านจะมีโอกาสได้พบกับบรรดาผู้เชี่ยวชาญจากธุรกิจอินเตอร์เน็ตและ Unicode ธุรกิจ Internationalization และ Localization จากทุกมุมทั่วโลก พร้อมรับทราบการใช้ประโยชน์จาก Unicode ร่วมกับระบบปฏิบัติการและโปรแกรมต่างๆ ฟอนต์ รูปแบบข้อความ รวมทั้งวิทยาการด้านคอมพิวเตอร์ในภาษาต่างๆ

Script name: Thai
Script type: abugida
Number of characters: 87
Case distinction: no
Combining characters: 16
Multiple combining characters: yes
Context-based positioning: yes
Contextual shaping: no
Cursive script: no
Many more glyphs than characters?: no
Text direction: ltr
Baseline: mid
Space is word separator: no
Wraps at: word
Justification: distributed
Non-ASCII digits?: yes
Other: height

Context-based positioning

Combining characters need to be placed in different positions, according to the context.

The highlighted example shows the same tone character displayed at different heights, according to what falls beneath it.

Click on the highlighted text in the Sample section to see the characters that make up this example.
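
The positioning rule described above can be approximated in a few lines of Python. This is only a sketch under simplifying assumptions: the function name `tone_mark_levels` and the two hard-coded sets are illustrative, and real fonts and rendering engines handle more cases (for example, consonants with tall ascenders). It labels a tone mark as "raised" when it follows a vowel sign drawn above the base, and "low" when it sits directly on the base consonant.

```python
# Simplified model of context-based positioning for Thai tone marks.
# Both sets are hard-coded assumptions made for this sketch.
TONE_MARKS = {"\u0e48", "\u0e49", "\u0e4a", "\u0e4b"}  # mai ek, mai tho, mai tri, mai chattawa
UPPER_VOWELS = {"\u0e31", "\u0e34", "\u0e35", "\u0e36", "\u0e37"}  # vowel signs drawn above the base


def tone_mark_levels(text):
    """Return (tone_mark, level) pairs for every tone mark in text.

    level is 'raised' when the mark follows an upper vowel sign,
    and 'low' when it follows anything else (normally the base consonant).
    """
    levels = []
    previous = ""
    for ch in text:
        if ch in TONE_MARKS:
            level = "raised" if previous in UPPER_VOWELS else "low"
            levels.append((ch, level))
        previous = ch
    return levels


# First syllable: tone mark directly on the consonant.
# Second syllable: the tone mark sits over a vowel sign.
print(tone_mark_levels("ต้องที่"))
```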

Multiple combining characters

Thai regularly stacks several combining characters above a base consonant. There are two examples in the sample text, both of which show a base character with a vowel sign and then a tone mark on top.
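
One way to find such stacks in a string is to group each base character with the nonspacing marks that follow it. The sketch below uses Python's standard `unicodedata` module; the helper names `split_clusters` and `stacked_clusters` are made up for this illustration, and the grouping is deliberately simpler than full grapheme cluster segmentation.

```python
import unicodedata


def split_clusters(text):
    """Group each character with the nonspacing marks (category Mn) that follow it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) == "Mn":
            clusters[-1] += ch  # attach the mark to the preceding base
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


def stacked_clusters(text):
    """Return only the clusters that carry two or more combining marks."""
    result = []
    for cluster in split_clusters(text):
        marks = sum(1 for ch in cluster if unicodedata.category(ch) == "Mn")
        if marks >= 2:
            result.append(cluster)
    return result


# Two words from the sample; each contains a vowel sign with a tone mark on top.
for cluster in stacked_clusters("ครั้งที่"):
    print([f"U+{ord(ch):04X}" for ch in cluster])
```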

Line height

Thai places vowel and tone marks above base characters, one above the other, and can also add combining characters below the line. The complexity of these marks means that the vertical resolution needed for clearly readable Thai text is higher than for, say, Latin text. In addition, Thai tends to add more interline spacing than Latin text does.
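
To get a rough feel for how much vertical room a syllable needs, one can count the marks that sit above and below its base. The sketch below hard-codes the Thai combining marks by position; the sets and the function name `vertical_profile` are assumptions made for illustration, not data taken from this page.

```python
# Thai combining marks grouped by where they are drawn (hard-coded for this sketch).
ABOVE_MARKS = {chr(cp) for cp in (0x0E31, *range(0x0E34, 0x0E38), *range(0x0E47, 0x0E4F))}
BELOW_MARKS = {chr(cp) for cp in range(0x0E38, 0x0E3B)}


def vertical_profile(cluster):
    """Return (marks_above, marks_below) for a base character plus its marks."""
    above = sum(1 for ch in cluster if ch in ABOVE_MARKS)
    below = sum(1 for ch in cluster if ch in BELOW_MARKS)
    return above, below


# A syllable from the sample with a vowel sign below and a tone mark above.
print(vertical_profile("ผู้"))
```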

Here is an example of combining characters above and below base characters:

Spaces

Thai words are not separated by spaces or any other character within a phrase. Spaces do have a function in Thai text, but it is to separate phrases or sentences – they are the equivalent of the comma or period. Although word boundaries are not marked, the text still has a concept of words; for example, lines are supposed to be broken at word boundaries. The following text shows the word boundaries in the example highlighted above:

Text wrapping

Thai doesn't indicate word boundaries, but when Thai text is wrapped at the end of a line you should not split a word. This normally requires an application to look up word boundaries in a dictionary. As you change the width of the browser window, the highlighted text above should break at the following points if your browser supports Thai wrapping:
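
The dictionary lookup mentioned above can be illustrated with a greedy longest-match segmenter. This is a minimal sketch, not how any particular browser works: the five-word `WORDS` set is a hypothetical stand-in for a real dictionary, the function names are invented here, and widths are counted in code points rather than rendered columns.

```python
# Hypothetical mini-dictionary; real applications rely on a large word list.
WORDS = {"โลก", "จะ", "ใช้", "เพียง", "ต้องการ"}


def segment(text, words=WORDS):
    """Greedy longest-match segmentation.

    Characters that do not start any dictionary word become one-character tokens.
    """
    longest = max(len(word) for word in words)
    tokens = []
    i = 0
    while i < len(text):
        size = 1  # fallback: a single unknown character
        for candidate in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + candidate] in words:
                size = candidate
                break
        tokens.append(text[i:i + size])
        i += size
    return tokens


def wrap(text, width, words=WORDS):
    """Break text into lines of at most `width` code points, only between tokens."""
    lines = []
    current = ""
    for token in segment(text, words):
        if current and len(current) + len(token) > width:
            lines.append(current)
            current = token
        else:
            current += token
    if current:
        lines.append(current)
    return lines


phrase = "โลกจะใช้เพียง"  # taken from the sample text
print(segment(phrase))
print(wrap(phrase, 6))
```

Greedy matching is the simplest strategy and can pick the wrong boundary when one dictionary word is a prefix of another; production segmenters typically add backtracking or statistical models.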

Justification

Justification in Thai adjusts blank spaces, but also makes certain adjustments to inter-character spacing. Browsers currently tend not to justify Thai text well.

Character list

The Thai script characters in Unicode 7.0 are contained in a single block, Thai (U+0E00–U+0E7F).
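
As a rough cross-check of the counts in the features list, the sketch below walks the Thai block (U+0E00 to U+0E7F) using Python's `unicodedata` module and counts assigned characters, nonspacing combining marks, and decimal digits. The function name `thai_block_summary` is made up for this example, and the result depends on the Unicode data shipped with the Python version in use.

```python
import unicodedata

THAI_BLOCK = range(0x0E00, 0x0E80)  # U+0E00 through U+0E7F


def thai_block_summary():
    """Return (assigned, combining, digits) counts for the Thai block."""
    # Category "Cn" means the code point is unassigned.
    assigned = [chr(cp) for cp in THAI_BLOCK if unicodedata.category(chr(cp)) != "Cn"]
    combining = [ch for ch in assigned if unicodedata.category(ch) == "Mn"]
    digits = [ch for ch in assigned if unicodedata.category(ch) == "Nd"]
    return len(assigned), len(combining), len(digits)


assigned, combining, digits = thai_block_summary()
print(f"assigned: {assigned}, combining: {combining}, digits: {digits}")
```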

The following is an incomplete list of languages and the number of characters they use, per version 26 of CLDR's lists of characters (exemplarCharacters).

First published 16 Dec 2002. This version 2017-10-15 9:48 GMT.  •  Copyright r12a@w3.org. Licence CC-By.