Thai script summary

Updated 01-May-2019 • tags thai, scriptnotes

This page provides basic information about the Thai script. It is not authoritative, peer-reviewed information – these are just notes I have gathered or copied from various places as I learned. For character-specific details follow the links to the Thai character notes.

For similar information related to other scripts, see the Script comparison table.


Sample

ข้อ 1 มนุษย์ทั้งหลายเกิดมามีอิสระและเสมอภาคกันในเกียรติศักด[เกียรติศักดิ์]และสิทธิ ต่างมีเหตุผลและมโนธรรม และควรปฏิบัติต่อกันด้วยเจตนารมณ์แห่งภราดรภาพ

ข้อ 2 ทุกคนย่อมมีสิทธิและอิสรภาพบรรดาที่กำหนดไว้ในปฏิญญานี้ โดยปราศจากความแตกต่างไม่ว่าชนิดใด ๆ ดังเช่น เชื้อชาติ ผิว เพศ ภาษา ศาสนา ความคิดเห็นทางการเมืองหรือทางอื่น เผ่าพันธุ์แห่งชาติ หรือสังคม ทรัพย์สิน กำเนิด หรือสถานะอื่น ๆ อนึ่งจะไม่มีความแตกต่างใด ๆ ตามมูลฐานแห่งสถานะทางการเมือง ทางการศาล หรือทางการระหว่างประเทศของประเทศหรือดินแดนที่บุคคลสังกัด ไม่ว่าดินแดนนี้จะเป็นเอกราช อยู่ในความพิทักษ์มิได้ปกครองตนเอง หรืออยู่ภายใต้การจำกัดอธิปไตยใด ๆ ทั้งสิ้น

Usage & history

From Scriptsource:

The Thai script is used primarily for writing the Thai language, as well as Northern Thai, Northeastern Thai, Southern Thai, and Thai Song, which are separate languages. It is also used to write a number of minority languages in Thailand, Laos and China, as well as Pali, which is widely used in Buddhist temples and monasteries. Both the Thai language and script are closely related to Laotian. The script is of Indic origin, derived from Old Khmer.

From Wikipedia:

Thai alphabet (Thai: อักษรไทย; RTGS: akson thai; [ʔ̯àksɔ̌ːn tʰāj]) is used to write the Thai, Southern Thai and other languages in Thailand. ...

The Thai alphabet is derived from the Old Khmer script (Thai: อักษรขอม, akson khom), which is a southern Brahmic style of writing derived from the south Indian Pallava alphabet (Thai: ปัลลวะ).

Thai is considered to be the first script in the world which invented tone markers to indicate distinctive tones, which are lacking in the Mon-Khmer (Austroasiatic languages) and Indo-Aryan languages from which its script is derived. Although Chinese and other Sino-Tibetan languages have distinctive tones in their phonological system, no tone marker is found in their orthographies. Thus, tone markers are an innovation in the Thai language that later influenced other related Tai languages and some Tibeto-Burman languages on the Southeast Asian mainland.

Thai tradition attributes the creation of the script to King Ramkhamhaeng the Great (Thai: พ่อขุนรามคำแหงมหาราช) in 1283, though this has been challenged.

Distinctive features

Thai is an abugida. Consonant letters have an inherent vowel sound, and vowel-signs are attached to the consonant to produce a different vowel.

Unlike Devanagari, multiple vowel-signs may be used with a single consonant, and those positioned to the left of the consonant(s) are not combining characters.

Like Indic scripts, Thai is based on orthographic syllables, so a vowel-sign is actually attached to the syllable rather than to an individual consonant. An orthographic syllable can include a cluster of consonants without intervening vowel sounds. Unlike many Indic scripts, however, Thai does not represent these clusters as conjuncts: the consonants are simply written one after another.

Thai has no subjoined consonants, nor does it have any code points dedicated to medial or final consonants, although consonants do appear in those positions, eg. โปรแกรม opṟɛk̯ṟm̱ proː krɛːm (computer) program.

Text is written horizontally, left to right.

Character lists

The Thai script characters in Unicode 11.0 are contained in a single block, Thai (U+0E00 to U+0E7F), not counting shared characters such as punctuation.


For character-specific details see Thai character notes.


Text direction

Text is written horizontally, left to right.

Vowels

Inherent vowel

Consonants carry an inherent vowel, pronounced o inside a closed syllable and a in an open syllable. So ก is pronounced ka.

Vowel absence

Vowel absence after syllable-final consonants is not normally marked in any way. Nor is it marked in syllable-initial clusters.

 ์ [U+0E4C THAI CHARACTER THANTHAKHAT] is written above a consonant that is not pronounced (usually at the end of a syllable), together with any vowel-sign attached to that consonant, eg. รถเมล์ rotmeː bus, ศักดิ์สิทธิ์ saksitʰ to be sacred. It is often used in foreign loan words, eg. คอมพิวเตอร์ kʰompʰiwtɤː computer, โปสการ์ด poːskaːt postcard, สแตมป์ satɛːm stamp.
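A minimal Python sketch of how a program might locate the letters that thanthakhat silences, using only the standard unicodedata module. The helper name silenced_letters is hypothetical, and the sketch assumes each thanthakhat silences exactly one base consonant plus any combining vowel-sign between that consonant and the mark; words in which two consonants are silent, such as จันทร์, are only partly handled.

```python
import unicodedata

THANTHAKHAT = "\u0E4C"

def silenced_letters(word: str) -> list[str]:
    """Return the letters marked silent by each thanthakhat in `word`.

    Assumption: the mark silences one base consonant plus any combining
    vowel-sign sitting between that consonant and the thanthakhat.
    """
    silent = []
    for i, ch in enumerate(word):
        if ch != THANTHAKHAT:
            continue
        j = i - 1
        # Step back over combining marks (eg. SARA I in ดิ) to reach the base.
        while j > 0 and unicodedata.category(word[j]) == "Mn":
            j -= 1
        silent.append(word[j:i])
    return silent

print(silenced_letters("รถเมล์"))        # ['ล']
print(silenced_letters("ศักดิ์สิทธิ์"))   # ['ดิ', 'ธิ']
```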

Vowel signs

To produce a different vowel than the inherent one, Thai uses one or more vowel-signs, eg. กิ ki.

Vowel signs in Thai are a mixture of combining characters and ordinary spacing characters. Only the superscript and subscript vowel-signs are combining characters.

Vowel-signs can also be combined to create additional sounds.

See also vocalics.

Prescript dependent vowels

เ␣แ␣ใ␣ไ␣โ

Thai uses a visual encoding model. Of the vowel-signs, 5 appear to the left of the onset consonant. These characters are not combining characters, and are typed and stored before the base.

Note that these vowel-signs are placed before the start of the syllable. This means that a word with a consonant cluster at the start separates the prescript vowel from any postscript vowels by more than one consonant character, eg. เปล่า ep̯ḻ¹ā plàw no.
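The stored order can be inspected with Python's standard unicodedata module. The sketch below (describe is a hypothetical helper) lists the code points of เปล่า: SARA E comes first in memory, ahead of the whole PO PLA + LO LING cluster, and has general category Lo (a spacing letter) rather than Mn (a combining mark). The same holds for all five prescript vowel-signs.

```python
import unicodedata

def describe(text: str) -> list[str]:
    """One entry per code point: number, Unicode name, general category."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)} {unicodedata.category(ch)}"
            for ch in text]

for line in describe("\u0E40\u0E1B\u0E25\u0E48\u0E32"):  # เปล่า
    print(line)
# U+0E40 THAI CHARACTER SARA E Lo
# U+0E1B THAI CHARACTER PO PLA Lo
# U+0E25 THAI CHARACTER LO LING Lo
# U+0E48 THAI CHARACTER MAI EK Mn
# U+0E32 THAI CHARACTER SARA AA Lo

# All five prescript vowel-signs are spacing letters, not combining marks.
PRESCRIPT_VOWELS = "\u0E40\u0E41\u0E42\u0E43\u0E44"  # เ แ โ ใ ไ
assert all(unicodedata.category(v) == "Lo" for v in PRESCRIPT_VOWELS)
```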

Note also that [U+0E41 THAI CHARACTER SARA AE] should not be typed as two successive [U+0E40 THAI CHARACTER SARA E] characters.
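Because a doubled SARA E and a single SARA AE can look alike in many fonts, text may need cleaning before it is stored or searched. A minimal Python sketch; the helper name is hypothetical, and it assumes that two consecutive SARA E characters are always a mistyped SARA AE.

```python
SARA_E = "\u0E40"   # เ
SARA_AE = "\u0E41"  # แ

def merge_double_sara_e(text: str) -> str:
    """Replace each pair of consecutive SARA E with a single SARA AE.

    Assumption: a doubled SARA E never occurs legitimately.
    """
    return text.replace(SARA_E + SARA_E, SARA_AE)

mistyped = SARA_E + SARA_E + "\u0E01\u0E23\u0E21"  # เเกรม, typed with two SARA E
print(merge_double_sara_e(mistyped))               # แกรม
```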

Other vowel signs

ะ␣ั␣า␣ิ␣ี␣ึ␣ื␣ุ␣ู␣ำ

[U+0E30 THAI CHARACTER SARA A], [U+0E32 THAI CHARACTER SARA AA] and [U+0E33 THAI CHARACTER SARA AM] are spacing characters (although the ring of sara am sits above the preceding consonant); the rest are combining characters.
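The split between spacing and combining vowel-signs can be checked against the Unicode Character Database. In this short Python sketch using the standard unicodedata module, general category Lo marks a spacing letter and Mn a combining (nonspacing) mark. Note that SARA AM also comes out as Lo, even though part of its glyph appears above the consonant.

```python
import unicodedata

# The non-prescript vowel-signs listed above, in code point order.
VOWEL_SIGNS = "\u0E30\u0E31\u0E32\u0E33\u0E34\u0E35\u0E36\u0E37\u0E38\u0E39"

spacing = [f"U+{ord(v):04X}" for v in VOWEL_SIGNS if unicodedata.category(v) == "Lo"]
combining = [f"U+{ord(v):04X}" for v in VOWEL_SIGNS if unicodedata.category(v) == "Mn"]

print(spacing)    # ['U+0E30', 'U+0E32', 'U+0E33']
print(combining)  # ['U+0E31', 'U+0E34', 'U+0E35', 'U+0E36', 'U+0E37', 'U+0E38', 'U+0E39']
```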

[U+0E33 THAI CHARACTER SARA AM] is classed as a vowel, but also contains the final consonant m, represented by a built-in nikhahit (cf.  ํ [U+0E4D THAI CHARACTER NIKHAHIT]).

Sara a and sara aa also appear as part of the complex vowels described below.

Consonants pronounced as vowels

The consonant [U+0E2D THAI CHARACTER O ANG] is also pronounced as the vowel ɔː when it appears alone after a base consonant.

Many of the vowel combinations involve [U+0E22 THAI CHARACTER YO YAK] and/or [U+0E27 THAI CHARACTER WO WAEN] to create diphthongs.

The consonant [U+0E23 THAI CHARACTER RO RUA] is pronounced as a vowel a when doubled medially, eg. ธรรม tʰam justice. When doubled at the end of a syllable it is pronounced an, eg. กรรไกร kankraj scissors. Note, however, that this may also constitute the end and beginning of two syllables, eg. ภรรยา pʰanrájaː wife.

Vowel-sign combinations

The various vowel-signs described just above can be mixed together with [U+0E2D THAI CHARACTER O ANG], [U+0E22 THAI CHARACTER YO YAK], [U+0E27 THAI CHARACTER WO WAEN], and  ็ [U+0E47 THAI CHARACTER MAITAIKHU] to produce additional sounds, as shown in the examples below.

-ือ␣-็อ␣-ัว␣-ัย␣-ิว␣-าว␣-าย␣-อย␣-ุย␣-ูย␣-วย␣เ-ะ␣เ-็␣เ-อ␣เ-ิ␣เ-ว␣เ-า␣เ-ย␣แ-ว␣แ-ะ␣แ-็␣โ-ะ␣โ-ย␣ไ-ย␣-็อย␣-ัวะ␣เ-าะ␣เ-อะ␣เ-ือ␣เ-็ว␣เ-ีย␣เ-ียะ␣เ-ือะ␣เ-ียว␣เ-ือย␣ฤๅ␣ฦๅ

Vowel-sign placement

Vowel-signs can be positioned before, above, below, or after the base consonant, and a single vowel may combine components in several of these positions.

At maximum, vowel components can occur concurrently on 3 sides of the base.

Distribution of vowel elements is as follows:

Before the base: เ แ ใ ไ โ
Above the base: ั ิ ี ึ ื ็
Below the base: ุ ู
After the base: อ ะ า ย ว ๅ ำ (the ring of ำ sits above the base)

Standalone vowels

Standalone vowels are not preceded by a consonant, and may appear at the beginning or in the middle of a word. This typically includes a way to represent the sound of the inherent vowel in isolation.

Thai uses [U+0E2D THAI CHARACTER O ANG] as a base for vowel signs, eg. อิ่ม ʔ̯i¹m̱ ìm to be full, or สะอาด saʔ̯ād̯ sà àːt clean.

[U+0E2D THAI CHARACTER O ANG] on its own represents the same sound as the inherent vowel, eg. อเมริกา ʔ̯em̱ṟik̯ā à meː rì kaː America.

There are no independent vowel letters in Thai.

Vocalics

Vocalics are letters derived from Sanskrit that generally behave like vowels, but represent r/l followed by a vowel. They are often available both as vowel-signs and independent vowel letters.

ฤ␣ฦ␣ๅ

These letters are actually considered to be consonants in Thai.

The long forms of both are created using [U+0E45 THAI CHARACTER LAKKHANGYAO], ie. ฤๅ and ฦๅ. Otherwise, that character doesn't appear alone.

Consonants

The consonants are associated with high, mid, or low classes related to tone values. (Low class consonants are indicated using an underline in the transliteration. The consonants that default to mid class have an inverted breve below the transliteration.)

High class consonants

ข␣ฃ␣ฉ␣ฐ␣ถ␣ผ␣ฝ␣ศ␣ษ␣ส␣ห

A silent [U+0E2B THAI CHARACTER HO HIP] can be added before the following characters to make their default tonal class high, eg. หมา hm̱ā mǎː dog, หยุด hy̱ud̯ jùt to stop.

ง␣ญ␣น␣ม␣ย␣ร␣ล␣ว

See Syllable-onset clusters for further details about how these are represented.

Mid class consonants

ก␣จ␣ฏ␣ฎ␣ด␣ต␣บ␣ป␣อ

[U+0E2D THAI CHARACTER O ANG] is silent when used as a base for vowels at the beginning of a syllable. When it appears alone after a base consonant it becomes the vowel ɔː. It is also used in combination with other characters to produce additional vowel sounds (see Standalone vowels and Vowel-sign combinations).

Low class consonants

ค␣ฆ␣ฅ␣ง␣ช␣ฌ␣ฑ␣ฒ␣ณ␣ท␣ธ␣น␣พ␣ภ␣ม␣ย␣ญ␣ล␣ฬ␣ร␣ว␣ฮ␣ซ␣ฟ
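The three classes above can be captured in a simple lookup table. In the Python sketch below, the helper names are hypothetical. onset_class also applies the silent ห rule described under High class consonants, and otherwise assumes that the first consonant of an onset cluster determines the class (as in ปลา), ignoring other cases.

```python
HIGH = set("ขฃฉฐถผฝศษสห")
MID = set("กจฏฎดตบปอ")
LOW = set("คฆฅงชฌฑฒณทธนพภมยญลฬรวฮซฟ")

# Low-class consonants that a preceding silent HO HIP shifts to the high class.
HO_HIP_TARGETS = set("งญนมยรลว")

def consonant_class(ch: str) -> str:
    """Default tonal class of a single consonant letter."""
    if ch in HIGH:
        return "high"
    if ch in MID:
        return "mid"
    if ch in LOW:
        return "low"
    raise ValueError(f"not a Thai consonant: {ch!r}")

def onset_class(onset: str) -> str:
    """Tonal class of a syllable onset (a single consonant or a cluster).

    Assumption: the first consonant governs, except that a silent HO HIP
    before one of HO_HIP_TARGETS makes the whole onset high class.
    """
    if len(onset) >= 2 and onset[0] == "\u0E2B" and onset[1] in HO_HIP_TARGETS:
        return "high"
    return consonant_class(onset[0])

print(onset_class("หม"))  # high (as in หมา)
print(onset_class("ปล"))  # mid  (as in ปลา)
print(onset_class("น"))   # low
```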

Syllable-onset clusters

Consonant clusters at the start of a syllable can arise from additional consonants such as [U+0E25 THAI CHARACTER LO LING] and [U+0E23 THAI CHARACTER RO RUA], eg. ปลา p̯ḻā plaː fish, or from the silent [U+0E2B THAI CHARACTER HO HIP] used to affect tonal values, eg. หมา hm̱ā mǎː dog.

There are no dedicated code points for these, so it is feasible that ปลา could be pronounced pà laː in a different context.

Tone marks and/or super-/subscript vowel-signs are attached to the second consonant, eg. เปลี่ยน ep̯ḻī¹y̱ṉ plìːan to change.

Prescript vowel-signs are placed before the first consonant in the cluster, ie. at the start of the syllable, eg. โปรแกรม opṟɛk̯ṟm̱ proː krɛːm (computer) program, which does this twice.

Syllable-final consonants

Consonants at the end of a syllable use ordinary code points, eg. ตื่น t̯ɯ̄¹ṉ tɯ̀ːn to wake up.

This can create some ambiguity, since there is no distinction between the sequence in the previous example and one where น starts a new syllable with an inherent vowel.

The one exception is the character that is normally regarded as a vowel, [U+0E33 THAI CHARACTER SARA AM], which includes the final m sound, eg. ห้องน้ำ h²ʔ̯ŋ̱ṉ²aᵐ hôŋ nám toilet.

A final m is not always represented using sara am, eg. ห้าม h²ām̱ hâːm to forbid.

Consonant clusters

Consonant clusters occur syllable-initially, or where one syllable ends with a consonant and the next begins with one.

There are no special mechanisms in Thai for dealing with clusters, such as conjuncts, stacking, special final consonants, etc.

See also Syllable-onset clusters.

Tones

A syllable is 'checked' if it ends in the sound -p, -t, or -k, or in a short vowel.

่␣้␣๊␣๋

The following chart shows how to tell which tones are associated with a syllable.

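Consonant  Syllable  Vowel / tone mark      Tone
high       checked   short vowel            ˩˩ low
high       checked   long vowel             ˩˩ low
high       open      no mark                ˩˥ rising
high       open      mai ek (U+0E48)        ˩˩ low
high       open      mai tho (U+0E49)       ˥˩ falling
mid        checked   short vowel            ˩˩ low
mid        checked   long vowel             ˩˩ low
mid        open      no mark                ˧˧ mid
mid        open      mai ek (U+0E48)        ˩˩ low
mid        open      mai tho (U+0E49)       ˥˩ falling
mid        open      mai tri (U+0E4A)       ˦˥ high
mid        open      mai chattawa (U+0E4B)  ˩˥ rising
low        checked   short vowel            ˦˥ high
low        checked   long vowel             ˥˩ falling
low        open      no mark                ˧˧ mid
low        open      mai ek (U+0E48)        ˥˩ falling
low        open      mai tho (U+0E49)       ˦˥ high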
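The chart translates directly into a lookup table. In the Python sketch below, the consonant class, whether the syllable is checked, the vowel length and the tone mark (U+0E48 to U+0E4B: mai ek, mai tho, mai tri, mai chattawa) are all assumed to have been determined already; working them out from raw text is the hard part. The function and parameter names are hypothetical, and combinations missing from the chart raise an error.

```python
from typing import Optional

# Tone marks by code point.
TONE_MARK_NAMES = {"\u0E48": "mai ek", "\u0E49": "mai tho",
                   "\u0E4A": "mai tri", "\u0E4B": "mai chattawa"}

# Open (live) syllables: (consonant class, tone mark name or None) -> tone.
OPEN_TONES = {
    ("high", None): "rising", ("high", "mai ek"): "low", ("high", "mai tho"): "falling",
    ("mid", None): "mid", ("mid", "mai ek"): "low", ("mid", "mai tho"): "falling",
    ("mid", "mai tri"): "high", ("mid", "mai chattawa"): "rising",
    ("low", None): "mid", ("low", "mai ek"): "falling", ("low", "mai tho"): "high",
}

# Unmarked checked (dead) syllables: (consonant class, vowel length) -> tone.
CHECKED_TONES = {
    ("high", "short"): "low", ("high", "long"): "low",
    ("mid", "short"): "low", ("mid", "long"): "low",
    ("low", "short"): "high", ("low", "long"): "falling",
}

def tone(cls: str, checked: bool, vowel_length: Optional[str] = None,
         mark: Optional[str] = None) -> str:
    """Look up a syllable's tone in the chart; KeyError if not covered."""
    if checked:
        if mark is not None:
            raise KeyError("the chart does not cover tone marks on checked syllables")
        return CHECKED_TONES[(cls, vowel_length)]
    return OPEN_TONES[(cls, mark)]

print(tone("high", checked=False))                                    # rising  (หมา mǎː)
print(tone("high", checked=False, mark=TONE_MARK_NAMES["\u0E49"]))    # falling (ห้าม hâːm)
print(tone("mid", checked=True, vowel_length="short"))                # low     (เด็ก dèk)
print(tone("low", checked=False, mark=TONE_MARK_NAMES["\u0E49"]))     # high    (น้ำ nám)
```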

The expected typing and storage position for a tone mark is immediately after the base consonant of the syllable, or after a superscript or subscript vowel-sign if there is one. However, the tone mark should be typed before [U+0E33 THAI CHARACTER SARA AM], and should be displayed above the nikhahit, eg. ก่ำ.
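Unicode canonical combining classes only partly enforce this order. In the Python sketch below (standard unicodedata module), a tone mark typed before a subscript vowel is reordered by normalization, because SARA U and SARA UU have combining class 103 and the tone marks 107. Superscript vowels and SARA AM have class 0, however, so normalization leaves a tone mark typed in the wrong place there untouched. Applications therefore cannot rely on normalization alone to correct mis-ordered input.

```python
import unicodedata

SARA_I, SARA_UU, MAI_EK, SARA_AM = "\u0E34", "\u0E39", "\u0E48", "\u0E33"

for ch in (SARA_I, SARA_UU, MAI_EK, SARA_AM):
    print(unicodedata.name(ch), unicodedata.combining(ch))
# THAI CHARACTER SARA I 0
# THAI CHARACTER SARA UU 103
# THAI CHARACTER MAI EK 107
# THAI CHARACTER SARA AM 0

# Tone typed before a subscript vowel: NFC moves it after the vowel.
assert unicodedata.normalize("NFC", "\u0E1B" + MAI_EK + SARA_UU) == "\u0E1B" + SARA_UU + MAI_EK

# Tone typed before a superscript vowel: NFC leaves the wrong order alone.
mistyped = "\u0E01" + MAI_EK + SARA_I
assert unicodedata.normalize("NFC", mistyped) == mistyped

# ก่ำ in the expected order (tone before SARA AM) is also unchanged by NFC.
kam = "\u0E01" + MAI_EK + SARA_AM
assert unicodedata.normalize("NFC", kam) == kam
```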

 

Combining marks

Here we list combining marks other than those previously described under the sections on vowels and tones.

์␣็

  ์ [U+0E4C THAI CHARACTER THANTHAKHAT] marks a consonant (together with any vowel-sign attached to it) as silent. See Vowel absence.

  ็ [U+0E47 THAI CHARACTER MAITAIKHU] shortens the vowel produced by the following vowel-signs when they occur in medial position (ie. followed by a final consonant): ɔ -็อ- (ɔː > ɔ); e เ-็- (eː > e), eg. เด็ก dèk child; and ɛ แ-็- (ɛː > ɛ), which is not very common. It is also used for ew เ-็ว (eːw > ew), eg. เร็ว rew fast. In addition, the common word ก็ consists of just a consonant plus maitaikhu.

Other combining marks include the following:

Used in Pali and Sanskrit,  ํ [U+0E4D THAI CHARACTER NIKHAHIT] is not commonly used in Thai. It matters, however, when letter-spacing Thai text: the added space must go between the ring and the rest of  ำ [U+0E33 THAI CHARACTER SARA AM], since the ring stays above the preceding consonant. To do this, an application may split U+0E33 into this character and  า [U+0E32 THAI CHARACTER SARA AA].
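In the Unicode Character Database, SARA AM has a compatibility decomposition to NIKHAHIT + SARA AA, so the split can be done either by a targeted replacement or, more bluntly, by NFKD normalization (which also alters many other characters, so it is usually too broad for display work). A Python sketch using the standard unicodedata module; split_sara_am is a hypothetical helper.

```python
import unicodedata

SARA_AM, NIKHAHIT, SARA_AA = "\u0E33", "\u0E4D", "\u0E32"

print(unicodedata.decomposition(SARA_AM))  # <compat> 0E4D 0E32

def split_sara_am(text: str) -> str:
    """Replace each SARA AM with NIKHAHIT followed by SARA AA."""
    return text.replace(SARA_AM, NIKHAHIT + SARA_AA)

nam = "\u0E19\u0E49\u0E33"  # น้ำ: the tone mark stays before the nikhahit
assert split_sara_am(nam) == "\u0E19\u0E49\u0E4D\u0E32"
# For this word, NFKD gives the same result as the targeted replacement.
assert unicodedata.normalize("NFKD", nam) == split_sara_am(nam)
```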

  ๎ [U+0E4E THAI CHARACTER YAMAKKAN] is an ancient mark used to indicate clusters, such as in พ๎ราห๎มณ pʰraːmǒn.

  ฺ [U+0E3A THAI CHARACTER PHINTHU] is the Pali or Sanskrit virama, and is not used in Thai.

Punctuation

The Unicode Standard classifies some of these as punctuation, and some as letters.

๚␣๛␣๏␣ๆ␣ฯ


Numbers, dates, currency, etc.

Thai has its own set of decimal digits, which are used regularly.

๐␣๑␣๒␣๓␣๔␣๕␣๖␣๗␣๘␣๙
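Because the Thai digits U+0E50 to U+0E59 are ordinary Unicode decimal digits (general category Nd), many programming environments can convert them without special tables. In Python, for example, int() already accepts them, and str.translate handles the reverse direction. The helper name to_thai_digits is just for illustration.

```python
THAI_DIGITS = "".join(chr(0x0E50 + i) for i in range(10))  # ๐๑๒๓๔๕๖๗๘๙
TO_THAI = str.maketrans("0123456789", THAI_DIGITS)

def to_thai_digits(s: str) -> str:
    """Replace ASCII digits with Thai digits, leaving other characters alone."""
    return s.translate(TO_THAI)

thai_year = to_thai_digits("2019")
print(thai_year)       # ๒๐๑๙
print(int(thai_year))  # 2019 (int() accepts any Unicode decimal digits)
```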

The currency symbol for baht, [U+0E3F THAI CURRENCY SYMBOL BAHT], is encoded in the Unicode Thai block.

฿

Glyph shaping & positioning

Combining characters

Most of the combining characters in Thai are used for vowel-signs and tone marks.

Thai regularly combines multiple combining characters above a base consonant. There are two examples in the text below, both of which show a base character with a vowel sign and then a tone mark on top.

ครั้งที่  

Multiple diacritics (vowel sign + tone mark) attached to the same base character.
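To process text like this, software has to keep each base consonant together with the marks stacked on it. The Python sketch below (hypothetical helper group_marks) approximates this by attaching every nonspacing mark, plus SARA AM, to the preceding character. It is only a rough stand-in for the extended grapheme clusters defined in Unicode UAX #29, which a real application should obtain from a library (eg. ICU, or \X in the third-party regex module).

```python
import unicodedata

SARA_AM = "\u0E33"

def group_marks(text: str) -> list[str]:
    """Attach each combining mark (category Mn), and SARA AM, to the preceding character."""
    groups: list[str] = []
    for ch in text:
        if groups and (unicodedata.category(ch) == "Mn" or ch == SARA_AM):
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups

print(group_marks("ครั้งที่"))  # ['ค', 'รั้', 'ง', 'ที่']
```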

Context-based positioning

Combining characters need to be placed in different positions, according to the context. The example below shows the same tone character displayed at different heights, according to what falls beneath it.

ให้มีขึ้น

The same tone mark displayed at different heights.
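This adjustment is typically handled inside the font (eg. via OpenType positioning or substitution rules) rather than by application code. Purely to illustrate the decision involved, here is a hypothetical Python helper that reports, for each tone mark, whether it needs the raised position. The rule is an assumption: raise the mark if it follows a superscript vowel-sign, or if it precedes SARA AM, whose ring also sits above the consonant.

```python
SUPERSCRIPT_VOWELS = set("\u0E31\u0E34\u0E35\u0E36\u0E37")  # ั ิ ี ึ ื
TONE_MARKS = set("\u0E48\u0E49\u0E4A\u0E4B")               # ่ ้ ๊ ๋
SARA_AM = "\u0E33"

def tone_positions(text: str) -> list[tuple[int, str]]:
    """Return (index, 'raised' or 'low') for each tone mark in text.

    Hypothetical rule: raise the tone mark when something already
    occupies the space directly above the consonant.
    """
    positions = []
    for i, ch in enumerate(text):
        if ch not in TONE_MARKS:
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        raised = prev in SUPERSCRIPT_VOWELS or nxt == SARA_AM
        positions.append((i, "raised" if raised else "low"))
    return positions

print(tone_positions("ให้มีขึ้น"))  # [(2, 'low'), (7, 'raised')]
```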

Structural boundaries & markers

Word boundaries

The concept of 'word' is difficult to define in any language (see What is a word?). We will treat it as a vaguely-defined semantic unit that is typically smaller than a phrase and may comprise one or more syllables.

Thai doesn't separate words in a phrase.

There is, however, a concept of words in the text. For example, lines are supposed to be broken at word boundaries.

รวมทั้งวิทยาการด้านคอมพิว

Although this text contains several words, the boundaries between them are not marked by the script.

The main difficulty arises when dealing with compound words. It can often be difficult to decide whether a given string of syllables represents multiple words or a single compound word.

Figure: Alternative line break opportunities for Thai text using compound nouns, segmented into long words and into shorter words.

The variation may be related to the operation being performed on the text (eg. line breaking in narrow newsprint columns, vs. double-click selection, vs. cursor movement, etc.), or it may just be down to personal preference.

The difference may also be contextually dependent. Wirote Aroonmanakun describes how คนขับรถ ḵʰṉkʰäb̯ṟtʰ kʰon kʰàp rót driver should be viewed as a single word in the context คนขับรถนั่งคอยอยู่ในรถ ḵʰṉkʰäb̯ṟtʰṉä¹ŋ̱ḵʰʔ̯y̱ʔ̯y̱ū¹äʲṉṟtʰ kʰon kʰàp rót nâŋ kʰɔːj jùː naj rót the (man who works as a) driver is waiting in the car, whereas in the phrase คนขับรถผ่านแยกนี้ไม่มากนัก ḵʰṉkʰäb̯ṟtʰpʰ¹āṉɛy̱k̯ṉī²aʲm̱¹m̱āk̯ṉäk̯ kʰon kʰàp rót pʰàːn jɛ̂ːk níː mâj mâːk nák not many people drive through this intersection, it would be viewed as 3 words, referring to anyone who is driving. [a]

Proper names, which are composed from multiple words, are also problematic, especially because there are no capital letters to distinguish them from other pieces of text. [g]

ZWSP & WJ

In order to manually fine-tune word-boundary detection, the invisible character U+200B ZERO WIDTH SPACE (ZWSP) can be used to create breaks. [u 625]

To prevent a break between syllables, U+2060 WORD JOINER (WJ) can be used.

It is also important to bear in mind that Thai may be used to write various languages, in particular minority languages for which different dictionaries are needed. Since such dictionaries may not be available in a given browser or other application, there is a tendency to use ZWSP in order to compensate.

Large-scale manual entry of ZWSP and WJ has potential downsides because the user cannot see them; this can lead to ZWSP being inserted in the wrong position, or multiple times. Since these characters don't set a state, such mistakes don't create major issues, but it would be useful if editors showed where these characters are.

Care should also be taken when trying to match text, eg. for searching in a page. WJ should be ignored. ZWSP may or may not be ignored, depending on whether word boundaries are significant for the search.
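A minimal Python sketch of the three operations discussed in this section: inserting ZWSP at known word boundaries, making ZWSP and WJ visible for editing, and building a search key that always drops WJ and optionally drops ZWSP. All helper names are hypothetical, and the segmentation of the example phrase (taken from the driver example under Word boundaries) is supplied by hand.

```python
ZWSP = "\u200B"  # ZERO WIDTH SPACE: permits a line break
WJ = "\u2060"    # WORD JOINER: prohibits a line break

def join_with_breaks(words: list[str]) -> str:
    """Join already-segmented words, marking each boundary with ZWSP."""
    return ZWSP.join(words)

def reveal(text: str) -> str:
    """Show the invisible characters, eg. for an editor's 'show codes' view."""
    return text.replace(ZWSP, "<ZWSP>").replace(WJ, "<WJ>")

def search_key(text: str, ignore_zwsp: bool = True) -> str:
    """Normalize text for matching: drop WJ always, ZWSP only if requested."""
    text = text.replace(WJ, "")
    return text.replace(ZWSP, "") if ignore_zwsp else text

words = ["คนขับรถ", "นั่ง", "คอย"]
marked = join_with_breaks(words)
print(reveal(marked))  # คนขับรถ<ZWSP>นั่ง<ZWSP>คอย
assert search_key(marked) == "".join(words)
assert search_key(marked, ignore_zwsp=False) == marked
```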

Phrase boundaries

Thai uses space as a phrase marker, rather than to delimit words, often in places where English text would use commas or periods.

Latin-based punctuation such as comma, period, and colon are also used in text, particularly in conjunction with Latin letters or in formatting numbers, addresses, and so forth.

[U+0E5A THAI CHARACTER ANGKHANKHU] is used to mark the end of a long segment of text. It can be combined with sara a as ๚ะ to mark a larger segment of text; this usage is typically seen at the end of a verse in poetry. [u 625]

[U+0E5B THAI CHARACTER KHOMUT] marks the end of a chapter or document, where it always follows the ๚ะ combination. [u 625]

Quotations

According to CLDR, the default quote marks for Thai are [U+201C LEFT DOUBLE QUOTATION MARK] at the start, and [U+201D RIGHT DOUBLE QUOTATION MARK] at the end.

When an additional quote is embedded within the first, the quote marks are [U+2018 LEFT SINGLE QUOTATION MARK] and [U+2019 RIGHT SINGLE QUOTATION MARK].

Repetition & elision

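[U+0E46 THAI CHARACTER MAIYAMOK] is used to mark repetition of preceding letters. [u 625] In practice the repeated unit is the preceding word or phrase, eg. ใด ๆ in the sample text, which reads ใดใด.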
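Because MAIYAMOK stands for a repeated word, search or text-to-speech applications may need to expand it. The Python sketch below is a hypothetical helper that assumes the text has already been segmented into words, with MAIYAMOK as a separate token; deciding where the repeated word or phrase starts is the hard part, and is not attempted here.

```python
MAIYAMOK = "\u0E46"  # ๆ

def expand_maiyamok(tokens: list[str]) -> list[str]:
    """Replace each MAIYAMOK token with a copy of the preceding token.

    Assumption: the input is already segmented, so the preceding token
    is exactly the word to be repeated.
    """
    out: list[str] = []
    for tok in tokens:
        if tok == MAIYAMOK and out:
            out.append(out[-1])
        else:
            out.append(tok)
    return out

print(expand_maiyamok(["ชนิด", "ใด", "ๆ"]))  # ['ชนิด', 'ใด', 'ใด']
```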

[U+0E2F THAI CHARACTER PAIYANNOI] is used to indicate elision or abbreviation of letters; it is itself viewed as a kind of letter, however, and is used with considerable frequency because of its appearance in such words as the Thai name for Bangkok. Paiyannoi is also used in the combination ฯลฯ to create a construct called paiyanyai, which means “et cetera, and so forth.” [u 625]

Line & paragraph layout

Line breaks & text wrap

Thai doesn't indicate word boundaries, but when Thai text is wrapped at the end of a line you should not split a word.

As the width of the window changes, a browser that supports Thai wrapping should only break the following text at word boundaries:

โลกจะใช้เพียง

Because Thai doesn't separate words, applications typically look up word boundaries in a dictionary. However, such lookup doesn't always produce the needed result, especially when dealing with compound words and proper names (see Word boundaries). To counteract these deficiencies, authors may use U+200B ZERO WIDTH SPACE and U+2060 WORD JOINER (see ZWSP & WJ).
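Libraries such as ICU use dictionary-based word-break engines for Thai. As a much simpler stand-in (a swapped-in technique, not what any particular browser does), the Python sketch below uses greedy longest matching against a small hypothetical dictionary, and treats author-inserted ZWSP as a forced boundary. Greedy matching is easily misled, which is one reason real engines use more elaborate algorithms and why the ZWSP override is useful.

```python
ZWSP = "\u200B"

def segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match word segmentation.

    ZWSP is treated as a forced boundary. Characters not covered by any
    dictionary word fall back to one-character pieces (which may split a
    base from its marks; acceptable for a sketch).
    """
    max_len = max(map(len, dictionary), default=1)
    words: list[str] = []
    for chunk in text.split(ZWSP):
        i = 0
        while i < len(chunk):
            # Try the longest candidate first, shrinking until a match is found.
            for j in range(min(len(chunk), i + max_len), i, -1):
                if chunk[i:j] in dictionary:
                    break
            else:
                j = i + 1  # no dictionary match: emit a single character
            words.append(chunk[i:j])
            i = j
    return words

toy_dictionary = {"โลก", "จะ", "ใช้", "เพียง"}  # hypothetical, for illustration only
print(segment("โลกจะใช้เพียง", toy_dictionary))  # ['โลก', 'จะ', 'ใช้', 'เพียง']
```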

Justification

Justification in Thai adjusts blank spaces, but also makes certain adjustments to inter-character spacing. Browsers currently tend not to justify Thai text well.

The zero width space can grow to have a visible width when justified. [u 625]

The following text can be used to check how a browser justifies Thai:

ทุกคนเสมอกันตามกฏหมายและมีสิทธิที่จะได้รับความคุ้มครองของกฏหมายเท่าทียมกัน โดยปราศจากการเลือกปฏิบัติใด ๆ ทุกคนมีสิทธิที่จะได้รับความคุ้มครองเท่าเทียมกันจากการเลือกปฏิบัติใด ๆ อันเป็นการล่วงละเมิดปฏิญญา และจากการยุยงให้เกิดการเลือกปฏิบัติดังกล่าว

Lists, counters, etc.

[U+0E4F THAI CHARACTER FONGMAN] is the Thai bullet, which is used to mark items in lists or appears at the beginning of a verse, sentence, paragraph, or other textual segment. [u 625]

The document Ready-made Counter Styles includes two counter styles:

  1. a numeric style, called thai, and
  2. an alphabetic style, called thai-alphabetic.
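These two styles use the generic numeric and alphabetic counter algorithms defined in CSS Counter Styles Level 3, sketched below in Python. The Thai digits are U+0E50 to U+0E59. The alphabetic symbol list, however, is an assumption made for illustration (the consonants from U+0E01 to U+0E2E minus the obsolete ฃ and ฅ and the vocalics ฤ and ฦ); check Ready-made Counter Styles for the actual thai-alphabetic definition.

```python
THAI_DIGITS = [chr(0x0E50 + i) for i in range(10)]  # ๐ to ๙

# ASSUMED symbol list for thai-alphabetic (see lead-in), 42 letters.
THAI_ALPHA = [chr(c) for c in range(0x0E01, 0x0E2F)
              if c not in (0x0E03, 0x0E05, 0x0E24, 0x0E26)]

def numeric_counter(n: int, symbols: list[str]) -> str:
    """CSS 'numeric' system: positional notation where symbols[0] is zero."""
    if n < 0:
        raise ValueError("negative values need the CSS 'negative' descriptor")
    if n == 0:
        return symbols[0]
    base = len(symbols)
    out = []
    while n > 0:
        out.append(symbols[n % base])
        n //= base
    return "".join(reversed(out))

def alphabetic_counter(n: int, symbols: list[str]) -> str:
    """CSS 'alphabetic' system: bijective numbering, defined for n >= 1."""
    if n < 1:
        raise ValueError("alphabetic counters start at 1")
    base = len(symbols)
    out = []
    while n > 0:
        n -= 1
        out.append(symbols[n % base])
        n //= base
    return "".join(reversed(out))

for n in (1, 2, 42, 43):
    print(n, numeric_counter(n, THAI_DIGITS), alphabetic_counter(n, THAI_ALPHA))
```

After the last symbol, the alphabetic system continues with two-letter markers, so item 43 is กก under the assumed list.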

Line height

Thai places vowel and tone marks above base characters, one above the other, and can also add combining characters below the line. The complexity of these marks means that the vertical resolution needed for clearly readable Thai text is higher than for, say, Latin text. In addition, Thai tends to add more interline spacing than Latin text does.

Here is an example of a word with combining characters above and below base characters:

ผู้เชี่ย

Multiple diacritics above and below the base significantly increase the vertical height of lines.

TBD

Further information needed for this section includes:

Glyph shaping & positioning
    Cursive text
    Context-based shaping
    Multiple combining characters
    Context-based positioning
    Transforming characters

Structural boundaries & markers
    Grapheme, word & phrase boundaries
    Hyphens & dashes
    Bracketing information
    Quotations
    Abbreviations, ellipsis, & repetition
    Emphasis & highlights
    Inline notes & annotations

Inline layout
    Inline text spacing
    Bidirectional text

Line & paragraph layout
    Text direction
    Line breaking
    Hyphenation
    Text alignment & justification
    Counters, lists, etc.
    Styling initials
    Baselines & inline alignment

Page & book layout
    General page layout & progression
    Directional layout features
    Grids & tables
    Notes, footnotes, etc.
    Forms & user interaction
    Page numbering, running headers, etc.

References

  1. [u] The Unicode Standard v7.0, Thai
  2. [b] Marie-Hélène Brown, Reading and Writing Thai, ISBN 974-210-4506
  3. [t] Benjawan Poomsan Becker, Thai for Beginners, ISBN 1-887521-00-3
  4. [p] Special characters in thai language, The Packnam Web Forums
  5. [g] GitHub discussions, especially contributions by James Clarke
  6. [a] Wirote Aroonmanakun, Thoughts on Word and Sentence Segmentation in Thai
Last changed 2019-05-01 6:54 GMT.  •  Licence CC-By © r12a.