Updated 01-May-2019 • tags thai, scriptnotes
This page provides basic information about the Thai script. It is not authoritative, peer-reviewed information – these are just notes I have gathered or copied from various places as I learned. For character-specific details follow the links to the Thai character notes.
For similar information related to other scripts, see the Script comparison table.
ข้อ 1 มนุษย์ทั้งหลายเกิดมามีอิสระและเสมอภาคกันในเกียรติศักดิ์และสิทธิ ต่างมีเหตุผลและมโนธรรม และควรปฏิบัติต่อกันด้วยเจตนารมณ์แห่งภราดรภาพ
ข้อ 2 ทุกคนย่อมมีสิทธิและอิสรภาพบรรดาที่กำหนดไว้ในปฏิญญานี้ โดยปราศจากความแตกต่างไม่ว่าชนิดใด ๆ ดังเช่น เชื้อชาติ ผิว เพศ ภาษา ศาสนา ความคิดเห็นทางการเมืองหรือทางอื่น เผ่าพันธุ์แห่งชาติ หรือสังคม ทรัพย์สิน กำเนิด หรือสถานะอื่น ๆ อนึ่งจะไม่มีความแตกต่างใด ๆ ตามมูลฐานแห่งสถานะทางการเมือง ทางการศาล หรือทางการระหว่างประเทศของประเทศหรือดินแดนที่บุคคลสังกัด ไม่ว่าดินแดนนี้จะเป็นเอกราช อยู่ในความพิทักษ์มิได้ปกครองตนเอง หรืออยู่ภายใต้การจำกัดอธิปไตยใด ๆ ทั้งสิ้น
The Thai script is used primarily for writing the Thai language, as well as Northern Thai, Northeastern Thai, Southern Thai, and Thai Song, which are separate languages. It is also used to write a number of minority languages in Thailand, Laos and China, as well as Pali, which is widely used in Buddhist temples and monasteries. Both the Thai language and script are closely related to Laotian. The script is of Indic origin, derived from Old Khmer.
Thai alphabet (Thai: อักษรไทย; RTGS: akson thai; [ʔ̯àksɔ̌ːn tʰāj]) is used to write the Thai, Southern Thai and other languages in Thailand. ...
The Thai alphabet is derived from the Old Khmer script (Thai: อักษรขอม, akson khom), which is a southern Brahmic style of writing derived from the south Indian Pallava alphabet (Thai: ปัลลวะ).
Thai is considered to be the first script in the world which invented tone markers to indicate distinctive tones, which are lacking in the Mon-Khmer (Austroasiatic languages) and Indo-Aryan languages from which its script is derived. Although Chinese and other Sino-Tibetan languages have distinctive tones in their phonological system, no tone marker is found in their orthographies. Thus, tone markers are an innovation in the Thai language that later influenced other related Tai languages and some Tibeto-Burman languages on the Southeast Asian mainland.
Thai tradition attributes the creation of the script to King Ramkhamhaeng the Great (Thai: พ่อขุนรามคำแหงมหาราช) in 1283, though this has been challenged.
Thai is an abugida. Consonant letters have an inherent vowel sound. Vowel-signs are attached to the consonant to produce a different vowel. See the table to the right for a brief overview of features, taken from the Script Comparison Table.
Unlike Devanagari, multiple vowel-signs may be used with a single character, and those positioned to the left of the consonant(s) are not combining characters.
Like Indic scripts, Thai is based on orthographic syllables, so the vowel-sign is actually attached to the syllable. An orthographic syllable includes clusters of consonants without intervening vowel sounds. In many Indic scripts these clusters are typically represented as partially merged forms, called conjuncts; Thai has no such forms.
Thai has no subjoined consonants, nor does it have any code points dedicated to medial or final consonants, although consonants do appear in those positions, eg. โปรแกรม opṟɛk̯ṟm̱ proː krɛːm (computer) program.
Text is written horizontally, left to right.
The Thai script characters in Unicode 11.0 are contained in a single block (not counting shared characters, such as punctuation):
Follow these links for information about characters used by languages associated with this script. The numbers in parentheses are for non-ASCII characters.
For character-specific details see Thai character notes.
Consonants carry an inherent vowel, pronounced o inside a closed syllable, and a in an open syllable. So ก is pronounced ka.
Vowel absence after syllable-final consonants is not normally marked in any way. Nor is it marked in syllable-initial clusters.
์ [U+0E4C THAI CHARACTER THANTHAKHAT] can be used above a letter to show that it is not pronounced (usually at the end of a syllable), eg. รถเมล์ rotmeː bus, ศักดิ์สิทธิ์ saksitʰ to be sacred. It is often used for foreign loan words, eg. คอมพิวเตอร์ kʰompʰiwtɤː computer, โปสการ์ด poːskaːt postcard, สแตมป์ satɛːm stamp.
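As a rough illustration of the above, the following sketch finds the letters that carry a thanthakhat. The pattern and the function name `silenced_letters` are my own simplification (a consonant, an optional above or below vowel-sign, then U+0E4C); it is not a full model of Thai orthography.

```python
import re

THANTHAKHAT = "\u0E4C"

# Assumed, simplified pattern: one consonant (U+0E01..U+0E2E), optionally
# one above/below vowel-sign (U+0E31, U+0E34..U+0E3A), then thanthakhat.
SILENCED = re.compile("[\u0E01-\u0E2E][\u0E31\u0E34-\u0E3A]?" + THANTHAKHAT)

def silenced_letters(word: str) -> list[str]:
    """Return the sequences in `word` that thanthakhat marks as silent."""
    return SILENCED.findall(word)

# รถเมล์ (bus): the final ล carries the mark
print(silenced_letters("\u0E23\u0E16\u0E40\u0E21\u0E25\u0E4C"))
```

In ศักดิ์สิทธิ์ the same pattern picks up two sequences, each with a vowel-sign between the consonant and the thanthakhat.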
To produce a different vowel than the inherent one, Thai uses one or more vowel-signs, eg. กิ ki.
Vowel-signs in Thai are a mixture of combining characters and ordinary spacing characters. Only the superscript and subscript vowel-signs are combining characters.
Vowel-signs can also be combined to create additional sounds.
See also vocalics.
Thai uses a visual encoding model. Of the vowel-signs, 5 appear to the left of the onset consonant. These characters are not combining characters, and are typed and stored before the base.
Note that these vowel-signs are placed before the start of the syllable. This means that a word with a consonant cluster at the start separates the prescript vowel from any postscript vowels by more than one consonant character, eg. เปล่า ep̯ḻ¹ā plàw no.
Note also that แ [U+0E41 THAI CHARACTER SARA AE] should not be typed as two successive เ [U+0E40 THAI CHARACTER SARA E] characters.
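The storage order described above can be inspected directly. This is a minimal sketch: the helper names `code_points` and `fix_double_sara_e` are hypothetical, and the clean-up of a doubled sara e is only one way a tool might enforce the note about U+0E41.

```python
SARA_E = "\u0E40"   # เ
SARA_AE = "\u0E41"  # แ

def code_points(text: str) -> list[str]:
    """List the code points of `text` in stored (typed) order."""
    return [f"U+{ord(ch):04X}" for ch in text]

def fix_double_sara_e(text: str) -> str:
    """Replace two successive SARA E characters with a single SARA AE."""
    return text.replace(SARA_E + SARA_E, SARA_AE)

# เปล่า: the prescript vowel is stored first, before both cluster consonants
word = "\u0E40\u0E1B\u0E25\u0E48\u0E32"
print(code_points(word))
# ['U+0E40', 'U+0E1B', 'U+0E25', 'U+0E48', 'U+0E32']
```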
ะ [U+0E30 THAI CHARACTER SARA A] and า [U+0E32 THAI CHARACTER SARA AA] are normal spacing characters; the rest are combining characters.
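The split between spacing and combining vowel-signs is visible in the Unicode general category of each character. The sketch below uses Python's standard `unicodedata` module; the helper name `is_combining` and the selection of sample characters are my own.

```python
import unicodedata

def is_combining(ch: str) -> bool:
    """True if the character is a mark (general category Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")

# A few vowel-signs: sara a, sara aa, mai han-akat, sara i, sara u, sara e
for ch in "\u0E30\u0E32\u0E31\u0E34\u0E38\u0E40":
    kind = "combining" if is_combining(ch) else "spacing"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {kind}")
```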
ำ [U+0E33 THAI CHARACTER SARA AM] is classed as a vowel, but also contains the final consonant m, represented by a built-in nikhahit (cf. ํ [U+0E4D THAI CHARACTER NIKHAHIT]).
Both of these characters also appear as a part of the complex vowels described below.
The consonant อ [U+0E2D THAI CHARACTER O ANG] is also pronounced as the vowel ɔː when it appears alone after a base consonant.
Many of the vowel combinations involve ย [U+0E22 THAI CHARACTER YO YAK] and/or ว [U+0E27 THAI CHARACTER WO WAEN] to create diphthongs.
The consonant ร [U+0E23 THAI CHARACTER RO RUA] is pronounced as a vowel a when doubled medially, eg. ธรรม tʰam justice. When doubled at the end of a syllable it is pronounced an, eg. กรรไกร kankraj scissors. Note, however, that this may also constitute the end and beginning of two syllables, eg. ภรรยา pʰanrájaː wife.
The various vowel-signs described just above can be mixed together with อ [U+0E2D THAI CHARACTER O ANG], ย [U+0E22 THAI CHARACTER YO YAK], ว [U+0E27 THAI CHARACTER WO WAEN], and ็ [U+0E47 THAI CHARACTER MAITAIKHU] to produce additional sounds, as shown in the examples below.
The following list shows where vowel-signs are positioned around a base consonant to produce vowels, and how many instances of that pattern there are. The figure after the + sign represents combinations of Unicode characters.
At maximum, vowel components can occur concurrently on 3 sides of the base.
Distribution of vowel elements is as follows:
|ั ิ ี ึ ื ็||ำ|
|เ แ ใ ไ โ||อ ะ า ย ว ๅ||ะ ย|
Thai uses อ [U+0E2D THAI CHARACTER O ANG] as a base for vowel signs, eg. อิ่ม ʔ̯i¹m̱ ìm to be full, or สะอาด saʔ̯ād̯ sà àːt clean.
อ [U+0E2D THAI CHARACTER O ANG] on its own represents the same sound as the inherent vowel, eg. อเมริกา ʔ̯em̱ṟik̯ā à meː rì kaː America.
There are no independent vowel letters in Thai. The vocalic letters ฤ [U+0E24 THAI CHARACTER RU] and ฦ [U+0E26 THAI CHARACTER LU] are actually considered to be consonants in Thai.
The long forms of both are created using ๅ [U+0E45 THAI CHARACTER LAKKHANGYAO], ie. ฤๅ and ฦๅ. Otherwise, that character doesn't appear alone.
The consonants are associated with high, mid, or low classes related to tone values. (Low class consonants are indicated using an underline in the transliteration. The consonants that default to mid class have an inverted breve below the transliteration.)
A silent ห [U+0E2B THAI CHARACTER HO HIP] can be added before the following characters to make their default tonal class high, eg. หมา hm̱ā mǎː dog, หยุด hy̱ud̯ jùt to stop.
See onset_clusters for further details about how these are presented.
อ [U+0E2D THAI CHARACTER O ANG] is silent when used as a base for vowels at the beginning of a syllable. When it appears alone after a base consonant it becomes the vowel ɔː. It is also used in combination with other characters to produce additional vowel sounds (see independent).
Consonant clusters at the start of a syllable can arise from additional consonants such as ล [U+0E25 THAI CHARACTER LO LING] and ร [U+0E23 THAI CHARACTER RO RUA] eg. ปลา p̯ḻā plaː fish, or the silent ห [U+0E2B THAI CHARACTER HO HIP] used to affect tonal values, eg. หมา hm̱ā mǎː dog.
There are no dedicated code points for these, so it is feasible that ปลา could be pronounced pà laː in a different context.
Tone marks and/or super-/subscript vowel-signs are attached to the second consonant, eg. เปลี่ยน ep̯ḻī¹y̱ṉ plìːan to change.
Prescript vowel-signs are placed before the first consonant in the cluster, ie. at the start of the syllable, eg. โปรแกรม opṟɛk̯ṟm̱ proː krɛːm (computer) program, which does this twice.
Consonants at the end of a syllable use ordinary code points, eg. ตื่น t̯ɯ̄¹ṉ tɯ̀ːn to wake up.
This can create some ambiguity, since there is no distinction between the sequence in the previous example and one where น is a new syllable with an inherent vowel.
The one exception is the character that is normally regarded as a vowel, ำ [U+0E33 THAI CHARACTER SARA AM], which includes the final m sound, eg. ห้องน้ำ h²ʔ̯ŋ̱ṉ²aᵐ hôŋ nám toilet.
A final m is not always represented using sara am, eg. ห้าม h²ām̱ hâːm to forbid.
Consonant clusters occur syllable-initially, or where one syllable ends with a consonant and the next begins with one.
There are no special mechanisms in Thai for dealing with clusters, such as conjuncts, stacking, special final consonants, etc.
See also onset_clusters.
The following chart shows how to tell which tones are associated with a syllable.
The expected typing and storage position for tone marks is immediately after the base consonant of the syllable, or after a superscript vowel-sign if there is one. However, the tone mark should be typed before ำ [U+0E33 THAI CHARACTER SARA AM], and should be displayed above the nikhahit, eg. ก่ำ.
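A tool could check for the ordering rule above and repair text where the tone mark was typed after sara am. This is only a sketch under my own assumptions: the function name is hypothetical, and it treats U+0E48 to U+0E4B as the tone marks.

```python
import re

SARA_AM = "\u0E33"
TONE_MARKS = "\u0E48-\u0E4B"  # mai ek, mai tho, mai tri, mai chattawa

# Sara am immediately followed by a tone mark is in the unexpected order.
_MISORDERED = re.compile(f"{SARA_AM}([{TONE_MARKS}])")

def fix_tone_before_sara_am(text: str) -> str:
    """Move a tone mark typed after sara am so that it precedes it."""
    return _MISORDERED.sub(lambda m: m.group(1) + SARA_AM, text)

# ก + sara am + mai ek  ->  ก + mai ek + sara am (ก่ำ)
fixed = fix_tone_before_sara_am("\u0E01\u0E33\u0E48")
print([f"U+{ord(ch):04X}" for ch in fixed])
# ['U+0E01', 'U+0E48', 'U+0E33']
```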
Here we list combining marks other than those previously described under the sections on vowels and tones.
์ [U+0E4C THAI CHARACTER THANTHAKHAT] is used to suppress a vowel sound. See vowel_absence.
็ [U+0E47 THAI CHARACTER MAITAIKHU] shortens the vowels produced by the following three vowel signs when they occur in medial position: ɔ –็อ– (ɔː > ɔ); e เ–็– (eː > e), eg. เด็ก dèk child; and ɛ แ–็– (ɛː > ɛ), which is not very common. It is also used for ew เ–็ว (eːw > ew), eg. เร็ว rew fast. In addition, one word consists of just a consonant plus this mark: ก็ gɔ.
Other combining marks include the following:
Used in Pali and Sanskrit, ํ [U+0E4D THAI CHARACTER NIKHAHIT] is not commonly used in Thai, except that when letter spacing Thai text it is necessary to add the space between the circle and the remainder of ำ [U+0E33 THAI CHARACTER SARA AM]. To do this, the application may split U+0E33 into this character and า [U+0E32 THAI CHARACTER SARA AA].
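The split described here matches the compatibility decomposition that Unicode defines for U+0E33, so it can be sketched with the standard `unicodedata` module. The function name `split_sara_am` is hypothetical, and a real letter-spacing implementation would probably also need to position any tone mark relative to the nikhahit, which this sketch does not attempt.

```python
import unicodedata

SARA_AM = "\u0E33"
NIKHAHIT = "\u0E4D"
SARA_AA = "\u0E32"

def split_sara_am(text: str) -> str:
    """Replace each sara am with nikhahit followed by sara aa."""
    return text.replace(SARA_AM, NIKHAHIT + SARA_AA)

# The NFKD form of U+0E33 is the same two-character sequence.
decomposed = unicodedata.normalize("NFKD", SARA_AM)
print(decomposed == NIKHAHIT + SARA_AA)
```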
๎ [U+0E4E THAI CHARACTER YAMAKKAN] is an ancient punctuation mark used to mark clusters, such as in พ๎ราห๎มณ pʰraːmǒn.
ฺ [U+0E3A THAI CHARACTER PHINTHU] is the Pali or Sanskrit virama, and is not used in Thai.
The Unicode Standard classifies some of these as punctuation, and some as letters.
Follow the links for more information about these characters.
Thai has a set of decimal digits that are used regularly.
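Because the Thai digits are ordinary Unicode decimal digits, converting between them and ASCII digits is straightforward. In this sketch the helper names are my own; it relies on the digits occupying U+0E50 to U+0E59 in order, and on Python's `int()` accepting any Unicode decimal digit.

```python
THAI_ZERO = 0x0E50  # ๐ THAI DIGIT ZERO; the digits run to U+0E59

# Translation table from ASCII digits to the corresponding Thai digits.
TO_THAI = {ord(str(d)): chr(THAI_ZERO + d) for d in range(10)}

def to_thai_digits(number: int) -> str:
    """Format an integer using Thai digits."""
    return str(number).translate(TO_THAI)

def from_thai_digits(text: str) -> int:
    """Parse a string of Thai digits; int() handles Unicode decimal digits."""
    return int(text)

print(to_thai_digits(2019))
print(from_thai_digits("\u0E51\u0E52\u0E53"))  # → 123
```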
The currency symbol for baht is encoded in the Unicode Thai block.
Most of the combining characters in Thai are used for vowel-signs and tone marks.
Thai regularly combines multiple combining characters above a base consonant. There are two examples in the text below, both of which show a base character with a vowel sign and then a tone mark on top.
Combining characters need to be placed in different positions, according to the context. The example below shows the same tone character displayed at different heights, according to what falls beneath it.
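To see how many combining characters stack on a given base, text can be grouped into a base character plus the marks that follow it. This is a deliberately simplified sketch (the function name `clusters` is hypothetical, and it only looks at general category Mn); it is not a substitute for proper grapheme cluster segmentation.

```python
import unicodedata

def clusters(text: str) -> list[str]:
    """Group each base character with the nonspacing marks that follow it."""
    out: list[str] = []
    for ch in text:
        if out and unicodedata.category(ch) == "Mn":
            out[-1] += ch      # attach the mark to the preceding base
        else:
            out.append(ch)     # start a new cluster
    return out

# เปลี่ยน: ล carries both a vowel-sign and a tone mark
word = "\u0E40\u0E1B\u0E25\u0E35\u0E48\u0E22\u0E19"
for cluster in clusters(word):
    print(cluster, len(cluster) - 1, "combining mark(s)")
```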
Thai doesn't separate words in a phrase.
There is, however, a concept of words in the text. For example, lines are supposed to be broken at word boundaries.
The main difficulty arises when dealing with compound words. It can often be difficult to decide whether a given string of syllables represents multiple words or a single compound word.
The variation may be related to the operation being performed on the text (eg. line breaking in narrow newsprint columns, vs. double-click selection, vs. cursor movement, etc.), or it may just be down to personal preference.
The difference may also be contextually dependent. Wirote Aroonmanakun describes how คนขับรถ ḵʰṉkʰäb̯ṟtʰ kʰon kʰàp ròt driver should be viewed as a single word in the context คนขับรถนั่งคอยอยู่ในรถ ḵʰṉkʰäb̯ṟtʰṉä¹ŋ̱ḵʰʔ̯y̱ʔ̯y̱ū¹äʲṉṟtʰ kʰon kʰàp ròt nâŋ kʰɔːj jûː naj ròt the (man who works as a) driver is waiting in the car, whereas in the phrase คนขับรถผ่านแยกนี้ไม่มากนัก ḵʰṉkʰäb̯ṟtʰpʰ¹āṉɛy̱k̯ṉī²aʲm̱¹m̱āk̯ṉäk̯ kʰon kʰàp ròt pʰàːn jɛ̀ːk níː mâj màːk nàk not many people drive through this intersection it would be viewed as 3 words, referring to anyone who is driving.
Proper names, which are composed from multiple words, are also problematic, especially because there are no capital letters to distinguish them from other pieces of text.
In order to manually fine-tune word-boundary detection, the invisible character U+200B ZERO WIDTH SPACE (ZWSP) can be used to create breaks.
To prevent a break between syllables, U+2060 WORD JOINER (WJ) can be used.
It is also important to bear in mind that Thai may be used to write various languages, in particular minority languages for which different dictionaries are needed. Since such dictionaries may not be available in a given browser or other application, there is a tendency to use ZWSP in order to compensate.
Large-scale manual entry of ZWSP and WJ has potential downsides because the user cannot see them; this leads to problems with ZWSP being inserted in the wrong position, or multiple times. However, these characters don't set a state, so this doesn't create major issues. It would be useful, however, if an editor showed the location of these characters.
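As a sketch of what such editor support might do, the following makes the invisible characters visible and collapses accidental runs of ZWSP. The function names and the bracketed placeholder labels are my own choices, not an existing tool's behaviour.

```python
import re

ZWSP = "\u200B"  # ZERO WIDTH SPACE
WJ = "\u2060"    # WORD JOINER

def reveal(text: str) -> str:
    """Show ZWSP and WJ as visible placeholders for proofreading."""
    return text.replace(ZWSP, "[ZWSP]").replace(WJ, "[WJ]")

def collapse_repeated_zwsp(text: str) -> str:
    """Reduce any run of two or more ZWSP characters to a single one."""
    return re.sub(f"{ZWSP}{{2,}}", ZWSP, text)

sample = "\u0E04\u0E19" + ZWSP + ZWSP + "\u0E02\u0E31\u0E1A" + WJ + "\u0E23\u0E16"
print(reveal(collapse_repeated_zwsp(sample)))
```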
Care should also be taken when trying to match text, eg. for searching in a page. WJ should be ignored. ZWSP may or may not be ignored, depending on whether word boundaries are significant for the search.
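The matching behaviour just described could be sketched as follows. The function names and the `keep_zwsp` flag are hypothetical; the idea is simply that WJ is always stripped before comparison, while ZWSP is stripped only when word boundaries don't matter for the search.

```python
ZWSP = "\u200B"  # ZERO WIDTH SPACE
WJ = "\u2060"    # WORD JOINER

def normalize_for_search(text: str, keep_zwsp: bool = False) -> str:
    """Strip WJ always, and ZWSP unless word boundaries are significant."""
    text = text.replace(WJ, "")
    if not keep_zwsp:
        text = text.replace(ZWSP, "")
    return text

def contains(haystack: str, needle: str, keep_zwsp: bool = False) -> bool:
    """Substring search that ignores the invisible formatting characters."""
    return (normalize_for_search(needle, keep_zwsp)
            in normalize_for_search(haystack, keep_zwsp))

KHON, KHAP, ROT = "\u0E04\u0E19", "\u0E02\u0E31\u0E1A", "\u0E23\u0E16"
page_text = KHON + ZWSP + KHAP + WJ + ROT   # คน|ขับ⁠รถ as stored
print(contains(page_text, KHON + KHAP + ROT))
print(contains(page_text, KHON + KHAP + ROT, keep_zwsp=True))
```

With the default setting the search term is found; with `keep_zwsp=True` it is not, because the stored text has a word boundary that the search term lacks.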
Thai uses space as a phrase marker, rather than to delimit words, often in places where English text would use commas or periods.
Latin-based punctuation such as comma, period, and colon are also used in text, particularly in conjunction with Latin letters or in formatting numbers, addresses, and so forth.
๚ [U+0E5A THAI CHARACTER ANGKHANKHU] is used to mark the end of a long segment of text. It can be combined as ๚ะ to mark a larger segment of text; typically this usage can be seen at the end of a verse in poetry.
๛ [U+0E5B THAI CHARACTER KHOMUT] marks the end of a chapter or document, where it always follows the ๚ะ combination.
According to CLDR, the default quote marks for Thai are “ [U+201C LEFT DOUBLE QUOTATION MARK] at the start, and ” [U+201D RIGHT DOUBLE QUOTATION MARK] at the end.
When an additional quote is embedded within the first, the quote marks are ‘ [U+2018 LEFT SINGLE QUOTATION MARK] and ’ [U+2019 RIGHT SINGLE QUOTATION MARK].
ๆ [U+0E46 THAI CHARACTER MAIYAMOK] is used to mark repetition of preceding letters.
ฯ [U+0E2F THAI CHARACTER PAIYANNOI] is used to indicate elision or abbreviation of letters; it is itself viewed as a kind of letter, however, and is used with considerable frequency because of its appearance in such words as the Thai name for Bangkok. Paiyannoi is also used in the combination ฯลฯ to create a construct called paiyanyai, which means “et cetera, and so forth.”
Thai doesn't indicate word boundaries, but when Thai text is wrapped at the end of a line you should not split a word.
As you change the width of the browser window the highlighted text above should break at the following points if your browser supports Thai wrapping:
Because Thai doesn't separate words, applications typically look up word boundaries in a dictionary. However, such lookup doesn't always produce the needed result, especially when dealing with compound words and proper names (see words). To counteract these deficiencies, authors may use U+200B ZERO WIDTH SPACE and U+2060 WORD JOINER (see zwsp).
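To make the dictionary lookup concrete, here is a toy greedy longest-match segmenter that also honours author-supplied ZWSP breaks. Everything in it is an assumption for illustration: the function name, the tiny word lists, and the greedy strategy itself, which is far simpler than what browsers actually do. It does not handle WORD JOINER.

```python
ZWSP = "\u200B"  # ZERO WIDTH SPACE

def segment(text: str, dictionary: set[str]) -> list[str]:
    """Split text at ZWSP, then greedily take the longest dictionary word."""
    words: list[str] = []
    for chunk in text.split(ZWSP):          # explicit breaks come first
        i = 0
        while i < len(chunk):
            candidates = (w for w in dictionary if w and chunk.startswith(w, i))
            # Fall back to a single character when no word matches here.
            match = max(candidates, key=len, default=chunk[i])
            words.append(match)
            i += len(match)
    return words

KHON, KHAP, ROT = "\u0E04\u0E19", "\u0E02\u0E31\u0E1A", "\u0E23\u0E16"  # คน ขับ รถ
simple = {KHON, KHAP, ROT}
with_compound = simple | {KHON + KHAP + ROT}

print(segment(KHON + KHAP + ROT, simple))                # three words
print(segment(KHON + KHAP + ROT, with_compound))         # one compound word
print(segment(KHON + ZWSP + KHAP + ROT, with_compound))  # ZWSP forces a break
```

The second and third calls show the compound-word problem in miniature: the same string is one word or three depending on the dictionary, and a ZWSP lets the author override the lookup.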
Justification in Thai adjusts blank spaces, but also makes certain adjustments to inter-character spacing. Browsers currently tend not to justify Thai text well.
The zero width space can grow to have a visible width when justified.
Use the control below to see how your browser justifies the text sample here.
ทุกคนเสมอกันตามกฎหมายและมีสิทธิที่จะได้รับความคุ้มครองของกฎหมายเท่าเทียมกัน โดยปราศจากการเลือกปฏิบัติใด ๆ ทุกคนมีสิทธิที่จะได้รับความคุ้มครองเท่าเทียมกันจากการเลือกปฏิบัติใด ๆ อันเป็นการล่วงละเมิดปฏิญญา และจากการยุยงให้เกิดการเลือกปฏิบัติดังกล่าว
๏ [U+0E4F THAI CHARACTER FONGMAN] is the Thai bullet, which is used to mark items in lists or appears at the beginning of a verse, sentence, paragraph, or other textual segment.
The document Ready-made Counter Styles includes two counter styles:
Thai places vowel and tone marks above base characters, one above the other, and can also add combining characters below the line. The complexity of these marks means that the vertical resolution needed for clearly readable Thai text is higher than for, say, Latin text. In addition, Thai tends to add more interline spacing than Latin text does.
Here is an example of a word with combining characters above and below base characters:
Further information needed for this section includes: