Tamil

Updated 17 September, 2021

This page gathers basic information about the Tamil script and its use for the Tamil language. It aims (generally) to provide an overview of the orthography and typographic features, and (specifically) to advise how to write Tamil using Unicode.

Tamil has a fairly complicated set of rules and variations on pronunciation, and the writing system abstracts away from the detail. Phonetic transcriptions on this page should be treated only as an approximate guide. Many are more phonemic than phonetic, and there may be variations depending on the source of the transcription. For example, the symbol a represents a set of central sounds which may be written a, ə, or ʌ in more detailed transcriptions.


Sample (Tamil)


உறுப்புரை 1 மனிதப் பிறிவியினர் சகலரும் சுதந்திரமாகவே பிறக்கின்றனர்; அவர்கள் மதிப்பிலும், உரிமைகளிலும் சமமானவர்கள், அவர்கள் நியாயத்தையும் மனச்சாட்சியையும் இயற்பண்பாகப் பெற்றவர்கள். அவர்கள் ஒருவருடனொருவர் சகோதர உணர்வுப் பாங்கில் நடந்துகொள்ளல் வேண்டும்.

உறுப்புரை 2 இனம், நிறம், பால், மொழி, மதம், அரசியல் அல்லது வேறு அபிப்பிராயமுடைமை, தேசிய அல்லது சமூகத் தோற்றம், ஆதனம், பிறப்பு அல்லது பிற அந்தஸ்து என்பன போன்ற எத்தகைய வேறுபாடுமின்றி, இப்பிரகடனத்தில் தரப்பட்டுள்ள எல்லா உரிமைகளுக்கும் சுதந்திரங்களுக்கும் எல்லோரும் உரித்துடையவராவர். மேலும், எவரும் அவருக்குரித்துள்ள நாட்டின் அல்லது ஆள்புலத்தின் அரசியல், நியாயாதிக்க அல்லது நாட்டிடை அந்தஸ்தின் அடிப்படையில் — அது தனியாட்சி நாடாக, நம்பிக்கைப் பொறுப்பு நாடாக, தன்னாட்சியற்ற நாடாக அல்லது இறைமை வேறேதேனும் வகையில் மட்டப்படுத்தப்பட்ட நாடாக இருப்பினுஞ்சரி — வேறுபாடெதுவும் காட்டப்படுதலாகாது.

Usage & history

The Tamil script is used for writing the Tamil language, a Dravidian language spoken by over 65,500,000 people in India, Sri Lanka, Singapore, Malaysia and Mauritius. Tamil is an official language in the south Indian state of Tamil Nadu as well as in Sri Lanka and Malaysia. It is also used to write the liturgical language Sanskrit, using consonants and diacritics not represented in the Tamil alphabet. Certain minority languages such as Saurashtra, Badaga, Irula, and Paniya are also written in the Tamil script.

தமிழ் அரிச்சுவடி t̪ɐmɨɻ ˈɐɾit͡ɕːuʋəɽi

An older Tamil script, derived from Brahmi, dates back to the Ashokan period. It differs in various significant ways from the modern script, which evolved from a new script created under the Pallava dynasty in the 6th century. It took around 500 years for this new script to spread throughout the Tamil regions. Orthographic reform in the 19th and 20th centuries simplified and regularised the script, removing many ligated forms, to facilitate typesetting.

Sources: Scriptsource, Wikipedia.

Orthographic development & variants

The script was reformed in the 19th century to make it easier to typeset, and again in the 20th. The advent of printing also revived the use of the pulli to mark consonants without an inherent vowel; it had become rare because it was difficult to write on palm leaves.ws

In 1978, in an attempt to simplify the script, the government of Tamil Nadu proposed the reform of certain letters and syllables. See Writing styles for details.

These reforms spread only in India and in the digital world, whereas Sri Lanka, Singapore, Malaysia, Mauritius, Réunion and other Tamil-speaking regions continue to use the traditional syllables.wss

Basic features

The Tamil script is an abugida, ie. consonants carry an inherent vowel sound that is overridden using vowel-signs. The notes below give a brief overview of the features of the modern Tamil orthography.

There can be differences in letter shapes and other typographic approaches between the Tamil used in India and that used in places like Singapore and Malaysia (and even Sri Lanka).

The Tamil script is written horizontally, left to right.

Words are separated by spaces.

There are fewer consonants than in other Indic scripts. Tamil has no aspirated consonant letters, and symbols are allocated on a phonemic rather than a phonetic basis. This means that க, for example, may be pronounced as the allophones k, ɡ, x, ɣ or h, according to where it appears relative to other sounds in a word, but these differences in pronunciation do not change the identity of the word.

The 18 consonant letters used for pure Tamil words are supplemented by 6 more Grantha consonant signs which are used for English and Sanskrit loan words. Repertoire extensions for 4 more non-native sounds are achieved by applying the āytam diacritic to characters.

Consonant clusters are indicated using the visible puḷḷi dot to show that no vowel follows a consonant. There are 2 ligated forms which are exceptions to this rule: க்ஷ k͓ʂ kʃʌ and ஶ்ரீ ʃ͓ɾī ʃri.

Word-initial clusters do not appear in Tamil. Syllable-/word-final consonants are just written using ordinary consonants with the puḷḷi overhead, eg. தமிழ் tmiɻ͓ (tamiḻ) Tamil

The Tamil orthography has an inherent vowel, and represents vowels using 11 vowel-signs, including 3 prescripts and 3 circumgraphs. All circumgraphs can be decomposed. All vowel-signs are combining marks, and are stored after the base character.

There are 12 independent vowels, one for each vowel sound, including the inherent vowel, and these are used to write all standalone vowel sounds.

The only composite vowels are those created by decomposition of the circumgraphs, and involve 2 glyphs, one on each side of the base consonant(s).

Tamil is diglossic: the classic form is preferred for writing and public speaking, and is mostly standard across the Tamil-speaking regions; the colloquial, spoken form differs widely from the written.


Character index

Letters


Basic consonants

ப␣த␣ச␣ட␣க␣ம␣ந␣ன␣ண␣ஞ␣ங␣வ␣ர␣ற␣ழ␣ல␣ள␣ய

Grantha consonants

ஜ␣ஸ␣ஶ␣ஷ␣ஹ

Independent vowels

இ␣ஈ␣உ␣ஊ␣எ␣ஏ␣ஒ␣ஓ␣அ␣ஆ␣ஐ␣ஔ

Other

ஃ␣ௐ

Combining marks


Vowel-signs

ி␣ீ␣ு␣ூ␣ெ␣ே␣ொ␣ோ␣ா␣ை␣ௌ

Other

்␣ௗ

Not used for Tamil

𑌻

Numbers


Not used for modern Tamil

௦␣௧␣௨␣௩␣௪␣௫␣௬␣௭␣௮␣௯␣௰␣௱␣௲

Punctuation

“␣”␣‘␣’
।␣॥

ASCII

!␣(␣)␣,␣.␣:␣;␣?

Symbols

௹␣₹

Not used for modern Tamil

௳␣௴␣௵␣௶␣௷␣௸␣௺␣₨

Phonology


The inventory includes non-native sounds and allophones as well as native phonemes. Source: Wikipedia.

Vowel sounds

Plain vowels

i iː u uː e eː o oː a aː

Vowel length is distinctive: long vowels are about twice as long as short vowels.

Vowel quality can vary depending on the adjacent sounds. A good number of such variations are described in Comrieb.

Diphthongs

aɪ aʊ

Use of aʊ is restricted to a few lexical items.wp

Consonant sounds

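manner | labial | dental | alveolar | post-alveolar | retroflex | palatal | velar | glottal
stop | p b | | | | ʈ ɖ | | k ɡ |
affricate | | | | t͡ʃ d͡ʒ | | | |
fricative | f β | ð | s z | ʃ ʒ | ʂ | | x ɣ | h
nasal | m | | n | | ɳ | ɲ | ŋ |
approximant | ʋ | | l | | ɻ ɭ | j | |
trill/flap | | | r ɾ | | ɽ | | |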

Structure

Tamil has a very restricted set of consonant clusters, and no word-initial clusters.wp Geminated consonants, however, are common.

Some consonants cannot begin a word (eg. ɾ ɻ l) and others cannot appear at the end of a word.ws

Vowels

Inherent vowel

The vowel a following a consonant is not written, but is seen as an inherent part of the consonant letter, so ka is written simply using the consonant letter க [U+0B95 TAMIL LETTER KA].

Danielsd describes the inherent vowel as ʌ, though not consistently.

Vowel-signs

Non-inherent vowel sounds that follow a consonant are represented using vowel-signs, eg. kiː is written கீ [U+0B95 TAMIL LETTER KA + U+0BC0 TAMIL VOWEL SIGN II].

An orthography that uses vowel-signs differs from one that uses simple diacritics or letters for vowels, in that the vowel-signs are generally attached to the syllable, rather than just applied to the immediately preceding consonant letter. This means that pre-base vowel-signs, and the left glyph of circumgraphs, appear before a whole consonant cluster if it is rendered as a conjunct (see Pre-base vowel-signs).

Tamil vowel-signs are all combining characters. In principle, a single vowel-sign character follows each base consonant, even if the vowel-sign appears on both sides of the consonant (however, see Circumgraphs for decomposed forms). All vowel-signs are typed and stored after the base consonant, and the glyph rendering system takes care of positioning them at display time.

All but one of the vowel-signs are spacing combining characters, ie. they expand the text width when applied to a consonant. The exception is [U+0BC0 TAMIL VOWEL SIGN II], which is non-spacing.
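The spacing distinction can be checked against the Unicode Character Database. A minimal Python sketch, using only the standard `unicodedata` module, that prints each modern Tamil vowel-sign with its General_Category (Mc is a spacing combining mark, Mn a non-spacing mark); the list of code points is taken from the vowel-sign inventory on this page, and the function name is illustrative.

```python
import unicodedata

# The eleven dependent vowel-signs of modern Tamil, as single code points.
TAMIL_VOWEL_SIGNS = [chr(cp) for cp in (
    0x0BBE, 0x0BBF, 0x0BC0, 0x0BC1, 0x0BC2, 0x0BC6,
    0x0BC7, 0x0BC8, 0x0BCA, 0x0BCB, 0x0BCC,
)]


def non_spacing_vowel_signs():
    """Names of the vowel-signs whose General_Category is Mn (non-spacing)."""
    return [unicodedata.name(sign) for sign in TAMIL_VOWEL_SIGNS
            if unicodedata.category(sign) == "Mn"]


if __name__ == "__main__":
    for sign in TAMIL_VOWEL_SIGNS:
        # Mc = spacing combining mark, Mn = non-spacing mark
        print(f"U+{ord(sign):04X}", unicodedata.category(sign), unicodedata.name(sign))
    print(non_spacing_vowel_signs())
```

The last line should list only TAMIL VOWEL SIGN II, the sign that sits above the consonant.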

Combining marks used for vowels

Tamil uses the following dedicated combining marks for vowels.

ி␣ீ␣ு␣ூ␣ெ␣ே␣ொ␣ோ␣ா␣ை␣ௌ

The u and ū vowel-signs, and to some extent the i and ī signs, tend to form ligatures with the base consonant. See Vowel ligatures.

Pre-base vowel-signs

ெ␣ே␣ை

Three vowel-signs appear to the left of the base consonant letter or conjunct, eg. கெடு

These are combining marks that are always stored after the base consonant. The font places the glyph before the base consonant.

Because modern Tamil usually indicates consonant clusters with a visible virama, pre-base vowel-signs normally appear immediately before the consonant that is pronounced just before them. However, in versions of the orthography that include conjunct forms, the pre-base vowel appears before the whole consonant cluster at the beginning of the orthographic syllable, eg. எங்கே

Circumgraphs

ொ␣ோ␣ௌ

Three vowels are produced by a single combining character with visually separate parts that appear on opposite sides of the consonant onset, eg. கொடு

Encoding. All of these circumgraphs can be written as a single character, or as two.

  1. [U+0BCA TAMIL VOWEL SIGN O]
    ொ [U+0BC6 TAMIL VOWEL SIGN E + U+0BBE TAMIL VOWEL SIGN AA]
  2. [U+0BCB TAMIL VOWEL SIGN OO]
    ோ [U+0BC7 TAMIL VOWEL SIGN EE + U+0BBE TAMIL VOWEL SIGN AA]
  3. [U+0BCC TAMIL VOWEL SIGN AU]
    ௌ [U+0BC6 TAMIL VOWEL SIGN E + U+0BD7 TAMIL AU LENGTH MARK]

The single code point per vowel-sign is the preferred form, and the form in common use for Tamil. The parts are separated, however, in Unicode Normalisation Form D. [U+0BD7 TAMIL AU LENGTH MARK] is never used alone.

Whichever approach is used, the vowel-signs must be typed and stored after the consonant character(s) they surround, and in left to right order. In the case of decomposed vowel-signs, the order is also important and must be as shown above.
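The relationship between the single-code-point and decomposed forms can be seen with Python's standard `unicodedata.normalize`. A minimal sketch using கொ (KA + VOWEL SIGN O) and the AU forms; the constant names are illustrative. It also shows why storage order matters: parts stored in the wrong order are not canonically equivalent, so normalisation cannot repair them.

```python
import unicodedata

KO_PRECOMPOSED = "\u0B95\u0BCA"        # KA + VOWEL SIGN O
KO_DECOMPOSED = "\u0B95\u0BC6\u0BBE"   # KA + VOWEL SIGN E + VOWEL SIGN AA
KO_WRONG_ORDER = "\u0B95\u0BBE\u0BC6"  # same parts, stored in the wrong order


def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# NFD splits the circumgraph into its two parts, in the required order.
assert unicodedata.normalize("NFD", KO_PRECOMPOSED) == KO_DECOMPOSED
# NFC recombines them into the preferred single code point.
assert unicodedata.normalize("NFC", KO_DECOMPOSED) == KO_PRECOMPOSED
# The wrong order is not canonically equivalent, so NFC leaves it alone.
assert unicodedata.normalize("NFC", KO_WRONG_ORDER) != KO_PRECOMPOSED
# The AU vowel-sign decomposes to VOWEL SIGN E + AU LENGTH MARK.
assert unicodedata.normalize("NFD", "\u0BCC") == "\u0BC6\u0BD7"

print(code_points(unicodedata.normalize("NFD", KO_PRECOMPOSED)))
```

Running it prints `['U+0B95', 'U+0BC6', 'U+0BBE']`.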

Composite vowels

Composite vowels only occur in Tamil when the circumgraphs listed above are decomposed.


The following list summarises where vowel-signs are positioned around a base consonant to produce vowels, and how many instances of that pattern there are.

  • 3 prescript, eg. கெ ke
  • 4 postscript, eg. கூ
  • 1 superscript, eg. கீ
  • 3 pre+postscript, eg. கௌ kʌʷ

However, some of the vowel-signs are tightly integrated with the consonant shape. See Vowel ligatures.

Standalone vowels

Tamil represents syllable-initial vowels using a set of independent vowel letters.

இ␣ஈ␣உ␣ஊ␣எ␣ஏ␣ஒ␣ஓ␣அ␣ஆ␣ஐ␣ஔ

Independent vowel letters were formerly used at the beginning of metrical groups, but now they are used at the beginning of a word, eg. இந்த ịṅt (inta) this

They are also used internally to represent 'overlong' vowel sounds, eg. compare பெரீய peɾīy (perīya) really big with பெரீஇஇய peɾīịịy (perīiiya) reeeeally big.

Vowel ligatures

Vowel-signs for u and ū, and to some extent i and ī, produce significantly different, ligated shapes as they combine with the base consonant. The table below shows the various alternative shapes produced by [U+0BC1 TAMIL VOWEL SIGN U] when combined with different base characters.

Base consonant | Combination
க | கு
ச | சு
ஞ | ஞு
ட | டு
ஜ | ஜு

Ligatures with [U+0BC1 TAMIL VOWEL SIGN U].

Besides these significant transformations, special shaping is used to ensure a clean join between the consonant and vowel, eg. லி li. In the following sequence the vowel-sign is stretched slightly to fit the shape of the consonant: ஷி ʂi.

A reform in 1978 by the Tamil Nadu government changed the shapes and relative positions of certain consonant+vowel-sign combinations in India, though not necessarily in other locations. See Writing styles for a list of shape differences.


Consonants with no following vowel

Tamil uses [U+0BCD TAMIL SIGN VIRAMA] (called puḷḷi in Tamil) to kill the inherent vowel after a consonant, eg. க் [U+0B95 TAMIL LETTER KA + U+0BCD TAMIL SIGN VIRAMA] explicitly represents just the sound k.
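Because every vowel-sign and the puḷḷi are stored after the consonant, whatever their visual position, a syllable can be assembled by plain concatenation and the font takes care of the reordering. A minimal sketch; the mapping from independent vowel letters to vowel-signs is written out by hand here, and the names `VOWEL_SIGN_FOR` and `make_syllable` are illustrative, not part of any standard API.

```python
from typing import Optional

VIRAMA = "\u0BCD"  # puḷḷi

# Independent vowel letter -> dependent vowel-sign stored after a consonant.
VOWEL_SIGN_FOR = {
    "\u0B85": "",        # LETTER A: inherent vowel, no sign is written
    "\u0B86": "\u0BBE",  # LETTER AA -> VOWEL SIGN AA
    "\u0B87": "\u0BBF",  # LETTER I  -> VOWEL SIGN I
    "\u0B88": "\u0BC0",  # LETTER II -> VOWEL SIGN II
    "\u0B89": "\u0BC1",  # LETTER U  -> VOWEL SIGN U
    "\u0B8A": "\u0BC2",  # LETTER UU -> VOWEL SIGN UU
    "\u0B8E": "\u0BC6",  # LETTER E  -> VOWEL SIGN E (displayed to the left)
    "\u0B8F": "\u0BC7",  # LETTER EE -> VOWEL SIGN EE (displayed to the left)
    "\u0B90": "\u0BC8",  # LETTER AI -> VOWEL SIGN AI (displayed to the left)
    "\u0B92": "\u0BCA",  # LETTER O  -> VOWEL SIGN O (circumgraph)
    "\u0B93": "\u0BCB",  # LETTER OO -> VOWEL SIGN OO (circumgraph)
    "\u0B94": "\u0BCC",  # LETTER AU -> VOWEL SIGN AU (circumgraph)
}


def make_syllable(consonant: str, vowel: Optional[str]) -> str:
    """Return consonant + vowel-sign, or consonant + puḷḷi when vowel is None."""
    if vowel is None:
        return consonant + VIRAMA
    return consonant + VOWEL_SIGN_FOR[vowel]


if __name__ == "__main__":
    ka, ii = "\u0B95", "\u0B88"
    print(make_syllable(ka, ii))    # கீ
    print(make_syllable(ka, None))  # க்
```

Even for the pre-base and circumgraph signs, the stored order is always consonant first.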

The puḷḷi may be rendered as a dot, or as a small, open circle.

The puḷḷi tends to be visible anywhere a vowel is dropped. For example, unlike Devanagari, it is used at the end of a word if there is no final vowel, eg. மனிதப் mnitp mənid̪əp human

The puḷḷi is also used to form conjuncts, although there are normally only 2 of those in modern Tamil (see Consonant clusters).

[U+0B82 TAMIL SIGN ANUSVARA] is not used for Tamil. Nor should it be used as a graphical variant of the pulli.s

Vowel to script mapping

The following tables show how vowel sounds commonly map to characters or sequences of characters. Both dependent vowel-signs (d) and independent vowels (i) are shown.

Plain vowels

i

(d) ி [U+0BBF TAMIL VOWEL SIGN I], eg. கிரி kiɾi kɪɾɪˑ mountain.
(i) இ [U+0B87 TAMIL LETTER I], eg. இலை ịlaʲ (ilai) ʔil̪aɪ̯ leaf.

iː

(d) ீ [U+0BC0 TAMIL VOWEL SIGN II], eg. கீரி kīɾi kiːɾɪˑ mongoose.
(i) ஈ [U+0B88 TAMIL LETTER II], eg. ஈரல் ị̄ɾl͓ (īral) ʔiːɾ̪al̪ liver.

u

(d) ு [U+0BC1 TAMIL VOWEL SIGN U], eg. குடம் kuʈm͓ kʊɖəm pot, அழகு ạɻku əɻəgʉ beauty.
(i) உ [U+0B89 TAMIL LETTER U], eg. உண் ụɳ͓ (uṇ) ʔuɳ to eat.

uː

(d) ூ [U+0BC2 TAMIL VOWEL SIGN UU], eg. கூடம் kūʈm͓ kuːɖəm hall.
(i) ஊ [U+0B8A TAMIL LETTER UU], eg. ஊது ụ̄tu (ūtu) ʔuːd̪u to blow.

e

(d) ெ [U+0BC6 TAMIL VOWEL SIGN E], eg. கெடு keʈu kɛɖʉ deadline.
(i) எ [U+0B8E TAMIL LETTER E], eg. எலும்பு ẹlum͓pu (elumpu) ʔel̪umbu bone.

eː

(d) ே [U+0BC7 TAMIL VOWEL SIGN EE], eg. கேடு kēʈu keːɖʉ destruction.
(i) ஏ [U+0B8F TAMIL LETTER EE], eg. ஏரி ẹ̄ɾi (ēri) ʔeːɾ̪i lake.

o

(d) ொ [U+0BCA TAMIL VOWEL SIGN O], eg. கொடு koʈu koɖʉ give.
    ொ [U+0BC6 TAMIL VOWEL SIGN E + U+0BBE TAMIL VOWEL SIGN AA] when decomposed (not recommended).
(i) ஒ [U+0B92 TAMIL LETTER O], eg. ஒன்று ọn͓ru (oṉṟu) ʔonru one.

oː

(d) ோ [U+0BCB TAMIL VOWEL SIGN OO], eg. கோடு kōʈu koːɖʉ dash.
    ோ [U+0BC7 TAMIL VOWEL SIGN EE + U+0BBE TAMIL VOWEL SIGN AA] when decomposed (not recommended).

a

(d) Inherent vowel, eg. கல் kl͓ kəl stone.
(i) அ [U+0B85 TAMIL LETTER A], eg. அழகு ạɻku əɻəgʉ beauty.

aː

(d) ா [U+0BBE TAMIL VOWEL SIGN AA], eg. கால் kāl͓ kɑːl leg.
(i) ஆ [U+0B86 TAMIL LETTER AA], eg. ஆண் ạ̄ɳ͓ (āṇ) ʔaːɳ man.

Diphthongs

aɪ

(d) ை [U+0BC8 TAMIL VOWEL SIGN AI], eg. கைது kaʲtu kəjd̪ʉ arrest.
(i) ஐ [U+0B90 TAMIL LETTER AI], eg. ஐந்து ạʲṅ͓tu (aintu) ʔaɪ̯n̪d̪u five.

aʊ

(d) ௌ [U+0BCC TAMIL VOWEL SIGN AU], eg. கௌதாரி kaʷtāɾi kəʊd̪ɑːɾɪˑ partridge.
    ௌ [U+0BC6 TAMIL VOWEL SIGN E + U+0BD7 TAMIL AU LENGTH MARK] when decomposed (not recommended).
(i) ஔ [U+0B94 TAMIL LETTER AU]
    ஔ [U+0B92 TAMIL LETTER O + U+0BD7 TAMIL AU LENGTH MARK] when decomposed (not recommended).

Sources: Wikipedia, Anunaadam, and Google Translate.

Consonants

Basic consonants

The basic consonant sounds of the standard Tamil alphabet are represented by the following characters. Note that there are no consonants dedicated only to voiced stops or to fricative sounds.

Stops/fricatives

ப␣த␣ச␣ட␣க

For the contexts in which the allophonic variants of these letters are used, see Allophonic variants for Tamil plosives.

Nasals

ம␣ந␣ன␣ண␣ஞ␣ங

Liquids

வ␣ர␣ற␣ழ␣ல␣ள␣ய

Allophonic variants for Tamil plosives

The Tamil writing system only represents phonemic differences. The sounds in parentheses in the chart are allophonic variations or sounds used for foreign words. Allophonic variants are not usually indicated in Latin transcriptions.

Plosives are unvoiced if they occur word-initially or doubled. Elsewhere they are voiced, with a few becoming fricatives intervocalically. Nasals and approximants are always voiced.

Wikipedia provides the following useful table for the realisation of the plosive sounds in context.

Letter | Initial | Geminate | Intervocalic | Post-nasal
Labial | p | | β~w | b
Dental | | t̪ː | ð |
Alveolar | | tːr | r | (d)r
Retroflex | | ʈː | ɽ | ɖ
Palatal | tɕ~s | tːɕ | s |
Velar | k | | x~ | ɡ

Allophonic variants for Tamil plosives.wp
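The voicing rule above depends only on a plosive's position relative to neighbouring letters, so the context column of the table can be computed from the spelling. A minimal Python sketch, assuming a single word of modern Tamil in which clusters use a visible puḷḷi; the function names and context labels are illustrative, and RRA is included as the alveolar row of the table. Sanskrit loan words and colloquial pronunciations, which break these rules, are not handled.

```python
import unicodedata
from typing import List, Tuple

VIRAMA = "\u0BCD"  # puḷḷi
# Plosive letters: KA, CA, TTA, TA, PA, RRA.
PLOSIVES = {"\u0B95", "\u0B9A", "\u0B9F", "\u0BA4", "\u0BAA", "\u0BB1"}
# Nasal letters: NGA, NYA, NNA, NA, MA, NNNA.
NASALS = {"\u0B99", "\u0B9E", "\u0BA3", "\u0BA8", "\u0BAE", "\u0BA9"}


def split_units(word: str) -> List[str]:
    """Group each base letter with the combining marks stored after it."""
    units: List[str] = []
    for ch in word:
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch
        else:
            units.append(ch)
    return units


def classify_plosives(word: str) -> List[Tuple[str, str]]:
    """Label each vowel-bearing plosive in a word with its context."""
    units = split_units(word)
    labelled = []
    for i, unit in enumerate(units):
        base = unit[0]
        if base not in PLOSIVES or unit.endswith(VIRAMA):
            continue  # skip non-plosives and plosives killed by a puḷḷi
        if i == 0:
            context = "initial"
        else:
            prev = units[i - 1]
            if prev == base + VIRAMA:
                context = "geminate"
            elif prev.endswith(VIRAMA) and prev[0] in NASALS:
                context = "post-nasal"
            elif not prev.endswith(VIRAMA):
                context = "intervocalic"
            else:
                context = "after another consonant"
        labelled.append((unit, context))
    return labelled


if __name__ == "__main__":
    for word in ("பந்து", "பத்து", "படி"):
        print(word, classify_plosives(word))
```

Following the rule in the text, "initial" and "geminate" plosives would be unvoiced and "intervocalic" and "post-nasal" ones voiced; the sketch deliberately assigns no voicing to the "after another consonant" case, since the page notes exceptions there (eg. TA after a stop).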

The consonants are classified into three categories: vallinam (hard consonants), mellinam (soft consonants, including all nasals), and idayinam (medium consonants). These categories are important for the rules of pronunciation.

The mapping of consonants, in particular the plosives, to phonetic sounds is unusually varied for an Indic script. In the written form of Tamil, these pronunciation rules put the allophones in complementary distribution. However, the rules break down to varying degrees in Sanskrit loan words and in the colloquial spoken form of Tamil (particularly in northern areas). For more, read Tamil phonology and Krishnamurthi23-28.

Grantha consonants

Because the core set of Tamil consonants is much smaller than that of most Indic scripts, Tamil adds letters from the Grantha script to cover sounds in Sanskrit and English, completing the basic consonant set.

ஜ␣ஸ␣ஶ␣ஷ␣ஹ␣க்ஷ

The last item in the list just above is actually a cluster of two consonants, but is viewed as a single letter of the alphabet.

[U+0BB6 TAMIL LETTER SHA] is not commonly used, except in the ʃɾī ligature ஶ்ரீ. See Representation of shrī.

Repertoire extension using āytam

For compatibility with modern communication, Tamil presses [U+0B83 TAMIL SIGN VISARGA] (called āytam) into service to produce fricative sounds from stops.

ஃப␣ஃஜ␣ஃஸ␣ஃக

Examples: ஃபீசு ˑpīcu fiːsɯ fees, ஃஜிரொக்ஸ் ˑʤiɾok͓s͓ zɪɾoks Xerox, செங்கிஸ் ஃகான் ceŋ͓kis͓ ˑkɑ̄n͓ t͡ʃɛŋgɪs xɑːn Genghis Khan.

Note that part of a vowel-sign can appear between the āytam and the consonant, ie. the two are not treated as an indivisible unit, eg. ஃபோரியர் ˑpōɾiyɾ͓ foːɾɪjər Fourier

Other extension mechanisms

Superscript numbers

The Unicode Standard describes a method of extension that uses superscript or subscript digits, particularly to represent missing letters in transcriptions of languages such as Sanskrit and Saurashtra. The digits 1 to 4 indicate an unvoiced, unvoiced aspirated, voiced, or voiced aspirated sound respectively, eg. ப¹ = pa, ப² = pha, ப³ = ba, and ப⁴ = bha.u

ட²பெத்வா அட்ட² ஸரே ஸேஸா அக்க²ரா ககாராத³யோ நிக்³க³ஹிதந்தா ப்³யஞ்ஜனா நாம ஹொந்தி.

Example of superscript numbers being used for allophone disambiguation.
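A program reading text that uses this convention has to attach each superscript digit to the letter cluster before it. A minimal sketch, assuming the digits are the superscript characters ¹ ² ³ ⁴ (U+00B9, U+00B2, U+00B3, U+2074) and that each follows the consonant cluster it qualifies, including any puḷḷi, as in the sample above; the function name and class labels are illustrative.

```python
import unicodedata
from typing import List, Optional, Tuple

# Superscript digit -> the sound class it marks (1 to 4).
SOUND_CLASS = {
    "\u00B9": "unvoiced",
    "\u00B2": "unvoiced aspirated",
    "\u00B3": "voiced",
    "\u2074": "voiced aspirated",
}


def annotate(text: str) -> List[Tuple[str, Optional[str]]]:
    """Split text into letter clusters, attaching any superscript class."""
    units: List[list] = []  # each item: [cluster, sound class or None]
    for ch in text:
        if ch in SOUND_CLASS and units:
            units[-1][1] = SOUND_CLASS[ch]  # qualifies the preceding cluster
        elif units and unicodedata.category(ch).startswith("M"):
            units[-1][0] += ch  # vowel-sign or puḷḷi joins the cluster
        else:
            units.append([ch, None])
    return [(cluster, cls) for cluster, cls in units]


if __name__ == "__main__":
    print(annotate("ப³"))    # ப marked as voiced
    print(annotate("ப்³ய"))  # the digit follows the puḷḷi
```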

Grantha script

The Grantha script is often also used by Tamil speakers to write Sanskrit because Grantha contains these missing consonants.u

Minority languages

A number of minority languages use a nukta symbol to identify sounds for that language. The shape of the nukta can vary. It always appears below the character, and is most commonly a single dot (as in Chetti), a small open circle (as in Betta Kurumba), or 2 dots side-by-side (as in Irula). The character to use for such nuktas is 𑌻 [U+1133B COMBINING BINDU BELOW].

Final consonants

Syllable-/word-final consonants in Tamil are just written using ordinary consonants with the pulli overhead, eg. தமிழ் tmiɻ͓ (tamiḻ) Tamil

Consonant clusters

Rather than using conjunct glyphs like most other indic scripts, consonant clusters are normally represented using the puḷḷi dot over the character(s) that are not followed by a vowel, eg. தீர்ப்பு tīɾ͓p͓pu verdict

There are two common exceptions: க்ஷ k͓ʂ kʃʌ and ஶ்ரீ ʃɾī ʃri.

Representation of shrī

The syllable ʃri can be written with two different initial letters: [U+0BB6 TAMIL LETTER SHA] (ie. ஶ்ரீ ʃɾī) or [U+0BB8 TAMIL LETTER SA] (ie. ஸ்ரீ s͓ɾī). Both render identically. Since 2005, the Unicode Consortium has recommended use of the former, but both are still in wide circulation, so Unicode 12 recommends that both be treated as equivalent sequences.u
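The two spellings are not canonically equivalent, so Unicode normalisation alone will not make them compare equal. One way to honour the equivalence when searching or comparing is to fold one spelling into the other first. A minimal sketch; folding towards the SHA-based form is an arbitrary choice that matches the 2005 recommendation, and the function names are illustrative.

```python
import unicodedata

SHRI_SHA = "\u0BB6\u0BCD\u0BB0\u0BC0"  # SHA + VIRAMA + RA + VOWEL SIGN II
SHRI_SA = "\u0BB8\u0BCD\u0BB0\u0BC0"   # SA + VIRAMA + RA + VOWEL SIGN II


def fold_shri(text: str) -> str:
    """Rewrite the SA-based spelling of shri to the SHA-based one."""
    return text.replace(SHRI_SA, SHRI_SHA)


def same_text(a: str, b: str) -> bool:
    """Compare two strings, treating the two shri spellings as equal."""
    return (fold_shri(unicodedata.normalize("NFC", a))
            == fold_shri(unicodedata.normalize("NFC", b)))


# Normalisation alone does not unify the two spellings.
assert unicodedata.normalize("NFC", SHRI_SA) != SHRI_SHA
assert same_text(SHRI_SA, SHRI_SHA)
```

Only the full sequence with the ī vowel is folded, so other SA+RA sequences are left untouched.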

Consonant sounds to characters

The following maps Tamil consonant sounds to common graphemes.

p

[U+0BAA TAMIL LETTER PA] when:

initial, eg. பத்து
geminated, eg. அப்பன், விற்பனை.

b

[U+0BAA TAMIL LETTER PA] when:
between vowels, eg. ஆபத்து ạ̄pt͓tu ɑːbət̪t̪ʉ danger
after a nasal, eg. அன்பு ạn͓pu ənbʉ love.

t

[U+0BA4 TAMIL LETTER TA] when:
initial, eg. தனி tni t̪ənɪˑ separate,
geminated, eg. பத்து pt͓tu pət̪t̪ʉ ten, or
after a stop, eg. யுக்தி yuk͓ti jʉkt̪ɪˑ tactical.

[U+0BB1 TAMIL LETTER RRA], where the t sound is inserted when this letter is geminated, eg. பற்றி pr͓ri pət̺t̺ʳɪˑ about.

d

[U+0BA4 TAMIL LETTER TA] occurs:
after a nasal, eg. பந்து pṅ͓tu pən̪d̪ʉ ball.

[U+0BB1 TAMIL LETTER RRA], where the d sound is inserted when this letter follows a nasal, eg. ஒன்று ọn͓ru ʷond̺ʳʉ one.

ʈ

[U+0B9F TAMIL LETTER TTA] when
geminated, eg. பட்டு pʈ͓ʈu pəʈʈʉ plush.

Not used initially.

ɖ

[U+0B9F TAMIL LETTER TTA] when:
after a nasal, eg. வண்டி ʋɳ͓ʈi ʋəɳɖɪˑ carriage
between vowels, eg. படி pʈi pəɖɪˑ step.

k

[U+0B95 TAMIL LETTER KA] when:
initial, eg. கால் kāl͓ kɑːl leg
geminated, eg. மக்கள் mk͓kɭ͓ məkkəɭ people
in a cluster, eg. கற்க kr͓k kərkə learn.

Also, the first part of க்ஷ [U+0B95 TAMIL LETTER KA + U+0BCD TAMIL SIGN VIRAMA + U+0BB7 TAMIL LETTER SSA], eg. காமாக்ஷி kāmāk͓ʂi kɑːmɑːkʂɪˑ Kamakshi.

Also part of ஃஸ [U+0B83 TAMIL SIGN VISARGA + U+0BB8 TAMIL LETTER SA], which represents ks in foreign words.

g

[U+0B95 TAMIL LETTER KA] when:
between vowels, eg. பாகு pāku pɑːgʉ Baku
after a nasal, eg. அங்கே ạŋ͓kē əŋgeˑ there
Not found word-initially.

t͡ʃ

[U+0B9A TAMIL LETTER CA] when:
initial, eg.
geminated, eg. பேச்சு pēc͓cu peːttʃʉ talk
following a stop consonant, eg. கட்சி kʈ͓ci kəʈtʃɪˑ party.

d͡ʒ

[U+0B9A TAMIL LETTER CA] when:

following a nasal, eg. இஞ்சி ịɲ͓ci ʲɪɲd͡ʒɪˑ ginger

[U+0B9C TAMIL LETTER JA] (grantha consonant), eg. ஜானகி ʤānki d͡ʒɑːnəgɪˑ Janaki.

f

ஃப [U+0B83 TAMIL SIGN VISARGA + U+0BAA TAMIL LETTER PA] for foreign words, eg. ஃபீசு ˑpīcu fiːsɯ fees.

[U+0BAA TAMIL LETTER PA] for some foreign words. 

β

[U+0BAA TAMIL LETTER PA]
between vowels.

ð

[U+0BA4 TAMIL LETTER TA]
between vowels, eg. நல்லது ṅl͓ltu nəlləd̪ʉ good.

s

[U+0B9A TAMIL LETTER CA] when:
between vowels, eg. பாசம் pācm͓ pɑːsəm affection
at the beginning of some words, eg. சின்ன cin͓n sɪnnə small.

[U+0BB8 TAMIL LETTER SA] (grantha consonant), eg. ஸந்தியா sṅ͓tiyā sən̪d̪ɪjɑː Sandya.

z

[U+0B9C TAMIL LETTER JA] for borrowed words.

ஃஜ [U+0B83 TAMIL SIGN VISARGA + U+0B9C TAMIL LETTER JA] for foreign words, eg. ஃஜிரொக்ஸ் ˑziɾok͓s͓ Xerox.

ʃ

[U+0BB6 TAMIL LETTER SHA] (grantha consonant), eg. ஶிவா ʃiʋā ʃɪʋɑː Shiva.

ʂ

[U+0BB7 TAMIL LETTER SSA] (grantha consonant), eg. உஷா ụʂā ʷʊʂɑː Usha.

Also, as the second part of க்ஷ [U+0B95 TAMIL LETTER KA + U+0BCD TAMIL SIGN VIRAMA + U+0BB7 TAMIL LETTER SSA], eg. காமாக்ஷி kāmāk͓ʂi kɑːmɑːkʂɪˑ Kamakshi.

x

[U+0B95 TAMIL LETTER KA] between vowels.

ஃக [U+0B83 TAMIL SIGN VISARGA + U+0B95 TAMIL LETTER KA] for foreign words, eg. செங்கிஸ் ஃகான் ceŋ͓kis͓ ˑkɑ̄n͓ Genghis Khan.

ɣ

[U+0B95 TAMIL LETTER KA] between vowels.

h

[U+0BB9 TAMIL LETTER HA] (grantha consonant), eg. ஹரி hɾi ɦəɾɪˑ Hari.

[U+0B95 TAMIL LETTER KA] between vowels only in colloquial speech.

m

[U+0BAE TAMIL LETTER MA], when:
initial, eg. மலை mlaʲ mələj mountain,
geminated, eg. அம்மாள் ạm͓māɭ͓ əmmɑːɭ mummy,
in a cluster, eg. தம்பி tm͓pi t̪əmbɪˑ brother,
finally, eg. வண்ணம் ʋɳ͓ɳm͓ ʋəɳɳəm colour.

n̪

[U+0BA8 TAMIL LETTER NA], eg. பந்து pṅ͓tu pən̪d̪ʉ ball.

ɳ

[U+0BA3 TAMIL LETTER NNA], when:
in a cluster, eg. வண்டி,
geminated, eg. வண்ணம் ʋɳ͓ɳm͓ ʋəɳɳəm colour.

ɲ

[U+0B9E TAMIL LETTER NYA] when:
initial, eg. ஞானம் ɲānm͓ ɲɑːnəm wisdom,
geminated, eg. அஞ்ஞானம் ạɲ͓ɲānm͓ əɲɲɑːnəm ignorance,
in a cluster, eg. இஞ்சி ịɲ͓ci ʲɪɲd͡ʒɪˑ ginger.

ŋ

[U+0B99 TAMIL LETTER NGA] when:

in a cluster, eg. அங்கே ạŋ͓kē əŋgeˑ there,
geminated, eg. அங்ஙனம் ạŋ͓ŋnm͓ əŋŋənəm there.

ʋ

[U+0BB5 TAMIL LETTER VA] occurs:
initially, eg. வழி ʋɻi ʋəɻɪˑ path
geminated, eg. அவ்வழி ạʋ͓ʋɻi əʋʋəɻɪˑ that path

r

[U+0BB1 TAMIL LETTER RRA], eg. பற்றினார் pət̺t̺ʳɪnɑːr, பன்றி pənd̺ʳɪˑ.

ɾ

[U+0BB0 TAMIL LETTER RA], eg. கரி kɾi kəɾɪˑ charcoal, வந்தார் ʋṅ͓tāɾ͓ ʋən̪d̪ɑːr came. Compare கறி kri kəɾɪˑ curry.

ɻ

[U+0BB4 TAMIL LETTER LLLA], occurs:
between vowels, eg. வழி ʋɻi ʋəɻɪˑ path
geminated, eg.

ɽ

[U+0BB4 TAMIL LETTER LLLA] between vowels.

l

[U+0BB2 TAMIL LETTER LA] when:
between vowels, eg. வலி ʋli ʋəlɪˑ pain
geminated, eg. வல்லி ʋl͓li ʋəllɪˑ Valli;
final, eg. வந்தால் ʋṅ͓tāl͓ ʋən̪d̪ɑːl if

ɭ

[U+0BB3 TAMIL LETTER LLA] when:
between vowels, eg. வளி ʋɭi ʋəɭɪˑ air
geminated, eg. வள்ளி ʋɭ͓ɭi ʋəɭɭɪˑ Valli
final, eg. வந்தாள் ʋṅ͓tāɭ͓ ʋən̪d̪ɑːɭ she came
Doesn't occur in word-initial position.

j

[U+0BAF TAMIL LETTER YA] when:
initial, eg. யானை yānaʲ jɑːnəj elephant
geminated, eg. உய்ய ụy͓y ʷʊjjə critical

Sources: Wikipedia, Anunaadam, and Google Translate.

Named character sequences

Tamil speakers tend to think of grapheme clusters containing consonant+vowel as a single entity. In some cases, people want to process Tamil using these grapheme clusters as a single unit.

To assist with this Unicode provides named character sequences that apply standardised names to whole syllables. These can then be mapped to the private use area for applications wanting to work with Tamil in this way. For normal Tamil data interchange, however, the standard codepoints should be used.u
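As an illustration of the private-use approach only, the sketch below builds its own table of the 18 basic consonants combined with the inherent vowel, the eleven vowel-signs and the puḷḷi, and gives each combination a code point counting up from U+E000. That starting point and ordering are hypothetical choices for this example, not a published mapping, and the function names are illustrative; real applications would need an agreed convention, and text that already contains private-use characters in that range would be misdecoded.

```python
import unicodedata
from typing import Dict, List

# The 18 basic consonants, in the order listed on this page.
CONSONANTS = [chr(cp) for cp in (
    0x0BAA, 0x0BA4, 0x0B9A, 0x0B9F, 0x0B95, 0x0BAE, 0x0BA8, 0x0BA9, 0x0BA3,
    0x0B9E, 0x0B99, 0x0BB5, 0x0BB0, 0x0BB1, 0x0BB4, 0x0BB2, 0x0BB3, 0x0BAF,
)]
# Inherent vowel (no sign), the eleven vowel-signs, and the puḷḷi.
ENDINGS = ["", "\u0BBE", "\u0BBF", "\u0BC0", "\u0BC1", "\u0BC2", "\u0BC6",
           "\u0BC7", "\u0BC8", "\u0BCA", "\u0BCB", "\u0BCC", "\u0BCD"]
PUA_START = 0xE000  # hypothetical starting point, not a published assignment

TO_PUA: Dict[str, str] = {
    consonant + ending: chr(PUA_START + i)
    for i, (consonant, ending) in enumerate(
        (c, e) for c in CONSONANTS for e in ENDINGS)
}
FROM_PUA: Dict[str, str] = {pua: syllable for syllable, pua in TO_PUA.items()}


def split_units(text: str) -> List[str]:
    """Group each base character with the combining marks stored after it."""
    units: List[str] = []
    for ch in text:
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch
        else:
            units.append(ch)
    return units


def encode(text: str) -> str:
    """Replace each known consonant syllable with one private-use character."""
    return "".join(TO_PUA.get(unit, unit) for unit in split_units(text))


def decode(text: str) -> str:
    """Map private-use characters back to the standard Unicode sequences."""
    return "".join(FROM_PUA.get(ch, ch) for ch in text)
```

For interchange, the decoded form (standard code points) is what should be stored and sent.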

Symbol

OM is a religious concept found in three major religions that originated in India: Hinduism, Jainism and Buddhism. [U+0BD0 TAMIL OM] is widely used in Hindu religious texts, in temple publications, and on illuminated shop signs.

Tamil supplement

Unicode version 12 added the Tamil Supplement block. This contains numbers, symbols, and one punctuation mark that are not normally used in modern Tamil, although a few are sometimes used in traditional formats, such as wedding invitations.u

The number characters are for fractions, and the symbols include measures of grain, old currency symbols, symbols of weight, length, and area, agricultural symbols, clerical symbols, and other symbols and abbreviations. The punctuation marks the end of a text.

For more information see Sharmass.

Numbers, dates, currency, etc

There is a set of Tamil numbers, but modern Tamil text uses Western digits.

The CLDR standard-decimal pattern is #,##,##0.###. The standard-percent pattern is #,##,##0%.c

An interesting feature of large numbers written in India is that, after the rightmost group of three digits, digits are grouped in twos between commas (even when using European digits).

20,00,000

Two million, written with Indian comma separators.
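The #,##,##0 grouping can be reproduced in a few lines. A minimal Python sketch for integers, assuming ASCII digits and a comma separator; the function name is illustrative, and a production system would normally take these patterns from CLDR data through an internationalisation library rather than hand-code them.

```python
def format_indian(n: int) -> str:
    """Format an integer with Indian digit grouping (#,##,##0)."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]  # the last three digits form one group
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])  # every further group has two digits
        head = head[:-2]
    groups.insert(0, head)
    return sign + ",".join(groups + [tail])


if __name__ == "__main__":
    print(format_indian(2000000))  # 20,00,000
```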

Archaic digits & symbols

The Tamil digits can be used as a standard decimal counting system, but older versions of the Tamil system had no zero and inserted characters to indicate tens, hundreds, and thousands.

௦␣௧␣௨␣௩␣௪␣௫␣௬␣௭␣௮␣௯␣௰␣௱␣௲

For a description of the algorithm, see Predefined Counter Styles and Unicode Technical Note #21. You can experiment with this using the Counter styles converter tool (select Tamil, Ancient).

௲௨௱௩௰௪

The number 1,234 using the old Tamil numbering system.
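The additive scheme can be sketched in a few lines. This sketch assumes, as the 1,234 example suggests, that a multiplier of one is not written before ௰, ௱ or ௲, and it only covers 1 to 9999; it is not a transcription of the algorithm in the documents referenced above, which should be consulted for the full rules. The function name is illustrative.

```python
DIGITS = [chr(0x0BE6 + i) for i in range(10)]   # TAMIL DIGIT ZERO .. NINE
TEN, HUNDRED, THOUSAND = "\u0BF0", "\u0BF1", "\u0BF2"


def to_ancient_tamil(n: int) -> str:
    """Write 1-9999 with the ten, hundred and thousand signs and no zero."""
    if not 1 <= n <= 9999:
        raise ValueError("this sketch only handles 1 to 9999")
    parts = []
    for value, sign in ((1000, THOUSAND), (100, HUNDRED), (10, TEN)):
        count, n = divmod(n, value)
        if count == 1:
            parts.append(sign)  # assumption: a multiplier of one is omitted
        elif count > 1:
            parts.append(DIGITS[count] + sign)
    if n:
        parts.append(DIGITS[n])  # units; a zero units digit is simply absent
    return "".join(parts)


if __name__ == "__main__":
    print(to_ancient_tamil(1234))  # ௲௨௱௩௰௪
```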

The following signs were formerly used with numbers.

௺␣௶␣௷␣௳␣௸

Currency

₹␣௹

The CLDR standard format for currency is ¤ #,##,##0.00, and the Indian currency symbol is [U+20B9 INDIAN RUPEE SIGN]c. The latter sign was introduced by the Indian government in 2010.

[U+0BF9 TAMIL RUPEE SIGN] is the Tamil rupee sign.

௹. 6,000

The Tamil rupee sign used to indicate a sum of 6,000 rupees.

The Indian rupee sign is distinguished from [U+20A8 RUPEE SIGN], which is an older symbol not formally tied to any particular currency.u

Dates

The following signs were formerly used for dates.

௳␣௴␣௵␣ள

Text direction

The Tamil script is written horizontally, left to right.
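Default bidi classes can be read from the Unicode Character Database. A small sketch using Python's `unicodedata.bidirectional` (the helper name is illustrative): Tamil letters and spacing vowel-signs report L (strong left-to-right), while non-spacing marks such as the puḷḷi report NSM and take their direction from the preceding character.

```python
import unicodedata


def bidi_classes(text: str):
    """Return (code point, default Bidi_Class) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]


if __name__ == "__main__":
    # தமிழ்: TA, MA, VOWEL SIGN I, LLLA, VIRAMA
    for cp, cls in bidi_classes("\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"):
        print(cp, cls)
```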


Glyph shaping & positioning

This section brings together information about the following topics: writing styles; cursive text; context-based shaping; context-based positioning; baselines, line height, etc.; font styles; case & other character transforms.

You can experiment with examples using the Tamil character app.

Tamil printed text is not cursive, and has no special requirements for baseline alignment between mixed scripts or in general.

The orthography has no case distinction, and no special transforms are needed to convert between characters.

Writing styles

In 1978, in an attempt to simplify the script, the government of Tamil Nadu proposed the reform of certain letters and syllables, shown in the diagram below.

Diagram of reforms

Proposed reforms of 1978f.

The fonts used in this article apply the simplifications introduced for most of the characters in that reform, apart from the first two shown in the diagram, which have not been widely adopted.


Context-based shaping

Although modern Tamil uses fewer conjunct ligatures than most other Indic scripts, a Tamil font still needs many ligatures, mostly for combinations of base consonant and vowel-sign. See Vowel ligatures.

Shaping RA

In certain contexts, [U+0BB0 TAMIL LETTER RA] may look identical to [U+0BBE TAMIL VOWEL SIGN AA] in some fonts, or may have a short tail in others. These letters looked the same in old manuscripts, especially palm leaves, and in early printed materials. Father Beschi introduced the stroke to differentiate the two, but only where no vowel-sign or pulli was attached to the letter itself, so ரா, ரெ, ரோ, etc. carried a stroke, but ர், ரி and ரீ did not. This approach is still followed, particularly in India, but in Malaysia and Singapore a government regulation requires the form with a bottom stroke in all contexts. People are comfortable with both forms and will hardly notice the difference.m

The newspaper clipping below shows a text that distinguishes between the two variant glyphs. Compare the items circled in red. The orange circle indicates a vowel-sign that would be ambiguous if one was not expecting a tail for the consonant.

Newspaper clipping

Variant forms of [U+0BB0 TAMIL LETTER RA].

Context-based positioning

Observation: Tamil consonants tend to be all the same height, so the vertical position of the pulli tends to be the same. Apart from the vowel-signs and pulli, Tamil doesn't really have combining characters.

Observation: The only time Tamil has multiple combining marks attached to the same base character is when decomposed multi-part vowel-signs are used; see Circumgraphs.

Font styles

Italics and bold are not traditional features of Tamil text.i

Some fonts have upright glyphs, whereas others have slightly slanted glyphs.

Observation: Panels of text in a Tamil newspaper that uses fonts that are more slanted than normal. Could this be an italic font face? Note that all the body text of the panel uses that font. There appear to be no instances where italic-looking fonts are applied to inline text. Other fonts used for the body text in other articles tended to also have a slight lean, though not as much. The verticals in headings tend to be upright.

Punctuation & inline features

Grapheme boundaries

With 3 exceptions, modern Tamil doesn't use conjuncts to represent consonant clusters. Instead, it uses a visible pulli over the characters without an inherent vowel. The basic typographic unit is therefore equivalent to a Unicode grapheme cluster, ie. a base character followed by combining character(s).

This also means that, in a consonant cluster, a prescript vowel is displayed immediately to the left of the consonant it follows, rather than at the beginning of the cluster (see the sequence ற்றை r͓rʌʲ in the example below).

யாற்றையும்

Grapheme boundaries in a word with consonant clusters.

Note, however, that typographic units used for justification and word stretching are based on glyph boundaries, rather than grapheme clusters. See justification.

Conjuncts. The exceptions are the sequences க்ஷ k͓ʂ, and ஶ்ரீ ʃ͓ɾī / ஸ்ரீ s͓ɾī (the latter two being alternate ways of writing the same sound). These sequences should not be broken during segmentation. Note that the 'shri' sequence must include the ī vowel-sign [U+0BC0 TAMIL VOWEL SIGN II] to produce a conjunct. With a different vowel, the sequence of characters is displayed using a visible pulli, eg. இஶ்ரேல் ịʃ͓ɾēl͓ Israel.

Correct segmentation of these conjunct-forming sequences is not supported by default Unicode grapheme clusters (which split them in two), and requires the application of tailored rules.
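As a rough illustration of such tailoring, the Python sketch below approximates default grapheme clusters for Tamil as a base character plus any following combining marks (general category Mn or Mc), then merges the clusters that make up க்ஷ and ஶ்ரீ/ஸ்ரீ. It is a simplification under stated assumptions, not a full UAX #29 implementation (it ignores ZWJ/ZWNJ, for instance), and the names `default_clusters` and `tailored_clusters` are hypothetical.

```python
import unicodedata

# Hypothetical constants for the conjunct-forming patterns.
KSSA_HEAD = "\u0b95\u0bcd"                     # க் (KA + pulli)
KSSA_TAIL = "\u0bb7"                           # ஷ (SSA), possibly with a vowel-sign
SHRI_HEADS = {"\u0bb6\u0bcd", "\u0bb8\u0bcd"}  # ஶ், ஸ்
SHRI_TAIL = "\u0bb0\u0bc0"                     # ரீ (RA + VOWEL SIGN II) only


def default_clusters(text: str) -> list[str]:
    """Approximate default grapheme clusters: a base character plus any
    following combining marks (general category Mn or Mc)."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def tailored_clusters(text: str) -> list[str]:
    """Default clusters, with க்ஷ and ஶ்ரீ/ஸ்ரீ merged into single units."""
    merged: list[str] = []
    for cluster in default_clusters(text):
        if merged and (
            (merged[-1] == KSSA_HEAD and cluster.startswith(KSSA_TAIL))
            or (merged[-1] in SHRI_HEADS and cluster == SHRI_TAIL)
        ):
            merged[-1] += cluster
        else:
            merged.append(cluster)
    return merged
```

For example, `default_clusters("க்ஷ")` yields two clusters, க் and ஷ, whereas `tailored_clusters` keeps the sequence together. `tailored_clusters("இஶ்ரேல்")` still returns four units, because ரே does not carry the ī vowel-sign.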

Word boundaries

Words are separated by spaces.

Phrase & section boundaries

,␣;␣:␣.␣?␣!␣।␣॥

Western punctuation is generally used. There are no punctuation marks in the Tamil Unicode block.

For separators at the sentence level and below, the following are used in Tamil language text.

phrase

, [U+002C COMMA]

; [U+003B SEMICOLON]

: [U+003A COLON]

sentence

. [U+002E FULL STOP]

? [U+003F QUESTION MARK]

! [U+0021 EXCLAMATION MARK]

। [U+0964 DEVANAGARI DANDA] (occasionally)

section

॥ [U+0965 DEVANAGARI DOUBLE DANDA] (occasionally)

The danda and double danda are sometimes used. They are punctuation marks encoded in the Devanagari block that are also used with several other scripts.u

Parentheses & brackets

(␣)
standard

start: ( [U+0028 LEFT PARENTHESIS]

end: ) [U+0029 RIGHT PARENTHESIS]

Quotations

“␣”␣‘␣’
default

start: “ [U+201C LEFT DOUBLE QUOTATION MARK]

end: ” [U+201D RIGHT DOUBLE QUOTATION MARK]

nested

start: ‘ [U+2018 LEFT SINGLE QUOTATION MARK]

end: ’ [U+2019 RIGHT SINGLE QUOTATION MARK]

The default quote marks for Tamil are [U+201C LEFT DOUBLE QUOTATION MARK] at the start, and [U+201D RIGHT DOUBLE QUOTATION MARK] at the end.c

When a quotation is nested inside another, the quote marks are [U+2018 LEFT SINGLE QUOTATION MARK] and [U+2019 RIGHT SINGLE QUOTATION MARK].c
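A small Python sketch of choosing quote marks by nesting depth, following the two levels described above. The function name `quote` is hypothetical, and alternating back to double marks at deeper levels is an assumption, not a documented Tamil convention.

```python
# Quote pairs by nesting level: level 0 uses double marks, level 1 single marks.
QUOTE_PAIRS = (("\u201c", "\u201d"), ("\u2018", "\u2019"))


def quote(text: str, depth: int = 0) -> str:
    """Wrap text in the quote marks for the given nesting depth.
    Alternating beyond depth 1 is an assumption."""
    start, end = QUOTE_PAIRS[depth % 2]
    return start + text + end
```

For example, `quote("outer " + quote("inner", 1))` produces “outer ‘inner’”.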

Emphasis

tbd

Underlining is not a traditional feature of Tamil text.i,#text_decoration

One way to express emphasis is to elongate vowel sounds using extra independent vowels, eg. compare பெரீய peɾīy (perīya) really big with பெரீஇஇய peɾīịịy (perīiiya) reeeeally big.

Abbreviation, ellipsis & repetition

tbd

Inline notes & annotations

tbd

Other inline ranges

tbd

Other punctuation

tbd

Line & paragraph layout

Line breaking & hyphenation

The primary break points for Tamil are word boundaries. However, Tamil is an agglutinative language and its words can be long. This can lead to large gaps during justification, and sometimes to words that are longer than the available column width, so it is desirable to also hyphenate words.


Hyphenation

Because of the length of Tamil words, hyphenation is useful during layout, but it isn't easy to do because of the complexity of Tamil words.

Hyphenation must take place at syllable boundaries. A hyphen is not usually added at the end of the line when a word is hyphenated.st


Text from the Tamil newspaper, Daily Thanthi, showing hyphenated words with yellow highlighting.

Prabhakarp proposes rules that single characters should be avoided at line start or end, especially characters with nukta at line start, and that a word of 5 characters including 3 consecutive consonants cannot be split. He argues that, because Tamil is highly inflectional, morphological or pattern-based approaches are needed rather than simple dictionary lookup.
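The Python sketch below approximates syllable-based hyphenation points. It rests on our own simplifying assumptions, not on Prabhakar's method: a break is allowed before any grapheme cluster that does not end in a pulli (since a consonant with pulli closes the preceding syllable); breaks inside க்ஷ and ஶ்ரீ/ஸ்ரீ are excluded; and "avoid single characters at line start/end" is read as requiring at least `min_clusters` grapheme clusters on each side. It does not implement the three-consonant rule or any morphological analysis, and all names are hypothetical.

```python
import unicodedata

PULLI = "\u0bcd"  # TAMIL SIGN VIRAMA


def clusters(word: str) -> list[str]:
    """Base character plus following combining marks (Mn/Mc)."""
    out: list[str] = []
    for ch in word:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def inside_conjunct(prev: str, cur: str) -> bool:
    """True if the boundary between prev and cur falls inside க்ஷ or ஶ்ரீ/ஸ்ரீ."""
    return (prev == "\u0b95\u0bcd" and cur.startswith("\u0bb7")) or (
        prev in ("\u0bb6\u0bcd", "\u0bb8\u0bcd") and cur == "\u0bb0\u0bc0"
    )


def hyphenation_points(word: str, min_clusters: int = 2) -> list[int]:
    """Code-point offsets at which the word could be split across lines."""
    cs = clusters(word)
    points: list[int] = []
    offset = 0
    for i, cur in enumerate(cs):
        if (
            min_clusters <= i <= len(cs) - min_clusters
            and not cur.endswith(PULLI)  # a pulli consonant belongs to the previous syllable
            and not inside_conjunct(cs[i - 1], cur)
        ):
            points.append(offset)
        offset += len(cur)
    return points
```

For பெரும்பான்மை this yields the single offset 6, ie. பெரும்|பான்மை. Following the convention above, no visible hyphen needs to be drawn at the break.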

Line-edge rules

According to ilreq, a line should not start with any of the following characters.

,␣.␣:␣;␣।␣॥␣)␣]␣}␣>␣+␣*␣/␣=␣_␣|␣~␣%

Line breaking should also not move a danda or double danda to the beginning of a new line, even if they are preceded by a space character. These punctuation characters should behave in the same way as a full stop does in English text.

Presumably, similar rules apply for the end of a line.
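A minimal Python sketch of the line-start rule, assuming for simplicity that the only break opportunities are the positions just after a run of spaces; a real implementation would start from UAX #14 line-break properties. The function names are hypothetical. Because a space before a danda creates such an opportunity, filtering by the first character of the new line also keeps the danda with the preceding word.

```python
# Characters that, according to ilreq, should not start a line.
NO_LINE_START = set(",.:;\u0964\u0965)]}>+*/=_|~%")


def break_opportunities(text: str) -> list[int]:
    """Simplified: a new line may start at the first non-space character after a space."""
    return [i for i in range(1, len(text)) if text[i] != " " and text[i - 1] == " "]


def allowed_breaks(text: str) -> list[int]:
    """Drop opportunities whose new line would begin with a prohibited character."""
    return [i for i in break_opportunities(text) if text[i] not in NO_LINE_START]
```

For example, in `"a , b ।"` only the break before `b` survives; the ones before the comma and the danda are removed.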

Text alignment & justification

Justification

Tamil usually adjusts inter-word spacing in order to justify text on a line.

Justification can be helped significantly by hyphenating the text, although this isn't easy for Tamil (see hyphenation).

In the absence of hyphenation, one workaround is to adjust inter-character spacing, especially when only a single word fits on a line.g58,#issuecomment-561995889 fig_justification_in_newsprint shows examples of inter-character space being compacted and stretched to minimise inter-word gaps.


Newspaper clipping showing inter-character compaction and stretching.

Justification in narrow columns. Due to the length of Tamil words, justification can sometimes lead to large gaps between words in narrow columns (see fig_justification_gaps).


A narrow column in newsprint with large gaps between justified words on a line.

If only a single word fits on a line, it is common practice to stretch the inter-character spacing so that the word fills the line, eg. the word பெரும்பான்மை peɾum͓pɑ̄n͓mʌʲ (perumpāṉmai) majority in fig_justification_one_word.


Examples of a single word stretched to fill a line.

Observation: In the sources examined, when inter-character spacing is stretched in this way, equal amounts of space are added not between characters but between unconnected glyphs. This includes the space between vowel-sign glyphs and their base consonant. (See, for example, the separation of and in fig_justification_one_word.) It also applies to combinations such as ஜு ʤu (see fig_justification_in_newsprint), but vowel-signs that are ligated with their base, such as கி ki in கிராமப்புறங்கலில், are not separated. The pulli also remains above the base.

Observation: Where the word being stretched across a whole line is the last word in the sentence, it appears that the sentence-final punctuation also participates in the letter-spacing (see fig_justification_full_stop).


Full stop participating in the letter-spacing when a single word is stretched to fill a line.
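The arithmetic behind these observations can be sketched in Python: the leftover (or excess) line width is divided equally among the gaps between glyph units, with any sentence-final punctuation treated as one more unit. The sketch assumes the caller has already split the word into unconnected glyph units and measured their advance widths; which units are unconnected depends on the font, so `stretch_positions` and its inputs are hypothetical.

```python
def stretch_positions(advances: list[float], line_width: float) -> list[float]:
    """Return the x offset of each glyph unit when the difference between
    line_width and the total advance is spread equally over the gaps.
    A negative difference compacts the units instead of stretching them."""
    if not advances:
        return []
    slack = line_width - sum(advances)
    gaps = len(advances) - 1
    extra = slack / gaps if gaps else 0.0  # a lone unit is simply left-aligned
    positions: list[float] = []
    x = 0.0
    for adv in advances:
        positions.append(x)
        x += adv + extra
    return positions
```

For instance, three units of width 10 on a line of width 50 are placed at 0, 20 and 40, so the last unit ends flush with the line edge.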

Paragraph indents

Paragraph features are the same as in English. Paragraphs can start with or without indents.g26

Letter spacing

tbd

 

Counters, lists, etc.

You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.

Tamil commonly uses Western numbering systems for lists. However, Tamil also has a native numeric style and an archaic additive style.

Numeric

The tamil numeric style is decimal-based and uses the digits below.

௦␣௧␣௨␣௩␣௪␣௫␣௬␣௭␣௮␣௯

Examples:

௧␣௨␣௩␣௪␣௧௧␣௨௨␣௩௩␣௪௪␣௧௧௧␣௨௨௨␣௩௩௩␣௪௪௪
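Because the tamil style is positional, generating a counter is a digit-for-digit substitution of ASCII digits with U+0BE6 to U+0BEF. A minimal Python sketch; the function name is hypothetical.

```python
# Tamil digits ௦..௯ are U+0BE6..U+0BEF.
TAMIL_DIGITS = "".join(chr(0x0BE6 + i) for i in range(10))
TO_TAMIL = str.maketrans("0123456789", TAMIL_DIGITS)


def tamil_numeric(n: int) -> str:
    """Render an integer in the decimal-based tamil counter style."""
    return str(n).translate(TO_TAMIL)
```

For example, `tamil_numeric(11)` returns ௧௧, matching the examples above.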

Additive

The ancient-tamil additive style uses the numbers below. It is specified for a range between 1 and 9,999.

௯௲␣௮௲␣௭௲␣௬௲␣௫௲␣௪௲␣௩௲␣௨௲␣௲␣௯௱␣௮௱␣௭௱␣௬௱␣௫௱␣௪௱␣௩௱␣௨௱␣௱␣௯௰␣௮௰␣௭௰␣௬௰␣௫௰␣௪௰␣௩௰␣௨௰␣௰␣௯␣௮␣௭␣௬␣௫␣௪␣௩␣௨␣௧

Examples:

௧␣௨␣௩␣௪␣௰௧␣௨௰௨␣௩௰௩␣௪௰௪␣௱௰௧␣௨௱௨௰௨␣௩௱௩௰௩␣௪௱௪௰௪
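The additive style can be generated greedily from the symbol list above: for each of thousands, hundreds and tens, emit the place sign (௲, ௱, ௰) alone when the digit is 1, or preceded by the digit when it is 2 to 9, then emit the units digit. This is equivalent to applying the additive symbols from largest to smallest. A minimal Python sketch; the function name is hypothetical, and raising an error outside 1 to 9,999 stands in for whatever fallback style an implementation would use.

```python
DIGITS = [chr(0x0BE6 + i) for i in range(10)]  # ௦..௯
TEN, HUNDRED, THOUSAND = "\u0bf0", "\u0bf1", "\u0bf2"  # ௰ ௱ ௲


def ancient_tamil(n: int) -> str:
    """Render n (1..9999) in the ancient-tamil additive style."""
    if not 1 <= n <= 9999:
        raise ValueError("ancient-tamil is only defined for 1..9999")
    out: list[str] = []
    for place, sign in ((1000, THOUSAND), (100, HUNDRED), (10, TEN)):
        digit, n = divmod(n, place)
        if digit == 1:
            out.append(sign)  # eg. 100 is ௱, not ௧௱
        elif digit > 1:
            out.append(DIGITS[digit] + sign)
    if n:
        out.append(DIGITS[n])
    return "".join(out)
```

For example, `ancient_tamil(222)` returns ௨௱௨௰௨ and `ancient_tamil(111)` returns ௱௰௧, as in the examples above.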

Alphabetic

Observation: Alphabetic counters are seen, but are not very common.g57 See an example. It is not clear whether the counters extend beyond the vowel range.

Styling initials

It is possible to find the first letter in a paragraph styled in a distinctive way – usually larger and dropping down from the top of the first line. Some rules for positioning south Indian scripts are proposed by [ilreq].

Initials should include not just the first character on the line, but also any associated combining characters. If the paragraph begins with one of the sequences க்ஷ k͓ʂ, ஶ்ரீ ʃ͓ɾī or ஸ்ரீ s͓ɾī, all of the characters making up the conjunct should be included in the styling. See an example of a highlighted syllable in fig_drop_caps.

Any punctuation such as opening quotes and opening parentheses should also be included in the initial styling.
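A minimal Python sketch of selecting the characters to style as the initial, following the rules above: any opening quotes or parentheses, then the first base character with its combining marks, extended to cover a whole க்ஷ or ஶ்ரீ/ஸ்ரீ conjunct. The function name and the exact set of opening punctuation are assumptions, and the layout itself (drop, raised or boxed) is out of scope.

```python
import unicodedata

# Opening punctuation to include with the initial (assumed set).
OPENING = {"\u201c", "\u2018", "(", "[", '"', "'"}
# Conjunct-forming sequences that must be styled as a unit.
CONJUNCTS = (
    "\u0b95\u0bcd\u0bb7",        # க்ஷ
    "\u0bb6\u0bcd\u0bb0\u0bc0",  # ஶ்ரீ
    "\u0bb8\u0bcd\u0bb0\u0bc0",  # ஸ்ரீ
)


def initial_span(paragraph: str) -> str:
    """Return the leading text of a paragraph to style as the initial."""
    i = 0
    while i < len(paragraph) and paragraph[i] in OPENING:
        i += 1
    end = min(i + 1, len(paragraph))  # default: one base character
    for conjunct in CONJUNCTS:
        if paragraph.startswith(conjunct, i):
            end = i + len(conjunct)
            break
    # Absorb combining marks (vowel-signs, pulli) attached to the last base.
    while end < len(paragraph) and unicodedata.category(paragraph[end]).startswith("M"):
        end += 1
    return paragraph[:end]
```

For example, a paragraph starting with பெரிய gets பெ as its initial, while one starting with க்ஷே keeps the whole conjunct plus its vowel-sign.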

Indian languages generally use the drop style or a boxed letter. Contour-filling is not needed for Indian text.i


Two example paragraphs showing dropped highlighted initials.

For the drop style, the alphabetic baseline of the highlighted letter(s) should match the bottom of the row that determines the size of the highlighted letter(s). In both examples in fig_drop_caps, the highlighted text is set to 3 lines in height. In the second example, the highlighted text descends below the baseline, so an extra line is cleared to accommodate it. The tall vowel-sign in the first example rises slightly higher than the normal character height, and slightly exceeds the height of the first line of text.

The exact positioning of the normal character height relative to the characters in the rest of the first line needs further research. The examples in fig_drop_caps show the default result for the Safari browser.

For the raised style of initial, the highlighted characters sit on the same baseline as the first line of the paragraph, but rise above it (see fig_raised_initial). The inter-line spacing needs to accommodate any descenders.


Example of a raised highlighted initial.

Another common approach in Indic text is to create a box around the enlarged letter(s), often with a background colour. In this case the box dimensions are associated with the other lines in the paragraph, and the highlighted letters float within the box.

Page & book layout

This section is for any features that are specific to Tamil and that relate to the following topics: general page layout & progression; grids & tables; notes, footnotes, etc; forms & user interaction; page numbering, running headers, etc.

Languages using the Tamil script

According to ScriptSource, the Tamil script is used for the following languages:

Online resources

  1. Universal Declaration of Human Rights - Tamil
  2. List of Newspapers in Chennai

References