Updated 16 June, 2018 • tags tamil, scriptnotes.
This page provides basic information about the Tamil script. It is not authoritative, peer-reviewed information – these are just notes I have gathered or copied from various places as I learned. For similar information related to other scripts, see the Script comparison table.
உறுப்புரை 1 மனிதப் பிறிவியினர் சகலரும் சுதந்திரமாகவே பிறக்கின்றனர்; அவர்கள் மதிப்பிலும், உரிமைகளிலும் சமமானவர்கள், அவர்கள் நியாயத்தையும் மனச்சாட்சியையும் இயற்பண்பாகப் பெற்றவர்கள். அவர்கள் ஒருவருடனொருவர் சகோதர உணர்வுப் பாங்கில் நடந்துகொள்ளல் வேண்டும்.
உறுப்புரை 2 இனம், நிறம், பால், மொழி, மதம், அரசியல் அல்லது வேறு அபிப்பிராயமுடைமை, தேசிய அல்லது சமூகத் தோற்றம், ஆதனம், பிறப்பு அல்லது பிற அந்தஸ்து என்பன போன்ற எத்தகைய வேறுபாடுமின்றி, இப்பிரகடனத்தில் தரப்பட்டுள்ள எல்லா உரிமைகளுக்கும் சுதந்திரங்களுக்கும் எல்லோரும் உரித்துடையவராவர். மேலும், எவரும் அவருக்குரித்துள்ள நாட்டின் அல்லது ஆள்புலத்தின் அரசியல், நியாயாதிக்க அல்லது நாட்டிடை அந்தஸ்தின் அடிப்படையில் — அது தனியாட்சி நாடாக, நம்பிக்கைப் பொறுப்பு நாடாக, தன்னாட்சியற்ற நாடாக அல்லது இறைமை வேறேதேனும் வகையில் மட்டப்படுத்தப்பட்ட நாடாக இருப்பினுஞ்சரி — வேறுபாடெதுவும் காட்டப்படுதலாகாது.
The Tamil script, also called tamiz ezuttu, is used for writing the Tamil language, a Dravidian language spoken by over 65,500,000 people in India, Sri Lanka, Singapore, Malaysia and Mauritius. Tamil is an official language in the south Indian state of Tamil Nadu as well as in Sri Lanka and Singapore. The script is derived from Brahmi, so it is related to many of the scripts used for writing Indian Indo-Aryan languages, although the Tamil language itself is unrelated to them.
The Tamil script (தமிழ் அரிச்சுவடி; Tamiḻ ariccuvaṭi; [t̪ɐmɨɻ ˈɐɾit͡ɕːuʋəɽi]) is an abugida used by Tamils and Tamil speakers in India, Sri Lanka, Malaysia, Singapore and elsewhere to write the Tamil language. It is also used to write the liturgical language Sanskrit, with additional consonants and diacritics not represented in the Tamil alphabet. Certain minority languages such as Saurashtra, Badaga, Irula, and Paniya are also written in the Tamil script.
The script is an abugida, ie. consonants carry an inherent vowel sound that is overridden using vowel signs. In Tamil, consonants carry an inherent vowel ʌ, usually written a. See the table to the right for a brief overview of features, taken from the Script Comparison Table.
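To make the abugida model concrete, here is a minimal Python sketch (standard library only) of how it maps onto Unicode: the consonant letter alone carries the inherent vowel, a vowel sign replaces it, and the puḷḷi (virama) suppresses it. The helper `describe` is a hypothetical name used only for illustration.

```python
import unicodedata

KA = "\u0B95"            # க TAMIL LETTER KA: read as ka (inherent vowel)
VOWEL_SIGN_I = "\u0BBF"  # ◌ி TAMIL VOWEL SIGN I: replaces the inherent vowel
VIRAMA = "\u0BCD"        # ◌் TAMIL SIGN VIRAMA (puḷḷi): removes the inherent vowel


def describe(text: str) -> list[str]:
    """Return the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


ka = KA                 # க  -> ka
ki = KA + VOWEL_SIGN_I  # கி -> ki
k = KA + VIRAMA         # க் -> k (no vowel)

for syllable in (ka, ki, k):
    print(syllable, describe(syllable))
```

Note that every form starts with the same consonant code point; only the following mark changes.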
There are fewer consonants than in other Indic scripts. Tamil has no aspirated consonants, and letters are allocated on a phonemic rather than a phonetic basis. This means that க, for example, may be pronounced as the allophones k, ɡ, x, ɣ, or h, depending on where it appears relative to other sounds in a word, but these variant pronunciations never distinguish one word from another.
Tamil is diglossic: the classic form is preferred for writing and public speaking, and is mostly standard across the Tamil-speaking regions; the colloquial, spoken form differs widely from the written.
Text runs from left to right.
In Unicode 10.0, the Tamil script characters are in the Tamil block (U+0B80–U+0BFF).
The following summarizes the characters used for languages associated with this script. The numbers in parentheses are for non-ASCII characters.
Tamil (35 letters, 13 marks, 13 punctuation, 25 infrequent; total 61+25)
For character-specific details see Tamil character notes.
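As a quick way to explore the block programmatically, the sketch below (hypothetical helper names, standard library only) tests whether a character falls in U+0B80–U+0BFF and lists the assigned characters. The exact count depends on the Unicode version bundled with your Python, which may be newer than 10.0, so it is printed rather than assumed.

```python
import unicodedata

TAMIL_BLOCK = range(0x0B80, 0x0C00)  # U+0B80..U+0BFF inclusive


def is_tamil_block(ch: str) -> bool:
    """True if the single character ch lies in the Tamil block."""
    return ord(ch) in TAMIL_BLOCK


# Unassigned code points have no name, so name(..., None) filters them out.
assigned = [chr(cp) for cp in TAMIL_BLOCK if unicodedata.name(chr(cp), None)]
print(len(assigned), "assigned characters; Unicode", unicodedata.unidata_version)
```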
The basic Tamil alphabet consists of the following consonants.
Because the core set of Tamil consonants is much smaller than that of most Indic scripts, Tamil adds 'grantha' letters to represent sounds found in Sanskrit and English; these complete the basic consonant set.
The last item in the list just above is actually a cluster of two consonants, but it is viewed as a single letter of the alphabet.
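Although a cluster such as க்ஷ (kʃa) is treated as one letter by users, Unicode stores it as three code points: KA, the virama, and SSA. A short check like the following (a sketch, standard library only) makes that visible, and shows why counting code points overcounts "letters" in the traditional sense.

```python
import unicodedata

KSSA = "\u0B95\u0BCD\u0BB7"  # க்ஷ = KA + VIRAMA + SSA

for ch in KSSA:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```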
For use in modern communication, Tamil presses ஃ [U+0B83 TAMIL SIGN VISARGA] (called āytam) into service to turn stops into fricatives; it is written before the stop letter.
Examples: ஃபீசு fiːsɯ fees, ஃஜிரொக்ஸ் ziroks Xerox, செங்கிஸ் ஃகான் ceṅkis khāṉ Genghis Khan.
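Since āytam is written before the stop it modifies, a text-processing tool can locate these loanword fricatives with a simple pattern. The sketch below is an assumption-laden illustration: `aytam_pairs` is a hypothetical helper, and it treats any consonant letter in U+0B95–U+0BB9 as a possible target.

```python
import re

AYTAM = "\u0B83"  # ஃ TAMIL SIGN VISARGA (āytam)
# Hypothetical pattern: āytam followed by one Tamil consonant letter.
AYTAM_PAIR = re.compile(AYTAM + "[\u0B95-\u0BB9]")


def aytam_pairs(text: str) -> list[str]:
    """Return every āytam + consonant pair found in text."""
    return AYTAM_PAIR.findall(text)


print(aytam_pairs("ஃபீசு"))
```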
Plosives are unvoiced if they occur word-initially or doubled. Elsewhere they are voiced, with a few becoming fricatives intervocalically. Nasals and approximants are always voiced.
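The basic voicing rule can be sketched in code. This is a deliberately simplified model, assuming only the rule stated above: a plosive letter (க ச ட த ப; ற is left out) is unvoiced word-initially or when doubled, and voiced elsewhere. Intervocalic fricatives, āytam, and loanword exceptions are ignored, and `plosive_voicing` is a hypothetical name.

```python
PLOSIVES = set("\u0B95\u0B9A\u0B9F\u0BA4\u0BAA")  # க ச ட த ப
VIRAMA = "\u0BCD"                                 # ◌் puḷḷi


def plosive_voicing(word: str) -> list[tuple[int, str, str]]:
    """Label each plosive letter in word as 'unvoiced' or 'voiced' using
    the simplified rule: unvoiced word-initially or doubled, else voiced."""
    result = []
    for i, ch in enumerate(word):
        if ch not in PLOSIVES:
            continue
        # First half of a doubled pair: letter + puḷḷi + same letter.
        first_of_pair = word[i + 1:i + 3] == VIRAMA + ch
        # Second half: same letter + puḷḷi immediately before it.
        second_of_pair = i >= 2 and word[i - 2:i] == ch + VIRAMA
        if i == 0 or first_of_pair or second_of_pair:
            result.append((i, ch, "unvoiced"))
        else:
            result.append((i, ch, "voiced"))
    return result
```

For பாடம், for example, the ட is labelled voiced, while both halves of the ட்ட in பட்டம் are labelled unvoiced.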
The consonants are classified into three categories: vallinam (hard consonants), mellinam (soft consonants, including all nasals), and idayinam (medium consonants), which are important for the rules of pronunciation.
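The three classes cover the 18 basic consonants, six in each. A simple lookup, sketched below with hypothetical names, is often all a pronunciation or sandhi tool needs; grantha letters deliberately fall outside all three groups here.

```python
from typing import Optional

VALLINAM = "\u0B95\u0B9A\u0B9F\u0BA4\u0BAA\u0BB1"  # க ச ட த ப ற (hard)
MELLINAM = "\u0B99\u0B9E\u0BA3\u0BA8\u0BAE\u0BA9"  # ங ஞ ண ந ம ன (soft, nasals)
IDAYINAM = "\u0BAF\u0BB0\u0BB2\u0BB5\u0BB4\u0BB3"  # ய ர ல வ ழ ள (medium)

CLASSES = {"vallinam": VALLINAM, "mellinam": MELLINAM, "idayinam": IDAYINAM}


def consonant_class(ch: str) -> Optional[str]:
    """Return the class name of a basic Tamil consonant, or None."""
    for name, letters in CLASSES.items():
        if ch and ch in letters:
            return name
    return None
```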
The mapping of consonants, in particular the plosives, to sounds is unusually varied for an Indic script. In the written form of Tamil, these pronunciation rules put the variant sounds of each consonant in complementary distribution. However, the rules break down to varying degrees for Sanskrit loanwords and in colloquial spoken Tamil (particularly in northern areas). For more, see Tamil phonology and Krishnamurthi pp. 23–28.
The Unicode Standard also describes a method of extension that uses superscript or subscript digits, particularly to represent transcriptions of languages such as Sanskrit and Saurashtra. The digits 1 to 4 represent unvoiced, unvoiced aspirated, voiced, and voiced aspirated, respectively, eg. ப¹ = pa, ப² = pha, ப³ = ba, and ப⁴ = bha Unicode 471. This isn't currently supported for Unicode text, because such digits would need to appear between a consonant and any following vowel sign.
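To see why the digit placement is a problem, note that a vowel sign is a combining mark that must attach to a preceding base letter. The sketch below (a hypothetical checker, standard library only) flags marks that do not follow a letter or another mark; with ² inserted between ப and ◌ா, the vowel sign is left detached, which renderers typically display with a dotted circle.

```python
import unicodedata


def detached_marks(text: str) -> list[int]:
    """Return indices of combining marks (category Mn or Mc) that do not
    directly follow a letter (L*) or another combining mark (M*)."""
    bad = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) in ("Mn", "Mc"):
            prev = unicodedata.category(text[i - 1]) if i > 0 else None
            if prev is None or prev[0] not in ("L", "M"):
                bad.append(i)
    return bad


print(detached_marks("\u0BAA\u0BBE"))          # பா: mark attaches normally
print(detached_marks("\u0BAA\u00B2\u0BBE"))    # ப²ா: mark is detached
```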
Unlike most other Indic scripts, which use conjunct glyphs, Tamil normally represents consonant clusters by placing a dot over each consonant that is not followed by a vowel. The dot is called puḷḷi (the Tamil virama), and is represented in Unicode by ◌் [U+0BCD TAMIL SIGN VIRAMA].
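Because the puḷḷi is an ordinary code point stored after its consonant, finding the vowel-less consonants in a cluster is a simple scan. This is a sketch with a hypothetical helper name; it assumes well-formed text where every virama follows a consonant.

```python
VIRAMA = "\u0BCD"  # ◌் TAMIL SIGN VIRAMA (puḷḷi)


def dead_consonants(word: str) -> list[str]:
    """Return the consonant letters in word that carry a puḷḷi."""
    return [word[i - 1] for i, ch in enumerate(word) if ch == VIRAMA and i > 0]


print(dead_consonants("இந்த"))  # the ந in ந்த has no vowel
```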
There are more conjunct forms in older versions of the Tamil script. The modern script has two common exceptions: க்ஷ kʃa and ஶ்ரீ ʃri.
There are independent and combining forms of all vowels, except the inherent vowel, which has no combining form.
The Tamil block has 12 independent vowels.
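The independent vowel letters occupy U+0B85–U+0B94, but a few code points in that range are unassigned. A minimal sketch that lists them by filtering on whether each code point has a name:

```python
import unicodedata

# Unassigned code points have no name, so name(..., "") returns "".
vowels = [chr(cp) for cp in range(0x0B85, 0x0B95) if unicodedata.name(chr(cp), "")]

for v in vowels:
    print(v, unicodedata.name(v))
```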
Independent vowel letters were formerly used at the beginning of metrical groups; now they are used at the beginning of a word, eg. இந்த inta this, and also word-internally to represent 'overlong' vowel sounds, eg. பெரீஇஇய periːiiya reeeeally big.
There are 11 corresponding vowel signs, used to modify the inherent ʌ vowel of a consonant.
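Combining one consonant with each of the 11 vowel signs yields a full row of 12 syllables (including the bare consonant with its inherent vowel). The sketch below builds such a row; `syllable_row` is a hypothetical helper, and the list of signs follows code point order.

```python
# The 11 Tamil vowel signs in code point order: ā i ī u ū e ē ai o ō au.
VOWEL_SIGNS = ["\u0BBE", "\u0BBF", "\u0BC0", "\u0BC1", "\u0BC2",
               "\u0BC6", "\u0BC7", "\u0BC8", "\u0BCA", "\u0BCB", "\u0BCC"]


def syllable_row(consonant: str) -> list[str]:
    """Return the consonant with its inherent vowel, followed by the
    consonant combined with each of the 11 vowel signs."""
    return [consonant] + [consonant + sign for sign in VOWEL_SIGNS]


print(" ".join(syllable_row("\u0B95")))  # the க row
```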
Some vowel signs are displayed before the consonant or consonant cluster, eg. மே mē May, and others are represented by glyphs on both sides of it, eg. ஜிபௌட்டி jipauṭṭi Djibouti.
The three two-part vowel signs can be written in two different ways: either as a single code point, or as an initial code point followed by either ◌ா [U+0BBE TAMIL VOWEL SIGN AA] or ◌ௗ [U+0BD7 TAMIL AU LENGTH MARK].
The single code point per vowel sign is the preferred form, and the form in common use for Tamil.
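The two spellings are canonically equivalent, so Unicode normalization converts between them: NFC produces the preferred single code point, and NFD produces the two-part sequence. A minimal sketch using Python's standard `unicodedata.normalize` (the wrapper names are hypothetical):

```python
import unicodedata


def to_preferred(text: str) -> str:
    """Compose two-part Tamil vowel signs into single code points (NFC)."""
    return unicodedata.normalize("NFC", text)


def to_two_part(text: str) -> str:
    """Split single-code-point two-part vowel signs into their parts (NFD)."""
    return unicodedata.normalize("NFD", text)


ko_two_part = "\u0B95\u0BC6\u0BBE"  # க + ◌ெ + ◌ா
ko_single = to_preferred(ko_two_part)  # க + ◌ொ [U+0BCA]
print([f"U+{ord(c):04X}" for c in ko_single])
```

Normalizing to NFC before comparing or searching Tamil strings avoids treating the two spellings as different text.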
Whichever approach you use, the vowel signs must come after the consonant or consonant cluster that they surround. In the case of multi-character vowel signs, the order is also important and should be as shown above.
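Storage order and display order therefore differ. In மே, for example, the vowel sign is drawn to the left of ம, but in memory it comes second. This short sketch prints the stored sequence:

```python
import unicodedata

me = "\u0BAE\u0BC7"  # மே: ◌ே is displayed first but stored after ம

for ch in me:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```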
Tamil speakers tend to think of grapheme clusters containing consonant+vowel as a single entity. In some cases, people want to process Tamil using these grapheme clusters as a single unit.
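For simple processing, consonant + vowel units can be approximated with a regular expression. The sketch below is a hypothetical, simplified segmenter, not the Unicode grapheme cluster algorithm (UAX #29): it groups a consonant with any following vowel signs, puḷḷi, or AU length mark, and treats every other character as its own unit. It splits க்ஷ into two units, unlike the way users perceive it.

```python
import re

# A consonant (U+0B95..U+0BB9) plus any following marks in U+0BBE..U+0BCD
# or U+0BD7, or else any single character (vowel letter, āytam, space, ...).
UNIT = re.compile("[\u0B95-\u0BB9][\u0BBE-\u0BCD\u0BD7]*|.", re.DOTALL)


def segment(text: str) -> list[str]:
    """Split Tamil text into simplified consonant+vowel units."""
    return UNIT.findall(text)


print(segment("தமிழ்"))
```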
To assist with this, Unicode provides named character sequences that assign standardised names to whole syllables. Applications that want to work with Tamil in this way can then map these to the Private Use Area. For normal Tamil data interchange, however, the standard code points should be used. Unicode 477
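Python's `unicodedata.lookup` resolves named character sequences (since Python 3.3), which gives a quick way to see what a standardised syllable name stands for. Any mapping to the Private Use Area is application-specific and is not shown here.

```python
import unicodedata

kaa = unicodedata.lookup("TAMIL SYLLABLE KAA")  # a syllable: KA + AA sign
k = unicodedata.lookup("TAMIL CONSONANT K")     # a dead consonant: KA + puḷḷi

print([f"U+{ord(c):04X}" for c in kaa])  # ['U+0B95', 'U+0BBE']
```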
Although modern Tamil uses fewer conjunct ligatures than most other Indic scripts, a Tamil font still needs many ligatures, mostly for combinations of base consonant and vowel sign.
Some vowel signs produce significantly different, commonly ligated, shapes when combined with a base consonant. The figure below shows a few of the shapes produced by a single vowel sign, ◌ு [U+0BC1 TAMIL VOWEL SIGN U], when combined with various base characters.
See The Unicode Standard, pp. 473–476, for a list and description of many more Tamil ligatures.
There is a set of Tamil digits, but modern Tamil text typically uses Western digits.
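Because the Tamil digits ௦–௯ (U+0BE6–U+0BEF) are Unicode decimal digits, software can convert between them and Western digits mechanically. A minimal sketch: `str.translate` maps Western digits to Tamil, and Python's built-in `int()` already accepts any Unicode decimal digits. The helper name is hypothetical.

```python
TAMIL_DIGITS = "\u0BE6\u0BE7\u0BE8\u0BE9\u0BEA\u0BEB\u0BEC\u0BED\u0BEE\u0BEF"  # ௦..௯
TO_TAMIL = str.maketrans("0123456789", TAMIL_DIGITS)


def to_tamil_digits(s: str) -> str:
    """Replace each Western digit in s with the matching Tamil digit."""
    return s.translate(TO_TAMIL)


n = int("\u0BE8\u0BE6\u0BE7\u0BEE")  # ௨௦௧௮ parses as a decimal number
print(n, to_tamil_digits(str(n)))
```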
The Tamil digits can be used as an ordinary decimal system, but older Tamil numbering had no zero and instead inserted signs for ten, hundred, and thousand. For a description of the algorithm, see Predefined Counter Styles and Unicode Technical Note #21. You can experiment with this using the Counter styles converter tool (select Tamil, Ancient).
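The older additive notation can be sketched as follows. This assumes the pattern of the CSS ancient-tamil counter style as I understand it: values 1 to 9999 only, a multiplier digit before ௲ (1000), ௱ (100), or ௰ (10) only when it is greater than 1, and no zero. Treat `ancient_tamil` as an illustrative helper, not a reference implementation.

```python
DIGITS = "\u0BE6\u0BE7\u0BE8\u0BE9\u0BEA\u0BEB\u0BEC\u0BED\u0BEE\u0BEF"  # ௦..௯
TEN, HUNDRED, THOUSAND = "\u0BF0", "\u0BF1", "\u0BF2"  # ௰ ௱ ௲


def ancient_tamil(n: int) -> str:
    """Render n (1..9999) in the additive, zero-less Tamil notation."""
    if not 1 <= n <= 9999:
        raise ValueError("only 1..9999 is supported in this sketch")
    out = []
    for value, sign in ((1000, THOUSAND), (100, HUNDRED), (10, TEN)):
        d, n = divmod(n, value)
        if d == 1:
            out.append(sign)              # 1000 is just ௲, not ௧௲
        elif d > 1:
            out.append(DIGITS[d] + sign)  # e.g. 2000 is ௨௲
    if n:
        out.append(DIGITS[n])             # units; zero is simply omitted
    return "".join(out)


print(ancient_tamil(2018))  # ௨௲ + ௰ + ௮
```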
The following signs are for use with numbers. Not sure how they work.
For currency, there is ௹ [U+0BF9 TAMIL RUPEE SIGN].
Tamil has the following signs. Not sure how they are used.
Two further characters are used as symbols in Tamil.
Om is a sacred syllable found in three major religions that originated in India, namely Hinduism, Jainism, and Buddhism. ௐ [U+0BD0 TAMIL OM] is widely used in Hindu religious texts, temple publications, and on illuminated shop signboards.
The Tamil script is written horizontally, left to right.
Words are separated by spaces.
Western punctuation is generally used.
For separators at the sentence level and below, the following are used in Tamil language text.
| Function | Character |
|---|---|
| phrase | , [U+002C COMMA] |
| phrase | ; [U+003B SEMICOLON] |
| phrase | : [U+003A COLON] |
| sentence | . [U+002E FULL STOP] |
| sentence | । [U+0964 DEVANAGARI DANDA] (occasionally) |
| question mark | ? [U+003F QUESTION MARK] |
| section | ॥ [U+0965 DEVANAGARI DOUBLE DANDA] (occasionally) |
The danda and double danda are sometimes used. They are punctuation marks in the Devanagari block that are shared by several other scripts. Unicode 477
Tamil is preferably wrapped at word boundaries. However, Tamil words can be long and leave large gaps when text is justified, so words may be hyphenated.[11]
Hyphenation must take place at syllable boundaries. These are defined by [ilreq], although it is not clear whether ilreq's syllable boundaries are correct for Tamil. When a word is hyphenated, a hyphen is added at the end of the line.[11]
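As a rough illustration of syllable-based hyphenation, the sketch below uses a simplified rule that may not match [ilreq]: a consonant with puḷḷi closes the preceding syllable, and any other unit starts a new one. Āytam and other edge cases are not handled, and `syllables` and `hyphenate` are hypothetical helpers.

```python
import re

VIRAMA = "\u0BCD"  # ◌் puḷḷi
# Simplified unit: consonant + following marks, or any single character.
UNIT = re.compile("[\u0B95-\u0BB9][\u0BBE-\u0BCD\u0BD7]*|.", re.DOTALL)


def syllables(word: str) -> list[str]:
    """Group units into syllables: a unit ending in puḷḷi joins the
    preceding syllable; any other unit starts a new syllable."""
    result: list[str] = []
    for unit in UNIT.findall(word):
        if result and unit.endswith(VIRAMA):
            result[-1] += unit
        else:
            result.append(unit)
    return result


def hyphenate(word: str, break_after: int) -> tuple[str, str]:
    """Split word after break_after syllables, adding a line-end hyphen."""
    syl = syllables(word)
    return "".join(syl[:break_after]) + "-", "".join(syl[break_after:])


print(syllables("பட்டம்"), hyphenate("பட்டம்", 1))
```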
According to [ilreq], a line should not start with any of the following characters.
Tamil is usually justified by adjusting inter-word spacing.
உறுப்புரை 7 எல்லோரும் சட்டத்தின் முன்னர் சமமானவர்கள். பாரபட்சம் எதுவுமின்றிச் சட்டத்தின் சமமான பாதுகாப்புக்கும் உரித்துடையவர்கள். இப்பிரகடனத்தை மீறிப் புரியப்பட்ட பாரபட்சம் எதற்கேனும் எதிராகவும் அத்தகைய பாரபட்சம் காட்டுவதற்கான தூண்டுதல் யாதொன்றிற்கும் எதிராகவும் எல்லோரும் சமமான பாதுகாப்புக்கு உரித்துடையவர்கள்.
The first letter of a paragraph is sometimes styled distinctively, usually larger and dropping down from the top of the first line. [ilreq] proposes some rules for positioning such initial letters in south Indian scripts.
Further information needed for this section includes: baselines, emphasis & highlighting, text decoration, abbreviations & ellipsis, hyphens & dashes, character transforms, quotations, line breaking, hyphenation, justification & alignment, notes & footnotes, page layout