Tamil script

Updated 25 July, 2017 • tags tamil, scriptnotes.

This page provides basic information about the Tamil script. It is not authoritative, peer-reviewed information – these are just notes I have gathered or copied from various places as I learned. For similar information related to other scripts, see the Script comparison table.

Sample (Tamil)

உறுப்புரை 1 மனிதப் பிறிவியினர் சகலரும் சுதந்திரமாகவே பிறக்கின்றனர்; அவர்கள் மதிப்பிலும், உரிமைகளிலும் சமமானவர்கள், அவர்கள் நியாயத்தையும் மனச்சாட்சியையும் இயற்பண்பாகப் பெற்றவர்கள். அவர்கள் ஒருவருடனொருவர் சகோதர உணர்வுப் பாங்கில் நடந்துகொள்ளல் வேண்டும்.

உறுப்புரை 2 இனம், நிறம், பால், மொழி, மதம், அரசியல் அல்லது வேறு அபிப்பிராயமுடைமை, தேசிய அல்லது சமூகத் தோற்றம், ஆதனம், பிறப்பு அல்லது பிற அந்தஸ்து என்பன போன்ற எத்தகைய வேறுபாடுமின்றி, இப்பிரகடனத்தில் தரப்பட்டுள்ள எல்லா உரிமைகளுக்கும் சுதந்திரங்களுக்கும் எல்லோரும் உரித்துடையவராவர். மேலும், எவரும் அவருக்குரித்துள்ள நாட்டின் அல்லது ஆள்புலத்தின் அரசியல், நியாயாதிக்க அல்லது நாட்டிடை அந்தஸ்தின் அடிப்படையில் — அது தனியாட்சி நாடாக, நம்பிக்கைப் பொறுப்பு நாடாக, தன்னாட்சியற்ற நாடாக அல்லது இறைமை வேறேதேனும் வகையில் மட்டப்படுத்தப்பட்ட நாடாக இருப்பினுஞ்சரி — வேறுபாடெதுவும் காட்டப்படுதலாகாது.

Brief script introduction

The script is an abugida, ie. consonants carry an inherent vowel sound that is overridden using vowel signs. In Tamil, consonants carry an inherent vowel ʌ, usually written a.
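As a rough illustration of the abugida model, the Python sketch below builds three syllables from the same consonant: bare (inherent vowel), with a vowel sign, and with the puḷḷi that removes the vowel. The helper name `describe` and the romanisations in the comments are my own assumptions for illustration.

```python
import unicodedata

KA = "\u0B95"            # க TAMIL LETTER KA, read as "ka" (inherent vowel)
VOWEL_SIGN_I = "\u0BBF"  # ி TAMIL VOWEL SIGN I, overrides the inherent vowel
PULLI = "\u0BCD"         # ் TAMIL SIGN VIRAMA (puḷḷi), suppresses the vowel


def describe(text: str) -> list[str]:
    """Return the Unicode character names that make up `text`."""
    return [unicodedata.name(ch) for ch in text]


ka = KA                  # க  ka
ki = KA + VOWEL_SIGN_I   # கி ki
k = KA + PULLI           # க் k

for syllable in (ka, ki, k):
    print(syllable, describe(syllable))
```

Each syllable is stored as the consonant followed by at most one sign, even where the font draws the result as a single shape.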

There are fewer consonants than in other Indic scripts. Tamil has no aspirated consonants, and symbols are allocated on a phonemic basis, rather than phonetic. This means that க, for example, may be pronounced as the allophones k ɡ x ɣ or h, according to where it appears relative to other sounds in a word, but its pronunciation doesn't change the word.

Tamil is diglossic: the classic form is preferred for writing and public speaking, and is mostly standard across the Tamil-speaking regions; the colloquial, spoken form differs widely from the written.

Text runs from left to right.

Consonants

Extended letters

Because the core set of Tamil consonants is considerably smaller than that of most Indic scripts, Tamil adds 'grantha' letters to cover sounds in Sanskrit and English.

For compatibility with modern communication it also presses into service [U+0B83 TAMIL SIGN VISARGA] (called āytam) to produce fricative sounds from stops. ஃப gives f, eg. ஃபீசு fiːsɯ fees. ஃஜ gives z, eg. ஃஜிரொக்ஸ் ziroks Xerox.
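A minimal sketch of how software might spot these āytam combinations in text. The lookup table is hypothetical and covers only the two pairs mentioned above; the function name is also my own.

```python
AYTAM = "\u0B83"  # ஃ TAMIL SIGN VISARGA (āytam)

# Hypothetical table: only the two pairs described in these notes.
FRICATIVE = {
    "\u0BAA": "f",  # ஃப : āytam + PA
    "\u0B9C": "z",  # ஃஜ : āytam + JA
}


def aytam_fricatives(text):
    """Return (index, sound) for each āytam followed by a listed consonant."""
    found = []
    for i in range(len(text) - 1):
        if text[i] == AYTAM and text[i + 1] in FRICATIVE:
            found.append((i, FRICATIVE[text[i + 1]]))
    return found


fees = "\u0B83\u0BAA\u0BC0\u0B9A\u0BC1"  # ஃபீசு
print(aytam_fricatives(fees))
```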

The Unicode Standard also describes a method of extension that uses superscript digits to represent transcriptions of languages such as Sanskrit and Saurashtra, eg. ப² = pha, ப³ = ba, and ப⁴ = bha. [Unicode, p. 471]

Consonant pronunciation

Plosives are unvoiced if they occur word-initially or doubled. Elsewhere they are voiced, with a few becoming fricatives intervocalically. Nasals and approximants are always voiced.
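The basic voicing rule can be sketched in Python as follows. This is a simplification under stated assumptions: it treats only க ச ட த ப as plosives, ignores the intervocalic fricative cases, and takes no account of loan words or colloquial speech. The two test words (பக்கம் and தங்கம்) are my own examples, not taken from the sources.

```python
PULLI = "\u0BCD"  # ் puḷḷi
# Assumed plosive set for this sketch: க ச ட த ப
PLOSIVES = set("\u0B95\u0B9A\u0B9F\u0BA4\u0BAA")


def plosive_voicing(word):
    """Return (index, letter, 'unvoiced' | 'voiced') for each plosive."""
    result = []
    for i, ch in enumerate(word):
        if ch not in PLOSIVES:
            continue
        initial = i == 0
        # First half of a doubled plosive: X + puḷḷi + X
        doubled_first = word[i + 1:i + 3] == PULLI + ch
        # Second half of a doubled plosive: preceded by X + puḷḷi
        doubled_second = i >= 2 and word[i - 2:i] == ch + PULLI
        unvoiced = initial or doubled_first or doubled_second
        result.append((i, ch, "unvoiced" if unvoiced else "voiced"))
    return result


pakkam = "\u0BAA\u0B95\u0BCD\u0B95\u0BAE\u0BCD"   # பக்கம்
tangam = "\u0BA4\u0B99\u0BCD\u0B95\u0BAE\u0BCD"   # தங்கம்
print(plosive_voicing(pakkam))
print(plosive_voicing(tangam))
```

In the first word every plosive is initial or doubled, so all are unvoiced; in the second, the க follows a nasal and is reported as voiced.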

The consonants are classified into three categories: vallinam (hard consonants), mellinam (soft consonants, including all nasals), and idayinam (medium consonants), which are important for the rules of pronunciation.

The pronunciation rules for consonants in the written form of Tamil, in particular for plosives, put the allophones of each letter in complementary distribution. These rules break down to varying degrees when dealing with Sanskrit loan words and the colloquial spoken form of Tamil (particularly in northern areas). For more, read Tamil phonology and Krishnamurthi, pp. 23-28.

Consonant clusters

Unlike most other Indic scripts, which use conjunct glyphs, Tamil normally represents consonant clusters using a dot over the character(s) not followed by a vowel. The dot is called puḷḷi (the Tamil virama). There are more conjunct forms in older versions of the Tamil script. The modern script has two common exceptions: க்ஷ kʃa and ஶ்ரீ ʃri.
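In stored text the puḷḷi is the same code point whether it shows as a dot or disappears into one of the two conjuncts. The sketch below labels each puḷḷi in a string accordingly. It assumes a font that actually forms both conjuncts, and the function name and labels are my own.

```python
PULLI = "\u0BCD"                      # ் TAMIL SIGN VIRAMA
KSSA = "\u0B95\u0BCD\u0BB7"           # க்ஷ  KA + puḷḷi + SSA
SHRII = "\u0BB6\u0BCD\u0BB0\u0BC0"    # ஶ்ரீ  SHA + puḷḷi + RA + vowel sign II


def classify_pullis(text):
    """Label each puḷḷi as part of a listed conjunct or as a visible dot."""
    conjunct_positions = set()
    for seq in (KSSA, SHRII):
        start = text.find(seq)
        while start != -1:
            # In both sequences the puḷḷi is the second code point.
            conjunct_positions.add(start + 1)
            start = text.find(seq, start + 1)
    return [
        (i, "conjunct" if i in conjunct_positions else "visible dot")
        for i, ch in enumerate(text)
        if ch == PULLI
    ]


sample = KSSA + "\u0B95\u0BCD\u0B95"  # க்ஷ followed by க்க
print(classify_pullis(sample))
```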

Vowels

There are independent and combining forms of all vowels, except the inherent vowel, which has no combining form.

Independent vowel forms used to be used at the beginning of metrical groups, but now they are used at the beginning of a word, eg. இந்த inta this, but also internally to represent 'overlong' vowel sounds, eg. பெரீஇஇய periːiiya reeeeally big.

Some vowel signs precede the consonant or consonant cluster, and others are represented by glyphs on both sides of it.

Some vowel signs produce significantly different, commonly ligated, shapes as they combine with the base consonant. The figure below shows just a few examples of shapes produced by just one vowel sign, ◌ு [U+0BC1 TAMIL VOWEL SIGN U], when combined with various different base characters.

Base consonant    Combination
க                 கு
ச                 சு
ஞ                 ஞு
ட                 டு
ஜ                 ஜு
Ligatures with the vowel sign U

Alternative vowel forms

The three two-part vowel signs can be written in two different ways. Using a single code point per vowel sign is the preferred form, and the form in common use for Tamil.

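◌ொ [U+0BCA]  =  ◌ெ [U+0BC6] + ◌ா [U+0BBE]
◌ோ [U+0BCB]  =  ◌ே [U+0BC7] + ◌ா [U+0BBE]
◌ௌ [U+0BCC]  =  ◌ெ [U+0BC6] + ◌ௗ [U+0BD7]
Alternate code point sequences for vowel signs that surround the base.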

Whichever approach you use, the vowel signs must come after the consonant or consonant cluster that they surround. In the case of multi-character vowel signs, the order is also important and should be as shown above.
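The two spellings of each two-part vowel sign are canonically equivalent in Unicode, so NFC normalisation converts the multi-character form to the preferred single code point. A small Python sketch using the standard `unicodedata` module (the dictionary and function names are my own):

```python
import unicodedata

# Single code point -> equivalent two-part sequence (in the required order)
TWO_PART_SIGNS = {
    "\u0BCA": "\u0BC6\u0BBE",  # ொ = ெ + ா
    "\u0BCB": "\u0BC7\u0BBE",  # ோ = ே + ா
    "\u0BCC": "\u0BC6\u0BD7",  # ௌ = ெ + ௗ (AU length mark)
}

KA = "\u0B95"  # க, used here as the base consonant


def to_preferred(text):
    """Normalise to NFC, which composes the two-part sequences."""
    return unicodedata.normalize("NFC", text)


for single, parts in TWO_PART_SIGNS.items():
    split_form = KA + parts       # consonant first, then both parts
    single_form = KA + single
    print(split_form, single_form, to_preferred(split_form) == single_form)
```

Normalising before comparing or searching means both spellings of a syllable match each other.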

Named character sequences

Tamil speakers tend to think of grapheme clusters containing consonant+vowel as a single entity. In some cases, people want to process Tamil using these grapheme clusters as a single unit.

To assist with this, Unicode provides named character sequences that apply standardised names to whole syllables. These can then be mapped to the private use area for applications wanting to work with Tamil in this way. For normal Tamil data interchange, however, the standard codepoints should be used. [Unicode, p. 477]
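For simple processing, a rough approximation of these units can be had by attaching each combining mark to the character before it. The Python sketch below does this with the standard `unicodedata` module. It is not a full implementation of Unicode grapheme cluster segmentation, and the function name is my own.

```python
import unicodedata


def tamil_clusters(text):
    """Split text into base characters, each with its following marks."""
    clusters = []
    for ch in text:
        # General categories Mn/Mc cover the vowel signs and the puḷḷi.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


word = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"  # தமிழ்
print(tamil_clusters(word))
print(len(word), "code points,", len(tamil_clusters(word)), "clusters")
```

Note that this splits க்ஷ into க் and ஷ, so an application that wants to treat that conjunct as one unit would need an extra rule.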

Context-based rendering

Although modern Tamil uses fewer conjunct ligatures than most other Indic scripts, there are still many ligatures needed for a Tamil font, mostly for combinations of base consonant and vowel sign.

See The Unicode Standard, pp. 473-476, for a list and description of many Tamil ligatures.

Numbers

There is a set of Tamil digits, but modern Tamil text typically uses Western digits.

The Tamil digits can be used as a standard decimal counting system, but older versions of the Tamil system had no zero and inserted characters to indicate tens, hundreds, and thousands. For a description of the algorithm, see Predefined Counter Styles and Unicode Technical Note #21. You can experiment with this using the Counter styles converter tool (select Tamil, Ancient).
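Both approaches can be sketched in Python. The decimal conversion simply maps each digit to its Tamil counterpart. The older style is shown here as I understand the additive scheme: a digit from 2 to 9 precedes the sign for ten, hundred or thousand, and a bare sign stands for one of that unit. This sketch only handles 1 to 9999, and the function names are my own; check the referenced documents for the full algorithm.

```python
TAMIL_ZERO = 0x0BE6  # ௦ TAMIL DIGIT ZERO; digits 1-9 follow consecutively
TEN, HUNDRED, THOUSAND = "\u0BF0", "\u0BF1", "\u0BF2"  # ௰ ௱ ௲


def tamil_digit(d):
    """Return the Tamil digit character for 0 <= d <= 9."""
    return chr(TAMIL_ZERO + d)


def to_tamil_decimal(n):
    """Positional decimal notation for a non-negative integer."""
    return "".join(tamil_digit(int(c)) for c in str(n))


def to_tamil_ancient(n):
    """Older zero-less notation (sketch, 1..9999 only)."""
    if not 1 <= n <= 9999:
        raise ValueError("this sketch only handles 1..9999")
    out = []
    for value, sign in ((1000, THOUSAND), (100, HUNDRED), (10, TEN)):
        count, n = divmod(n, value)
        if count > 1:
            out.append(tamil_digit(count))  # multiplier 2-9
        if count >= 1:
            out.append(sign)                # bare sign means one unit
    if n:
        out.append(tamil_digit(n))          # remaining ones
    return "".join(out)


print(to_tamil_decimal(2017), to_tamil_ancient(2017))
```

Python's built-in `int()` accepts Tamil decimal digits, so `int(to_tamil_decimal(2017))` converts back to 2017. It does not understand the older notation.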

Punctuation

Western punctuation appears to be used generally.

The danda and double danda are sometimes used, along with other unified punctuation in the Devanagari block. [Unicode, p. 477]

List of basic symbols

Tamil

This is a list of main characters or character combinations needed for Tamil.

Consonants   க   ங   ச   ஞ   ட   ண   த   ந   ப   ம   ய   ர   ல   வ   ழ   ள   ற   ன
Grantha consonants   ஜ   ஶ   ஷ   ஸ   ஹ
Independent vowels   அ   ஆ   இ   ஈ   உ   ஊ   எ   ஏ   ஐ   ஒ   ஓ   ஔ
Vowel signs   ா   ி   ீ   ு   ூ   ெ   ே   ை   ொ   ோ   ௌ   ௗ
Symbols   ்
Numbers   ௦   ௧   ௨   ௩   ௪   ௫   ௬   ௭   ௮   ௯   ௰   ௱   ௲

Further reading

  1. [Daniels] Peter T. Daniels and William Bright, The World's Writing Systems, Oxford University Press, ISBN 0-19-507993-0
  2. [WPScript] Wikipedia, Tamil Script
  3. [WPLanguage] Tamil language
  4. [Unicode] The Unicode Standard
  5. [Omniglot] Tamil
  6. [IQTamil] Tamil language - Definition
  7. [UnicodeFAQ] Tamil Language and Script
  8. [NamedSequences] Tamil Vowels, Consonants, and Syllables: Alternative Formats
  9. [Krishnamurthi] Anandam Krishnamurthi, Learn Tamil in a Month, Readwell's, ISBN 9788187782049
First published 3 February, 2010. This version 2017-07-25 7:15 GMT.  •  Copyright r12a@w3.org. Licence CC-By.