Tamil script notes

Updated Tue 17 Feb 2015 • tags tamil, scriptnotes.

In these notes I synthesize information from various sources, encountered as I explore தமிழ் எழுத்து tamiḻ eḻuttu, the Tamil script as used for the Tamil language. They may be updated from time to time.

The page contains brief notes on general script features. See also the companion document, Tamil Character Notes, which describes the characters used in Tamil script one by one.

For more detailed information, especially about the history and phonology of the Tamil script, follow the links in the text and at the bottom of the page.

For the IPA transcriptions, I recommend installing the free Doulos SIL or Gentium Plus fonts.

Brief script introduction

The script is an abugida, ie. consonants carry an inherent vowel sound that is overridden using vowel signs.

Text runs from left to right.

Example of Tamil:

மனிதப் பிறிவியினர் சகலரும் சுதந்திரமாகவே பிறக்கின்றனர்; அவர்கள் மதிப்பிலும், உரிமைகளிலும் சமமானவர்கள், அவர்கள் நியாயத்தையும் மனச்சாட்சியையும் இயற்பண்பாகப் பெற்றவர்கள். அவர்கள் ஒருவருடனொருவர் சகோதர உணர்வுப் பாங்கில் நடந்துகொள்ளல் வேண்டும்.

Tamil script used for Tamil

Overview

Consonants carry an inherent vowel ʌ, usually written a.
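
As a rough illustration of how this looks in encoded text, the Python sketch below lists the Unicode character names for a bare consonant and for the same consonant followed by a vowel sign. The helper name describe is made up for this example. The bare letter already stands for ka, so no separate code point is stored for the inherent vowel.

```python
import unicodedata

KA = "\u0b95"            # க TAMIL LETTER KA, read as "ka" (inherent vowel included)
VOWEL_SIGN_I = "\u0bbf"  # ி TAMIL VOWEL SIGN I, overrides the inherent vowel

def describe(text):
    """Return the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]

print(describe(KA))                 # one code point: the consonant alone
print(describe(KA + VOWEL_SIGN_I))  # two code points: consonant, then vowel sign
```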

There are fewer consonants than in other Indic scripts. Tamil has no aspirated consonants, and symbols are allocated on a phonemic rather than a phonetic basis. This means that க, for example, may be pronounced as any of the allophones k ɡ x ɣ or h, according to where it appears relative to other sounds in a word, but the choice of allophone doesn't change the meaning of the word.

Tamil is diglossic: the classic form is preferred for writing and public speaking, and is mostly standard across the Tamil-speaking regions; the colloquial, spoken form differs widely from the written.

Extended letters

Because the core set of Tamil consonants is considerably smaller than that of most Indic scripts, Tamil adds 'grantha' letters to cover sounds in Sanskrit and English.

For compatibility with modern communication it also presses U+0B83 TAMIL SIGN VISARGA (called āytam) into service to produce fricative sounds from stops. ஃப gives f, eg. ஃபீசு fiːsɯ fees. ஃஜ gives z, eg. ஃஜிரொக்ஸ் ziroks Xerox.

The Unicode Standard also describes a method of extension that uses superscript digits to represent transcriptions of languages such as Sanskrit and Saurashtra, eg. ப² = pha, ப³ = ba, and ப⁴ = bha. [Unicode 471]

Consonant pronunciation

Plosives are unvoiced if they occur word-initially or doubled. Elsewhere they are voiced, with a few becoming fricatives intervocalically. Nasals and approximants are always voiced.
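
The voicing rule above can be sketched in Python as follows. This is a deliberately simplified illustration, not a phonological model: it covers only the letters க ச ட த ப, treats "doubled" as the same letter on both sides of a puḷḷi, ignores the intervocalic fricative variants, and does not attempt loan words or colloquial speech. The function name plosive_voicing is invented for this example.

```python
PULLI = "\u0bcd"  # ் TAMIL SIGN VIRAMA (puḷḷi)
PLOSIVES = {"\u0b95", "\u0b9a", "\u0b9f", "\u0ba4", "\u0baa"}  # க ச ட த ப

def plosive_voicing(word):
    """Return (letter, 'unvoiced' or 'voiced') for each plosive letter in word."""
    result = []
    for i, ch in enumerate(word):
        if ch not in PLOSIVES:
            continue
        initial = i == 0
        # Second half of a doubled plosive: same letter + puḷḷi just before it.
        doubled_second = i >= 2 and word[i - 1] == PULLI and word[i - 2] == ch
        # First half of a doubled plosive: puḷḷi + same letter just after it.
        doubled_first = word[i + 1:i + 3] == PULLI + ch
        unvoiced = initial or doubled_first or doubled_second
        result.append((ch, "unvoiced" if unvoiced else "voiced"))
    return result

print(plosive_voicing("பக்கம்"))  # pakkam: initial ப and doubled க
print(plosive_voicing("தங்கம்"))  # taŋɡam: initial த, then க after a nasal
```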

The consonants are classified into three categories: vallinam (hard consonants), mellinam (soft consonants, including all nasals), and idayinam (medium consonants), which are important for the rules of pronunciation.

In the written form of Tamil, the rules for pronouncing consonants, and plosives in particular, put the allophones of each consonant in complementary distribution. These rules break down to varying degrees when dealing with Sanskrit loan words and the colloquial spoken form of Tamil (particularly in northern areas). For more, see Tamil phonology and Krishnamurthi, pp. 23-28.

Consonant clusters

Unlike most other Indic scripts, which use conjunct glyphs, Tamil normally represents consonant clusters using a dot over each character that is not followed by a vowel. The dot is called puḷḷi (the Tamil virama). There are more conjunct forms in older versions of the Tamil script. The modern script has two common exceptions: க்ஷ kʃa and ஶ்ரீ ʃri.
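
In encoded text the puḷḷi is U+0BCD TAMIL SIGN VIRAMA, stored immediately after the consonant it silences. The small Python sketch below (the function name vowelless_consonants is made up here) finds the consonants marked this way. It also shows that the conjunct க்ஷ is stored as the same consonant + puḷḷi + consonant sequence, so the ligature is a matter of rendering rather than encoding.

```python
PULLI = "\u0bcd"  # ் TAMIL SIGN VIRAMA (puḷḷi)

def vowelless_consonants(text):
    """Return each character that is immediately followed by a puḷḷi."""
    return [text[i - 1] for i, ch in enumerate(text) if ch == PULLI and i > 0]

print(vowelless_consonants("இந்த"))  # the ந in inta carries the dot
print(vowelless_consonants("க்ஷ"))   # the conjunct is encoded as க + ் + ஷ
```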

Vowels

There are independent and combining forms of all vowels, except the inherent vowel, which has no combining form.

Independent vowel forms were once used at the beginning of metrical groups. Now they are used at the beginning of a word, eg. இந்த inta this, and also word-internally to represent 'overlong' vowel sounds, eg. பெரீஇஇய periːiiya reeeeally big.

Some vowel signs precede the consonant or consonant cluster, and others are represented by glyphs on both sides of it.

Some vowel signs produce significantly different, commonly ligated, shapes as they combine with the base consonant. The figure below shows just a few examples of the shapes produced by a single vowel sign, U+0BC1 TAMIL VOWEL SIGN U, when combined with various base characters.

Base consonant   Combination
க   கு
ச   சு
ஞ   ஞு
ட   டு
ஜ   ஜு
Ligatures with the vowel sign U

Alternative vowel forms. The three two-part vowel signs can be written in two different ways. The single code point per vowel sign is the preferred form and the form in common use for Tamil.

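Vowel sign   Single code point   Two code point sequence
ொ   U+0BCA   U+0BC6 ெ + U+0BBE ா
ோ   U+0BCB   U+0BC7 ே + U+0BBE ா
ௌ   U+0BCC   U+0BC6 ெ + U+0BD7 ௗ
Alternate code point sequences for vowel signs that surround the base.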

Whichever approach you use, the vowel signs must come after the consonant or consonant cluster that they surround. In the case of multi-character vowel signs, the order is also important and should be as shown above.
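
The two spellings are canonically equivalent in Unicode, so normalization can be used to convert between them. The Python sketch below relies on the standard library's unicodedata.normalize; the helper name to_preferred is invented for this example. NFC yields the single code point form, and a two-part sequence typed in the wrong order is left uncomposed.

```python
import unicodedata

KA = "\u0b95"                    # க
single = KA + "\u0bca"           # க + TAMIL VOWEL SIGN O
two_part = KA + "\u0bc6\u0bbe"   # க + VOWEL SIGN E + VOWEL SIGN AA
wrong_order = KA + "\u0bbe\u0bc6"  # parts reversed: not equivalent

def to_preferred(text):
    """Normalize to NFC, which composes the two-part vowel sign sequences."""
    return unicodedata.normalize("NFC", text)

print(single == two_part)                   # False: the raw strings differ
print(to_preferred(two_part) == single)     # True: NFC composes the pair
print(to_preferred(wrong_order) == single)  # False: order matters
```

Comparing or searching Tamil text after normalizing both sides to the same form avoids mismatches between the two spellings.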

Shaping

Although modern Tamil uses fewer conjunct ligatures than most other Indic scripts, there are still many ligatures needed for a Tamil font, mostly for combinations of base consonant and vowel sign.

See The Unicode Standard, pp 473-476, for a list and description of many Tamil ligatures.

Numbers

There is a set of Tamil digits, but modern Tamil text typically uses Western digits.

The Tamil digits can be used as a standard decimal counting system, but older versions of the Tamil system had no zero and inserted characters to indicate tens, hundreds, and thousands. For a description of the algorithm, see Predefined Counter Styles and Unicode Technical Note #21. You can experiment with this using the Counter styles converter tool (select Tamil, Ancient).
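
The difference between the two systems can be sketched in Python. This is a simplified illustration under stated assumptions, not the algorithm from the documents cited above: it assumes the older system covers 1 to 9999, writes a multiplier digit before the signs for ten (௰), hundred (௱) and thousand (௲), and omits that digit when the multiplier is one. The function names are made up for this example.

```python
TEN, HUNDRED, THOUSAND = "\u0bf0", "\u0bf1", "\u0bf2"  # ௰ ௱ ௲
DIGIT_ZERO = 0x0BE6  # code point of ௦ TAMIL DIGIT ZERO

def to_modern_tamil(n):
    """Positional decimal: map each Western digit of a non-negative int."""
    return "".join(chr(DIGIT_ZERO + int(d)) for d in str(n))

def to_ancient_tamil(n):
    """Older style without zero (assumed range 1..9999, see lead-in)."""
    if not 1 <= n <= 9999:
        raise ValueError("this sketch only handles 1..9999")
    parts = []
    for value, marker in ((1000, THOUSAND), (100, HUNDRED), (10, TEN), (1, "")):
        count, n = divmod(n, value)
        if count == 0:
            continue  # no zero: an empty place is simply skipped
        # A multiplier of one is left unwritten before ௰, ௱ and ௲.
        digit = "" if (count == 1 and marker) else chr(DIGIT_ZERO + count)
        parts.append(digit + marker)
    return "".join(parts)

print(to_modern_tamil(2015))   # four digits, including a zero
print(to_ancient_tamil(2015))  # 2 x 1000, then 10, then 5
```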

Named character sequences

Tamil speakers tend to think of grapheme clusters containing consonant+vowel as a single entity. In some cases, people want to process Tamil using these grapheme clusters as a single unit.

To assist with this, Unicode provides named character sequences that apply standardised names to whole syllables. These can then be mapped to the private use area for applications wanting to work with Tamil in this way. For normal Tamil data interchange, however, the standard code points should be used. [Unicode 477]
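
For processing that needs consonant+vowel units without private use mappings, a rough approximation is to attach each combining mark to the character before it. The Python sketch below does this using Unicode general categories. It is not the full Unicode text segmentation algorithm and does not use the named sequences themselves; the function name tamil_units is invented for this example, and a conjunct such as க்ஷ comes out as two units.

```python
import unicodedata

def tamil_units(text):
    """Group each base character with the combining marks that follow it."""
    units = []
    for ch in text:
        # Vowel signs and the puḷḷi have general categories Mc or Mn.
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch
        else:
            units.append(ch)
    return units

print(tamil_units("தமிழ்"))  # three units: த, மி, ழ்
```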

Punctuation

Western punctuation appears to be used generally.

The danda and double danda are sometimes used, along with other unified punctuation in the Devanagari block. [Unicode 477]

List of basic symbols

Tamil

This is a list of the main characters or character combinations needed for Tamil.

Consonants
Grantha consonants
Independent vowels
Vowel signs   ா   ி   ீ   ு   ூ   ெ   ே   ை   ொ   ோ   ௌ   ௗ
Symbols   ்
Numbers

Further reading

  1. [Daniels] Peter T. Daniels and William Bright, The World's Writing Systems, Oxford University Press, ISBN 0-19-507993-0
  2. [WPScript] Wikipedia, Tamil Script
  3. [WPLanguage] Tamil language
  4. [Unicode] The Unicode Standard
  5. [Omniglot] Tamil
  6. [IQTamil] Tamil language - Definition
  7. [UnicodeFAQ] Tamil Language and Script
  8. [NamedSequences] Tamil Vowels, Consonants, and Syllables: Alternative Formats
  9. [Krishnamurthi] Anandam Krishnamurthi, Learn Tamil in a Month, Readwell's, ISBN 9788187782049
First published 3 February, 2010. This version 2015-02-22 17:45 GMT.  •  Copyright r12a@w3.org. Licence CC-By.