Telugu orthography notes

Basic features

The Telugu script is an abugida. Consonants carry an inherent vowel which can be modified by appending vowel signs to the consonant. See the table to the right for a brief overview of features for the modern Telugu orthography.

The script is visually different from scripts like Devanagari and Bengali due to the rounded bases of the letters, which sit on a baseline rather than hanging from one. It also differs in the replacement of the flat, joining headstroke with a hook above the top of many characters. The hook is removed to accommodate superscript vowel signs and the virama.

Telugu text runs left-to-right in horizontal lines.

Words are separated by spaces.

35 consonant letters are used for Telugu and one vocalic letter. ❯ consonants

Consonant clusters at any location are normally indicated using the virama between consonants. This results in a large number of conjunct forms expressed using half-forms, stacked consonants, and ligated glyphs. Occasionally, a visible virama is used. ❯ clusters

Conjuncts are fairly regular and nearly always consist of a full-form initial consonant followed by a subjoined version of the second. The subjoined version loses the hook, and in about 50% of cases is transformed. Many subjoined forms rise above the baseline to the right of the initial consonant, but any vowel signs attached to the cluster appear above or to the right of the initial consonant (which may be between the two consonant glyphs in the latter case). Some conjuncts are formed from conjoined pairs where the second letter is reduced and extends below the baseline. Gemination is quite common.

As part of a cluster, RA is formed in the same way as other conjunct members in the modern orthography; historically, however, it had a special behaviour.

Syllable-final consonant sounds may be represented by 2 dedicated combining marks (anusvara & visarga), but are generally ordinary consonants that are not marked by a virama. ❯ finals

The Telugu orthography has an inherent vowel, and represents vowels using 11 vowel signs, including no pre-base signs and 1 circumgraph. All vowel signs are combining marks, and are stored after the base character. ❯ vowels

There are 12 independent vowels, one for each vowel sound, including the inherent vowel, and these are used to write all standalone vowel sounds. ❯ standalone

There are no composite vowels, in principle. However, the circumgraph decomposes in NFD.

Telugu has native number digits but doesn't often use them in modern texts. Until the adoption of the metric system, Telugu used a complicated system for writing fractions, with dedicated symbols that were combined in various ways. ❯ numbers

Structure

The typical unit of the orthography is the orthographic syllable, consisting of a consonant and vowel (CV) core and, optionally, one or more preceding consonants. Consonant letters by themselves constitute a CV unit, where the V is an inherent vowel. A dependent vowel sign is used to represent the V in CV units where V is not the inherent vowel.

In some cases, diacritics can be used to represent a syllable-final nasal.
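The orthographic syllable described above can be approximated with a regular expression. The following Python sketch is only an illustration: the character ranges, the pattern and the function name `orthographic_syllables` are assumptions made for this example, and it ignores ZWJ/ZWNJ, archaic letters and other edge cases.

```python
import re

# Character classes for the modern Telugu orthography (assumed for this sketch).
CONSONANT = "[\u0C15-\u0C39]"          # KA..HA
INDEPENDENT_VOWEL = "[\u0C05-\u0C14]"  # A..AU
VOWEL_SIGN = "[\u0C3E-\u0C4C]"         # AA..AU
VIRAMA = "\u0C4D"
FINAL_MARK = "[\u0C02\u0C03]"          # anusvara, visarga

# (consonant + virama)* consonant [vowel sign] [visible virama] [final mark]
# or: independent vowel [final mark]
SYLLABLE = re.compile(
    f"(?:(?:{CONSONANT}{VIRAMA})*{CONSONANT}{VOWEL_SIGN}?{VIRAMA}?"
    f"|{INDEPENDENT_VOWEL}){FINAL_MARK}?"
)


def orthographic_syllables(text: str) -> list[str]:
    """Return the orthographic syllables found in text, in order."""
    return SYLLABLE.findall(text)


# TA+E, LA+U, GA+U: three CV units
print(orthographic_syllables("\u0C24\u0C46\u0C32\u0C41\u0C17\u0C41"))
# A, then NA+VIRAMA+NA+II: a standalone vowel followed by a geminate cluster
print(orthographic_syllables("\u0C05\u0C28\u0C4D\u0C28\u0C40"))
```

A production implementation would more likely rely on Unicode grapheme cluster segmentation than on a hand-written pattern like this one.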

Telugu words almost always end in short vowels, although modern Telugu allows m, n, y, and w to end a word, and loan words sometimes end with long vowels.wl,#Phonology

Geminate consonants are also a common feature of Telugu, mostly in word-medial positions, and are distinctive,wl,#Phonology eg. compare గది and గద్ది.

Unlike some other Dravidian languages, voiced sounds were always part of the Telugu language.wl,#Phonology

Retroflex consonants appear initially (apart from ɳ and ɭ), and medially, where they may be part of a retroflex cluster.wl,#Phonology

j occurs in word-initial position only in borrowed words, such as jɐnɡu 'young'.wl,#Phonology

Vowels

There are 12 independent vowels, one for each vowel sound, including the inherent vowel, and these are used to write all standalone vowel sounds.

There are no composite vowels, in principle. However, the circumgraph decomposes in NFD.

Inherent vowel

a following a consonant is not written, but is seen as an inherent part of the consonant letter.

క ka [U+0C15 TELUGU LETTER KA]

Vowel signs

Non-inherent vowel sounds that follow a consonant are mostly represented using vowel signs, eg.

కి ki [U+0C15 TELUGU LETTER KA + U+0C3F TELUGU VOWEL SIGN I]

Telugu vowel signs are all combining characters, and in principle a single code point is used per vowel sign (however, see vs_encoding). All vowel signs are stored after the base consonant, and the rendering process puts them in the correct place for display. This also applies for the circumgraph, where a single code point produces glyphs on more than one side of the consonant base. There is no pre-base vowel.

Most vowel signs interact typographically with their base consonant, replacing the check mark above the base. A few also produce slightly different joined shapes. See shaping.

Two vowel signs are spacing marks, meaning that they consume horizontal space when added to a base consonant.
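One way to check which vowel signs are spacing marks is to look at their Unicode general category. The sketch below assumes that "spacing mark" corresponds to category Mc (and non-spacing to Mn); the constant and function names are made up for this example.

```python
import unicodedata

# The 11 vowel signs used for modern Telugu (AA, I, II, U, UU, E, EE, AI, O, OO, AU).
VOWEL_SIGNS = "\u0C3E\u0C3F\u0C40\u0C41\u0C42\u0C46\u0C47\u0C48\u0C4A\u0C4B\u0C4C"


def spacing_vowel_signs(signs: str = VOWEL_SIGNS) -> list[str]:
    """Return the vowel signs whose general category is Mc (spacing mark)."""
    return [ch for ch in signs if unicodedata.category(ch) == "Mc"]


# Print the category and name of each vowel sign, then the spacing ones.
for ch in VOWEL_SIGNS:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

print([unicodedata.name(ch) for ch in spacing_vowel_signs()])
```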

When a vowel is pronounced after a conjunct, the vowel sign is attached to the first consonant in the cluster, even where the vowel sign pushes the two consonant glyphs apart (see fig_vowel_position).

In the sequence స్పి s͓pi the i (coloured) appears above the s, and in స్పూ s͓pū the u (coloured) appears to the right of the s (not the p!).

The text is still entered following the spoken order: the vowel sign is typed and stored after the last consonant of the cluster, and it is incorrect to type it after the first consonant.
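The storage order can be inspected by listing the code points of a string. This is a small Python sketch; the helper name `describe` is an assumption of the example.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List the code points of text in storage order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# SA + VIRAMA + PA + VOWEL SIGN UU: the vowel sign is stored last, after the
# whole cluster, even though it is displayed next to the first consonant.
spu = "\u0C38\u0C4D\u0C2A\u0C42"
for line in describe(spu):
    print(line)
```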

Combining marks used for vowels

Telugu uses the following dedicated combining marks for vowels. They may be used on their own, or in combination with others (see circumgraphs).

ి␣ీ␣ు␣ూ␣ె␣ే␣ొ␣ో␣ా␣ ␣ై␣ౌ

Circumgraph

ై

One vowel is produced by a single combining character with visually separate parts that appear on opposite sides of the consonant onset, eg. కై.

Encoding. The circumgraph can be written as a single character, or as two.

ై [U+0C48 TELUGU VOWEL SIGN AI]
ై [U+0C46 TELUGU VOWEL SIGN E + U+0C56 TELUGU AI LENGTH MARK]

The single code point per vowel sign is the form preferred by the Unicode Standard and the form in common use for Telugu. However, the precomposed character is split into its two parts when text is normalised using Unicode Normalisation Form D (NFD).

Whichever approach is used, the vowel signs must be typed and stored after the consonant characters they surround. In the case of decomposed vowel signs, the order is also important and must be as shown above.
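Because the two encodings of the circumgraph are canonically equivalent, text should be normalised before it is compared. The following Python sketch uses the standard `unicodedata` module; the variable names are chosen for this example only.

```python
import unicodedata

AI = "\u0C48"                 # TELUGU VOWEL SIGN AI (precomposed)
E_PLUS_MARK = "\u0C46\u0C56"  # VOWEL SIGN E + AI LENGTH MARK (decomposed)

kai_precomposed = "\u0C15" + AI
kai_decomposed = "\u0C15" + E_PLUS_MARK

# The raw strings differ, but each normalisation form maps one onto the other.
print(kai_precomposed == kai_decomposed)
print(unicodedata.normalize("NFC", kai_decomposed) == kai_precomposed)
print(unicodedata.normalize("NFD", kai_precomposed) == kai_decomposed)
```

Normalising both sides to the same form (NFC is the usual choice for storage) makes comparison and search independent of how the vowel sign was typed.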

Vowel sign placement

The following list shows where vowel signs are positioned around a base consonant to produce vowels for precomposed text, and how many instances of each pattern there are. Where a figure follows a + sign, it represents combinations of Unicode characters.

2 post-base, eg. కు ku
8 superscript, eg. కి ki
1 super+subscript, eg. కై kaʲ

At maximum, vowel components can occur concurrently on 2 sides of the base.

Vowel absence

In traditional Telugu there is always a vowel sound at the end of a word; however, in the modern language it is possible for m, n, y, and w to end a word.wl,#Phonology

Telugu uses ్ [U+0C4D TELUGU SIGN VIRAMA] (called virāmamu in Telugu) to show that the inherent vowel after a consonant is not pronounced, eg. క్ [U+0C15 TELUGU LETTER KA + U+0C4D TELUGU SIGN VIRAMA] explicitly represents just the sound k.

The virama is usually hidden when the consonant is part of a consonant cluster.

Standalone vowels

Telugu represents standalone vowels using a set of independent vowel letters. The set includes a character to represent the inherent vowel sound.

ఇ␣ఈ␣ఉ␣ఊ␣ఎ␣ఏ␣ఒ␣ఓ␣అ␣ఆ␣ ␣ఐ␣ఔ

Consonants

35 consonant letters are used for Telugu and one vocalic letter.

As part of a cluster, RA is formed in the same way as other conjunct members in the modern orthography; historically, however, it had a special behaviour.

Syllable-final consonant sounds may be represented by 2 dedicated combining marks (anusvara & visarga), but are generally ordinary consonants that are not marked by a virama.

Basic consonants

ప␣బ␣ఫ␣భ␣త␣ద␣థ␣ధ␣ట␣డ␣ఠ␣ఢ␣క␣గ␣ఖ␣ఘ

చ␣జ␣ఛ␣ఝ

స␣శ␣ష␣హ

మ␣న␣ఞ␣ణ␣ఙ

వ␣ర␣ఱ␣ల␣ళ␣య

There doesn't appear to be a method of extending the repertoire to cover non-native sounds (other than those used for Sanskrit), such as using a nukta or similar approach.

Final consonants

There are several ways of representing syllable-final nasals.

Nasals can be written using ం [U+0C02 TELUGU SIGN ANUSVARA]. Before a plosive this is pronounced as a homorganic nasal, eg. అంగము. It is pronounced m when followed by a non-plosive consonant, or at the end of a word,d,415 eg. సింహ and లగాం.

Syllable final h (usually pronounced ha) can be written with ః [U+0C03 TELUGU SIGN VISARGA], which is principally used for Sanskrit words, eg. పునః

The Unicode Standard describes a final n sound that is represented by న్ [U+0C28 TELUGU LETTER NA + U+0C4D TELUGU SIGN VIRAMA], but there is an older shape called nakāra pollu which can be produced, if the font supports it. The difference is only font based. To prevent the virama joining the final n with an immediately following consonant, use ‌ U+200C ZERO WIDTH NON-JOINER after the virama.u,500
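As a minimal illustration of the sequence described above, the Python sketch below builds a word-final n followed by ZWNJ. The function name `final_n_before` is hypothetical, and whether the older nakāra pollu shape appears depends entirely on the font.

```python
NA = "\u0C28"      # TELUGU LETTER NA
VIRAMA = "\u0C4D"  # TELUGU SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def final_n_before(next_text: str) -> str:
    """Return final n (NA + VIRAMA) followed by next_text, with ZWNJ inserted
    after the virama so that it does not form a conjunct with what follows."""
    return NA + VIRAMA + ZWNJ + next_text


# Final n followed directly by KA: the ZWNJ keeps the two from joining.
sequence = final_n_before("\u0C15")
print([f"U+{ord(ch):04X}" for ch in sequence])
```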

Consonant clusters

The absence of a vowel sound between two or more consonants is visually indicated by using conjunct forms, where the second consonant is a subscript. There are some variant behaviours here:

Simple stacking : A reduced-size version of the 2nd consonant has any headstroke removed and is simply positioned below the 1st.
Stacking with transformations : An alternative shape for the 2nd consonant is positioned directly below the 1st.
Conjoined with reduced forms : The shape of the 2nd consonant is transformed into something that sits alongside the 1st, but extends above and below the baseline.

Conjunct formation

See a table of 2-consonant clusters.
The table allows you to test results for various fonts.

్

In Unicode, the transformation of the 2nd consonant is achieved by adding ్ [U+0C4D TELUGU SIGN VIRAMA] between the consonants. The font hides the glyph automatically.
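In code, this amounts to joining the consonant letters with the virama. The Python sketch below uses a hypothetical helper named `cluster`; the conjunct shapes themselves are produced by the font, not by the string.

```python
VIRAMA = "\u0C4D"  # TELUGU SIGN VIRAMA


def cluster(*consonants: str) -> str:
    """Join consonant letters with a virama between each adjacent pair."""
    return VIRAMA.join(consonants)


kssa = cluster("\u0C15", "\u0C37")            # KA + VIRAMA + SSA
stra = cluster("\u0C38", "\u0C24", "\u0C30")  # SA + VIRAMA + TA + VIRAMA + RA
print([f"U+{ord(ch):04X}" for ch in kssa])
print([f"U+{ord(ch):04X}" for ch in stra])
```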

Simple stacking

The following table shows consonants subjoined below themselves that are simply reduced in size, stripped of any headstroke, and positioned directly below the initial consonant.

ద్ద␣థ్థ␣ధ్ధ␣ట్ట␣డ్డ␣ఠ్ఠ␣ఢ్ఢ␣గ్గ␣ఖ్ఖ␣ఘ్ఘ␣ష్ష␣హ్హ␣ఞ్ఞ␣ణ్ణ␣ఙ్ఙ␣ఱ్ఱ␣ఴ్ఴ

ష [U+0C37 TELUGU LETTER SSA] is irregular when it follows క [U+0C15 TELUGU LETTER KA]. Unlike its other combinations, this one produces a transformed subjoined glyph, ie. క్ష k͓ʂ

Stacking with transformations

The following consonants are also subjoined below the first, but are rendered with a very different shape.

త్త␣ర్ర␣ల్ల

Conjoined with reduced, spacing form

The second consonant in these clusters is transformed into a spacing glyph that extends above and below the baseline.

ప్ప␣బ్బ␣ఫ్ఫ␣భ్భ␣క్క␣స్స␣శ్శ␣మ్మ␣న్న␣వ్వ␣ళ్ళ␣య్య

In a cluster of this kind, any vowel sign is attached to the initial consonant letter, eg. అన్నీ

Reph form of RA

In modern Telugu, clusters that begin with r are formed in the normal way, eg. గూర్చి

In older texts, however, the r is represented as a special form to the right of the second consonant, and the second consonant is left intact and carries any vowel signs. See fig_gurci for an example. This effect can be produced in Unicode using ‍ U+200D ZERO WIDTH JOINER immediately after the virama, eg. in this case, ర్‍చ [U+0C30 TELUGU LETTER RA + U+0C4D TELUGU SIGN VIRAMA + U+200D ZERO WIDTH JOINER + U+0C1A TELUGU LETTER CA], although not all fonts support it.

In a font that shows the reph form by default, it should be possible to disable it using the ZWJ before the virama instead of after it, as long as the font supports it.
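The three sequences described in this section can be sketched as follows. The function name `ra_cluster` and its `style` values are assumptions made for this Python example, and the rendered result depends on font support in every case.

```python
RA = "\u0C30"      # TELUGU LETTER RA
VIRAMA = "\u0C4D"  # TELUGU SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER


def ra_cluster(second: str, style: str = "default") -> str:
    """Build an r-initial cluster in one of the three encodings."""
    if style == "default":   # modern form: RA + VIRAMA + consonant
        return RA + VIRAMA + second
    if style == "reph":      # request the older form: ZWJ after the virama
        return RA + VIRAMA + ZWJ + second
    if style == "no-reph":   # suppress a default reph: ZWJ before the virama
        return RA + ZWJ + VIRAMA + second
    raise ValueError(f"unknown style: {style}")


for style in ("default", "reph", "no-reph"):
    # Show the code points of an r + CA cluster in each encoding.
    print(style, [f"U+{ord(ch):04X}" for ch in ra_cluster("\u0C1A", style)])
```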

Syllable-final diacritics

As mentioned in finals, Telugu represents some final consonants using diacritics. Such syllable-final diacritics are followed by ordinary consonant shapes in consonant clusters.

Encoding choices

Visually, several of the standalone vowels and some vowel signs look as if they could be composed of smaller parts. This section compares approaches and considers the relevance of Unicode Normalisation Form D (NFD) and Unicode Normalisation Form C (NFC) to give guidance on which approach is best.

Vowel signs

The vowel signs for iː, eː, and oː should each be written with one of the single code points listed above. Those code points do not decompose in NFD. The Unicode Standard warns not to make up the shapes using combinations of characters.

Use	Do not use!
ీ [U+0C40 TELUGU VOWEL SIGN II]	ి + ౕ [U+0C3F TELUGU VOWEL SIGN I + U+0C55 TELUGU LENGTH MARK]
ే [U+0C47 TELUGU VOWEL SIGN EE]	ె + ౕ [U+0C46 TELUGU VOWEL SIGN E + U+0C55 TELUGU LENGTH MARK]
ో [U+0C4B TELUGU VOWEL SIGN OO]	ొ + ౕ [U+0C4A TELUGU VOWEL SIGN O + U+0C55 TELUGU LENGTH MARK]
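Since the sequences in the right-hand column are not canonically equivalent to the single code points, normalisation will not repair them; a replacement step is needed. The following Python sketch shows one possible approach. The mapping `REPAIRS` and the function `repair_vowel_signs` are hypothetical names for this example, not part of any library.

```python
import unicodedata

LENGTH_MARK = "\u0C55"  # TELUGU LENGTH MARK

# Discouraged two-character spellings mapped to the recommended code points.
REPAIRS = {
    "\u0C3F" + LENGTH_MARK: "\u0C40",  # VOWEL SIGN I + length mark -> II
    "\u0C46" + LENGTH_MARK: "\u0C47",  # VOWEL SIGN E + length mark -> EE
    "\u0C4A" + LENGTH_MARK: "\u0C4B",  # VOWEL SIGN O + length mark -> OO
}


def repair_vowel_signs(text: str) -> str:
    """Replace discouraged vowel sign sequences with single code points."""
    for bad, good in REPAIRS.items():
        text = text.replace(bad, good)
    return text


bad_kii = "\u0C15\u0C3F\u0C55"  # KA + VOWEL SIGN I + LENGTH MARK
print(unicodedata.normalize("NFC", bad_kii) == bad_kii)  # NFC leaves it alone
print(repair_vowel_signs(bad_kii) == "\u0C15\u0C40")
```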

The vowel sign for ai decomposes and recomposes during normalisation. It is therefore possible to find it encoded as either ై [U+0C48 TELUGU VOWEL SIGN AI] or ె + ౖ [U+0C46 TELUGU VOWEL SIGN E + U+0C56 TELUGU AI LENGTH MARK]. It is normal to use the precomposed version.

Precomposed (recommended)	Decomposed
ై [U+0C48 TELUGU VOWEL SIGN AI]	ె + ౖ [U+0C46 TELUGU VOWEL SIGN E + U+0C56 TELUGU AI LENGTH MARK]

ౖ [U+0C56 TELUGU AI LENGTH MARK] is only used in decomposed versions of ై [U+0C48 TELUGU VOWEL SIGN AI].

Independent vowels

The independent letters also should each be written with one of the single code points listed just above. The Unicode Standard warns not to make up the shapes using a combination of characters for aw and oː.

Use	Do not use!
ఓ [U+0C13 TELUGU LETTER OO]	ఒ + ౕ [U+0C12 TELUGU LETTER O + U+0C55 TELUGU LENGTH MARK]
ఔ [U+0C14 TELUGU LETTER AU]	ఒ + ౌ [U+0C12 TELUGU LETTER O + U+0C4C TELUGU VOWEL SIGN AU]

Numbers

Digits

Telugu has native digits, but they are only used infrequently.

౦␣౧␣౨␣౩␣౪␣౫␣౬␣౭␣౮␣౯

The CLDR standard-decimal pattern is #,##,##0.###. The standard-percent pattern is #,##0%.
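The grouping expressed by #,##,##0 puts the last three digits in one group and the remaining digits in groups of two. The Python sketch below illustrates this for non-negative integers only; the function name `group_indian` is an assumption of the example, and a real application would normally use a CLDR-based library instead.

```python
def group_indian(n: int) -> str:
    """Format a non-negative integer with #,##,##0 style grouping."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    digits = str(n)
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel pairs of digits off the right-hand end of the remaining part.
    while head:
        groups.insert(0, head[-2:])
        head = head[:-2]
    return ",".join(groups + [tail])


print(group_indian(1234567))
print(group_indian(100000))
```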

Fractions

The Unicode Standard describes the use of fraction characters as follows.

Prior to the adoption of the metric system, Telugu fractions were used as part of the system of measurement. Telugu fractions are quaternary (base-4), and use eight marks, which are conceptually divided into two sets. The first set represents odd-numbered negative powers of four in fractions. The second set represents even-numbered negative powers of four in fractions. Different zeros are used with each set. The zero from the first set is known as hakki, ౸ [U+0C78 TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF FOUR]. The zero for the second set is ౦ [U+0C66 TELUGU DIGIT ZERO].u,501

౸␣౹␣౺␣౻␣౼␣౽␣౾␣౿

Currency

The CLDR standard format for currency is ¤#,##,##0.00, and the symbol for the Indian rupee is ₹ [U+20B9 INDIAN RUPEE SIGN]

₹

Glyph shaping & positioning

This section brings together information about the following topics: writing styles; cursive text; context-based shaping; context-based positioning; baselines, line height, etc.; font styles; case & other character transforms.

You can experiment with examples using the Telugu character app.

Telugu has no special requirements for baseline alignment, either in general or when mixed with other scripts.

The orthography has no case distinction, and no special transforms are needed to convert between characters.

Context-based shaping & positioning

Consonant headstrokes

A majority of the Telugu consonants have a v-shaped headstroke, which is equivalent to the horizontal line in Devanagari. This headstroke is replaced when marks appear above the consonant, eg. క [U+0C15 TELUGU LETTER KA] becomes క్ when followed by an explicit virama and కి when followed by the i vowel sign.u,499

Out of 36 consonants, the following are the 9 that don't have a headstroke:

ఖ␣ఙ␣జ␣ఞ␣ట␣ణ␣బ␣ల␣ఱ

Vowel signs

Most vowel signs interact typographically with their base consonant, replacing the check mark above the base.

A few also produce slightly different joined shapes. For example, in addition to the standard కి ki, ి [U+0C3F TELUGU VOWEL SIGN I] produces shapes such as the following: గి gi, చి ci. Also, ొ [U+0C4A TELUGU VOWEL SIGN O] and ో [U+0C4B TELUGU VOWEL SIGN OO] adopt very different shapes when they follow m or y, ie. మొ mo, మో mō, యొ yo, యో yō.

Context-based positioning

Vowel signs need to be correctly positioned relative to the base character, and multiple marks can be combined with a single base character, eg. in అర్పించాలి ạr͓pim̽cāli 'offer', the i vowel sign needs to be positioned over the initial consonant in the cluster, even though it occurs after the second in memory. The glyph also has to be carefully positioned with respect to the base character it is attached to.

That example also shows the use of multiple diacritics attached to the same base consonant.

See also the special positioning rules described in reph.

Explicit shaping controls

‌ U+200C ZERO WIDTH NON-JOINER (ZWNJ) is used to prevent the formation of a conjunct (see finals).

‍ U+200D ZERO WIDTH JOINER (ZWJ) is used to control font glyph selection (see reph).

Font styling & weight

tbd

Punctuation & inline features

Word boundaries

Words are separated by spaces.

Phrase & section boundaries

,␣:␣;␣.␣?␣!␣।␣॥

Telugu uses ASCII punctuation, but may also use a couple of Indic punctuation marks.

phrase	, [U+002C COMMA] ; [U+003B SEMICOLON] : [U+003A COLON] । [U+0964 DEVANAGARI DANDA] (infrequent)
sentence	. [U+002E FULL STOP] ? [U+003F QUESTION MARK] ! [U+0021 EXCLAMATION MARK] ॥ [U+0965 DEVANAGARI DOUBLE DANDA] (infrequent)

। [U+0964 DEVANAGARI DANDA] and ॥ [U+0965 DEVANAGARI DOUBLE DANDA] are used primarily in the domain of religious texts to indicate the equivalent of a comma and full stop, respectively.u,501

Bracketed text

(␣)

Telugu commonly uses ASCII parentheses to insert parenthetical information into text.

	start	end
standard	( [U+0028 LEFT PARENTHESIS]	) [U+0029 RIGHT PARENTHESIS]

Quotations & citations

‘␣’␣“␣”

Telugu texts use quotation marks around quotations. Of course, due to keyboard design, quotations may also be surrounded by ASCII double and single quote marks.

	start	end
initial	“ [U+201C LEFT DOUBLE QUOTATION MARK]	” [U+201D RIGHT DOUBLE QUOTATION MARK]
nested	‘ [U+2018 LEFT SINGLE QUOTATION MARK]	’ [U+2019 RIGHT SINGLE QUOTATION MARK]

Single quotation marks are used for quotations within quotations.

Emphasis

tbd

Abbreviation, ellipsis & repetition

tbd

Inline notes & annotations

tbd

Other punctuation

tbd

Other inline text decoration

tbd

Line & paragraph layout

Line breaking & hyphenation

Spaces provide the main line-break opportunities. However, Telugu is an agglutinative language and Telugu words can be long. This can lead to large gaps during justification, and sometimes to words that are longer than the available column width, so it is desirable to also hyphenate words.

Hyphenation

Because of the length of Telugu words, hyphenation is very common and needed during layout, especially in narrow columns, such as newsprint.

Hyphenation mostly takes place at syllable boundaries, although there are occasional exceptions and special cases.

Observation: InDesign produces a hyphen at the end of a line to mark that hyphenation has split a word.

Figure: hyphenated Telugu text in a newspaper clipping, as produced by InDesign.

Line-edge rules

As in most writing systems, certain characters are expected not to start or end a line. For example, periods and commas shouldn't start a line, and opening parentheses shouldn't end a line.

Text alignment & justification

tbd

Text spacing

This section looks at ways in which spacing is applied between characters over and above that which is introduced during justification.

Baselines, line height, etc.

Telugu uses the so-called 'alphabetic' baseline, which is the same as for Latin and many other scripts.

Counters, lists, etc.

You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.

The modern Telugu orthography uses a native numeric style.

Numeric

The telugu numeric style is decimal-based and uses these digits.rmcs

౧␣౨␣౩␣౪␣౫␣౬␣౭␣౮␣౯␣౦

Examples:

౧␣౨␣౩␣౪␣౧౧␣౨౨␣౩౩␣౪౪␣౧౧౧␣౨౨౨␣౩౩౩␣౪౪౪

Prefixes and suffixes

Telugu commonly uses a full stop + space as a suffix.

Examples:

౧. ౨. ౩. ౪. ౫.

Separator for Telugu list counters: full stop + space.
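A decimal counter of this kind can be produced by mapping ASCII digits to the Telugu digits one for one. The Python sketch below uses hypothetical names (`TELUGU_DIGITS`, `telugu_counter`) and appends the full stop + space suffix by default.

```python
# Translation table from ASCII digits 0-9 to TELUGU DIGIT ZERO..NINE.
TELUGU_DIGITS = str.maketrans(
    "0123456789",
    "\u0C66\u0C67\u0C68\u0C69\u0C6A\u0C6B\u0C6C\u0C6D\u0C6E\u0C6F",
)


def telugu_counter(n: int, suffix: str = ". ") -> str:
    """Return n written with Telugu digits, followed by the list suffix."""
    return str(n).translate(TELUGU_DIGITS) + suffix


for i in (1, 2, 10, 444):
    print(telugu_counter(i) + "item")

# Python's int() accepts Unicode decimal digits, so the mapping round-trips.
print(int(telugu_counter(12, suffix="")))
```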

Styling initials

tbd

	labial	dental	alveolar	post-alveolar	retroflex	palatal	velar	glottal
stops	p b	t d			ʈ ɖ	c ɟ	k ɡ	ʔ
aspirated	pʰ bʰ	tʰ dʰ			ʈʰ ɖʰ		kʰ ɡʰ
affricates			t͡s d͡z	t͡ʃ d͡ʒ
aspirated				t͡ʃʰ d͡ʒʰ
fricatives	f		s	ʃ ɕ	ʂ			h
nasals	m		n		ɳ	ɲ	ŋ
approximants	ʋ		l		ɭ	j
trills/flaps			ɾ
