Updated 08-Apr-2019 • tags devanagari, scriptnotes
This page provides basic information about the Devanagari script. It is not authoritative, peer-reviewed information – these are just notes I have gathered or copied from various places as I learned. For character-specific details follow the links to the Devanagari character notes.
For similar information related to other scripts, see the Script comparison table.
अनुच्छेद १. सभी मनुष्यों को गौरव और अधिकारों के मामले में जन्मजात स्वतन्त्रता और समानता प्राप्त है । उन्हें बुद्धि और अन्तरात्मा की देन प्राप्त है और परस्पर उन्हें भाईचारे के भाव से बर्ताव करना चाहिए ।
अनुच्छेद २. सभी को इस घोषणा में सन्निहित सभी अधिकारों और आज़ादियों को प्राप्त करने का हक़ है और इस मामले में जाति, वर्ण, लिंग, भाषा, धर्म, राजनीति या अन्य विचार-प्रणाली, किसी देश या समाज विशेष में जन्म, सम्पत्ति या किसी प्रकार की अन्य मर्यादा आदि के कारण भेदभाव का विचार न किया जाएगा । इसके अतिरिक्त, चाहे कोई देश या प्रदेश स्वतन्त्र हो, संरक्षित हो, या स्त्रशासन रहित हो या परिमित प्रभुसत्ता वाला हो, उस देश या प्रदेश की राजनैतिक, क्षेत्रीय या अन्तर्राष्ट्रीय स्थिति के आधार पर वहां के निवासियों के प्रति कोई फ़रक़ न रखा जाएगा ।
Devanagari is a Northern Brahmic script related to many other South Asian scripts including Gujarati, Bengali, and Gurmukhi, and, more distantly, to a number of South-East Asian scripts including Thai, Balinese, and Baybayin. The script is used for over 120 spoken Indo-Aryan languages, including Hindi, Nepali, Marathi, Maithili, Awadhi, Newari and Bhojpuri. It is also used for writing Classical Sanskrit texts. Generally the orthography of the script reflects the pronunciation of the language.
Devanagari (/ˌdeɪvəˈnɑːɡəri/ DAY-və-NAH-gə-ree; देवनागरी, IAST: Devanāgarī, a compound of "deva" देव and "nāgarī" नागरी; Hindi pronunciation: [d̪eːʋˈnaːɡri]), also called Nagari (Nāgarī, नागरी), is an abugida (alphasyllabary) used in India and Nepal. It is written from left to right, has a strong preference for symmetrical rounded shapes within squared outlines, and is recognisable by a horizontal line that runs along the top of full letters. In a cursory look, the Devanagari script appears different from other Indic scripts such as Bengali-Assamese, Odia, or Gurmukhi, but a closer examination reveals they are very similar except for angles and structural emphasis. ...
The Devanagari script is used for over 120 languages, making it one of the most used and adopted writing systems in the world. Among the languages using it – as either their only script or one of their scripts – are Awadhi, Bhili, Bhojpuri, Bodo, Chhattisgarhi, Dogri, Haryanvi, Hindi, Kashmiri, Konkani, Magahi, Maithili, Marathi, Mundari, Nepalbhasa, Nepali, Pali, Rajasthani, Sanskrit, Santali and Sindhi. The Devanagari script is closely related to the Nandinagari script commonly found in numerous ancient manuscripts of South India, and it is distantly related to a number of southeast Asian scripts.
Devanagari is an abugida. Consonant letters have an inherent vowel sound. Combining vowel-signs are attached to the consonant to indicate that a different vowel follows the consonant. See the table in the right-hand column for a brief overview of features, taken from the Script Comparison Table.
Strictly speaking, Devanagari is based on orthographic syllables, so the vowel-sign is in fact attached to the syllable. An orthographic syllable includes clusters of consonants without intervening vowel sounds. These clusters are typically represented as partially merged forms, called conjuncts.
The Devanagari block contains more characters than other indic scripts, partly because it serves as a pivot script for transliterations of other scripts.
The effective unit of the writing systems is the orthographic syllable, consisting of a consonant and vowel (CV) core and, optionally, one or more preceding consonants, with a canonical structure of (((C)C)C)V.
Consonant letters by themselves constitute a CV unit, where the V is an inherent vowel, whose exact phonetic value may vary by writing system. Independent vowels also constitute a CV unit, where the C is considered to be null. A dependent vowel sign is used to represent a V in CV units where C is not null and V is not the inherent vowel.
In some cases, a phonological diphthong (such as Hindi जाओ ɟāọ̄) is actually written as two orthographic CV units, where the second of these units is an independent vowel letter.
Two diacritics (generally classified as vowels) can be used to represent a syllable-final nasal or an unvoiced aspiration. Medial consonants are catered for by the consonant cluster model. Diacritics are also used to nasalise vowel sounds.
The Devanagari script characters in Unicode 10.0 are contained in 2 blocks (not counting shared characters, such as punctuation):
The following links give information about characters used for languages associated with this script. The numbers in parentheses are for non-ASCII characters.
For character-specific details see Devanagari character notes.
Text is normally written horizontally, left to right.
Each consonant symbol has an inherent following vowel sound, typically transcribed as a, and pronounced ə. So क [U+0915 DEVANAGARI LETTER KA] is actually pronounced kə.
The inherent vowel is not always pronounced. For example in Hindi it is not usually pronounced at the end of a word, although a ghost echo may appear after a word-final cluster of consonants, eg. योग्य yōg͓y jogjᵊ, or राष्ट्र rāʂ͓ʈ͓r ɾəstɾᵊ.
In addition Hindi has a general rule that when a word has three or more syllables and ends in a vowel other than the inherent a, the penultimate vowel is not pronounced, eg. समझ smjʰ səməɟʰ but समझा smjʰā səmɟʰaː; रहन rhn rəhən but रहना rhnā rəhnaː. (For a number of reasons, however, this rule does not always hold.)
Devanagari uses ् [U+094D DEVANAGARI SIGN VIRAMA] (called halant in Hindi) to kill the inherent vowel after a consonant. The virama is rarely seen. As just mentioned, no virama is used at the end of a word, or in the penultimate syllable where the above rules apply. The virama is also usually hidden when the consonant is part of a consonant cluster (see clusters). The virama is visible, however, if it isn't followed by a consonant, eg. क् k͓ explicitly represents just the sound k.
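As a rough illustration of the above, here is a minimal Python sketch that builds a consonant with its inherent vowel killed and lists the code points involved. The helper name `describe` is just something made up for this note.

```python
import unicodedata

KA = "\u0915"       # क, read as kə because of the inherent vowel
VIRAMA = "\u094D"   # ्, kills the inherent vowel

def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# क् : just the sound k, stored as consonant followed by virama
dead_ka = KA + VIRAMA

for line in describe(dead_ka):
    print(line)
```

The virama is an ordinary combining character in the stored text, whether or not the font ends up showing it.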
To produce a different vowel than the inherent one, Devanagari attaches vowel-signs (Sanskrit matra) to the preceding consonant, eg. की kiː.
For Devanagari, this is always a single character, is always combining, and is always typed and stored after the preceding consonant.
Vowel-signs used for Hindi:
CLDR also includes the following:
The other vowel-signs in the Unicode Devanagari block:
Half the vowel-signs are spacing combining characters, meaning that they consume horizontal space when added to a base consonant.
There are no vowel-signs with multiple parts, and only one vowel-sign appears per base consonant.
All vowel-signs are typed and stored after the base consonant, whether or not they precede it when displayed. The font takes care of the glyph positioning.
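The storage order and the spacing/non-spacing distinction can both be checked from code. The sketch below assumes that the Unicode general categories `Mc` and `Mn` correspond to what these notes call spacing and non-spacing combining characters; the function name is hypothetical.

```python
import unicodedata

# कि : the vowel-sign is displayed to the left of KA,
# but it is stored after it.
KI = "\u0915\u093F"   # KA + VOWEL SIGN I

def vowel_sign_kind(ch: str) -> str:
    """Classify a character by its Unicode general category."""
    category = unicodedata.category(ch)
    if category == "Mc":
        return "spacing"
    if category == "Mn":
        return "non-spacing"
    return "not a combining mark"

stored_order = [f"U+{ord(ch):04X}" for ch in KI]
print(stored_order)                 # consonant first, then the vowel-sign
print(vowel_sign_kind("\u093F"))    # VOWEL SIGN I
print(vowel_sign_kind("\u0941"))    # VOWEL SIGN U
```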
See also vocalics.
ँ [U+0901 DEVANAGARI SIGN CANDRABINDU] nasalises the vowel in a syllable, eg. मुँह muŋ̽h mouth. Any vowel in Hindi can be nasalised in this way, except for the vocalics.
When a vowel-sign rises above the head line, the glyph for this character may be simplified to just a dot (which resembles the anusvara). It appears that this is normally achieved by using an anusvara instead of candrabindu, eg. हैं hɛ̄m̽ ɦɛ̃ are.

The following list shows where vowel-signs are positioned around a base consonant to produce vowels, and how many instances of that pattern there are. The number used by Hindi is shown in parentheses.
Devanagari represents standalone vowels using a set of independent vowel letters. The set contains a character to represent the inherent vowel sound.
Independent vowels used by Hindi:
CLDR also includes the following:
The other independent vowels in the Unicode Devanagari block:
Note the difference between नई nị̄ (nəị̄) new and नी nī.
In Devanagari, vocalics are available both as vowel-signs and independent vowels.
Hindi generally uses just one vocalic.
Other vocalics are used for Sanskrit.
Basic set of consonants, used for Hindi and Sanskrit. (Phonetic information for Hindi.)
Additional consonants in the Unicode block encoded as single characters.
़ [U+093C DEVANAGARI SIGN NUKTA] is used to represent foreign sounds, eg. in ख़ारीदारी kʰˑārīdārī (xārīdārī) shopping, the dot changes ख kʰ to ख़ x. Here is a list of graphemes used in Hindi that combine nukta with an existing consonant.
And here are some additional combinations:
The nukta should always be typed and stored immediately after the consonant it modifies, and before any combining vowels or diacritics.
The way the Unicode Standard recommends to represent these graphemes with code points is a little complicated.
In practice, it's hard to envisage content authors being aware of, let alone respecting, these rules. Keyboards or other input mechanisms, or perhaps sometimes normalising applications, can help, but it's likely that Devanagari text will always contain a mixture of forms for these graphemes.
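One way an application can cope with the mixture is to normalise text before comparing it. The sketch below uses Python's standard `unicodedata.normalize`; the constant names are made up for this note. As far as I understand it, U+0958 is excluded from recomposition, so NFC yields the two-code-point sequence, whereas a letter such as U+0929 is not excluded, so NFC yields the single code point.

```python
import unicodedata

QA_PRECOMPOSED = "\u0958"        # क़ as one code point
QA_DECOMPOSED = "\u0915\u093C"   # क + nukta

NNNA_PRECOMPOSED = "\u0929"      # ऩ as one code point
NNNA_DECOMPOSED = "\u0928\u093C" # न + nukta

def to_nfc(text: str) -> str:
    """Normalise text to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)

def same_grapheme(a: str, b: str) -> bool:
    """Compare two strings after normalisation."""
    return to_nfc(a) == to_nfc(b)

print(same_grapheme(QA_PRECOMPOSED, QA_DECOMPOSED))
print(same_grapheme(NNNA_PRECOMPOSED, NNNA_DECOMPOSED))
```

Comparing normalised strings avoids false mismatches in search or sorting when the same grapheme has been typed in different ways.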
The Unicode block contains the following precomposed code points.
Although traditionally classified as vowels, 2 diacritics represent syllable-final consonant sounds.
ं [U+0902 DEVANAGARI SIGN ANUSVARA] represents a nasal that is homorganic with a following consonant. It is positioned over the previous consonant, eg. हिंदी him̽dī hindiː Hindi.
Most words that use the anusvara can also be written using the consonant itself, eg. हिन्दी hin͓dī, although in some cases the anusvara form is more common. For example, पंजाब pm̽jāb pañjāb Punjab is much more common than पञ्जाब pɲ͓jāb.
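The relationship between the two spellings can be sketched in code. The following is only an illustration, assuming the traditional five stop series (each ending in its own nasal letter) and ignoring anusvara before other consonants or at the end of a word. All names are hypothetical.

```python
ANUSVARA = "\u0902"
VIRAMA = "\u094D"

# (first letter, last letter, nasal) for each of the five stop series
SERIES = [
    ("\u0915", "\u0919", "\u0919"),  # क..ङ  -> ङ
    ("\u091A", "\u091E", "\u091E"),  # च..ञ  -> ञ
    ("\u091F", "\u0923", "\u0923"),  # ट..ण  -> ण
    ("\u0924", "\u0928", "\u0928"),  # त..न  -> न
    ("\u092A", "\u092E", "\u092E"),  # प..म  -> म
]

def homorganic_nasal(consonant: str):
    """Return the nasal letter of the series containing consonant, or None."""
    for first, last, nasal in SERIES:
        if consonant and first <= consonant <= last:
            return nasal
    return None

def expand_anusvara(text: str) -> str:
    """Replace anusvara with nasal + virama where a stop consonant follows."""
    out = []
    for i, ch in enumerate(text):
        following = text[i + 1] if i + 1 < len(text) else ""
        nasal = homorganic_nasal(following) if ch == ANUSVARA else None
        out.append(nasal + VIRAMA if nasal else ch)
    return "".join(out)

print(expand_anusvara("\u0939\u093F\u0902\u0926\u0940"))  # हिंदी
print(expand_anusvara("\u092A\u0902\u091C\u093E\u092C"))  # पंजाब
```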
ः [U+0903 DEVANAGARI SIGN VISARGA] represents a voiceless h used after a vowel, usually at the end of a word, eg. छः cʰh̽ (cʰəh̽) six, दुःख duh̽kʰ grief. Mostly limited to Sanskrit loan words.
See also the candrabindu diacritic, which nasalises a vowel.
The absence of a vowel sound between two or more consonants can be visually indicated in one of the following ways.
In all cases except the last, the underlying mechanism in terms of codepoints involves adding ् [U+094D DEVANAGARI SIGN VIRAMA] between the consonants in the cluster, eg. क्ष is produced by the sequence क + ् + ष [U+0915 DEVANAGARI LETTER KA + U+094D DEVANAGARI SIGN VIRAMA + U+0937 DEVANAGARI LETTER SSA].
The font usually determines which visual method is used, although it is possible to influence this (see below).
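In code, this means a cluster is simply the consonants joined by virama. A minimal sketch (the function name `cluster` is made up for this note):

```python
KA = "\u0915"
SSA = "\u0937"
VIRAMA = "\u094D"

def cluster(*consonants: str) -> str:
    """Join consonants with virama so the font can form a conjunct."""
    return VIRAMA.join(consonants)

kssa = cluster(KA, SSA)            # क्ष
rsp = cluster("\u0930", "\u0938", "\u092A")  # र्स्प

print(kssa, [f"U+{ord(ch):04X}" for ch in kssa])
```

The stored text stays the same whichever visual form the font chooses.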
A half-form is typically created by removing the vertical line in the consonant shape, where there is one. (The vertical line is associated with the inherent vowel, and around two-thirds of Devanagari consonant shapes contain one.) There is often some additional tweaking of glyphs in order to join the components neatly.
This is more common for Sanskrit, and few modern fonts reorder glyphs in this way, or do so for a limited number of combinations.
Typically, only a small number of clusters are combined in a way that makes it difficult to spot the component parts. This is, however, the default for two particular clusters: क्ष k͓ʂ and ज्ञ ɟ͓ɲ.
When ra follows another consonant, it is typically rendered as a small, diagonal line to the left, eg. क्र ग्र भ्र. After 6 consonants, however, it is rendered as an upside-down v shape below, ie. ट्र ठ्र ड्र ढ्र ड़्र छ्र. After त it produces त्र.
When ra precedes another consonant, it is rendered as a small hook above the vertical line in the cluster, eg. र्क r͓k and र्ल r͓l. Where it precedes a cluster using half-forms, it is aligned with the vertical line of the trailing consonant, eg. र्स्प r͓s͓p. However, if there is a spacing vowel-sign with a vertical line to the right of the cluster, it aligns with that, eg. र्का r͓kā, and र्की r͓kī. (This illustrates how the basic units of the script are orthographic syllables.)
The ability to form conjuncts depends on the richness of the font. Where a font is not able to produce a half-form or ligature, etc., it will leave a visible virama glyph below the initial consonant(s) to indicate the missing vowel sound, eg. क्क k͓k.
An important consequence of representing clusters in this way is that the syllable boundaries are different. For example, if we follow the cluster with a left-positioned vowel-sign, it will now appear after the virama, rather than before the cluster, eg. compare क्कि and क्कि. This change is also reflected in segmentation of the text for line-breaking, inter-character spacing, etc.
A visible virama may also be used with a single consonant, to indicate that it is to be pronounced without the inherent vowel, eg. क् k.
The examples in the previous subsection used U+200C ZERO WIDTH NON-JOINER to force the production of a visible virama, rather than a half-form. For example:
U+200D ZERO WIDTH JOINER can be used to produce a half-form, such as क्ष, rather than क्ष. It can also be used to produce standalone half-forms (for educational text) such as घ्.
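A small sketch of how these two controls fit into the stored sequence, assuming the joiner or non-joiner is placed directly after the virama. The function names are hypothetical, and the visual result still depends on the font.

```python
VIRAMA = "\u094D"
ZWNJ = "\u200C"   # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"    # ZERO WIDTH JOINER

def with_visible_virama(first: str, second: str) -> str:
    """Ask for a visible virama between two consonants."""
    return first + VIRAMA + ZWNJ + second

def with_half_form(first: str, second: str = "") -> str:
    """Ask for a half-form of the first consonant (standalone if no second)."""
    return first + VIRAMA + ZWJ + second

kka_visible = with_visible_virama("\u0915", "\u0915")
kssa_half = with_half_form("\u0915", "\u0937")
gha_half = with_half_form("\u0918")

print(kka_visible, kssa_half, gha_half)
```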
CLDR lists the following additional characters with the general property of letter.
Used for vowel elision in Sanskrit, the main function of ऽ [U+093D DEVANAGARI SIGN AVAGRAHA] in Hindi is to show that a vowel is sustained in a cry or a shout, eg. आईऽऽऽ! ạ̄ị̄´´´!
ॐ [U+0950 DEVANAGARI OM] is a religious symbol.
Apart from the vowel-signs, the commonly-used combining marks in Hindi include the following. These are all described earlier. Follow the links for more information.
The Devanagari block also contains the following diacritics.
The Unicode Devanagari block contains 3 punctuation marks.
See phrase and abbrev for more information.
Devanagari has a set of digits that can be referred to as 'hindi' numerals. They are used regularly.
An interesting feature of large numbers written in India is that, after the rightmost three digits, the digits between commas are grouped in twos rather than threes (even when using european digits), eg. 1,23,45,678.
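The grouping can be sketched as follows. This assumes the usual convention of one group of three digits on the right and groups of two to its left, and handles integers only. The function names are made up for this note.

```python
# Translation table from european digits to Devanagari digits
TO_DEVANAGARI = str.maketrans("0123456789", "०१२३४५६७८९")

def group_indian(n: int) -> str:
    """Format an integer with a final group of three, then groups of two."""
    digits = str(abs(n))
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while head:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.append(tail)
    sign = "-" if n < 0 else ""
    return sign + ",".join(groups)

def to_devanagari_digits(text: str) -> str:
    """Swap european digits for Devanagari digits, leaving commas alone."""
    return text.translate(TO_DEVANAGARI)

grouped = group_indian(12345678)
print(grouped)
print(to_devanagari_digits(grouped))
```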
₹ [U+20B9 INDIAN RUPEE SIGN] is the symbol introduced by the Government of India in 2010 as the official currency symbol for the Indian rupee (INR).
It is distinguished from ₨ [U+20A8 RUPEE SIGN], which is an older symbol not formally tied to any particular currency. Follow that link for more information about the rupee.
Devanagari is not cursive (like Arabic), and there are no character transforms needed (such as uppercase to lowercase functions).
The shape of a character when displayed can vary, often dramatically, according to the context.
One very common example in most indic scripts is the handling of 'conjunct consonants', ie. groups of consonants with no intervening vowel sounds. Since consonants in indic scripts have an inherent vowel sound, when two consonants are combined this way you have to indicate that the vowel of the initial consonant is suppressed. This is normally done by altering the shape of the first consonant, or merging the shape of the two consonants.
To tell the font to do this, in Unicode you add ् [U+094D DEVANAGARI SIGN VIRAMA] between the two consonants. This produces the change in the shapes of the glyphs that indicates to the reader that this is a conjunct. The actual outcome is font dependent. For the word below which contains a conjunct of two ल [U+0932 DEVANAGARI LETTER LA] characters (making a long L sound) you may see a 'half-form' used for the first LA (shown on the left) or you may see (as shown on the right) a ligated form.
There are other types of context-based shaping, which are font specific. One is shown below. The width of the glyph for ि [U+093F DEVANAGARI VOWEL SIGN I] differs according to the base character to which it is attached.
Diacritics regularly combine with a vowel-sign attached to the same consonant or consonant cluster. The example below shows two combining characters that are positioned above the base character in a very common form of the verb 'to be'. One is ै [U+0948 DEVANAGARI VOWEL SIGN AI], and the other the nasalisation mark ं [U+0902 DEVANAGARI SIGN ANUSVARA].
Combining characters need to be placed in different positions, according to the context.
The example on the left below displays the dot (anusvara) immediately over the long vertical stroke. The example to the right has moved the dot slightly to the right in order to accommodate the vowel sign.
In the following the image to the left shows the normal position of ू [U+0942 DEVANAGARI VOWEL SIGN UU], beneath the first letter. The example on the right shows that character displayed higher up and to the right when combined with the base character र [U+0930 DEVANAGARI LETTER RA].
The basic grapheme for Devanagari text is the orthographic syllable, ie. one or more consonants (with virama if more than one), and a vowel. (What about finals?)
Some of the time, the Unicode concept of grapheme cluster covers the needs for segmentation of the text into text character units, but for the numerous cases where consonant clusters are used, the grapheme cluster stops after the virama.
This creates a problem because applications that rely on grapheme clusters to indicate boundaries for line-breaking, cursor movement, backspacing, vertical setting, etc. break syllabic units that should remain intact.
For Devanagari, and other north Indian scripts, applications need to provide tailored extensions to correctly segment the text at this level.
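To give an idea of what such a tailoring might look like, here is a deliberately simplified sketch that segments text into orthographic syllables with a regular expression. The character ranges are my own rough approximation (they leave out, for example, vocalic vowel-signs outside the main range and Vedic marks), and the rule that ZWNJ ends a syllable is an assumption based on the visible-virama behaviour described earlier. It is not a standard algorithm.

```python
import re

C = "[\u0915-\u0939\u0958-\u095F]"   # consonants, incl. precomposed nukta forms
N = "\u093C"                          # nukta
H = "\u094D"                          # virama
ZWJ = "\u200D"
JOINER = "[\u200C\u200D]"
M = "[\u093E-\u094C]"                 # dependent vowel-signs (simplified)
D = "[\u0900-\u0903]"                 # candrabindu, anusvara, visarga
V = "[\u0904-\u0914]"                 # independent vowels (simplified)

SYLLABLE = re.compile(
    # (consonant + virama)* consonant, then a vowel-sign or final virama
    f"(?:{C}{N}?{H}{ZWJ}?)*{C}{N}?(?:{H}{JOINER}?|{M})?{D}*"
    # or an independent vowel with optional diacritics
    f"|{V}{D}*"
    # or any other single character
    "|.",
    re.DOTALL,
)

def orthographic_syllables(text: str) -> list[str]:
    """Split text into approximate orthographic syllables."""
    return SYLLABLE.findall(text)

print(orthographic_syllables("\u0930\u093E\u0937\u094D\u091F\u094D\u0930"))  # राष्ट्र
print(orthographic_syllables("\u0939\u093F\u0902\u0926\u0940"))              # हिंदी
```

The point is only that the cluster ष्ट्र stays together as one unit, rather than being split after each virama.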
Words are separated by spaces.
Devanagari has hyphenated words – mainly conjoined nouns, eg. लाभ-हानि lābʰ-hāni profit-loss, and माता-पिता mātā-pitā parents.
Devanagari uses standard Latin punctuation, but also has its own version of a full stop, । [U+0964 DEVANAGARI DANDA], which can be seen in the sample text above. (Although an ASCII full stop is also seen after the lead-in text for each article.)
For boundaries of text above the sentence level there is ॥ [U+0965 DEVANAGARI DOUBLE DANDA].
Both of these characters (using the same code point) are used in a number of other indic scripts.
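For text processing, the danda and double danda can be treated as sentence-level terminators. The sketch below is a naive splitter: it deliberately ignores the ASCII full stop, since that also appears after lead-in text such as article numbers, and the function name is made up for this note.

```python
import re

DANDA = "\u0964"         # ।
DOUBLE_DANDA = "\u0965"  # ॥

# Split on whitespace that follows a danda, double danda, ? or !
BOUNDARY = re.compile(f"(?<=[{DANDA}{DOUBLE_DANDA}?!])\\s+")

def split_sentences(text: str) -> list[str]:
    """Split text after danda-like terminators, keeping the terminator."""
    return [part for part in BOUNDARY.split(text.strip()) if part]

sample = "\u0915 \u0964 \u0916 \u0964"   # "क । ख ।"
print(split_sentences(sample))
```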
॰ [U+0970 DEVANAGARI ABBREVIATION SIGN] is used to indicate abbreviations of words, eg. रुपया rupyā (rupayā) rupee can be abbreviated as रु॰.
Italicisation and bolding are not traditionally used for highlighting text in Devanagari.
Devanagari text can be hyphenated during line wrap.
Justification is done, principally, by adjusting the space between words. (I have no information about whether high-end systems also adjust inter-character spacing slightly if inter-word doesn't resolve the issue, or to improve aesthetics.)
Use the control below to see how your browser justifies the text sample here.
क़ानून की निग़ाह में सभी समान हैं और सभी बिना भेदभाव के समान क़ानूनी सुरक्षा के अधिकारी हैं । यदि इस घोषणा का अतिक्रमण करके कोई भी भेद-भाव किया जाया उस प्रकार के भेद-भाव को किसी प्रकार से उकसाया जाया, तो उसके विरुद्ध समान संरक्षण का अधिकार सभी को प्राप्त है ।
Devanagari content does sometimes enlarge the first part of the first word in a paragraph, in a similar way to drop caps. Instead of enlarging just the first letter in the word, it is normal to enlarge the first syllable.
In theory, the top line of the characters should align in the large text and the following first line; however, many implementations I've seen don't deal well with such niceties.
It is very common to see such initial-syllable enlargement centred inside a coloured box.
Counter-styles are used to number lists. The Ready-made Counter Styles document contains two counter styles for Devanagari.
One is numeric, and uses the decimal-based Hindi numerals, '०' '१' '२' '३' '४' '५' '६' '७' '८' '९'.
The other is alphabetic, and uses Devanagari letters in the following order, 'क' 'ख' 'ग' 'घ' 'ङ' 'च' 'छ' 'ज' 'झ' 'ञ' 'ट' 'ठ' 'ड' 'ढ' 'ण' 'त' 'थ' 'द' 'ध' 'न' 'प' 'फ' 'ब' 'भ' 'म' 'य' 'र' 'ल' 'व' 'श' 'ष' 'स' 'ह'.
You can experiment with these styles using the Counter styles converter.
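The two styles can be approximated in a few lines of Python. This is only a sketch: it assumes the alphabetic style works like a bijective numbering (after the last letter, markers get two letters, and so on) and that the numeric style is only applied to non-negative integers. The names are hypothetical.

```python
DIGITS = "०१२३४५६७८९"
LETTERS = "कखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसह"

def numeric_counter(n: int) -> str:
    """Write a non-negative integer with Devanagari digits."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return "".join(DIGITS[int(d)] for d in str(n))

def alphabetic_counter(n: int) -> str:
    """Write a positive integer as a letter marker: क, ख, ... ह, कक, ..."""
    if n < 1:
        raise ValueError("alphabetic counters start at 1")
    out = []
    while n > 0:
        n, remainder = divmod(n - 1, len(LETTERS))
        out.append(LETTERS[remainder])
    return "".join(reversed(out))

for i in (1, 2, 33, 34):
    print(numeric_counter(i), alphabetic_counter(i))
```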
Are there any special requirements for page layout or book binding?
Further information needed for this section includes:
Structural boundaries & markers Hyphens & dashes Bracketing information Quotations Abbreviations, ellipsis, & repetition Emphasis & highlights Inline notes & annotations Inline layout Inline text spacing Bidirectional text Line & paragraph layout Line breaking Hyphenation Text alignment & justification Baselines & inline alignment Page & book layout General page layout & progression Directional layout features Grids & tables Notes, footnotes, etc. Forms & user interaction Page numbering, running headers, etc.