Updated 6 July, 2020
This page gathers together basic information about the Gujarati script and its use for the Gujarati language. It aims (generally) to provide an overview of the orthography and typographic features, and (specifically) to advise how to write Gujarati using Unicode.
See also the companion document, Gujarati character notes, for detailed information about specific Unicode characters.
Phonetic transcriptions on this page should be treated as an approximate guide only. Many are more phonemic than phonetic, and there may be variations depending on the source of the transcription.
અનુચ્છેદ ૧: પ્રતિષ્ઠા અને અધિકારોની દૃષ્ટિએ સર્વ માનવો જન્મથી સ્વતંત્ર અને સમાન હોય છે. તેમનામાં વિચારશક્તિ અને અંતઃકરણ હોય છે અને તેમણે પરસ્પર બંધુત્વની ભાવનાથી વર્તવું જોઇએ.
અનુચ્છેદ ૨: દરેક વ્યક્તિને જાતિ, રંગ, લિંગ, ભાષા, ધર્મે, રાજકીય અથવા બીજા અભિપ્રાય, રાષ્ટ્રીય અથવા સામાજિક ઉદ્ભવસ્થાન, મિલકત, જન્મ અથવા મોભા જેવા કોઇપણ જાતના ભેદભાવ વગર આ ધોષણામાં રજૂ કરવામાં આવેલા સધળા અધિકારો અને સ્વતંત્રતા ભોગવવાનો હક્ક છે. વધુમાં કોઇપણ વ્યક્તિ તે સ્વતંત્ર, ટ્રસ્ટ હેઠળના સ્વશાસન હેઠળ ન હોય તેવા અથવા સાર્વભામત્વની બીજી કોઇપણ મર્યાદા હેઠળ આવેલા દેશ અથવા પ્રદેશની હોય તો પણ રાજકીય, હફમવવિષયક અથવા આંતરરાષ્ટ્રીય મોભાના ધોરણે તેની સાથે કોઇપણ ભેદભાવ રાખવામાં આવશે નહિ.
The Gujarati script is used for writing the Gujarati and Chodri languages, together spoken by almost 47 million people. It is also used alongside the Devanagari script for writing a number of languages used by the Bhil people, one of India's largest indigenous groups. The script is related to Devanagari, with modifications to some of the letters, and without the headstroke which characterizes most of the Nagari scripts. The loss of the headstroke reflects the script's origins in informal writing; until the mid-19th century it was used primarily for bookkeeping and personal correspondence, but since printing facilities have become widely available to Gujarati speakers the script is used in schools, for printing books and newspapers, in government offices and public signage, and is one of the official scripts of India.
The Gujarati script was adapted from the Devanagari script to write the Gujarati language. The Gujarati language and script developed in three distinct phases: 10th to 15th century, 15th to 17th century, and 17th to 19th century. The first phase is marked by the use of Prakrit, Apabhramsa and its variants such as Paisaci, Shauraseni, Magadhi and Maharashtri. In the second phase, the Old Gujarati script was in wide use. The earliest known document in the Old Gujarati script is a handwritten manuscript, Adi Parva, dating from 1591–92, and the script first appeared in print in a 1797 advertisement. The third phase saw the use of a script developed for ease and speed of writing, in which the shirorekha (the topline, as in Sanskrit) was abandoned. Until the 19th century it was used mainly for writing letters and keeping accounts, while the Devanagari script was used for literature and academic writings. It is also known as the śarāphī (banker's), vāṇiāśāī (merchant's) or mahājanī (trader's) script. This script became the basis for the modern script. Later the same script was adopted by writers of manuscripts. The Jain community also promoted its use for copying religious texts by hired writers.
The script is an abugida. Consonants carry an inherent vowel which can be modified by appending vowel-signs to the consonant. See the table to the right for a brief overview of features of the modern Gujarati orthography. (See the key. Character counts exclude ASCII characters.)
The following list describes some distinctive characteristics of the Gujarati script.
The absence of the inherent vowel is not always indicated. It is generally not pronounced at the end of a word, and it is sometimes elided within a word without any visual indication.
Some vowel-signs can be visually analysed into subcomponents, but Unicode usage involves only one combining character per consonant.
The consonant repertoire can be extended for transliteration of Arabic and Avestan texts.
Final consonant combining characters include the anusvara and visarga.
Gujarati text is written horizontally, left to right.
Sourcewp. Phones in a lighter colour are non-native or allophones.
The inherent vowel is usually transcribed as a and pronounced ə. So ક is pronounced kə.
Other than the inherent vowel, vowel sounds that follow a consonant sound are represented using vowel-signs, eg. કી kī.
Gujarati vowel-signs are all combining characters. A single Unicode character is used per base consonant, and there are no vowel-signs with multiple parts. All vowel-signs are typed and stored after the base consonant, whether or not they precede it when displayed, and the font puts them in the correct place for display.
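The storage order described above can be inspected directly. The following is a minimal sketch in Python (the helper name `describe` is hypothetical, chosen for illustration) that lists the code points of the syllable કિ, whose vowel-sign is displayed to the left of the consonant but stored after it.

```python
import unicodedata

# The syllable "ki": VOWEL SIGN I is displayed to the left of KA,
# but in memory it follows the base consonant.
syllable = "\u0A95\u0ABF"  # કિ


def describe(text):
    """Return 'U+XXXX NAME' for each code point, in storage order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for line in describe(syllable):
    print(line)
```

The consonant is listed first and the vowel-sign second, regardless of how a font positions the glyphs.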
About half of the vowel-signs are spacing marks, meaning that they consume horizontal space when added to a base consonant.
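One way to check which vowel-signs are spacing marks is to look at the Unicode general category: `Mc` is a spacing combining mark and `Mn` is a nonspacing mark. The sketch below assumes the thirteen vowel-signs U+0ABE to U+0ACC (including the vocalic and candra signs) are the set of interest; the variable names are illustrative only.

```python
import unicodedata

# Gujarati vowel-signs in the range U+0ABE..U+0ACC (assumed set for this sketch).
VOWEL_SIGNS = (
    "\u0ABE\u0ABF\u0AC0\u0AC1\u0AC2\u0AC3\u0AC4"
    "\u0AC5\u0AC7\u0AC8\u0AC9\u0ACB\u0ACC"
)

# Mc = spacing combining mark, Mn = nonspacing mark.
spacing = [ch for ch in VOWEL_SIGNS if unicodedata.category(ch) == "Mc"]
nonspacing = [ch for ch in VOWEL_SIGNS if unicodedata.category(ch) == "Mn"]

print("spacing:", [f"U+{ord(ch):04X}" for ch in spacing])
print("nonspacing:", [f"U+{ord(ch):04X}" for ch in nonspacing])
```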
There are separate vowel-signs for the historical short and long variants, but length is no longer distinctive in modern pronunciation. The distinction survives only in the metrical structures of verse.
ૅ [U+0AC5 GUJARATI VOWEL SIGN CANDRA E] and ૉ [U+0AC9 GUJARATI VOWEL SIGN CANDRA O] are used to represent the English æ and ɔ sounds, respectively.
See also vocalics.
ં [U+0A82 GUJARATI SIGN ANUSVARA] nasalises the vowel in a syllable, eg. મેં me˜ I.
The anusvara may also represent a nasal before a plosive.
The following list shows where vowel-signs are positioned around a base consonant to produce vowels, and how many instances of that pattern there are.
The inherent vowel is not always pronounced, even if there is no visual indication of its absence.
For example, the inherent vowel is typically not pronounced at the end of a word, eg. ઘર. The inherent vowel is still dropped when the root word is followed by suffixes or in compounds, eg. ઘરપર and ઘરકામ.
The inherent vowel may also be elided when combining morphemes, eg. the root પકડ઼ is written the same even when inflected as પકડ઼ે even though the pronunciation is pakṛe.
In other cases, the inherent vowel is simply missing, eg. વરસાદ is pronounced varsaːd.
Gujarati can also use ્ [U+0ACD GUJARATI SIGN VIRAMA] (called halant in Gujarati) to explicitly kill the inherent vowel after a consonant. The virama is rarely seen. Apart from the situations just mentioned, the virama is also usually hidden when part of a consonant cluster (see clusters).
The virama is visible, however, if it isn't followed by a consonant, eg. ક્ k͓ explicitly represents just the sound k.
Gujarati represents standalone vowels using a set of independent vowel letters. The set includes a character to represent the inherent vowel sound.
Visually, several of the standalone vowels and some vowel-signs look as if they could be composed of smaller parts, juxtaposed. For example, આ [U+0A86 GUJARATI LETTER AA] looks like it could be composed of અ + ા [U+0A85 GUJARATI LETTER A + U+0ABE GUJARATI VOWEL SIGN AA]. Similarly, ો [U+0ACB GUJARATI VOWEL SIGN O] looks like ા + ે [U+0ABE GUJARATI VOWEL SIGN AA + U+0AC7 GUJARATI VOWEL SIGN E].
These compositions and decompositions do not exist in Unicode normalisation forms, and the Unicode Standard requires the use of single codepoints rather than sequences in all cases.
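The point about normalisation can be checked with a short Python sketch. It assumes the `unicodedata` module shipped with the interpreter; the helper `is_stable` is a hypothetical name. Neither the single code point nor the look-alike sequence is changed by any normalisation form, so the two remain distinct strings that will not match each other in searches or comparisons.

```python
import unicodedata

single = "\u0A86"          # આ GUJARATI LETTER AA
sequence = "\u0A85\u0ABE"  # અ + ા, visually similar but a different string

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def is_stable(text):
    """True if no Unicode normalisation form alters the text."""
    return all(unicodedata.normalize(form, text) == text for form in FORMS)


print(is_stable(single), is_stable(sequence))
# Composing the sequence does not yield the single code point.
print(unicodedata.normalize("NFC", sequence) == single)
```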
In Gujarati, vocalics are available both as vowel-signs and independent vowels. According to CLDR, only two of the vocalics are regularly used for the Gujarati language.
The other vocalics in the Gujarati block are:
Sourcewp. Phones in a lighter colour are non-native or allophones.
The Unicode Gujarati block provides mechanisms for extending the basic set of consonants, in particular for the transcription of Arabic and Avestan.
A set of combining characters in the range 0AFA..0AFF was added to the block in Unicode v10 to allow representation of Arabic sounds. For more details, see the Unicode Standard.
઼ [U+0ABC GUJARATI SIGN NUKTA] is used to transliterate sounds in Avestan found in the texts of the Zoroastrians, who fled to Gujarat from Persia and are known as Parsis. They include the following:
The Gujarati block also has an additional consonant character to represent the letter ʒ in those transliterations.
For more information on this and other aspects of Avestan transliteration, see Proposal to encode Gujarati Letter ZHA.
Two combining characters can follow a consonant or vowel to produce a final consonant sound in a phonetic syllable.
ં [U+0A82 GUJARATI SIGN ANUSVARA] nasalises the vowel in a syllable (see nasalisation), or represents a homorganic nasal before a plosive.
ઃ [U+0A83 GUJARATI SIGN VISARGA] is a rarely used and silent hangover from Sanskrit, representing a final h.
The absence of a vowel sound between two or more consonants can be indicated visually in the following ways.
One slightly unusual aspect of Gujarati is that it sometimes borrows Devanagari glyph shapes for one or more components of a conjunct (though the characters are still the normal Gujarati ones).
In all cases except the last, the underlying mechanism in terms of codepoints involves adding ્ [U+0ACD GUJARATI SIGN VIRAMA] between the consonants in the cluster, eg. શ્ચ is produced by the sequence શ + ્ + ચ [U+0AB6 GUJARATI LETTER SHA + U+0ACD GUJARATI SIGN VIRAMA + U+0A9A GUJARATI LETTER CA].
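The codepoint mechanism can be sketched in a few lines of Python. The helper `make_cluster` is a hypothetical name used only for illustration; it places a virama between each pair of consonants, which is the sequence a font then renders as a conjunct where it can.

```python
import unicodedata

VIRAMA = "\u0ACD"  # GUJARATI SIGN VIRAMA


def make_cluster(*consonants):
    """Join consonants with a virama between each pair."""
    return VIRAMA.join(consonants)


# SHA + VIRAMA + CA, the sequence behind the conjunct શ્ચ
cluster = make_cluster("\u0AB6", "\u0A9A")

for ch in cluster:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

How the cluster is drawn (half-form, stacked, ligated, or with a visible virama) is decided at rendering time; the stored sequence is the same in each case.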
The font usually determines which visual method is used, although it is possible to influence this (see below).
A half-form is typically created by removing the vertical line in the consonant shape, where there is one. (The vertical line is associated with the inherent vowel, and around two-thirds of Gujarati consonant shapes contain one.) There is often some additional tweaking of glyphs in order to join the components neatly.
Vertical combinations are particularly common for gemination.
Some vertical combinations hang the subjoined second consonant from the bottom-left corner.
Certain clusters use Devanagari shapes for one or more of the consonants participating in the conjunct. This is a reminder that conjuncts are to a large extent a legacy of Sanskrit text.
Other clusters combine components into special ligated forms, often in a way that makes it difficult to spot the component parts.
Other clusters, particularly where there is no vertical stroke in the preceding consonant, move the components closer together, without major shape changes, so that they touch.
When ra follows another consonant, it is typically rendered as a small, diagonal line pointing downwards to the left, eg. ક્ર ગ્ર ભ્ર હ્ર શ્ર. After 5 consonants, however, it is rendered as an upside-down v shape below, ie. ટ્ર ઠ્ર ડ્ર ઢ્ર દ્ર. After ત it produces ત્ર.
When ra precedes another consonant, it is rendered as a small hook above the following consonant, or a following vertical line in the cluster, eg. ર્ક r͓k and ર્સ r͓s. Where it precedes a cluster, it is aligned with the vertical line of the trailing consonant, eg. ર્સ્પ r͓s͓p. However, if there is a spacing vowel-sign with a vertical line to the right of the cluster, it aligns with that, eg. ર્કા r͓kā, and ર્કી r͓kī. (This illustrates how the basic units of the script are orthographic syllables.)
The ability to form conjuncts depends on the richness of the font. Where a font is not able to produce a half-form or ligature, etc., it will leave a visible virama glyph below the initial consonant(s) to indicate the missing vowel sound, eg. ટ્બ ʈ͓b.
The examples in the previous subsection used U+200C ZERO WIDTH NON-JOINER to force the production of a visible virama, rather than a half-form. For example, થ + ્ + ZWNJ + થ [U+0AA5 GUJARATI LETTER THA + U+0ACD GUJARATI SIGN VIRAMA + U+200C ZERO WIDTH NON-JOINER + U+0AA5 GUJARATI LETTER THA] produces થ્થ, rather than થ્થ.
U+200D ZERO WIDTH JOINER can be used to produce a half-form, such as સ્ચ, rather than શ્ચ. It can also be used to produce standalone half-forms (for educational text) such as સ્.
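The two joiner controls can be illustrated with a small Python sketch that builds the sequences described above. The function names `visible_virama` and `half_form` are hypothetical; the sketch only constructs the strings, and the actual appearance still depends on the font and rendering engine.

```python
VIRAMA = "\u0ACD"  # GUJARATI SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER


def visible_virama(first, second):
    """Request a visible virama instead of a conjunct form."""
    return first + VIRAMA + ZWNJ + second


def half_form(first, second=""):
    """Request a half-form of `first`, optionally followed by `second`."""
    return first + VIRAMA + ZWJ + second


THA = "\u0AA5"  # થ
SA = "\u0AB8"   # સ

print(visible_virama(THA, THA))  # THA + VIRAMA + ZWNJ + THA
print(half_form(SA))             # standalone half-form request
```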
In addition to the consonants and vowel letters already mentioned, the Gujarati block contains the following letters, which CLDR lists as needed for writing the Gujarati language.
ઽ [U+0ABD GUJARATI SIGN AVAGRAHA] is used to indicate elision when writing Sanskrit.
ૐ [U+0AD0 GUJARATI OM] is used for religious texts.
In addition to the vowel-signs and vocalics already mentioned, the Gujarati block contains the following diacritics, which are also described in the sections on nasalisation, finals, absence, and extensions.
ઁ [U+0A81 GUJARATI SIGN CANDRABINDU] and ં [U+0A82 GUJARATI SIGN ANUSVARA] produce nasalisation (see nasalisation), and anusvara can also represent a final consonant, as does ઃ [U+0A83 GUJARATI SIGN VISARGA] (see finals).
્ [U+0ACD GUJARATI SIGN VIRAMA] kills the inherent vowel and creates conjuncts (see absence).
઼ [U+0ABC GUJARATI SIGN NUKTA] and the set of combining characters just below are used to extend the consonant repertoire (see extensions).
Gujarati uses western punctuation, including the following non-ASCII characters.
CLDR also lists the following non-ASCII characters.
The Unicode Gujarati block contains only one character with the general property of punctuation. It is used for abbreviations (see abbrev). CLDR for Gujarati lists this as auxiliary.
Gujarati sometimes uses the dandas from the Unicode Devanagari block (see phrase). CLDR for Gujarati doesn't mention these.
The only character in the Unicode Gujarati block with the symbol property is the Gujarati rupee sign.
This can also be written using ordinary characters, ie. રૂ૰, which is short for રૂપિયો. Note that rupee can also be abbreviated using a period, ie. રૂ..
There is a set of Gujarati digits, and they are used in the same way as Latin digits.
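Because the Gujarati digits are ordinary decimal digits in Unicode, software that is Unicode-aware can usually read them as numbers. As a small illustration, Python's built-in `int()` accepts them directly, and `unicodedata.digit()` returns the value of each one.

```python
import unicodedata

text = "\u0AE7\u0AE8\u0AE9"  # ૧૨૩

# int() accepts any Unicode decimal digits, not only ASCII ones.
value = int(text)

# Per-character digit values, looked up from the Unicode database.
digits = [unicodedata.digit(ch) for ch in text]

print(value, digits)
```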
You can experiment with examples using the Gujarati character app.
Words are separated by spaces.
Gujarati uses standard western punctuation, but may also use the Devanagari version of a full stop, । [U+0964 DEVANAGARI DANDA] (although an ASCII full stop is seen in the sample text above).
For boundaries of text above the sentence level there is ॥ [U+0965 DEVANAGARI DOUBLE DANDA].
૰ [U+0AF0 GUJARATI ABBREVIATION SIGN] is a commonly-used character in Gujarati and appears in printed materials. It is used to write abbreviations of words in Gujarati, eg. ડોક્ટર can be abbreviated as ડો૰. The Latin full stop is used interchangeably with this character, eg. ડો..
Generally, Gujarati text breaks on the spaces between words.
Characters used for the Gujarati language have the following assignments related to line-break properties.
|Class||Count||Characters|
|AL||51||૰ ઇ ઈ ઊ ઉ એ ઓ અ ઑ ઍ આ ઔ ઐ ઋ ૠ પ બ ભ ત થ દ ધ ટ ઠ ડ ઢ ક ખ ગ ઘ ચ છ જ ઝ ફ સ શ ષ હ મ ન ઞ ણ ઙ વ ર લ ળ ય ઽ ૐ|
|CM||18||િ ી ુ ૂ ે ો ૉ ૅ ા ૈ ૌ ૃ ૄ ઁ ં ઃ ઼ ્|
|NU||10||૦ ૧ ૨ ૩ ૪ ૫ ૬ ૭ ૮ ૯|
|QU||4||‘ ’ “ ”|
AL (ordinary alphabetic and symbol characters) requires other characters to provide break opportunities; otherwise, unless tailored rules are applied, no line breaks are allowed between pairs of them.
BA (break after) indicates that it is normal to break after that character.
CM (combining mark) takes on the behaviour of its base character.
NU (number) behaves like ordinary characters (AL) in the context of most characters but activates the prefix and postfix behavior of prefix and postfix characters.
PR (numeric prefix) may not be separated from following numeric characters or following opening characters, even if a space character intervenes. For example, there is no break opportunity in “฿ (100.00)”.
QU (quotation) characters can be opening or closing, or even both, depending on usage. The default is to treat them as both opening and closing.
Ready-made Counter Styles lists one, numeric, counter style for use with the Gujarati language. You can experiment with these styles using the Counter styles converter.
The gujarati numeric style is decimal-based and uses the digits shown below.
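As a rough illustration of a decimal-based numeric style, the following Python sketch maps each ASCII digit of a number to the corresponding Gujarati digit. The function name `to_gujarati_numeric` is hypothetical, and this is only an approximation of what a counter style does, not the CSS algorithm itself.

```python
# Gujarati digits zero to nine, U+0AE6..U+0AEF.
GUJARATI_DIGITS = "".join(chr(0x0AE6 + i) for i in range(10))

# Translation table from ASCII digits to Gujarati digits.
_TO_GUJARATI = str.maketrans("0123456789", GUJARATI_DIGITS)


def to_gujarati_numeric(n):
    """Format an integer using Gujarati digits (positional, base 10)."""
    return str(n).translate(_TO_GUJARATI)


for n in (1, 10, 47, 2020):
    print(n, to_gujarati_numeric(n))
```

Since the mapping is purely positional, the result round-trips through `int()`.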
The Gujarati script characters in Unicode 11.0 are in the following block:
The modern Gujarati orthography described here uses characters from the following Unicode blocks.
The infrequently used characters come from these blocks.
See also the Character usage lookup page, and the Script Comparison Table.
According to ScriptSource, the Gujarati script is used for the following languages: