Updated 14 January, 2020
This page gathers together basic information about the Gujarati script and its use for the Gujarati language. It aims (generally) to provide an overview of the orthography and typographic features, and (specifically) to advise how to write Gujarati using Unicode; for greater detail, follow the footnote links (especially those with an arrow alongside them).
For similar information related to other scripts, see the script links pages.
Clicking on red text examples, or highlighting part of the sample text shows a list of characters. Click on the vertical blue bar (bottom right) to change font settings for the sample text. Colours and annotations on panels listing characters are relevant to their use for the Gujarati language.
Unless in parentheses, the transcriptions in italics that follow Gujarati text are a transliteration developed for these pages. Those in parentheses are usually ISO transcriptions. Transcriptions in ⌈ brackets ⌋ may be phonemic or phonetic.
અનુચ્છેદ ૧: પ્રતિષ્ઠા અને અધિકારોની દૃષ્ટિએ સર્વ માનવો જન્મથી સ્વતંત્ર અને સમાન હોય છે. તેમનામાં વિચારશક્તિ અને અંતઃકરણ હોય છે અને તેમણે પરસ્પર બંધુત્વની ભાવનાથી વર્તવું જોઇએ.
અનુચ્છેદ ૨: દરેક વ્યક્તિને જાતિ, રંગ, લિંગ, ભાષા, ધર્મે, રાજકીય અથવા બીજા અભિપ્રાય, રાષ્ટ્રીય અથવા સામાજિક ઉદ્ભવસ્થાન, મિલકત, જન્મ અથવા મોભા જેવા કોઇપણ જાતના ભેદભાવ વગર આ ધોષણામાં રજૂ કરવામાં આવેલા સધળા અધિકારો અને સ્વતંત્રતા ભોગવવાનો હક્ક છે. વધુમાં કોઇપણ વ્યક્તિ તે સ્વતંત્ર, ટ્રસ્ટ હેઠળના સ્વશાસન હેઠળ ન હોય તેવા અથવા સાર્વભામત્વની બીજી કોઇપણ મર્યાદા હેઠળ આવેલા દેશ અથવા પ્રદેશની હોય તો પણ રાજકીય, હફમવવિષયક અથવા આંતરરાષ્ટ્રીય મોભાના ધોરણે તેની સાથે કોઇપણ ભેદભાવ રાખવામાં આવશે નહિ.
The Gujarati script is used for writing the Gujarati and Chodri languages, together spoken by almost 47 million people. It is also used alongside the Devanagari script for writing a number of languages used by the Bhil people, one of India's largest indigenous groups. The script is related to Devanagari, with modifications to some of the letters, and without the headstroke which characterizes most of the Nagari scripts. The loss of the headstroke reflects the script's origins in informal writing; until the mid-19th century it was used primarily for bookkeeping and personal correspondence, but since printing facilities have become widely available to Gujarati speakers the script is used in schools, for printing books and newspapers, in government offices and public signage, and is one of the official scripts of India.
The Gujarati script was adapted from the Devanagari script to write the Gujarati language. The Gujarati language and script developed in three distinct phases: 10th to 15th century, 15th to 17th century, and 17th to 19th century. The first phase is marked by the use of Prakrit, Apabramsa and its variants such as Paisaci, Shauraseni, Magadhi and Maharashtri. In the second phase, the Old Gujarati script was in wide use. The earliest known document in the Old Gujarati script is a handwritten manuscript, Adi Parva, dating from 1591–92, and the script first appeared in print in a 1797 advertisement. The third phase is marked by the use of a script developed for ease and speed of writing. The use of shirorekha (the topline as in Sanskrit) was abandoned. Until the 19th century it was used mainly for writing letters and keeping accounts, while the Devanagari script was used for literature and academic writings. It is also known as the śarāphī (banker's), vāṇiāśāī (merchant's) or mahājanī (trader's) script. This script became the basis for the modern script. Later the same script was adopted by writers of manuscripts. The Jain community also promoted its use for copying religious texts by hired writers.
The script is an abugida. Consonants carry an inherent vowel which can be modified by appending vowel-signs to the consonant. See the table to the right for a brief overview of features, taken from the Script Comparison Table.
The following list describes some distinctive characteristics of the Gujarati script.
The absence of the inherent vowel is not always indicated. It is generally not pronounced at the end of a word, and it is sometimes also elided within a word without any visual indication.
Some vowel-signs can be visually analysed into subcomponents, but Unicode usage involves only one combining character per consonant.
The consonant repertoire can be extended for transliteration of Arabic and Avestan texts.
Final consonant combining characters include the anusvara and visarga.
Gujarati text is written horizontally, left to right.
Click on the sound groups to see where else in the document each of the sounds is referred to.
Sourcewp. Phones in a lighter colour are non-native or allophones.
The inherent vowel is usually transcribed as a and pronounced ə. So ક is pronounced kə.
Other than the inherent vowel, vowel sounds that follow a consonant sound are represented using vowel-signs, eg. કી kī.
Gujarati vowel-signs are all combining characters. A single Unicode character is used per base consonant, and there are no vowel-signs with multiple parts. All vowel-signs are typed and stored after the base consonant, whether or not they precede it when displayed, and the font puts them in the correct place for display.
About half of the vowel-signs are spacing marks, meaning that they consume horizontal space when added to a base consonant.
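The storage order and the spacing/non-spacing distinction described above can be inspected with a short script. This is a minimal sketch using Python's standard `unicodedata` module; the helper names `describe` and `is_spacing_mark` are illustrative only, and the results depend on the Unicode database bundled with the Python version in use.

```python
import unicodedata

def describe(text):
    """Return (code point, name, general category) per character, in stored order."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]

def is_spacing_mark(ch):
    """True for spacing combining marks (Mc); non-spacing marks are Mn."""
    return unicodedata.category(ch) == "Mc"

# KA followed by VOWEL SIGN I. The sign is displayed to the left of the
# consonant, but it is typed and stored after it.
ki = "\u0A95\u0ABF"

for row in describe(ki):
    print(row)

# VOWEL SIGN I consumes horizontal space; VOWEL SIGN U sits below the base.
print(is_spacing_mark("\u0ABF"), is_spacing_mark("\u0AC1"))
```

Listing the code points this way is a quick check that text has been stored in logical order rather than visual order.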
There are separate vowel-signs for the historical short and long variants, but length is no longer distinctive in modern pronunciation. It is only found in metrical structures of verse.w
ૅ [U+0AC5 GUJARATI VOWEL SIGN CANDRA E] and ૉ [U+0AC9 GUJARATI VOWEL SIGN CANDRA O] are used to represent the English æ and ɔ sounds, respectively.w
See also vocalics.
ં [U+0A82 GUJARATI SIGN ANUSVARA] nasalises the vowel in a syllable, eg. મેં me˜ I.
The anusvara may also represent a nasal before a plosive.
The following list shows where vowel-signs are positioned around a base consonant to produce vowels, and how many instances of that pattern there are.
The inherent vowel is not always pronounced, even if there is no visual indication of its absence.
For example, the inherent vowel is typically not pronounced at the end of a word, eg. ઘર gʰr ɡar house. The inherent vowel is still dropped when the root word is followed by suffixes or in compounds, eg. ઘરપર gʰrpr ɡarpar on the house, and ઘરકામ gʰrkəm ɡarkəm housework. w
The inherent vowel may also be elided when combining morphemes, eg. the root પકડ઼ pkɖ̣ (pakaṛ) hold is written the same even when inflected as પકડ઼ે pkɖ̣e (pakaṛe) holds even though the pronunciation is pakṛe. w
In other cases, the inherent vowel is simply missing, eg. વરસાદ ʋrsəd (varasād) rain is pronounced varsaːd. w
Gujarati can also use ્ [U+0ACD GUJARATI SIGN VIRAMA] (called halant in Gujarati) to explicitly kill the inherent vowel after a consonant. The virama is rarely seen. Apart from the situations just mentioned, the virama is also usually hidden when part of a consonant cluster (see clusters).
The virama is visible, however, if it isn't followed by a consonant, eg. ક્ k͓ explicitly represents just the sound k.
Gujarati represents standalone vowels using a set of independent vowel letters. The set includes a character to represent the inherent vowel sound.
Visually, several of the standalone vowels and some vowel-signs look as if they could be composed of smaller parts, juxtaposed. For example, આ [U+0A86 GUJARATI LETTER AA] looks like it could be composed of અ + ા [U+0A85 GUJARATI LETTER A + U+0ABE GUJARATI VOWEL SIGN AA]. Similarly, ો [U+0ACB GUJARATI VOWEL SIGN O] looks like ા + ે [U+0ABE GUJARATI VOWEL SIGN AA + U+0AC7 GUJARATI VOWEL SIGN E].
These compositions and decompositions do not exist in Unicode normalisation forms, and the Unicode Standard requires the use of single codepoints rather than sequences in all cases.
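The lack of normalisation mappings can be checked directly. The following sketch, which assumes only Python's standard `unicodedata` module (the helper name `all_forms` is illustrative), normalises both the single code point and the look-alike sequence and shows that neither is converted into the other.

```python
import unicodedata

AA_LETTER = "\u0A86"        # single code point: GUJARATI LETTER AA
A_PLUS_AA = "\u0A85\u0ABE"  # look-alike sequence: LETTER A + VOWEL SIGN AA

def all_forms(text):
    """Return the text in each of the four Unicode normalisation forms."""
    return {
        form: unicodedata.normalize(form, text)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }

# Neither string changes under any normalisation form, so the two
# spellings never become equal to each other.
unchanged_single = all(v == AA_LETTER for v in all_forms(AA_LETTER).values())
unchanged_sequence = all(v == A_PLUS_AA for v in all_forms(A_PLUS_AA).values())
print(unchanged_single, unchanged_sequence, AA_LETTER == A_PLUS_AA)
```

One practical consequence is that a search or comparison for the single code point will not match the visually similar sequence, even after normalisation, which is why the single code point should be used.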
In Gujarati, vocalics are available both as vowel-signs and independent vowels. According to CLDR, only two of the vocalics are regularly used for the Gujarati language.
The other vocalics in the Gujarati block are:
Click on the sounds to see where else in the document they are referred to.
Sourcewp. Phones in a lighter colour are non-native or allophones.
The Unicode Gujarati block provides mechanisms for extending the basic set of consonants, in particular for the transcription of Arabic and Avestan.
A set of combining characters in the range 0AFA..0AFF was added to the block in Unicode v10 to allow representation of Arabic sounds. For more details, see the Unicode Standard. →u480
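To see what the range 0AFA..0AFF contains, the characters can be listed from the Unicode database that ships with Python. This is a sketch under the assumption that the interpreter's database is at version 10 or later; on an older database the names would fall back to the placeholder string. The function name `describe_range` is illustrative.

```python
import unicodedata

def describe_range(first, last):
    """Return (code point, name, general category) for each code point in a range."""
    rows = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "<not in this Unicode database>")
        rows.append((f"U+{cp:04X}", name, unicodedata.category(ch)))
    return rows

arabic_extension_signs = describe_range(0x0AFA, 0x0AFF)

print("Unicode database version:", unicodedata.unidata_version)
for row in arabic_extension_signs:
    print(row)
```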
઼ [U+0ABC GUJARATI SIGN NUKTA] is used to transliterate sounds in Avestan found in the texts of the Zoroastrians, who fled to Gujarat from Persia and are known as Parsis. They include the following:
The Gujarati block also has an additional consonant character to represent the letter ʒ in those transliterations.
For more information on this and other aspects of Avestan transliteration, see Proposal to encode Gujarati Letter ZHA.
Two combining characters can follow a consonant or vowel to produce a final consonant sound in a phonetic syllable.
ં [U+0A82 GUJARATI SIGN ANUSVARA] nasalises the vowel in a syllable (see nasalisation), or represents a homorganic nasal before a plosive.
ઃ [U+0A83 GUJARATI SIGN VISARGA] is a rarely used and silent hangover from Sanskrit, representing a final h.
The absence of a vowel sound between two or more consonants can be indicated visually in the following ways.
One slightly unusual aspect of Gujarati is that it sometimes borrows Devanagari glyph shapes for one or more components of a conjunct (though the characters are still the normal Gujarati ones).
In all cases except the last, the underlying mechanism in terms of codepoints involves adding ્ [U+0ACD GUJARATI SIGN VIRAMA] between the consonants in the cluster, eg. શ્ચ is produced by the sequence શ + ્ + ચ [U+0AB6 GUJARATI LETTER SHA + U+0ACD GUJARATI SIGN VIRAMA + U+0A9A GUJARATI LETTER CA].
The font usually determines which visual method is used, although it is possible to influence this (see below).
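The code point mechanism for clusters can be sketched as follows. The helper `make_cluster` is a hypothetical name introduced for illustration; it simply places U+0ACD between consecutive consonants, and says nothing about which visual form a given font will choose.

```python
VIRAMA = "\u0ACD"  # GUJARATI SIGN VIRAMA

def make_cluster(*consonants):
    """Join consonants into a cluster by inserting a virama between each pair."""
    return VIRAMA.join(consonants)

SHA = "\u0AB6"  # GUJARATI LETTER SHA
CA = "\u0A9A"   # GUJARATI LETTER CA

shca = make_cluster(SHA, CA)
print([f"U+{ord(ch):04X}" for ch in shca])
```

A three-consonant cluster is stored the same way, with a virama after each consonant except the last.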
A half-form is typically created by removing the vertical line in the consonant shape, where there is one. (The vertical line is associated with the inherent vowel, and around two-thirds of Gujarati consonant shapes contain one.) There is often some additional tweaking of glyphs in order to join the components neatly.
Vertical combinations are particularly common for gemination.
Some vertical combinations hang the subjoined second consonant from the bottom-left corner.
Certain clusters use Devanagari shapes for one or more of the consonants participating in the conjunct. This is a reminder that conjuncts are to a large extent a legacy of Sanskrit text.
Other clusters combine components into special ligated forms, often in a way that makes it difficult to spot the component parts.
Other clusters, particularly where there is no vertical stroke in the preceding consonant, move the components closer together, without major shape changes, so that they touch.
When ra follows another consonant, it is typically rendered as a small, diagonal line pointing downwards to the left, eg. ક્ર ગ્ર ભ્ર હ્ર શ્ર. After 5 consonants, however, it is rendered as an upside-down v shape below, ie. ટ્ર ઠ્ર ડ્ર ઢ્ર દ્ર. After ત it produces ત્ર.
When ra precedes another consonant, it is rendered as a small hook above the following consonant, or a following vertical line in the cluster, eg. ર્ક r͓k and ર્સ r͓s. Where it precedes a cluster, it is aligned with the vertical line of the trailing consonant, eg. ર્સ્પ r͓s͓p. However, if there is a spacing vowel-sign with a vertical line to the right of the cluster, it aligns with that, eg. ર્કા r͓kə, and ર્કી r͓kī. (This illustrates how the basic units of the script are orthographic syllables.)
The ability to form conjuncts depends on the richness of the font. Where a font is not able to produce a half-form or ligature, etc., it will leave a visible virama glyph below the initial consonant(s) to indicate the missing vowel sound, eg. ટ્બ ʈ͓b.
The examples in the previous subsection used U+200C ZERO WIDTH NON-JOINER to force the production of a visible virama, rather than a half-form. For example, થ + ્ + ZWNJ + થ [U+0AA5 GUJARATI LETTER THA + U+0ACD GUJARATI SIGN VIRAMA + U+200C ZERO WIDTH NON-JOINER + U+0AA5 GUJARATI LETTER THA] produces થ્થ, rather than થ્થ.
U+200D ZERO WIDTH JOINER can be used to produce a half-form, such as સ્ચ, rather than શ્ચ. It can also be used to produce standalone half-forms (for educational text) such as સ્.
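Because ZWNJ and ZWJ are invisible, it can help to build these sequences explicitly. The sketch below uses hypothetical helper names (`with_visible_virama`, `with_half_form`) and only constructs the code point sequences; whether a half-form or visible virama actually appears still depends on the font and rendering engine.

```python
VIRAMA = "\u0ACD"  # GUJARATI SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER

def with_visible_virama(first, second):
    """Request a visible virama: consonant + virama + ZWNJ + consonant."""
    return first + VIRAMA + ZWNJ + second

def with_half_form(first, second=""):
    """Request a half-form: consonant + virama + ZWJ (+ optional consonant).

    With no second consonant this is the standalone half-form sequence.
    """
    return first + VIRAMA + ZWJ + second

THA = "\u0AA5"  # GUJARATI LETTER THA
SHA = "\u0AB6"  # GUJARATI LETTER SHA
CA = "\u0A9A"   # GUJARATI LETTER CA

for label, text in [
    ("visible virama", with_visible_virama(THA, THA)),
    ("half-form", with_half_form(SHA, CA)),
    ("standalone half-form", with_half_form(SHA)),
]:
    print(label, [f"U+{ord(ch):04X}" for ch in text])
```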
In addition to the consonants and vowel letters already mentioned, the Gujarati block contains the following letters, which CLDR lists as needed for writing the Gujarati language.
ઽ [U+0ABD GUJARATI SIGN AVAGRAHA] is used to indicate elision when writing Sanskrit.
ૐ [U+0AD0 GUJARATI OM] is used for religious texts.
In addition to the vowel-signs and vocalics already mentioned, the Gujarati block contains the following diacritics, which are also described in the sections on nasalisation, finals, absence, and extensions.
ઁ [U+0A81 GUJARATI SIGN CANDRABINDU] and ં [U+0A82 GUJARATI SIGN ANUSVARA] produce nasalisation (see nasalisation), and anusvara can also represent a final consonant, as does ઃ [U+0A83 GUJARATI SIGN VISARGA] (see finals).
્ [U+0ACD GUJARATI SIGN VIRAMA] kills the inherent vowel and creates conjuncts (see absence).
઼ [U+0ABC GUJARATI SIGN NUKTA] and the set of combining characters just below are used to extend the consonant repertoire (see extensions).
Gujarati uses western punctuation, including the following non-ASCII characters.
CLDR also lists the following non-ASCII characters.
The Unicode Gujarati block contains only one character with the general property of punctuation. It is used for abbreviations (see abbrev). CLDR for Gujarati lists this as auxiliary.
Gujarati sometimes uses the dandas from the Unicode Devanagari block (see phrase). CLDR for Gujarati doesn't mention these.
The only character in the Unicode Gujarati block with the symbol property is the Gujarati rupee sign.
This can also be written using ordinary characters, ie. રૂ૰, which is short for રૂપિયો rūpiyo rupee. Note that rupee can also be abbreviated using a period, ie. રૂ..
There is a set of Gujarati digits, and they are used in the same way as Latin digits.
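Since the Gujarati digits are ordinary Unicode decimal digits, software can read them as numbers. The sketch below relies on Python's built-in `int()`, which accepts decimal digits from any script, and on `unicodedata.digit`; the function name `parse_gujarati_int` is illustrative.

```python
import unicodedata

def parse_gujarati_int(text):
    """Parse a string of Gujarati digits as an integer.

    Python's int() accepts any Unicode decimal digits (general category Nd).
    """
    return int(text)

year = "\u0AE8\u0AE6\u0AE8\u0AE6"  # the four digits two, zero, two, zero
print(parse_gujarati_int(year))

# Each digit also carries its numeric value in the Unicode database.
print([unicodedata.digit(ch) for ch in year])
```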
You can experiment with examples using the Gujarati picker.
Are special glyph forms needed, depending on the context in which a character is used? Do glyphs interact in some circumstances?
Are there requirements to position diacritics or other items specially, depending on context? Does the script have multiple diacritics competing for the same location relative to the base?
Does the script have special requirements for baseline alignment between mixed scripts and in general?
Are italicisation, bolding, oblique, etc relevant? Do italic fonts lean in the right direction? Is synthesised italicisation problematic? Are there other problems relating to bolding or italicisation - perhaps relating to generalised assumptions of applicability?
If the script is bicameral, are there special rules about case conversion? Are there other correspondences between glyphs, such as half- vs fullwidth presentation forms?
Do Unicode grapheme clusters appropriately segment character units for the script? Are there special requirements when double-clicking on the text, or moving through the text with the cursor, or backspace, etc.?
Are there special requirements when double-clicking on the text? Are words hyphenated?
Words are separated by spaces.
What characters are used to indicate the boundaries of phrases, sentences, and sections?
Gujarati uses standard western punctuation, but may also use the Devanagari version of a full stop, । [U+0964 DEVANAGARI DANDA] (although an ASCII full stop is seen in the sample text above).
For boundaries of text above the sentence level there is ॥ [U+0965 DEVANAGARI DOUBLE DANDA].
What characters are used as parentheses, or to bracket information?
What characters are used to indicate quotations? Do quotations within quotations use different characters? What characters are used to indicate dialogue?
What characters are used to indicate abbreviation, ellipsis & repetition?
૰ [U+0AF0 GUJARATI ABBREVIATION SIGN] is a commonly-used character in Gujarati and appears in printed materials. It is used to write abbreviations of words in Gujarati, eg. ડોક્ટર ɖok͓ʈr (dokṭar) doctor can be abbreviated as ડો૰. The Latin full stop is used interchangeably with this character, eg. ડો..
How are emphasis and highlighting achieved? If lines are drawn alongside, over or through the text, do they need to be a special distance from the text itself? Is it important to skip characters when underlining, etc? How do things change for vertically set text?
What mechanisms, if any, are used to create inline notes and annotations? (For referent-type notes such as footnotes, see below.)
Are there special rules about the way text wraps when it hits the end of a line? Does line-breaking wrap whole 'words' at a time, or characters, or something else (such as syllables in Tibetan and Javanese)? What characters should not appear at the end or start of a line, and what should be done to prevent that?
Generally, Gujarati text breaks on the spaces between words.
Character properties. Characters used for the Gujarati language have the following assignments related to line-break properties.
|Class||Count||Characters|
|AL||51||૰ ઇ ઈ ઊ ઉ એ ઓ અ ઑ ઍ આ ઔ ઐ ઋ ૠ પ બ ભ ત થ દ ધ ટ ઠ ડ ઢ ક ખ ગ ઘ ચ છ જ ઝ ફ સ શ ષ હ મ ન ઞ ણ ઙ વ ર લ ળ ય ઽ ૐ|
|CM||18||િ ી ુ ૂ ે ો ૉ ૅ ા ૈ ૌ ૃ ૄ ઁ ં ઃ ઼ ્|
|NU||10||૦ ૧ ૨ ૩ ૪ ૫ ૬ ૭ ૮ ૯|
|QU||4||‘ ’ “ ”|
AL (ordinary alphabetic and symbol characters) requires other characters to provide break opportunities; otherwise, unless tailored rules are applied, no line breaks are allowed between pairs of them.
BA (break after) indicates that it is normal to break after that character.
CM (combining mark) takes on the behaviour of its base character.
NU (number) behaves like ordinary characters (AL) in the context of most characters but activates the prefix and postfix behavior of prefix and postfix characters.
PR (numeric prefix) may not be separated from following numeric characters or following opening characters, even if a space character intervenes. For example, there is no break opportunity in “฿ (100.00)”.
QU (quotation) characters can be opening or closing, or even both, depending on usage. The default is to treat them as both opening and closing.
Is hyphenation used, or something else?
Does text in a paragraph need to have flush lines down both sides? Does the script need assistance to conform to a grid pattern? Does the script allow punctuation to hang outside the text box at the start or end of a line? Where adjustments are needed to make a line flush, how is that done? Does the script shrink/stretch space between words and/or letters? Are word baselines stretched, as in Arabic? What about paragraph indents?
Does the script create emphasis or other effects by spacing out the words, letters or syllables in a word? (For justification related spacing, see above.).
Are there list or other counter styles in use? If so, what is the format used? Do counters need to be upright in vertical text? Are there other aspects related to counters and lists that need to be addressed?
Ready-made Counter Styles lists one, numeric, counter style for use with the Gujarati language. You can experiment with these styles using the Counter styles converter.
The gujarati numeric style is decimal-based and uses the digits shown below.
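A decimal-based numeric style of this kind amounts to a digit-for-digit substitution. The following is a minimal sketch of that idea, not the Counter Styles implementation itself; the function name `to_gujarati_numeric` is hypothetical, and keeping the ASCII minus sign for negative values is an assumption made here for simplicity.

```python
# Map ASCII digits 0-9 to GUJARATI DIGIT ZERO..NINE (U+0AE6..U+0AEF).
GUJARATI_DIGITS = {ord("0") + i: 0x0AE6 + i for i in range(10)}

def to_gujarati_numeric(value):
    """Render an integer counter value using Gujarati digits.

    Assumption: a leading ASCII minus sign is left unchanged.
    """
    return str(int(value)).translate(GUJARATI_DIGITS)

for n in (1, 2, 10, 2020):
    print(n, to_gujarati_numeric(n))
```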
Does the script use special styling of the initial letter of a line or paragraph, such as for drop caps or similar? How about the size relationship between the large letter and the lines alongside? Where does the large letter anchor relative to the lines alongside? Is it normal to include initial quote marks in the large letter? Is the large letter really a syllable? etc.
How are the main text area and ancillary areas positioned and defined? Are there any special requirements here, such as dimensions in characters for the Japanese kihon hanmen? The book cover for scripts that are read right-to-left is on the right of the spine, rather than the left. When content can flow vertically and to the left or right, how to specify the location of objects, text, etc. relative to the flow? Do tables and grid layouts work as expected? How do columns work in vertical text? Can you mix blocks of vertical and horizontal text? Does text scroll in a different direction?
Does the script have special requirements for character grids or tables?
Does the script have special requirements for notes, footnotes, endnotes or other necessary annotations of this kind? (There is a section above for purely inline annotations, such as ruby or warichu. This section is more about annotation systems that separate the reference marks and the content of the notes.)
Are vertical form controls needed? Are scroll bars in an unusual position? Other special requirements for user interaction?
Are there special conventions for page numbering, or the way that running headers and the like are handled?
The Gujarati script characters in Unicode 11.0 are in the following block:
The following links give information about characters used for languages associated with this script. The numbers in parentheses are for non-ASCII characters.
For character-specific details see the Gujarati character notes.
According to ScriptSource, the Gujarati script is used for the following languages: