Punjabi notes (draft)
Gurmukhi

Updated 24 January, 2023

This page brings together basic information about the Gurmukhi script and its use for the Punjabi language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Punjabi using Unicode.

Sample


ਆਰਟੀਕਲ: 1 ਸਾਰਾ ਮਨੁੱਖੀ ਪਰਿਵਾਰ ਆਪਣੀ ਮਹਿਮਾ, ਸ਼ਾਨ ਅਤੇ ਹੱਕਾਂ ਦੇ ਪੱਖੋਂ ਜਨਮ ਤੋਂ ਹੀ ਆਜ਼ਾਦ ਹੈ ਅਤੇ ਸੁਤੇ ਸਿੱਧ ਸਾਰੇ ਲੋਕ ਬਰਾਬਰ ਹਨ । ਉਨ੍ਹਾਂ ਸਭਨਾ ਨੂੰ ਤਰਕ ਅਤੇ ਜ਼ਮੀਰ ਦੀ ਸੌਗਾਤ ਮਿਲੀ ਹੋਈ ਹੈ ਅਤੇ ਉਨ੍ਹਾਂ ਨੂੰ ਭਰਾਤਰੀਭਾਵ ਦੀ ਭਾਵਨਾ ਰਖਦਿਆਂ ਆਪਸ ਵਿਚ ਵਿਚਰਣਾ ਚਾਹੀਦਾ ਹੈ ।

ਆਰਟੀਕਲ: 2 ਹਰੇਕ ਵਿਅਕਤੀ ਨੂੰ ਭਾਵੇਂ ਉਸ ਦੀ ਕੋਈ ਨਸਲ, ਰੰਗ, ਲਿੰਗ, ਭਾਸ਼ਾ, ਧਰਮ, ਰਾਜਨੀਤਕ ਵਿਚਾਰਧਾਰਾ ਜਾਂ ਕੋਈ ਹੋਰ ਵਿਚਾਰਧਾਰਾ ਹੋਏ, ਭਾਵੇਂ ਉਸ ਦੀ ਕੋਈ ਵੀ ਜਾਇਦਾਦ ਹੋਵੇ ਅਤੇ ਭਾਵੇਂ ਉਸ ਦਾ ਕਿਤੇ ਵੀ ਜਨਮ ਹੋਇਆ ਹੋਵੇ ਤੇ ਉਸਦਾ ਕੋਈ ਵੀ ਰੁਤਬਾ ਹੋਵੇ, ਉਹ ਐਲਾਨਨਾਮੇ ਵਿਚ ਮਿਲੇ ਅਧਿਕਾਰਾਂ ਤੇ ਆਜ਼ਾਦੀਆਂ ਨੂੰ ਪ੍ਰਾਪਤ ਕਰਨ ਦਾ ਹੱਕ ਰਖਦਾ ਹੈ । ਇਸ ਤੋਂ ਵੀ ਅੱਗੇ ਇਸ ਗੱਲ ਦਾ ਕੋਈ ਭੇਦ ਭਾਵ ਨਹੀਂ ਰਖਿਆ ਜਾਏਗਾ ਕਿ ਉਹ ਵਿਅਕਤੀ ਕਿਹੜੇ ਮੁਲਕ ਦਾ ਹੈ ਅਤੇ ਉਸ ਮੁਲਕ ਦਾ ਅੰਤਰਰਾਸ਼ਟਰੀ ਰੁਤਬਾ ਕਿਹੋ ਜਿਹਾ ਹੈ । ਇਸ ਗੱਲ ਦਾ ਵੀ ਖਿਆਲ ਨਹੀਂ ਰਖਿਆ ਜਾਏਗਾ ਕਿ ਉਹ ਵਿਅਕਤੀ ਕਿਸੇ ਆਜ਼ਾਦ ਮੁਲਕ ਦਾ ਹੈ, ਜਾਂ ਉਹ ਮੁਲਕ ਕਿਸੇ ਟਰੱਸਟ ਅਧੀਨ ਹੈ ਜਾਂ ਉਸ ਦਾ ਆਪਣਾ ਸਵੈਸ਼ਾਸਨ ਨਹੀਂ ਅਤੇ ਜਾਂ ਉਹ ਕਿਸੇ ਅਜਿਹੇ ਇਲਾਕੇ ਵਿਚ ਰਹਿੰਦਾ ਹੈ ਜਿਸ ਦੀ ਪ੍ਰਭੂਸੱਤਾ ਸੀਮਤ ਹੈ ।

Usage & history

The Gurmukhi script is used in the Punjab in India, where it is the official script of the Punjabi language. The original Sikh scriptures and most of the historic Sikh literature were written in the Gurmukhi script.

Muslim speakers of Punjabi in Pakistan use a Persian version of the Arabic script (called Shahmukhi).

ਗੁਰਮੁਖੀ

The current form of Gurmukhi was developed in the 16th century by Guru Angad, successor to the founder of the Sikh religion, Guru Nanak. Its roots lie in the historical Brahmi script.

Sources: Scriptsource and Wikipedia.

Basic features

The Gurmukhi script is an abugida, ie. consonants carry an inherent vowel sound that is overridden using vowel signs. See the table to the right for a brief overview of features for the modern Punjabi orthography.

Gurmukhi text runs left to right in horizontal lines.

Words are separated by spaces.

Gurmukhi uses 32 consonant letters. The repertoire can be extended by applying the nukta diacritic to 5 characters, to represent foreign sounds, particularly for words from Persian. ❯ consonants

A final h can be indicated using the visarga, but otherwise final consonants are written using ordinary characters. ❯ finals

Although consonant clusters are frequent, there are very few conjuncts, mostly just r and h, which are subjoined. This leads to difficulties for automatic transcription. ❯ clusters

Consonant gemination is indicated, unusually for an Indian script, by a special diacritic that appears before the letter being lengthened. ❯ gemination

The Punjabi orthography has an inherent vowel, and represents other vowels using 9 vowel signs, including 1 pre-base vowel and no circumgraphs. All vowel signs are combining marks, and are stored after the base character. The inherent vowel is usually not pronounced at the end of a word, although there is often a ghost sound. ❯ vowels

There are 10 independent vowels, one for each vowel sound, including the inherent vowel, and these are used to write all standalone vowel sounds. There are no unique shapes for independent vowels. Instead, vowel signs are added to one of three consonants that are used only as vowel carriers. However, Unicode provides separate code points for all the combinations and deprecates the use of 2 of the carriers. ❯ standalone

There are two diacritics for nasalisation, tippi and bindi, each used in different phonetic contexts. ❯ nasalisation

Punjabi is a tonal language. Tones are normally indicated by the use of certain consonants, rather than diacritics. ❯ tones

Gurmukhi has its own set of native digits, however modern text tends to use ASCII digits. ❯ numbers

Punctuation is mostly western, but dandas are used for sentence and verse final punctuation.

Character index

Letters


Basic consonants

ਪ␣ਭ␣ਬ␣ਫ␣ਤ␣ਧ␣ਦ␣ਥ␣ਚ␣ਝ␣ਜ␣ਛ␣ਟ␣ਢ␣ਡ␣ਠ␣ਕ␣ਘ␣ਗ␣ਖ␣ਵ␣ਸ␣ਹ␣ਮ␣ਨ␣ਞ␣ਣ␣ਙ␣ਯ␣ਰ␣ੜ␣ਲ

Extended consonants

ਫ਼␣ਜ਼␣ਸ਼␣ਖ਼␣ਗ਼␣ਲ਼

Vowels

ਈ␣ਊ␣ਇ␣ਉ␣ਏ␣ਓ␣ਅ␣ਐ␣ਔ␣ਆ

Other

Not used for modern Punjabi

ੲ␣ੳ

Combining marks


Vowels

ੀ␣ੂ␣ਿ␣ੁ␣ੇ␣ੋ␣ੈ␣ੌ␣ਾ

Bindu

ੰ␣ਂ

Gemination

ੱ

Medial

ੵ

Nukta

਼

Virama

੍

Visarga

ਃ

Not used for modern Punjabi

Numbers

੦␣੧␣੨␣੩␣੪␣੫␣੬␣੭␣੮␣੯

Punctuation

‘␣’␣“␣”

ASCII

(␣)␣,␣.␣:␣;␣?␣!

CLDR additions

–␣—␣′␣″

Symbols


Other

‌␣‍

Phonology

These are sounds of the Punjabi language.


Phones in a lighter colour are non-native or allophones. Source Wikipedia.

Vowel sounds

ɪ ʊ ə ə ɛː ɛː ɔː ɔː äː äː

Consonant sounds

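stops: p b (labial), t d (dental), ʈ ɖ (retroflex), k ɡ (velar)
aspirated stops: ʈʰ (retroflex)
affricates: t͡ʃ d͡ʒ (post-alveolar)
aspirated affricates: t͡ʃʰ (post-alveolar)
fricatives: f v (labial), s z (alveolar), ʃ (post-alveolar), x ɣ (velar), ɦ (glottal)
nasals: m (labial), n (alveolar), ɳ (retroflex), ɲ (palatal), ŋ (velar)
approximants: ʋ~w (labial), l (alveolar), ɭ (retroflex), j (palatal)
trills/flaps: ɾ (alveolar), ɽ (retroflex)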

Note that Punjabi has no voiced aspirated stops. The letters for these sounds do exist in Gurmukhi, but are redirected to indicate tone.

Structure

The following summary is from Wikipedia. [wl]

The three retroflex consonants /ɳ, ɽ, ɭ/ do not occur initially, and the nasals /ŋ, ɲ/ occur only as allophones of /n/ in clusters with velars and palatals.

The well-established phoneme /ʃ/ may be realised allophonically as the voiceless retroflex fricative /ʂ/ in learned clusters with retroflexes.

The phonemic status of the fricatives /f, z, x, ɣ/ varies with familiarity with Hindustani norms, more so with the Gurmukhi script, with the pairs /f, pʰ/, /z, d͡ʒ/, /x, kʰ/, and /ɣ, g/ systematically distinguished in educated speech.

The retroflex lateral is most commonly analysed as an approximant as opposed to a flap.

Gurmukhi uses spaces to separate text into words. The inherent vowel is usually not pronounced at the end of a word, although there is often a ghost sound, eg.

ਜਾਲ਼

Gurmukhi tends to use independent vowels rather than semi-vowels for sequences of vowel sounds, eg.

ਓਹਾਇਓ

Tone

Punjabi is a tonal language with three tones: high (transcribed as á), low (transcribed as à), and level (not transcribed). The tones cover one or two syllables. [d]

About 75% of words have a level tone. [wl, #Tone]

Sometimes these are described as contour tones: high rising-falling, and low rising.

In some respects, the fine detail of how the tonal system works appears to be unclear. [b]

Vowels

Inherent vowel

U+0A15 GURMUKHI LETTER KA

A ə that follows a consonant has no written sign of its own. It is treated as an inherent part of the consonant letter, so writing the consonant letter alone represents both sounds. The sound is transcribed as a.

Other vowels

Non-inherent vowel sounds that follow a consonant are represented using vowel signs.

Gurmukhi vowel signs are all combining characters. A single Unicode character is used per base consonant, and there are no circumgraphs. All vowel signs are typed and stored after the base consonant, whether or not they precede it when displayed, and the rendering process puts them in the correct place for display.

An orthography that uses vowel signs is different from one that uses simple diacritics or letters for vowels, in that the vowel signs are generally rendered relative to an orthographic syllable, rather than just applied to the letter of the immediately preceding consonant (see the section on the pre-base vowel sign for an example).

Three of the vowel signs are spacing marks, meaning that they consume horizontal space when added to a base consonant.

Combining marks used for vowels

ਕੀ ki U+0A15 GURMUKHI LETTER KA + U+0A40 GURMUKHI VOWEL SIGN II

Punjabi uses the following dedicated combining marks for vowels.

ਿ␣ੀ␣ੂ␣ੁ␣ੇ␣ੋ␣ੈ␣ੌ␣ਾ

Vowels ɪ and ʊ tend to be pronounced differently in certain contexts. Followed by [U+0A39 GURMUKHI LETTER HA] they become the high tone éː and óː, respectively [d], eg.

ਕਿਹੜਾ

ਕੁਹੜਾ

The combination of an inherent vowel followed by [U+0A39 GURMUKHI LETTER HA] and then one of these 2 vowel signs produces ɛ́ː and ɔ́ː, respectively [d], eg.

ਕਹਿਣਾ

ਵਹੁਟੀ

Pre-base vowel sign

ਕਿ U+0A15 GURMUKHI LETTER KA + U+0A3F GURMUKHI VOWEL SIGN I

One vowel sign appears to the left of the base consonant letter or cluster, eg. ਕਿ

ਿ

This is a combining mark that is always stored after the base consonant. The rendering process places the glyph before the base consonant.

ਕਹਿਣਾ

ਬਹਿਣਾ
A prebase vowel, rendered to the left of the consonant after which it is pronounced.

Vowel sign placement

The following list shows where vowel signs are positioned around a base consonant to produce vowels, and how many instances of that pattern there are.

At maximum, vowel components can occur concurrently on 1 side of the base.

Standalone vowels

Gurmukhi represents standalone vowels using a set of independent vowel letters. The set includes a character to represent the inherent vowel sound.

ਈ␣ਊ␣ਇ␣ਉ␣ਏ␣ਓ␣ਅ␣ਐ␣ਔ␣ਆ

[U+0A05 GURMUKHI LETTER A] is actually classified as a null consonant with an inherent vowel.

In fact, all independent vowels in Gurmukhi are graphically a combination of one of the following three vowel carriers and a vowel sign.

ੲ␣ੳ␣ਅ

However while it's also possible to type them in this way, the Unicode Standard actually recommends that the precomposed characters be used instead. The precomposed letters don't decompose in Normalization Form D.
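The normalisation behaviour described here can be checked with a short script. The following is a minimal Python sketch that uses only the standard `unicodedata` module; the names `INDEPENDENT_VOWELS`, `carrier_sequence` and `precomposed` are illustrative and not taken from any source.

```python
import unicodedata

# The ten independent vowel letters listed above (U+0A05 ... U+0A14).
INDEPENDENT_VOWELS = "\u0A05\u0A06\u0A07\u0A08\u0A09\u0A0A\u0A0F\u0A10\u0A13\u0A14"

# None of these letters has a canonical decomposition, so NFD leaves them unchanged.
unchanged = all(
    unicodedata.normalize("NFD", ch) == ch for ch in INDEPENDENT_VOWELS
)

# A carrier + vowel sign sequence (here U+0A05 A + U+0A3E VOWEL SIGN AA) is
# therefore not canonically equivalent to the precomposed letter U+0A06 AA.
carrier_sequence = "\u0A05\u0A3E"
precomposed = "\u0A06"
equivalent = (
    unicodedata.normalize("NFC", carrier_sequence)
    == unicodedata.normalize("NFC", precomposed)
)

print(unchanged, equivalent)
```

Because normalisation does not unify the two spellings, a search or string comparison treats them as different text, which is one practical reason to prefer the precomposed letters.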

The use of the following characters is therefore deprecated by the Unicode Standard.

ੲ␣ੳ

Nasalisation

ੰ␣ਂ

Two separate diacritics are used to indicate nasalisation.

[U+0A70 GURMUKHI TIPPI] is used with vowels a, i, u, and with final ū, eg.

ਮੂੰਡਾ

[U+0A02 GURMUKHI SIGN BINDI] is used for all other vowels, eg.

ਸ਼ਾਂਤ

These diacritics can also signal gemination of a following m or n, eg.

ਲੰਮੀ

Note that if a tippi is used in a location where bindi is more appropriate, some fonts may silently convert the shape to a dot.

ਧੂੰਆਂ
The word for smoke contains both bindi and tippi.

Vowel absence

Unlike most other Indic scripts, there is generally no indication when a consonant is not pronounced with a following inherent vowel. (For the few occasions where this is made clear, see the section on consonant clusters.) Generally speaking, the reader simply has to know whether an inherent vowel is pronounced or not, eg. ਉਤਸੁਕ

The inherent vowel is generally not pronounced at the end of a word (see the previous example), although the last letter is often followed by a ghost sound, eg.

ਅੱਜ

Gurmukhi can use [U+0A4D GURMUKHI SIGN VIRAMA] (called halant in Punjabi) to kill the inherent vowel after a consonant, but it is rarely seen. It isn't used at the end of a word, and is normally only used in modern Punjabi for subjoined r, ʋ (rare), and h, in which case it is invisible.

The virama may also be used occasionally to suppress the vowel in Sanskritised text, or in dictionaries for extra phonetic information.

Vowel sounds to characters

This section maps Punjabi vowel sounds to common graphemes in the Gurmukhi orthography, grouped by dependent ( d ) or standalone ( i ) forms. Click on a grapheme to find other mentions on this page (links appear at the bottom of the page). Click on the character name to see examples and for detailed descriptions of the character(s) shown.

iː
dependent: [U+0A40 GURMUKHI VOWEL SIGN II]
standalone: [U+0A08 GURMUKHI LETTER II]

ɪ
dependent: ਿ [U+0A3F GURMUKHI VOWEL SIGN I]
standalone: [U+0A07 GURMUKHI LETTER I]

ʊ
dependent: [U+0A41 GURMUKHI VOWEL SIGN U]
standalone: [U+0A09 GURMUKHI LETTER U]

uː
dependent: [U+0A42 GURMUKHI VOWEL SIGN UU]
standalone: [U+0A0A GURMUKHI LETTER UU]

eː
dependent: [U+0A47 GURMUKHI VOWEL SIGN EE]
dependent: ਿਹ [U+0A3F GURMUKHI VOWEL SIGN I + U+0A39 GURMUKHI LETTER HA] (applies a high tone)
standalone: [U+0A0F GURMUKHI LETTER EE]

oː
dependent: [U+0A4B GURMUKHI VOWEL SIGN OO]
dependent: ੁਹ [U+0A41 GURMUKHI VOWEL SIGN U + U+0A39 GURMUKHI LETTER HA] (with a high tone)
standalone: [U+0A13 GURMUKHI LETTER OO]

ə
dependent: the inherent vowel
standalone: [U+0A05 GURMUKHI LETTER A]

ɛː
dependent: [U+0A48 GURMUKHI VOWEL SIGN AI]
dependent: ਹਿ [U+0A39 GURMUKHI LETTER HA + U+0A3F GURMUKHI VOWEL SIGN I] (applies a high tone)
standalone: [U+0A10 GURMUKHI LETTER AI]

ɔː
standalone: [U+0A14 GURMUKHI LETTER AU]

äː
dependent: [U+0A3E GURMUKHI VOWEL SIGN AA]
standalone: [U+0A06 GURMUKHI LETTER AA]

◌̃

[U+0A70 GURMUKHI TIPPI] with the inherent vowel or i and u.

[U+0A02 GURMUKHI SIGN BINDI​] with all other vowels.

Sources: Wikipedia, and Google Translate.

Tones

Gurmukhi doesn't normally use tone diacritics. Instead, certain character combinations serve to indicate high and low tones. The level tone is not marked.

Tonal stop letters

Five of the consonants – those nominally representing voiced, aspirated sounds in the Brahmi model – indicate changes in tone. The articulatory pronunciation is unaspirated and, when syllable-initial, unvoiced.

ਘ␣ਝ␣ਢ␣ਧ␣ਭ

These letters indicate a low tone when they appear at the beginning of a word or syllable or medially between a short and long vowel, eg.

ਘੋੜਾ

ਪਘਾਰਨਾ

They indicate a high tone when elsewhere [o], eg.

ਕੁਝ

ਬਾਘ
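The positional rule for these five letters can be sketched as a small lookup. This is only an illustration in Python: the function name `read_tonal_stop` and the `syllable_initial` flag are hypothetical, the sounds are taken from the sound-to-character lists later in this page, and the caller is assumed to have already worked out syllable position. The medial case between a short and a long vowel is not handled.

```python
# Letter -> (sound when syllable-initial, sound elsewhere).
TONAL_STOPS = {
    "\u0A2D": ("p", "b"),        # BHA
    "\u0A27": ("t", "d"),        # DHA
    "\u0A22": ("ʈ", "ɖ"),        # DDHA
    "\u0A18": ("k", "ɡ"),        # GHA
    "\u0A1D": ("t͡ʃ", "d͡ʒ"),     # JHA
}


def read_tonal_stop(letter: str, syllable_initial: bool) -> tuple[str, str]:
    """Return (sound, tone) for one of the five tonal stop letters."""
    initial_sound, other_sound = TONAL_STOPS[letter]
    if syllable_initial:
        # Unvoiced, unaspirated stop, and the syllable takes a low tone.
        return initial_sound, "low"
    # Voiced, unaspirated stop, and the syllable takes a high tone.
    return other_sound, "high"


print(read_tonal_stop("\u0A18", True))   # GHA at the start of ਘੋੜਾ
print(read_tonal_stop("\u0A18", False))  # GHA at the end of ਬਾਘ
```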

The letter HA

[U+0A39 GURMUKHI LETTER HA] is typically pronounced h when syllable-initial.

ਹਰੀ

ਅਹਾਰ

However, in unstressed syllables in this position it is often elided and the vowel takes on a high tone.

ਕਹਿਣਾ

ਬਹੁਤ

In non-initial positions the letter serves as a tone marker, and is not pronounced, but instead indicates that that syllable has a high tone.

After an open syllable the full letter shape is used, but if the syllable has a coda, this letter appears subjoined below the coda (see stacks).

ਮੀਹ

ਚੜ੍ਹ

When the letter ha follows a short i or u, it changes the vowel's phonetic value from [ɪ] and [ʊ] to [é] and [ó], respectively.

(The sound h after a vowel can be produced using [U+0A03 GURMUKHI SIGN VISARGA​], but it is only rarely used.)

Observation: Wiktionary contains at least one apparent exception to the above: the word ਮੂੰਹ ends with a pronounced h.

Udaat

The diacritic [U+0A51 GURMUKHI SIGN UDAAT​] can also be used in older texts to indicate a high tone.

Consonants

Gurmukhi uses 32 consonant letters. The repertoire can be extended by applying the nukta diacritic to 5 characters, to represent foreign sounds, particularly for words taken from Persian.

A final h can be indicated using the visarga, but otherwise final consonants are written using ordinary characters.

Although consonant clusters are frequent, there are very few conjuncts, mostly just r and h, which are subjoined. This leads to difficulties for automatic transcription.

Consonant gemination is indicated, unusually for an Indian script, by a special diacritic that appears before the letter being lengthened.

Basic consonants

Gurmukhi has a set of consonants that mostly map onto the traditional Brahmi phonetic matrix, though not all are used for articulatory distinctions.

ਪ␣ਭ␣ਬ␣ਫ␣ਤ␣ਧ␣ਦ␣ਥ␣ਚ␣ਝ␣ਜ␣ਛ␣ਟ␣ਢ␣ਡ␣ਠ␣ਕ␣ਘ␣ਗ␣ਖ
ਵ␣ਸ␣ਹ
ਮ␣ਨ␣ਞ␣ਣ␣ਙ
ਯ␣ਰ␣ੜ␣ਲ

[U+0A05 GURMUKHI LETTER A] is also classified as a null consonant and is described in the section on standalone vowels.

Repertoire extension

[U+0A3C GURMUKHI SIGN NUKTA] is used to represent foreign sounds, particularly for words taken from Persian, eg. in the following example the dot changes ਗ ɡ to ਗ਼ ɣ and ਜ d͡ʒ to ਜ਼ z

ਕਾਗ਼ਜ਼

The following graphemes combine nukta with an existing consonant.

ਫ਼␣ਜ਼␣ਸ਼␣ਖ਼␣ਗ਼␣ਲ਼

The nukta should always be typed and stored immediately after the consonant it modifies, and before any combining vowels or diacritics.
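A simple check for this ordering rule might look like the following Python sketch. The helper names `is_consonant` and `misplaced_nuktas` are hypothetical, and the consonant test is a rough assumption based on the code point range U+0A15 to U+0A39 plus U+0A5C, rather than on Unicode property data.

```python
NUKTA = "\u0A3C"


def is_consonant(ch: str) -> bool:
    """Rough test for a Gurmukhi consonant letter (KA..HA, plus RRA)."""
    return "\u0A15" <= ch <= "\u0A39" or ch == "\u0A5C"


def misplaced_nuktas(text: str) -> list[int]:
    """Return the indexes of nuktas that do not directly follow a consonant."""
    return [
        i
        for i, ch in enumerate(text)
        if ch == NUKTA and (i == 0 or not is_consonant(text[i - 1]))
    ]


# JA + NUKTA + VOWEL SIGN I: the nukta is in the expected position.
print(misplaced_nuktas("\u0A1C\u0A3C\u0A3F"))
# JA + VOWEL SIGN I + NUKTA: the nukta follows the vowel sign and is flagged.
print(misplaced_nuktas("\u0A1C\u0A3F\u0A3C"))
```

A check like this is useful because the vowel signs have a canonical combining class of 0, so normalisation will not move a misplaced nukta back next to its consonant.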

ਲ਼ [U+0A32 GURMUKHI LETTER LA + U+0A3C GURMUKHI SIGN NUKTA] is different from the other extended letters in that it represents a native Punjabi sound. This letter was only recently added to the Gurmukhī alphabet. Some sources do not consider it a separate letter [ws], and it may not always be written. It tends to get used in dictionaries to clarify pronunciation.

These graphemes are normally represented by decomposed sequences, and in fact that is what is produced by Unicode Normalization Form C (NFC). However, there is also a set of precomposed code points in the Unicode Gurmukhi block.

ਫ਼␣ਜ਼␣ਸ਼␣ਖ਼␣ਗ਼␣ਲ਼

The Unicode Standard recommends against using the precomposed code points for Gurmukhi, and recommends the base + nukta sequences instead. See also the section on encoding choices.

Tone-related consonants

Gurmukhi uses certain characters to indicate high and low tones. These include the letters that are nominally associated with aspirated, voiced plosives (which sounds don't exist in Punjabi) and the letter [U+0A39 GURMUKHI LETTER HA].

See tones for more information.

Final consonants

Syllable-final consonant sounds are generally represented by ordinary consonant characters (or perhaps a conjunct with h for tonal indications). However, a final h can sometimes be represented by the visarga ( [U+0A03 GURMUKHI SIGN VISARGA]).

Consonant clusters

Consonant clusters are written in one of the following ways:

  1. No special rendering. This is the standard approach in Gurmukhi.
  2. Conjuncts, where the second character appears below the first in a stack.
  3. Using yakash below the initial character (quite rare).

See also gemination.

No special rendering

Clusters of consonants without intervening vowel sounds are generally not marked in Gurmukhi. It is necessary to just know that the vowel should not be pronounced, eg. ਉਤਸੁਕ

Vertical stacks

Normally Gurmukhi produces stacks to indicate a consonant cluster for only 3 combinations. In each case, the second letter in the cluster appears subjoined to the first.

The character h in non-initial position is used to indicate tones (see the section on tones). When the h follows a consonant, it is subjoined to it, eg. ਚੜ੍ਹ

Syllable-initial clusters also occur with r and occasionally v, and are also indicated using subjoined forms, eg. ਪ੍ਰਬੰਧ ਸ੍ਵਰਗ Subjoined v is much less common in modern text.

To indicate conjunct clusters in Unicode text, add [U+0A4D GURMUKHI SIGN VIRAMA] before the subjoined character, eg. ਪ੍ਰ is produced by the sequence ਪ + ੍ + ਰ [U+0A2A GURMUKHI LETTER PA + U+0A4D GURMUKHI SIGN VIRAMA + U+0A30 GURMUKHI LETTER RA].
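As a small illustration of the stored sequence, the following Python sketch builds the code points for a subjoined form. The helper name `subjoined` is hypothetical, and whether a stack is actually displayed depends on the font and rendering engine.

```python
VIRAMA = "\u0A4D"


def subjoined(first: str, second: str) -> str:
    """Return the stored sequence that requests `second` subjoined to `first`."""
    return first + VIRAMA + second


pra = subjoined("\u0A2A", "\u0A30")  # PA + VIRAMA + RA
print([f"U+{ord(ch):04X}" for ch in pra])
```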

The virama may also be used occasionally to suppress the vowel in Sanskritised text, or in dictionaries for extra phonetic information. [ws] In some fonts subjoined forms are available for other consonant clusters, though they are not used unless the content author uses a virama.

ਪ੍ਤ ਪ੍ਦ ਪ੍ਥ ਪ੍ਚ ਪ੍ਟ ਪ੍ਠ ਪ੍ਗ ਪ੍ਹ ਪ੍ਵ ਪ੍ਨ ਪ੍ਰ
Conjunct forms produced after the letter p by the Mukta Mahee font. They include the following sounds, in order of appearance: pt pd ptʰ pt͡ʃ pʈ pʈʰ pg pɦ pʋ pn pr.

See a table of 2-consonant clusters.
The table allows you to test results for various fonts.

Yakash

Occasionally, a cluster ending with y is rendered using [U+0A75 GURMUKHI SIGN YAKASH​], though this appears to be quite rare, eg.

ਕਲੵਚਰੈ kly̆ʧrɛ culture

Geminated consonants

Doubling or reinforcement of a consonant sound is indicated, unusually for an Indic script, using a diacritic, [U+0A71 GURMUKHI ADDAK]. It is typed before the consonant (in this way it resembles the small tsu in Japanese), and is placed to the left of the consonant it affects (not over it), eg. ਪੱਕੀ

The diacritic may appear over the right side of the preceding consonant, but if that consonant has a vowel sign or extension above the horizontal topline, it may be displayed on a short extension of the joining line.

ਉੱਛਲ ਭੇੱਲ ਭੁੱਲ
Placement of the addak diacritic

Geminated mm and nn may be written using a nasalisation diacritic associated with the preceding vowel [d], eg. ਲੰਮੀ

Consonant sounds to characters

This section maps Punjabi consonant sounds to common graphemes in the Gurmukhi orthography. Click on a grapheme to find other mentions on this page (links appear at the bottom of the page). Click on the character name to see examples and for detailed descriptions of the character(s) shown.

Stops

p

[U+0A2A GURMUKHI LETTER PA]

[U+0A2D GURMUKHI LETTER BHA] syllable-initial, signals a low tone 

b

[U+0A2C GURMUKHI LETTER BA]

[U+0A2D GURMUKHI LETTER BHA] NOT syllable-initial, signals a high tone 

t

[U+0A24 GURMUKHI LETTER TA]

[U+0A27 GURMUKHI LETTER DHA]  syllable-initial, signals a low tone

d

[U+0A26 GURMUKHI LETTER DA]

[U+0A27 GURMUKHI LETTER DHA]  NOT syllable-initial, signals a high tone

ʈ

[U+0A1F GURMUKHI LETTER TTA]

[U+0A22 GURMUKHI LETTER DDHA]  syllable-initial, signals a low tone

ɖ

[U+0A21 GURMUKHI LETTER DDA]

[U+0A22 GURMUKHI LETTER DDHA]  NOT syllable-initial, signals a high tone

k

[U+0A15 GURMUKHI LETTER KA]

[U+0A18 GURMUKHI LETTER GHA]  syllable-initial, signals a low tone

ɡ

[U+0A17 GURMUKHI LETTER GA]

[U+0A18 GURMUKHI LETTER GHA]  NOT syllable-initial, signals a high tone

Affricates

t͡ʃ

[U+0A1A GURMUKHI LETTER CA]

[U+0A1D GURMUKHI LETTER JHA]  syllable-initial, signals a low tone

d͡ʒ

[U+0A1C GURMUKHI LETTER JA]

[U+0A1D GURMUKHI LETTER JHA]  NOT syllable-initial, signals a high tone

Fricatives

f

ਫ਼ [U+0A2B GURMUKHI LETTER PHA + U+0A3C GURMUKHI SIGN NUKTA​]

[U+0A5E GURMUKHI LETTER FA]  (decomposes in NFC and doesn't recompose) 

z

ਜ਼ [U+0A1C GURMUKHI LETTER JA + U+0A3C GURMUKHI SIGN NUKTA​]

[U+0A5B GURMUKHI LETTER ZA]  (decomposes in NFC and doesn't recompose) 

ʃ

ਸ਼ [U+0A38 GURMUKHI LETTER SA + U+0A3C GURMUKHI SIGN NUKTA​]

[U+0A36 GURMUKHI LETTER SHA]  (decomposes in NFC and doesn't recompose)

x

ਖ਼ [U+0A16 GURMUKHI LETTER KHA + U+0A3C GURMUKHI SIGN NUKTA​]

[U+0A59 GURMUKHI LETTER KHHA]  (decomposes in NFC and doesn't recompose)

ɣ

ਗ਼ [U+0A17 GURMUKHI LETTER GA + U+0A3C GURMUKHI SIGN NUKTA​]

[U+0A5A GURMUKHI LETTER GHHA]  (decomposes in NFC and doesn't recompose)

Nasals

m

[U+0A2E GURMUKHI LETTER MA]

[U+0A70 GURMUKHI TIPPI] with the inherent vowel or i and u, followed by a labial consonant.

[U+0A02 GURMUKHI SIGN BINDI​] with other vowels before a labial consonant. 

n

[U+0A28 GURMUKHI LETTER NA]

[U+0A70 GURMUKHI TIPPI] with the inherent vowel or i and u, followed by an alveolar consonant.

[U+0A02 GURMUKHI SIGN BINDI​] with other vowels before an alveolar consonant. 

ɲ

[U+0A28 GURMUKHI LETTER NA] before a palatal consonant

[U+0A70 GURMUKHI TIPPI] with the inherent vowel or i and u, followed by a palatal consonant.

[U+0A02 GURMUKHI SIGN BINDI​] with other vowels before a palatal consonant. 

[U+0A1E GURMUKHI LETTER NYA] (rare) 

ŋ

[U+0A28 GURMUKHI LETTER NA] before a velar consonant

[U+0A70 GURMUKHI TIPPI] with the inherent vowel or i and u, followed by a velar consonant.

[U+0A02 GURMUKHI SIGN BINDI​] with other vowels before a velar consonant. 

[U+0A19 GURMUKHI LETTER NGA] (rare)

Other

w

[U+0A35 GURMUKHI LETTER VA] occasionally as a variant of ʋ

ɭ

ਲ਼ [U+0A32 GURMUKHI LETTER LA + U+0A3C GURMUKHI SIGN NUKTA​]

[U+0A33 GURMUKHI LETTER LLA]   (decomposes in NFC and doesn't recompose) 

Sources: Wikipedia, and Google Translate.

Encoding choices

This section looks at alternative strategies for typing and storing the nukta and considers the effects of normalising the text using Unicode Normalisation Form D (NFD), and Normalisation Form C (NFC).

The decomposed form is recommended by the Unicode Standard. NFC does not recombine the parts into precomposed characters. Instead, normalisation produces decomposed forms for both NFC and NFD. So both alternatives are canonically equivalent, but decomposed is recommended.

Precomposed → Decomposed (recommended)
[U+0A5E GURMUKHI LETTER FA] → [U+0A2B GURMUKHI LETTER PHA + U+0A3C GURMUKHI SIGN NUKTA]
[U+0A5B GURMUKHI LETTER ZA] → [U+0A1C GURMUKHI LETTER JA + U+0A3C GURMUKHI SIGN NUKTA]
[U+0A36 GURMUKHI LETTER SHA] → [U+0A38 GURMUKHI LETTER SA + U+0A3C GURMUKHI SIGN NUKTA]
[U+0A59 GURMUKHI LETTER KHHA] → [U+0A16 GURMUKHI LETTER KHA + U+0A3C GURMUKHI SIGN NUKTA]
[U+0A5A GURMUKHI LETTER GHHA] → [U+0A17 GURMUKHI LETTER GA + U+0A3C GURMUKHI SIGN NUKTA]
[U+0A33 GURMUKHI LETTER LLA] → [U+0A32 GURMUKHI LETTER LA + U+0A3C GURMUKHI SIGN NUKTA]
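The pairs listed here can be verified with Python's standard `unicodedata` module. This is a minimal sketch; the dictionary name `PRECOMPOSED` and the variable `results` are illustrative only.

```python
import unicodedata

# Precomposed code point -> recommended base + nukta sequence.
PRECOMPOSED = {
    "\u0A5E": "\u0A2B\u0A3C",  # FA   -> PHA + NUKTA
    "\u0A5B": "\u0A1C\u0A3C",  # ZA   -> JA  + NUKTA
    "\u0A36": "\u0A38\u0A3C",  # SHA  -> SA  + NUKTA
    "\u0A59": "\u0A16\u0A3C",  # KHHA -> KHA + NUKTA
    "\u0A5A": "\u0A17\u0A3C",  # GHHA -> GA  + NUKTA
    "\u0A33": "\u0A32\u0A3C",  # LLA  -> LA  + NUKTA
}

# For each normalisation form, check that every precomposed letter
# comes out as the decomposed sequence.
results = {
    form: all(
        unicodedata.normalize(form, pre) == dec for pre, dec in PRECOMPOSED.items()
    )
    for form in ("NFC", "NFD")
}

# The decomposed sequences themselves are left alone by NFC.
stable = all(unicodedata.normalize("NFC", dec) == dec for dec in PRECOMPOSED.values())

print(results, stable)
```

In practice this means that normalising text to NFC before storage or comparison is enough to bring both spellings to the recommended form.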

Symbols

Gurmukhi uses a couple of religious symbols.

ੴ␣☬

[U+0A74 GURMUKHI EK ONKAR] can have various different forms. Unicode classes it as a letter. The shape in the Unicode charts is highly stylised.

The stylised shape of ek onkar in the Unicode chart.

The other religious symbol, [U+262C ADI SHAKTI], is encoded in Unicode's Miscellaneous Symbols block.

Numbers

Gurmukhi has its own set of decimal digits, however modern text tends to use ASCII digits. [ws]

੦␣੧␣੨␣੩␣੪␣੫␣੬␣੭␣੮␣੯

In some cases the choice of digits depends on the context. For example, list counter styles often use Gurmukhi digits, whereas postcodes, route numbers, and ordinal dates, etc. tend to use ASCII digits. (GitHub: https://github.com/r12a/scripts/issues/118#issuecomment-1235187772)
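Software that reads Punjabi text may meet either digit set. The following Python sketch shows one way to handle both; the table `GURMUKHI_TO_ASCII` and the function `to_ascii_digits` are hypothetical names introduced for illustration. It relies on the fact that Python's built-in `int()` accepts Unicode decimal digits, not only ASCII ones.

```python
# Map U+0A66..U+0A6F (Gurmukhi digits zero to nine) to ASCII digits.
GURMUKHI_TO_ASCII = {0x0A66 + i: str(i) for i in range(10)}


def to_ascii_digits(text: str) -> str:
    """Replace Gurmukhi digits with ASCII digits, leaving other text alone."""
    return text.translate(GURMUKHI_TO_ASCII)


gurmukhi_number = "\u0A67\u0A68\u0A69"  # the digits one, two, three
value = int(gurmukhi_number)             # int() understands Gurmukhi digits
print(value, to_ascii_digits(gurmukhi_number + "."))
```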

Text direction

Gurmukhi script runs left to right in horizontal lines.


Glyph shaping & positioning

This section brings together information about the following topics: writing styles; cursive text; context-based shaping; context-based positioning; baselines, line height, etc.; font styles; case & other character transforms.

You can experiment with examples using the Gurmukhi character app.

Gurmukhi text is not cursive (ie. joined up like Arabic).

The orthography has no case distinction, and no special transforms are needed to convert between characters.

Font styling & weight

tbd

Graphemes

Although grapheme clusters alone provide good segmentation for many Gurmukhi syllables (because of the lack of conjuncts), they are not sufficient to represent typographic units for stacks. Stacks are common and must not be split apart by edit operations that visually change the text (such as letter-spacing, first-letter highlighting, and in-word line breaking). For those operations one needs to segment the text using orthographic syllables, which string grapheme clusters together with a virama.

The Gurmukhi virama (halant) is [U+0A4D GURMUKHI SIGN VIRAMA​], which has an Indic Syllabic Category of Virama.

Grapheme clusters

Base Combining_mark* ZW(N)J?

Grapheme clusters cover the combinations described just above. In these sequences, Gurmukhi combining marks used for Punjabi may include zero or more of the following types of character.

  1. Nukta [1] (see Repertoire extension) Only one per grapheme cluster, typed and stored immediately after the base consonant.
  2. Dependent vowels [9] (see Combining marks used for vowels) Always a single code point.
  3. Bindi [2] (see Nasalisation) Occurs over an independent vowel or over a consonant.
  4. Gemination mark [1] (see Geminated consonants) Occurs after a consonant (with optional nukta).
  5. Other marks [3] (see Yakash, Final consonants and Udaat) All are uncommon and occur after a consonant (with optional nukta).
  6. Virama (halant) (see Consonant clusters and Vowel absence) Occurs immediately after a consonant (and optional nukta) at the beginning of a cluster. In modern Punjabi text stacks are only formed with a subjoined RA or HA, and on rare occasions VA. Other consonant clusters are simple sequences of consonant letters.

ZW(N)J is not usually found in Gurmukhi text.

The following examples show a variety of typical grapheme clusters:



ਧੂੰਆਂ

ਖਿੱਚਣਾ

ਕਾਗ਼ਜ਼

ਭਿਖੵਾ

ਕ਼ੌਮ

ਪ੍ਰਭੂ

ਹੜ੍ਹ

Note how grapheme clusters segment the parts of a stack after the virama in the last 2 examples. This is not always desirable (see Larger typographic units, just below).

Larger typographic units

(Consonant Nukta? Virama)* Grapheme_cluster

Gurmukhi stacks medial RA (and sometimes VA) and HA after a syllable coda (see clusters). The stacks represent consonant clusters (but not gemination, which is indicated using a diacritic).

Grapheme clusters terminate after a sequence of marks that includes a halant, but editorial operations that change the visual appearance of the text, such as letter-spacing, first-letter highlighting, line-breaking, and justification, should never split conjunct forms apart. For this reason, an alternative way of segmenting graphemes is needed. This may not apply, however, for some other operations such as cursor movement or backwards delete.

Where stacks appear, a typographic unit contains multiple grapheme clusters. The non-final grapheme clusters all end with [U+0A4D GURMUKHI SIGN VIRAMA​], and the final grapheme cluster begins with a consonant.
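The two levels of segmentation can be sketched in Python as follows. This is a simplified illustration, not a full implementation of Unicode text segmentation: the function names `grapheme_clusters` and `orthographic_syllables` are hypothetical, marks are identified only by the general categories Mn and Mc, and any cluster ending in a virama is joined to a following cluster that starts with a letter.

```python
import unicodedata

VIRAMA = "\u0A4D"
ZERO_WIDTH_JOINERS = {"\u200C", "\u200D"}  # ZWNJ, ZWJ


def grapheme_clusters(text: str) -> list[str]:
    """Simplified clusters: Base Combining_mark* ZW(N)J?"""
    clusters: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        if clusters and (is_mark or ch in ZERO_WIDTH_JOINERS):
            clusters[-1] += ch      # attach the mark to the current base
        else:
            clusters.append(ch)     # start a new cluster
    return clusters


def orthographic_syllables(text: str) -> list[str]:
    """Join clusters across a virama so that stacks stay in one unit."""
    units: list[str] = []
    join_next = False
    for cluster in grapheme_clusters(text):
        starts_with_letter = unicodedata.category(cluster[0]) == "Lo"
        if join_next and units and starts_with_letter:
            units[-1] += cluster
        else:
            units.append(cluster)
        join_next = cluster.endswith(VIRAMA)
    return units


word = "\u0A2A\u0A4D\u0A30\u0A2D\u0A42"  # PA VIRAMA RA BHA VOWEL SIGN UU
print(grapheme_clusters(word))
print(orthographic_syllables(word))
```

For this word the first function splits the stack after the virama, while the second keeps the stacked consonants together, which is the behaviour wanted for operations such as letter-spacing or line breaking.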

The following are examples. Some examples were shown in the previous section: here the stack is treated as a single typographic unit.



ਪ੍ਰਭੂ

ਹੜ੍ਹ

ਅੰਮ੍ਰਿਤਸਰ

ਪ੍ਰਾਂਗਾਰ

ਫ਼੍ਰੈਂਚ

Browser behaviour

Test in your browser. Some of the words below test units that equate to grapheme clusters only, and others include conjuncts. First, the text is displayed in a contenteditable paragraph, then in a textarea. Results are reported for Gecko (Firefox), Blink (Chrome), and WebKit (Safari) on a Mac.

ਧੂੰਆਂ ਕਾਗ਼ਜ਼ ਪ੍ਰਭੂ ਹੜ੍ਹ ਫ਼੍ਰੈਂਚ

Cursor movement. Move the cursor through the text.
Gecko steps through the text using grapheme clusters. It takes 2 steps to get through the stacks, one grapheme cluster at a time. Blink and WebKit step through all words using the orthographic syllables as described here (ie. they step over a stack and all associated combining characters in one jump).

Selection. Place the cursor next to a character and hold down shift while pressing an arrow key.
The behaviour is the same as for cursor movement.

Deletion. Forward deletion works in the same way as cursor movement. The backspace key deletes code point by code point, for all browsers.

Line-break. See this test. The CSS sets the value of the line-break property to anywhere. Change the size of the box to slowly move the line break point.
Gecko, WebKit and Blink all wrap on orthographic syllable boundaries.

Punctuation & inline features

Word boundaries

Words are separated by spaces.

Some grammatical suffixes are separated from their stem by a space, and the two parts should not be separated. (GitHub: https://github.com/r12a/scripts/issues/118#issue-1359804026)

Words are occasionally hyphenated, eg.

ਚਿੜੀ-ਛਿੱਕਾ

Phrase & section boundaries

,␣:␣;␣.␣?␣!␣।

Gurmukhi generally uses western punctuation.

phrase

, [U+002C COMMA]

; [U+003B SEMICOLON]

: [U+003A COLON]

sentence

. [U+002E FULL STOP]

? [U+003F QUESTION MARK]

! [U+0021 EXCLAMATION MARK]

[U+0964 DEVANAGARI DANDA] (infrequently)

[U+0964 DEVANAGARI DANDA] may be used rather than a period at the end of a sentence.

ਭਾਰਤ ਵਿਚ ਮੌਰੀਸ਼ਸ ਦੇ ਰਾਜਦੂਤ ਸ੍ਰੀਮਤੀ ਮੇਰੀ ਕਲੇਅਰ ਜੇ. ਮੌਂਟੀ ਅੱਜ ਸ੍ਰੀ ਹਰਿਮੰਦਰ ਸਾਹਿਬ ਵਿਖੇ ਦਰਸ਼ਨ ਕਰਨ ਪੁੱਜੇ। ਉਨ੍ਹਾਂ ਸ਼ਰਧਾ ਸਹਿਤ ਮੱਥਾ ਟੇਕ ਕੇ ਗੁਰੂ ਰਾਮਦਾਸ ਜੀ ਲੰਗਰ ਹਾਲ ਵਿਖੇ ਵੀ ਕੁੱਝ ਸਮਾਂ ਸੇਵਾ ਕੀਤੀ।
Examples of danda in use as a sentence delimiter in a newspaper report.
Translation: The Ambassador of Mauritius to India Mrs. Mary Claire J. Monty came to visit Sri Harmandir Sahib today. She bowed down with reverence and served for some time at Guru Ramdas Ji's Langar Hall.

Bracketed text

(␣)

Punjabi commonly uses ASCII parentheses to insert parenthetical information into text.

standard: start ( [U+0028 LEFT PARENTHESIS], end ) [U+0029 RIGHT PARENTHESIS]

Quotations & citations

‘␣’␣“␣”

Punjabi texts typically use quotation marks. Of course, due to keyboard design, quotations may also be surrounded by ASCII double and single quote marks.

initial: start “ [U+201C LEFT DOUBLE QUOTATION MARK], end ” [U+201D RIGHT DOUBLE QUOTATION MARK]
nested: start ‘ [U+2018 LEFT SINGLE QUOTATION MARK], end ’ [U+2019 RIGHT SINGLE QUOTATION MARK]

Single quotation marks are used for quotations within quotations.

Emphasis

tbd

Abbreviation, ellipsis & repetition

[U+0A03 GURMUKHI SIGN VISARGA] is used very occasionally in Gurmukhi. In some cases it acts like a Sanskrit visarga, producing a voiceless h sound, but in others it represents an abbreviation, in the same way the period is used in English. [ws]

However, contractions are very common in Punjabi text, and a much more common way of indicating these is to use ' [U+0027 APOSTROPHE]. A particularly common contraction is to represent ਵਿੱਚ as 'ਚ. (GitHub: https://github.com/r12a/scripts/issues/118#issue-1359804026)

ਇਸਕੋਨ ਦੇ ਉਪ ਪ੍ਰਧਾਨ ਰਾਧਾਰਮਨ ਨੇ ਕਿਹਾ ਕਿ ਆਸਟ੍ਰੇਲੀਆ ’ਚ ਹਿੰਦੂ ਮੰਦਿਰਾਂ ’ਤੇ ਹਮਲਿਆਂ ਦੀਆਂ ਵਧਦੀਆਂ ਘਟਨਾਵਾਂ ਚਿੰਤਾਜਨਕ ਹਨ।
Examples of abbreviated words using apostrophes.
translation

ISKCON Vice President Radharaman said that the increasing incidence of attacks on Hindu temples in Australia is alarming.

Inline notes & annotations

tbd

Other punctuation

CLDR also lists the following non-ASCII characters.

‐␣–␣—␣′␣″

Other inline text decoration

tbd

Line & paragraph layout

Line breaking & hyphenation

By default, Gurmukhi breaks lines at inter-word spaces.


Text alignment & justification

tbd

Text spacing

tbd

This section looks at ways in which spacing is applied between characters over and above that which is introduced during justification.

Baselines, line height, etc.

Gurmukhi uses the so-called 'alphabetic' baseline, which is the same as for Latin and many other scripts.

It also has a 'hanging baseline', which may be used for text alignment in things such as initial letter highlighting. The hanging baseline is based on the top bar that joins the letters.

Gurmukhi requires slightly more vertical space than Latin text. To give an approximate idea, the figure below compares Latin and Gurmukhi glyphs from Noto fonts. The basic Gurmukhi letters are typically slightly higher than the Latin x-height, and conjunct stacks and other diacritics extend slightly below the Latin descenders. The hanging baseline is slightly higher than the Latin x-height (Noto fonts actually have a lower top bar than many others).

Hhqxਚੜ੍ਹਸ੍ਵੰਰੀਘੋਊਰੈਲੵੴ੬ Hhqxਚੜ੍ਹਸ੍ਵੰਰੀਘੋਊਰੈਲੵੴ੬
Font metrics for Latin text compared with Gurmukhi glyphs in the Noto Serif Gurmukhi (top) and Noto Sans Gurmukhi (bottom) fonts.

The following figure shows similar comparisons for the Baloo Paaji 2 and Raavi fonts.

Hhqxਚੜ੍ਹਸ੍ਵੰਰੀਘੋਊਰੈਲੵੴ੬ Hhqxਚੜ੍ਹਸ੍ਵੰਰੀਘੋਊਰੈਲੵੴ੬
Latin font metrics compared with Gurmukhi glyphs in the Baloo Paaji (top) and Raavi (bottom) fonts.

Counters, lists, etc.

You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.

The modern Punjabi orthography uses a native numeric style.

Numeric

The gurmukhi numeric style is decimal-based and uses these digits. [rmcs]

੧␣੨␣੩␣੪␣੫␣੬␣੭␣੮␣੯␣੦

Examples:

੧␣੨␣੩␣੪␣੧੧␣੨੨␣੩੩␣੪੪␣੧੧੧␣੨੨੨␣੩੩੩␣੪੪੪

Prefixes and suffixes

Punjabi commonly uses a full stop + space as a suffix.

Examples:

੧. ੨. ੩. ੪. ੫.
Separator for Punjabi list counters: full stop + space.
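Outside CSS, the same counter can be produced with a digit substitution. The following Python sketch assumes non-negative integers; the names `ASCII_TO_GURMUKHI` and `gurmukhi_counter` are illustrative only, and the default suffix is the full stop + space described above.

```python
# Translation table from ASCII digits to the Gurmukhi digits U+0A66..U+0A6F.
ASCII_TO_GURMUKHI = str.maketrans(
    "0123456789",
    "\u0A66\u0A67\u0A68\u0A69\u0A6A\u0A6B\u0A6C\u0A6D\u0A6E\u0A6F",
)


def gurmukhi_counter(n: int, suffix: str = ". ") -> str:
    """Format a non-negative integer as a Gurmukhi list counter."""
    return str(n).translate(ASCII_TO_GURMUKHI) + suffix


# Print the first five counters, each followed by placeholder item text.
for n in range(1, 6):
    print(gurmukhi_counter(n) + "item")
```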

Styling initials

tbd

Page & book layout

This section is for any features that are specific to Gurmukhi and that relate to the following topics: general page layout & progression; grids & tables; notes, footnotes, etc; forms & user interaction; page numbering, running headers, etc.

References

Acknowledgements

Thanks to @bgo-eiu for extensive comments on an early version of this page.