Updated 10 January, 2023
This page brings together basic information about the Balinese script and its use for the Balinese language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Balinese using Unicode.
ᬫᬓᬲᬫᬶᬫᬦᬸᬲᬦᬾᬓᬳᭂᬫ᭄ᬩᬲᬶᬦ᭄ᬫᬳᬃᬤᬶᬓᬮᬦ᭄ᬧᬢᭂᬄᬲᬚ᭄ᬭᭀᬦᬶᬂᬓᬳᬦᬦ᭄ᬮᬦ᭄ᬓ᭄ᬯᬲ᭟ ᬳᬶᬧᬸᬦ᭄ᬓᬵᬦᬸᬕ᭄ᬭᬳᬶᬦ᭄ᬯᬶᬯᬾᬓᬮᬦ᭄ᬩᬸᬤ᭄ᬥᬶ᭞ ᬧᬦ᭄ᬢᬭᬦᬶᬂᬫᬦᬸᬲᬫᬗ᭄ᬤᬦᬾᬧᬭᬲ᭄ᬧᬭᭀᬲ᭄ᬫᬲᭂᬫᭂᬢᭀᬦᬦ᭄᭟
The Balinese script is used for writing the Balinese language spoken on the Indonesian islands of Java and Bali. It may also be used for Old Javanese and liturgical Sanskrit. With some additions, it is also used to write Sasak on the neighbouring island of Lombok.
Everyday use of the script has largely been eclipsed by the Latin alphabet, but Balinese has a significant presence in traditional ceremonies and texts of the Hindu religion. It is also used for signage on roads, at the entrances to villages, and on government buildings. Traditional literature is published on a small scale, but little modern literature. Sekaha Pesantian community groups gather to read the Balinese script in a social context, commonly in song form.
ᬅᬓ᭄ᬱᬭᬩᬮᬶ ạk͓ṡ̂rbli aksara bali Balinese script
Balinese script is derived from Old Kawi, and ultimately from Brahmi. Historically, Balinese was written on palm leaves or inscribed in stone. Its similarity to the Javanese script in form and behaviour leads some to propose that they are typological variants of each other.
Sources: ScriptSource and Wikipedia.
The script is an abugida. Consonants carry an inherent vowel a, although that is pronounced ə at the end of a word. See the table to the right for a brief overview of features for the Balinese language.
Balinese text runs left to right in horizontal lines.
Words are not separated by spaces. However, syllables may be separated by ZWSP, as long as the break doesn't fall inside a stack. ❯ word
The 18 consonant letters used for pure Balinese words are supplemented by 15 more derived from Sanskrit and Kawi words, some of which are used as honorifics, a little like capital letters. Repertoire extensions for 8 non-native sounds are achieved by applying the rerekan diacritic to characters. ❯ consonants
Consonant clusters are represented by stacked consonants (many subjoined consonants have alternative shapes) or conjoined pairs. Occasionally, a visible adeg adeg (virama) is used. ❯ clusters
Stacked consonants and conjoined pairs span word boundaries.
Syllable-initial clusters use 3 subjoined versions of ordinary consonants or vocalics for the second consonant. ❯ onsets
Word-final consonant sounds may be represented by 3 final-consonant diacritics. Otherwise, if nothing follows, they are ordinary consonants followed by a visible ᭄ [U+1B44 BALINESE ADEG ADEG]. ❯ finals
The Balinese orthography has an inherent vowel, and represents vowels using 11 vowel signs (including 2 pre-base and 6 circumgraphs, most of which can decompose into composite vowels). All vowel signs are combining marks, and are stored after the base character. ❯ vowels
In principle, Balinese has no composite vowels, however the 6 circumgraphs can also be decomposed into 2 parts. Those can involve up to 2 glyphs, and glyphs can surround the base consonant(s) on up to 3 sides.
Independent vowels are used at the beginning of a word for standalone vowel sounds. Inside a word these are written using vowel signs applied to ᬳ [U+1B33 BALINESE LETTER HA]. ❯ standalone
Balinese has vocalics. ❯ vocalics
The following represents the repertoire of the Balinese language.
Phones in a lighter colour are non-native or allophones.
|stop||p b||t d||k ɡ|
|fricative||f v||s z||x ɣ||ħ ʕ||h|
See also vocalics
ᬓ ka U+1B13 BALINESE LETTER KA
a following a consonant is not written, but is seen as an inherent part of the consonant letter, so ka is written by simply using the consonant letter.
However, the inherent vowel is pronounced ə at the end of a word and also in prefixes ma-, pa- and da-.
Non-inherent vowel sounds that follow a consonant are represented using vowel signs.
Balinese vowel signs are all combining characters. In principle a single Unicode character is used per base consonant, even if the vowel signs appear on both sides of the base consonant, however 5 vowel signs decompose to more than one character (see circumgraphs). All vowel signs are typed and stored after the base consonant, and the glyph rendering system takes care of the positioning at display time.
The glyphs used to represent vowels, whether alone or in composites, are arranged around a syllable onset, which may be 2 consonants, rather than just around the immediately preceding consonant. For an example of the effect this produces, see prebase and circumgraphs.
The majority of the vowel signs are spacing marks, meaning that they consume horizontal space when added to a base consonant.
ᬓᬶ ki U+1B13 BALINESE LETTER KA + U+1B36 BALINESE VOWEL SIGN ULU
Balinese uses the following dedicated combining marks for vowels.
To represent the sounds rə or lə, Balinese uses vocalic letters. A sequence such as *ᬭᭂ [U+1B2D BALINESE LETTER RA + U+1B42 BALINESE VOWEL SIGN PEPET] is not used. See vocalics.
ᬓᬾ ke U+1B13 BALINESE LETTER KA + U+1B3E BALINESE VOWEL SIGN TALING
Two vowel signs appear to the left of the base consonant letter or cluster.
These are combining marks that are always stored after the base consonant. The rendering process places the glyph before the base consonant.
These vowel characters are actually placed before the start of the orthographic syllable. This means that a word with a consonant cluster at the start separates the pre-base vowel from any post-base vowels by more than one consonant character (see fig_prebase).
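A minimal Python sketch of what "stored after the base" means in practice, using only the standard `unicodedata` module. The sequence used here (KA, adeg adeg, RA, then the pre-base vowel sign taling) is an illustrative cluster chosen for this sketch rather than a word taken from this page, and the helper name `storage_order` is hypothetical.

```python
import unicodedata


def storage_order(text):
    """Return (code point, Unicode name) pairs in the order they are stored."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# KA + ADEG ADEG + RA + VOWEL SIGN TALING: the taling glyph is drawn to the
# left of the whole stack, but the character itself comes last in memory.
syllable = "\u1B13\u1B44\u1B2D\u1B3E"

for code_point, name in storage_order(syllable):
    print(code_point, name)
```

Listing the stored sequence this way makes it easy to check that a pre-base vowel has not been typed before the consonants it visually precedes.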
ᬓᭀ ko U+1B13 BALINESE LETTER KA + U+1B40 BALINESE VOWEL SIGN TALING TEDUNG
Five vowel signs are usually produced by a single combining character with visually separate parts, that appear on different sides of the consonant onset.
This section includes some vowel signs described in the section vocalics.
Like pre-base glyphs, these are combining marks that are always stored after the base consonant. The rendering process places the glyphs around the base consonant, as needed.
Glyphs can appear on up to 3 sides of the base. Some of the glyphs merge with the base character's glyph (see context).
These circumgraphs have canonically equivalent decomposed forms (see vs_encoding).
Balinese has 2 ways to represent standalone vowels.
Typically, a standalone vowel is represented by a vowel sign attached to ᬳ [U+1B33 BALINESE LETTER HA], which acts as a carrier, eg. ᬤᬳᬾᬭᬄ
Without a vowel sign the letter ᬳ [U+1B33 BALINESE LETTER HA] may represent a, eg. ᬳᬮᬲ᭄
However, it may be unclear from the written text whether ᬳ [U+1B33 BALINESE LETTER HA] represents the sound h or is used as a carrier for a vowel, eg. compare ᬳᬶᬕ ᬳᬶᬕᭂᬮ᭄
At the beginning of a word, most standalone vowels are represented using one of the 10 independent vowel characters. The set includes a character to represent the inherent vowel sound. Examples:
The vowels ◌ᭂ [U+1B42 BALINESE VOWEL SIGN PEPET] and ◌ᭃ [U+1B43 BALINESE VOWEL SIGN PEPET TEDUNG] don't have an independent form, and have to be used alongside ᬳ [U+1B33 BALINESE LETTER HA] at the beginning of a word.
In Sasak, independent vowel ᬅ [U+1B05 BALINESE LETTER AKARA] can be followed by an explicit ◌᭄ [U+1B44 BALINESE ADEG ADEG] in word- or syllable-final position, where it indicates the glottal stop. Other consonants can also be subjoined to it. eg. ᬳᬫᬅ᭄ hmạ͓ amaʔ
Balinese uses ᭄ [U+1B44 BALINESE ADEG ADEG] (the Balinese equivalent of the Sanskrit virama) to kill the inherent vowel after a consonant.
The adeg adeg is always visible at the end of a word that ends in consonant and isn't followed by another consonant, eg. ᬧᬦᬓ᭄
It is usually hidden (with occasional exceptions) when the consonant is part of a consonant cluster (see clusters).
Sometimes it is used to clarify the distinction between a word-final consonant and a medial consonant by preventing the stacking of the final consonant in the previous word and the first consonant in the next, eg. compare: ᬧᬓ᭄ᬭᬫᬦ᭄ ᬧᬓ᭄ᬭᬫᬦ᭄
The following list shows where vowel signs, including vocalics, are positioned around a base consonant to produce vowels, and how many instances of that pattern there are.
At maximum, vowel components can occur concurrently on 3 sides of the base.
This section maps Balinese vowel sounds to common graphemes in the Balinese orthography. Click on a grapheme to find other mentions on this page (links appear at the bottom of the page). Click on the character name to see examples and for detailed descriptions of the character(s) shown.
Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, Sanskrit, etc.
Word-internal standalone vowels (and word-initial in the case of ə and əː) use the vowel sign over a silent ᬳ [U+1B33 BALINESE LETTER HA]. Vowel signs that decompose are shown only in precomposed form.
Inherent vowel at the end of a word and also in prefixes ma-, pa- and da-.
ᬺ [U+1B3A BALINESE VOWEL SIGN RA REPA] in vocalic rə.
ᬼ [U+1B3C BALINESE VOWEL SIGN LA LENGA] in vocalic lə.
ᬋ [U+1B0B BALINESE LETTER RA REPA] in word-initial vocalic rə.
ᬍ [U+1B0D BALINESE LETTER LA LENGA] in word-initial vocalic lə.
ᬻ [U+1B3B BALINESE VOWEL SIGN RA REPA TEDUNG] in vocalic rəː
ᬽ [U+1B3D BALINESE VOWEL SIGN LA LENGA TEDUNG] in vocalic ləː
ᬌ [U+1B0C BALINESE LETTER RA REPA TEDUNG] in word-initial vocalic rəː.
ᬎ [U+1B0E BALINESE LETTER LA LENGA TEDUNG] in word-initial vocalic ləː.
At the beginning of a syllable the vocalic is treated as a consonant, eg. ᬓᭂᬋᬂ ᬢᬍᬃ
As a second component in a consonant cluster, the vocalic ra repa has a postfixed form and a subjoined form.
When the sound occurs after a syllable-initial consonant, ie. as a medial consonant in the same syllable, use the vowel sign which produces the subjoined form. The sequence of characters here is simply C + ᬺ [ consonant + U+1B3A BALINESE VOWEL SIGN RA REPA], eg. ᬓᬺᬰ᭄ᬡ
When the sound occurs after a syllable-final consonant, ie. as the start of a new syllable, use the conjoined (postfix) form. The sequence of Unicode characters is C + ᭄ + ᬋ [ consonant + U+1B44 BALINESE ADEG ADEG + U+1B0B BALINESE LETTER RA REPA], eg. ᬧᬓ᭄ᬋᬋᬄ
The characters listed here and in the following sections also have conjoined and/or subjoined forms, which may differ significantly from those shown here. See clusters for a list of glyph shapes.
This section lists the basic 18 consonants known as ᬅᬓ᭄ᬱᬭᬯᬺᬱᬵᬲ᭄ᬢ᭄ᬭ aksara wreṣāstra.
ᬳ [U+1B33 BALINESE LETTER HA] at the beginning of a word or after a preceding vowel is mostly used as a support for a vowel sign (see standalone), and is not pronounced or transcribed. Word-finally with a suffix vowel, however, it is transcribed.
These are called ᬅᬓ᭄ᬱᬭᬰ᭄ᬯᬮᬮᬶᬢ ạk͓ṡ̂rŝ͓wllit aksara sualalita.
Many of the additional consonants are commonly used in words originating from Arabic and Dutch, and are most common in north Bali and Lombok. When used in pure Balinese words, they are similar to capital letters and are used to create an honorific effect. There are similar characters in Javanese.
They don't add any consonant sounds to the Balinese repertoire. In words originating from Sanskrit, Old Javanese, or Old Balinese, they represent aspirated or other consonants.
Additional consonants used for Sanskrit words.
Additional consonants used for words from Kawi.
Two consonants, ᬔ [U+1B14 BALINESE LETTER KA MAHAPRANA] and ᬙ [U+1B19 BALINESE LETTER CA LACA], are considered very rare, and one other, ᬛ [U+1B1B BALINESE LETTER JA JERA], seems to be known from only one word:
(It is possible that an original ai may have been lost in Balinese, to be replaced by the glyph for jʰa.)
A number of the Sanskrit or Kawi consonants are rather poorly attested. The letter ᬙ [U+1B19 BALINESE LETTER CA LACA] is only found in non-initial position following ᬘ [U+1B18 BALINESE LETTER CA], ie. ᬘ᭄ᬙ c͓C. Most of the series that originally represented retroflex sounds is often omitted in books about the script.
The combining mark ᬴ [U+1B34 BALINESE SIGN REREKAN] is used, as is a similar sign in Javanese, to extend the character repertoire for foreign sounds.
The first 7 of the 8 listed above are attested in Library of Congress transliterations and in earlier Sasak orthography. The 8th, ᬤ᬴ could be used for one-to-one transliteration for Javanese ɖ.
In rendering, the dots of these letters appear above the top character, which can cause some ambiguity in reading. The following are all visually indistinguishable: ᬓ᬴᭄ᬚ kˑ͓ʤ xja ᬓ᭄ᬚ᬴ k͓ʤˑ kza ᬓ᬴᭄ᬚ᬴ kˑ͓ʤˑ xza
In practice these combinations are probably rather rare.
In recent times, Sasak users abandoned the use of the Javanese-influenced rerekan in favour of a series of modified letters (see above), making use, in addition, of some of the unused Kawi letters for the Arabic sounds. In place of ᬓ᬴ x and ᬕ᬴ ɣ, for instance, the new fusion of KA and HA, ᭆ [U+1B46 BALINESE LETTER KHOT SASAK] and the Kawi letter ᬖ [U+1B16 BALINESE LETTER GA GORA] are used.
(Does the fact that these relate to aspirated or retroflex forms originally affect the pronunciation?)
The consonants ya, ra, la and wa regularly appear immediately after the initial consonant in a syllable. Unlike Javanese, Balinese has no special characters for these medial sounds (other than the vocalics mentioned earlier); they are just written using the normal approach for dealing with consonant clusters, eg. ᬓ᭄ᬭᬫ
Multiple medials can occur: r or l can be followed by w or y, eg. ᬩ᭄ᬭ᭄ᬬᬕ᭄
In addition, the vocalics can produce consonant sounds (tied to a specific vowel) in medial position, eg. ᬓᬺᬰ᭄ᬡ
See clusters for more details on shaping of glyphs.
Word-final consonant sounds with no following consonant are by default represented by ordinary consonant characters, followed by ᭄ [U+1B44 BALINESE ADEG ADEG], which is visible if not followed by another consonant, eg. ᬓᬵᬤᭂᬧ᭄ ᬓᬧᬮ᭄
However, there is also a set of combining marks for syllable-final consonants that don't need to be followed by the adeg adeg.
ᬂ [U+1B02 BALINESE SIGN CECEK] and ᬄ [U+1B04 BALINESE SIGN BISAH] only appear at the end of a word, eg. ᬓᭂᬋᬂ ᬫᬗᬄ unless the word involves repetition, eg. ᬘᬾᬂᬘᬾᬂ
ᬃ [U+1B03 BALINESE SIGN SURANG] can appear at the end of any syllable, eg. ᬓᬃᬡ
A syllable-final diacritic may appear above a stack. It is typed and stored after the other components in the stack, eg. ᬩᬗ᭄ᬓᬸᬂ
When the syllable has a spacing vowel sign, any above-base final-consonant mark appears over the base character, rather than over the vowel sign. This is positioned by the font; the final consonant mark is still typed and stored after the other syllable components, eg. ᬕᭂᬤᭀᬂ
See also cchar_modre.
The absence of a vowel sound between two or more consonants is visually indicated in one of the following ways.
See also finals for a dedicated final consonant mark followed by a regular consonant.
Word boundaries. Conjuncts span word boundaries. Because there are no spaces between words, a cluster is created when a consonant with no following vowel at the end of a word is followed by a consonant at the beginning of the next word.
Stacks and conjoined sequences are not normally split at line ends (see word and linebreak for the ramifications of this).
Stacked and conjoined consonant clusters are referred to as conjuncts.
In Unicode, the stacking and conjoining behaviour is achieved by adding ᭄ [U+1B44 BALINESE ADEG ADEG] between the consonants. The font hides the glyph automatically when a conjunct is formed.
In some cases, however, the adeg adeg remains visible (see adegadeg).
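As a rough sketch of the encoding just described, the following Python snippet builds a cluster by placing adeg adeg between consonant letters. The helper name `make_cluster` is hypothetical; whether the result is displayed as a stack, a conjoined pair, or with a visible adeg adeg is decided by the font, not by this code.

```python
ADEG_ADEG = "\u1B44"  # BALINESE ADEG ADEG


def make_cluster(*consonants):
    """Join consonant letters with adeg adeg so that a font can form a conjunct."""
    return ADEG_ADEG.join(consonants)


KA, RA, MA = "\u1B13", "\u1B2D", "\u1B2B"

# Only KA and RA form a cluster; MA follows with its own inherent vowel.
krama = make_cluster(KA, RA) + MA

print(krama)
print([f"U+{ord(ch):04X}" for ch in krama])
```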
To represent consonants without intervening vowels, the non-initial consonant is typically drawn below the initial consonant, and with a slightly different shape.
Many of the subjoined forms are just slightly smaller versions of the original, but several have very different shapes altogether, most of which ligate with the cluster initial consonant by joining strokes.
There can be up to 3 consonants combined in this way, and the third consonant must be one of ya, ra, la or wa.
This list shows consonants in their normal and subjoined forms
In conjoined clusters, the consonant glyphs remain side by side, but the non-initial consonant is reduced on the left side. fig_conjoined_s shows an example in the word ᬅᬓ᭄ᬱᬭ.
This list shows consonants in their normal and conjoined forms
The conjoined ᬲ [U+1B32 BALINESE LETTER SA] is unusual in that it also adds a stroke below the initial consonant. This helps distinguish it from the conjoined p. See fig_conjoined_sa for an example in the word ᬧᬓ᭄ᬲ.
Because there is no word separator, consonants at the end of one word and beginning of the following word are normally stacked, too.
In some cases this leads to ambiguity about whether this is one or two words. If you really want to make clear which is which, you can use an explicit adeg-adeg, eg. compare ᬧᬓ᭄ᬭᬫᬦ᭄ ᬧᬓ᭄ᬭᬫᬦ᭄
The Unicode Standard recommends the use of [U+200C ZERO WIDTH NON-JOINER] (ZWNJ) after the adeg-adeg in order to prevent conjunct formation. However, not many people understand the function of ZWNJ or can access it easily from the keypad. It also doesn't introduce line-break opportunities. A better solution may be to use [U+200B ZERO WIDTH SPACE] (ZWSP). This character is needed anyway on most systems in order to allow line-breaking, and it appears to work equally well for this.
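The two options can be sketched in Python as follows. This is only an illustration: the function name `join_words` is hypothetical, and the split of the example into two words is an assumption made for the demonstration rather than a statement about Balinese vocabulary.

```python
ADEG_ADEG = "\u1B44"  # BALINESE ADEG ADEG
ZWNJ = "\u200C"       # ZERO WIDTH NON-JOINER
ZWSP = "\u200B"       # ZERO WIDTH SPACE


def join_words(first, second, blocker=""):
    """Concatenate two words, optionally keeping a final adeg adeg visible.

    If `first` ends in adeg adeg and `blocker` is non-empty, the blocker is
    placed after the adeg adeg so that no conjunct forms across the boundary.
    """
    if blocker and first.endswith(ADEG_ADEG):
        return first + blocker + second
    return first + second


# Hypothetical split, for illustration only: PA KA ADEG_ADEG | RA MA NA ADEG_ADEG
first = "\u1B27\u1B13\u1B44"
second = "\u1B2D\u1B2B\u1B26\u1B44"

stacked = join_words(first, second)          # the font may stack KA over RA
separate = join_words(first, second, ZWSP)   # adeg adeg should stay visible

print([f"U+{ord(ch):04X}" for ch in separate])
```

Passing `ZWNJ` instead of `ZWSP` gives the sequence recommended by the Unicode Standard; the two strings differ only in the invisible character stored after the adeg adeg.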
A somewhat ambiguous situation arises where conventions prevent certain combinations stacking. For example, the name of the village Tamblang should not stack the mbl, but should look like ᬢᬫ᭄ᬩ᭄ᬮᬂ
The Unicode Standard advises using a zero-width non-joiner after ma to achieve this.
Observation: Note that this may also be achieved by intelligence in the font, as was actually the case when I generated this example (click on it to see). It's not clear to me what is the preferred approach: put ZWNJ in only when the font doesn't do what you want, or use it always. The latter may lead to more consistent content where different fonts are applied to the text (eg. after cut and paste). In theory, this shouldn't affect searching and sorting, although some applications may not ignore the ZWNJ as they should.
Balinese represents some final consonants using dedicated marks (see finals). Such final marks are followed by ordinary consonant shapes in consonant clusters. There is no visual indication of missing vowel sounds other than the use of the mark itself.
This section maps Balinese consonant sounds to common graphemes in the Balinese orthography, grouped by native Balinese letters ( b ), or Sanskrit ( s ) or Kawi ( k ) derived forms, or extended with rerekan ( r ). Click on a grapheme to find other mentions on this page (links appear at the bottom of the page). Click on the character name to see examples and for detailed descriptions of the character(s) shown.
Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, Sanskrit, etc.
Balinese is a script where different sequences of Unicode characters may produce the same visual result. Here we look at those related to vowels.
Five of the circumgraphs can be written as a single character, or as two characters, the second being ᬵ [U+1B35 BALINESE VOWEL SIGN TEDUNG] in all cases.
|ᭁ [U+1B41 BALINESE VOWEL SIGN TALING REPA TEDUNG]||ᭁ [U+1B3F BALINESE VOWEL SIGN TALING REPA + U+1B35 BALINESE VOWEL SIGN TEDUNG]|
|ᭃ [U+1B43 BALINESE VOWEL SIGN PEPET TEDUNG]||ᭃ [U+1B42 BALINESE VOWEL SIGN PEPET + U+1B35 BALINESE VOWEL SIGN TEDUNG]|
|ᭀ [U+1B40 BALINESE VOWEL SIGN TALING TEDUNG]||ᭀ [U+1B3E BALINESE VOWEL SIGN TALING + U+1B35 BALINESE VOWEL SIGN TEDUNG]|
|ᬻ [U+1B3B BALINESE VOWEL SIGN RA REPA TEDUNG]||ᬻ [U+1B3A BALINESE VOWEL SIGN RA REPA + U+1B35 BALINESE VOWEL SIGN TEDUNG]|
|ᬽ [U+1B3D BALINESE VOWEL SIGN LA LENGA TEDUNG]||ᬽ [U+1B3C BALINESE VOWEL SIGN LA LENGA + U+1B35 BALINESE VOWEL SIGN TEDUNG]|
The single code point per vowel sign is preferred, however the parts are separated in Unicode Normalisation Form D (NFD), and recomposed in Unicode Normalisation Form C (NFC), so both approaches are canonically equivalent.
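The canonical equivalence can be checked with Python's standard `unicodedata` module. The snippet below is a small sketch using pepet tedung as the example; the helper `same_text` is a hypothetical name for a comparison that normalises both sides first.

```python
import unicodedata

PEPET_TEDUNG = "\u1B43"   # single precomposed vowel sign
PEPET = "\u1B42"
TEDUNG = "\u1B35"

# NFD splits the precomposed sign; NFC puts the two parts back together.
decomposed = unicodedata.normalize("NFD", PEPET_TEDUNG)
recomposed = unicodedata.normalize("NFC", PEPET + TEDUNG)

print([f"U+{ord(ch):04X}" for ch in decomposed])
print([f"U+{ord(ch):04X}" for ch in recomposed])


def same_text(a, b):
    """Compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Normalising before comparison is one way to make searching and matching indifferent to which of the two encodings an author happened to type.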
Whichever approach is used, the vowel signs must be typed and stored after the consonant characters they surround, and in left to right order.
Three of the independent vowels can be written as a single character, or as two. Again, this always involves ᬵ [U+1B35 BALINESE VOWEL SIGN TEDUNG].
|ᬊ [U+1B0A BALINESE LETTER UKARA TEDUNG]||ᬊ [U+1B09 BALINESE LETTER UKARA + U+1B35 BALINESE VOWEL SIGN TEDUNG]|
|ᬒ [U+1B12 BALINESE LETTER OKARA TEDUNG]||ᬒ [U+1B11 BALINESE LETTER OKARA + U+1B35 BALINESE VOWEL SIGN TEDUNG]|
|ᬆ [U+1B06 BALINESE LETTER AKARA TEDUNG]||ᬆ [U+1B05 BALINESE LETTER AKARA + U+1B35 BALINESE VOWEL SIGN TEDUNG]|
The precomposed characters decompose in NFD, and reform again in NFC. It is generally recommended to use the precomposed character.
The following indicates the expected ordering of Unicode characters within a Balinese combining character sequence. The labels are those used for the Unicode Indic Syllabic Categories. Follow the links to see what characters are represented by a given label.
Ordering characters as shown above avoids potential ambiguities and maximises the likelihood of success when rendering the text.
Two combining marks have a specialist usage related to (usually religious) Sanskrit words.
ᬀ [U+1B00 BALINESE SIGN ULU RICEM] when combined with certain syllables becomes part of the Aksara Modre, or holy letters, which are used to write words in Sanskrit, usually part of prayers. This character only appears in Sanskrit texts, eg. ᬰᬶᬤ᭄ᬥᬀ siddham
ᬁ [U+1B01 BALINESE SIGN ULU CANDRA] appears only in holy letters, eg. ᬫᬁ mŋ̽ (Mang). When combined with independent vowel ạʷ it becomes a special symbol called omkara and is pronounced m. In this form it is used to represent god, eg. ᬒᬁᬱᬦ᭄ᬢᬶ᭞ᬱᬦ᭄ᬢᬶ᭞ᬱᬦ᭄ᬢᬶ᭞ᬒᬁ
The other symbols in the Balinese block are all musical symbols, and are not described here.
There is also a set of musical diacritical marks, which are not described here.
There is a set of Balinese digits, and they are used in the same way as ASCII digits in Latin text.
However, because many of the digit symbols are indistinguishable from other Balinese letters, numbers are typically surrounded by ᭞ [U+1B5E BALINESE CARIK SIKI], so that they are clearly distinguished, eg. ᬩᬮᬶ᭞᭓᭞ᬚᬸᬮᬶ᭞᭑᭙᭘᭒᭟
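A short Python sketch of this convention, assuming non-negative integers only. The function names are hypothetical; the code relies on the Balinese digits zero to nine occupying the consecutive code points U+1B50 to U+1B59.

```python
CARIK_SIKI = "\u1B5E"   # BALINESE CARIK SIKI
BALINESE_ZERO = 0x1B50  # BALINESE DIGIT ZERO; digits one to nine follow in order


def to_balinese_digits(number):
    """Convert a non-negative integer to a string of Balinese digits."""
    return "".join(chr(BALINESE_ZERO + int(d)) for d in str(number))


def delimit_number(number):
    """Wrap the Balinese digits in carik siki so they are not read as letters."""
    return CARIK_SIKI + to_balinese_digits(number) + CARIK_SIKI


print(delimit_number(3))
print(to_balinese_digits(1982))
```

Since the Balinese digits are ordinary Unicode decimal digits, Python's built-in `int()` can parse the converted string back into a number.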
Balinese text is written horizontally, left to right.
This section brings together information about the following topics: writing styles; cursive text; context-based shaping; context-based positioning; baselines, line height, etc.; font styles; case & other character transforms.
You can experiment with examples using the Balinese character app.
Balinese text is not cursive (ie. joined up like Arabic), however there is a significant amount of interaction between glyphs, and some joining, around consonant clusters.
The orthography has no case distinction, and no special transforms are needed to convert between characters.
Balinese text relies on OpenType rules to correctly position glyphs and shape them according to the surrounding text.
One major area where this applies is in the use of conjunct forms for consonant clusters. See the relevant sections for lists of stacked and conjoined shapes.
The following is a selection of other examples of contextual shaping and positioning.
After a stacked consonant, the vowel signs that would normally appear below a base are moved to the side, and the shape is modified.
|ᬓ᭄ᬭᬸ||᭄ + ᬭ + ᬸ [U+1B44 BALINESE ADEG ADEG + U+1B2D BALINESE LETTER RA + U+1B38 BALINESE VOWEL SIGN SUKU]||ᬓ᭄ᬭᬸᬦ|
|ᬓ᭄ᬬᬹ||᭄ + ᬬ + ᬹ [U+1B44 BALINESE ADEG ADEG + U+1B2C BALINESE LETTER YA + U+1B39 BALINESE VOWEL SIGN SUKU ILUT]|
ᬵ [U+1B35 BALINESE VOWEL SIGN TEDUNG] and the right side of ᭁ [U+1B41 BALINESE VOWEL SIGN TALING REPA TEDUNG] combine with several of the consonants. The table below shows 2 examples.
|ᬳᬵ||ᬳ + ᬵ [U+1B33 BALINESE LETTER HA + U+1B35 BALINESE VOWEL SIGN TEDUNG]|
|ᬭᬵ||ᬭ + ᬵ [U+1B2D BALINESE LETTER RA + U+1B35 BALINESE VOWEL SIGN TEDUNG]||ᬢᬭᬵ|
When a vowel sign and a syllable-final consonant mark appear over the same base, they are typically drawn side by side. Combinations such as rerekan and above-base vowels are typically stacked.
|ᬓᬷᬃ||ᬷ + ᬃ [U+1B37 BALINESE VOWEL SIGN ULU SARI + U+1B03 BALINESE SIGN SURANG]||ᬢᬷᬃᬢ|
|ᬰᬶᬁ||ᬶᬁ [U+1B36 BALINESE VOWEL SIGN ULU + U+1B01 BALINESE SIGN ULU CANDRA]|
Grapheme clusters alone are not sufficient to represent typographic units in Balinese. Stacks and conjoined sequences are very common and must not be split apart by edit operations that visually change the text (such as letter-spacing, first-letter highlighting, and line breaking). For those operations one needs to segment the text using orthographic syllables, which string grapheme clusters together with ᭄ [U+1B44 BALINESE ADEG ADEG], which has an Indic Syllabic Category of Virama.
The adeg-adeg is rendered visibly if it is not part of a consonant cluster, for example at the end of a word followed by a space.
Balinese doesn't use word boundaries for text segmentation, relying instead on grapheme boundaries because consonant clusters that span word boundaries are combined into stacks or conjoined forms.
Base Combining_mark* Joiner?
Combining marks may include zero or more of the following types of character:
Any of the above may occur after a consonant base. Independent vowel bases usually only have final consonant marks.
The following examples show a variety of grapheme clusters:
Note how grapheme clusters break up the conjuncts. This is not usually desirable (see orthographic syllables just below).
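The pattern `Base Combining_mark* Joiner?` can be approximated with a regular expression. The Python sketch below is a simplification under stated assumptions: the code point ranges for bases and marks are chosen by hand from the Balinese block, `Joiner` is assumed to mean ZWNJ or ZWJ, and anything that doesn't match falls through as a single-character unit. A production system would use a full Unicode segmentation library instead.

```python
import re

# Assumed character classes (hand-picked ranges, not taken from this page).
BASE = "\u1B05-\u1B33\u1B45-\u1B4C"      # independent vowels and consonants
MARK = "\u1B00-\u1B04\u1B34-\u1B44"      # signs, rerekan, vowel signs, adeg adeg
JOINER = "\u200C\u200D"                  # ZWNJ, ZWJ

GRAPHEME_CLUSTER = re.compile(f"[{BASE}][{MARK}]*[{JOINER}]?|.", re.DOTALL)


def grapheme_clusters(text):
    """Split text into Base Combining_mark* Joiner? units."""
    return [m.group() for m in GRAPHEME_CLUSTER.finditer(text)]


# KA + VOWEL SIGN RA REPA, SA SAGA + ADEG ADEG, NA RAMBAT
word = "\u1B13\u1B3A\u1B30\u1B44\u1B21"
for cluster in grapheme_clusters(word):
    print([f"U+{ord(ch):04X}" for ch in cluster])
```

In this example the conjunct is split into two units, with the first ending at the adeg adeg, which is the behaviour the note above warns about.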
(Consonant Rerekan? Adeg_adeg)* Grapheme_cluster
Balinese commonly stacks or conjoins glyphs, to form conjuncts. The conjuncts represent consonant clusters, which can arise (a) where one phonetic syllable ends in a consonant letter and the following syllable begins with a consonant, or (b) when most medial consonants are written, since Balinese uses conjunct forms for sequences such as Cr-, Cy-, Cw-, Cry-, etc. The cluster of consonants that make up the conjunct are all encoded with adeg adeg between them (see clusters).
Balinese is unusual in that these conjuncts occur across word boundaries, so the word-final consonant of the first word may be stacked above the word-initial consonant of the second. See fig_kahananlankwasa2 for an example.
Grapheme clusters terminate after a sequence of marks containing an adeg adeg, but editorial operations that change the visual appearance of the text, such as letter-spacing, first-letter highlighting, line-breaking, and justification, should never split conjunct forms apart. For this reason, an alternative way of segmenting graphemes is needed. This may not apply, however, for some other operations such as cursor movement or backwards delete.
Where conjuncts appear, a typographic unit contains multiple grapheme clusters. The non-final grapheme clusters all end with ᭄ [U+1B44 BALINESE ADEG ADEG], and the final grapheme cluster begins with a consonant.
The following are examples.
Note that one of the characteristic features of the Indic category of Virama is that the adeg adeg is visible when not followed by a consonant, but invisible when a consonant does follow (creating a stack). This means that the adeg adeg sometimes participates in a simple grapheme cluster, but when followed by a consonant it becomes the 'glue' that creates an orthographic syllable.
On the infrequent occasions when an adeg adeg needs to be visible even though it is followed by another base, an invisible character must be added to prevent it joining with the following base. A zero-width space can achieve that.
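The pattern `(Consonant Rerekan? Adeg_adeg)* Grapheme_cluster` given earlier in this section can be sketched in Python as follows. As before, the code point ranges are hand-picked assumptions, `Joiner` is assumed to mean ZWNJ or ZWJ, and the sketch ignores special cases such as consonants subjoined to an independent vowel in Sasak.

```python
import re

# Assumed character classes (hand-picked ranges, not taken from this page).
CONSONANT = "\u1B13-\u1B33\u1B45-\u1B4C"
BASE = "\u1B05-\u1B33\u1B45-\u1B4C"
MARK = "\u1B00-\u1B04\u1B34-\u1B44"
JOINER = "\u200C\u200D"
REREKAN = "\u1B34"
ADEG_ADEG = "\u1B44"

GRAPHEME_CLUSTER = f"[{BASE}][{MARK}]*[{JOINER}]?"
ORTHOGRAPHIC_SYLLABLE = re.compile(
    f"(?:[{CONSONANT}]{REREKAN}?{ADEG_ADEG})*{GRAPHEME_CLUSTER}|.",
    re.DOTALL,
)


def orthographic_syllables(text):
    """Split text into units that keep stacks and conjoined pairs together."""
    return [m.group() for m in ORTHOGRAPHIC_SYLLABLE.finditer(text)]


# BA + ADEG ADEG + RA + ADEG ADEG + YA, then GA + visible ADEG ADEG
word = "\u1B29\u1B44\u1B2D\u1B44\u1B2C\u1B15\u1B44"
for unit in orthographic_syllables(word):
    print([f"U+{ord(ch):04X}" for ch in unit])
```

Because a zero-width space is not a consonant, an adeg adeg followed by one ends its unit, so the sequence consonant, adeg adeg, ZWSP, consonant is segmented into separate units rather than glued into one.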
Test in your browser. Some of the words below consist of units that equate to grapheme clusters only, and others include conjuncts. First, the text is displayed in a contenteditable paragraph, then in a textarea. Results are reported for Gecko (Firefox), Blink (Chrome), and WebKit (Safari) on a Mac.
ᬢᬷᬃᬢ ᬓᬺᬰ᭄ᬡ ᬧᬾᬜ᭄ᬚᭀᬃ ᬢᬶᬫ᭄ᬧᬮ᭄ᬩ᭄ᬭ᭄ᬬᬕ᭄
Cursor movement. Move the cursor through the text.
Gecko steps through the whole text using grapheme clusters. It takes 2 or more steps (depending on the number of GCs) to get through the stacks, one grapheme cluster at a time. Blink and WebKit step through all words using the orthographic syllables described here (ie. they step over a stack and all associated combining characters in one jump).
Selection. Place the cursor next to a character and hold down shift while pressing an arrow key.
The behaviour is the same as for cursor movement.
Deletion. Forward deletion works in the same way as cursor movement. The backspace key deletes code point by code point, except for WebKit, which deletes one grapheme cluster at a time.
Line-break. See this test. The CSS sets the value of the line-break property to anywhere. Change the size of the box to slowly move the line break point.
Gecko appears to segment on orthographic syllable, per the description here, except for one case where the complex stack is split. WebKit and Blink appear to sometimes wrap inside stacks and other times not. It's not obvious why, but both segment in the same way.
Words are not separated by spaces, and in fact some word boundaries occur between stacked consonants. This means that segmentation for line-breaking, etc. uses orthographic syllables as a unit (see graphemes).
Balinese has its own punctuation marks.
᭝ [U+1B5D BALINESE CARIK PAMUNGKAH] is used as a colon, and ᭞ [U+1B5E BALINESE CARIK SIKI] and ᭟ [U+1B5F BALINESE CARIK PAREREN] are used as comma and full stop respectively.
Both ᭚ [U+1B5A BALINESE PANTI] and ᭛ [U+1B5B BALINESE PAMADA] are used to begin a section in text. At the end of a section, pasalinan᭟᭜᭟ and carik agung᭛᭜᭛ may be used (depending on what sign began the section).
Because there are no spaces between words, and because the end of one word and the beginning of another often form conjuncts (see fig_kahananlankwasa2), Balinese doesn't wrap at word boundaries. See graphemes for a description of the typographic units that are used for line break opportunities.
Unfortunately, modern browsers are often unable to detect appropriate break points for Balinese, so in the sample text at the beginning of this page [U+200B ZERO WIDTH SPACE] is used at places where the line could be broken. Otherwise, the line would continue, unbroken, off the right side of the page.
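A minimal Python sketch of that workaround, assuming the text has already been segmented into typographic units by some other means. Both function names are hypothetical.

```python
ZWSP = "\u200B"  # ZERO WIDTH SPACE


def add_break_opportunities(units):
    """Join pre-segmented typographic units with ZWSP so lines can wrap."""
    return ZWSP.join(units)


def strip_break_opportunities(text):
    """Remove ZWSP again, eg. before searching or comparing text."""
    return text.replace(ZWSP, "")


# Two units: KA + VOWEL SIGN RA REPA, and the conjunct SA SAGA + ADEG ADEG + NA RAMBAT
units = ["\u1B13\u1B3A", "\u1B30\u1B44\u1B21"]
marked = add_break_opportunities(units)

print([f"U+{ord(ch):04X}" for ch in marked])
```

The ZWSP goes between units only, never after an adeg adeg inside a unit, so existing stacks are left intact.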
In lontar texts where a word must be broken at the end of a line (always after a full syllable), the sign ᭠ [U+1B60 BALINESE PAMENENG] is inserted. This sign is not used as a word-joining hyphen; it is used only in linebreaking.
Observation: The images appear to show a gap before the pameneng.
In online use, an application would need to insert the pameneng, rather than the content author. As line-length is changed by stretching a window, or as content is added earlier in the same paragraph, the location of the word relative to the line edge will change. The insertion of pameneng is only appropriate at those instants when the appropriate sequence of characters appears at the line end.
For an application to use this correctly, it would need to know where the word boundaries are in the text, and then put this character at the end of the line only when a multisyllabic word is broken. This would require a dictionary to be applied to the text, since it would not be appropriate to insert the pameneng at the boundary of 2 words.
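The logic described above can be sketched in Python. This is a heavily simplified illustration: it assumes that a dictionary step has already produced a list of words, each split into orthographic syllables; it measures line capacity as a simple count of units instead of real glyph widths; and it ignores conjuncts that span word boundaries. The function name and parameters are hypothetical.

```python
PAMENENG = "\u1B60"  # BALINESE PAMENENG


def wrap_with_pameneng(words, units_per_line):
    """Break text into lines, adding pameneng only when a word is split.

    words: list of words, each given as a list of syllable strings.
    units_per_line: stand-in for the available line width.
    """
    lines, current, count = [], "", 0
    for word in words:
        for index, unit in enumerate(word):
            if count == units_per_line:
                # The line is full. Mark the break only if it falls inside
                # a word, never at the boundary between two words.
                if index > 0:
                    current += PAMENENG
                lines.append(current)
                current, count = "", 0
            current += unit
            count += 1
    if current:
        lines.append(current)
    return lines


# ba-li and ka-pa-l, each word pre-split into syllables
words = [["\u1B29", "\u1B2E\u1B36"], ["\u1B13", "\u1B27", "\u1B2E\u1B44"]]
for line in wrap_with_pameneng(words, 3):
    print(line)
```

Because the break positions depend on the line width, the function has to be re-run whenever the width or the preceding content changes, which is why the pameneng cannot be stored in the content itself.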
Observation: Aditya Bayu Perdana has found instances in lontar where ᬄ [U+1B04 BALINESE SIGN BISAH] is moved to the beginning of a line, alone, while a pameneng appears at the end of the previous line. If this is not just a scribal inconsistency (eg. it's not clear why you wouldn't put the bisah at the end of the line if there's space for a pameneng), it may indicate that this letter should not be a combining mark in Unicode; however, the usage needs to be verified first. See pictures.
As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.
The following list gives examples of typical behaviours for characters used in contemporary Balinese. Context may affect the behaviour of some of these and other characters.
According to Sudewa, full justification is not a feature of Balinese text in traditional palm-leaf manuscripts, and only left, or occasionally centred or right alignment is relevant.
This section looks at ways in which spacing is applied between characters over and above that which is introduced during justification.
Balinese uses the so-called 'alphabetic' baseline, which is the same as for Latin and many other scripts.
fig_baselines shows glyphs from the Noto Serif fonts. The basic height of Balinese letters is the same as the Latin x-height; however, extenders and combining marks extend well beyond the Latin ascenders and descenders, creating a need for larger line heights.
This section is for any features that are specific to Balinese and that relate to the following topics: general page layout & progression; grids & tables; notes, footnotes, etc; forms & user interaction; page numbering, running headers, etc.
Traditionally, Balinese was written on thin, landscape palm-leaf manuscripts, called lontar.
The text was packed in without paragraph breaks.