Javanese orthography notes

There are currently difficulties in finding a workable Unicode font for Javanese. The original Noto Sans Javanese font uses shapes that are over-simplified for some Javanese users but the latest version is better, and the often recommended font, Tuladha Jejeg, is based on Graphite technology, and so only works on Firefox with Graphite rendering enabled (ie. not on Chrome, Safari, etc.). The default webfont for this page is a Tuladha Jejeg webfont, with a Noto fallback webfont.

Basic features

The Javanese script is an abugida, ie. each consonant contains an inherent vowel sound.

❯ basicV

Vowels Javanese has 2 inherent vowels, pronounced a or ɔ.

Post-consonant vowels are written using 5 combining marks (vowel signs). It is mandatory for 2 of the consonant+vowel sequences to be written using vocalics, rather than vowel signs.

There is 1 pre-base glyph and no circumgraphs, or composite vowel signs. Only one composite vowel sign is used in modern Javanese orthography, although more exist when writing Sanskrit or Kawi in this script. The composite vowel sign used in modern Javanese involves only 2 vowel signs, and places glyphs on either side of the base consonant(s).

Standalone vowel sounds are normally written written using A9B2 with a vowel sign attached. 5 independent vowel letters can be used when there is a need to distinguish normal words from proper nouns or foreign words.

Javanese has, and regularly uses, vocalics.

❯ consonantSummary

Consonants Javanese uses 20 basic consonant letters. Another 9 letters are used as honorifics, a little like capital letters, and 5 more used in Sanskrit words. Repertoire extensions for 9 non-native sounds are achieved by applying the cecak telu diacritic to characters.

Vowel absence Vowel absence is indicated using conjunct forms, or occasionally, a visible pangkon (virama), especially at the end of a word ending with a coda and not followed by another word. There are dedicated characters for medial consonants and codas.

A9C0 is used to kill the inherent vowel and to form conjuncts. Usually the pangkon is invisible, but it is rendered visibly when no other consonant follows, or occasionally in special circumstances, when it can be forced to appear using an invisible formatting character.

Conjunct forms occur as stacked consonants or conjoined pairs.

Medial consonants are written using one of 3 dedicated combining marks.

Syllable codas are most commonly written using an ordinary consonant followed by A9C0. If another consonant follows, the consonant shapes are combined into a conjunct form — even if the consonants represent the end of one word and the beginning of another ! Alternatively, codas may be written using one of 4 dedicated diacritics.

Numbers Javanese has a set of native digits.

Layout Javanese text runs left to right in horizontal lines. Words are not separated by spaces, however syllables may be separated by ZWSP, as long as they don't fall inside a stack. Spaces may be used to separate phrases. There is no case distinction.

Punctuation is mostly native.

Stacked consonants and conjoined pairs span word boundaries. This means that text must be wrapped at orthographic syllable boundaries, and not at word boundaries. A kind of hyphenation occurs, using A9BA at the line end and new line start.

Notable features

no circumgraphs — vowel signs are doubled up (unlike Balinese)
conjuncts form across word boundaries
independent vowels distinguish proper nouns or foreign words from ordinary words
vocalic letters exist, but no vocalic vowel signs
consonants classed as basic, murda, and mahaprana
dedicated medial consonants (unlike Balinese)

Phonology

These are sounds for the Javanese language.

Click on the sounds to see where else in the document they are referred to.

Sounds in parentheses are non-native or allophones. Source Wikipedia.

Vowel sounds

i u e o are pronounced ɪ ʊ ɛ ɔ, respectively in closed syllables.wl§#Vowels

e and o are pronounced ɛ and ɔ, respectively, in open syllables when followed by one of i u. Otherwise they may be pronounced ə.wl§#Vowels

In the standard dialect of Surakarta, a is pronounced ɔ in word-final open syllables.wl§#Vowels

Diphthongs ai and aw are not used in modern Javanese.ws§#Swara

Consonant sounds

	labial	dental	alveolar	post- alveolar	retroflex	palatal	velar	glottal
stops	p b	t d			ʈ ɖ		k g	ʔ
affricates				t͡ʃ
aspirated				t͡ʃʰ
fricatives			s					h
nasals	m		n			ɲ	ŋ
approximants	w		l			j
trills/flaps			r

Most sources (eg. Daniels) imply that the voiced plosives are voicedd. Comriec describes the voiceless plosives as virtually unaspirated, and that the voiced plosives are devoiced at the end of a word. Wikipedia says that the Javanese "voiced" phonemes are in fact voiced voiceless, with breathy voice on the following vowel, the difference being described as stiff voice versus slack voice.wl§#Consonants Here we use the more common transcription.

The sound ʔ appears between words ending with a and the suffix ake, eg. lunga+ake -> lungaʔake.c

Tone

Javanese is not a tonal language.

Structure

Root words are typically disyllables of the form Cˡ V Cˡ V Cˡ, where Cˡ represents an optional consonant or consonant cluster, and V represents a vowel. Most commonly, this represents CVCVC, followed by CVCCVC.c

A Javanese phonetic syllable can begin with a vowel, a consonant, or a consonant cluster.

Syllable-initial consonant clusters usually have either r or j in the second position, or may have both.

According to Comrie, word-initial consonant clusters may involve prenasalised sounds such as mb, nd, ndh, nj or nng, which appear to be written using unconnected consonants side by side.c

eg.

ꦩꦧꦸꦫꦸ

Syllable-final consonants are usually a single sound.

In the orthography, phonetic syllable boundaries don't always coincide with the typographic units used. See segmentation for details.

Vowels

	Post-consonant	Standalone
Plain	ꦶ,ꦸ	ꦲꦶ,ꦆ,ꦲꦸ,ꦈ
	◌ꦺ,◌ꦺꦴ	ꦲꦺ,ꦌ,ꦲꦺꦴ,ꦎ
	ꦼ	ꦲꦼ
	◌ꦺ◌,ⓘ,◌ꦺꦴ◌
	ⓘ	ꦲ,ꦄ
Vocalics	ꦽ	ꦉ,ꦊ

ⓘ represents an inherent vowel. A following dotted circle indicates a closed syllable.

Inherent vowels

ꦏ ka ~ kɔ

Javanese has two possible inherent vowel sounds: a and ɔ. The choice of inherent vowel can depend on the speaker's dialect: speakers of Western Javanese dialects tend to pronounce the inherent vowel as a, while those of Eastern Javanese prefer ɔ.ws§#Form So ka/kɔ is written by simply using the consonant letter.

eg.

ꦒꦒꦤ

ꦒ,ꦒ,ꦤ

ꦥꦩꦭꦶ

ꦥ,ꦩ,ꦭ,ꦶ

ꦏꦸꦭꦮꦂꦒ

ꦏꦸ,ꦭ,ꦮꦂ,ꦒ

Wikipedia describes the following rules by Wewaton Sriwedari for determining the inherent vowel of a letter:ws§#Form

A letter stands for a syllable with the vowel ɔ if the previous letter contains diacritics.
A letter stands for a syllable with the vowel a if the following character contains diacritics.
The first letter of a word normally has the ɔ vowel, unless it precedes two other letters without diacritics, in which case the first letter has the a vowel.

Since Javanese consonants normally include an inherent vowel, the orthography has ways to indicate a consonant that is not followed by a vowel sound. See novowel.

Post-consonant vowels

Vowels that follow a consonant are written using 5 vowel signs, all combining marks. It is mandatory for 2 of the consonant+vowel sequences to be written using vocalics, rather than vowel signs.

There is 1 pre-base glyph and no circumgraphs. Only one multipart vowel is used in modern Javanese orthography, although more exist when writing Sanskrit or Kawi in this script. The multipart vowel used in modern Javanese involves only 2 vowel signs, and places glyphs on either side of the base consonant(s).

All vowel signs are typed and stored after the base consonant, and the glyph rendering system takes care of the positioning at display time. Conjuncts are treated as indivisible units when it comes to rendering vowel signs, meaning that pre-base vowel signs and left-side glyphs of circumgraphs are rendered before the conjunct as a whole (see prebase).

Three of the vowel signs are spacing marks, meaning that they consume horizontal space when added to a base consonant.

Vowel signs

ꦏꦶ ki

Javanese uses the following dedicated combining marks for vowels. They may be used on their own, or in combination with others (see compositeV).

ꦶ,ꦸ,ꦺ,ꦼ,ꦺꦴ

Additional information about archaic forms and variants of these characters and others mentioned below can be found in the character notes document.

A9B4 was originally a length mark, but in modern Javanese it is only used in the combination described in the section compositeV.

Modern Javanese doesn't write the sounds rə and lə using the consonant+vowel combinations *ꦫꦼ U+A9AB LETTER RA + U+A9BC VOWEL SIGN PEPET or *ꦭꦼ U+A9AD LETTER LA + U+A9BC VOWEL SIGN PEPET. Vocalic letters are used instead.ws§#Swara

eg.

ꦊꦩꦃꦊꦩ꧀ꦧꦸꦠ꧀

When a vowel sign follows a subjoined consonant it appears above the stack.

ꦤ꧀ꦛꦶ — The word kanthi, where the i appears above the n.

Four more vowel signs are not used in modern Javanese.

ꦷ,ꦹ,ꦵ,ꦻ

Standalone vowels

Javanese has two ways to represent standalone vowels.

Vowel signs

ꦲꦶ,ꦲꦸ,ꦲꦺ,ꦲꦼ,ꦲ

The normal approach combines a vowel sign with A9B2.

eg.

ꦲꦶꦏꦶ

ꦧꦲꦸꦱꦱ꧀ꦠꦿ

ꦩꦲꦺꦴꦱ꧀

Without a vowel sign the letter A9B2 represents a.

eg.

ꦲꦕꦫ

However, it may alternatively represent the sound ha. The reading is ambiguous, eg. compare the previous example with this:

ꦲꦤꦕꦫꦏ

The same applies for other combinations of this base letter and vowel sign.

Independent vowels

ꦆ,ꦈ,ꦌ,ꦎ,ꦄ

There are 8 independent vowel letters in the Javanese block, of which 5 are used in modern text.

eg.

ꦎꦏ꧀ꦠꦺꦴꦧꦼꦂ

ꦄꦫꦶꦱ꧀ꦠꦺꦴꦠꦼꦭꦼꦱ꧀

The independent vowel letters are used in Javanese to distinguish proper nouns or foreign words from ordinary wordsu, eg. compare the following two words, which include an adjective and a personal name (both have the same pronunciation):

eg.

ꦲꦪꦸ

ꦄꦪꦸ

Other forms

Modern Javanese only uses short vowels. Other characters or sequences of characters were used for long vowels and diphthongs in the past or for other languages.

ꦇ,ꦅ,ꦈꦴ,ꦄꦴ, ,ꦍ,ꦎꦴ

Three of the above use A9B4 to produce long sounds.

Unlike Javanese, Kawi uses A985 and A986 for short and long, respectively.d

Vowel components

This section describes various vowel components and behaviours associated with this orthography.

Composite vowel signs

ꦺꦴ

In Javanese, unlike many other scripts, including Balinese, when a vowel is represented by multiple glyphs either side of a base character two separate combining marks need to be added to the base. Javanese has no circumgraphs (ie. a single code point that places glyphs around the base).

Only one such composite vowel sign is used for modern Javanese, A9BA A9B4, which represents the sound o. Both characters are typed and stored after the base character, and should be in the order shown.

show composition

ꦥꦸꦭꦺꦴ

A two-part vowel associated with a consonant cluster appears before and after the whole cluster, whether it is conjoined or stacked. This is a reminder that vowel signs are applied to the orthographic syllable, rather than to a single letter. In fig_composite_word_break this actually extends across a word boundary, since the last and first letters of the adjoining words form a conjunct cluster. This means that the pre-base part of the vowel sign appears to be within the previous word.

show composition

ꦩꦔꦤ꧀ꦱꦺꦴꦠꦺꦴ

A number of archaic vowels are also represented by combinations of the basic vowel signs, but are not used in modern Javanese.

ꦻꦴ,ꦼꦵ,ꦼꦴ

Show details about vowel glyph positioning.

The following list shows where vowel signs are positioned around a base consonant to produce vowels, and how many instances of that pattern there are. The figure after the + sign represents combinations of Unicode characters,

2 pre-base, eg. ꦏꦺ ke
2 post-base, eg. ꦏꦴ kː
3 superscript, eg. ꦏꦶ ki
2 subscript, eg. ꦏꦸ ku
+2 pre+post-base, eg. ꦏꦺꦴ keː

Pre-base vowel sign

ꦺ

One vowel sign appears to the left of the base consonant letter or cluster in modern Javanese.

eg.

ꦲꦔꦺꦭ꧀

This is a combining mark that is always typed and stored after the base consonant(s), ie. the codepoints follow the order in which the items are pronounced. The rendering process places the glyph before the base consonant without changing the code points. The following shows the sequence of code points that make up the word just above.

ꦲ,ꦔ,ꦺ,ꦭ,꧀

These vowel signs do not split a conjunct. This means that a word with a consonant cluster at the start separates the pre-base vowel from the position where it is pronounced by more than one consonant character.

eg.

ꦒꦿꦺꦗ

ꦒ,ꦿ,ꦺ,ꦗ

A similar vowel sign is no longer used.

ꦻ

Vowel sounds mapped to characters

This section maps Javanese vowel sounds to common graphemes in the Javanese orthography.

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc. Light coloured characters occur infrequently.

dependent ꦶ

standalone ꦲꦶ

independent ꦆ

ꦼꦴ Use for Sundanese.

dependent ꦸ

standalone ꦲꦸ

independent ꦈ

dependent ꦺ

standalone ꦲꦺ

independent ꦌ

dependent ꦺꦴ

standalone ꦲꦺꦴ

independent ꦎ

dependent ꦼ See also vocalics.

standalone ꦲꦼ

independent ꦺ

Inherent vowel eg. ꦏꦿꦩ

dependent ꦺꦴ

Inherent vowel eg. ꦏꦔꦼꦤ꧀

standalone ꦲ

independent ꦄ

Vowel absence

ꦏ꧀ k

Vowel absence principally occurs either when a consonant is a syllable coda, or when a consonant is part of a consonant cluster.

Given that consonants normally include an inherent vowel, the orthography needs a way to indicate when a consonant is not followed by a vowel.

Follow these links for more information.

A visible pangkon.
A conjunct. There are 2 possibilities here:
1. Stacked consonants, where the non-initial (subjoined) consonant appears below the initial, often with a different shape from normal.
2. Conjoined consonants, where consonants sit side-by-side but the non-initial consonant has a slightly different form than usual.
A regular initial consonant followed by a dedicated medial consonant mark.
A dedicated final consonant mark followed by a regular consonant.

Word boundaries. Conjuncts span word boundaries. Because there are no spaces between words, consonants with no following vowel at the end of one word and a consonant at the beginning of the next create a cluster.

Stacks and conjoined sequences are not normally split at line ends (see word and linebreak for the ramifications of this). It means that some words cannot be wrapped at word boundaries.

ꦲꦏ꧀ꦲꦏ꧀ꦏꦁꦥꦝ — In the sequence hak-hak-kang-pa-da the combination k-h is conjoined, and k-k is stacked.

Visible pangkon

Where no letter follows a consonant and the vowel is silent (for example at the end of a sentence or isolated word, or before a number) a visible A9C0 is used to indicate that the inherent vowel is suppressed, eg. ꦏ꧀ explicitly represents just the sound k, as can be seen at the end of the following word:

ꦏꦼꦧꦏ꧀

The same pangkon is also used to produce conjunct forms for consonant clusters (stacking or conjoined shapes), in which case it is invisible (see clusters). Because Javanese commonly combines the last character of one word and the first of another into a conjunct, the pangkon is often not visible after words that end with a consonant but that occur within a sentence.

show composition

ꦏꦿꦸꦥꦸꦏ꧀

There are some exceptions, especially where the pangkon may be used to disambiguate words.

eg.

ꦧꦶꦱ꧀ꦠꦿꦤ꧀ꦱ꧀ꦗꦏꦂꦠ

No pangkon is needed between an onset consonant and a medial consonant, or after a dedicated final consonant.

The Javanese section of the Unicode Standard doesn't indicate how to force the pangkon to remain visible, but the Balinese section recommends the use of 200C (ZWNJ) after the adeg-adeg in order to prevent conjunct formation. However, not many people understand the function of ZWNJ or can access it easily from the keypad. It also doesn't introduce line-break opportunities. A better solution may be to use 200B (ZWSP). This character is needed anyway on most systems in order to allow line-breaking, and it appears to work equally well for this.

Conjuncts

Conjunct formation

In Unicode, the stacking and conjoining behaviour is achieved by adding A9C0 between consonants. The font hides the glyph automatically when a stacked conjunct is formed.

ꦱ,꧀,ꦠ,ꦱ꧀ꦠ

See a table of 2-consonant clusters.
The table allows you to test results for various fonts.

Stacking

To represent consonants without intervening vowels, the non-initial consonant is typically drawn below the initial consonant.

Many of the subjoined forms are just slightly smaller versions of the original, some with small additions, but several have very different shapes altogether, most of which ligate with the cluster initial consonant by joining strokes.　

ꦤ,꧀,ꦢ,ꦤ꧀ꦢ

ꦱ,꧀,ꦤ,ꦱ꧀ꦤ

ꦱ,꧀,ꦏ,ꦱ꧀ꦏ

Examples of stacked conjuncts.

Three consonants can be combined in a cluster, but the third consonant may be one of the medial ya or ra medial combining marks, eg.

ꦩ,꧀,ꦧ,꧀,ꦭ,ꦩ꧀ꦧ꧀ꦭ

ꦤ,꧀,ꦢ,ꦿ,ꦤ꧀ꦢꦿ

Stacked conjuncts with 3 components.

Show consonants in their normal and subjoined forms

Shows the same consonant before and after a pangkon.

basic (nglegéna) glyphs

ꦧ꧀ꦧ,ꦠ꧀ꦠ,ꦢ꧀ꦢ,ꦛ꧀ꦛ,ꦝ꧀ꦝ,ꦏ꧀ꦏ,ꦒ꧀ꦒ,ꦕ꧀ꦕ,ꦗ꧀ꦗ,ꦩ꧀ꦩ,ꦤ꧀ꦤ,ꦚ꧀ꦚ,ꦔ꧀ꦔ,ꦮ꧀ꦮ,ꦫ꧀ꦫ,ꦭ꧀ꦭ,ꦪ꧀ꦪ

murda glyphs

ꦨ꧀ꦨ,ꦡ꧀ꦡ,ꦑ꧀ꦑ,ꦓ꧀ꦓ,ꦖ꧀ꦖ,ꦯ꧀ꦯ,ꦟ꧀ꦟ,ꦘ꧀ꦘ,ꦬ꧀ꦬ

mahaprana glyphs

ꦣ꧀ꦣ,ꦜ꧀ꦜ,ꦞ꧀ꦞ,ꦙ꧀ꦙ

Conjoined consonants

In conjoined clusters, the consonant glyphs remain side by side, but the non-initial consonant is reduced on the left side. fig_conjoined_p shows an example in the word ꦱꦩ꧀ꦥꦸꦤ꧀.

show composition

ꦱꦩ꧀ꦥꦸꦤ꧀

ꦩ,꧀,ꦥ,ꦩ꧀ꦥ

ꦏ,꧀,ꦱ,ꦏ꧀ꦱ

ꦭ,꧀,ꦲ,ꦭ꧀ꦲ

Examples of conjoined conjuncts.

The conjoined A9B1 is unusual in that it also adds a glyph (resembling the vowel sign A9B8) below the initial consonant. This helps distinguish it from the conjoined p. See fig_conjoined_s for an example in the word ꦲꦏ꧀ꦱꦫ.

show composition

ꦲꦏ꧀ꦱꦫ

Show consonants in their normal and subjoined forms

Shows the same consonant before and after a pangkon.

basic (nglegéna) glyphs

ꦥ꧀ꦥ,ꦱ꧀ꦱ,ꦲ꧀ꦲ,ꦉ꧀ꦉ

murda glyphs

ꦦ꧀ꦦ

mahaprana glyphs

ꦰ꧀ꦰ

Consonants

	basic Javanese letters	murda	mahaprana	with cecak telu
Onsets	ꦥ,ꦧ,ꦠ,ꦢ,ꦛ,ꦝ,ꦏ,ꦒ	ꦦ,ꦨ,ꦡ,ꦑ,ꦓ	ꦜ,ꦣ,ꦞ	ꦔ꦳
	ꦕ,ꦗ	꧀ꦖ	ꦙ
	ꦱ,ꦲ	ꦯ	ꦰ	ꦥ꦳,ꦢ꦳,ꦗ꦳,ꦱ꦳,ꦏ꦳,ꦒ꦳,ꦲ꦳
	ꦩ,ꦤ,ꦚ,ꦔ	ꦟ,ꦘ
	ꦮ,ꦫ,ꦭ,ꦪ
Medials	ꦿ,ꦽ,ꦾ
Finals	ꦃ,ꦀ,ꦁ,ꦂ

Basic (nglegéna) consonants

Only 20 of the consonants in the Javanese Unicode block are used for pure Javanese language text. Some others (murda) are used as a kind of uppercase letter, but the remainder are used for words derived from Sanskrit or Kawi, or are archaic forms.

The characters listed here and in the following sections also have conjoined and/or subjoined forms, which may differ significantly from those shown here. See clusters for a list of glyph shapes.

ꦥ,ꦧ,ꦠ,ꦢ,ꦛ,ꦝ,ꦏ,ꦒ,ꦕ,ꦗ,ꦱ,ꦲ,ꦩ,ꦤ,ꦚ,ꦔ,ꦮ,ꦫ,ꦭ,ꦪ

A9B2 represents either ha or the standalone vowel a. It is also used as a carrier for vowel signs in order to represent standalone vowels (see standalone).

Murda letters

Murda forms can be viewed as a kind of capital letter for proper nouns (not sentence initial letters), used as honorifics. They are used to replace an ordinary letter form in the first syllable of the name. However, not all letters have a murda form, so if there is no letter in the first syllable that has a murda form, one is used for the next syllable in the name that has one.

Highly respected names may be all 'capitalized' to the extent that the corresponding murda are available.

ꦦ,ꦨ,ꦡ,ꦑ,ꦓ,ꦖ,ꦯ,ꦟ,ꦘ

A996 is only attested as a subjoined form, ꧀ꦖ The non-subjoined form shown just above list is a modern-day reinvention.ws§#Wyanjana

A9AF is a rare letter. It can be used in the sequence ꦯ͜ꦌ̈ to represent the Chinese sound se, and in the sequence ꦯ꦳ꦾꦺꦴ to represent the Chinese syo.

A9AC not used in modern text, and also not widely known, was used historically by some writers to address royal figures..ws§#Wyanjana

Examples. The uppercase letters in the transcriptions indicate murda letters.

ꦦꦑꦸꦨꦸꦮꦟ

ꦨꦶꦩ

ꦫ꧈ꦩ꧈ꦯꦸꦭꦂꦠ

Mahaprana letters

These are letters that are not basic forms, nor are repurposed as murda consonants.

ꦣ,ꦜ,ꦞ,ꦙ,ꦰ

Mahaprana forms were originally aspirated consonants used in Sanskrit and Kawi transliterations (mahaprana means aspirated). They are rarely, if ever, found in modern text.

eg.

ꦒꦼꦣꦺꦴꦁ

ꦧ꧀ꦊꦣꦸꦒ꧀

Repertoire extension

The following combinations, called aksara rékan (ꦲꦏ꧀ꦱꦫꦫꦺꦏꦤ꧀), are used to represent foreign sounds. There may be some variance around which combinations produce which sounds.ws§#R%C3%A9kan

ꦐ,ꦔ꦳,ꦥ꦳,ꦢ꦳,ꦗ꦳,ꦱ꦳,ꦏ꦳,ꦒ꦳,ꦲ꦳

Javanese uses A9B3 with a similar consonant to represent most foriegn sounds, initially those from Arabic, but then also those from Dutch, Indonesian, and English.

eg.

ꦗ꦳ꦏꦠ꧀

ꦥ꦳ꦗꦂ

A990 is used for writing q in Sasak.

When consonants are subjoined there can be some ambiguity about which consonant the cecak telu applies to. For example, the following look identical:ꦏ꦳꧀ꦗ kˑ͓ʤ kza ꦏ꧀ꦗ꦳ k͓ʤˑ xja

Wikipedia has a set of Chinese sounds that are represented using some combining characters from a non-Javanese block.ws§#Additional_Aksara

Onsets

Three dedicated combining characters represent medial consonants (wyanjana), making it easy to tell that the consonant is part of a syllable-initial cluster and not the start of a new syllable.

ꦿ,ꦽ,ꦾ

eg.

ꦏꦿꦥꦾꦏ꧀

ꦱꦽꦔꦺꦔꦺ

ꦧꦒꦾ

Balinese doesn't have these dedicated medial consonants.

Codas

Word-final consonant sounds with no following consonant may be represented by ordinary consonant characters, followed by a visible A9C0 character.

eg.

ꦕꦼꦝꦏ꧀

ꦏꦔꦼꦤ꧀

If another word or consonant does follow the word-final consonant, the pangkon is still used, but becomes invisible and results in the stacking of the two consonants. (See clusters.)

If the consonant is followed by another consonant, either in the middle or at the end of a word, the adeg adeg code point remains, but becomes invisible as the consonant shapes combine vertically or horizontally (see clusters).

eg.

ꦏ꧀ꦭꦩ꧀ꦧꦶ

Combining marks

There is also a set of dedicated combining characters (seisigeg) that dispense with the need for the pangkan.

Four syllable-final consonant sounds are represented using combining characters.

ꦃ,ꦀ,ꦁ,ꦂ

eg.

ꦱꦼꦏꦺꦴꦭꦃ

ꦤꦂꦥ

ꦥꦼꦠꦼꦁ

When both a vowel sign and one of these marks appear above the same base, they are positioned side by side.

eg.

ꦏꦼꦂꦠ

ꦏꦚ꧀ꦕꦶꦁ

Consonant sounds to characters

This section maps Javanese consonant sounds to common graphemes in the Javanese orthography.

Conjunct forms are shown to the right of each line.

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc. Light coloured characters occur infrequently.

꧀ꦥ basic ꦥ

꧀ꦦ murda ꦦ

꧀ꦧ basic ꦧ

꧀ꦨ murda ꦨ

꧀ꦠ basic ꦠ

꧀ꦡ murda ꦡ

t͡ʃ

꧀ꦕ basic ꦕ

꧀ꦖ murda ꦖ Only found in non-initial positions as a subjoined consonant.

꧀ꦢ basic ꦢ

d͡ʒ

꧀ꦗ basic ꦗ

꧀ꦙ mahaprana ꦙ

꧀ꦛ basic ꦛ

꧀ꦜ mahaprana ꦜ

꧀ꦝ basic ꦝ

꧀ꦣ mahaprana ꦣ

꧀ꦞ mahaprana ꦞ

꧀ꦏ basic ꦏ

꧀ꦑ murda ꦑ

꧀ꦒ basic ꦒ

꧀ꦓ murda ꦓ

extended ꦔ꦳ Used to represent foreign sounds, especially Arabic.

extended ꦥ꦳ Used to represent foreign sounds, especially Arabic.

extended ꦢ꦳ Used to represent foreign sounds, especially Arabic.

꧀ꦱ basic ꦱ

꧀ꦯ murda ꦯ

꧀ꦰ mahaprana ꦰ

extended ꦗ꦳ Used to represent foreign sounds, especially Arabic.

extended ꦱ꦳ Used to represent foreign sounds, especially Arabic.

extended ꦏ꦳ Used to represent foreign sounds, especially Arabic.

extended ꦒ꦳ Used to represent foreign sounds, especially Arabic.

extended ꦲ꦳ Used to represent foreign sounds, especially Arabic.

꧀ꦲ basic ꦲ

coda ꦃ

꧀ꦩ basic ꦩ

coda ꦀ Only used for OM.

꧀ꦤ basic ꦤ

꧀ꦟ murda ꦟ

꧀ꦚ basic ꦚ

꧀ꦘ murda ꦘ

꧀ꦔ basic ꦔ

coda ꦁ

꧀ꦮ basic ꦮ

꧀ꦫ basic ꦫ

medial ꦿ

coda ꦂ

rə

medial ꦽ

꧀ꦉ vocalic ꦉ

꧀ꦭ basic ꦭ

lə

꧀ꦊ vocalic ꦊ

꧀ꦪ basic ꦪ

medial ꦾ

Glyph shaping & positioning

You can experiment with examples using the Javanese character app.

Context-based shaping & positioning

Shaping

Glyph shaping is required for Javanese. One principle area is that of subjoined or postfixed consonants, which often interact typographically with the preceding consonant.

Not all fonts show the same shaping behaviours.

In fig_k_joins, the three syllables, each containing a k-k stack, show how the font adapts the subjoined A98F at the bottom right according to what follows it.

ꦏ꧀ꦏꦿ ꦏ꧀ꦏ ꦏ꧀ꦏꦾ — Adaptations of the lower right of a subjoined k.

The following two syllables show how the font changes the shape of A9BF to match the depth of the syllable.

ꦏꦿ ꦏ꧀ꦏꦿ — Adaptations of medial RA shape to suit the context.

The fig_ku_shaping the font shows different renderings of the u vowel sign after the second character in a consonant cluster. In kru the lines suggest that the medial r is drawn after the u, although it is pronounced the other way around.

ꦏ꧀ꦏꦸ ꦏ꧀ꦰꦸ ꦏꦿꦸ — Adaptations of -u in kku, ksu, and kru.

Note that the middle cluster contains only one u character. The similar-looking shape in the middle of the word is just part of the kS conjoined shape. The rightmost cluster uses a ligature for -ru, where the chakra appears to be drawn after the u, although actually stored before it.

Context-based positioning

Obviously the principle of subjoining consonants requires rules about positioning, and those rules need to be disregarded for combinations where the second character of a cluster is not subjoined (though it usually changes shape).

In the following example we see ka with cecak telu on the left. In the middle syllable cecak telu has shifted slightly to the left to make room for the other diacritic. In the right-hand syllable the cecak telu has both moved and reduced in size to fit with the other diacritic.

ꦏ꦳ ꦏ꦳ꦂ ꦏ꦳ꦼ — The position and size of cecak telu depends on its neighbours.

Another example of the need for special positioning occurs when a vowel sign is pronounced after a subjoined consonant but appears above the previous consonant in the stack (see the example earlier).

Letterform slopes, weights, & italics

Observation: Numerous examples of slanted text exist in the publication Kajawen in 1933. However, they are used to distinguish blocks of text from other blocks, but not inline.

Examples of use include separate panels floated alongside the main flow of text (the whole panel is slanted), subheadings or sometimes headings, figure captions, and by-lines.

Alternate lines in this text are slanted. (Click to enlarge.)

Typographic units

Word boundaries

Words are not separated by spaces. (Though spaces may be used to separate phrases, see phrase).

As mentioned in the previous section, Javanese is one of a small number of scripts where an initial consonant for a word may be subjoined below the last consonant of a preceding word. Since stacks are never broken, typographic operations such as line-breaking therefore split the text at orthographic syllable boundaries rather than words (see graphemes).

ꦥꦔꦤ꧀ꦢꦶꦏ — When the word ꦥꦔꦤ pŋn paŋan is followed by ꦢꦶꦏ dik dika, the initial letter of dika is subjoined below the last letter of paŋan, and the vowel sign in dika appears above the stack.

Graphemes

Grapheme clusters alone are not sufficient to represent typographic units in Javanese. Stacks and conjoined sequences are very common and must not be split apart by edit operations that visually change the text (such as letter-spacing, first-letter highlighting, and line breaking). For those operations one needs to segment the text using orthographic syllables, which string grapheme clusters together with A9C0, which has an Indic Syllabic Category of Virama.

The pangkon is rendered visibly if it is not part of a consonant cluster, for example at the end of a word followed by a space.

Javanese doesn't use word boundaries for text segmentation, relying instead on grapheme boundaries because consonant clusters that span word boundaries are combined into stacks or conjoined forms.

Grapheme clusters

Base Combining_mark*

Combining marks may include zero or more of the following types of character:

Cecak telu (nukta) [1] (see extendedC)
Medial consonants [3] (see onsets)
Vowel signs [5] (see plainV)
Final consonants [4] (see finals)
Virama (pangkon) [1] (see novowel)

Javanese grapheme clusters that include a syllable nucleus usually begin with a consonant, but can also begin with an independent vowel, and occasionally numbers also constitute a base (and by some reports may be followed by combining marks).

Syllable codas may be represented by a combining mark, or may be incorporated into a consonant cluster, but they may also be written using a sequence of consonant letter (possibly including a cecak telu) followed by a pangkon, which is visible if no base immediately follows it, eg.

The following examples show a variety of grapheme clusters:

Click on the text version of these words to see more detail about the composition.

	ꦠꦶꦂꦠ
	ꦏꦿꦶꦪ
	ꦱꦼꦏꦺꦴꦭꦃ
	ꦏꦔꦼꦤ꧀
	ꦏꦤ꧀ꦛꦶ
	ꦲꦏ꧀ꦱꦫ

Note how grapheme clusters break up the conjuncts in the last 2 examples. This is not usually desirable (see orthographicS just below).

Larger typographic units

(Base Cecak_telu? Pangkon)* Grapheme_cluster

Javanese commonly stacks or conjoins glyphs, to form conjuncts. The conjuncts represent consonant clusters, which can arise where one phonetic syllable ends in a consonant letter and the following syllable begins with a consonant. (Unlike Balinese, medial consonants – for sequences such as Cr-, Cy-, Cw-, Cry-, etc – are combining characters rather than conjunct forms.) The cluster of consonants that make up the conjunct are all encoded with adeg adeg between them (see clusters).

Javanese is unusual in that these conjuncts occur across word boundaries, so the word-final consonant of the first word may be stacked above the word-initial consonant of the second. See fig_pangandika for an example.

Grapheme clusters terminate after a sequence of marks containing a pangkon, but editorial operations that change the visual appearance of the text, such as letter-spacing, first-letter highlighting, line-breaking, and justification, should never split conjunct forms apart. For this reason, an alternative way of segmenting graphemes is needed. This may not apply, however, for some other operations such as cursor movement or backwards delete.

Where conjuncts appear, a typographic unit contains multiple grapheme clusters. The non-final grapheme clusters all end with A9C0, and the final grapheme cluster begins with a consonant.

The following are examples. The first 2 examples were shown in the previous section: here the conjunct is treated as a single typographic unit.

Click on the text version of these words to see more detail about the composition.

	ꦏꦤ꧀ꦛꦶ
	ꦲꦏ꧀ꦱꦫ
	ꦏꦭꦶꦱ꧀ꦠꦿꦶꦏꦤ꧀

Note that one of the characteristic features of the Indic category of Virama is that the pangkon is visible when not followed by a consonant, but invisible when a consonant does follow (creating a stack). This means that the pangkon sometimes participates in a simple grapheme cluster, but when followed by a consonant it becomes the 'glue' that creates an orthographic syllable.

On the infrequent occasions when a pangkon needs to be visible even though it is followed by another base, an invisible character must be added to prevent it joining with the following base. A zero-width space can achieve that.

Browser behaviour

Test in your browser. The words test units that equate to grapheme clusters only, and others that include conjuncts. First, the text is displayed in a contenteditable paragraph, then in a textarea. Results are reported for Gecko (Firefox), Blink (Chrome), and WebKit (Safari) on a Mac.

ꦠꦶꦂꦠ ꦏꦿꦶꦪ ꦏꦤ꧀ꦛꦶ ꦲꦏ꧀ꦱꦫ

Cursor movement. Move the cursor through the text.
Gecko steps through the whole text using grapheme clusters. It takes 2 or more steps (depending on the number of GCs) to get through the stacks, one grapheme cluster at a time. Blink and WebKit step through all words using the orthographic syllables described here (ie. they step over a stack and all associated combining characters in one jump).

Selection. Place the cursor next to a character and hold down shift while pressing an arrow key.
The behaviour is the same as for cursor movement.

Deletion. Forward deletion works in the same way as cursor movement. The backspace key deletes code point by code point, except for WebKit, which deletes one grapheme cluster at a time.

Line-break. See this Balinese test. The CSS sets the value of the line-break property to anywhere. Change the size of the box to slowly move the line break point.
Gecko appears to segment on orthographic syllable, per the description here, except for one case where the complex stack is split. WebKit and Blink appear to sometimes wrap inside stacks and other times not. It's not obvious why, but both segment in the same way.

Punctuation & inline features

Phrase & section boundaries

Javanese uses a set of native punctuation marks.

phrase	0020 A9C8 A9C7
sentence	A9C9 A9C8 after A9C0
paragraph	A9CB
section	Ditto.
general divider	A9CA

If the final word in a phrase ends with A9C0, a space alone is sufficient to indicate a phrase boundary. Otherwise, A9C8 is used.

A9C7 is equivalent to a colon.

The sequence A9C0 A9C8 indicates a sentence boundary. If there is no pangkon, A9C9 is used.

All of the above punctuation marks are optionally followed by a space, if they occur inside a paragraph.

A paragraph and or a section typically begins with A9CB. This punctutation is also used before other short runs of text, such as subtitles, list items, etc.

A9CA is a general divider. See also quotations.

A paragraph containing almost all of the punctuation marks described here.

Letters

꧋,꧆,꧅,꧄,꧃,꧉

Letters may begin with ꧋꧆꧋ if the writer doesn't want to indicate a distinction regarding age or rank between themselves and the reader. Otherwise, for more formal letters, they can choose one of three alternatives provided as single characters in the Javanese Unicode block.

A9C5 is used for letters to people of greater age or higher rank,
A9C4 for people of equal age/rank, and
A9C3 for people of lower age/rank. The difference between these three is the height of the swash to the far left.

The end of a letter can be signaled using ꧉꧆꧉ This combination may also involve just ꧆꧉or may be repeated with spaces between to fill the linee, eg.
꧉ ꧆ ꧉ ꧆ ꧉ ꧆ ꧉

Poetry

ꦆ,ꦕ,ꦖ,ꦟ,ꦢ,ꦧ,ꦿ,꧀,꧅,꧉

In poetry ꧅ꦧ꧀ꦖ꧅ or ꧅ꦧ꧀ꦕ꧅ (purwapada) introduces a poem; ꧅ꦟ꧀ꦢꦿ꧅ (madyapada) introduces a new song within a poem; and ꧅ꦆ꧅ (wasanapada) indicates the end of a poem.

Optionally, A9C9 can be added to the above with some space around it. The spaces should be non-breaking, since there should be no line-breaks between the constituent partse, eg.
꧅ ꧉ ꦧ꧀ꦖ ꧉ ꧅

Titles

Titles may be marked by a pair of rerenggan characters, ꧁ and ꧂.

ie.

꧁...꧂

The glyphs for these characters may vary substantially.

Bracketed text

Javanese commonly uses native punctuation to insert parenthetical information into text.

	start	end
standard	A9CA	A9CA
alternative	A9CC	A9CD

Typically a pair of A9CA characters are used.

Alternatively, the pair of characters A9CC and A9CD may be used.

Quotations & citations

As for parentheses, Javanese text may use A9CA for quotation marks.

	start	end
top level	A9CA	A9CA
alternative	A9CC	A9CD

Alternatively, the pair of characters A9CC and A9CD may be used.

Emphasis

To draw attention to text Javanese may use the same characters as are used for parentheses and quotations, ie. a paired set of A9CA characters around the relevant text, or the two characters A9CC and A9CD can be used similarly.

Sometimes just A9CC is repeated.

Abbreviation, ellipsis & repetition

Abbreviation

According to Everson A9C8 is used for acronyms.

eg.

꧈ꦢꦺ꧈ꦲ꧈ꦌꦭ꧀꧈

It is also used after initials in a name.e

eg.

ꦫ꧈ꦩ꧈ꦯꦸꦭꦂꦠ

Repetition

A repeated syllable can be represented by A9CF, which is derived from the arabic-indic digit for 2.

eg.

ꦧꦸꦏꦸꧏ buku-buku books

It can be transcribed as buku2.

For 'ditto' marks in vertical lists, Javanese uses A9C9 .

Inline notes & annotations

Correction marks

According to Wikipedia A9DF is used in handwriting to indicate a correction in Yogyakarta, eg. where a scribe wanted to write pada luhur but actually wrote pada wu.. they would use this mark as follows: ꦥꦢꦮꦸ꧟꧟꧟ꦭꦸꦲꦸꦂ

In Yogyakarta they would use the character A9DE instead.

Line & paragraph layout

Line breaking & hyphenation

Like Tibetan, line breaking can occur after any full orthographic syllable, however no conjuncts are split during line breaking.

Observation: Does a 'full' orthographic syllable include a syllable coda written as a normal consonant letter followed by pangkon? Does that need to be wrapped with the syllable onset and nucleus?

Because there are no spaces between words, and because the end of one word and the beginning of another often form conjuncts (see fig_pangandika), Javanese doesn't wrap at word boundaries. Instead, it wraps at syllable boundaries where no conjuncts are involved.

Unfortunately, modern browsers are often unable to detect appropriate break points for Javanese. In the sample text at the beginning of this page 200B is used at places where the line could be broken. Otherwise, the line would continue, unbroken off the right side of the page.

Hyphenation, per se, is not used. See an interesting discussion about Javanese & Balinese line-breaking on GitHub.

Taling duplication

In some materials, when a new line begins with A9BA, an additional spacing taling is placed at the end of the previous line.

ꦲꦸꦠꦮꦶꦲꦶꦏꦁꦏꦊꦉꦱ꧀ꦏꦺ
ꦮꦺꦴꦤ꧀꧈ — An extra taling at the end of the line when the word kawon is split before won.

In online use, an application would need to create the extra taling, rather than the content author. As line-length is changed by stretching a window, or as content is added earlier in the same paragraph, the location of the word relative to the line edge will change. The insertion of an extra taling is only appropriate at those instants when the taling happens to appear at the line start.

Line-edge rules

As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.

Show line-breaking properties for characters in the modern Javanese orthography.

The following list gives examples of typical behaviours for some of the characters used in modern Javanese. Context may affect the behaviour of some of these and other characters.

Click/tap on the Javanese characters to show what they are.

꧈ ꧉ ꧇ should not begin a new line.
꧊ ꧋ ꧌ ꧍ ꧁ ꧂ ꧅ ꧄ ꧃ should not allow a linebreak unless next to whitespace or another character that allows line breaks.

Text alignment & justification

Observation: Articles in the Kajawen publication are fully justified. The justification algorithm appears to stretch spaces, which generally occur on each line, and also to slightly stretch inter-character spaces. In the latter case, stacks and super-/subscript diacritics are not affected, but space is added between pre-base vowel signs and base characters, and between base characters and conjoined characters.

Paragraph indents

Observation: One of the articles in the Kajawen publication uses paragraph indents. See an example.

Baselines, line height, etc.

Javanese uses the so-called 'alphabetic' baseline, which is the same as for Latin and many other scripts.

Javanese requires far more vertical space than Latin text. To give an approximate idea, fig_baselines compares Latin and Javanese glyphs from Noto fonts. The basic height of Javanese letters is typically around the Latin x-height, but the overall height is 2 to 3 times that of the Latin letters, creating a need for much larger line spacing.

Xqhxꦈꦥꦾꦏꦃꦥꦺꦊꦂꦘꦼ꧌꧄꧂ — Font metrics for Latin text compared with Javanese glyphs in the Noto Sans Javanese font.

fig_baselines_other shows similar comparisons for the Tuladha Jejeg and Javanese Text fonts.

Notes, footnotes, etc

See inlinenotes for purely inline annotations, such as ruby or warichu. This section is about annotation systems that separate the reference marks and the content of the notes.

Javanese

Sample

Usage & history

Basic features

Notable features

Character index

Letters

Basic consonants

Murda consonants

mahapranas

Independent vowels

Vocalic

Other

Not used for modern Javanese

Combining marks

Vowel signs

Medial consonants

Final consonants

Other

Not used for modern Javanese

Numbers

Punctuation

Other

To be investigated

Phonology

Vowel sounds

Consonant sounds

Tone

Structure

Vowels

Inherent vowels

Post-consonant vowels

Vowel signs

Standalone vowels

Vowel signs

Independent vowels

Other forms

Vowel components

Composite vowel signs

Pre-base vowel sign

Vowel sounds mapped to characters

Vocalic letters

Archaic forms

Vowel absence

Visible pangkon

Conjuncts

Conjunct formation

Stacking

Conjoined consonants

Consonants

Basic (nglegéna) consonants

Murda letters

Mahaprana letters

Repertoire extension

Onsets

Codas

Combining marks

Consonant sounds to characters

Encoding choices

Code point sequences

Numbers, dates, currency, etc.

Text direction

Glyph shaping & positioning

Context-based shaping & positioning

Shaping

Context-based positioning

Letterform slopes, weights, & italics

Typographic units

Word boundaries

Graphemes

Grapheme clusters

Larger typographic units

Browser behaviour

Punctuation & inline features

Phrase & section boundaries

Letters

Poetry

Titles

Bracketed text

Quotations & citations