Orthography comparison table

Updated 14 July, 2024

This page provides contrastive information about orthographies. It is not intended to be exhaustively scientific – merely to give a basic idea of what languages require what type of feature support. The categorisations are fairly rough and ready, but clicking on the data in the cells takes you to pages that give more details.

The many columns are divided into sections. Clicking on the buttons below toggles one or more of those sections into or out of view.

Click on the column headings to sort by that column. Mousing over or clicking/tapping on cells will show detailed information at the bottom of the window.


Key

The table is intended to provide a general indication only. Some of the classifications could be disputed.

The major sections are displayed as buttons below. The text following each button is the URL parameter that causes that section to be displayed automatically.

?show=characters

Total chars. This figure is based on the data in the Character Usage Lookup app.

The figures in the next column, preceded by a + sign, indicate how many additional characters are currently under investigation. These figures currently include many characters that may not be relevant, but that are awaiting assessment before they can be removed.

Character counts do not include ASCII characters. It is assumed that those characters are always available.

Note also that the character counts reflect the characters needed to represent both precomposed and decomposed versions of content. For example, use of a character such as â would add 3 characters to the total count: â, a, and the combining circumflex. Use of ô would then add a further 2 (because the circumflex is already counted).
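For anyone wanting to reproduce this kind of count on their own sample text, here is a minimal Python sketch, assuming that "both precomposed and decomposed versions" corresponds to Unicode normalization forms NFC and NFD. The function name `orthography_chars` and its `exclude_ascii` flag are hypothetical and not part of the Character Usage Lookup app. The flag exists because the worked example above counts the ASCII letters a and o, whereas the totals in the table exclude ASCII.

```python
import unicodedata


def orthography_chars(text: str, exclude_ascii: bool = True) -> set[str]:
    """Collect the distinct characters needed to represent `text`
    in both precomposed (NFC) and decomposed (NFD) form.

    Hypothetical helper for illustration only.
    """
    chars: set[str] = set()
    for form in ("NFC", "NFD"):
        # Each normalization form may introduce characters the other lacks,
        # e.g. NFC keeps U+00E2, NFD yields "a" + U+0302 COMBINING CIRCUMFLEX.
        chars.update(unicodedata.normalize(form, text))
    if exclude_ascii:
        # Drop U+0000..U+007F, which are assumed to be always available.
        chars = {c for c in chars if ord(c) > 0x7F}
    return chars


if __name__ == "__main__":
    # Counting as in the worked example (ASCII letters included):
    print(len(orthography_chars("\u00e2", exclude_ascii=False)))        # 3
    print(len(orthography_chars("\u00e2\u00f4", exclude_ascii=False)))  # 5
    # Counting as in the table totals (ASCII excluded):
    print(len(orthography_chars("\u00e2\u00f4")))                       # 3
```

Using a set means the shared combining circumflex is only counted once, which is why adding ô contributes just two further characters in the ASCII-inclusive count.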

The 5 other (initially hidden) columns give character counts for specific types of character, according to their Unicode General_Category property values. These include:

Like the other character counts, these figures exclude ASCII characters, and include characters for any compositions and decompositions that may be applied (unless they are deprecated by the Unicode Standard).
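As a rough illustration of how such per-type figures can be derived, the Python sketch below groups the non-ASCII characters of a sample (in both NFC and NFD form) by the first letter of their General_Category value (L for letters, M for marks, N for numbers, P for punctuation, S for symbols, and so on). Grouping by major class is an assumption made for this example; it does not necessarily match the five columns used in the table, and `category_counts` is a hypothetical name.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count distinct non-ASCII characters in `text`, after NFC and NFD
    normalization, grouped by major Unicode General_Category class.

    Hypothetical helper for illustration only.
    """
    chars: set[str] = set()
    for form in ("NFC", "NFD"):
        chars.update(unicodedata.normalize(form, text))
    # Exclude ASCII, as the table's counts do.
    non_ascii = (c for c in chars if ord(c) > 0x7F)
    # unicodedata.category returns a two-letter code such as "Ll" or "Mn";
    # its first letter is the major class.
    return Counter(unicodedata.category(c)[0] for c in non_ascii)


if __name__ == "__main__":
    # â and ç are letters (Ll); their NFD forms add two combining marks (Mn).
    print(category_counts("\u00e2\u00e7"))  # Counter({'L': 2, 'M': 2})
```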

?show=vowels

Writing system. This column indicates whether the orthography is one of the following.

The other (initially hidden) columns cover the following types of character used to write vowel sounds.

?show=consonants

?show=cclusters

These columns are initially hidden. Where consonant clusters are represented in a special way by the orthography, these columns indicate the more common strategies. Cells indicate:

?show=direction

The typical direction(s) in which the text flows. Columns indicate:

?show=shaping

'Shaping' here means that glyph shapes change according to the context (gsub), whereas 'positioning' refers to the need to position glyphs differently according to context (gpos). The columns look at two specific properties of shaping.

?show=inline

Currently this section only indicates whether and how words are separated. The following alternatives are called out:

?show=para

Columns currently cover:

?show=more

This rough grouping places each orthography in the region where its script originated, so English is in Europe, and Arabic is in West Asia. It serves to give a very rough idea of how things stack up on a regional basis. Regions are one of the following:

Changed 2024-07-14 7:50 GMT. • Licence CC-By © r12a.