Updated 15-Aug-2018 • tags scriptnotes.
This page provides information about characteristics of a number of scripts. It is not intended to be exhaustively scientific – merely to give a basic idea of what languages require what type of feature support. The symbol after a script name points to a page that gives a quick summary of the script.
Click on the column headings to sort by that column.
The table is intended to provide a general indication only. There are things that could be disputed.
Number of characters. This figure is based on a character count for Unicode blocks related to that script. It is approximate, however, for a number of reasons. Blocks such as punctuation are not included – this is just a figure for the main block or set of blocks dedicated to that script.
Very often a particular language will use only a small number of the total characters available in a script (think, for example, how many characters are used for English out of the 1,286 Latin characters). The figures also include archaic characters.
Combining characters. This shows how many of the characters counted for the script are combining characters. No attempt is made to indicate how many of the base characters each combining character can combine with. In some cases, this will be limited, but in most cases a combining character will combine with a fair number of base characters.
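As a rough illustration of how such counts could be produced, the following Python sketch walks a range of code points and counts assigned characters and combining characters. It is not necessarily how the figures on this page were derived: treating any character with a general category starting with 'M' as "combining" is an assumption, the function name count_block is hypothetical, and the results depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def count_block(start, end):
    """Return (assigned, combining) counts for code points start..end inclusive."""
    assigned = 0
    combining = 0
    for cp in range(start, end + 1):
        category = unicodedata.category(chr(cp))
        if category == "Cn":
            # Unassigned code point: not counted as a character.
            continue
        assigned += 1
        # Assumption: a "combining character" is any mark (Mn, Mc or Me).
        if category.startswith("M"):
            combining += 1
    return assigned, combining


# The Thai block occupies U+0E00..U+0E7F.
thai_assigned, thai_combining = count_block(0x0E00, 0x0E7F)
print(thai_assigned, thai_combining)
```

A script that spans several blocks would need the counts for each block summed.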
Contextual positioning. This is typically related to combining characters, and indicates that a typical font uses OpenType rules to position a glyph according to the glyphs that surround it, eg. tone marks in Thai, or vowel signs in Arabic (if used). Nearly all scripts with combining characters will need some positioning rules to take account of where the combining character should be placed. This indicator is more concerned with whether that location varies significantly, depending on the surrounding context.
Multiple combining characters. Whether more than one combining character can be associated with a given base character.
Vowel signs. Whether the script uses vowel signs to represent vowels (ie. is an abugida). The keyword 'standalone' indicates that some vowel signs are not combining characters.
Case sensitive. Whether or not the script makes case distinctions.
Contextual shaping. Whether different glyph shapes have to be used for a character depending on the visual context, eg. the RA in Myanmar that grows and shrinks to fit around the character it surrounds. Note that this does not include shaping for cursive scripts (see below).
Cursive script. Do the letters in this script join up, eg. as in Arabic?
Text direction. Is this a right-to-left script (which usually means that bidirectional behaviour needs to be supported, for numbers and embedded foreign text)? Is it used in a vertical orientation?
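To illustrate why right-to-left support usually implies bidirectional support, the sketch below uses Python's standard unicodedata module to check whether a string contains characters with a right-to-left bidirectional class ('R', as for Hebrew letters, or 'AL', as for Arabic letters). The helper name has_rtl_characters is hypothetical, and the check only detects such characters; it does not implement the Unicode Bidirectional Algorithm.

```python
import unicodedata

# Bidirectional classes assigned to strongly right-to-left characters.
RTL_CLASSES = {"R", "AL"}


def has_rtl_characters(text):
    """Return True if any character in text has a right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


print(has_rtl_characters("hello 123"))       # Latin letters and digits only
print(has_rtl_characters("abc \u05e9\u05dc\u05d5\u05dd"))  # Latin plus Hebrew
```

A string mixing both kinds of character, like the second one, is the case where bidirectional layout is needed.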
Word separator. Is this a script like Thai, where spaces are used to separate phrases, not words, or like Japanese and Chinese, that don't use spaces, or Ethiopic, that has its own word separator?
Baseline. The baseline for Latin text is labelled 'mid'. Scripts that, like Indic scripts, hang from a high baseline are labelled 'high'. Scripts like Chinese are labelled 'low'.
Text wrap. At the end of a line, where is the typical break point? Is it between words, or characters? Entries labelled 'special' wrap at a character that is not a space, eg. Tibetan, which uses a tsheg between words, rather than a space.
Hyphenation. Whether or not hyphenation is used with the script – by which is meant the addition of a mark at the end or beginning of a line when a word is broken at line end. Scripts that simply break text at syllable or character boundaries are not classed here as hyphenating. An asterisk indicates that practice varies for certain languages (for example, Uighur written with the Arabic script does hyphenate, but most other languages using Arabic script do not).
Justification. What is the basic starting point for justification of text on a line? Typically this is related to the spaces between words. The other alternatives listed are: 'char' is typical of Chinese and Japanese, where justification starts with inter-character spaces; 'cluster' refers to scripts such as in South East Asia, where word boundaries are taken into account, but spaces are used as phrase separators; 'word' is used for Arabic-based scripts, where justification is commonly achieved by stretching the baseline or using ligatures.
Region. This rough grouping places the script in the region where it originated, so English is in Europe, and Arabic is in the Middle East. It serves to get a very rough idea of how things stack up on a regional basis.
Digits. Does the script have a set of native digits? Note that in some cases these may not be used for a particular language.
Feature count. This is a very simplistic indicator that awards one point for each column after the first three columns that doesn't read 'no', 'mid' or '0'.
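The feature count rule can be sketched in a few lines of Python. The function name feature_count, the example row and its values are all hypothetical and are not taken from the table; trimming whitespace and ignoring letter case before comparing are also assumptions.

```python
# Cell values that do not earn a point, as described above.
NEUTRAL_VALUES = {"no", "mid", "0"}


def feature_count(row, skip=3):
    """Award one point per column after the first `skip` columns
    whose value is not 'no', 'mid' or '0'."""
    return sum(
        1
        for value in row[skip:]
        if value.strip().lower() not in NEUTRAL_VALUES
    )


# Hypothetical row: three leading columns, then six feature columns.
example_row = ["ExampleScript", "100", "12",
               "yes", "no", "standalone", "mid", "0", "high"]
print(feature_count(example_row))  # counts 'yes', 'standalone' and 'high'
```

Because every non-neutral value scores exactly one point, the count says nothing about how demanding each individual feature is to support.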