Triage linebreaking tool

Legend

AI (ambiguous, alphabetic or Ideograph) line breaking behavior depends on the context, and may reflect different rules in East Asian and other contexts.

AK (Aksara) can occur as the base of a Brahmic orthographic syllable and can also follow a virama of Indic syllabic category Virama or Invisible_Stacker within the same orthographic syllable.

AL (ordinary alphabetic and symbol characters) requires other characters to provide break opportunities; otherwise, unless tailored rules are applied, no line breaks are allowed between pairs of them.

AP (Aksara Pre-Base) characters of Brahmic scripts that are part of an orthographic syllable but in logical order precede the base or any half-forms.

AS (Aksara Start) can occur as the base of a Brahmic orthographic syllable, but cannot follow a virama of Indic syllabic category Virama or Invisible_Stacker within the same orthographic syllable.

B2 (break opportunity before and after) the EM DASH used to set off parenthetical text may allow line breaks before or after, but may also be affected by local orthographic rules.

BB (break before) indicates that characters move to the next line at a line break and thus provide a line break opportunity before.

BA (break after) indicates that it is normal to break after that character.

CJ (conditional Japanese starter) may be treated as either NS or ID. Treating characters of class CJ as class NS will give CSS strict line breaking; treating them as class ID will give CSS normal breaking.
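The CJ choice described above can be sketched as a small resolution step. This is a minimal illustration, not the tool's actual code: the function name `resolve_cj` and the `strict` flag are hypothetical, and classes are represented as plain two-letter strings.

```python
def resolve_cj(line_break_class: str, strict: bool = False) -> str:
    """Map class CJ to NS (strict breaking) or ID (normal breaking).

    Hypothetical helper: any class other than CJ is returned unchanged.
    """
    if line_break_class != "CJ":
        return line_break_class
    return "NS" if strict else "ID"


print(resolve_cj("CJ", strict=True))   # NS
print(resolve_cj("CJ", strict=False))  # ID
```

Resolving the class up front means the later pair rules only ever see NS or ID, so they need no special case for CJ.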

CL (close punctuation) should be kept with the preceding character. The class CL is closely related to the class CP (Close Parenthesis). They differ only in that CP will not introduce a break when followed by a letter or number, which prevents breaks within constructs like “(s)he”.

CM (combining mark) takes on the behavior of its base character.

CP (closing parenthesis) will not cause a break opportunity when appearing in contexts like “(s)he.” In all other respects the breaking behavior of CP and CL is the same.

EX (exclamation mark/interrogation) characters behave like closing characters, except in relation to postfix (PO) and non-starter characters (NS).

GL (non-breaking “glue”) non-tailorable, non-breaking characters prohibit breaks on either side, but that prohibition can be overridden by SP or ZW.

HL (Hebrew letter) does not break around a following hyphen; otherwise acts as Alphabetic.

HY (hyphen) additional context analysis is required to distinguish usage of this character as a hyphen from its usage as a minus sign (or indicator of numerical range). If used as hyphen, it acts like U+2010 HYPHEN, which has line break class BA.

IN (inseparable characters) is intended to be used consecutively. There is never a line break between two characters of this class.

ID (ideographic) characters do not require other characters to provide break opportunities; lines can ordinarily break before and after and between pairs of ideographic characters. Note that this class also includes characters other than Han ideographs.

IS (infix numeric separators) usually occurs inside a numerical expression and may not be separated from the numeric characters that follow, unless a space character intervenes. For example, there is no break in “100.00” or “10,000”, nor in “12:59”.

NS (nonstarters) cannot start a line, but unlike CL they may allow a break in some contexts when they follow one or more space characters.

NU (number) behaves like ordinary characters (AL) in the context of most characters but activates the prefix and postfix behavior of prefix and postfix characters.

OP (open punctuation) should be kept with the character that follows. This is desirable, even if there are intervening space characters, as it prevents the appearance of a bare opening punctuation mark at the end of a line.

PO (postfix numeric) usually follows a numerical expression and may not be separated from preceding numeric characters or preceding closing characters, even if one or more space characters intervene. For example, there is no break opportunity in “(12.00) %”.

PR (numeric prefix) may not be separated from following numeric characters or following opening characters, even if a space character intervenes. For example, there is no break opportunity in “฿ (100.00)”.

QU (quotation) characters can be opening or closing, or even both, depending on usage. The default is to treat them as both opening and closing.

SA (Southeast Asian) characters require morphological analysis to determine break opportunities, in a way similar to a hyphenation algorithm. No break opportunities will be found otherwise. Complex context analysis, often involving dictionary lookup of some form, is required to determine non-emergency line breaks. If such analysis is not available, it is recommended to treat them as AL.
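The recommended fallback for SA can be sketched in the same style. This is only an illustration under stated assumptions: `resolve_sa` and the `has_analyzer` flag are hypothetical names, and the dictionary-based analysis itself is not shown.

```python
def resolve_sa(line_break_class: str, has_analyzer: bool) -> str:
    """Fall back from SA to AL when no morphological analysis is available.

    Hypothetical helper: with an analyzer present, SA is kept so that the
    caller can hand those runs to it; other classes are returned unchanged.
    """
    if line_break_class == "SA" and not has_analyzer:
        return "AL"
    return line_break_class


print(resolve_sa("SA", has_analyzer=False))  # AL
print(resolve_sa("SA", has_analyzer=True))   # SA
```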

SY (symbols allowing break after) provides a break opportunity after, except in front of digits, so as to not break “1/2” or “06/07/99”.
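The SY description can be illustrated with a deliberately narrow sketch. It assumes that "/" is the only SY character of interest and approximates "digits" with Python's `str.isdecimal()`. The function name is hypothetical, and every other line breaking rule (spaces, consecutive solidi, punctuation) is ignored.

```python
from typing import List


def breaks_after_solidus(text: str) -> List[int]:
    """Return each index i where a break is allowed between text[i-1] and text[i].

    Encodes only the one rule from the legend: a break may follow "/"
    unless the next character is a decimal digit.
    """
    positions = []
    for i in range(1, len(text)):
        if text[i - 1] == "/" and not text[i].isdecimal():
            positions.append(i)
    return positions


print(breaks_after_solidus("1/2"))       # []
print(breaks_after_solidus("06/07/99"))  # []
print(breaks_after_solidus("and/or"))    # [4]
```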

VF (Virama Final) a virama of Indic syllabic category Pure_Killer in scripts where the final consonant of a phonological syllable is expressed as a sequence of a consonant and such a virama, and the final consonant needs to be kept together with the preceding orthographic syllable.

VI (Virama) a virama of Indic syllabic categories Virama and Invisible_Stacker.

ZW (ZERO WIDTH SPACE, ZWSP) enables invisible break opportunities wherever SPACE cannot be used. It has no width, and is treated as if it wasn't there during justification.
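As a rough illustration of how ZWSP marks break opportunities, the sketch below splits a string at each U+200B. The function name is hypothetical, and a real line breaker would also account for spaces that follow the ZWSP; here the invisible character is simply dropped and the pieces between break opportunities are returned.

```python
from typing import List

ZWSP = "\u200b"  # ZERO WIDTH SPACE


def split_at_zwsp(text: str) -> List[str]:
    """Split text into the segments between ZWSP break opportunities."""
    return text.split(ZWSP)


print(split_at_zwsp("super\u200bcalifragilistic"))  # ['super', 'califragilistic']
```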