Updated 11 February, 2020
This page lists characters in the following Unicode blocks and provides information about them.
This is not authoritative, peer-reviewed information – these are just notes I have gathered and copied from various places.
If you click on any red example text, you will see at the bottom right of the page a list of the characters that make up the example.
To find a character by codepoint, type #char0000 at the end of the URL in the address bar, where 0000 is a four-figure, hex codepoint number, all in uppercase. Or type the character or the hex number in the Find control above.
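As a rough sketch of the fragment format described above, the following Python builds the `#charXXXX` suffix for a character. The function name `char_fragment` is my own invention for illustration, and I am assuming that the zero-padded, four-digit uppercase form is all that is needed for the characters on this page (all of which are in the Basic Multilingual Plane).

```python
def char_fragment(ch: str) -> str:
    """Return the '#charXXXX' URL fragment for a single character.

    Hypothetical helper: the name and behaviour are assumptions based on
    the description of the fragment format, not code used by the page.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # Uppercase hex, zero-padded to at least four digits.
    return f"#char{ord(ch):04X}"


print(char_fragment("\u200B"))  # #char200B
print(char_fragment("\u2021"))  # #char2021
```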
To view this page as intended, you need an appropriate font. Click the blue vertical bar at the bottom right of the page to apply other fonts, if you have them on your system. For transcriptions I recommend the excellent and free Doulos SIL font. The large character in the box will not be rendered unless the webfont downloaded with the page or a system font has a glyph for it. (If there is no glyph and you want to see what it looks like, click on See in UniView.)
Information about languages that use these characters is taken from the list maintained for the Character Use app. The list is not exhaustive.
References are indicated by superscript characters. Wherever possible, those contain direct links to the source material. When such a pointer is alongside an arrow → it means that it's worth following the link for the additional information it provides. Digits refer to the main sources, which are listed at the bottom of a set of notes.
When you are using UniView and you turn on Show notes, UniView will pull in information about characters from this page.
U+200B ZERO WIDTH SPACE
An invisible character, used to signal line-break and word-break opportunities. It was originally provided for use with writing systems such as Thai, Myanmar, Khmer, Japanese, etc. that don't use spaces between words.
Justification may visibly adjust the space between the characters on either side of this character, doing so as if the ZWSP wasn't there, eg. the Thai text อักษรไทย may look like อั ก ษ ร ไ ท ย when justified, or when letter-spacing is applied, even though the two words are separated by a ZWSP (click on the word to see the composition).
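A minimal sketch of how ZWSP might be placed between words, assuming the word boundaries are already known (here the two Thai words from the example above are simply given as a list). The variable names are illustrative only.

```python
ZWSP = "\u200B"  # ZERO WIDTH SPACE

# Word boundaries are assumed to be known in advance; finding them
# automatically is a separate segmentation problem.
words = ["อักษร", "ไทย"]

# Join the words with an invisible break opportunity between them.
text = ZWSP.join(words)

# The string looks unchanged on screen, but is one code point longer
# than the two words combined.
print(len(text))

# Splitting on ZWSP recovers the original words.
print(text.split(ZWSP) == words)  # True
```

Note that Python does not treat ZWSP as whitespace, so a bare `text.split()` will not separate the words; the separator has to be given explicitly.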
U+200C ZERO WIDTH NON-JOINER
Prevents two adjacent letters forming a cursive connection with each other when rendered.
The ZWNJ is used in Persian for plural suffixes, some proper names, and Ottoman Turkish vowels. Ignoring or removing the ZWNJ will result in text with a different meaning or meaningless text.1 For example, تنها is the plural of body, whereas تنها is the adjective alone.2 The only difference is the presence or absence of ZWNJ after noon. u373 g
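Because the two Persian words can look identical in plain text, it may help to see the code points. The following sketch builds both strings from escapes, placing the ZWNJ after noon as described above; the helper name `codepoints` is my own.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

# teh + noon, then ZWNJ, then heh + alef: the plural of 'body'
bodies = "\u062A\u0646" + ZWNJ + "\u0647\u0627"
# teh + noon + heh + alef with no ZWNJ: the adjective 'alone'
alone = "\u062A\u0646\u0647\u0627"


def codepoints(s: str) -> list[str]:
    """List the code points of a string in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(c):04X}" for c in s]


print(codepoints(bodies))
print(codepoints(alone))

# Stripping the ZWNJ silently turns one word into the other.
print(bodies.replace(ZWNJ, "") == alone)  # True
```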
Khmer register shifters (ie. ◌៉ [U+17C9 KHMER SIGN MUUSIKATOAN] or ◌៊ [U+17CA KHMER SIGN TRIISAP]) usually appear above a consonant. However, if a superscript vowel is also attached to the consonant, the shifter is normally displayed below the consonant, instead. If you want to force the shifter to remain above the consonant, as is occasionally the case, insert ZWNJ between the consonant and the shifter. For example, ហ ហ៊ ហ៊ី ហ៊ី. u373 sk
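The sequence described above can be sketched as follows. I am assuming the consonant and vowel from the example (KHMER LETTER HA and KHMER VOWEL SIGN II); whether the shifter actually stays above the consonant depends on the font and rendering engine, which this code cannot show.

```python
import unicodedata

ZWNJ = "\u200C"      # ZERO WIDTH NON-JOINER
HA = "\u17A0"        # KHMER LETTER HA (assumed from the example)
TRIISAP = "\u17CA"   # KHMER SIGN TRIISAP
VOWEL_II = "\u17B8"  # KHMER VOWEL SIGN II (assumed from the example)

# Default order: the shifter is normally displayed below the consonant
# when a superscript vowel is also present.
default = HA + TRIISAP + VOWEL_II

# ZWNJ between consonant and shifter asks for the shifter to stay above.
forced = HA + ZWNJ + TRIISAP + VOWEL_II

for ch in forced:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```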
U+200D ZERO WIDTH JOINER
Permits a letter to form a cursive connection without a visible neighbour.
The marker for hijri dates is an initial form of heh, even though it doesn't join to the left, ie. ه. For this, use a U+200D ZERO WIDTH JOINER immediately after the heh, eg. الاثنين 10 رجب 1415 ه..
In some cases ـ [U+0640 ARABIC TATWEEL] is used to ensure that the shape looks right, because some applications or fonts don't produce the right effect when using the ZWJ, eg. الاثنين 10 رجب 1415 هـ..
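The two ways of writing the hijri marker mentioned above differ only in the character that follows the heh. A small sketch, with a hypothetical function name and parameter of my own choosing:

```python
ZWJ = "\u200D"      # ZERO WIDTH JOINER
HEH = "\u0647"      # ARABIC LETTER HEH
TATWEEL = "\u0640"  # ARABIC TATWEEL


def hijri_marker(use_tatweel: bool = False) -> str:
    """Return heh followed by ZWJ, or by tatweel as a fallback.

    Hypothetical helper: the tatweel form is the workaround described
    above for applications or fonts that mishandle ZWJ.
    """
    return HEH + (TATWEEL if use_tatweel else ZWJ)


print([f"U+{ord(c):04X}" for c in hijri_marker()])
# ['U+0647', 'U+200D']
print([f"U+{ord(c):04X}" for c in hijri_marker(use_tatweel=True)])
# ['U+0647', 'U+0640']
```

The ZWJ form adds nothing visible, whereas the tatweel form adds a visible extension stroke after the heh.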
U+2060 WORD JOINER
An invisible character, equivalent to a zero-width no-break space, and used to prevent line-breaks, eg. it can be used around the + sign in base+delta to prevent a line break occurring in that sequence of characters. It has no effect on word segmentation.
This functionality is also provided by U+FEFF ZERO WIDTH NO-BREAK SPACE, but since that character also represents the byte-order mark, the use of this word joiner character (added in Unicode 3.2) is strongly preferred over the latter.
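A minimal sketch of both points: wrapping a sign in word joiners, and replacing U+FEFF with U+2060 where it is used inside text. Both function names are hypothetical, and I am assuming that a U+FEFF at the very start of a string is a byte-order mark and should be left alone.

```python
WJ = "\u2060"    # WORD JOINER
ZWNBSP = "\uFEFF"  # ZERO WIDTH NO-BREAK SPACE, also the byte-order mark


def no_break_around(text: str, sign: str = "+") -> str:
    """Put a word joiner on each side of every occurrence of `sign`."""
    return text.replace(sign, WJ + sign + WJ)


def prefer_word_joiner(text: str) -> str:
    """Replace U+FEFF with U+2060, except at the start of the string.

    Assumption: a leading U+FEFF is a byte-order mark and is kept as-is.
    """
    head = text[:1] if text.startswith(ZWNBSP) else ""
    rest = text[len(head):]
    return head + rest.replace(ZWNBSP, WJ)


joined = no_break_around("base+delta")
print([f"U+{ord(c):04X}" for c in joined[4:7]])
# ['U+2060', 'U+002B', 'U+2060']
```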
U+2016 DOUBLE VERTICAL LINE
U+2020 DAGGER
Called dagger, but also known as obelisk, obelus, or long cross.b321
A reference mark, used primarily with footnotes. When used for this purpose with other signs, the traditional order is * † ‡ § ‖ ¶.b68
Also a death sign in European typography, used to mark the year of death or the names of dead persons.b321
In lexicography it marks obsolete forms, and in editing of classical texts flags passages judged to be corrupt.b321
U+2021 DOUBLE DAGGER
U+2024 ONE DOT LEADER
Armenian punctuation miǰakēt
Used like a semi-colon – a shorter break than a full stop. u322
Ոչ ոք չպետք է լինի ստրկության կամ անազատ վիճակում․ պետք է արգելվեն ստրկատիրության ու ստրուկների առուծախի բոլոր ձևերը։
U+2033 DOUBLE PRIME