Originally designed for hot metal typesetting as a space that is approximately 0.5 em wide. In digital text, the size is fixed by the font, and does not normally increase in size during justification. Canonically equivalent to U+2002 EN SPACE.

Originally designed for hot metal typesetting as a space that is approximately 1 em wide. In digital text, the size is fixed by the font, and does not normally increase in size during justification. Canonically equivalent to U+2003 EM SPACE.
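The canonical equivalences mentioned in the two notes above can be checked with Python's standard unicodedata module. The following is a minimal sketch that assumes the notes describe U+2000 EN QUAD and U+2001 EM QUAD; the constant and variable names are illustrative only.

```python
import unicodedata

EN_QUAD, EM_QUAD = "\u2000", "\u2001"
EN_SPACE, EM_SPACE = "\u2002", "\u2003"

# Canonical normalization replaces each quad with the corresponding space.
normalized_en = unicodedata.normalize("NFC", EN_QUAD)
normalized_em = unicodedata.normalize("NFC", EM_QUAD)

print(normalized_en == EN_SPACE)
print(normalized_em == EM_SPACE)
```

One practical consequence is that text passed through canonical normalization will not preserve the quad characters.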

Provides a space that is approximately 0.5 em wide. Does not increase in size during justification.

Some content authors using Southeast Asian orthographies, such as Thai and Khmer, may use this character to produce a wider space between sentences than around phrases (since there is no sentence-final punctuation). Does not increase in size during justification.

Originally designed for hot metal typesetting as a space that is approximately 0.3 em wide. In digital text, the size is fixed by the font, and does not normally increase in size during justification.

Originally designed for hot metal typesetting as a space that is approximately 0.25 em wide. In digital text, the size is fixed by the font, and does not normally increase in size during justification.

Originally designed for hot metal typesetting as a space that is approximately 1/6th em wide. In digital text, the size is fixed by the font, and does not normally increase in size during justification.

Has a fixed width known as tabular width, which is the same width as digits used in tables. Does not increase in size during justification. [u16,#G1834]

Defined to be the same width as a period. Does not increase in size during justification. [u16,#G1834]

Used for narrow word gaps and for justification of type. Slightly larger than U+200A HAIR SPACE. Sometimes gets expanded during justification, unlike the other fixed-width spaces. [u16,#G1834]

Used for narrow word gaps and for justification of type. Slightly smaller than U+2009 THIN SPACE. Does not increase in size during justification. [u16,#G1834]

An invisible character, used to signal line-break and word-break opportunities. It was originally provided for use with writing systems such as Thai, Myanmar, Khmer, Japanese, etc. that don't use spaces between words.
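As a rough illustration of marking break opportunities in code, the sketch below joins a list of words with U+200B ZERO WIDTH SPACE. The word list is hypothetical, already-segmented sample data, and the variable names are assumptions made for this example.

```python
import unicodedata

ZWSP = "\u200B"  # ZERO WIDTH SPACE

# Hypothetical, already-segmented Thai words (illustrative data only).
words = ["ภาษา", "ไทย"]

# Join the words with ZWSP to mark the boundary without adding a visible gap.
marked = ZWSP.join(words)

# ZWSP is a format character, not whitespace, so str.split() with no
# arguments does not split on it; split on it explicitly instead.
category = unicodedata.category(ZWSP)
default_split = marked.split()
explicit_split = marked.split(ZWSP)

print(category, len(default_split), explicit_split == words)
```

Code that tokenizes on whitespace will therefore see the marked text as a single unit unless it handles ZWSP explicitly.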

Justification adjusts the gap between the characters on either side of the ZWSP as if the ZWSP wasn't there [§,827]. For example, the two lines below show Thai text containing a ZWSP after the 4th base character. The first is rendered as per normal, the second is as it would appear with justification or letter-spacing. Note how the second line has no extra spacing where the ZWSP occurs.

Prevents two adjacent letters forming a cursive connection with each other when rendered. Especially useful for educational illustrations, but also has some real-world applications.

Also used with complex scripts to manage the visual representation of glyphs that normally interact, eg. to form conjuncts, position diacritics, etc.

Persian

The ZWNJ is used in Persian for plural suffixes, some proper names, and Ottoman Turkish vowels. Ignoring or removing the ZWNJ will result in text with a different meaning or meaningless text. For example, تن‌ها is the plural of body, whereas تنها is the adjective alone. The only difference is the presence or absence of ZWNJ after noon. u373 g
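The point about removing the ZWNJ can be made concrete with a small sketch. It spells the two Persian words from the example above using escapes so that the invisible character is explicit; the helper name strip_zwnj is hypothetical and stands in for any cleanup step that discards "invisible" characters.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

# teh + noon, and heh + alef, written as escapes for clarity.
TAN = "\u062A\u0646"
HA = "\u0647\u0627"

plural_form = TAN + ZWNJ + HA  # plural, with ZWNJ after noon
alone_form = TAN + HA          # the adjective, with no ZWNJ


def strip_zwnj(text: str) -> str:
    """A naive cleanup step that is lossy for Persian text."""
    return text.replace(ZWNJ, "")


print(plural_form == alone_form)              # the spellings differ
print(strip_zwnj(plural_form) == alone_form)  # stripping collapses them
```

This suggests that the ZWNJ should be treated as part of the spelling rather than as removable formatting.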

Khmer

Khmer register shifters (ie. ◌៉ [U+17C9 KHMER SIGN MUUSIKATOAN] or ◌៊ [U+17CA KHMER SIGN TRIISAP]) usually appear above a consonant. However, if a superscript vowel is also attached to the consonant, the shifter is normally displayed below the consonant, instead. If you want to force the shifter to remain above the consonant, as is occasionally the case, insert ZWNJ between the consonant and the shifter.u373 sk ហ ហ៊ ហ៊ី ហ‌៊ី

Hindi

The ZWNJ can be used to prevent the formation of conjuncts, eg. क्क → क्‌क क्ष → क्‌ष

Permits a letter to form a cursive connection without a visible neighbour. Especially useful for educational illustrations, but also has some real-world applications.

Also used with complex scripts to manage the visual representation of glyphs that normally interact, eg. to form conjuncts, position diacritics, etc.

Arabic

The marker for hijri dates is an initial form of heh, even though it doesn't join to the left, ie. ه‍. For this, use a U+200D ZERO WIDTH JOINER immediately after the heh, eg. الاثنين 10 رجب 1415 ه‍..

In some cases ـ [U+0640 ARABIC TATWEEL] is used to ensure that the shape looks right, because some applications or fonts don't produce the right effect when using the ZWJ, eg. الاثنين 10 رجب 1415 هـ..

Hindi

The ZWJ can be used to make a conjunct that usually forms a ligature use half-forms instead, eg. क्ष → क्‍ष
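The Devanagari sequences discussed in the Hindi notes can be assembled explicitly, as in the sketch below. It assumes the joiner or non-joiner is placed immediately after the virama; the dictionary keys and the code_points helper are illustrative names only, and the visual result depends on the font and renderer.

```python
KA, SSA = "\u0915", "\u0937"    # DEVANAGARI LETTER KA, LETTER SSA
VIRAMA = "\u094D"               # DEVANAGARI SIGN VIRAMA
ZWNJ, ZWJ = "\u200C", "\u200D"  # ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER

sequences = {
    "default conjunct": KA + VIRAMA + SSA,
    "ZWNJ after virama": KA + VIRAMA + ZWNJ + SSA,
    "ZWJ after virama": KA + VIRAMA + ZWJ + SSA,
}


def code_points(text: str) -> str:
    """List the code points in a string, for inspecting invisible characters."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for label, text in sequences.items():
    print(f"{label}: {code_points(text)}")
```

Printing the code points is a simple way to confirm which invisible character, if any, a given string contains.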

An invisible character with strong LTR directional properties that can be used to produce the correct ordering of text, especially where there is a risk of spillover effects while the Unicode Bidirectional Algorithm is at work.

An invisible character with strong RTL directional properties that can be used to produce the correct ordering of text, especially where there is a risk of spillover effects while the Unicode Bidirectional Algorithm is at work.
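The strong directional properties of these two marks can be inspected with Python's unicodedata module. The sketch below assumes the notes describe U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK; the helper anchor_ltr is a hypothetical name showing one simple way to attach a mark to a string.

```python
import unicodedata

LRM = "\u200E"  # LEFT-TO-RIGHT MARK
RLM = "\u200F"  # RIGHT-TO-LEFT MARK

# Both marks are invisible format characters with a strong bidi class.
lrm_class = unicodedata.bidirectional(LRM)
rlm_class = unicodedata.bidirectional(RLM)
lrm_category = unicodedata.category(LRM)


def anchor_ltr(text: str) -> str:
    """Append an LRM so trailing neutral characters stay with LTR text."""
    return text + LRM


print(lrm_class, rlm_class, lrm_category)
print(len(anchor_ltr("C++")))
```

Whether a mark is needed at all depends on the surrounding text, so a helper like this would normally be applied selectively.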

This hyphen is rendered with a narrow width, and is used in words such as 'left-to-right'. [u16,#G6120]

When typesetting text, this character is preferred to U+002D HYPHEN-MINUS (which has ambiguous semantic value and is rendered with an average width). [u16,#G6120]

Has the same semantics as U+2010 HYPHEN except that it prevents line breaks around it. This hyphen is rendered with a narrow width, and is used in words such as 'left-to-right'. [u16,#G6120]
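The relationship between the two hyphens is also visible in the character data. The sketch below assumes the note above describes U+2011 NON-BREAKING HYPHEN and uses only the standard unicodedata module; variable names are illustrative.

```python
import unicodedata

HYPHEN = "\u2010"
NON_BREAKING_HYPHEN = "\u2011"

# U+2011 carries a <noBreak> compatibility decomposition to U+2010.
decomposition = unicodedata.decomposition(NON_BREAKING_HYPHEN)

# Compatibility normalization therefore folds it into the ordinary hyphen.
folded = unicodedata.normalize("NFKC", "left" + NON_BREAKING_HYPHEN + "to")

print(decomposition)
print(folded == "left" + HYPHEN + "to")
```

Text that relies on the no-break behaviour should therefore not be passed through NFKC or NFKD, since the distinction is lost.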

Has the same ambiguous semantics as U+002D HYPHEN-MINUS but has the same width as monospaced digits. [u16,#G6120]

It is also used by some typographers to make a break – like this – and when used this way it usually has spaces either side.

Note, however, that in Hebrew, using this rather than U+002D HYPHEN-MINUS to write a range results in the numbers in the range being read right to left, rather than the normal left to right, eg.

Used to make a break—like this—in which case it usually has no spaces either side (unlike – U+2013 EN DASH). In typewriter text this is often represented by a double hyphen. [u16,#G6120]

Authors of Chinese text may use two of these characters, side-by-side, to indicate a break, but nowadays ⸺ U+2E3A TWO-EM DASH is recommended, instead.

In older mathematical typography this may be used to indicate a binary minus sign. [u16,#G6120]

An old standard reference mark used with footnotes. When used for this purpose with other signs, the traditional order is * † ‡ § ‖ ¶. [b]

This is the preferred default for a punctuation apostrophe (avoiding the ambiguity of ' U+0027 APOSTROPHE), eg. in contractions such as "We’ve been here before." [u16,#G6120]

If surrounded by text or digits on both sides, this should not constitute a line-break opportunity. [u16,#G6120]

Where the apostrophe is to represent a modifier letter (for example, in transliterations to indicate a glottal stop), a letter apostrophe is used. The code point for that is ʼ U+02BC MODIFIER LETTER APOSTROPHE. [u16,#G6120] That code point is used, for example, in many languages as a letter of their alphabets, as a tone marker in Bodo and Dogri, and to indicate vowel elongation, or various truncations and ellipsis in Maithili.
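The difference between the punctuation apostrophe and the letter apostrophe shows up in how software segments words. The sketch below uses Python's re and unicodedata modules; the sample strings are illustrative, and the transliterated word is simply an example of a modifier letter inside a word.

```python
import re
import unicodedata

RIGHT_SINGLE_QUOTE = "\u2019"   # punctuation apostrophe
MODIFIER_APOSTROPHE = "\u02BC"  # letter apostrophe

# The first is punctuation (Pf), the second is a modifier letter (Lm).
punct_category = unicodedata.category(RIGHT_SINGLE_QUOTE)
letter_category = unicodedata.category(MODIFIER_APOSTROPHE)

# A simple \w+ tokenizer splits at the punctuation apostrophe
# but keeps the letter apostrophe inside the word.
contraction_tokens = re.findall(r"\w+", "We" + RIGHT_SINGLE_QUOTE + "ve")
translit_tokens = re.findall(r"\w+", "Qur" + MODIFIER_APOSTROPHE + "an")

print(punct_category, contraction_tokens)
print(letter_category, translit_tokens)
```

Choosing the right code point therefore affects searching, word selection and similar operations, not just appearance.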

A reference mark, used primarily with footnotes. When used for this purpose with other signs, the traditional order is * † ‡ § ‖ ¶. [b68]

Also a death sign in European typography, used to mark the year of death or the names of dead persons. [b321]

In lexicography it marks obsolete forms, and in editing of classical texts flags passages judged to be corrupt. [b321]

A reference mark used with footnotes. When used for this purpose with other signs, the traditional order is * † ‡ § ‖ ¶. [b68]

Հայկական Բարձրաւանդակ․ նկարուած հարաւային Կովկասի արբանեակէն Armenian Highlands; satellite image of the southern Caucasus

It can also be used with U+2025 TWO DOT LEADER to construct dot leaders in plain text, when the application doesn't generate them automatically. This character allows for fine-tuning of the dot leader sequence length. [u16,#G13727] (Note that this only works when the page width is fixed.) For example:

Dot leaders that connect things like chapter titles with page numbers are often generated automatically by an application. If they are not, this and U+2024 ONE DOT LEADER can be used to construct them. [u16,#G13727] (Note that this only works when the page width is fixed.) For example:
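As a rough sketch of how such a leader might be assembled in code, the function below builds a run of a requested number of dots from U+2025 TWO DOT LEADER characters, adding a single U+2024 ONE DOT LEADER when the count is odd. The function names and the sample table-of-contents entry are hypothetical.

```python
ONE_DOT_LEADER = "\u2024"
TWO_DOT_LEADER = "\u2025"


def dot_leader(dots: int) -> str:
    """Build a leader with exactly `dots` dots, using two-dot leaders first."""
    pairs, single = divmod(dots, 2)
    return TWO_DOT_LEADER * pairs + ONE_DOT_LEADER * single


def dot_count(leader: str) -> int:
    """Count how many dots a leader string represents."""
    return 2 * leader.count(TWO_DOT_LEADER) + leader.count(ONE_DOT_LEADER)


def toc_line(title: str, page: int, dots: int) -> str:
    """Join a title and page number with a leader of the given length."""
    return f"{title} {dot_leader(dots)} {page}"


print(toc_line("Introduction", 7, 9))
```

The one-dot leader is what allows odd lengths, which is the fine-tuning role described above. Actual alignment still depends on the font's glyph widths.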

A convenient alternative to writing an ellipsis with 3 consecutive full stops. This makes it easy to ensure that the ellipsis is not broken across a line, but the spacing of the dots in the ellipsis glyph may need to change based on the language. [u16,#G13586]
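The relationship between the single ellipsis character and three full stops can be seen in a short sketch, assuming the note describes U+2026 HORIZONTAL ELLIPSIS. The helper use_ellipsis is a hypothetical and deliberately naive replacement function.

```python
import unicodedata

ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS

# Compatibility normalization expands the ellipsis to three full stops.
expanded = unicodedata.normalize("NFKC", ELLIPSIS)


def use_ellipsis(text: str) -> str:
    """Naively replace each run of three full stops with U+2026."""
    return text.replace("...", ELLIPSIS)


print(expanded == "...")
print(use_ellipsis("Wait...") == "Wait" + ELLIPSIS)
```

Because the replacement is a single code point, there is no position inside it at which a line could be broken. A real converter would need to decide what to do with longer runs of dots.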

In CJK texts, it is normal to express ellipsis using this character twice (making 6 dots). When used that way, a line should not normally be broken between the 2 code points. CJK ellipsis is usually vertically centred in horizontal text and horizontally centred in vertical text. The dots are evenly spread across the 2em width. CJK fonts tend to do this automatically when rendering this code point (although not all fonts do).

그는 최선을 다했다. 그러나 성공할지는…… He did his best. But if he succeeds...

⋯ U+22EF MIDLINE HORIZONTAL ELLIPSIS may be substituted for this character in order to make the dots appear centred in the line, but this is really a mathematical symbol. Neither CLReq nor JLReq mentions the use of that character.

There is also a presentation form aimed at vertical text, ︙ U+FE19 PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS, which maps to the GB 18030 Chinese standard, but as for most presentation forms, fonts should render the text appropriately so that it is not needed. There is also another mathematical symbol, ⋮ U+22EE VERTICAL ELLIPSIS, which looks similar, but the Unicode Standard recommends that, if the font doesn't automatically render U+2026 as needed, then U+FE19 is a better choice.

Mongolian, which is normally written vertically, has its own, 4-dot ellipsis code point, ᠁ U+1801 MONGOLIAN ELLIPSIS.

Do not confuse this character with ⁝ U+205D TRICOLON, which is used as a word or phrase delimiter.

A raised dot used in dictionaries to indicate word-break opportunities, eg. “dic‧tio‧nary”. [u16,#G20622]

The Unicode Standard [u16,#G746543] has a table illustrating the use of this and other methods for indicating word-breaks in a selection of dictionaries.

Sets the start point for a range of inline text when applying a base direction of left-to-right. The range is terminated by U+202C POP DIRECTIONAL FORMATTING (PDF).

Sets the start point for a range of inline text when applying a base direction of right-to-left. The range is terminated by U+202C POP DIRECTIONAL FORMATTING (PDF).

Sets the end point for a range of inline text when applying a base direction. The range is started with either U+202A LEFT-TO-RIGHT EMBEDDING (LRE) or U+202B RIGHT-TO-LEFT EMBEDDING (RLE).

You should use U+2069 POP DIRECTIONAL ISOLATE (PDI) and its associated range starters rather than this character.

Many Mongolian suffixes are separated from the root or other suffixes by a gap that is smaller than a normal space. Characters following this gap may take on special shapes, and lines should not be broken at this gap. For example:

This character was initially added to Unicode for Mongolian suffix handling, but in Unicode 16 a decision was taken to use U+180E MONGOLIAN VOWEL SEPARATOR instead for this purpose.

A somewhat recent innovation in writing Cree syllabics is to use this as a morpheme separator, rather than the hyphen which is used in the Latin transcription, eg. ᐁ ᐚᐸᒫᐟ ê-wâpamât

Also useful in Latin script languages where a thin, non-breaking space is needed:

Encoded for convenience when working with vertical text in East Asian or Mongolian scripts. [u16,#G5491]

Can be used to create a high line, corresponding to _ U+005F LOW LINE. A sequence of these characters should create an unbroken line. [u16,#G2006]

This is distinct from the following: ̅ U+0305 COMBINING OVERLINE, and ̄ U+0304 COMBINING MACRON.

A proofreading mark indicating a location where something should be inserted. [u16,#G14769]

A fraction slash to be used between digits. Fonts may convert the sequence to a single typographic unit, such as in the following examples: [u16,#G2001]
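In plain text, such a sequence is simply digits, the fraction slash, and more digits. The sketch below assumes the note describes U+2044 FRACTION SLASH and compares a constructed sequence with the compatibility decomposition of a precomposed fraction; plain_fraction is a hypothetical helper name.

```python
import unicodedata

FRACTION_SLASH = "\u2044"


def plain_fraction(numerator: int, denominator: int) -> str:
    """Write a fraction as digits + FRACTION SLASH + digits."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"


half = plain_fraction(1, 2)

# The precomposed one-half character decomposes to the same sequence.
decomposed_half = unicodedata.normalize("NFKD", "\u00BD")

print(half == decomposed_half)
print(plain_fraction(22, 7))
```

Whether the sequence is displayed as a single typographic unit depends on the font, so the string itself is the same either way.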

Encoded for convenience when working with vertical text in East Asian or Mongolian scripts. [u16,#G5491]

Used in Old English or Irish Gaelic in the same way as a modern ampersand (&), but may also be used as a letter in some contexts. [u16,#G28551]

In some medieval materials an uppercase form appears. This can be represented using ⹒ U+2E52 TIRONIAN SIGN CAPITAL ET. Note, however, that these two are not case-mapped in Unicode. [u16,#G28551]
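The lack of a case mapping means that ordinary case conversion leaves both characters unchanged. The sketch below assumes the notes describe U+204A TIRONIAN SIGN ET; the translation table is a hypothetical application-level workaround, not something defined by Unicode.

```python
TIRONIAN_ET = "\u204A"
TIRONIAN_CAPITAL_ET = "\u2E52"

# Standard case conversion does not relate the two characters.
upper_of_small = TIRONIAN_ET.upper()
lower_of_capital = TIRONIAN_CAPITAL_ET.lower()

# An application that wants the pairing must supply it explicitly.
TO_CAPITAL = str.maketrans({TIRONIAN_ET: TIRONIAN_CAPITAL_ET})
explicit_capital = TIRONIAN_ET.translate(TO_CAPITAL)

print(upper_of_small == TIRONIAN_ET)
print(lower_of_capital == TIRONIAN_CAPITAL_ET)
print(explicit_capital == TIRONIAN_CAPITAL_ET)
```

Any such explicit table would need to be applied alongside normal case conversion, for instance when producing all-caps headings from medieval text.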

A common alternative to ¶ U+00B6 PILCROW SIGN. The pilcrow characters and § U+00A7 SECTION SIGN are used to indicate sections or paragraphs, in editorial markup, to show format modes, etc. Which character is used is dictated by convention. [u16,#G4247]

Used as a minus sign in commercial or tax-related forms or publications in several European countries, including Germany and Scandinavia. Can also be written as the sequence ./. U+002E FULL STOP + U+002F SOLIDUS + U+002E FULL STOP. [u16,#G7935]

In European countries such as Finland, this character and ✓ U+2713 CHECK MARK are used in marking student work to indicate 'correct' and 'incorrect', respectively. [u16,#G7935]

Also used as a marginal note in letters to indicate enclosures, and in the Uralic Phonetic Alphabet to indicate a structurally related borrowed element of different pronunciation. [u16,#G7935]

One character in a set of archaic punctuation characters used in common for ancient and medieval scripts. The specific function can vary by script. [u16,#G13108]

Used in dictionaries to indicate syllable boundaries that are not suitable word-break opportunities, eg. [u16,#G20622]

To reinforce the idea that there should be no line break here, this character may be followed by U+2060 WORD JOINER. [u16,#G20622]

An invisible character, equivalent to a zero-width no-break space, and used to prevent line-breaks. It has no effect on word segmentation.

It can also be used to bracket other characters to turn them into non-breaking characters, such as U+2009 THIN SPACE or ― [U+2015 HORIZONTAL BAR].

This functionality is also provided by U+FEFF ZERO WIDTH NO-BREAK SPACE, but since that character also represents the byte-order mark, the use of this word joiner character (added in Unicode 3.2) is strongly preferred.
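Bracketing a character with word joiners, as described above, might look like the following sketch. It assumes the notes describe U+2060 WORD JOINER; the helper no_break and the sample measurement string are hypothetical.

```python
import unicodedata

WORD_JOINER = "\u2060"
THIN_SPACE = "\u2009"


def no_break(ch: str) -> str:
    """Bracket a character with word joiners to discourage breaks around it."""
    return WORD_JOINER + ch + WORD_JOINER


# A number and unit separated by a thin space that should not break.
glued = "10" + no_break(THIN_SPACE) + "km"

# WORD JOINER is an invisible format character.
wj_category = unicodedata.category(WORD_JOINER)

print(wj_category)
print([f"U+{ord(ch):04X}" for ch in glued])
```

Whether the break is actually suppressed depends on the line-breaking implementation of the application that displays the text.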

Sets the start point for a range of inline text when applying a base direction of left-to-right, and isolates the text within that range from text outside it. The isolation prevents unintended spill-over effects when the text is reordered by the Unicode Bidirectional Algorithm. The range is terminated by U+2069 POP DIRECTIONAL ISOLATE (PDI).

Sets the start point for a range of inline text when applying a base direction of right-to-left, and isolates the text within that range from text outside it. The isolation prevents unintended spill-over effects when the text is reordered by the Unicode Bidirectional Algorithm. The range is terminated by U+2069 POP DIRECTIONAL ISOLATE (PDI).

Sets the start point for a range of inline text when applying a base direction, and isolates the text within that range from text outside it. The base direction set is determined by that of the first strong directional character in the range. The isolation prevents unintended spill-over effects when the text is reordered by the Unicode Bidirectional Algorithm. The range is terminated by U+2069 POP DIRECTIONAL ISOLATE (PDI).
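The pairing of an isolate starter with PDI can be wrapped in a small helper, sketched below. It assumes the three notes above describe U+2066 LEFT-TO-RIGHT ISOLATE, U+2067 RIGHT-TO-LEFT ISOLATE and U+2068 FIRST STRONG ISOLATE; the function name isolate and its direction keywords are hypothetical choices for this example.

```python
import unicodedata

LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

_STARTERS = {"ltr": LRI, "rtl": RLI, "auto": FSI}


def isolate(text: str, direction: str = "auto") -> str:
    """Wrap text in an isolate starter and the terminating PDI."""
    return _STARTERS[direction] + text + PDI


# Typical use: inserting a value of unknown direction into a sentence.
message = "User " + isolate("\u05D3\u05E0\u05D9") + " logged in."

print([unicodedata.bidirectional(ch) for ch in (LRI, RLI, FSI, PDI)])
print(len(message))
```

Using "auto" (FSI) is a reasonable default when the direction of the inserted text is not known in advance. In markup formats, the equivalent markup features are usually a better choice than these control characters.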

Used in dictionaries to indicate certain morphological boundaries in West Asian linguistics. [u16,#G9921]

For similar-looking Fraktur hyphens use the normal hyphen characters with an appropriate font, rather than this character. [u16,#G9921]

Orthographies that separate words with a raised dot, such as Avestan or Samaritan, can use this word separator which is not script-specific. However, Runic has its own code point for the same purpose (᛫ U+16EB RUNIC SINGLE PUNCTUATION). [u16,#G15382]

Similar-looking code points with different semantics include: ⸳, ‧, and · U+00B7 MIDDLE DOT.