Unicode code converter

Standard use

Most of the time you will probably want to drop the text to be converted into the field with the green background, and hit the associated Convert button. This will convert all escapes to characters, then convert those into each of the forms listed against the boxes below.

If your text contains bare numbers that you also want to convert, use the select control to the right. (Be aware, however, that in this case something like 'ab' could be interpreted as a hex number.)

Note, also, that the escapes \n, \t, \b, and \" etc, are recognised by default. If you check the box next to Convert \n etc they will be ignored. For full CSS behaviour here, use the CSS input field.

The Extract button strips away everything but the escapes. For example, it will turn "The first row is U+0A73 ura, U+0A05 a, U+0A72 iri, U+0A38 sa, U+0A39 ha." into "U+0A73 U+0A05 U+0A72 U+0A38 U+0A39".

Extract currently doesn't recognise escapes such as \n, \t, etc. Nor does it recognise character entities, such as &aacute;.

If you have selected Hex code points or Dec code points in the adjacent select control, it will also leave behind hex or decimal numbers. For example, it will turn "<0915, 094E, 0947>" into "0915 094E 0947" or "0915 0947", respectively. Note, however, that it may not always get this right. Text such as '(dec)' will result in 'dec' being treated as a number. It only detects numbers if they are preceded and followed by a space, a particular set of punctuation marks, or the start/end of the text. It treats UTF-8 and UTF-16 code units as hex code point numbers.
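
The Extract behaviour described above can be approximated with a regular expression. This is only a sketch, not the tool's actual code: the function name extract, the 4 to 6 digit limit on U+ escapes, and the delimiter set in DELIMS are all assumptions made for illustration.

```python
import re

# Assumption: a U+ escape is "U+" followed by 4 to 6 hex digits.
U_PLUS = re.compile(r"U\+[0-9A-Fa-f]{4,6}")

# Assumption: whitespace plus this punctuation may delimit a bare number.
DELIMS = r"\s,;:<>()\[\]"


def extract(text, numbers=None):
    """Keep only the escapes in text; numbers may be None, "hex" or "dec"."""
    if numbers is None:
        pattern = U_PLUS
    else:
        digits = {"hex": "0-9A-Fa-f", "dec": "0-9"}[numbers]
        # A bare number must sit between delimiters or the start/end of the text.
        pattern = re.compile(
            rf"U\+[0-9A-Fa-f]{{4,6}}|(?<![^{DELIMS}])[{digits}]+(?![^{DELIMS}])"
        )
    return " ".join(pattern.findall(text))


sample = "The first row is U+0A73 ura, U+0A05 a, U+0A72 iri, U+0A38 sa, U+0A39 ha."
print(extract(sample))                       # U+0A73 U+0A05 U+0A72 U+0A38 U+0A39
print(extract("<0915, 094E, 0947>", "hex"))  # 0915 094E 0947
print(extract("<0915, 094E, 0947>", "dec"))  # 0915 0947
print(extract("(dec)", "hex"))               # dec
```

The last call shows the false positive mentioned above: 'dec' consists only of hex digits and sits between delimiters, so it is picked up as a number.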

Special use. If you only want to convert a specific type of escape and leave all others untouched, paste the text into one of the other (grey) boxes and hit its associated Convert button.

Checkboxes. Several of the output fields have checkboxes that allow you to slightly alter the results of a conversion. If an output field already contains a result when you click on a checkbox, you'll often see a change happen as you click. In a couple of cases, however, this doesn't happen, since it is not possible to produce good results.

Invoking via URL. You can also pass a string to the page using the q parameter in the URI. For example, http://r12a.github.io/app-conversion/?q=Crêpes. You can also pass a string with escapes in it. You will need to be careful to percent-escape characters such as &, + and # which affect the URI syntax. For example, http://r12a.github.io/app-conversion/beta?q=CrU%2B00EApes.
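
If you build such links from a script, the percent-escaping can be left to a library. The following is a minimal sketch using Python's standard urllib.parse.quote; the helper name converter_url is hypothetical, and the base address is simply the one from the first example above.

```python
from urllib.parse import quote

# Base address taken from the first example above.
BASE = "http://r12a.github.io/app-conversion/"


def converter_url(text):
    """Build a link that passes text to the page via the q parameter."""
    # safe="" makes quote() escape every reserved character, including
    # &, + and #, which would otherwise change the meaning of the URI.
    return f"{BASE}?q={quote(text, safe='')}"


print(converter_url("CrU+00EApes"))
# http://r12a.github.io/app-conversion/?q=CrU%2B00EApes
print(converter_url("Crêpes"))
# http://r12a.github.io/app-conversion/?q=Cr%C3%AApes
```

Note that quote() also percent-encodes non-ASCII characters as UTF-8 bytes, which is why the second link differs in form from the Crêpes example above.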

Box inputs and outputs

The following sections describe how the various boxes work: what happens if you paste or type text into the named field and hit Convert, and what appears in the named field if you hit Convert elsewhere.

Characters

When conversion puts something here: Everything is displayed as characters.

You can view more detail for each character by clicking on View in UniView.

If you start a conversion from here: Everything is treated as characters, eg. U+1234 is not treated as an escape for the purposes of conversion.

HTML/XML

When conversion puts something here: Ordinary characters will appear by default, except that < > " and & are converted to character entities. This is useful for preparing examples of sample code for HTML or XML.

By default the control Escape invisible characters is checked. This causes certain invisible characters (such as RLM) or ambiguous characters (such as NO-BREAK SPACE) to be converted to escaped form. The characters affected will be added to over time.

If Convert bidi controls to HTML markup is selected, RLE, LRE, RLI, LRI, FSI, PDF and PDI are converted to HTML markup based on a span element.

Hint: if you want to get the result into source code form, once the initial conversion has been done just click Convert above this text area, and then look in the Characters text area.

Note that if your text contains RLO or LRO plus PDF, the PDF will incorrectly be converted to </span> at the moment. I may fix this (and thereby allow RLO/LRO conversion too) at a later date.

If you start a conversion from here: Use HTML or XML markup. Numeric character references or HTML character entities other than < > " and & are converted to ordinary characters during conversion.
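
The two directions described for this box can be sketched as follows. This is an illustration under stated assumptions rather than the converter's own logic: the function names are hypothetical, and the handling of invisible characters and bidi controls is left out entirely.

```python
import html
import re

# The four markup-significant references that are left alone when decoding.
KEEP = {"&lt;", "&gt;", "&quot;", "&amp;"}

# Decimal NCRs, hexadecimal NCRs, and named entities.
REF = re.compile(r"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def to_html_source(text):
    """Escape only < > " and & so the text can be shown as sample code."""
    # & must be replaced first, or the other replacements would be re-escaped.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;"))


def decode_refs(markup):
    """Turn NCRs and named entities into characters, except those in KEEP."""
    return REF.sub(
        lambda m: m.group(0) if m.group(0) in KEEP else html.unescape(m.group(0)),
        markup,
    )


print(to_html_source('a < b & "c"'))  # a &lt; b &amp; &quot;c&quot;
print(decode_refs("caf&eacute; &lt;b&gt; &#x4E2D;&#20013;"))  # café &lt;b&gt; 中中
```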

Hex NCRs

When conversion puts something here: By default, everything except ASCII characters is converted.

You can deselect the Show ascii checkbox to specify that you want all characters to be converted.

If you start a conversion from here: It can be a mix of text and escapes. Only hexadecimal NCRs are converted.
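
As a rough illustration of both directions, here is a sketch with hypothetical function names. The show_ascii parameter mirrors the checkbox described above; the use of upper-case digits without padding is an assumption about the output format.

```python
import re


def to_hex_ncrs(text, show_ascii=True):
    """Replace characters with hexadecimal NCRs, optionally keeping ASCII."""
    return "".join(
        ch if show_ascii and ord(ch) < 0x80 else f"&#x{ord(ch):X};"
        for ch in text
    )


def from_hex_ncrs(text):
    """Convert hexadecimal NCRs to characters, leaving everything else alone."""
    return re.sub(
        r"&#[xX]([0-9A-Fa-f]+);",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


print(to_hex_ncrs("Crêpes"))                # Cr&#xEA;pes
print(to_hex_ncrs("ab", show_ascii=False))  # &#x61;&#x62;
print(from_hex_ncrs("Cr&#xEA;pes &#234;"))  # Crêpes &#234;
```

The last call shows that a decimal NCR in the input is passed through untouched. The sketch does not validate code points, so a value above 10FFFF would raise an error.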

Decimal NCRs

When conversion puts something here: By default, everything except ASCII characters is converted.

You can deselect the Show ascii checkbox to specify that you want all characters to be converted.

If you start a conversion from here: It can be a mix of text and escapes. Only decimal NCRs are converted.

JavaScript/Java/C

When conversion puts something here: By default, everything except visible ASCII characters is converted to numeric escapes, and the following escapes are substituted for ASCII characters: \0, \b, \t, \v, \f, \\.

The default output to this field is in the ES6 style, which is much more useful when dealing with supplementary characters (such as emoji), and is well supported by major browsers, except for Internet Explorer. To generate the old style escapes, or escapes for Java, deselect the ES6-style checkbox. A small number of Java-only named escapes such as \e are rendered as numeric escapes.

If C-style is checked, supplementary characters are rendered by a single number, eight digits long, rather than two adjacent surrogate code point numbers.

If \n etc is checked, line feeds (\n), tabs (\t), and quotation marks (\") are also escaped.

If you start a conversion from here: It can be a mix of text and escapes. Only the following types of escape are recognised:

\u{1F468}
\u1234 (requires 2 escapes for supplementary characters)
\U00123456
\x10
\0 \b \t \n \r \v \f \\ \"
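
The difference between the three numeric forms is easiest to see for a supplementary character. The sketch below, with hypothetical function names, computes each form from a code point; the surrogate arithmetic is the standard UTF-16 calculation, not code taken from the converter.

```python
def es6_escape(cp):
    """ES6 style: one escape holding the code point itself."""
    return f"\\u{{{cp:X}}}"


def surrogate_escapes(cp):
    """Old style: supplementary characters need two \\uXXXX escapes."""
    if cp < 0x10000:
        return f"\\u{cp:04X}"
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)    # top 10 bits
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits
    return f"\\u{high:04X}\\u{low:04X}"


def c_style_escape(cp):
    """C style: a single eight-digit escape for supplementary characters."""
    return f"\\U{cp:08X}" if cp > 0xFFFF else f"\\u{cp:04X}"


man = 0x1F468
print(es6_escape(man))         # \u{1F468}
print(surrogate_escapes(man))  # \uD83D\uDC68
print(c_style_escape(man))     # \U0001F468
```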

Rust/Ruby

When conversion puts something here: By default, everything except visible ASCII characters is converted to \u{...} escapes, and the following escapes are substituted for ASCII characters: \0, \b, \t, \v, \f, \\. Output for other characters in the ranges U+0001-U+001F and U+0080-U+009F (ie. invisible control characters) uses the \x.. escape format.

If \n etc is checked, line feeds (\n), tabs (\t), and quotation marks (\") are also escaped.

If you start a conversion from here: It can be a mix of text and escapes. Only the following types of escape are recognised:

\u{1F468}
\u{1234 1235 1236}
\x10
\0 \b \t \n \r \v \f \\ \"
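
The braced form is the only one in this list that can hold several code points at once. A minimal sketch of decoding it, assuming the numbers inside the braces are separated by single spaces (the function name is hypothetical):

```python
import re


def decode_braced(text):
    """Convert \\u{...} escapes, which may list several code points."""
    def repl(match):
        return "".join(chr(int(h, 16)) for h in match.group(1).split())
    return re.sub(r"\\u\{([0-9A-Fa-f ]+)\}", repl, text)


print(decode_braced(r"\u{1F468}"))       # 👨
print(decode_braced(r"a\u{63 61 66}b"))  # acafb
```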

Perl/UTR#18

When conversion puts something here: By default, everything except visible ASCII characters is converted to \x{...} escapes, and the following escapes are substituted for ASCII characters: \0, \b, \t, \v, \f, \\. Output for other characters in the ranges U+0001-U+001F and U+0080-U+009F (ie. invisible control characters) uses the \x.. escape format.

If \n etc is checked, line feeds, tabs, and quotation marks are also escaped.

If you start a conversion from here: It can be a mix of text and escapes. Only the following types of escape are recognised:

\x{1F468}
\x10
\0 \b \t \n \r \v \f \\ \"

CSS

When conversion puts something here: It does not escape non-control ASCII characters. Output content uses 6-digit escape forms followed by a space for supplementary characters, and 4-digit escapes followed by a space for all other escaped characters.

If you start a conversion from here: It can be a mix of text and escapes.
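
The output rule above (6 digits plus a space for supplementary characters, 4 digits plus a space otherwise) can be sketched like this. The function name is hypothetical, and treating U+0020 to U+007E as the "non-control ASCII" range is an assumption.

```python
def css_escape(text):
    """Escape everything except non-control ASCII, using CSS hex escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp < 0x7F:
            out.append(ch)                 # printable ASCII stays as-is
        elif cp > 0xFFFF:
            out.append(f"\\{cp:06X} ")     # supplementary: 6 digits + space
        else:
            out.append(f"\\{cp:04X} ")     # everything else: 4 digits + space
    return "".join(out)


print(css_escape("Crêpes"))         # Cr\00EA pes
print(css_escape("a\U0001F468b"))   # a\01F468 b
```

The trailing space matters because it stops a following character such as 'b' from being read as one more hex digit of the escape.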

Percent-encoding for URIs

When conversion puts something here: Characters allowed in URI syntax are not converted.

If you start a conversion from here: It can be a mix of text and escapes. Only percent escapes are converted.

U+hex

When conversion puts something here: By default, everything except ASCII characters is converted.

You can deselect the Show ascii checkbox to specify that you want all characters to be converted.

If you want to insert spaces between adjacent escapes (only) click on the Separate button. Note, however, that if you now click on the Convert button for that field, the output will contain those extra spaces.

Hint: to separate a sequence of characters by spaces, paste the characters into the field with a green background and click Convert. Then click Separate followed by Convert in the U+hex field and look in the Characters field for the result.

If you start a conversion from here: It can be a mix of text and escapes. Only U+hex escapes are converted.

0x...

When conversion puts something here: By default, everything except ASCII characters is converted.

You can deselect the Show ascii checkbox to specify that you want all characters to be converted.

Hint: to separate a sequence of characters by spaces, paste the characters into the field with a green background and click Convert. Then click Separate followed by Convert in the 0x... field and look in the Characters field for the result.

If you start a conversion from here: It can be a mix of text and hexadecimal 0x... escapes. Only 0x... escapes are converted.

UTF-8 code units

When conversion puts something here: You'll see a sequence of 2-digit hexadecimal numbers representing the bytes that make up the text when encoded in UTF-8.

If you start a conversion from here: It must be hexadecimal byte codes only, separated by spaces.
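
For reference, the same byte values can be produced and consumed with Python's built-in UTF-8 codec. The function names below are hypothetical, and the upper-case, space-separated output format is an assumption.

```python
def to_utf8_units(text):
    """List the UTF-8 bytes of text as 2-digit hex numbers."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


def from_utf8_units(units):
    """Rebuild text from space-separated hexadecimal byte values."""
    return bytes(int(u, 16) for u in units.split()).decode("utf-8")


print(to_utf8_units("é"))           # C3 A9
print(to_utf8_units("\U0001F468"))  # F0 9F 91 A8
print(from_utf8_units("C3 A9"))     # é
```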

UTF-16 code units

When conversion puts something here: You'll see hexadecimal numbers of 1 to 4 digits representing the UTF-16 code units for the text converted. Supplementary characters are represented by two code units.

If you start a conversion from here: It must be hexadecimal code units only, separated by spaces.
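
A comparable sketch for UTF-16, again with hypothetical function names. It goes through Python's utf-16-be codec and prints each 16-bit unit without padding, matching the "1 to 4 digits" description above.

```python
def to_utf16_units(text):
    """List the UTF-16 code units of text as unpadded hex numbers."""
    data = text.encode("utf-16-be")
    return " ".join(
        f"{int.from_bytes(data[i:i + 2], 'big'):X}"
        for i in range(0, len(data), 2)
    )


def from_utf16_units(units):
    """Rebuild text from space-separated hexadecimal code units."""
    data = b"".join(int(u, 16).to_bytes(2, "big") for u in units.split())
    return data.decode("utf-16-be")


print(to_utf16_units("A\U0001F468"))     # 41 D83D DC68
print(from_utf16_units("41 D83D DC68"))  # A👨
```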

Hexadecimal code points

When conversion puts something here: By default, you'll see Hex numbers only, all separated by spaces.

If you use the checkbox Keep ascii, ASCII characters remain unchanged, and a space is inserted after any hex sequence followed by another hex sequence.

The Pad selector allows you to indicate how much padding to apply to the hex numbers. This control applies changes when you select a different number, as well as when a conversion is triggered. If the Hex/UTF-32 box contains mixed hex values and ASCII characters, see the warning just below.

If you start a conversion from here: It can be a mix of text and hex numbers. Only hex numbers are converted.

Warning: It is not recommended to click on the Convert button for Hex/UTF-32 when that box contains mixed ASCII and hex numbers, unless you are sure that the box contents can be reliably parsed. One reason is that a sequence of two or more characters in the range a-f, such as cafe, will be treated as a hexadecimal number representing a character. Another reason is that if spaces are not inserted between a hex sequence and adjacent ASCII characters, this can cause problems if you try to convert, such as W3C202C being read as W+3C202C (which raises a code point out of range error). Furthermore, the extra spaces inserted when a conversion results in mixed text in this box are also carried through if you then click on the Convert button for this box.
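
Both problems in the warning are easy to reproduce. The sketch below is a deliberately naive parser written for illustration only: it treats any run of two or more hex digits as a code point, which is an assumption about the matching rule rather than the converter's exact logic.

```python
import re

# Assumption: any run of two or more hex digits counts as a code point number.
HEX_RUN = re.compile(r"[0-9A-Fa-f]{2,}")


def convert_hex_runs(text):
    """Replace each run of hex digits with the character it names."""
    def repl(match):
        cp = int(match.group(0), 16)
        if cp > 0x10FFFF:
            raise ValueError(f"code point out of range: {match.group(0)}")
        return chr(cp)
    return HEX_RUN.sub(repl, text)


# An ordinary word made of a-f letters is swallowed as U+CAFE.
print(convert_hex_runs("cafe") == "\ucafe")  # True

# Without separating spaces, 3C and 202C fuse into one oversized number.
try:
    convert_hex_runs("W3C202C")
except ValueError as err:
    print(err)  # code point out of range: 3C202C
```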

Decimal code points

When conversion puts something here: By default, you'll see decimal numbers only, all separated by spaces.

If the Keep ascii checkbox is checked, the result will be a mix of decimal code point numbers and ASCII text. A space is inserted before a code point value if it is immediately followed by another decimal code point value.

If you start a conversion from here: It can be a mix of text and decimal numbers. Only decimal numbers are converted.

Warning: It is not recommended to click on the Convert button for the Decimal box when that box contains mixed ASCII and decimal numbers, unless you are sure that the box contents can be reliably parsed. One reason is that if spaces are not inserted between a decimal number and adjacent digits, this can cause problems if you try to convert. Furthermore, the extra spaces inserted when a conversion results in mixed text in this box are also carried through if you then click on the Convert button for this box.

