Encoding converter

utf-8

big5

euc-jp

iso-2022-jp

shift_jis

euc-kr

gb18030

gbk

windows-1252/latin1

Notes:

Updated Tue 21 Jun 2016 • tags encoding, apps

This app shows which bytes legacy encodings use to represent a particular character, and converts a sequence of bytes into characters for a range of encodings. You can customise the encodings you want to experiment with by clicking on "change encodings shown". The default selection excludes most of the single-byte encodings.

The algorithms used are based on those described in the Encoding specification, and thus describe the behaviour you can expect from web browsers. The transforms may not be the same as for other conversion tools. (In some cases the browsers may also produce a different result than shown here. See the tests.)

Encoding algorithms convert Unicode characters to sequences of two-digit hex numbers that represent the bytes used in the target character encoding. A character that cannot be handled by an encoder is represented as a decimal HTML character escape.
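The encoder behaviour described above can be approximated with a short Python sketch. It is not the app's implementation: the function name `encode_to_hex` is hypothetical, and Python's built-in codecs are assumed to stand in for the Encoding specification's algorithms, which they do not match in every detail. Encoding one character at a time is also only suitable for stateless encodings, so it is not a good fit for iso-2022-jp.

```python
def encode_to_hex(text: str, encoding: str) -> str:
    """Return two-digit hex byte values for each character of `text`.

    Characters the codec cannot encode are emitted as decimal HTML
    character escapes (for example, &#12354;).
    Assumption: Python's codec for `encoding` approximates the
    Encoding specification's encoder.
    """
    parts = []
    for ch in text:
        try:
            encoded = ch.encode(encoding)
        except UnicodeEncodeError:
            # Unencodable character: fall back to a decimal escape.
            parts.append(f"&#{ord(ch)};")
        else:
            parts.append(" ".join(f"{b:02X}" for b in encoded))
    return " ".join(parts)


if __name__ == "__main__":
    print(encode_to_hex("é", "utf-8"))
    print(encode_to_hex("あ", "shift_jis"))
    print(encode_to_hex("あ", "windows-1252"))
```

The last call uses a character that windows-1252 cannot represent, so the sketch produces an escape rather than bytes.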

Decoding algorithms take byte codes like those just mentioned and convert them to Unicode characters. Where a decoder is unable to map a byte or byte sequence to a character in the encoding, it returns a replacement character (U+FFFD).
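As a rough illustration of decoding with replacement characters, the sketch below uses Python's `bytes.decode` with `errors="replace"`. The helper name `decode_bytes` is hypothetical, and Python's codecs are only assumed to be close to the Encoding specification's decoders. In particular, Python's windows-1252 codec treats a few bytes as undefined that the specification maps to control characters, so results can differ from what browsers do.

```python
def decode_bytes(data: bytes, encoding: str) -> str:
    """Decode `data`, substituting U+FFFD for anything unmappable.

    Assumption: Python's codec for `encoding` approximates the
    Encoding specification's decoder.
    """
    return data.decode(encoding, errors="replace")


if __name__ == "__main__":
    # A valid two-byte Shift_JIS sequence.
    print(decode_bytes(bytes([0x82, 0xA0]), "shift_jis"))
    # 0xFF can never appear in well-formed UTF-8, so it is replaced.
    print(repr(decode_bytes(bytes([0x41, 0xFF]), "utf-8")))
```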

For the decoder input, provide a string of two-digit hex numbers separated by spaces or by percent signs (for example, E3 81 82 or %E3%81%82).
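The input format described above could be parsed along the following lines. This is a minimal sketch rather than the app's own parser: the function name `parse_hex_input` is hypothetical, and rejecting malformed tokens with `ValueError` is an assumption about how errors should be handled.

```python
import re


def parse_hex_input(text: str) -> bytes:
    """Turn 'E3 81 82' or '%E3%81%82' into the bytes E3 81 82."""
    # Split on any run of whitespace and/or percent signs.
    tokens = [t for t in re.split(r"[\s%]+", text.strip()) if t]
    for token in tokens:
        if not re.fullmatch(r"[0-9A-Fa-f]{2}", token):
            raise ValueError(f"not a two-digit hex number: {token!r}")
    return bytes(int(token, 16) for token in tokens)


if __name__ == "__main__":
    data = parse_hex_input("%E3%81%82")
    print(data.decode("utf-8", errors="replace"))
```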

Green backgrounds appear behind sequences where all characters or bytes were successfully mapped to a character in the given encoding. Beware, however, that the character mapped to may not be the one you expect, especially in the single-byte encodings.

To identify characters and look up information about them you will find UniView extremely useful. You can paste Unicode characters into the UniView text area and click on the down-arrow icon below to find out what they are. (Click on the name that appears for more detailed information.) It is particularly useful for identifying escaped characters. Copy the escape(s) to the Find input area on UniView and click on Dec just below.

This app also provides a couple of extra tools below the main area. One converts between hex codepoint numbers, decimal codepoint numbers and characters. Enter a value in any one of those fields and move the focus away, and the other fields will be updated. The other tool allows you to create a list of all characters in various legacy encodings.
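The three-way conversion between hex codepoints, decimal codepoints and characters can be sketched in a few lines of Python. The function names here are hypothetical and the dictionary layout is only an illustrative choice, not how the app stores its fields.

```python
def from_char(ch: str) -> dict:
    """Describe a single character as hex codepoint, decimal codepoint and character."""
    code = ord(ch)
    return {"hex": f"{code:04X}", "dec": code, "char": ch}


def from_hex(hex_value: str) -> dict:
    """Start from a hex codepoint number such as '20AC'."""
    return from_char(chr(int(hex_value, 16)))


def from_dec(dec_value: int) -> dict:
    """Start from a decimal codepoint number such as 8364."""
    return from_char(chr(dec_value))


if __name__ == "__main__":
    print(from_hex("20AC"))
    print(from_dec(233))
    print(from_char("あ"))
```

Whichever field you start from, the other two values are derived from the same codepoint, which mirrors how the tool's fields update together.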

utf-8	windows-1250	iso-8859-2
big5	windows-1251	iso-8859-3
euc-jp	windows-1252	iso-8859-4
iso-2022-jp	windows-1253	iso-8859-5
shift_jis	windows-1254	iso-8859-6
euc-kr	windows-1255	iso-8859-7
gb18030	windows-1256	iso-8859-8
gbk	windows-1257	iso-8859-8-i
koi8-r	windows-1258	iso-8859-10
koi8-u	macintosh	iso-8859-13
	ibm866	iso-8859-14
	windows-874	iso-8859-15
	x-mac-cyrillic	iso-8859-16