Find non-ASCII characters used by a given language, or languages that use a given non-ASCII character. The information may not be 100% reliable. Please read the notes below.

Help

Updated Sun 21 Jul 2019 • tags counterstyles, scriptnotes, apps

This page allows you to track the correspondences between languages and non-ASCII Unicode characters. Much of the information is derived from CLDR and/or the Unicode site's UDHR transcripts, although some characters are added on the basis of other sources. The list isn't claimed to be exhaustive or entirely correct, so treat it as an approximation.

To note:

  • ASCII characters are ignored.
  • Auxiliary characters from CLDR are shown as 'infrequent'. Every character that appears in a UDHR transcription is shown.
  • Characters shown for a language include all characters produced by applying uppercase, lowercase, NFC, and NFD to the set of characters attributed to that language by its source.
  • As mentioned above, the data is expected to be largely correct, but not 100%. There's no guarantee that the CLDR source data is completely correct; I have already spotted some cases where changes are needed, and I hope that this tool will help spot further issues. In particular, data based on UDHR alone may be missing characters simply because they don't occur in that text (especially for scripts with a large syllabic repertoire). So treat the data with care. That said, it should be mostly correct, and I intend to fix errors as they come to light.
  • The Native speakers row or column shows the estimated total number of native speakers of all the languages listed, to give a rough idea of how prevalent a character is. It doesn't include people who speak a language as a second language, and that number is often a multiple of the native-speaker total. On the other hand, the figure counts speakers rather than literate users, so it represents potential users of the character. The figures may therefore be low, or at least conservative, for many languages, and possibly high for some (typically small languages, or languages written in an alternate orthography).
  • Chinese languages, Japanese, and Korean are not listed.
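The expansion described in the notes above (case mapping plus NFC and NFD, with ASCII ignored) can be sketched in Python. This is not the app's code: `expand_charset` is a hypothetical helper, and it assumes each mapping is applied once to each source character, and that multi-code-point results such as an NFD decomposition are split into individual characters before ASCII is dropped.

```python
import unicodedata


def expand_charset(chars: str) -> set[str]:
    """Hypothetical helper: expand a language's character set by applying
    uppercase, lowercase, NFC and NFD once to each character, then drop ASCII."""
    result: set[str] = set()
    for ch in chars:
        for form in (ch, ch.upper(), ch.lower(),
                     unicodedata.normalize("NFC", ch),
                     unicodedata.normalize("NFD", ch)):
            # A form may contain several code points (e.g. NFD "e" + U+0301);
            # add each one as a separate character.
            result.update(form)
    # ASCII characters are ignored, as in the notes above.
    return {c for c in result if ord(c) > 0x7F}
```

For example, `expand_charset("é")` yields é, É and U+0301 COMBINING ACUTE ACCENT; the plain `e` produced by the NFD decomposition is ASCII and so is dropped.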

Tips:

  • Mouse over the characters displayed to see their Unicode code point value and name. The U icon will show all characters in that row in UniView.
  • If you don't have fonts for all the characters displayed, click on Convert to images. You can also change the font to one you have on your system, using the Change font to: control.
  • The line that starts with 'Non-ASCII character count' lets you copy lists of characters to the clipboard. Click on the words 'letters', 'marks', etc. to pick up the relevant items, or click on 'total' to pick up everything except the infrequently used items.
  • The Find by typing control lets you type all or part of a language name to find that language. Select the language you want from the suggestions offered. To see all options, just empty the box. (In Firefox you'll need to press Return again after selecting an item from the list of suggestions.)
  • When adding characters to the Look up characters field, you can enter Unicode code point numbers (with a space on either side) or escapes. For example, for આ any of the following escapes will work: &#xA86; \u0A86 \u{A86} \0A86 U+0A86 0xA86. No extra space is needed between escapes, and supplementary characters work too.
  • After you have generated a list of languages that use a given character, if you click on a language name then details for that language will be displayed above.
  • To compare lists of characters, copy one set to the left box under Compare lists, and the other to the right box, then click on Compare. If both boxes are identical there will be no output, but if there are differences they will be displayed below the boxes.
  • You can automatically display data via the URL. For example, try https://r12a.github.io/app-charuse/?language=vi&charlist=đỹã.
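The grouping used on the 'Non-ASCII character count' line can be approximated with Unicode General_Category values. This is only a sketch: `group_by_category` is a hypothetical name, and the bucket names and the mapping from categories to buckets are assumptions that may not match how the app actually groups characters.

```python
import unicodedata
from collections import defaultdict

# Assumed mapping from the major General_Category class to a bucket name.
BUCKETS = {"L": "letters", "M": "marks", "N": "numbers",
           "P": "punctuation", "S": "symbols"}


def group_by_category(chars: str) -> dict[str, str]:
    """Group characters by the first letter of their General_Category."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for ch in dict.fromkeys(chars):  # remove repeats, keep first-seen order
        major = unicodedata.category(ch)[0]
        groups[BUCKETS.get(major, "other")].append(ch)
    return {name: "".join(members) for name, members in groups.items()}
```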
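One way to read the escape forms in the Look up characters tip is a single regular expression tried left to right. This is a sketch, not the app's implementation: `parse_lookup` is a hypothetical name, and it handles only `\u0A86`, `\u{A86}`, `\0A86`, `U+0A86`, `0xA86` and bare numbers. The rules are assumptions: bare numbers need 4 to 6 hex digits with whitespace (or the edge of the text) on both sides, variable-length hex runs are read greedily, and any other text is kept as literal characters with whitespace dropped.

```python
import re

# Assumed escape syntaxes; at each position the alternatives are tried in order.
TOKEN = re.compile(
    r"\\u\{(?P<brace>[0-9A-Fa-f]{1,6})\}"          # \u{A86}
    r"|\\u(?P<u4>[0-9A-Fa-f]{4})"                  # \u0A86
    r"|\\(?P<bs>[0-9A-Fa-f]{1,6})"                 # \0A86
    r"|U\+(?P<uplus>[0-9A-Fa-f]{4,6})"             # U+0A86
    r"|0x(?P<hex>[0-9A-Fa-f]{1,6})"                # 0xA86
    r"|(?<!\S)(?P<bare>[0-9A-Fa-f]{4,6})(?!\S)"    # 0A86 with space on both sides
)


def parse_lookup(text: str) -> list[str]:
    """Turn lookup input into a list of characters (hypothetical rules)."""
    chars: list[str] = []
    pos = 0
    for m in TOKEN.finditer(text):
        # Literal characters between escapes are kept; whitespace is dropped.
        chars.extend(c for c in text[pos:m.start()] if not c.isspace())
        digits = next(g for g in m.groups() if g is not None)
        cp = int(digits, 16)
        if cp <= 0x10FFFF:
            chars.append(chr(cp))
        else:
            # Out of Unicode range: keep the raw text rather than fail.
            chars.extend(m.group(0))
        pos = m.end()
    chars.extend(c for c in text[pos:] if not c.isspace())
    return chars
```

Under these rules a standalone word such as `cafe` would be read as U+CAFE, and an escape like `0xA86` followed directly by a hex digit would be misread because the digits are consumed greedily; a separating space avoids the latter.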
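The comparison in the Compare lists tip amounts to a set difference in each direction. A minimal sketch, assuming each box is treated as a set of characters (order and repeats ignored) and that whitespace is not compared; `compare_lists` is a hypothetical name.

```python
def compare_lists(left: str, right: str) -> tuple[str, str]:
    """Return (unique to left, unique to right), each in first-seen order.
    Two empty strings mean the lists contain the same characters."""
    left_set, right_set = set(left), set(right)
    only_left = "".join(dict.fromkeys(
        c for c in left if c not in right_set and not c.isspace()))
    only_right = "".join(dict.fromkeys(
        c for c in right if c not in left_set and not c.isspace()))
    return only_left, only_right
```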
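URLs like the one in the last tip can also be built programmatically. This sketch assumes only the `language` and `charlist` parameters shown in that example; `charuse_url` and `read_charuse_params` are hypothetical helpers. Python's `urllib.parse.urlencode` percent-encodes the non-ASCII characters as UTF-8, which should be equivalent to typing them directly into the address bar.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://r12a.github.io/app-charuse/"


def charuse_url(language: str, charlist: str) -> str:
    """Build a link that opens the app with a language and character list."""
    return BASE + "?" + urlencode({"language": language, "charlist": charlist})


def read_charuse_params(url: str) -> dict[str, str]:
    """Decode the query string back to plain text (first value of each key)."""
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
```

For example, `charuse_url("vi", "đỹã")` returns `https://r12a.github.io/app-charuse/?language=vi&charlist=%C4%91%E1%BB%B9%C3%A3`.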

Sources:

To do:

  • Add a graphic to show the number of speakers using a rectangle that grows with population.
  • Show character names in HTML rather than tooltips when doing mouseover?
  • Add symbols in the relevant block that are not included in the list? (Useful for checking the data.)
  • Allow multiple regions per language for things like English, Spanish, Portuguese, etc.?

Latest commit 2019-07-22 9:53 GMT. Licence CC-By © r12a