UniView Help & User Guide

Updated Sun 11 Jun 2017 • tags uniview, apps

UniView is an HTML-based application for working with Unicode characters. You can look up or find characters (using graphics or fonts) and related information, view whole character blocks or custom ranges, select characters to paste into your document, paste in and discover unknown characters, search for characters using regular expressions, do hex/decimal/NCR conversions, highlight character types, and more. It supports Unicode 10.0 and is written with Web Standards to work on a variety of browsers.

You can find the source code, and a list of recent changes, on GitHub. If you spot a bug, please raise a GitHub issue.


Quick start

To get started, use the Show block selector, or add characters or code points to one of the fields on the Look up tab and click the icon alongside it. UniView produces tables or lists of characters in the Character list panel (lower left).

By default, clicking on a character in a list or table displays detailed information for that character in the Character detail panel (to the right).

Use the toggle switch (the large two-arrow icon below the text area) to add the characters to the Text area instead. Alternatively, you can paste characters directly into the Text area and click the relevant icon below it to discover what they are.

Lookup controls

Look up > Show block

Select a Unicode block from the pull-down list and the characters in the block will be displayed below. You can then click on characters to view detailed information about them or add them to the text area, etc.

Click on the icon to refresh the results.


Look up > Show range

If you type or paste a start and end code point value (in hex) into this control, the characters in the range will display below. Note that this can only be one contiguous range.

Click on the icon to refresh the results.

If the range you select does not fill a whole column when displayed as a table, surrounding characters are greyed out. (When displaying as a list, you will only see the characters in the range.)

The Show range field will accept various formats, making it easier to paste a range from elsewhere. The numbers must be in hexadecimal form but can be separated by either a colon (the default), a hyphen, one or more spaces, or one or more periods. The code point values themselves can be in the following formats: 1234, &#x1234;, \1234;, \u1234, U+1234. The actual number of hex digits can be between 1 and 6.
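The accepted formats can be illustrated with a short sketch. This is not UniView's own code: the function name parse_range and the regular expressions are assumptions made for illustration, and the real control may accept or reject inputs differently.

```python
import re
from typing import Tuple

# One code point in any of the notations listed above (1234, &#x1234;,
# \1234;, \u1234, U+1234). The capture group holds the 1 to 6 hex digits.
CODE_POINT = r"(?:U\+|\\u|&#x|\\)?([0-9A-Fa-f]{1,6});?"

# Two code points separated by a colon, a hyphen, spaces, or periods.
RANGE = re.compile(
    rf"^\s*{CODE_POINT}\s*(?::|-|\.+|\s+)\s*{CODE_POINT}\s*$",
    re.IGNORECASE,
)


def parse_range(text: str) -> Tuple[int, int]:
    """Return (start, end) as integers; hypothetical helper, not a UniView API."""
    match = RANGE.match(text)
    if match is None:
        raise ValueError(f"not a recognisable range: {text!r}")
    start, end = (int(digits, 16) for digits in match.groups())
    return start, end


print(parse_range("0900:097F"))
print(parse_range("U+0900-U+097F"))
print(parse_range("&#x1234;..&#x1240;"))
```

All three calls describe a contiguous range, which matches the restriction that the control handles only one range at a time.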


Look up > Find

This control accepts a list of code point values in hexadecimal or decimal, a list of characters, or a search string. After adding content to the input box, click on the appropriate button below the control to activate the lookup. (If the right button is already highlighted, you can just hit Enter or click on the down-arrow icon.)

Click on the X icon to quickly clear the control.

Looking for hex or dec codepoints

By default the control expects hex codepoint values. To look up a decimal value, click on the Dec button.

This field is very forgiving about the format of the text entered into the box. Most types of character escapes will be recognised, and you can even paste in surrounding text. For example, UniView will detect and list the characters referred to by code point in the text "the decomposition mapping is <U+CE20, U+11B8>, and not <U+110E, U+1173, U+11B8>." Of course, this is not foolproof, but should provide the desired results most of the time.
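To give a feel for how code points can be picked out of surrounding text, here is a minimal sketch. It is an assumption-laden illustration, not UniView's actual heuristics: the function find_code_points is hypothetical, and it only recognises values that carry a U+, \u or &#x prefix. Bare hex numbers are deliberately ignored here, because ordinary words such as "decomposition" contain hex digits too, which is one reason this kind of detection can never be foolproof.

```python
import re

# Prefixed escapes only; bare hex values are not detected in this sketch.
ESCAPE = re.compile(r"(?:U\+|\\u|&#x)([0-9A-Fa-f]{1,6})")


def find_code_points(text):
    """Return the code points referred to in text, in order of appearance."""
    return [int(digits, 16) for digits in ESCAPE.findall(text)]


sample = ("the decomposition mapping is <U+CE20, U+11B8>, "
          "and not <U+110E, U+1173, U+11B8>.")
for cp in find_code_points(sample):
    print(f"U+{cp:04X}")
```

Run on the sentence quoted above, the sketch lists the five code points in the order they appear, including the repeated U+11B8.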

Looking for characters

If the Characters button is selected, each character in the control will be processed. (Much of the time you will probably find it easier to use the text area for this, but this control can be used if you want to look up some characters without disturbing a set of characters you are building up or using in the text area.)

Using search

Select the Search button to search for text in the Unicode database. The search returns a list of matching characters.

You can use regular expressions in searches. For example, suppose you want to find all characters whose entry contains the word 'tet'. You could type \btet\b into the input field. The \b represents a word boundary. If you want to search for entries containing either the word 'tet' or the word 'tat', you can use the 'or' operator |, as in \btet\b|\btat\b. (In UniView a colon can be used as a short form of \b, so the first example could have been written :tet:.)

Another example: You want to search for 'alpha', but you only want results for the Latin characters (not the many Greek or mathematical results). Simply use the following search string latin.*alpha. The .* represents any number of intervening characters.

Basic regular expressions that work in JavaScript code should work.
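The search examples above can be tried out on a handful of names with the following sketch. It uses Python's re module rather than JavaScript, so it only approximates what UniView does. The search function, the tiny SAMPLE_NAMES table, the case-insensitive matching and the way the colon is expanded are all assumptions made for illustration.

```python
import re

# A few real character names, standing in for the Unicode database.
SAMPLE_NAMES = {
    0x05D8: "HEBREW LETTER TET",
    0x1D306: "TETRAGRAM FOR CENTRE",
    0x0251: "LATIN SMALL LETTER ALPHA",
    0x03B1: "GREEK SMALL LETTER ALPHA",
}


def search(pattern, names=SAMPLE_NAMES):
    """Return code points whose name matches pattern (hypothetical helper)."""
    # Naive expansion of the colon shorthand; it would also rewrite colons
    # inside constructs such as (?:...), which a real implementation must avoid.
    expanded = pattern.replace(":", r"\b")
    regex = re.compile(expanded, re.IGNORECASE)
    return [cp for cp, name in names.items() if regex.search(name)]


print([f"U+{cp:04X}" for cp in search("tet")])           # substring match
print([f"U+{cp:04X}" for cp in search(r"\btet\b")])      # whole word only
print([f"U+{cp:04X}" for cp in search(":tet:")])         # colon shorthand
print([f"U+{cp:04X}" for cp in search("latin.*alpha")])  # Latin alpha only
```

The plain search for tet also finds TETRAGRAM FOR CENTRE, while the word-boundary forms find only HEBREW LETTER TET.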

By default, searches match against character names and alternative names in the main Unicode database, and also against the information displayed for an individual character under the heading Description in the right panel. You can limit the search using the Names, Descriptions and Other check boxes that appear above the input box. Other refers to alternative names. If you want to set these before searching, just click on the Search button with nothing in the input box. It will bring up the check boxes, which you can set as required. Then just enter your search string and click again on the Search button.

You can also limit the search to the characters currently in a list or table. To limit the search, select the check box labelled Local. Matching characters will be highlighted. (You can then produce a list of just the highlighted characters by clicking on the icon next to Filter > Make list from highlights. If you need to refine your search, you could then search again on this list, and so on.)

To clear the highlighting after doing a local search, just clear the Find input box, and hit the Search button again.

Look up > Use graphics

If the check box is selected, characters (apart from those in the text area and notes area) will be shown as images, rather than text.

Dedicated images are available for all characters except for those in the large Han, Hangul and Tangut blocks. Many of those characters are however also available thanks to an agreement with the folks who run the decodeunicode server.

If you want to find fonts, check out the script resource pages. When you select a Unicode block, a pointer to the relevant page appears at the top right of the panel.

Look up > Show range as list

If the check box is selected when you select a range using Show block or Show range the characters will be displayed as a list, rather than a table.

You can also use this to switch between table and list views of a range you have just selected.

Look up > Show notes

If you display information about a character in the Character details panel (lower right) and there are notes available for that character they will be displayed at the bottom of the page if this control is selected.

The notes are not authoritative. I compile them as I learn about various scripts and characters. They are made available in case they are of use, but you should not assume that they are always correct, and they are certainly not complete.

Block info link

The block info link appears when you select a block for display. Click on it to open a page of useful links related to that block. (The page opens in another window.)

Text area

The Text area is associated with a set of controls for managing characters as text. It makes UniView like a character map or picker tool, but also much more. You can add characters to the text area from other parts of UniView, or you can simply paste text into it for analysis.

In most browsers you can change the insertion point for characters echoed to the text area by just clicking where you want characters to appear. The exception is Internet Explorer, where characters are always added to the end of the line. If you highlight a range of text, any typed or echoed characters will replace the highlighted range.

You can interact with the text in the text area using the icons below it.

Character list panel

The character list panel is where you will see tables and lists of characters that you generate from elsewhere in UniView.

When characters are displayed in table format and you hover over a character in this area you will see the code point and name appear at the top of the panel.

Whether in table or list format, if you click on a character one of two things will happen. Either details for that character will appear to the right, or the character will be added to the text area. The former is the default, but the latter will happen if you have indicated that it should by clicking on the icon described above.

Unassigned character positions in a table are shown with a greyed out background (though you can change the colour, if you want).

Tags

A basic set of links is shown for each block or range of characters to allow you to quickly highlight characters with the property letter, mark, number, punctuation, or symbol. For more fine-grained property distinctions, see the Filter panel.

In addition, for some blocks there are other links available that reflect tags assigned to characters. This tagging is far from exhaustive! For instance, clicking on sanskrit will not show all characters used in Sanskrit.

The tags are just intended to be an aid to help you find certain characters quickly by exposing words that appear in the character descriptions or block subsection titles. For example, if you want to find the Bengali currency symbol, click on currency and all other characters but those related to currency will be dimmed.

(Since the highlight function is used for this, don't forget that, if you happen to highlight a useful subset of characters and want to work with just those, you can use the Make list from highlights command, or click on the upwards pointing arrow icon to move those characters into the text area.)

Character detail panel

This panel shows detailed information and links for a single character.

The previous and next clickable text at the top allows you to step through characters in the block one by one. It skips code values that don't have assignments, and at the end of a block it will move you to the first character of the following block, and vice versa.

For Han and Hangul characters you will see a link View in PDF code charts (pageXX). For Han blocks, this will open the PDF file for the block. The PDF is useful if there is no picture or font glyph for that character, but also allows you to see the variant forms of the character. For more information, see Link to information about Han characters and Hangul syllables under Tips and tricks.

Notes panel

The notes panel appears only when a check mark appears alongside Show notes in the Lookup panel. It also only appears if notes actually exist for the character being displayed in the Character details panel.

The notes provide information I have gathered while learning about scripts, and should not be considered authoritative. Notes may change and new notes will be added from time to time.

If a note is displayed, you'll also find a link at the bottom of the Character details panel that allows you to open the full page from which the notes were taken. These pages carry annotations for a whole Unicode block or set of blocks.

Filter controls


Filter > Show properties

This control allows you to search for characters with a particular property. It creates a list of matching characters.

By default, searches match against the characters in a list or table in the Character list panel. Matching characters will be highlighted. (You can then produce a list of just the highlighted characters by clicking on the icon next to Filter > Make list from highlights. If you need to refine your search, you could then search again on this list, and so on.)

To enlarge the search to the whole of Unicode, deselect the check box labelled Local.
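As a rough illustration of property-based highlighting, the sketch below picks out characters by the first letter of their General_Category value (L for letter, M for mark, N for number, P for punctuation, S for symbol). The highlight function is hypothetical, and the data comes from Python's unicodedata module, whose Unicode version depends on the Python build and may differ from the data UniView uses.

```python
import unicodedata


def highlight(code_points, major_class):
    """Return the code points whose General_Category starts with major_class."""
    return [
        cp for cp in code_points
        if unicodedata.category(chr(cp)).startswith(major_class)
    ]


# A local list: a Devanagari letter, vowel sign, digit and danda, plus the rupee sign.
local_list = [0x0915, 0x093E, 0x0968, 0x0964, 0x20B9]

print([f"U+{cp:04X}" for cp in highlight(local_list, "N")])  # numbers
print([f"U+{cp:04X}" for cp in highlight(local_list, "M")])  # marks
```

Passing a full two-letter value such as "Nd" narrows the match in the same way, which mirrors the finer-grained distinctions this control offers.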

Filter > Make list from highlights

If you have highlighted items in a list, using the Find or Show properties controls, this control will remove all but the highlighted items from the list.

Filter > Make list from non-highlighted items

If you have highlighted items in a list in the lower left panel, using the Find or Show properties controls, this control will remove all the highlighted items from the list, leaving the non-highlighted items only.

Filter > Show age

This control allows you to see when a character was added to Unicode. It shows version numbers for characters added after Unicode version 1.1.

You can also find the same information on a character-by-character basis in the Character detail panel for that character.

To remove the information, click on the X icon alongside, or display new data in the left panel.

Options controls


Options > Show U+ in lists

If this is checked, hex code point numbers in lists will be preceded by U+. The default is just the number.

Options > List format

This allows you to change the order and items in lists appearing in the lower left panel. By default, you would see something like this:

0968 २ DEVANAGARI DIGIT TWO

With this control you can position the character before or after the number (or both!) or remove it altogether. You can also specify whether the list should show the number and/or the name of the character.

This control is provided for people who want some control over how the list will look when copied and pasted into their text.
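The effect of these options on a single list line can be sketched as follows. The function format_line and its parameter names are invented for this illustration and do not correspond to anything in UniView's source. Character names come from Python's unicodedata module.

```python
import unicodedata


def format_line(cp, char_before=False, char_after=True,
                show_number=True, show_name=True, u_plus=False):
    """Build one list line for code point cp (hypothetical option names)."""
    char = chr(cp)
    parts = []
    if char_before:
        parts.append(char)
    if show_number:
        # Optional U+ prefix, as with the Show U+ in lists option.
        parts.append(("U+" if u_plus else "") + f"{cp:04X}")
    if char_after:
        parts.append(char)
    if show_name:
        parts.append(unicodedata.name(char, "<unnamed>"))
    return " ".join(parts)


print(format_line(0x0968))                        # default layout
print(format_line(0x0968, char_before=True, char_after=False))
print(format_line(0x0968, char_after=False, u_plus=True))
```

With the defaults, the first call reproduces the line shown above for DEVANAGARI DIGIT TWO.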

Options > Hide numbers around matrix

This allows you to hide the column and row numbers around a table. The default is to show the numbers.

Options > Left panel max height

These controls allow you to change the height of the display box in the lower left panel. Just enter the height you want in pixels (just the number) or select a preset size.

This is particularly useful when you are dealing with lists on a small screen such as a netbook. If you set the height to something like 400px, you can scroll through a long list, but still see the details in the right panel when you click on a character in the list.

Options > Left panel font size

This control allows you to increase the size of the characters in the lower left panel (independently of text elsewhere on the page). Note: It has no effect when viewing characters as graphics.

To see all text larger, use the normal browser method for zooming (eg. Ctrl++ and Ctrl+- in browsers on Windows).

Tips and tricks

Quickly transfer one or all characters to the text area

Ensure that the large icon below the text area is NOT greyed out, ie. it should show the two arrows (if not, click or tap on it), then click on the character in the list or table. The character will be appended to the text area.

To copy all characters in the list or table into the text area, click on the upward pointing arrow icon, which can be found just below the text area.

Create a list of only those characters that are highlighted

On the Filter tab, click on the down-arrow icon next to Make list from highlights. The characters shown in the lower left panel will be reduced to a list of just those that were highlighted. (This is particularly useful for refining searches.)

Create a list of only those characters that are NOT highlighted

On the Filter tab, click on the down-arrow icon next to Make list from non-highlighted items. The characters shown in the lower left panel will be reduced to a list of just those that were not highlighted. (This is particularly useful for refining searches.)

Find out the decimal code point value for a character

Mouse over a character and the decimal code point value pops up in a tooltip. The decimal code point value is also shown in the right panel.

If mouseover doesn't work (for example on a mobile phone), display the details for the character and you will see the decimal code point listed.
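If you want to check a value by hand, the relationship between the hex and decimal forms is a plain base conversion. A minimal sketch (the helper names to_decimal and to_hex are made up for this example):

```python
def to_decimal(hex_value):
    """Convert a hex code point string such as '0968' to its decimal value."""
    return int(hex_value, 16)


def to_hex(decimal_value):
    """Convert a decimal code point to at least four upper-case hex digits."""
    return f"{decimal_value:04X}"


print(to_decimal("0968"))  # → 2408
print(to_hex(2408))        # → 0968
print(ord("२"))            # → 2408, the same value taken from the character itself
```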

Change the order or number of items on the lines in a list

This can be particularly useful when you want to copy and paste a list into another document. In the options, use the check boxes after List format to indicate what you want to see.

Switch between showing and hiding U+ before hex code point values

Toggle the check box labelled Show U+ in lists on the Options tab.

Link to information about Han characters and Hangul syllables

Information about Han characters will have a link View data in Unihan database. As expected, this opens a new window at the page of the Unihan database corresponding to this character.

Han and hangul characters also have a link View in PDF code charts (pageXX). For Han blocks, this will open the PDF file for that block at the page that lists this character on Firefox and Chrome. (For Safari and Edge you will need to scroll to the page indicated.) The PDF is useful if there is no picture or font glyph for that character, but also allows you to see the variant forms of the character.

For some Han blocks, the number of characters per page in the PDF file varies slightly. In this case you will see the text approx; you may have to look at a page adjacent to the one you are taken to for these characters.

Note that some of the PDF files are quite large. If the file size exceeds 3 MB, a warning is included.

Look up information about a character in other databases

The following options are available from the Character details panel.

Click on the CLDR's Property demo link. A new window will open to show the entry for that character in the CLDR database. This provides additional, less commonly used data and properties relating to the character.

Click on the decodeUnicode link. A new window will open to show the entry for that character in the decodeUnicode database.

decodeUnicode.org is a wiki where people can contribute information about blocks and characters. It is developed at the Department of Design at the University of Applied Sciences in Mainz. The project is supported by the Federal Ministry of Education and Research (BMBF) and has the objectives of creating a basis for fundamental typographic research and facilitating a textual approach to the characters of the world for all computer users. (They also provide the graphic versions of characters for UniView.)

Click on the FileFormat link. A new window will open to show the entry for that character in the FileFormat database.

The FileFormat pages provide useful information for Java and .Net programmers.

Click on the Conversion tool link. A new window will open to show a number of possible alternative representations of the character, eg. numeric character references, percent escaped forms, hex and decimal code point information, etc.
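To show what such alternative representations look like, here is a small sketch that derives a few of them for one character. It is not the conversion tool's code: the function representations and the labels it uses are assumptions for illustration, and the real tool offers more formats than these.

```python
from urllib.parse import quote


def representations(char):
    """Return a few escaped forms of a single character (illustrative only)."""
    cp = ord(char)
    return {
        "U+ notation": f"U+{cp:04X}",
        "hex NCR": f"&#x{cp:X};",
        "decimal NCR": f"&#{cp};",
        # Percent-escape every byte of the character's UTF-8 encoding.
        "percent-escaped": quote(char, safe=""),
    }


for label, value in representations("२").items():
    print(f"{label}: {value}")
```

For DEVANAGARI DIGIT TWO this gives U+0968, &#x968;, &#2408; and %E0%A5%A8 respectively.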

Display the block to which the character belongs

Click on the link next to the subheading Show block in the Character details panel and all characters in that block will be displayed as a list or table (according to your settings).

Customising the app

Using URIs to start up UniView with data in left or right panels

This is useful for pointing people to particular information using a URI, for example in email. By providing query parameters in the URI you can start up UniView with specific information displayed as follows:

You should only use one of these query parameters in a single call to UniView.

Acknowledgements and thanks

François Yergeau co-developed the Unicode Code Converter utility, and translated it into French.

Patrick Andries translated UniView into French, but that was many versions ago, and the French version is no longer available.

Change history

Changes in version 10.0.0

Changes to the database and representative shapes for Unicode 10.0.0 were integrated into UniView.

Changes in version 9.0.0

Changes to the database and representative shapes made during the beta phase have now been integrated into UniView.

In addition, a quick filter facility was added just below the character list panel. This allows you to filter the characters according to some basic properties, and in some cases there are additional tags available, based on the words in the character descriptions or block subsection titles. Such additional tagging is far from exhaustive or perfect. The tags are just intended to provide some additional help for finding certain characters.

(Since the highlight function is used for this, don't forget that, if you happen to highlight a useful subset of characters and want to work with just those, you can use the Make list from highlights command, or click on the upwards pointing arrow icon to move those characters into the text area.)

Changes in version 9.0.0beta

UniView now supports the characters introduced for the beta version of Unicode 9. Any changes made during the beta period will be added when Unicode 9 is officially released. (Images are not available for the Tangut additions, but the character information is available.)

It also brings in notes for individual characters where those notes exist, if Show notes is selected. These notes are not authoritative, but are provided in case they prove useful.

A new icon was added below the text area to add commas between each character in the text area.

Links to the help page that used to appear on mousing over a control have been removed. Instead there is a noticeable, blue link to the help page, and the help page has been reorganised, using image maps to make it easier to find information. The reorganisation puts more emphasis on learning by exploration, rather than learning by reading.

Various tweaks were made to the user interface.

Changes in version 8.0.0a

Han and hangul characters now have a link View in PDF code charts (pageXX). For Han blocks, this will open the PDF file for that block at the page that lists this character on Firefox and Chrome. (For Safari and Edge you will need to scroll to the page indicated.) The PDF is useful if there is no picture or font glyph for that character, but also allows you to see the variant forms of the character.

For some Han blocks, the number of characters per page in the PDF file varies slightly. In this case you will see the text approx; you may have to look at a page adjacent to the one you are taken to for these characters.

Note that some of the PDF files are quite large. If the file size exceeds 3 MB, a warning is included.

This version also fixed bugs in the display of some of the Han block information.

Changes in version 8.0.0

This version adds 1,945 new non-ideographic characters encoded in Unicode 8.0.0 (including 6 new scripts).

I also finally fixed the Show Age filter, and brought it up to date.

The GitHub site now holds images for all 28,000+ Unicode code points other than Han ideographs and Hangul syllables (in two sizes).

Changes in version 7.0.0

This version updates the app to take account of the changes made during the beta phase of the specification, so that it now reflects the finalised Unicode 7.0.0.

The initial in-app help information displayed for new users was significantly updated, and the help tab now links directly to the help page.

A more significant improvement was the addition of links to character descriptions (on the right) where such details exist. This finally reintegrates the information that was previously pulled in from a database. Links are only provided where additional data actually exists.

Rather than pull the data into the page, the link opens a new window containing the appropriate information. This has advantages for comparing data, but it was also the best solution I could find without using PHP (which is no longer available on the server I use). It also makes it easier to edit the character notes, so the amount of such detail should grow faster. In fact, some additional pages of notes were added along with this upgrade.

A pop-up window containing resource information used to appear when you used the query to show a block. This no longer happens.

Changes in version 7beta

This version adds the 2,834 new characters encoded in the Unicode 7.0.0 beta, including characters for 23 new scripts. It also simplified the user interface, and eliminated most of the bugs introduced in the quick port to JavaScript that was the previous version.

Some features that were available in version 6.1.0a are still not available, but they are minor.

Significant changes to the UI include the removal of the 'popout' box, and the merging of the search input box with that of the other features listed under Find.

In addition, the buttons that used to appear when you select a Unicode block have changed. Now the block name appears near the top right of the page with an 'i' icon. Clicking on the icon takes you to a page listing resources for that block, rather than listing the resources in the lower right part of UniView's interface.

UniView no longer uses a database to display additional notes about characters. Instead, the information is being added to HTML files. When one of these files contains information for a particular block, you'll see a link to it from the detailed information for a particular character. At some future point, I may pull the information for that character into UniView, as before, but for the time being clicking on the Character notes link opens the page in a separate window. Initially such a link is only available for Tibetan, but I will add more from time to time.

Previous versions

Version history for previous versions of UniView can be found here.

This version 2017-06-11 16:47 GMT.  •  Copyright r12a@w3.org. Licence CC-By.