Factoids listed at the start of the EURid/UNESCO World Report on IDN Deployment 2013
5.1 million IDN domain names
Only 2% of the world’s domain names are in non-Latin script
The 5 most popular browsers have strong support for IDNs in their latest versions
Poor support for IDNs in mobile devices
92% of the world’s most popular websites do not recognise IDNs as URLs in links
0% of the world’s most popular websites allow IDN email addresses as user accounts
99% correlation between IDN scripts and language of websites (Han, Hangkuk, Hiragana, Katakana)
About two weeks ago I attended the part of a 3-day Asia Pacific Top
Level Domain Association (APTLD) meeting in Oman related to ‘Universal
Acceptance’ of Internationalized Domain Names (IDNs), ie. domain names
using non-ASCII characters. This refers to the fact that, although IDNs
work reasonably well in the browser context, they are problematic when
people try to use them in the wider world for things such as email and
social media ids, etc. The meeting was facilitated by Don Hollander, GM
Here’s a summary of information from the presentations and discussions.
(By the way, Don Hollander and Dennis Tan Tanaka, Verisign, each gave
talks about this during the MultilingualWeb workshop in Madrid the week
before. You can find links to their slides from the event program.)
International Domain Names (IDNs) provide much improved accessibility
to the web for local communities using non-Latin scripts, and are
expected to particularly smooth entry for the 3 billion people not yet
web-enabled. For example, in advertising (such as on the side of a bus)
they are easier and much faster to recognise and remember, they are also
easier to note down and type into a browser.
The biggest collection of IDNs is under .com and .net, but there are
new Brand TLDs emerging as well as IDN country codes. On the Web there
is a near-perfect correlation between use of IDNs and the language of a
The problems tend to arise where IDNs are used across cultural/script
boundaries. These cross-cultural boundaries are encountered not just
by users but by implementers/companies that create tools, such as email
clients, that are deployed across multilingual regions.
It seems to be accepted that there is a case for IDNs, and that they
already work pretty well in the context of the browser, but problems in
widespread usage of internationalized domain names beyond the browser
are delaying demand, and this apparently slow demand doesn’t convince
implementers to make changes – it’s a chicken and egg situation.
The main question asked at the meeting was how to break the vicious
cycle. The general opinion seemed to lean to getting major players like
Google, Microsoft and Apple to provide end-to-end support for IDNs
throughout their produce range, to encourage adoption by others.
Domain names are used beyond the browser context. Problem areas include:
- email clients generally don’t support use of non-ascii email addresses
- standards don’t address the username part of email addresses as well as domain
- there’s an issue to do with smptutf8 not being visible in all the right places
- you can’t be sure that your email will get through, it may be dropped on the floor even if only one cc is IDN
- applications that accept email IDs or IDNs
- even Russian PayPal IDs fail for the .рф domain
- things to be considered include:
- plain text detection: you currently need http or www at start in google docs to detect that something is a domain name
- input validation: no central validation repository of TLDs
- rendering: what if the user doesn’t have a font?
- storage & normalization: ids that exist as either IDN or punycode are not unique ids
- security and spam controls: Google won’t launch a solution without
resolving phishing issues; some spam filters or anti-virus scanners
think IDNs are dangerous abnormalities
- other integrations: add contact, create mail and send mail all show different views of IDN email address
- search: how do you search for IDNs in contacts list?
- search in general already works pretty well on Google
- I wasn’t clear about how equivalent IDN and Latin domain names will be treated
- mobile devices: surprisingly for the APTLD folks, it’s harder to
find the needed fonts and input mechanisms to allow typing IDNs in
- consistent rendering:
- some browsers display as punycode in some circumstances – not very user friendly
- there are typically differences between full and hybrid (ie. partial) int. domain names
- IDNs typed in twitter are sent as punycode (mouse over the link in the tweet on a twitter page)
Google are working on enabling IDN’s throughout their application
space, including Gmail but also many other applications – they pulled
back from fixing many small, unconnected bugs to develop a company wide
strategy and roll out fixes across all engineering teams. The Microsoft
speaker echoed the same concerns and approaches.
In my talk, I expressed the hope that Google and MS and others would
collaborate to develop synergies and standards wherever feasible.
Microsoft, also called for a standard approach rather than in-house,
proprietary solutions, to ensure interoperability.
However, progress is slow because changes need to be made in so many places, not just the email client.
Google expects to have some support for international email addresses
this summer. You won’t be able to sign up for Arabic/Chinese/etc email
addresses yet, but you will be able to use Gmail to communicate with
users on other providers who have internationalized addresses. Full
implementation will take a little longer because there’s no real way to
test things without raising inappropriate user expectations if the
system is live.
SaudiNIC has been running Arabic emails for some time, but it’s a
home-grown and closed system – they created their own protocols, because
there were no IETF protocols at the time – the addresses are actually
converted to punycode for transmission, but displayed as Arabic to the
Google uses system information about language preferences of the user
to determine whether or not to display the IDN rather than punycode in
Chrome’s address bar, but this could cause problems for people using a
shared computer, for example in an internet café, a conference laptop
etc. They are still worrying about users’ reactions if they can’t
read/display an email address in non-ASCII script. For email, currently
they’re leaning towards just always showing the Unicode version, with
the caveat that they will take a hard line on mixed script (other than
something mixed with ASCII) where they may just reject the mail.
A trend to note is a growing number of redirects from IDN to ASCII, eg. http://правительство.рф page shows http://government.ru in the address bar when you reach the site.
All the Arabic email addresses I saw were shown fully right to left,
ie. <tld><domain>@<username>. I wonder whether this
may dislodge some of the hesitation in the IETF about the direction in
which web addresses should be displayed – perhaps they should therefore
also flow right-to-left?? (especially if people write domain names
without http://, which these guys seem to think they will).
Many of the people in the room wanted to dispense with the http://
for display of web addresses, to eliminate the ASCII altogether, also
get rid of www. – problem is, how to identify the string as a domain
name – is the dot sufficient?? We saw some examples of this, but they
had something like “see this link” alongside.
By the way, Google is exploring the idea of showing the user, by
default, only the domain name of a URL in future versions of the Chrome
browser address bar. A Google employee at the workshop said “I think
URLs are going away as far as something to be displayed to users – the
only thing that matters is the domain name … users don’t understand the
rest of the URL”. I personally don’t agree with this.
One participant proposed that government mandates could be very
helpful in encouraging adaptation of technologies to support
international domain names.
I gave a talk and was on a panel. Basically my message was:
Most of the technical developments for IDN and IRIs were developed at
the IETF and the Unicode Consortium, but with significant support by
people involved in the W3C Internationalization Working Group. Although
the W3C hasn’t been leading this work, it is interested in understanding
the issues and providing support where appropriate. We are, however,
also interested in wider issues surrounding the full path name of the
URL (not just the domain name), 3rd level domain labels, frag ids, IRI
vs punycode for domain name escaping, etc. We also view domain names as
general resource identifiers (eg. for use in linked data), not just for
a web presence and marketing.
I passed on a message that groups such as the Wikimedia folks I met
with in Madrid the week before are developing a very wide range of fonts
and input mechanisms that may help users input non-Latin IDs on
terminals, mobile devices and such like, especially when travelling
abroad. It’s something to look into. (For more information about
Wikimedia’s jQuery extensions, see here and here.)
I mentioned the idea of bidi issues related to both the overall
direction of Arabic/Hebrew/etc URLs/domain names, and the more difficult
question about to handle mixed direction text that can make the logical
http://www.oman/muscat render to the user as http://www.muscat/oman
when ‘muscat’ and ‘oman’ are in Arabic, due to the default properties of
the Unicode bidi algorithm. Community guidance would be a help in
resolving this issue.
I said that the W3C is all about getting people together to find
interoperable solutions via consensus, and that we could help with
networking to bring the right people together. I’m not proposing that
we should take on ownership of the general problem of Universal
Acceptance, but I did suggest that if they can develop specific
objectives for a given aspect of the problem, and identify a natural
community of stakeholders for that issue, then they could use our
Community Groups to give some structure to and facilitate discussions.
I also suggested that we all engage in grass-roots lobbying, requesting that service/tool providers allow us to use IDNs.
At the end of the first day, Don Hollander summed up what he had gathered from the presentations and discussions as follows:
People want IDNs to work, they are out there, and they are not going
away. Things don’t appear quite so dire as he had previously thought,
given that browser support is generally good, closed email communities
are developing, and search and indexing works reasonably well. Also
Google and Microsoft are working on it, albeit perhaps slower than
people would like (but that’s because of the complexity involved). There
are, however, still issues.
The question is how to go forward from here. He asked whether APTLD
should coordinate all communities at a high level with a global
alliance. After comments from panelists and participants, he concluded
that APTLD should hold regular meetings to assess and monitor the
situation, but should focus on advocacy. The objective would be to
raise visibility of the issues and solutions. “The greatest
contribution from Google and Microsoft may be to raise the awareness of
their thousands of geeks.” ICANN offered to play a facilitation role
and to generate more publicity.
One participant warned that we need a platform for forward motion,
rather than just endless talking. I also said that in my panel
contributions. I was a little disappointed (though not particularly
surprised) that APTLD didn’t try to grasp the nettle and set up
subcommittees to bring players together to take practical steps to
address interoperable solutions, but hopefully the advocacy will help
move things forward and developments by companies such as Google and
Microsoft will help start a ball rolling that will eventually break the