Intro
Herald: So welcome to this evening's next
talk with the wonderfully broken title, I
love it, "Emoji domains and how
wonderfully broken they are" by a very,
very wonderful person, Jennifer, who is a
web developer, and you wouldn't believe
it, her nick is "unicorn", here is a
unicorn! ... Hi! Jennifer this is, tell us
everything about emoji domains, and why
they are so rotten broken. You go!
dysphoricUnicorn: Yeah, thank you a lot.
Exactly - we are speaking about these
wonderfully broken things and the talk
will be kind of like... I start with a bit
of an intro dump about the history of
emoji domains and what they actually are
and then I will talk about my personal
experience breaking things with them. So
yeah, let's start right off with the
history. So, DNS was standardized in
1987, with a very limited character set.
So, you can see, like, only roman letters
and some numbers, and, like, four non-
letters. So these are definitely not
sufficient for many languages and it's a
very euro-centric view, or not even just
euro-centric, but it's actually very
centered on the English language and it
was clear that this won't suffice so in
1996, internationalized domain names were
proposed, which allow encoding characters
that are not supported or that are not
officially supported into this very small
character set so that browsers could
simply convert them on the fly. This...
sources kind of disagree when this exactly
went live or when you could start it, when
you were able to use it for the first
time. The IDNA2003 standard allowed the
support, but the first emoji domains were
actually registered in 2001. Interesting
about these is that, in 2001, emojis
weren't part of Unicode yet. So you can
see these examples, like the "hot springs"
those do show as emoji, which is because
they are both emoji and Unicode pictographs.
So, not actually emoji domains at the
time, but right now, they were kind of
converted into emojis. Back then, they
were just pictographs. I couldn't really
find out if those domains actually
resolved if you entered the pictographs
back then or if it was just someone who
just was hoping they would rise in price
once IDNA2003 or whatever standard would
implement it, went live. So there was also
an IDNA2003 normalization, but that is not
too interesting for us because we just
want to look at the emoji side of things.
IDNA2008 actually banned emoji for most
major TLDs, because of concerns that it
would be used for phishing domains that
looked very similar to actual, other
domains. Like, every character exists as an
emoji, to be able to make country
flags, so that could be used for phishing
and they decided to ban it for most major
TLDs that comply with IDNA2008. Important
to my little story, in 2020, the Emoji 13
standard added the transgender pride flag
emoji. You'll see why that's important
later. So what actually is this punycode
encoding? It's a non-human-readable representation
of Unicode characters. So you can see this
symbol here would be translated to
"xn--c8h", which obviously doesn't make
much sense to type in but your browser
would take care of this. So, DNS didn't have to
be changed, it's only inside your browser
that these conversions happen. Compatible
browsers, depending on which browser you
use, will either intransparently or
semitransparently translate. Firefox, for
example, as a mitigation to these phishing
attempts, does allow you to enter emoji or
other Unicode characters, but as soon as
you hit enter it will, the URL bar will
show this xn-dash-dash domain. Safari, as
far as I know, does not do it
transparently, so you will not know what
exactly the punycode representation is of
what you were just entering. And different
TLDs only support a specific subset as I
said, IDNA2008 actually banned it. Fun
fact, I forgot on the last slide: IDNA2008
went live in 2010 which is kind of
confusing, but whatever. Different TLDs
only support specific charsets, most don't
support emoji, but there are TLDs that
have "supporting emoji" as their main
selling point. TLDs that most people
wouldn't want to use unless they just
simply are interested in emoji. Why did I
end up breaking things with it? In early
2011... not 2011, 2021... this year - I
was unemployed and looking for interesting
ways to build my portfolio. I knew that
emoji were somewhat supported but I didn't
know what, how exactly it worked, I just
knew that there were some people that had
emoji domains and I was kind of happy that
there was a transgender pride emoji
added, so I decided, well, maybe it's a
good idea to add some domain that contains
this transgender pride emoji to also kind
of become less interesting for bigoted
potential employers. So, yeah, let's
register a domain with that emoji. Well...
that seems to be a bit more difficult
because these domains, even though you
never really encounter them, seemed to be
sold out. Nothing that I looked up worked,
and actually the web interface broke a
bit, but more on that later. Well... none
of these domains actually resolve to
anything: .dev does not support emoji at
all and Namecheap doesn't support emoji
even with top-level domains that do
support them. So, I had to go to another
registrar, which was a bit annoying
because I thought, well, I like everything
in one place, not specifically because I love
Namecheap or anything. But, whatever. A few
months later, I am now the proud owner of
"transgender pride flag purple heart .
ws". At least, that's what I thought. So, I
just set out to build a small demo page for
it, and deploy it on my server and test it
and - wow. My server usually isn't that
slow. Timeouts... the route looks okay
inside my reverse proxy, trying again, and
after a long time, I end up with this
wonderful error message. So we're sorry,
that domain is invalid. It also does not
show the transgender pride flag anymore,
but that could be down simply to their
webfont not supporting it yet because it
was just added in Emoji 13, at least
that's what I thought at that point.
Obviously, I was a bit scared because,
well I just spent 10 euros on something
and... I didn't really know when I would
have a stable income again so I did this
to find a new job and German unemployment
benefits are really difficult to get, so I
was a bit scared, but GoDaddy didn't sell
me an invalid domain, and they also
definitely did not scam me, because if you
enter these exact characters that
apparently are invalid, it does resolve to
my server. So when I looked at the
GoDaddy web interface, it also showed
these three characters, the purple heart,
the white flag and the transgender symbol.
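
[Note: a minimal Python sketch of why the
stored name can differ from the typed one.
It assumes, as the speaker explains, that
the zero-width joiner is discarded before
the punycode step. Discarding the variation
selector U+FE0F as well is an additional
assumption, and strip_invisible is a
hypothetical helper, not any registrar's
actual code.]

```python
import unicodedata

# Transgender flag as an emoji sequence:
# white flag + VS16 + zero-width joiner + transgender symbol + VS16
FLAG = "\U0001F3F3\uFE0F\u200D\u26A7\uFE0F"

# Code points this sketch assumes get discarded (ZWJ and VS16).
DISCARDED = {"\u200D", "\uFE0F"}


def describe(text):
    """List each code point with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]


def strip_invisible(text):
    """Hypothetical helper: drop the joiner and variation selectors."""
    return "".join(ch for ch in text if ch not in DISCARDED)


stripped = strip_invisible(FLAG)
for line in describe(FLAG):
    print(line)
print("after stripping:", describe(stripped))

# The two strings encode to different punycode, so they are different names.
print(FLAG.encode("punycode"), stripped.encode("punycode"))
```

[What is left after stripping is two
separate symbols, a white flag and a
transgender symbol, rather than one flag
emoji.]
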
It's simply not the domain that I had
entered into the emoji domain search
engine. It wasn't just their webfont that
didn't support it. And that is caused by
the wonderful zero-width joiners. To avoid
having tons of similar emoji, each with
their own code, many emoji are created by
combining others. So you have the skin tone
modifiers, for example, or the country
flags, which combine several code points,
and other emoji are joined with a zero-width joiner. The
transgender pride flag is a combination of
a white flag and a transgender symbol with
a zero-width joiner in between. And the
thing is, punycode does not really support
them so it was simply just dropped during
conversion while I bought my domain. But
that's not everything. Because I still had
this project, I still wanted emoji domains
and my interest was piqued so I wanted to
try out what else I could break. To avoid
spending even more money on this
project, I just moved my testing to
subdomains which was a good idea because I
have way more control over subdomains
than I have over regular ones. I can
register them with any registrar, so I
could use my go-to registrar. I can
register whatever strings I want, so even
invalid punycode. I can register them
under a TLD that does not allow it because
it's not a second-level domain but a
third-level domain. And, yeah, let's see
what browsers do about that. So I created
the subdomain "transgender pride flag .
dysphoric . dev". Firefox converts it to
xn-- and I'm not gonna say all that.
Chromium converts it to a different
string. If you plug either of those
into a converter, it will tell you
that both are invalid punycode. However,
both are understood and routed, so I just
simply added an [unintelligible] all-route
to my reverse proxy, so that both would
work. If you use dig, which is a command-
line tool that lets you look up domain
records - first of all, it doesn't do the
punycode conversion at all, so I had to
use one of the strings that one of my
browsers gave me, but when I use that
string it also gave me this: "It's not a
valid IDNA2008 name. Disable validation
using these tool parameters." It also didn't
tell me that I needed both. So I added the
first and then, oh, you still need the
second. But, whatever. Once both were
added, I was able to get correct results
and my site was reachable. The next thing
I thought of was, what if I move my
domain to a non-supported registrar,
because as I just talked about, Namecheap
does not actually allow emoji domains and I
was interested to see how their web interface
would handle it. Sadly, it simply did not
handle it at all, because they don't support
.ws domains. I wasn't really going to
contact their support team to try and
still get it because this was only a
small thing and they are probably just
simply not interested in hosting that
domain because it breaks their web
interface if you try to. Or other things
about emoji domains break their web
interface, so I don't really see why their
support team would actually be on my side
here. So, what about email? Because,
apparently, email clients really enjoy
breaking. From my experience at least. Do
they break with emoji? When trying to add
an emoji domain as a sender, my mail
server actually broke because validation
was run after punycode to unicode
conversion, which caused an uncaught
exception, which was surprising, it's
already fixed but the patch is not
released yet so I couldn't yet test it.
But there's still the local part which I
could already control as much as I wanted
to and the [unintelligible] so Thunderbird
simply ignored it and showed the punycode
and Apple Mail dropped the zero-width
joiner and also showed the punycode under
the thing where it shows the exact domain.
So, mixed results, nothing too spectacular,
no exceptions or crashing clients or
anything interesting like that, sadly.
What did I learn doing this? Well,
obviously emoji domains are very buggy.
Implementations vary from browser to
browser so you can have the same input
string and get different punycodes out of
it, so testing in just one browser
definitely is not enough, well, it never
is, but here especially it isn't. And, you
may be able to buy a domain that won't
work as you would think which can cause
quite the annoyance. But it's still a lot
of fun to mess around with this stuff,
just not for productive use. I like to end
my talks by telling people to join a
labor union that doesn't have anything to
do with this but that's what I do for some
reason. And I've also got a blog post
about this where I've written it up and I
would publish the slides under the
wonderful domain "poop emoji nycode . ws".
It's just a link to my regular blog for
now. I'm sorry. I think I went a bit fast
but I still thank you for your time and
I'm open to questions.
Herald: [talks, but no sound is audible]
Herald: I'm online... oops, I'm sorry. I'm
awfully sorry, my machine is slow. I muted
myself about half a minute ago. Thank you
for that beautiful talk, Jennifer. I had
to grin a couple of times, because it was
great and it made my day. And actually we
have a question. The question is in
German, I'll say it in English: why is DNSSEC
so complicated for emoji domains?
dysphoricUnicorn: Well, because no one
actually really likes emoji domains except
the people who sell them. At least that
was my experience looking up things for
that. So, they are kind of disallowed in the
standard, but some top-level domains just
ignore the standard and still let you register
them and it's just something that people
who implement things don't want to think
about at all. I haven't actually tried
DNSSEC, but it's just something that is
easily forgotten because it shouldn't
actually exist, which may
be a bit harsh, but...
Herald: Is - you remember the ringtone
fads when smartphones didn't exist yet -
is this just a fad like this ringtone
thing and it will just disappear within
the next couple of years or would you
think emojis are here to stay? Is this
serious?
dysphoricUnicorn: I think emojis are here
to stay but not within domains or... like,
it was possible since 2001, kind of, but
at least since 2011 where the first actual
emoji domain was registered. But most
domains that are, like, popular examples
already don't resolve anymore or resolve
to sites that say "emoji domains". So,
emoji domains definitely are not much more
than a fad or a nice, funny thing to just
look at for a bit. However, emojis as a
whole are such a large part of our
culture, I don't think they're going to go
away any time soon because it's been more
than ten years and the annoying
downloadable ringtones were popular
for a bit less time, I think.
Herald: This is a question that I actually
wanted to ask myself as well, because I
run my own email server as well and...
which email server software do you talk
about? Do you know about
supporting the others?
dysphoricUnicorn: Errm...
Herald: What do you use as a software on
your email server?
dysphoricUnicorn: My email server is
running on Mailu, which is a set of
Docker containers that are specially made
to work together to make setting up an
email server as painless as possible for
free. So I haven't actually tested any
other servers, however in theory they
shouldn't actually have any issues. So, the part
of Mailu that failed wasn't actually the
mail server part. It was simply a parser.
So, in theory, with another mail server,
it should work, if they didn't also mess
up parsing at some point.
Herald: Somebody asked here, is there a
list of top-level domains that support
emojis and somebody posted an answer:
Wikipedia. Is that correct? Wikipedia has
such a list?
dysphoricUnicorn: It has, but the list
that the English Wikipedia has isn't
actually correct. It lists at
least one domain that no longer supports
emojis which is actually kind of a big
political thing where they removed
support. So, the Wikipedia list is not
complete or contains too much. There are,
however, registrars, that are specialized
in emoji domains and those will have
current lists. So, I had .ws as one of
them. It's not the red heart emoji, though
because that's invalid punycode and so I
don't really know what to enter in my URL
bar to get to them other than searching
it on Google, so...
Herald: laughs Next question, is there a
difference between single punycode and
multiple emoji chained together as a
second or third level domain?
dysphoricUnicorn: It's just different
punycode, depending on how many emoji you
have but theoretically, the implementation
for this would just, I think the technical
term was ASCII-to-Unicode something, which
is like, an algorithm to convert it, does
handle multiple emoji similarly. Or - it
should work without any
issues if one of the two works.
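
[Note: a minimal Python sketch of the
decoding direction, which RFC 3490 calls
ToUnicode. It uses Python's built-in
"punycode" codec; to_unicode_label is a
hypothetical helper and performs no IDNA
validation. The same routine handles one
or several emoji in a label.]

```python
ACE_PREFIX = "xn--"


def to_unicode_label(label):
    """Hypothetical helper: decode one "xn--" label without any validation."""
    if not label.lower().startswith(ACE_PREFIX):
        return label  # plain ASCII label, nothing to decode
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


# The label spelled out in the talk decodes to a single symbol.
single = to_unicode_label("xn--c8h")
print(f"U+{ord(single):04X}")  # U+26A7, the transgender symbol

# A label with several emoji goes through exactly the same code path.
hearts = "\U0001F49C\U0001F49C\U0001F49C"
ace = ACE_PREFIX + hearts.encode("punycode").decode("ascii")
print(ace, to_unicode_label(ace) == hearts)
```
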
Herald: Are there any emoji
first-level domains?
dysphoricUnicorn: No. There are not. There
are punycode first-level domains, because
there are languages that simply do not use
the same letters as English does, so
punycode first-level domains are existent
but no emoji first-level domains at this
point. Maybe there will be, but I kind of
doubt it because, to the people in charge of
this, emoji domains are kind of an eyesore,
from what I could read, so...
Herald: Talking about eyesores: I always
have the impression, that at least to the
old coders, diacritical signs in
themselves were considered an eyesore.
You know, those funny little dots those
German speaking people have up there.
Don't talk about the Czech and the Poles.
Now, my name contains such a diacritical
sign, my first name is André and I've been
fighting with all kinds of inputs that say
7 Bit ASCII and nothing else. Do
diacritical signs still break domains?
dysphoricUnicorn: They should not, because
they are actually the reason why IDNs exist. So it
was actually proposed by someone who has
one of those signs in his name and probably
just wanted the domain with his name. This
was the actual reason why we have punycode
in the first place and supporting emoji
was kind of an unwanted side effect. So in
theory, it should work without issues but
still many people don't think about it
enough when implementing their own thing,
so you can never be too certain that it
will. But it should.
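
[Note: a minimal Python sketch of a name
with a diacritic going through the
conversion, using Python's built-in "idna"
codec, which implements the older IDNA2003
rules. The domain andré.example is a
made-up name under a reserved test TLD.]

```python
# Python's built-in "idna" codec follows IDNA2003 (RFC 3490).
name = "andr\u00e9.example"  # "andré.example", hypothetical domain

ace = name.encode("idna")
print(ace)  # b'xn--andr-epa.example'

# Decoding restores the original spelling.
restored = ace.decode("idna")
print(restored)
```
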
Herald: seventy posted here, seventy
obviously runs Windows, and in Windows
10, there is the emoji menu with the combination of
the Windows key and the full stop. Is that
common already or is that new? I think
it's common by now, it's been implemented
and ever since then everybody's been using
emojis. And there is also a remark that
says "MS Outlook has actually pretty good
unicode-punycode support but still don't
try emojis". I remember a story about when
the Bosnian wars broke, when the Yugoslav
war broke, especially the ones in Bosnia
broke out, there were about a hundred
thousand Bosnians that fled to
Switzerland, and about fifteen thousand
were granted citizenship, but they
couldn't be registered in the citizenship
register, because that only supported
7-bit or 8-bit ASCII but no diacritical
sign of [unintelligible]. I think they
fixed it by now but that was quite a thing
some years back. I see no further
question, - oh, there is one ... coughs
... one... coughs excuse me that came in
right now... coughs... is there a
uniform way to generate punycode over
multiple platforms? Mobiles do not work
well with entering unicode numbers as we
all know.
dysphoricUnicorn: I'm not sure I
understood this correctly. The easiest
way that I used during my testing was
simple online converters that would work
on every page. And actually my system
doesn't have a shortcut for emoji so I
would always copy and paste from
emojipedia into an online punycode converter
and just use it from there. Because I
don't actually use emoji that much.
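
[Note: one platform-independent option is
a few lines of script instead of an online
converter. A minimal Python sketch,
assuming only the built-in "punycode"
codec; to_ace is a hypothetical helper
that skips the normalization and
validation a real IDNA library performs,
so its output is not guaranteed to match
what a browser or registrar produces.]

```python
def to_ace(domain):
    """Hypothetical helper: punycode-encode each non-ASCII label."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)  # already fits the DNS character set
        else:
            encoded = label.encode("punycode").decode("ascii")
            labels.append("xn--" + encoded)
    return ".".join(labels)


print(to_ace("\u26A7.example"))  # xn--c8h.example
print(to_ace("example.com"))     # unchanged: example.com
```
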
Herald: Okay, we've come to the end of our
time. We still would have another minute
or two, but we have no more questions.
Thank you in the meantime for coming and
holding this talk. You have another talk.
I think it's tomorrow?
Outro
Subtitles created by c3subtitles.de
in the year 2021. Join, and help us!