#DiVOC R2R - Emoji domains and how wonderfully broken they are

  • 0:00 - 0:17
    Intro
  • 0:17 - 0:24
    Herald: So welcome to this evening's next
    talk with the wonderfully broken title, I
  • 0:24 - 0:30
    love it, "Emoji domains and how
    wonderfully broken they are" by a very,
  • 0:30 - 0:39
    very wonderful person, Jennifer, who is a
    web developer, and you wouldn't believe
  • 0:39 - 0:47
    it, her nick is "unicorn", here is a
    unicorn! ... Hi! Jennifer this is, tell us
  • 0:47 - 0:54
    everything about emoji domains, and why
    they are so rotten broken. You go!
  • 0:54 - 1:02
    dysphoricUnicorn: Yeah, thank you a lot.
    Exactly - we are speaking about these
  • 1:02 - 1:09
    wonderfully broken things and the talk
    will be kind of like... I start with a bit
  • 1:09 - 1:13
    of an intro dump about the history of
    emoji domains and what they actually are
  • 1:13 - 1:20
    and then I will talk about my personal
    experience breaking things with them. So
  • 1:20 - 1:26
    yeah, let's start right off with the
    history. So, DNS was standardized in
  • 1:26 - 1:37
    1987, with a very limited character set.
    So, you can see, like, only Roman letters
  • 1:37 - 1:47
    and some numbers, and, like, four non-
    letters. So these are definitely not
  • 1:47 - 1:52
    sufficient for many languages and it's a
    very euro-centric view, or not even just
  • 1:52 - 1:59
    euro-centric, but it's actually very
    centered on the English language and it
  • 1:59 - 2:07
    was clear that this won't suffice so in
    1996, internationalized domain names were
  • 2:07 - 2:15
    proposed, which allow encoding characters
    that are not supported or that are not
  • 2:15 - 2:21
    officially supported into this very small
    character set so that browsers could
  • 2:21 - 2:31
    simply convert them on the fly. This...
    sources kind of disagree when this exactly
  • 2:31 - 2:36
    went live or when you could start it, when
    you were able to use it for the first
  • 2:36 - 2:43
    time. The IDNA2003 standard allowed the
    support, but the first emoji domains were
  • 2:43 - 2:50
    actually registered in 2001. Interesting
    about these is that, in 2001, emojis
  • 2:50 - 2:57
    weren't part of Unicode yet. So you can
    see these examples, like the "hot springs"
  • 2:57 - 3:03
    those do show as emoji, which is because
    they are both emoji and Unicode pictographs.
  • 3:03 - 3:10
    So, not actually emoji domains at the
    time, but right now, they were kind of
  • 3:10 - 3:17
    converted into emojis. Back then, they
    were just pictographs. I couldn't really
  • 3:17 - 3:20
    find out if those domains actually
    resolved if you entered the pictographs
  • 3:20 - 3:25
    back then or if it was just someone who
    just was hoping they would rise in price
  • 3:25 - 3:33
    once IDNA2003 or whatever standard would
    implement it, went live. So there was also
  • 3:33 - 3:39
    an IDNA2003 normalization, but that is not
    too interesting for us because we just
  • 3:39 - 3:46
    want to look at the emoji side of things.
    IDNA2008 actually banned emoji for most
  • 3:46 - 3:50
    major TLDs, because of concerns that it
    would be used for phishing domains that
  • 3:50 - 3:57
    looked very similar to actual, other
    domains. Like every character exists as an
  • 3:57 - 4:06
    emoji, to be able to make country
    flags, so that could be used for phishing
  • 4:06 - 4:14
    and they decided to ban it for most major
    TLDs that comply with IDNA2008. Important
  • 4:14 - 4:24
    to my little story, in 2020, the Emoji 13
    standard added the transgender pride flag
  • 4:24 - 4:33
    emoji. You'll see why that's important
    later. So what actually is this punycode
  • 4:33 - 4:40
    encoding? It's a non-human-readable representation
    of Unicode characters. So you can see this
  • 4:40 - 4:46
    symbol here would be translated to xn--c8h,
    which obviously doesn't make
  • 4:46 - 4:51
    much sense to type in but your browser
    would take care of this. So, DNS didn't have to
  • 4:51 - 4:57
    be changed, it's only inside your browser
    that these conversions happen. Compatible
  • 4:57 - 5:02
    browsers, depending on which browser you
    use, will translate either opaquely or
  • 5:02 - 5:08
    semi-transparently. Firefox, for
    example, as a mitigation to these phishing
  • 5:08 - 5:16
    attempts, does allow you to enter emoji or
    other Unicode characters, but as soon as
  • 5:16 - 5:23
    you hit enter it will, the URL bar will
    show this xn-- domain. Safari, as
  • 5:23 - 5:30
    far as I know, does not do it
    transparently, so you will not know what
  • 5:30 - 5:36
    exactly the punycode representation is of
    what you were just entering. And different
  • 5:36 - 5:44
    TLDs only support a specific subset; as I
    said, IDNA2008 actually banned it. Fun
  • 5:44 - 5:50
    fact, I forgot on the last slide: IDNA2008
    went live in 2010 which is kind of
  • 5:50 - 5:58
    confusing, but whatever. Different TLDs
    only support specific charsets, most don't
  • 5:58 - 6:02
    support emoji, but there are TLDs that
    have "supporting emoji" as their main
  • 6:02 - 6:10
    selling point. TLDs that most people
    wouldn't want to use unless they just
  • 6:10 - 6:21
    simply are interested in emoji. Why did I
    end up breaking things with it? In early
  • 6:21 - 6:27
    2011... not 2011, 2021... this year - I
    was unemployed and looking for interesting
  • 6:27 - 6:33
    ways to build my portfolio. I knew that
    emoji were somewhat supported but I didn't
  • 6:33 - 6:38
    know what, how exactly it worked, I just
    knew that there were some people that had
  • 6:38 - 6:43
    emoji domains and I was kind of happy that
    there was a transgender pride emoji
  • 6:43 - 6:48
    added, so I decided, well, maybe it's a
    good idea to add some domain that contains
  • 6:48 - 6:55
    this transgender pride emoji to also kind
    of become less interesting for bigoted
  • 6:55 - 7:05
    potential employers. So, yeah, let's
    register a domain with that emoji. Well...
  • 7:05 - 7:09
    that seems to be a bit more difficult
    because these domains, even though you
  • 7:09 - 7:16
    never really encounter them, seemed to be
    sold out. Nothing that I looked up worked,
  • 7:16 - 7:24
    and actually the web interface broke a
    bit, but more on that later. Well... none
  • 7:24 - 7:28
    of these domains actually resolve to
    anything: .dev does not support emoji at
  • 7:28 - 7:33
    all and Namecheap doesn't support emoji
    even with top-level domains that do
  • 7:33 - 7:41
    support them. So, I had to go to another
    registrar, which was a bit annoying
  • 7:41 - 7:46
    because I thought, well, I like everything
    in one place, not that I specifically love
  • 7:46 - 7:52
    Namecheap or anything. But, whatever. A few
    months later, I am now the proud owner of
  • 7:52 - 8:00
    "transgender pride flag purple heart .
    ws". At least, that's what I thought. So, I
  • 8:00 - 8:05
    just set up to build a small demo page for
    it, and deploy it on my server and test it
  • 8:05 - 8:14
    and - wow. My server usually isn't that
    slow. Timeouts... the route looks okay
  • 8:14 - 8:23
    inside my reverse proxy, trying again, and
    after a long time, I end up with this
  • 8:23 - 8:31
    wonderful error message. So we're sorry,
    that domain is invalid. It also does not
  • 8:31 - 8:36
    show the transgender pride flag anymore,
    but that could simply be down to their
  • 8:36 - 8:41
    webfont not supporting it yet because it
    was just added to emoji 13, at least
  • 8:41 - 8:45
    that's what I thought at that point.
    Obviously, I was a bit scared because,
  • 8:45 - 8:52
    well I just spent 10 euros on something
    and... I didn't really know when I would
  • 8:52 - 8:58
    have a stable income again so I did this
    to find a new job and German unemployment
  • 8:58 - 9:06
    benefits are really difficult to get, so I
    was a bit scared, but GoDaddy didn't sell
  • 9:06 - 9:14
    me some invalid domain, and they also
    definitely did not scam me, because if you
  • 9:14 - 9:20
    enter these exact characters that
    apparently are invalid, it does resolve to
  • 9:20 - 9:25
    my server. So when I looked at the
    GoDaddy web interface, it also showed
  • 9:25 - 9:30
    these three characters, the purple heart,
    the white flag and the transgender symbol.
  • 9:30 - 9:35
    It's simply not the domain that I had
    entered into the emoji domain search
  • 9:35 - 9:41
    engine. It wasn't just their webfont that
    doesn't support it. And that is caused by
  • 9:41 - 9:48
    the wonderful zero-width joiners. To avoid
    having tons of similar emoji, each with
  • 9:48 - 9:53
    their own code, many emoji are created by
    combining others. So you have the skintone
  • 9:53 - 9:58
    modifiers for example or the country
    flags, that are a combination of different
  • 9:58 - 10:03
    emoji with a zero-width joiner. The
    transgender pride flag is a combination of
  • 10:03 - 10:09
    a white flag and a transgender symbol with
    a zero-width joiner in between. And the
  • 10:09 - 10:14
    thing is, punycode does not really support
    them so it was simply just dropped during
  • 10:14 - 10:26
    conversion while I bought my domain. But
    that's not everything. Because I still had
  • 10:26 - 10:34
    this project, I still wanted emoji domains
    and my interest was piqued so I wanted to
  • 10:34 - 10:41
    try out what else I could break. To avoid
    spending even more money on this
  • 10:41 - 10:47
    project, I just moved my testing to
    subdomains, which was a good idea because I
  • 10:47 - 10:51
    have way more control over subdomains
    than I have over regular ones. I can
  • 10:51 - 10:57
    register them with any registrar, so I
    could use my go-to registrar. I can
  • 10:57 - 11:03
    register whatever strings I want, so even
    invalid punycode. I can register them
  • 11:03 - 11:08
    under a TLD that does not allow it because
    it's not a second-level domain but a
  • 11:08 - 11:16
    third-level domain. And, yeah, let's see
    what browsers do about that. So I created
  • 11:16 - 11:22
    the subdomain "transgender pride flag .
    dysphoric . dev". Firefox converts it to
  • 11:22 - 11:29
    xn-- and I'm not gonna say all that.
    Chromium converts it to a different
  • 11:29 - 11:36
    string. If you plug either of those
    into a converter, it will tell you
  • 11:36 - 11:41
    that both are invalid punycode. However,
    both are understood and routed, so I just
  • 11:41 - 11:46
    simply added an [unintelligible] all-route
    to my reverse proxy, so that both would
  • 11:46 - 11:53
    work. If you use dig, which is a command-
    line tool that lets you look up domain
  • 11:53 - 11:59
    records - first of all, it doesn't do the
    punycode conversion at all, so I had to
  • 11:59 - 12:05
    use one of the strings that one of my
    browsers gave me, but when I use that
  • 12:05 - 12:13
    string it also gave me this "It's not a
    valid IDNA2008 name. Disable validation
  • 12:13 - 12:18
    using these tool parameters." It also didn't
    tell me that I needed both. So I added the
  • 12:18 - 12:24
    first and then, oh, you still need the
    second. But, whatever. Once both were
  • 12:24 - 12:35
    added, I was able to get correct results
    and my site was reachable. The next thing
  • 12:35 - 12:40
    I thought of was, what if I moved my
    domain to a non-supporting registrar,
  • 12:40 - 12:48
    because as I just talked about, Namecheap
    does not actually allow emoji domains and I
  • 12:48 - 12:53
    was interested to see how their web interface
    would handle it. Sadly, it simply did not
  • 12:53 - 13:00
    handle it at all, because they don't support
    .ws domains. I wasn't really going to
  • 13:00 - 13:08
    contact their support team to try and
    still get it because this was only a
  • 13:08 - 13:11
    simple thing, and they are probably just
    simply not interested in hosting that
  • 13:11 - 13:18
    domain because it breaks their web
    interface if you try to. Or other things
  • 13:18 - 13:22
    about emoji domains break their web
    interface, so I don't really see why their
  • 13:22 - 13:30
    support team would actually be on my side
    here. So, what about email? Because,
  • 13:30 - 13:38
    apparently, email clients really enjoy
    breaking. From my experience at least. Do
  • 13:38 - 13:47
    they break with emoji? When trying to add
    an emoji domain as a sender, my mail
  • 13:47 - 13:51
    server actually broke because validation
    was run after punycode to unicode
  • 13:51 - 13:57
    conversion, which caused an uncaught
    exception, which was surprising. It's
  • 13:57 - 14:02
    already fixed but the patch is not
    released yet so I couldn't yet test it.
  • 14:02 - 14:06
    But there's still the local part which I
    could already control as much as I wanted
  • 14:06 - 14:14
    to and the [unintelligible] so Thunderbird
    simply ignored it and showed the punycode
  • 14:14 - 14:22
    and Apple Mail dropped the zero-width
    joiner and also showed the punycode under
  • 14:22 - 14:30
    the thing where it shows the exact domain.
    So, mixed results, nothing too spectacular,
  • 14:30 - 14:40
    no exceptions or crashing clients or
    anything interesting like that, sadly.
  • 14:40 - 14:47
    What did I learn doing this? Well,
    obviously emoji domains are very buggy.
  • 14:47 - 14:50
    Implementations vary from browser to
    browser so you can have the same input
  • 14:50 - 14:57
    string and get different punycodes out of
    it, so testing in just one browser
  • 14:57 - 15:03
    definitely is not enough, well, it never
    is, but here especially it isn't. And, you
  • 15:03 - 15:07
    may be able to buy a domain that won't
    work as you would think which can cause
  • 15:07 - 15:12
    quite the annoyance. But it's still a lot
    of fun to mess around with this stuff,
  • 15:12 - 15:16
    just not for productive use. I like to end
    my talks by telling people to join a
  • 15:16 - 15:21
    labor union that doesn't have anything to
    do with this but that's what I do for some
  • 15:21 - 15:27
    reason. And I've also got a blog post
    about this where I've written it up and I
  • 15:27 - 15:36
    would publish the slides under the
    wonderful domain "poop emoji nycode . ws".
  • 15:36 - 15:42
    It's just a link to my regular blog for
    now. I'm sorry. I think I went a bit fast
  • 15:42 - 15:52
    but I still thank you for your time and
    I'm open to questions.
  • 15:52 - 16:22
    Herald: [talks, but no sound is audible]
    Herald: I'm online... oops, I'm sorry. I'm
  • 16:22 - 16:29
    awfully sorry, my machine is slow. I muted
    myself about half a minute ago. Thank you
  • 16:29 - 16:35
    for that beautiful talk, Jennifer. I had
    to grin a couple of times, because it was
  • 16:35 - 16:45
    great and it made my day. And actually we
    have a question. The question is in
  • 16:45 - 16:55
    German, I'll say it in English: why is DNSSEC
    so complicated for emoji domains?
  • 16:55 - 17:05
    dysphoricUnicorn: Well, because no one
    actually really likes emoji domains except
  • 17:05 - 17:09
    the people who sell them. At least that
    was my experience looking up things for
  • 17:09 - 17:18
    that. So, they are kind of disallowed in the
    standard, but some top-level domains just
  • 17:18 - 17:24
    ignore the standard and still let you register
    them and it's just something that people
  • 17:24 - 17:30
    who implement things don't want to think
    about at all. I haven't actually tried
  • 17:30 - 17:36
    DNSSEC, but it's just something that is
    easily forgotten because it shouldn't
  • 17:36 - 17:43
    actually exist, which may
    be a bit harsh, but...
  • 17:43 - 17:51
    Herald: Is - you remember the ringtone
    fads when smartphones didn't exist yet -
  • 17:51 - 17:55
    is this just a fad like this ringtone
    thing and it will just disappear within
  • 17:55 - 18:00
    the next couple of years or would you
    think emojis are here to stay? Is this
  • 18:00 - 18:04
    serious?
    dysphoricUnicorn: I think emojis are here
  • 18:04 - 18:11
    to stay but not within domains or... like,
    it was possible since 2001, kind of, but
  • 18:11 - 18:18
    at least since 2011 when the first actual
    emoji domain was registered. But most
  • 18:18 - 18:23
    domains that are, like, popular examples
    already don't resolve anymore or resolve
  • 18:23 - 18:31
    to sites that say "emoji domains". So,
    emoji domains definitely are not much more
  • 18:31 - 18:40
    than a fad or a nice, funny thing to just
    look at for a bit. However, emojis as a
  • 18:40 - 18:43
    whole are such a large part of our
    culture, I don't think they're going to go
  • 18:43 - 18:48
    away any time soon because it's been more
    than ten years and the annoying
  • 18:48 - 18:57
    downloadable ringtones were popular
    for a bit less time, I think.
  • 18:57 - 19:02
    Herald: This is a question that I actually
    wanted to ask myself as well, because I
  • 19:02 - 19:06
    run my own email server as well and...
    which email server software are you talking
  • 19:06 - 19:12
    about? Do you know about
    supporting the others?
  • 19:12 - 19:14
    dysphoricUnicorn: Errm...
    Herald: What do you use as a software on
  • 19:14 - 19:18
    your email server?
    dysphoricUnicorn: My email server is
  • 19:18 - 19:24
    running on Mailu, which is a set of
    Docker containers that are specially made
  • 19:24 - 19:29
    to work together to make setting up an
    email server as painless as possible for
  • 19:29 - 19:37
    free. So I haven't actually tested any
    other servers, however in theory they
  • 19:37 - 19:44
    shouldn't actually have any issues. So, the part
    of Mailu that failed wasn't actually the
  • 19:44 - 19:52
    mail server part. It was simply a parser.
    So, in theory, with another mail server,
  • 19:52 - 19:59
    it should work, if they didn't also mess
    up parsing at some point.
  • 19:59 - 20:03
    Herald: Somebody asked here, is there a
    list of top-level domains that support
  • 20:03 - 20:08
    emojis, and somebody posted an answer:
    Wikipedia, is that correct? Wikipedia has
  • 20:08 - 20:11
    such a list?
    dysphoricUnicorn: It has, but it isn't
  • 20:11 - 20:16
    actually correct, the list that it has on
    the English Wikipedia. It lists at
  • 20:16 - 20:22
    least one domain that no longer supports
    emojis which is actually kind of a big
  • 20:22 - 20:31
    political thing where they removed
    support. So, the Wikipedia list is not
  • 20:31 - 20:39
    complete or contains too much. There are,
    however, registrars that are specialized
  • 20:39 - 20:45
    in emoji domains and those will have
    current lists. So, I had .ws as one of
  • 20:45 - 20:51
    them. It's not the red heart emoji, though
    because that's invalid punycode and so I
  • 20:51 - 20:56
    don't really know what to enter in my URL
    bar to get to them other than searching
  • 20:56 - 21:02
    it on Google, so...
    Herald: laughs Next question, is there a
  • 21:02 - 21:07
    difference between single punycode and
    multiple emoji chained together as a
  • 21:07 - 21:17
    second or third level domain?
    dysphoricUnicorn: It's just different
  • 21:17 - 21:24
    punycode, depending on how many emoji you
    have but theoretically, the implementation
  • 21:24 - 21:33
    for this would just, I think the technical
    term was ASCII-to-Unicode something, which
  • 21:33 - 21:43
    is like, an algorithm to convert it, does
    handle multiple emoji similarly. Or - it
  • 21:43 - 21:50
    should work without any
    issues if one of the two works.
  • 21:50 - 21:55
    Herald: Are there any emoji
    first-level domains?
  • 21:55 - 22:00
    dysphoricUnicorn: No. There are not. There
    are punycode first-level domains, because
  • 22:00 - 22:08
    there are languages that simply do not use
    the same letters as English does, so
  • 22:08 - 22:12
    punycode first-level domains are existent
    but no emoji first-level domains at this
  • 22:12 - 22:18
    point. Maybe there will be, but I kind of
    doubt it because, to the people in charge of
  • 22:18 - 22:27
    this, emoji domains are kind of an eyesore,
    from what I could read, so...
  • 22:27 - 22:30
    Herald: Talking about eyesores: I always
    have the impression, that at least to the
  • 22:30 - 22:36
    old coders, diacritical signs in
    themselves were considered an eyesore.
  • 22:36 - 22:41
    You know, those funny little dots that
    German-speaking people have up there.
  • 22:41 - 22:46
    Don't talk about the Czech and the Poles.
    Now, my name contains such a diacritical
  • 22:46 - 22:51
    sign, my first name is André and I've been
    fighting with all kinds of inputs that say
  • 22:51 - 23:03
    7 Bit ASCII and nothing else. Do
    diacritical signs still break domains?
  • 23:03 - 23:10
    dysphoricUnicorn: They should not, because
    they are actually the reason why IDNs exist. So it
  • 23:10 - 23:14
    was actually proposed by someone who has
    one of those signs in his name and probably
  • 23:14 - 23:22
    just wanted the domain with his name. This
    was the actual reason why we have punycode
  • 23:22 - 23:28
    in the first place and supporting emoji
    was kind of an unwanted side effect. So in
  • 23:28 - 23:36
    theory, it should work without issues but
    still many people don't think about it
  • 23:36 - 23:41
    enough when implementing their own thing,
    so you can never be too certain that it
  • 23:41 - 23:48
    will. But it should.
    Herald: seventy posted here, seventy
  • 23:48 - 23:53
    obviously runs Windows, and in Windows
    10, there is the emoji menu with the combination of
  • 23:53 - 24:00
    the Windows key and the full stop. Is that
    common already or is that new? I think
  • 24:00 - 24:06
    it's common by now, it's been implemented
    and ever since then everybody's been using
  • 24:06 - 24:14
    emojis. And there is also a remark that
    says "MS Outlook has actually pretty good
  • 24:14 - 24:27
    unicode-punycode support but still don't
    try emojis". I remember a story about when
  • 24:27 - 24:32
    the Bosnian wars broke, when the Yugoslav
    war broke, especially the ones in Bosnia
  • 24:32 - 24:37
    broke out, there were about a hundred
    thousand Bosnians that fled to
  • 24:37 - 24:42
    Switzerland, and about fifteen thousand
    were granted citizenship, but they
  • 24:42 - 24:46
    couldn't be registered in the citizenship
    register, because that only supported
  • 24:46 - 24:53
    7-bit or 8-bit ASCII but no diacritical
    sign of [unintelligible]. I think they
  • 24:53 - 25:00
    fixed it by now but that was quite a thing
    some years back. I see no further
  • 25:00 - 25:07
    question, - oh, there is one ... coughs
    ... one... coughs excuse me that came in
  • 25:07 - 25:14
    right now... coughs... is there a
    uniform way to generate punycode over
  • 25:14 - 25:19
    multiple platforms? Mobiles do not work
    well with entering unicode numbers as we
  • 25:19 - 25:27
    all know.
    dysphoricUnicorn: I'm not sure I
  • 25:27 - 25:33
    understood this correctly. The easiest
    way that I used during my testing was a
  • 25:33 - 25:40
    simple online converter that would work
    on every page. And actually my system
  • 25:40 - 25:45
    doesn't have a shortcut for emoji so I
    would always copy and paste from
  • 25:45 - 25:53
    Emojipedia into an online punycode converter
    and just use it from there. Because I
  • 25:53 - 26:01
    don't actually use emoji that much.
    Herald: Okay, we've come to the end of our
  • 26:01 - 26:07
    time. We still would have another minute
    or two, but we have no more questions.
  • 26:07 - 26:11
    Thank you in the meantime for coming and
    holding this talk. You have another talk.
  • 26:11 - 26:13
    I think it's tomorrow?
  • 26:13 - 26:17
    Outro
  • 26:17 - 26:25
    Subtitles created by c3subtitles.de
    in the year 2021. Join, and help us!
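
A minimal sketch, not part of the talk, of the punycode conversion and the zero-width-joiner problem the transcript describes, using Python's built-in `punycode` codec. The helper names `to_ace_label`, `from_ace_label` and `drop_joiners` are made up for this illustration. The codec itself will encode any code point, so the loss of the joiner is modelled here by an explicit stripping step; that step is an assumption about what happened during registration, not a reproduction of any registrar's or browser's actual code.

```python
# Code points that make up the transgender pride flag emoji sequence.
WHITE_FLAG = "\U0001F3F3"
VARIATION_SELECTOR_16 = "\uFE0F"
ZERO_WIDTH_JOINER = "\u200D"
TRANSGENDER_SYMBOL = "\u26A7"

TRANS_FLAG = (
    WHITE_FLAG
    + VARIATION_SELECTOR_16
    + ZERO_WIDTH_JOINER
    + TRANSGENDER_SYMBOL
    + VARIATION_SELECTOR_16
)


def to_ace_label(label: str) -> str:
    """Encode a single domain label as punycode with the xn-- prefix.

    Pure-ASCII labels are returned unchanged. No IDNA validation or
    mapping is performed here; this is only the raw encoding step.
    """
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def from_ace_label(ace: str) -> str:
    """Decode an xn-- label back to its Unicode form."""
    if ace.startswith("xn--"):
        return ace[4:].encode("ascii").decode("punycode")
    return ace


def drop_joiners(label: str) -> str:
    """Hypothetical helper: strip ZWJ and variation selector 16.

    Assumption: this imitates a conversion step that silently removes
    these code points before encoding, as described in the talk.
    """
    return label.replace(ZERO_WIDTH_JOINER, "").replace(VARIATION_SELECTOR_16, "")


if __name__ == "__main__":
    # The two labels encode to different ASCII strings, so they are
    # different names as far as DNS is concerned.
    print(to_ace_label(TRANS_FLAG))
    print(to_ace_label(drop_joiners(TRANS_FLAG)))
```

Comparing the two printed labels shows why the purchased domain was not the one that had been typed: once the joiner is gone, what remains is a white flag followed by a transgender symbol, which encodes to a different `xn--` string.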
Video Language:
English
Duration:
26:25
