
#rC3 - Accessible input for readers, coders, and hackers

  • 0:00 - 0:12
    rc3 preroll music
  • 0:12 - 0:18
    Herald: All right, so again, let's
    introduce the next talk, accessible inputs
  • 0:18 - 0:25
    for readers, coders and hackers, the talk
    by David Williams-King about custom,
  • 0:25 - 0:30
    well, not off-the-shelf, but custom
    accessibility solutions. He will give you
  • 0:30 - 0:35
    some demonstrations and that includes his
    own custom-made voice input and an eye-blink
  • 0:35 - 0:38
    system. Here is David Williams-King
  • 0:40 - 0:46
    David: Thank you for the introduction.
    Let's go ahead and get started. So, yeah,
  • 0:46 - 0:51
    I'm talking about accessibility,
    particularly accessible input for readers,
  • 0:51 - 0:58
    coders and hackers. So what do I mean by
    accessibility? I mean people that have
  • 0:58 - 1:03
    physical or motor impairments. This could
    be due to repetitive strain injury, carpal
  • 1:03 - 1:08
    tunnel, all kinds of medical conditions.
    If you have this type of thing, you
  • 1:08 - 1:12
    probably can't use a normal computer
    keyboard, computer mouse or even a phone
  • 1:12 - 1:19
    touch screen. However, technology does
    allow users to interact with these devices
  • 1:19 - 1:24
    just using different forms of input. And
    it's really valuable to these people
  • 1:24 - 1:29
    because, you know, being able to interact
    with the device provides some agency they
  • 1:29 - 1:33
    can do things on their own, and it
    provides a means of communication with the
  • 1:33 - 1:38
    outside world. So it's an important
    problem to look at. And it's what I care
  • 1:38 - 1:45
    about a lot. Let's talk a bit about me for
    a moment. I'm a systems security person. I
  • 1:45 - 1:50
    did a PhD in cybersecurity at Columbia. If
    you're interested in low level software
  • 1:50 - 1:55
    defenses, you can look that up. And I'm
    currently the CTO at a startup called
  • 1:55 - 2:03
    Elpha Secure. I started developing medical
    issues around 2014. And as a result of
  • 2:03 - 2:08
    that, in an ongoing fashion, I can only
    type a few thousand keystrokes per day.
  • 2:08 - 2:12
    Roughly fifteen thousand is my maximum.
    That sounds like a lot, but imagine you're
  • 2:12 - 2:17
    typing at a hundred words per minute.
    That's five hundred characters per minute,
  • 2:17 - 2:23
    which means it takes you 30 minutes to hit
    fifteen thousand characters.
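
As a quick check, the same back-of-the-envelope arithmetic in a few lines of Python (100 words per minute and the standard 5-characters-per-word convention, as used above):

```python
# Back-of-the-envelope keystroke budget, as described in the talk.
DAILY_BUDGET = 15_000   # maximum keystrokes per day
WPM = 100               # fast typist, in words per minute
CHARS_PER_WORD = 5      # standard typing convention

chars_per_minute = WPM * CHARS_PER_WORD    # 500
minutes = DAILY_BUDGET / chars_per_minute  # 30.0
print(f"Budget exhausted after {minutes:.0f} minutes of fast typing")
```
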
  • 2:23 - 2:30
    So essentially I can work like the
    equivalent of a fast programmer for
  • 2:30 - 2:34
    half an hour. And then after that I would
    be unable to use my hands for anything,
  • 2:34 - 2:38
    including preparing food for myself
    or opening and closing doors and so on. So I
  • 2:38 - 2:42
    have to be very careful about my hand use
    and actually have a little program that
  • 2:42 - 2:47
    you can see on the slide there that
    measures the keystrokes for me so I can
  • 2:47 - 2:52
    tell when I'm going over.
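
The talk doesn't show that program's source, but a minimal daily keystroke counter is easy to sketch. This is a hedged illustration, not the actual tool: it assumes the pynput library (pip install pynput) and the 15,000-keystroke budget mentioned above:

```python
from datetime import date
from pynput import keyboard  # pip install pynput

DAILY_BUDGET = 15_000  # keystrokes per day, per the talk

count = 0
today = date.today()

def on_press(key):
    # Count every keystroke and warn as the daily budget is used up.
    global count, today
    if date.today() != today:          # new day: reset the counter
        count, today = 0, date.today()
    count += 1
    if count == DAILY_BUDGET:
        print("Daily keystroke budget reached!")
    elif count % 1000 == 0:
        print(f"{count}/{DAILY_BUDGET} keystrokes used today")

with keyboard.Listener(on_press=on_press) as listener:
    listener.join()  # runs until interrupted
```
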
    So what do I do? Well, I do a lot of pair programming,
  • 2:52 - 2:57
    for sure. I log into the same machine as
    other people and we work together. I'm
  • 2:57 - 3:00
    also a very heavy user of speech
    recognition, and I gave a talk
  • 3:00 - 3:07
    about voice coding with speech recognition
    at the HOPE 11 conference. So you can go
  • 3:07 - 3:15
    check that out if you're interested. So
    when I talk about accessible input, I mean
  • 3:15 - 3:19
    different ways that a human can provide
    input to a computer. So ergonomic
  • 3:19 - 3:23
    keyboards are a simple one. Speech
    recognition, eye tracking or gaze tracking
  • 3:23 - 3:27
    so the computer can see where you're looking
    or where you're pointing your head and
  • 3:27 - 3:32
    maybe use that to replace a mouse, that's
    head gestures, I suppose. And there's
  • 3:32 - 3:39
    always this distinction between bespoke,
    like custom input mechanisms and somewhat
  • 3:39 - 3:44
    mainstream ones. So I'll give you some
    examples. You've probably heard of Stephen
  • 3:44 - 3:50
    Hawking. He's a very famous professor, and
    he was actually a bit of an extreme case.
  • 3:50 - 3:56
    He was diagnosed with ALS when he
    was 21. So his physical
  • 3:56 - 4:01
    abilities degraded over the years
    because he lived for many decades after
  • 4:01 - 4:05
    that and he went through many
    communication mechanisms. Initially his
  • 4:05 - 4:08
    speech changed so that it was only
    intelligible to his family and close
  • 4:08 - 4:14
    friends, but he was still able to speak.
    And then after that he would work with a
  • 4:14 - 4:19
    human interpreter and raise his eyebrows
    to pick various letters. And keep
  • 4:19 - 4:25
    in mind, this is like the 60s or 70s,
    right? So computers were not really where
  • 4:25 - 4:30
    they are today. Later he would operate a
    switch with one hand, just like on-off, on-
  • 4:30 - 4:35
    off, kind of Morse code, and select from a
    bank of words. And that was around 15
  • 4:35 - 4:41
    words per minute. Eventually, he was
    unable to move his hand, so a team of
  • 4:41 - 4:44
    engineers from Intel worked with him and
    they were trying to do
  • 4:44 - 4:48
    like brain scans and all kinds of stuff.
    But again, this was like in the eighties,
  • 4:48 - 4:55
    so there was not too much they could
    do. So they basically just created some
  • 4:55 - 4:59
    custom software to detect muscle movements
    in his cheek. And he used that with
  • 4:59 - 5:04
    predictive words, the same way
    that a smartphone keyboard will
  • 5:04 - 5:07
    predict which word you want to say next.
    Stephen Hawking used something similar to
  • 5:07 - 5:13
    that, except instead of swiping on a
    phone, he was moving his cheek muscles, so
  • 5:13 - 5:18
    that's obviously a sequence of highly
    customized input mechanisms,
  • 5:18 - 5:24
    very, very specialized for that
    person. I also want to talk about someone
  • 5:24 - 5:30
    else named Professor Sang-Mook Lee, whom
    I've met. That was me when I had more of a
  • 5:30 - 5:36
    beard than I do now. He's a professor
    at Seoul National University in South
  • 5:36 - 5:43
    Korea. And he's sometimes called the
    Korean Stephen Hawking, because he's a big
  • 5:43 - 5:48
    advocate for people with disabilities.
    Anyway, you can
  • 5:48 - 5:52
    see a little orange device near his mouth
    there. It's called a sip-and-puff mouse,
  • 5:52 - 5:57
    so he can blow into it and suck air
    through it and also move it around. And
  • 5:57 - 6:02
    that acts as a mouse cursor on the Android
    device in front of him. It will move the
  • 6:02 - 6:08
    cursor around and click when he
    blows air and so on. So that combined
  • 6:08 - 6:14
    with speech recognition, lets him use
    mainstream Android hardware. He still has
  • 6:14 - 6:21
    access to, you know, email apps,
    web browsers, maps and everything
  • 6:21 - 6:26
    that comes on a normal Android device. So
    he's way more capable than Stephen
  • 6:26 - 6:30
    Hawking was; Stephen Hawking
    could communicate, but just to one person at
  • 6:30 - 6:36
    a very slow rate. Right. Part of it's due
    to the nature of his injury. But it's also
  • 6:36 - 6:44
    a testament to how far the technology has
    improved. So let's talk a little bit about
  • 6:44 - 6:49
    what makes good accessibility. I think
    performance is very important, right? You
  • 6:49 - 6:54
    want high accuracy; you don't want typos.
    And low latency: I don't want to speak and
  • 6:54 - 6:58
    then five seconds later have words appear.
    It's too long, especially if I have to
  • 6:58 - 7:03
    make corrections. Right. And you want high
    throughput, which we already talked about.
  • 7:03 - 7:06
    Oh, I forgot to mention Stephen Hawking
    had like 15 words per minute. A normal
  • 7:06 - 7:12
    person speaks at about 150. So that's
    a big difference. (laughs) The higher
  • 7:12 - 7:16
    throughput you can get, the better. And
    for input accessibility, I think, and this
  • 7:16 - 7:21
    is not scientific, this is just what I've
    learned from my own use and from observing
  • 7:21 - 7:25
    many of these systems. I think it's
    important to get completeness, consistency
  • 7:25 - 7:31
    and customization. By completeness I
    mean, can I do any action? So Stephen or
  • 7:31 - 7:41
    Professor Sang-Mook Lee, his, his orange
    mouth input device, the sip and puff is
  • 7:41 - 7:44
    quite powerful, but it doesn't let him do
    every action. For example, for some reason
  • 7:44 - 7:48
    when he gets an incoming call, the
    input doesn't work. So he has to call over
  • 7:48 - 7:52
    a person physically to tap the accept call
    button or the reject call button, which is
  • 7:52 - 7:56
    really annoying. Right. If you don't have
    completeness, you can't be fully
  • 7:56 - 8:02
    independent. Consistency is very important
    as well. The same way we develop
  • 8:02 - 8:08
    muscle memory for a keyboard,
    you develop memory for the types of
  • 8:08 - 8:12
    patterns that you do. But if the thing you
    say or the thing you do keeps changing in
  • 8:12 - 8:18
    order to do the same action, that's not
    good. And finally, customization. So the
  • 8:18 - 8:23
    learning curve for beginners is important
    for any accessibility device, but
  • 8:23 - 8:27
    designing for expert use is almost more
    important because anyone who uses an
  • 8:27 - 8:31
    accessibility interface becomes an expert
    at it. The example I like to give is
  • 8:31 - 8:35
    screen readers: a blind person using a
    screen reader on a phone will crank
  • 8:35 - 8:42
    up the speed at which the speech is being
    produced. And I actually met someone who
  • 8:42 - 8:46
    made his speech 16 times faster than
    normal human speech. I could not
  • 8:46 - 8:51
    understand it at all; it sounded like brbrbrbr to me, but
    he could understand it perfectly. And that's just
  • 8:51 - 8:56
    because he used it so much that he's
    become an expert at its use. Let's analyze
  • 8:56 - 9:01
    ergonomic keyboards just for a moment,
    because it's fun. You know, they are kind
  • 9:01 - 9:04
    of like a normal keyboard.
    you'll have a slow pace when you're
  • 9:04 - 9:08
    starting to learn them. But once you're
    good at it, you have very good accuracy,
  • 9:08 - 9:12
    and instantaneous low latency: you
    press the key, the computer receives it
  • 9:12 - 9:18
    immediately. And very high throughput,
    as high as you have on a regular keyboard.
  • 9:18 - 9:20
    So they're actually fantastic
    accessibility devices, right. They're
  • 9:20 - 9:24
    completely compatible with regular
    keyboards. And if all you need is an
  • 9:24 - 9:29
    ergonomic keyboard, then you're in luck
    because it's a very good accessibility
  • 9:29 - 9:34
    device. I'm going to talk about two
    things: computers, but also Android
  • 9:34 - 9:40
    devices, so let's start with Android
    devices. Yes, the built-in voice
  • 9:40 - 9:43
    recognition in Android is really
    incredible. So even though the microphones
  • 9:43 - 9:47
    on the devices aren't great, Google has
    just collected so much data from so many
  • 9:47 - 9:52
    different sources that they've built
    better-than-human accuracy for their
  • 9:52 - 9:57
    voice recognition. The voice accessibility
    interface is kind of so-so; we'll talk
  • 9:57 - 10:00
    about that in a bit. That's the interface
    where you can control the Android device
  • 10:00 - 10:04
    entirely by voice. For other input
    mechanisms, you could use a sip-and-
  • 10:04 - 10:09
    puff device or you could use physical
    styluses. That's something that I do a
  • 10:09 - 10:13
    lot, actually, because for me, my fingers
    get sore. And if I can hold a stylus in my
  • 10:13 - 10:19
    hand and kind of not use my fingers, then
    that's very effective. And the Elecom
  • 10:19 - 10:24
    styluses from a Japanese company are the
    lightest I've found and they don't require
  • 10:24 - 10:30
    a lot of force. The ones at the top
    there are about 12 grams and the
  • 10:30 - 10:34
    one on the bottom is 4.7 grams. And you
    need almost no force to use them. So, very
  • 10:34 - 10:38
    nice. On the left there you can see the
    Android speech recognition is built into
  • 10:38 - 10:42
    the keyboard now. Right. You can just
    press that and start speaking. It
  • 10:42 - 10:46
    supports different languages, and it's
    very accurate, it's very nice. And
  • 10:46 - 10:51
    actually, when I was working at Google for
    a bit, I talked to the speech recognition
  • 10:51 - 10:54
    team and asked: why are you doing on-
    server speech recognition? You should do
  • 10:54 - 10:58
    it on the devices. But of course, Android
    devices are all very different
  • 10:58 - 11:03
    and many of them are not very powerful. So
    they were having trouble getting
  • 11:03 - 11:06
    satisfactory speech recognition on the
    device. So for a long time, there was some
  • 11:06 - 11:11
    server latency: you do
    speech recognition and you wait a bit. And
  • 11:11 - 11:14
    then sometime this year, I was just using
    speech recognition and it became so much
  • 11:14 - 11:18
    faster. I was extremely excited and I
    looked into it and yeah, they just
  • 11:18 - 11:22
    switched it on, at least on my device: the
    on-device speech recognition
  • 11:22 - 11:26
    model. And so now it's incredibly fast and
    also incredibly accurate. I'm a huge fan
  • 11:26 - 11:31
    of it. On the right-hand side, we can
    actually see the Voice Access interface.
  • 11:31 - 11:35
    So this is meant to allow you to use a
    phone entirely by voice. Again, while I
  • 11:35 - 11:38
    was at Google, I tried the beta
    version before it was publicly released
  • 11:38 - 11:44
    and I was like, this is pretty bad, mostly
    because it lacked completeness.
  • 11:44 - 11:47
    There would be things on the screen that
    could not be selected. So here we see show
  • 11:47 - 11:53
    labels. And then I can say four,
    five, six, whatever, to tap on that
  • 11:53 - 11:57
    thing. But as you can see at the bottom,
    there was a Twitter web app link and
  • 11:57 - 12:00
    there's no number on it. So if I want to
    click on that, I'm out of luck. And this
  • 12:00 - 12:06
    is actually a problem in the design of the
    accessibility interface: it
  • 12:06 - 12:12
    doesn't expose the full DOM. It exposes
    only a subset of it. And so an
  • 12:12 - 12:19
    accessibility mechanism can't ever see
    those other things. And furthermore, the
  • 12:19 - 12:22
    way the Google speech recognition works,
    they have to reestablish a new connection
  • 12:22 - 12:26
    every 30 seconds. And if you're in the
    middle of speaking, it will just throw
  • 12:26 - 12:30
    away whatever you were saying because it
    just decided it had to reconnect, which is
  • 12:30 - 12:35
    really unfortunate. They later released
    that publicly and then sometime this year
  • 12:35 - 12:40
    they did the update, which is pretty nice.
    It now has a mouse grid,
  • 12:40 - 12:44
    which solves a lot of the completeness
    problems. You can use a grid
  • 12:44 - 12:50
    to narrow down somewhere on the screen and
    then tap there. But the server issues and
  • 12:50 - 12:55
    the expert use are still not good. If
    I want to do
  • 12:55 - 13:00
    something with the mouse grid, I have to
    say "mouse grid on. 6. 5. mouse grid off".
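
To illustrate what grid narrowing computes, here is a hedged Python sketch. It assumes a 3x3 grid with cells numbered 1 through 9 in row-major order, each spoken digit narrowing the previously chosen cell; the real Voice Access layout may differ:

```python
# Hypothetical grid pointing: each digit picks one cell of a 3x3 grid
# (numbered 1..9, row-major) nested inside the previously chosen cell.
def grid_point(width, height, digits):
    x, y, w, h = 0.0, 0.0, float(width), float(height)
    for d in digits:                  # e.g. [6, 5] for "6. 5."
        row, col = divmod(d - 1, 3)   # digit 1..9 -> (row, col)
        w, h = w / 3, h / 3           # shrink to the chosen cell
        x, y = x + col * w, y + row * h
    return x + w / 2, y + h / 2       # tap the cell's center

# "mouse grid on. 6. 5. mouse grid off" on a 1920x1080 screen:
print(grid_point(1920, 1080, [6, 5]))  # about (1600.0, 540.0)
```
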
  • 13:00 - 13:03
    And I can't combine those together. So
    there's a lot of latency and it's not
  • 13:03 - 13:10
    really that fun to use, but better than
    nothing? Absolutely! I just want to really
  • 13:10 - 13:13
    briefly show you as well that this same
    feature of being able to select links
  • 13:13 - 13:17
    on a screen is available on desktops. This
    is a plug-in for Chrome called Vimium. And
  • 13:17 - 13:23
    it's very powerful because you can then
    combine this with keyboards or other input
  • 13:23 - 13:27
    mechanisms. And this one is complete. It
    uses the entire DOM and anything you can
  • 13:27 - 13:31
    click on will be highlighted. So very
    nice. I just want to give a quick example
  • 13:31 - 13:35
    of me using some of these systems. So I've
    been trying to learn Japanese and there's
  • 13:35 - 13:39
    a couple of highly regarded websites for
    this, but they're not consistent. When I
  • 13:39 - 13:44
    use the browser's show labels, you
    know, the thing to press next page or
  • 13:44 - 13:48
    something like that, or "I
    give up", or whatever it is, it keeps
  • 13:48 - 13:52
    changing. So the letters that are being
    used keep changing. And that's because of
  • 13:52 - 13:56
    the dynamic way that they're generating
    the HTML. So not really very useful. What
  • 13:56 - 14:01
    I do instead is I use a program called
    Anki and that has very simple shortcuts in
  • 14:01 - 14:06
    its desktop app. One, two, three, four. So
    it's nice to use and consistent, and it
  • 14:06 - 14:12
    syncs with an Android app and then I can
    use my stylus on the Android device. So it
  • 14:12 - 14:16
    works pretty well. But even so, as you can
    see from the chart in the bottom there,
  • 14:16 - 14:20
    there are many days when I can't use this,
    even though I would like to, because I've
  • 14:20 - 14:26
    overused my hands or overused my voice.
    When I'm using voice recognition all day,
  • 14:26 - 14:29
    every day, I do tend to lose my voice. And
    as you can see from the graph, sometimes I
  • 14:29 - 14:34
    lose it for a week or two at a time. So
    same thing with any accessibility
  • 14:34 - 14:38
    interface, you know, you've got to use
    many different techniques, and it's
  • 14:38 - 14:44
    never perfect, just the best you
    can do at that moment. Something else I
  • 14:44 - 14:50
    like to do is read books. I read a lot of
    books and I love e-book readers, the
  • 14:50 - 14:54
    dedicated e-ink displays. You can read them
    in sunlight and they last forever, battery-
  • 14:54 - 14:59
    wise. Unfortunately, it's hard to add other
    input mechanisms to them. They don't have
  • 14:59 - 15:04
    microphones or other sensors and you can't
    really install custom software on them.
  • 15:04 - 15:07
    But Android-based devices, and there
    are also e-book reading apps for
  • 15:07 - 15:10
    Android devices, have everything: you
    can install custom software, and they have
  • 15:10 - 15:16
    microphones and many other sensors. So I
    made two apps that allow you to read
  • 15:16 - 15:21
    e-books with an e-book reader. The first
    one is Voice Next Page. It's based on one
  • 15:21 - 15:26
    of my speech recognition engines called
    Silvius, and it does do server-based
  • 15:26 - 15:29
    recognition. So you have to capture all
    the audio, use 300 kilobits a second to
  • 15:29 - 15:36
    send it to the server, which recognizes
    commands like next page and previous page.
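
The app's source isn't shown in the talk, but the client side of a server-based recognizer like this can be sketched in Python. Everything here is a hedged stand-in, not the real Silvius protocol: the endpoint URL and the plain-text replies are hypothetical, and it assumes the sounddevice and websockets packages. Note that 16 kHz, 16-bit mono audio is roughly the 300 kilobits per second mentioned above:

```python
import asyncio
import sounddevice as sd   # pip install sounddevice
import websockets          # pip install websockets

SERVER = "ws://example.invalid/recognize"  # hypothetical endpoint
RATE, CHUNK = 16_000, 1_600                # 16 kHz mono, 100 ms chunks

async def stream_and_listen():
    loop = asyncio.get_running_loop()
    async with websockets.connect(SERVER) as ws:
        def on_audio(indata, frames, time, status):
            # Forward raw microphone audio to the recognition server.
            asyncio.run_coroutine_threadsafe(ws.send(bytes(indata)), loop)
        with sd.RawInputStream(samplerate=RATE, blocksize=CHUNK,
                               channels=1, dtype="int16",
                               callback=on_audio):
            async for text in ws:  # server replies with recognized commands
                if text == "next page":
                    print("-> turn page forward")
                elif text == "previous page":
                    print("-> turn page backward")

asyncio.run(stream_and_listen())
```
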
  • 15:36 - 15:40
    However, it doesn't cut out every 30 seconds;
    it keeps going. So that's one win for it, I
  • 15:40 - 15:46
    guess. And it is published in the Play
    Store. Huge thanks to Sarah Leventhal, who
  • 15:46 - 15:50
    did a lot of the implementation. It's very
    complicated to make an accessibility app
  • 15:50 - 15:56
    on Android. But we persevered and it works
    quite nicely. So I'm going to actually
  • 15:56 - 16:03
    show you an example of voice next page.
    This over here is my phone on the left
  • 16:03 - 16:09
    hand side just captured so that you guys
    can see it. So here's the Voice Next Page.
  • 16:09 - 16:14
    And basically the connection indicator is
    green, the server is up and running and
  • 16:14 - 16:20
    so on. I just press start and then I'll
    switch to an Android reading app and say,
  • 16:20 - 16:23
    next page, previous page. I won't speak
    otherwise because it will pick up
  • 16:23 - 16:26
    everything I'm saying.
  • 16:33 - 16:35
    Next Page
  • 16:36 - 16:38
    Next Page
  • 16:38 - 16:40
    Previous Page
  • 16:42 - 16:43
    Center
  • 16:44 - 16:45
    Center
  • 16:47 - 16:48
    Foreground
  • 16:49 - 16:51
    Stop listening
  • 16:55 - 16:59
    So that's a demo of
    The Voice Next Page, and it's
  • 16:59 - 17:03
    extremely helpful. I built it a couple of
    years ago along with Sarah, and I use it a
  • 17:03 - 17:08
    lot. So, yeah, you can go ahead and
    download it if you guys wanna try it out.
  • 17:08 - 17:13
    And the other one is called Blink Next
    Page. I got the idea for this from a
  • 17:13 - 17:18
    research paper this year that
    was studying eyelid gestures. I didn't use
  • 17:18 - 17:24
    any of their code, but it's a great idea.
    So the way this works is you detect blinks
  • 17:24 - 17:29
    by using the Android camera and then you
    can trigger an action like turning pages
  • 17:29 - 17:34
    in an e-book reader. This actually doesn't
    need any networking. It's able to use the
  • 17:34 - 17:39
    on-device face recognition models from
    Google, and it is still under development.
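
Blink Next Page itself isn't shown here, but the core mechanism, a deliberate half-second blink triggering a page turn, can be approximated on a desktop. A hedged sketch, not the app's actual code: it assumes OpenCV and MediaPipe's face mesh (the Android app uses Google's on-device face models instead), and the landmark indices and threshold are rough, commonly used values that will need tuning:

```python
import cv2              # pip install opencv-python
import mediapipe as mp  # pip install mediapipe
import pyautogui        # pip install pyautogui

# Commonly used MediaPipe face-mesh indices for upper/lower eyelids.
LEFT_EYE, RIGHT_EYE = (159, 145), (386, 374)
CLOSED_THRESHOLD = 0.012  # eyelid gap in normalized units; tune per setup
HOLD_FRAMES = 15          # ~0.5 s at 30 fps, matching the talk

def eye_gap(lm, top, bottom):
    return abs(lm[top].y - lm[bottom].y)

face_mesh = mp.solutions.face_mesh.FaceMesh(max_num_faces=1)
cap = cv2.VideoCapture(0)
closed_for = 0
while True:  # runs until interrupted
    ok, frame = cap.read()
    if not ok:
        break
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        lm = results.multi_face_landmarks[0].landmark
        both_closed = (eye_gap(lm, *LEFT_EYE) < CLOSED_THRESHOLD and
                       eye_gap(lm, *RIGHT_EYE) < CLOSED_THRESHOLD)
        closed_for = closed_for + 1 if both_closed else 0
        if closed_for == HOLD_FRAMES:  # deliberate half-second blink
            pyautogui.press("right")   # turn the page in the reader
```
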
  • 17:39 - 17:45
    So it's not on the Play Store yet, but it
    is working. And, you know, please contact
  • 17:45 - 17:54
    me if you want to try it. So just give me
    one moment to set that demo up here. So
  • 17:54 - 18:01
    I'm going to use... The main problem with
    this current implementation is that it
  • 18:01 - 18:07
    uses two devices. So that was easier to
    implement. And I use two devices anyway.
  • 18:07 - 18:14
    But obviously I want a one-device version
    if I'm actually going to use it for
  • 18:14 - 18:18
    anything. So here's how this works. This
    device I point at my eyes, and the
  • 18:18 - 18:24
    other device I put wherever it's
    convenient to read, oops, sorry, and if I blink
  • 18:24 - 18:29
    my eyes, the phone will buzz once it
    detects that I blink my eyes and it will
  • 18:29 - 18:35
    turn the page automatically on the other
    Android device. Now I have to blink both
  • 18:35 - 18:42
    my eyes for half a second. If I want to go
    backwards, I can blink just my left eye.
  • 18:42 - 18:50
    And if I want to go forwards quickly,
    I can blink my right eye and hold it. (background buzzing)
  • 18:50 - 18:55
    Anyway, it does have some false positives.
    That's why you can go backwards in
  • 18:55 - 19:00
    case it detects that you've accidentally
    flipped the page. And lighting is also
  • 19:00 - 19:04
    very important. Like if I have a light
    behind me, then this is not going to be
  • 19:04 - 19:08
    able to identify whether my eyes are open
    or closed properly. So it has some
  • 19:08 - 19:19
    limitations, but very simple to use. So
    I'm a big fan. OK, so that's enough about
  • 19:19 - 19:24
    Android devices, let's talk very briefly
    about desktop computers. So if you're
  • 19:24 - 19:27
    going to use a desktop computer, of
    course, try using that show labels plugin
  • 19:27 - 19:33
    in a browser. For native apps you can try
    Dragon NaturallySpeaking, which is fine if
  • 19:33 - 19:37
    you're just doing basic things. But
    if you're trying to do complicated things,
  • 19:37 - 19:41
    you should definitely use a voice coding
    system. You could also consider using eye
  • 19:41 - 19:46
    tracking to replace a mouse. Personally,
    I don't use that; I find it hurts my eyes,
  • 19:46 - 19:50
    but I do use a trackball with very little
    force and a Wacom tablet. Some people will
  • 19:50 - 19:56
    even scroll up and down by humming, for
    example, but I don't have that setup.
  • 19:56 - 20:01
    There's a bunch of nice talks out there on
    voice coding. The top left is Tavis Rudd's
  • 20:01 - 20:06
    talk from many years ago that got many of
    us interested. Emily Shea gave a talk
  • 20:06 - 20:11
    there about best practices for voice
    coding. And then I gave a talk a couple of
  • 20:11 - 20:16
    years ago at the HOPE 11 conference, which
    you can also check out. It's mostly out of
  • 20:16 - 20:22
    date by now, but it's still interesting.
    So there are a lot of voice coding
  • 20:22 - 20:28
    systems. The sort of grandfather of them
    all is Dragonfly. It's become a grammar
  • 20:28 - 20:35
    standard. With Caster, if you're willing to
    memorize lots of unusual words, you can
  • 20:35 - 20:41
    become much better, much faster than I
    currently am at voice coding. aenea is how
  • 20:41 - 20:46
    you originally used Dragon to work on a
    Linux machine, for example, because Dragon
  • 20:46 - 20:53
    only runs on Windows. Talon is a closed-
    source program, but it's very
  • 20:53 - 20:57
    powerful and has a big user base, especially
    on macOS; there are ports now. And Talon
  • 20:57 - 21:05
    used to use Dragon, but it's now using a
    speech system from Facebook. Silvius is
  • 21:05 - 21:10
    the system that I created. The models are
    not very accurate, but it's a nice
  • 21:10 - 21:13
    architecture with a client-server split,
    so it makes it easy to build things like
  • 21:13 - 21:18
    Voice Next Page. So Voice Next Page
    was using Silvius. And then the most
  • 21:18 - 21:22
    recent one I think on this list is kaldi-
    active-grammar, which is extremely
  • 21:22 - 21:26
    powerful and extremely customizable. And
    it's also open source. It works on all
  • 21:26 - 21:30
    platforms. So I really highly recommend
    that. So let's talk a bit more about
  • 21:30 - 21:35
    kaldi-active-grammar. But first, for voice
    coding, I've already mentioned, you have
  • 21:35 - 21:39
    to be careful how you use your voice,
    right? Breathe from your belly. Don't
  • 21:39 - 21:42
    tighten your muscles and breathe from your
    chest. Try to speak normally. And I'm not
  • 21:42 - 21:45
    particularly good at this. Like you'll
    hear me when I'm speaking commands that my
  • 21:45 - 21:51
    inflection changes. So I do tend to
    overuse my voice, but you just have to be
  • 21:51 - 21:54
    conscious of that. The microphone hardware
    does matter. I do recommend a Blue
  • 21:54 - 22:00
    Yeti on a microphone arm that you can pull
    and put close to your face like this. I
  • 22:00 - 22:04
    will use this one for my speaking demo
    and, yeah. The other thing is your
  • 22:04 - 22:08
    grammar is fully customizable. So if you
    keep saying a word and the system doesn't
  • 22:08 - 22:14
    recognize it, just change it to another
    word. And it's complete in the sense you
  • 22:14 - 22:18
    can type any key on the keyboard. And the
    most important thing for expert use or
  • 22:18 - 22:22
    customizability is that you can do
    chaining. So with the voice coding system,
  • 22:22 - 22:27
    you can say multiple commands at once,
    and it's a huge time saving;
  • 22:27 - 22:32
    you'll see what I mean when I give a quick
    demo.
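
To make "grammar" and "chaining" concrete, here is a minimal sketch of a chained command grammar in the Dragonfly style discussed above. The spoken phrases and key bindings are made-up examples, not anyone's actual grammar, and it assumes the dragonfly2 package with a working speech engine behind it:

```python
from dragonfly import (Grammar, MappingRule, CompoundRule,
                       Key, Text, Repetition, RuleRef)

class CommandRule(MappingRule):
    exported = False
    # Spoken phrase -> keystroke action. Illustrative bindings only.
    mapping = {
        "save file":         Key("c-s"),
        "open new terminal": Key("c-a, c"),  # e.g. a tmux binding
        "word include":      Text("include"),
        "langel":            Text("<"),
        "rangel":            Text(">"),
    }

class ChainRule(CompoundRule):
    # Chaining: up to eight commands in one utterance, so
    # "word include langel rangel" types "include<>" in one breath.
    spec = "<chain>"
    extras = [Repetition(RuleRef(CommandRule()), min=1, max=8,
                         name="chain")]

    def _process_recognition(self, node, extras):
        for action in extras["chain"]:
            action.execute()

grammar = Grammar("chaining example")
grammar.add_rule(ChainRule())
grammar.load()
```
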
  • 22:32 - 22:39
    When I do voice coding, I'm a very heavy
    vim and tmux user. I've worked with many people
  • 22:39 - 22:42
    before, so I have some cheat sheet
    information there. So if you're
  • 22:42 - 22:45
    interested, you can go check that out. But
    yeah, let's just do a quick demo of voice
  • 22:45 - 22:54
    coding here. "Turn this mic on". "Desk left
    two". "Control delta", "open new terminal".
  • 22:54 - 23:00
    "Charlie delta space slash tango mike papa
    enter". "Command vim". "Hotel hotel point
  • 23:00 - 23:09
    charlie papa papa, enter". "India, hash
    word include space langel", "india oscar word
  • 23:09 - 23:16
    stream rangel, enter, enter", "india noi
    tango space word mean", "no mike arch india
  • 23:16 - 23:24
    noi space len ren space lace enter enter
    race up tab word print fox scratch nope code
  • 23:24 - 23:31
    standard charlie oscar uniform tango space
    langel langel space quote sentence hello,
  • 23:31 - 23:40
    voice coding bang, scratch six delta india
    noi golf, bang, backslash, noi quote
  • 23:40 - 23:46
    semicolon act sky fox mike romeo noi oscar
    word return space number zero semicolon,
  • 23:46 - 23:53
    act vim save and quit". "Golf plus plus
    space hotel hotel tab minus oscar space
  • 23:53 - 24:04
    hotel hotel enter". "Point slash hotel hotel
    enter". "Desk right".
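
For readers trying to follow that dictation, here is a hedged sketch of how such a spoken alphabet maps to keystrokes. The word list is reconstructed by ear from the demo (arch, brav, char are the substituted letter words explained in the Q&A below; langel and rangel are the angle brackets), so treat it as illustrative, not exact:

```python
# Rough decoder for the spoken alphabet used in the demo above.
# Word list reconstructed by ear; illustrative, not an official grammar.
WORDS = {
    "arch": "a", "brav": "b", "charlie": "c", "delta": "d",
    "fox": "f", "golf": "g", "hotel": "h", "india": "i",
    "mike": "m", "noi": "n", "oscar": "o", "papa": "p",
    "romeo": "r", "tango": "t", "uniform": "u",
    "space": " ", "enter": "\n", "tab": "\t",
    "point": ".", "slash": "/", "quote": '"', "semicolon": ";",
    "bang": "!", "hash": "#", "langel": "<", "rangel": ">",
    "len": "(", "ren": ")", "lace": "{", "race": "}",
}

def decode(utterance):
    return "".join(WORDS.get(w, w) for w in utterance.split())

# "charlie delta space slash tango mike papa enter" -> "cd /tmp\n"
print(repr(decode("charlie delta space slash tango mike papa enter")))
```
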
  • 24:04 - 24:09
    So that's just a quick example of voice
    coding: you can use it to write any
    programming language, you can use
  • 24:09 - 24:14
    it to control anything on your desktop.
    It's very powerful. It has a bit of a
  • 24:14 - 24:19
    learning curve, but it's very powerful. So
    the creator of kaldi-active-grammar is
  • 24:19 - 24:26
    also named David. I'm named David, but
    just a coincidence. And he says of kaldi-
  • 24:26 - 24:31
    active-grammar: "I haven't typed with
    the keyboard in many years, and kaldi-
  • 24:31 - 24:36
    active-grammar is bootstrapped in that I
    have been developing it entirely using the
  • 24:36 - 24:42
    previous versions of it." So, David has a
    medical condition that means he has very
  • 24:42 - 24:48
    low dexterity, so it's hard for him to use
    a keyboard. And yet he basically got
  • 24:48 - 24:53
    kaldi-active-grammar working through the
    skin of his teeth or something and then
  • 24:53 - 24:59
    continues to develop it using it. And
    yeah, I'm a huge fan of the project. I
  • 24:59 - 25:03
    haven't contributed much, but I did give
    some hardware resources, like GPU
  • 25:03 - 25:08
    and CPU compute resources to allow
    training to happen. But I would also like
  • 25:08 - 25:13
    to show you a video of David using kaldi-
    active-grammar, just, so you can see it as
  • 25:13 - 25:21
    well. So, the other thing about David is,
    that he has a speech impediment or a
  • 25:21 - 25:25
    speech, I don't know, an accent or
    whatever. So it's difficult for a
  • 25:25 - 25:28
    normal speech recognition system to
    understand him. And you might have trouble
  • 25:28 - 25:31
    understanding him here. But you can see in
    the lower right what the speech system
  • 25:31 - 25:37
    understands him to be saying. Oh, I
    realized that I do need to switch
  • 25:37 - 25:42
    something in OBS, so that you guys can
    hear it. Sorry. There you go.
  • 25:42 - 26:03
    (The other David using the kaldi-active-grammar system; not understandable)
  • 26:03 - 26:06
    So, you get the idea, and hopefully you
  • 26:06 - 26:11
    guys were able to hear that. If not, you
    can also find this on the website that I'm
  • 26:11 - 26:18
    going to show you at the end. One other
    thing I want to show you about this is that
  • 26:18 - 26:23
    David has actually set up this humming to
    scroll, which I think is pretty cool. Of
  • 26:23 - 26:28
    course, I've gone and turned off the OBS
    there. But he's just doing hmmm and it's
  • 26:28 - 26:33
    understanding that and scrolling down. So,
    something that I'm able to do with my
  • 26:33 - 26:42
    trackball, he's using his voice for. So,
    pretty cool.
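
How that humming setup is implemented isn't shown, but the mechanism can be approximated with a simple loudness gate. A hedged sketch assuming the sounddevice, numpy, and pyautogui packages, with a crude RMS threshold standing in for real hum (pitch) detection:

```python
import numpy as np
import sounddevice as sd  # pip install sounddevice
import pyautogui          # pip install pyautogui

RATE, CHUNK = 16_000, 1_600  # 100 ms of audio per loop iteration
THRESHOLD = 0.02             # RMS loudness; tune for your microphone

# While a sustained sound (like humming) is heard, keep scrolling down.
with sd.InputStream(samplerate=RATE, channels=1,
                    blocksize=CHUNK, dtype="float32") as stream:
    while True:  # runs until interrupted
        chunk, _overflowed = stream.read(CHUNK)
        rms = float(np.sqrt(np.mean(chunk ** 2)))
        if rms > THRESHOLD:
            pyautogui.scroll(-3)  # negative scrolls down
```
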
  • 26:42 - 26:47
    So, I'm almost done here. In summary, good input accessibility means
    you need completeness, consistency and
  • 26:47 - 26:50
    customization. You need to be able to do
    any action that you could do with the
  • 26:50 - 26:55
    other input mechanisms. And doing the same
    input should have the same action. And
  • 26:55 - 27:00
    remember, your users will become experts,
    so the system needs to be designed for
  • 27:00 - 27:06
    that. For e-book reading: Yes, I'm trying
    to allow anyone to read, even if they're
  • 27:06 - 27:11
    experiencing some severe physical or motor
    impairment, because I think that gives you
  • 27:11 - 27:15
    a lot of power to be able to turn the
    pages and read your favorite books. And
  • 27:15 - 27:19
    for speech recognition, yeah, Android
    speech recognition is very good. Silvius
  • 27:19 - 27:23
    accuracy is not so good, but it's easy to
    use quickly for experimentation and to
  • 27:23 - 27:28
    make other types of things like Voice Next
    Page. And please do check out kaldi-
  • 27:28 - 27:34
    active-grammar if you have some serious
    need for voice recognition. Lastly, I put
  • 27:34 - 27:39
    all of this onto a website, voxhub.io, so
    you can see Voice Next Page, Blink Next
  • 27:39 - 27:42
    Page, kaldi-active-grammar and so on, with
    instructions for how to use it and how to
  • 27:42 - 27:47
    set it up. So please do check that out.
    And tons of acknowledgments, lots of
  • 27:47 - 27:50
    people that have helped me along the way,
    but I want to especially call out
  • 27:50 - 27:54
    Professor Sang-Mook Lee, who actually
    invited me to Korea a couple of times to
  • 27:54 - 27:58
    give talks - a big inspiration. And of
    course, David Zurow, who has actually been
  • 27:58 - 28:03
    able to bootstrap into a full voice-
    coding environment. So that's all I have
  • 28:03 - 28:07
    for today. Thank you very much.
  • 28:07 - 28:16
    Herald: Alright, I suppose I'm back on the
    air, so let me see. I want to remind
  • 28:16 - 28:22
    everyone before we go into the Q&A that
    you can ask your questions for this talk
  • 28:22 - 28:26
    on IRC, the link is under the video, or
    you can use Twitter or the Fediverse with
  • 28:26 - 28:34
    the hashtag #rc3two. Again, I'll hold it
    up here, "rc3two".
  • 28:34 - 28:39
    Thanks for your talk, David. That was
    really interesting.
  • 28:39 - 28:47
    Yeah, I think we have a couple
    of questions from the Signal Angels.
  • 28:47 - 28:51
    Before that, I just wanted to say I've
    recently spent some time playing with
  • 28:51 - 28:57
    the VoiceOver system in iOS, and that
    can now actually tell you what is on a
  • 28:57 - 29:03
    photo, which is kind of amazing. Oh, by
    the way, I can't hear you here on the
  • 29:03 - 29:05
    Mumble.
    David: Yeah. Sorry, I wasn't saying
  • 29:05 - 29:10
    anything. Yeah, so I focused
    mostly on input accessibility, right?
  • 29:10 - 29:14
    Which is like how do you get data to the
    computer. But there's been huge
  • 29:14 - 29:17
    improvements in the other direction as
    well, right? The computer doing VoiceOver
  • 29:17 - 29:19
    things.
    Herald: So we have, let's see, about
  • 29:19 - 29:25
    five or six minutes left at least for Q&A. We
    have a question from Toby++, who asks: "Your
  • 29:25 - 29:29
    next page application looks cool. Do you
    have statistics on how many people use it
  • 29:29 - 29:36
    or found it on the App Store?"
    David: Not very many. The Voice Next Page
  • 29:36 - 29:41
    was advertised only so far as a little
    academic poster. So I've gotten a few
  • 29:41 - 29:46
    people to use it. But I run eight
    concurrent workers and we've never hit
  • 29:46 - 29:52
    more than that. (laughs) So not super popular,
    but I do hope that some people will see it
  • 29:52 - 29:55
    because of this talk and go and check it out.
    Herald: That's cool. Next question. How
  • 29:55 - 30:00
    error-prone are the speech recognition
    systems in general? E.g., can you do coding
  • 30:00 - 30:06
    while doing workouts?
    David: So one thing about speech
  • 30:06 - 30:10
    recognition: it's very sensitive to the
    microphone, so when you're doing it
  • 30:10 - 30:38
    Technical malfunction. We'll be back soon.
  • 30:38 - 30:41
    David (cont.): Any mistakes, right?
  • 30:41 - 30:44
    That's the thing about having low latency,
    you just say something and you watch it
  • 30:44 - 30:48
    and you make sure that it was what you
    wanted to say. I don't know exactly how
  • 30:48 - 30:52
    many words per minute I can say with voice
    coding, but I can say it much faster than
  • 30:52 - 30:56
    regular speech. So I'd say at least like
    200, maybe 300 words per minute.
  • 30:56 - 30:57
    So it's actually a very high bandwidth
    mechanism.
  • 30:57 - 31:03
    Herald: That's really awesome. A question from
    peppyjndivos: "Any advice for software
  • 31:03 - 31:08
    authors to make their stuff more
    accessible?"
  • 31:08 - 31:15
    David: There are good web accessibility
    guidelines. So if you're just making a
  • 31:15 - 31:19
    website or something, I would definitely
    follow those. They tend to be focused more
  • 31:19 - 31:24
    on people that are blind, because that is,
    you know, a more obvious failure:
  • 31:24 - 31:30
    they just can't interact at all with
    your website. But things like, you know,
  • 31:30 - 31:37
    if Duolingo, for example, had used the
    same accessibility access
  • 31:37 - 31:40
    tag on their next button, then it
    would always be the same letter for me and
  • 31:40 - 31:46
    I wouldn't have to say Fox-Charlie,
    Fox-Delta, Fox-something; it changes all the
  • 31:46 - 31:52
    time. So I think consistency is very
    important. And integrating with any
  • 31:52 - 31:58
    existing accessibility APIs is also very
    important: web APIs, Android APIs and so
  • 31:58 - 32:02
    on, because, you know, we can't make every
    program out there voice-compatible.
  • 32:02 - 32:05
    We just have to meet in the middle where
    they interact at the keyboard layer or the
  • 32:05 - 32:08
    accessibility layer.
    Herald: Awesome. AmericN has a question,
  • 32:08 - 32:14
    wonders if these systems use approaches
    similar to stenography with mnemonics,
  • 32:14 - 32:19
    or if there are any projects working with
    that in mind.
  • 32:19 - 32:27
    David: A very good question. So, the first
    thing everyone uses is the NATO phonetic
  • 32:27 - 32:33
    alphabet to spell letters, for example,
    Alpha, Bravo, Charlie. Some people then
  • 32:33 - 32:39
    will substitute other words for ones that
    are too long, like November; I use noi.
  • 32:39 - 32:42
    Sometimes the speech system doesn't
    understand you. Whenever I said Alpha,
  • 32:42 - 32:46
    Dragon was like, oh, you're saying
    "offer". So I changed it. It's Arch for
  • 32:46 - 32:53
    me: Arch, Brav, Char. Also, most of
    these grammars are in a common grammar
  • 32:53 - 32:57
    format. They are written in Python and
    they're compatible with Dragonfly. So you
  • 32:57 - 33:01
    can grab a grammar for, I don't know,
    Aenea and get it to work with kaldi-
  • 33:01 - 33:05
    active-grammar with very little effort. I
    actually have a grammar that works on both
  • 33:05 - 33:11
    Aenea and kaldi-active-grammar, and that's
    what I use. So there's a bit of a lingua
  • 33:11 - 33:14
    franca, I guess; you can kind of guess
    what other people are using. But at the
  • 33:14 - 33:19
    same time there's a lot of customization,
    you know, because people change words,
  • 33:19 - 33:23
    they add their own commands, they change
    words based on what the speech system
  • 33:23 - 33:27
    understands.
    Herald: Alright, LEB asks, is there an online
  • 33:27 - 33:32
    community you can recommend for
    accessibility technologies?
  • 33:32 - 33:40
    David: There's an amazing forum for anything
    related to voice coding. All the
  • 33:40 - 33:52
    developers of new voice coding software
    are there. Sorry, I just need to drink. So
  • 33:52 - 33:57
    it's a really fantastic resource. I do
    link to it from voxhub.io. I believe it's
  • 33:57 - 34:02
    at the bottom of the kaldi-active-grammar
    page. So you can definitely check that
  • 34:02 - 34:07
    out. For general accessibility, I don't
    know, I could recommend the accessibility
  • 34:07 - 34:12
    mailing list at Google, but that's only if
    you work at Google. Other than that, yeah,
  • 34:12 - 34:16
    I think it depends on your community,
    right? I think if you're looking for web
  • 34:16 - 34:20
    accessibility, you could go for some
    Mozilla mailing list and so on. If you're
  • 34:20 - 34:25
    looking for desktop accessibility, then
    maybe you could go find some stuff about
  • 34:25 - 34:30
    the Windows Speech API. (unintelligible)
    Herald: One last question from Joe Neilson.
  • 34:30 - 34:35
    Could there be legal issues if you make an
    e-book into audio? I'm not sure what that
  • 34:35 - 34:43
    refers to.
    David: Yeah. So say
  • 34:43 - 34:46
    you're using a screen reader and you
    try to get it to read out the
  • 34:46 - 34:55
    contents of an e-book, right? Most
    of the time, there are fair use
  • 34:55 - 35:03
    exceptions for copyright law, even in the
    US, and making a copy yourself for
  • 35:03 - 35:09
    personal purposes so that you can access
    it is usually considered fair use. If you
  • 35:09 - 35:14
    were trying to commercialize it or make
    money off of that or like, I don't know,
  • 35:14 - 35:18
    you're a famous streamer and all you do is
    highlight text and have it read it out,
  • 35:18 - 35:21
    then maybe; but otherwise, I would say that
    definitely falls under fair use.
  • 35:21 - 35:27
    Herald: Alright. So I guess that's it for
    the talk. I think we're hitting the timing
  • 35:27 - 35:30
    mark really well. Thank you so much,
    David, for that. That was really, really
  • 35:30 - 35:36
    interesting. I learned a lot and thanks
    everyone for watching and stay on. I think
  • 35:36 - 35:40
    there might be some news coming up. Thanks,
    everyone.
  • 35:40 - 35:56
    rc3 postroll music
  • 35:56 - 36:19
    Subtitles created by c3subtitles.de
    in the year 2020. Join, and help us!