0:00:00.000,0:00:12.260
rc3 preroll music
0:00:12.260,0:00:17.930
Herald: All right, so again, let's[br]introduce the next talk, accessible inputs
0:00:17.930,0:00:25.320
for readers, coders and hackers, the talk[br]by David Williams-King about custom off-,
0:00:25.320,0:00:30.230
well, not off the shelf, but custom[br]accessibility solutions. He will give you
0:00:30.230,0:00:35.420
some demonstrations and that includes his[br]own custom made voice input and eyelid blink
0:00:35.420,0:00:38.110
system. Here is David Williams-King.
0:00:40.020,0:00:46.440
David: Thank you for the introduction.[br]Let's go ahead and get started. So, yeah,
0:00:46.440,0:00:50.650
I'm talking about accessibility,[br]particularly accessible input for readers,
0:00:50.650,0:00:57.840
coders and hackers. So what do I mean by[br]accessibility? I mean people that have
0:00:57.840,0:01:02.780
physical or motor impairments. This could[br]be due to repetitive strain injury, carpal
0:01:02.780,0:01:08.030
tunnel, all kinds of medical conditions.[br]If you have this type of thing, you
0:01:08.030,0:01:11.860
probably can't use a normal computer[br]keyboard, computer mouse or even a phone
0:01:11.860,0:01:18.720
touch screen. However, technology does[br]allow users to interact with these devices
0:01:18.720,0:01:23.780
just using different forms of input. And[br]it's really valuable to these people
0:01:23.780,0:01:28.909
because, you know, being able to interact[br]with the device provides some agency they
0:01:28.909,0:01:32.781
can do things on their own and it[br]provides a means of communication with the
0:01:32.781,0:01:38.439
outside world. So it's an important[br]problem to look at. And it's what I care
0:01:38.439,0:01:45.200
about a lot. Let's talk a bit about me for[br]a moment. I'm a systems security person. I
0:01:45.200,0:01:49.920
did a PhD in cybersecurity at Columbia. If[br]you're interested in low level software
0:01:49.920,0:01:54.509
defenses, you can look that up. And I'm[br]currently the CTO at a startup called
0:01:54.509,0:02:03.360
Elpha Secure. I started developing medical[br]issues in around 2014. And as a result of
0:02:03.360,0:02:07.770
that, in an ongoing fashion, I can only[br]type a few thousand keystrokes per day.
0:02:07.770,0:02:12.090
Roughly fifteen thousand is my maximum.[br]That sounds like a lot, but imagine you're
0:02:12.090,0:02:17.069
typing at a hundred words per minute.[br]That's five hundred characters per minute,
0:02:17.069,0:02:23.349
which means it takes you 30 minutes to hit[br]fifteen thousand characters. So
0:02:23.349,0:02:29.519
essentially I can work like the[br]equivalent of a fast programmer for
0:02:29.519,0:02:33.700
half an hour. And then after that I would[br]be unable to use my hands for anything,
0:02:33.700,0:02:38.420
including like preparing food for myself[br]or opening, closing doors and so on. So I
0:02:38.420,0:02:42.189
have to be very careful about my hand use[br]and actually have a little program that
0:02:42.189,0:02:46.690
you can see on the slide there that[br]measures the keystrokes for me so I can
0:02:46.690,0:02:51.809
tell it when I'm going over. So what do I[br]do? Well, I do a lot of pair programming,
0:02:51.809,0:02:56.650
for sure. I log into the same machine as[br]other people and we work together. I'm
0:02:56.650,0:03:00.430
also a very heavy user of speech[br]recognition and I gave a talk
0:03:00.430,0:03:06.900
about voice coding with speech recognition[br]at the HOPE 11 conference. So you can go
0:03:06.900,0:03:15.419
check that out if you're interested. So[br]when I talk about accessible input, I mean
0:03:15.419,0:03:18.779
different ways that a human can provide[br]input to a computer. So ergonomic
0:03:18.779,0:03:23.019
keyboards are a simple one. Speech[br]recognition, eye tracking or gaze tracking
0:03:23.019,0:03:26.699
so you can see where you're looking[br]or where you're pointing your head and
0:03:26.699,0:03:32.229
maybe use that to replace a mouse, that's[br]head gestures, I suppose. And there's
0:03:32.229,0:03:38.650
always this distinction between bespoke,[br]like custom input mechanisms and somewhat
0:03:38.650,0:03:44.499
mainstream ones. So I'll give you some[br]examples. You've probably heard of Stephen
0:03:44.499,0:03:50.230
Hawking. He's a very famous professor, and[br]he was actually a bit of an extreme case.
0:03:50.230,0:03:56.142
He was diagnosed with ALS when he[br]was 21. So his physical
0:03:56.142,0:04:00.669
abilities degraded over the years[br]because he lived for many decades after
0:04:00.669,0:04:05.059
that and he went through many[br]communication mechanisms. Initially his
0:04:05.059,0:04:08.309
speech changed so that it was only[br]intelligible to his family and close
0:04:08.309,0:04:14.239
friends, but he was still able to speak.[br]And then after that he would work with the
0:04:14.239,0:04:19.440
human interpreter and raise his eyebrows[br]to pick various letters. And keep
0:04:19.440,0:04:24.690
in mind, this is like the 60s or 70s,[br]right? So computers were not really where
0:04:24.690,0:04:29.840
they are today. Later he would operate a[br]switch with one hand, just like on off on
0:04:29.840,0:04:35.009
off, kind of Morse code and select from a[br]bank of words. And that was around 15
0:04:35.009,0:04:41.080
words per minute. Eventually, he was[br]unable to move his hand, so a team of
0:04:41.080,0:04:44.490
engineers from Intel worked with him and[br]they figured out, they were trying to do
0:04:44.490,0:04:48.229
like brain scans and all kinds of stuff.[br]But again, this was like in the eighties,
0:04:48.229,0:04:54.599
so there was not too much they could[br]do. So they basically just created some
0:04:54.599,0:04:59.120
custom software to detect muscle movements[br]in his cheek. And he used that with
0:04:59.120,0:05:03.550
predictive words, the same way[br]that a smartphone keyboard will
0:05:03.550,0:05:07.180
predict which word you want to say next.[br]Stephen Hawking used something similar to
0:05:07.180,0:05:12.689
that, except instead of swiping on a[br]phone, he was moving his cheek muscles, so
0:05:12.689,0:05:17.810
that's obviously a sequence of like highly[br]customized input mechanisms for, for
0:05:17.810,0:05:23.979
someone very, very specialized for that[br]person. I also want to talk about someone
0:05:23.979,0:05:29.592
else named Professor Sang-Mook Lee, whom[br]I've met. That was me when I had more of a
0:05:29.592,0:05:36.180
beard than I do now. He's a professor[br]at Seoul National University in South
0:05:36.180,0:05:42.969
Korea. And he's sometimes called like the[br]Korean Stephen Hawking, because he's a big
0:05:42.969,0:05:47.990
advocate for people with disabilities.[br]Anyway, what he uses is you can
0:05:47.990,0:05:52.360
see a little orange device near his mouth[br]there. It's called a sip and puff mouse
0:05:52.360,0:05:56.930
so he can blow into it and suck air[br]through it and also move it around. And
0:05:56.930,0:06:02.280
that acts as a mouse cursor on the Android[br]device in front of him. It will move the
0:06:02.280,0:06:08.229
cursor around and click when he[br]blows air and so on. So that combined
0:06:08.229,0:06:13.909
with speech recognition, lets him use[br]mainstream Android hardware. He still has
0:06:13.909,0:06:21.249
access to, you know, email apps and like[br]Web Browsers and like Maps and everything
0:06:21.249,0:06:26.159
that comes on a normal Android device. So[br]he's way more capable than Stephen
0:06:26.159,0:06:29.949
Hawking, who[br]could communicate, but just to a person at
0:06:29.949,0:06:35.830
a very slow rate. Right. Part of it's due[br]to the nature of his injury. But it's also
0:06:35.830,0:06:43.939
a testament to how far the technology has[br]improved. So let's talk a little bit about
0:06:43.939,0:06:49.480
what makes good accessibility. I think[br]performance is very important, right? You
0:06:49.480,0:06:53.889
want high accuracy. You don't want typos,[br]low latency. I don't want to speak and
0:06:53.889,0:06:58.389
then five seconds later have words appear.[br]It's too long, especially if I have to
0:06:58.389,0:07:02.509
make corrections. Right. And you want high[br]throughput, which we already talked about.
0:07:02.509,0:07:06.240
Oh, I forgot to mention Stephen Hawking[br]had like 15 words per minute. A normal
0:07:06.240,0:07:12.349
person speaking is 150. So that's[br]a big difference. (laughs) The higher
0:07:12.349,0:07:16.479
throughput you can get, the better. And[br]for input accessibility, I think and this
0:07:16.479,0:07:20.879
is not scientific. This is just what I've[br]learned from using myself and observing
0:07:20.879,0:07:25.330
many of these systems. I think it's[br]important to get completeness, consistency
0:07:25.330,0:07:31.479
and customization. For completeness I[br]mean, can I do any action? So Stephen or
0:07:31.479,0:07:40.590
Professor Sang-Mook Lee, his orange[br]mouth input device, the sip and puff, is
0:07:40.590,0:07:44.199
quite powerful, but it doesn't let him do[br]every action. For example, for some reason
0:07:44.199,0:07:48.379
when he gets an incoming call, the[br]input doesn't work. So he has to call over
0:07:48.379,0:07:52.430
a person physically to tap the accept call[br]button or the reject call button, which is
0:07:52.430,0:07:55.729
really annoying. Right. If you don't have[br]completeness, you can't be fully
0:07:55.729,0:08:01.729
independent. Consistency, very important[br]as well. The same way we develop motor
0:08:01.729,0:08:07.580
memory, or muscle memory, for a keyboard,[br]you develop memory for any types of
0:08:07.580,0:08:11.690
patterns that you do. But if the thing you[br]say or the thing you do keeps changing in
0:08:11.690,0:08:18.220
order to do the same action. That's not[br]good. And finally, customization. So the
0:08:18.220,0:08:22.809
learning curve for beginners is important[br]for any accessibility device, but
0:08:22.809,0:08:27.150
designing for expert use is almost more[br]important because anyone who uses an
0:08:27.150,0:08:31.229
accessibility interface becomes an expert[br]at it. The example I like to give is
0:08:31.229,0:08:35.440
screen readers like a blind person using a[br]screen reader on a phone. They will crank
0:08:35.440,0:08:41.880
up the speed at which the speech is being[br]produced. And I actually met someone who
0:08:41.880,0:08:46.321
made his speech 16 times faster than[br]normal human speech. I could not
0:08:46.341,0:08:51.020
understand it at all, it sounded like brbrbrbr, but[br]he could understand it perfectly. And that's just
0:08:51.020,0:08:56.190
because he used it so much that he's[br]become an expert at its use. Let's analyze
0:08:56.190,0:09:01.040
ergonomic keyboards just for a moment,[br]because it's fun. You know, they are kind
0:09:01.040,0:09:04.260
of like a normal keyboard. They'll have a,[br]you'll have a slow pace when you're
0:09:04.260,0:09:07.630
starting to learn them. But once you're[br]good at it, you have very good accuracy,
0:09:07.630,0:09:11.709
like instantaneous low latency. Right. You[br]press the key, the computer receives it
0:09:11.709,0:09:17.510
immediately, and very high throughput, as[br]high as you are on a regular keyboard.
0:09:17.510,0:09:20.329
So they're actually fantastic[br]accessibility devices, right. They're
0:09:20.329,0:09:23.950
completely compatible with original[br]keyboards. And if all you need is an
0:09:23.950,0:09:28.600
ergonomic keyboard, then you're in luck[br]because it's a very good accessibility
0:09:28.600,0:09:34.480
device. I'm going to talk about two[br]things, computers, but also Android
0:09:34.480,0:09:39.750
devices, so let's start with Android[br]devices. Yes, the built in voice
0:09:39.750,0:09:43.340
recognition in Android is really[br]incredible. So even though the microphones
0:09:43.340,0:09:47.000
on the devices aren't great, Google has[br]just collected so much data from so many
0:09:47.000,0:09:51.590
different sources that they've built like[br]better than human accuracy for their
0:09:51.590,0:09:56.570
voice recognition. The voice accessibility[br]interface is kind of so-so. We'll talk
0:09:56.570,0:09:59.649
about that in a bit. That's the interface[br]where you can control the Android device
0:09:59.649,0:10:04.230
entirely by voice. For other input[br]mechanisms, you could use like a sip and
0:10:04.230,0:10:09.010
puff device or you could use physical[br]styluses. That's something that I do a
0:10:09.010,0:10:13.320
lot, actually, because for me, my fingers[br]get sore. And if I can hold a stylus in my
0:10:13.320,0:10:19.220
hand and kind of not use my fingers, then[br]that's very effective. So and the Elecom
0:10:19.220,0:10:23.750
styluses from a Japanese company are the[br]lightest I've found and they don't require
0:10:23.750,0:10:30.131
a lot of force. So the ones at the top[br]there are like 12 grams and the
0:10:30.131,0:10:34.160
one on the bottom is 4.7 grams. And you've[br]got almost no force to use them. So very
0:10:34.160,0:10:38.040
nice. On the left there you can see the[br]Android speech recognition is built into
0:10:38.040,0:10:41.860
the keyboard now. Right. You can just[br]press that and start speaking. It
0:10:41.860,0:10:46.160
supports different languages, and it's[br]very accurate, it's very nice. And
0:10:46.160,0:10:51.470
actually, when I was working at Google for[br]a bit, I talked to the speech recognition
0:10:51.470,0:10:54.470
team and was like: Why are you doing[br]on-server speech recognition? You should do
0:10:54.470,0:10:58.029
it on the devices. But of course, Android[br]devices are, they're all very different
0:10:58.029,0:11:02.529
and many of them are not very powerful. So[br]they were having trouble getting
0:11:02.529,0:11:06.450
satisfactory speech recognition on the[br]device. So for a long time, there's some
0:11:06.450,0:11:10.630
server latency, server lag that you do[br]speech recognition and you wait a bit. And
0:11:10.630,0:11:14.190
then sometime this year, I just was using[br]speech recognition and it became so much
0:11:14.190,0:11:18.360
faster. I was extremely excited and I[br]looked into it and yeah, they just
0:11:18.360,0:11:22.000
switched, on my device at least, they[br]switched on the on-device speech recognition
0:11:22.000,0:11:25.710
model. And so now it's incredibly fast and[br]also incredibly accurate. I'm a huge fan
0:11:25.710,0:11:30.949
of it. On the right hand side, we can[br]actually see the voice access interface.
0:11:30.949,0:11:34.899
So this is meant to allow you to use a[br]phone entirely by voice. Again, while I
0:11:34.899,0:11:37.940
was at Google, I tried the beta[br]version before it was publicly released
0:11:37.940,0:11:43.529
and I was like, this is pretty bad, mostly[br]because it lacked completeness.
0:11:43.529,0:11:47.209
There would be things on the screen that[br]would not be selected. So here we see show
0:11:47.209,0:11:52.510
labels. And then I can say like four,[br]five, six, whatever, to tap on that
0:11:52.510,0:11:57.070
thing. But as you can see at the bottom,[br]there was like a Twitter Web app link and
0:11:57.070,0:12:00.140
there's no number on it. So if I want to[br]click on that, I'm out of luck. And this
0:12:00.140,0:12:06.500
is actually a problem in the design of the[br]accessibility interface that it only, it
0:12:06.500,0:12:11.519
doesn't expose the full DOM. It exposes[br]only a subset of it. And so an
0:12:11.519,0:12:18.959
accessibility mechanism can't ever see[br]those other things. And furthermore, the
0:12:18.959,0:12:22.279
way the Google speech recognition works,[br]they have to reestablish a new connection
0:12:22.279,0:12:26.480
every 30 seconds. And if you're in the[br]middle of speaking, it will just throw
0:12:26.480,0:12:29.959
away whatever you were saying because it[br]just decided it had to reconnect, which is
0:12:29.959,0:12:34.610
really unfortunate. They later released[br]that publicly and then sometime this year
0:12:34.610,0:12:39.860
they did the update, which is pretty nice.[br]It now has like a mouse grid, which lets,
0:12:39.860,0:12:44.050
which solves a lot of the completeness[br]problems. Like you can, you can use a grid
0:12:44.050,0:12:50.040
to narrow down somewhere on the screen and[br]then tap there. But the server issues and
0:12:50.040,0:12:54.870
the expert use are still not good. Like, if[br]I want to do
0:12:54.870,0:12:59.540
something with the mouse grid, I have to[br]say "mouse grid on. 6. 5. mouse grid off".
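The mouse grid idea described here can be illustrated with a small sketch. It is not Voice Access's actual implementation: the 3x3 layout, the 1 to 9 numbering (left to right, top to bottom), and the names `Region` and `tap_point` are all assumptions made for illustration. Each spoken digit narrows the region to one cell, and the tap lands at the center of the final cell.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Region:
    """A rectangle on the screen, in pixels (hypothetical helper type)."""
    x: float
    y: float
    width: float
    height: float

    def cell(self, digit: int) -> "Region":
        """Return the sub-cell for a spoken digit 1..9.

        Assumed numbering: 1 2 3 on the top row, 7 8 9 on the bottom row.
        """
        if not 1 <= digit <= 9:
            raise ValueError(f"grid digit must be 1..9, got {digit}")
        row, col = divmod(digit - 1, 3)
        w, h = self.width / 3, self.height / 3
        return Region(self.x + col * w, self.y + row * h, w, h)

    def center(self) -> Tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


def tap_point(screen: Region, digits: Iterable[int]) -> Tuple[float, float]:
    """Narrow the screen once per digit, then return the point to tap."""
    region = screen
    for digit in digits:
        region = region.cell(digit)
    return region.center()


if __name__ == "__main__":
    # "mouse grid on. 6. 5. mouse grid off" on a 900x1800 screen
    print(tap_point(Region(0, 0, 900, 1800), [6, 5]))  # (750.0, 900.0)
```

A system designed for expert use could accept the whole digit sequence in a single utterance and call something like `tap_point` once, instead of waiting for recognition between every step.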
0:12:59.540,0:13:02.899
And I can't combine those together. So[br]there's a lot of latency and it's not
0:13:02.899,0:13:09.611
really that fun to use, but better than[br]nothing? Absolutely! I just want to really
0:13:09.611,0:13:13.149
briefly show you as well that this same[br]feature of like being able to select links
0:13:13.149,0:13:17.209
on a screen is available on desktops. This[br]is a plug in for Chrome called Vimium. And
0:13:17.209,0:13:22.670
it's very powerful because you can then[br]combine this with keyboards or other input
0:13:22.670,0:13:26.650
mechanisms. And this one is complete. It[br]uses the entire DOM and anything you can
0:13:26.650,0:13:31.130
click on will be highlighted. So very[br]nice. I just want to give a quick example
0:13:31.130,0:13:35.380
of me using some of these systems. So I've[br]been trying to learn Japanese and there's
0:13:35.380,0:13:39.130
a couple of highly regarded websites for[br]this, but they're not consistent. When I
0:13:39.130,0:13:43.829
use the browser show labels like, you[br]know, the thing to press next page or
0:13:43.829,0:13:47.970
something like that or like, you know, I[br]give up or whatever it is, it keeps
0:13:47.970,0:13:51.980
changing. So the letters that are being[br]used keep changing. And that's because of
0:13:51.980,0:13:56.500
the dynamic way that they're generating[br]the HTML. So not really very useful. What
0:13:56.500,0:14:01.160
I do instead is I use a program called[br]Anki and that has very simple shortcuts in
0:14:01.160,0:14:06.410
its desktop app. One, two, three, four. So[br]it's nice to use and consistent and it
0:14:06.410,0:14:11.530
syncs with an Android app and then I can[br]use my stylus on the Android device. So it
0:14:11.530,0:14:16.450
works pretty well. But even so, as you can[br]see from the chart in the bottom there,
0:14:16.450,0:14:20.220
there are many days when I can't use this,[br]even though I would like to, because I've
0:14:20.220,0:14:25.770
overused my hands or overused my voice.[br]When I'm using voice recognition all day,
0:14:25.770,0:14:28.649
every day, I do tend to lose my voice. And[br]as you can see from the graph, sometimes I
0:14:28.649,0:14:33.700
lose it for a week or two at a time. So[br]same thing with any accessibility
0:14:33.700,0:14:38.410
interface, you know, you've got to use[br]many different techniques and it's always,
0:14:38.410,0:14:44.259
it's never perfect, it's just the best[br]you can do at that moment. Something else I
0:14:44.259,0:14:49.770
like to do is read books. I read a lot of[br]books and I love e-book readers, the
0:14:49.770,0:14:54.139
dedicated e-ink displays. You can read them[br]in sunlight, they last forever, battery
0:14:54.139,0:14:59.060
wise. Unfortunately, it's hard to add other[br]input mechanisms to them. They don't have
0:14:59.060,0:15:03.569
microphones or other sensors and you can't[br]really install custom software on them.
0:15:03.569,0:15:07.250
But for Android based devices and there[br]are also like e-book reading apps for
0:15:07.250,0:15:10.399
Android devices, they have everything you[br]can install custom software and they have
0:15:10.399,0:15:15.569
microphones and many other sensors. So I[br]made two apps that allow you to read
0:15:15.569,0:15:21.319
e-books with an e-book reader. The first[br]one is Voice Next Page. It's based on one
0:15:21.319,0:15:25.759
of my speech recognition engines called[br]Silvius, and it does do server based
0:15:25.759,0:15:29.290
recognition. So you have to capture all[br]the audio, use 300 kilobits a second to
0:15:29.290,0:15:35.560
send it to the server and recognize things[br]like next page, previous page. However, it
0:15:35.560,0:15:40.329
doesn't cut out every 30 seconds. It keeps[br]going. So that's one win for it I
0:15:40.329,0:15:46.470
guess. And it is published in the Play[br]Store. Huge thanks to Sarah Leventhal, who
0:15:46.470,0:15:49.670
did a lot of the implementation. Very[br]complicated to make an accessibility app
0:15:49.670,0:15:55.819
on Android. But we persevered and it works[br]quite nicely. So I'm going to actually
0:15:55.819,0:16:03.149
show you an example of Voice Next Page.[br]This over here is my phone on the left
0:16:03.149,0:16:08.649
hand side just captured so that you guys[br]can see it. So here's the Voice Next Page.
0:16:08.649,0:16:13.820
And basically the connection is green. I[br]can do, the server is up and running and
0:16:13.820,0:16:19.700
so on. I just press start and then I'll[br]switch to an Android reading app and say,
0:16:19.700,0:16:23.120
next page, previous page. I won't speak[br]otherwise because it will capture
0:16:23.120,0:16:26.400
everything I'm saying.
0:16:32.910,0:16:34.880
Next Page
0:16:36.090,0:16:37.640
Next Page
0:16:38.310,0:16:40.100
Previous Page
0:16:41.520,0:16:42.860
Center
0:16:43.680,0:16:45.030
Center
0:16:46.620,0:16:48.120
Foreground
0:16:49.155,0:16:50.845
Stop listening
0:16:54.960,0:16:58.680
So that's a demo of[br]Voice Next Page, and it's
0:16:58.680,0:17:03.259
extremely helpful. I built it a couple of[br]years ago along with Sarah, and I use it a
0:17:03.259,0:17:07.800
lot. So, yeah, you can go ahead and[br]download it if you guys wanna try it out.
0:17:07.800,0:17:12.530
And the other one is called Blink Next[br]Page. So the idea for this, I got this
0:17:12.530,0:17:18.260
idea from a research paper this year that[br]was studying eyelid gestures. I didn't use
0:17:18.260,0:17:24.210
any of their code, but it's a great idea.[br]So the way this works is you detect blinks
0:17:24.210,0:17:28.590
by using the Android camera and then you[br]can trigger an action like turning pages
0:17:28.590,0:17:34.330
in an e-book reader. This actually doesn't[br]need any networking. It's able to use the
0:17:34.330,0:17:38.960
on device face recognition models from[br]Google, and it is still under development.
0:17:38.960,0:17:44.630
So it's not on the Play Store yet, but it[br]is working. And, you know, please contact
0:17:44.630,0:17:54.430
me if you want to try it. So just give me[br]one moment to set that demo up here. So
0:17:54.430,0:18:00.590
I'm going to use... The main problem with[br]this current implementation is that it
0:18:00.590,0:18:07.030
uses two devices. So that was easier to[br]implement. And I use two devices anyway.
0:18:07.030,0:18:14.040
But obviously I want a one device version[br]if I'm actually going to use it for
0:18:14.040,0:18:18.281
anything. So here's how this works. This[br]device I point at me, at my eyes, the
0:18:18.281,0:18:24.010
other device I put wherever it's[br]convenient to read, oops, sorry, and if I blink
0:18:24.010,0:18:28.780
my eyes, the phone will buzz once it[br]detects that I blink my eyes and it will
0:18:28.780,0:18:35.410
turn the page automatically on the other[br]Android device. Now I have to blink both
0:18:35.410,0:18:41.500
my eyes for half a second. If I want to go[br]backwards, I can blink just my left eye.
0:18:41.500,0:18:49.510
And if I want to go forwards like quickly,[br]I can blink my right eye and hold it. (background buzzing)
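The gestures just described (both eyes closed for half a second, left eye only, right eye held) can be sketched as a small classifier. This is not the Blink Next Page code. It assumes that a face-detection model reports, for every camera frame, a probability that each eye is open; the frame format, the `0.3` threshold, the action names, and the function names are hypothetical. Requiring a closure to last `hold_seconds` is one simple way to ignore ordinary short blinks and reduce false positives.

```python
from typing import Iterable, List, Tuple

# One camera frame: (time in seconds, P(left eye open), P(right eye open)).
# Which eye counts as "left" depends on camera mirroring; assumed fixed here.
Frame = Tuple[float, float, float]

# Hypothetical action names for the three gestures in the talk.
ACTIONS = {
    "both": "next_page",
    "left": "previous_page",
    "right": "fast_forward",
}


def eye_state(left_open: float, right_open: float,
              closed_below: float = 0.3) -> str:
    """Classify a single frame as 'both', 'left', 'right' or 'open'."""
    left_closed = left_open < closed_below
    right_closed = right_open < closed_below
    if left_closed and right_closed:
        return "both"
    if left_closed:
        return "left"
    if right_closed:
        return "right"
    return "open"


def detect_gestures(frames: Iterable[Frame], hold_seconds: float = 0.5,
                    closed_below: float = 0.3) -> List[str]:
    """Emit one action per closure that lasts at least hold_seconds."""
    actions: List[str] = []
    run_state = "open"   # state of the current run of identical frames
    run_start = 0.0      # time at which the current run began
    fired = False        # whether this run has already produced an action
    for time_s, left_open, right_open in frames:
        state = eye_state(left_open, right_open, closed_below)
        if state != run_state:
            run_state, run_start, fired = state, time_s, False
        if state != "open" and not fired and time_s - run_start >= hold_seconds:
            actions.append(ACTIONS[state])
            fired = True
    return actions
```

In this sketch a held right eye produces a single `fast_forward` action; a real reader would presumably keep turning pages for as long as the eye stays closed.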
0:18:49.510,0:18:54.640
Anyway, it does have some false positives.[br]That's why like you can go backwards in
0:18:54.640,0:18:59.790
case it detects that you've accidentally[br]flipped the page. And lighting is also
0:18:59.790,0:19:03.560
very important. Like if I have a light[br]behind me, then this is not going to be
0:19:03.560,0:19:07.760
able to identify whether my eyes are open[br]or closed properly. So it has some
0:19:07.760,0:19:19.150
limitations, but very simple to use. So[br]I'm a big fan. OK, so that's enough about
0:19:19.150,0:19:23.760
Android devices, let's talk very briefly[br]about desktop computers. So if you're
0:19:23.760,0:19:27.450
going to use a desktop computer, of[br]course, try using that show labels plugin
0:19:27.450,0:19:33.210
in a browser. For native apps you can try[br]Dragon NaturallySpeaking, which is fine if
0:19:33.210,0:19:37.190
you're just like using basic things. But[br]if you're trying to do complicated things,
0:19:37.190,0:19:40.830
you should definitely use a voice coding[br]system. You could also consider using eye
0:19:40.830,0:19:45.810
tracking to replace a mouse. I personally,[br]I don't use that. I find it hurts my eyes,
0:19:45.810,0:19:50.400
but I do use a trackball with very little[br]force and a Wacom tablet. Some people will
0:19:50.400,0:19:55.640
even scroll up and down by humming, for[br]example, but I don't have that setup.
0:19:55.640,0:20:00.600
There's a bunch of nice talks out there on[br]voice coding. The top left is Tavis Rudds
0:20:00.600,0:20:06.110
talk from many years ago that got many of[br]us interested. Emily Shea gave a talk
0:20:06.110,0:20:10.971
there about best practices for voice[br]coding. And then I gave a talk a couple of
0:20:10.971,0:20:16.470
years ago at the HOPE 11 conference, which[br]you can also check out. It's mostly out of
0:20:16.470,0:20:21.560
date by now, but it's still interesting.[br]So there are a lot of voice coding
0:20:21.560,0:20:27.660
systems, the sort of grandfather of them[br]all is Dragonfly. It's become a grammar
0:20:27.660,0:20:35.370
standard. Caster is if you're willing to[br]memorize lots of unusual words, you can
0:20:35.370,0:20:40.950
become much better, much faster than I[br]currently am at voice coding. aenea is how
0:20:40.950,0:20:45.710
you originally used Dragon to work on a[br]Linux machine, for example, because Dragon
0:20:45.710,0:20:52.620
only runs on Windows. Talon is a closed[br]source program, but it's very
0:20:52.620,0:20:56.790
powerful. Has a big user base, especially[br]for macOS. There are ports now. And Talon
0:20:56.790,0:21:04.640
used to use Dragon, but it's now using a[br]speech system from Facebook. Silvius is
0:21:04.640,0:21:09.640
the system that I created, the models are[br]not very accurate, but it's a nice
0:21:09.640,0:21:12.910
architecture where there's client-server,[br]so it makes it easy to build things like
0:21:12.910,0:21:18.130
Voice Next Page. So Voice Next Page[br]was using Silvius. And then the most
0:21:18.130,0:21:22.390
recent one I think on this list is kaldi-[br]active-grammar, which is extremely
0:21:22.390,0:21:26.420
powerful and extremely customizable. And[br]it's also open source. It works on all
0:21:26.420,0:21:29.590
platforms. So I really highly recommend[br]that. So let's talk a bit more about
0:21:29.590,0:21:35.300
kaldi-active-grammar. But first, for voice[br]coding, I've already mentioned, you have
0:21:35.300,0:21:38.890
to be careful how you use your voice,[br]right? Breathe from your belly. Don't
0:21:38.890,0:21:42.180
tighten your muscles and breathe from your[br]chest. Try to speak normally. And I'm not
0:21:42.180,0:21:45.230
particularly good at this. Like you'll[br]hear me when I'm speaking commands that my
0:21:45.230,0:21:50.550
inflection changes. So I do tend to[br]overuse my voice, but you just have to be
0:21:50.550,0:21:53.780
conscious of that. The microphone hardware[br]does matter. I do recommend like a Blue
0:21:53.780,0:21:59.801
Yeti on a microphone arm that you can pull[br]and put close to your face like this. I
0:21:59.801,0:22:04.340
will use this one for my speaking demo.[br]Yeah. And the other thing is your
0:22:04.340,0:22:08.190
grammar is fully customizable. So if you[br]keep saying a word and the system doesn't
0:22:08.190,0:22:14.190
recognize it, just change it to another[br]word. And it's complete in the sense you
0:22:14.190,0:22:17.680
can type any key on the keyboard. And the[br]most important thing for expert use or
0:22:17.680,0:22:22.120
customizability is that you can do[br]chaining. So with the voice coding system,
0:22:22.120,0:22:27.040
you can say multiple commands at once.[br]And it's a huge time saving,
0:22:27.040,0:22:32.140
you'll see what I mean when I give a quick[br]demo. When I do voice coding, I'm a very
0:22:32.140,0:22:39.150
heavy vim and tmux user. You know,[br]I've worked with many people
0:22:39.150,0:22:41.870
before, so I have some cheat sheet[br]information there. So if you're
0:22:41.870,0:22:45.130
interested, you can go check that out. But[br]yeah, let's just do a quick demo of voice
0:22:45.130,0:22:54.350
coding here. "Turn this mic on". "Desk left[br]two". "Control delta", "open new terminal".
0:22:54.350,0:22:59.930
"Charlie delta space slash tango mike papa[br]enter". "Command vim". "Hotel hotel point
0:22:59.930,0:23:08.720
charlie papa papa, enter". "India , hash[br]word include space langel", "india oscar word
0:23:08.720,0:23:16.030
stream rangel, enter, enter", "india noi[br]tango space word mean", "no mike arch india
0:23:16.030,0:23:23.750
noi space len ren space lace enter enter[br]race up tab word print fox scratch nope code
0:23:23.750,0:23:31.080
standard charlie oscar uniform tango space[br]langel langel space quote. Sentence hello,
0:23:31.080,0:23:40.250
voice coding bang, scratch six delta india[br]noi golf, bang, backslash, noi quote
0:23:40.250,0:23:46.340
semicolon act sky fox mike romeo noi oscar[br]word return space number zero semicolon
0:23:46.340,0:23:53.450
act vim save and quit. Golf plus plus[br]space hotel hotel tab minus oscar space
0:23:53.450,0:24:03.840
hotel hotel enter. Point slash hotel hotel[br]enter. Desk right. So that's just a quick
0:24:03.840,0:24:09.010
example of voice coding, you can use it to[br]write any programming language, you can use
0:24:09.010,0:24:13.881
it to control anything on your desktop.[br]It's very powerful. It has a bit of a
0:24:13.881,0:24:18.990
learning curve, but it's very powerful. So[br]the creator of kaldi-active-grammar is
0:24:18.990,0:24:26.050
also named David. I'm named David, but[br]just a coincidence. And he says of kaldi-
0:24:26.050,0:24:31.260
active-grammar: "I haven't typed with[br]the keyboard in many years and kaldi-
0:24:31.260,0:24:35.640
active-grammar is bootstrapped in that I[br]have been developing it entirely using the
0:24:35.640,0:24:42.490
previous versions of it." So, David has a[br]medical condition that means he has very
0:24:42.490,0:24:48.270
low dexterity, so it's hard for him to use[br]a keyboard. And yet he basically got
0:24:48.270,0:24:53.000
kaldi-active-grammar working through the[br]skin of his teeth or something and then
0:24:53.000,0:24:58.710
continues to develop it using it. And[br]yeah, I'm a huge fan of the project. I
0:24:58.710,0:25:02.640
haven't contributed much, but I did give[br]some of the hardware resources like GPU
0:25:02.640,0:25:08.100
and CPU compute resources to allow[br]training to happen. But I would also like
0:25:08.100,0:25:12.970
to show you a video of David using kaldi-[br]active-grammar, just so you can see it as
0:25:12.970,0:25:20.780
well. So, the other thing about David is[br]that he has a speech impediment or a
0:25:20.780,0:25:25.000
speech, I don't know, an accent or[br]whatever. So it's difficult for a
0:25:25.000,0:25:28.060
normal speech recognition system to[br]understand him. And you might have trouble
0:25:28.060,0:25:31.050
understanding him here. But you can see in[br]the lower right what the speech system
0:25:31.050,0:25:37.390
understands of what he's saying. Oh, I[br]realized that I do need to switch
0:25:37.390,0:25:41.502
something in OBS so that you guys can[br]hear it. Sorry. There you go.
0:25:41.502,0:26:03.430
(Other) David using kaldi-active-grammar system (not understandable)
0:26:03.430,0:26:05.900
David (cont.): Here, you get the idea and hopefully, you
0:26:05.900,0:26:10.530
guys were able to hear that. If not, you[br]can also find this on the website that I'm
0:26:10.530,0:26:18.350
going to show you at the end. One other[br]thing, I want to show you about this is,
0:26:18.350,0:26:23.010
David has actually set up this humming to[br]scroll, which I think is pretty cool. Of
0:26:23.010,0:26:28.260
course, I've gone and turned off the OBS[br]there. But he's just doing hmmm and it's
0:26:28.260,0:26:33.240
understanding that and scrolling down. So,[br]something that I'm able to do with my
0:26:33.240,0:26:41.730
trackball, but he's using his voice for,[br]so pretty cool. So I'm almost done here.
0:26:41.730,0:26:46.550
In summary, good input accessibility means[br]you need completeness, consistency and
0:26:46.550,0:26:49.591
customization. You need to be able to do[br]any action that you could do with the
0:26:49.591,0:26:55.110
other input mechanisms. And doing the same[br]input should have the same action. And
0:26:55.110,0:27:00.210
remember, your users will become experts,[br]so the system needs to be designed for
0:27:00.210,0:27:05.640
that. For e-book reading: Yes, I'm trying[br]to allow anyone to read, even if they're
0:27:05.640,0:27:10.860
experiencing some severe physical or motor[br]impairment, because I think that gives you
0:27:10.860,0:27:15.031
a lot of power to be able to turn the[br]pages and read your favorite books. And
0:27:15.031,0:27:19.270
for speech recognition, yeah, Android[br]speech recognition is very good. Silvius
0:27:19.270,0:27:23.490
accuracy is not so good, but it's easy to[br]use quickly for experimentation and to
0:27:23.490,0:27:28.150
make other types of things like Voice Next[br]Page. And please do check out kaldi-
0:27:28.150,0:27:33.850
active-grammar if you have some serious[br]need for voice recognition. Lastly, I put
0:27:33.850,0:27:39.050
all of this onto a website, voxhub.io, so[br]you can see Voice Next Page, Blink Next
0:27:39.050,0:27:42.100
Page, kaldi-active-grammar and so on, just[br]instructions for how to use it and how to
0:27:42.100,0:27:47.130
set it up. So please do check that out.[br]And tons of acknowledgments, lots of
0:27:47.130,0:27:50.030
people that have helped me along the way,[br]but I want to especially call out
0:27:50.030,0:27:53.700
Professor Sang-Mook Lee, who actually[br]invited me to Korea a couple of times to
0:27:53.700,0:27:58.140
give talks - a big inspiration. And of[br]course, David Zurow, who has actually been
0:27:58.140,0:28:02.900
able to bootstrap into a fully voice[br]coding environment. So that's all I have
0:28:02.900,0:28:07.300
for today. Thank you very much.
0:28:07.300,0:28:15.600
Herald: Alright, I suppose I'm back on the[br]air, so let me see. I want to remind
0:28:15.600,0:28:21.780
everyone before we go into the Q&A that[br]you can ask your questions for this talk
0:28:21.780,0:28:25.880
on IRC, the link is under the video, or[br]you can use Twitter or the Fediverse with
0:28:25.880,0:28:34.380
the hashtag #rc3two. Again, I'll hold it[br]up here, "rc3two".
0:28:34.380,0:28:38.680
Thanks for your talk, David. That was[br]really interesting. Thanks for the talk,
0:28:38.680,0:28:47.160
David. I, yeah, I think we have a couple[br]of questions from the Signal Angels.
0:28:47.160,0:28:50.600
Before that, I just wanted to say I've[br]recently spent some time playing with a
0:28:50.600,0:28:56.900
like the VoiceOver system in iOS and that[br]can now actually tell you what is on a
0:28:56.900,0:29:03.210
photo, which is kind of amazing. Oh, by[br]the way, I can't hear you here on the
0:29:03.210,0:29:05.470
Mumble.[br]David: Yeah. Sorry, I wasn't saying
0:29:05.470,0:29:10.440
anything. Yeah, no, it's so I focused[br]mostly on input accessibility, right?
0:29:10.440,0:29:13.890
Which is like how do you get data to the[br]computer. But there's been huge
0:29:13.890,0:29:16.610
improvements in the other way around as[br]well, right? The computer doing VoiceOver
0:29:16.610,0:29:19.150
things.[br]Herald: So we have about let's see,
0:29:19.150,0:29:25.010
five-six minutes left at least for Q&A. We[br]have a question by Toby++, he asks: "Your
0:29:25.010,0:29:29.080
next page application looks cool. Do you[br]have statistics of how many people use it
0:29:29.080,0:29:35.650
or found it on the App Store?"[br]David: Not very many. The Voice Next Page
0:29:35.650,0:29:40.950
was advertised only so far as a little[br]academic poster. So I've gotten a few
0:29:40.950,0:29:46.310
people to use it. But I run eight[br]concurrent workers and we've never hit
0:29:46.310,0:29:51.560
more than that. (laughs) So not super popular,[br]but I do hope that some people will see it
0:29:51.560,0:29:54.891
because of this talk and go and check out.[br]Herald: That's cool. Next question. How
0:29:54.891,0:30:00.000
error prone are the speech recognition[br]systems at all? E.g., can you do coding
0:30:00.000,0:30:06.490
while doing workouts?[br]David: So one thing about speech
0:30:06.490,0:30:09.640
recognition is very sensitive to the[br]microphone, so when you're doing it
0:30:09.640,0:30:38.270
Technical malfunction. We'll be back soon.
0:30:38.270,0:30:40.650
David (cont.): Any mistakes, right?
0:30:40.650,0:30:43.830
That's the thing about having low latency,[br]you just say something and you watch it
0:30:43.830,0:30:47.870
and you make sure that it was what you[br]wanted to say. I don't know exactly how
0:30:47.870,0:30:52.010
many words per minute I can say with voice[br]coding, but I can say it much faster than
0:30:52.010,0:30:55.500
regular speech. So I'd say at least like[br]200, maybe 300 words per minute.
0:30:55.500,0:30:57.050
So it's actually a very high bandwidth[br]mechanism.
0:30:57.050,0:31:02.590
Herald: That's really awesome. A question from[br]peppyjndivos: "Any advice for software
0:31:02.590,0:31:07.760
authors to make their stuff more[br]accessible?"
0:31:07.760,0:31:15.420
David: There are good web accessibility[br]guidelines. So if you're just making a
0:31:15.420,0:31:19.240
website or something, I would definitely[br]follow those. They tend to be focused more
0:31:19.240,0:31:24.350
on people that are blind because that is,[br]you know, it's more of an obvious fail,
0:31:24.350,0:31:29.880
like they just can't interact at all with[br]your website. But things like, you know,
0:31:29.880,0:31:36.580
if Duolingo, for example, had used the[br]same, like, the same accessibility access
0:31:36.580,0:31:40.360
tag on their, like, next button, then they[br]would always be the same letter for me and
0:31:40.360,0:31:46.400
I wouldn't have to be like Fox-Charlie,[br]Fox-Delta, Fox-something - changes all the
0:31:46.400,0:31:51.850
time. So I think consistency is very[br]important. And integrating with any
0:31:51.850,0:31:57.690
existing accessibility APIs is also very[br]important - Web APIs, Android APIs and so
0:31:57.690,0:32:01.730
on, because, you know, we can't make every[br]program out there like voice compatible.
0:32:01.730,0:32:05.360
We just have to meet in the middle where[br]they interact at the keyboard layer or the
0:32:05.360,0:32:08.490
accessibility layer.[br]Herald: Awesome. AmericN has a question,
0:32:08.490,0:32:13.730
wonders if these systems use similar[br]approaches like stenography with mnemonics
0:32:13.730,0:32:18.530
or if there's any projects working with[br]that in mind.
0:32:18.530,0:32:26.830
David: A very good question. So, the first[br]thing everyone uses is the NATO phonetic
0:32:26.830,0:32:32.900
alphabet to spell letters, for example,[br]Alpha, Bravo, Charlie. Some people then
0:32:32.900,0:32:38.910
will substitute letters for things that[br]are too long, like November. I use noi.
0:32:38.910,0:32:41.690
Sometimes the speech system doesn't[br]understand you. Whenever I said Alpha,
0:32:41.690,0:32:45.620
Dragon was like, oh, you're saying[br]"offer". So I changed it. It's Arch for
0:32:45.620,0:32:53.300
me, Arch, Brav, Char. So, and also most of[br]these grammars are in a common grammar
0:32:53.300,0:32:56.640
format. They are written in Python and[br]they're compatible with Dragonfly. So you
0:32:56.640,0:33:00.920
can grab a grammar for, I don't know, for[br]Aenea and get it to work with kaldi-
0:33:00.920,0:33:04.550
active-grammar with very little effort. I[br]actually have a grammar that works on both
0:33:04.550,0:33:10.970
Aenea and kaldi-active-grammar, and that's[br]what I use. So there's a bit of lingua
0:33:10.970,0:33:14.060
franca, I guess, you can kind of guess[br]what other people are using. But at the
0:33:14.060,0:33:19.190
same time there's a lot of customization,[br]you know, because people change words,
0:33:19.190,0:33:23.160
they add their own commands, they change[br]words based on what the speech system
0:33:23.160,0:33:27.150
understands.[br]Herald: Alright, LEB asks, is there an online
0:33:27.150,0:33:32.130
community you can propose for[br]accessibility technologies?
0:33:32.130,0:33:40.460
David: There's an amazing forum for anything[br]related to voice coding. All the
0:33:40.460,0:33:51.560
developers of new voice coding software[br]are there. Sorry, I just need to drink. So
0:33:51.560,0:33:56.760
it's a really fantastic resource. I do[br]link to it from voxhub.io. I believe it's
0:33:56.760,0:34:01.690
at the bottom of the kaldi-active-grammar[br]page. So you can definitely check that
0:34:01.690,0:34:07.450
out. For general accessibility, I don't[br]know, I could recommend the accessibility
0:34:07.450,0:34:11.530
mailing list at Google, but that's only if[br]you work at Google. Other than that, yeah,
0:34:11.530,0:34:16.240
I think it depends on your community,[br]right? I think if you're looking for web
0:34:16.240,0:34:20.220
accessibility, you could go for some[br]Mozilla mailing list and so on. If you're
0:34:20.220,0:34:24.509
looking for desktop accessibility, then[br]maybe you could go find some stuff about
0:34:24.509,0:34:29.579
the Windows Speech API. (unintelligible)[br]Herald: One last question from Joe Neilson.
0:34:29.579,0:34:34.730
Could there be legal issues if you make an[br]e-book into audio? I'm not sure what that
0:34:34.730,0:34:42.849
refers to.[br]David: Yeah. So if you are like doing, if
0:34:42.849,0:34:45.780
you're using a screen reader and you're[br]like, you try to get it to read out the
0:34:45.780,0:34:55.059
contents of an e-book, right? So most,[br]most of the time there are fair use
0:34:55.059,0:35:02.609
exceptions for copyright law, even in the[br]US, and making a copy yourself for
0:35:02.609,0:35:08.661
personal purposes so that you can access[br]it is usually considered fair use. If you
0:35:08.661,0:35:14.079
were trying to commercialize it or make[br]money off of that or like, I don't know,
0:35:14.079,0:35:18.270
you're a famous streamer and all you do is[br]highlight text and have it read it out,
0:35:18.270,0:35:21.280
then maybe, but I would say that[br]definitely falls under fair use.
0:35:21.280,0:35:26.740
Herald: Alright. So I guess that's it for[br]the talk. I think we're hitting the timing
0:35:26.740,0:35:30.380
mark really well. Thank you so much,[br]David, for that. That was really, really
0:35:30.380,0:35:36.160
interesting. I learned a lot and thanks[br]everyone for watching and stay on. I think
0:35:36.160,0:35:40.369
there might be some news coming up. Thanks,[br]everyone.
0:35:40.369,0:35:55.640
rc3 postroll music
0:35:55.640,0:36:18.549
Subtitles created by c3subtitles.de[br]in the year 2020. Join, and help us!