0:00:00.000,0:00:12.260 rc3 preroll music 0:00:12.260,0:00:17.930 Herald: All right, so again, let's[br]introduce the next talk, accessible inputs 0:00:17.930,0:00:25.320 for readers, coders and hackers, the talk[br]by David Williams-King about custom off-, 0:00:25.320,0:00:30.230 well, not off the shelf, but custom[br]accessibility solutions. He will give you 0:00:30.230,0:00:35.420 some demonstrations and that includes his[br]own custom made voice input, an added link 0:00:35.420,0:00:38.110 system. Here is David Williams-King 0:00:40.020,0:00:46.440 David: Thank you for the introduction.[br]Let's go ahead and get started. So, yeah, 0:00:46.440,0:00:50.650 I'm talking about accessibility,[br]particularly accessible input for readers, 0:00:50.650,0:00:57.840 coders and hackers. So what do I mean by[br]accessibility? I mean people that have 0:00:57.840,0:01:02.780 physical or motor impairments. This could[br]be due to repetitive strain injury, carpal 0:01:02.780,0:01:08.030 tunnel, all kinds of medical conditions.[br]If you have this type of thing, you 0:01:08.030,0:01:11.860 probably can't use a normal computer[br]keyboard, computer mouse or even a phone 0:01:11.860,0:01:18.720 touch screen. However, technology does[br]allow users to interact with these devices 0:01:18.720,0:01:23.780 just using different forms of input. And[br]it's really valuable to these people 0:01:23.780,0:01:28.909 because, you know, being able to interact[br]with the device provides some agency they 0:01:28.909,0:01:32.781 can they can do things on their own and it[br]provides a means of communication with the 0:01:32.781,0:01:38.439 outside world. So it's an important[br]problem to look at. And it's what I care 0:01:38.439,0:01:45.200 about a lot. Let's talk a bit about me for[br]a moment. I'm a systems security person. I 0:01:45.200,0:01:49.920 did a phd in cybersecurity at Columbia. If[br]you're interested in low level software 0:01:49.920,0:01:54.509 defenses, you can look that up. 
And I'm[br]currently the CTO at a startup called 0:01:54.509,0:02:03.360 Elpha Secure. I started developing medical[br]issues in around 2014. And as a result of 0:02:03.360,0:02:07.770 that, in an ongoing fashion, I can only[br]type a few thousand keystrokes per day. 0:02:07.770,0:02:12.090 Roughly fifteen thousand is my maximum.[br]That sounds like a lot, but imagine you're 0:02:12.090,0:02:17.069 typing at a hundred words per minute.[br]That's five hundred characters per minute, 0:02:17.069,0:02:23.349 which means it takes you 30 minutes to hit[br]fifteen thousand characters. So 0:02:23.349,0:02:29.519 essentially I, I can work like the[br]equivalent of a fast programmer for, for 0:02:29.519,0:02:33.700 half an hour. And then after that I would[br]be unable to use my hands for anything, 0:02:33.700,0:02:38.420 including like preparing food for myself[br]or opening, closing doors and so on. So I 0:02:38.420,0:02:42.189 have to be very careful about my hand use[br]and actually have a little program that 0:02:42.189,0:02:46.690 you can see on the slide there that[br]measures the keystrokes for me so I can 0:02:46.690,0:02:51.809 tell it when I'm going over. So what do I[br]do? Well, I do a lot of pair programming, 0:02:51.809,0:02:56.650 for sure. I log into the same machine as[br]other people and we work together. I'm 0:02:56.650,0:03:00.430 also a very heavy user of speech[br]recognition and I gave a talk at that 0:03:00.430,0:03:06.900 about voice coding with speech recognition[br]at the Hope 11 conference. So you can go 0:03:06.900,0:03:15.419 check that out if you're interested. So[br]when I talk about accessible input, I mean 0:03:15.419,0:03:18.779 different ways that a human can provide[br]input to a computer. So ergonomic 0:03:18.779,0:03:23.019 keyboards are a simple one. 
Speech[br]recognition, eye tracking or gaze tracking 0:03:23.019,0:03:26.699 so you can see where you're looking[br]or where you're pointing your head and 0:03:26.699,0:03:32.229 maybe use that to replace a mouse, that's[br]head gestures, I suppose. And there's 0:03:32.229,0:03:38.650 always this distinction between bespoke,[br]like custom input mechanisms and somewhat 0:03:38.650,0:03:44.499 mainstream ones. So I'll give you some[br]examples. You've probably heard of Stephen 0:03:44.499,0:03:50.230 Hawking. He's a very famous professor, and[br]he was actually a bit of an extreme case. 0:03:50.230,0:03:56.142 He had, he was diagnosed with ALS when he[br]was 21. So his his physical 0:03:56.142,0:04:00.669 ability, abilities degraded over the years[br]because he lived for many decades after 0:04:00.669,0:04:05.059 that and he went through many[br]communication mechanisms. Initially his 0:04:05.059,0:04:08.309 speech changed so that it was only[br]intelligible to his family and close 0:04:08.309,0:04:14.239 friends, but he was still able to speak.[br]And then after that he would work with the 0:04:14.239,0:04:19.440 human interpreter and raise his eyebrows[br]to pick various letters. And then and keep 0:04:19.440,0:04:24.690 in mind, this is like the 60s or 70s,[br]right? So computers were not really where 0:04:24.690,0:04:29.840 they are today. Later he would operate a[br]switch with one hand, just like on off on 0:04:29.840,0:04:35.009 off, kind of morse code and select from a[br]bank of words. And that was around 15 0:04:35.009,0:04:41.080 words per minute. Eventually, he was[br]unable to move his hand, so a team of 0:04:41.080,0:04:44.490 engineers from Intel worked with him and[br]they figured out, they were trying to do 0:04:44.490,0:04:48.229 like brain scans and all kinds of stuff.[br]But again, this was like in the eighties, 0:04:48.229,0:04:54.599 so there was not not too much they could[br]do. 
So they basically just created some 0:04:54.599,0:04:59.120 custom software to detect muscle movements[br]in his cheek. And he used that with 0:04:59.120,0:05:03.550 predictive, predictive words, the same way[br]that a phone, smartphone keyboard will 0:05:03.550,0:05:07.180 predict which word you want to say next.[br]Stephen Hawking, used something similar to 0:05:07.180,0:05:12.689 that, except instead of swiping on a[br]phone, he was moving his cheek muscles, so 0:05:12.689,0:05:17.810 that's obviously a sequence of like highly[br]customized input mechanisms for, for 0:05:17.810,0:05:23.979 someone very, very specialized for that[br]person. I also want to talk about someone 0:05:23.979,0:05:29.592 else named Professor Sang-Mook Lee, whom[br]I've met. That was me when I had more of a 0:05:29.592,0:05:36.180 beard than I do now. He he's a professor[br]at Seoul National University in South 0:05:36.180,0:05:42.969 Korea. And he sometimes called like the[br]Korean Stephen Hawking, because he's a big 0:05:42.969,0:05:47.990 advocate for people with disabilities.[br]Anyway, what he uses is you can 0:05:47.990,0:05:52.360 see a little orange device near his mouth[br]there. It's called a sip and puff mouse 0:05:52.360,0:05:56.930 so he can blow into it and suck air[br]through it and also move it around. And 0:05:56.930,0:06:02.280 that acts as a mouse cursor on the Android[br]device in front of him. It will move the 0:06:02.280,0:06:08.229 cursor around and click when he when he[br]blows air and so on. So that combined 0:06:08.229,0:06:13.909 with speech recognition, lets him use[br]mainstream Android hardware. He still has 0:06:13.909,0:06:21.249 access to, you know, email apps and like[br]Web Browsers and like Maps and everything 0:06:21.249,0:06:26.159 that comes on a normal Android device. 
So[br]he's way more capable than Stephen 0:06:26.159,0:06:29.949 Hawking, as who could, Stephen Hawking[br]could communicate, but just to a person at 0:06:29.949,0:06:35.830 a very slow rate. Right. Part of it's due[br]to the nature of his injury. But it's also 0:06:35.830,0:06:43.939 a testament to how far the technology has[br]improved. So let's talk a little bit about 0:06:43.939,0:06:49.480 what makes good accessibility. I think[br]performance is very important, right? You 0:06:49.480,0:06:53.889 want high accuracy. You don't want typos,[br]low latency. I don't want to speak and 0:06:53.889,0:06:58.389 then five seconds later have words appear.[br]It's too long, especially if I have to 0:06:58.389,0:07:02.509 make corrections. Right. And you want high[br]throughput, which we already talked about. 0:07:02.509,0:07:06.240 Oh, I forgot to mention Stephen Hawking[br]had like 15 words per minute. A normal 0:07:06.240,0:07:12.349 person speaking is 150. So that's [br]a big difference. (laughs) The higher 0:07:12.349,0:07:16.479 throughput you can get, the better. And[br]for input accessibility, I think and this 0:07:16.479,0:07:20.879 is not scientific. This is just what I've[br]learned from using myself and observing 0:07:20.879,0:07:25.330 many of these systems. I think it's[br]important to get completeness, consistency 0:07:25.330,0:07:31.479 and customization. For completeness I[br]mean, can I do any action? So Stephen or 0:07:31.479,0:07:40.590 Professor Sang-Mook Lee, his, his orange[br]mouth input device, the sip and puff is 0:07:40.590,0:07:44.199 quite powerful, but it doesn't let him do[br]every action. For example, for some reason 0:07:44.199,0:07:48.379 when he gets an incoming call, the the[br]input doesn't work. So he has to call over 0:07:48.379,0:07:52.430 a person physically to tap the accept call[br]button or the reject call button, which is 0:07:52.430,0:07:55.729 really annoying. Right. 
If you don't have[br]completeness, you can't be fully 0:07:55.729,0:08:01.729 independent. Consistency is very important[br]as well. The same way we develop motor 0:08:01.729,0:08:07.580 memory, or muscle memory, for a keyboard,[br]you develop memory for any type of 0:08:07.580,0:08:11.690 pattern that you do. But if the thing you[br]say or the thing you do keeps changing in 0:08:11.690,0:08:18.220 order to do the same action, that's not[br]good. And finally, customization. So the 0:08:18.220,0:08:22.809 learning curve for beginners is important[br]for any accessibility device, but 0:08:22.809,0:08:27.150 designing for expert use is almost more[br]important because anyone who uses an 0:08:27.150,0:08:31.229 accessibility interface becomes an expert[br]at it. The example I like to give is 0:08:31.229,0:08:35.440 screen readers, like a blind person using a[br]screen reader on a phone. They will crank 0:08:35.440,0:08:41.880 up the speed at which the speech is being[br]produced. And I actually met someone who 0:08:41.880,0:08:46.321 made his speech 16 times faster than[br]normal human speech. I could not 0:08:46.341,0:08:51.020 understand it at all, it sounded like brbrbrbr, but[br]he could understand it perfectly. And that's just 0:08:51.020,0:08:56.190 because he used it so much that he's[br]become an expert at its use. Let's analyze 0:08:56.190,0:09:01.040 ergonomic keyboards just for a moment,[br]because it's fun. You know, they are kind 0:09:01.040,0:09:04.260 of like a normal keyboard. You'll have[br]a slow pace when you're 0:09:04.260,0:09:07.630 starting to learn them. But once you're[br]good at it, you have very good accuracy, 0:09:07.630,0:09:11.709 instantaneous low latency, right? You[br]press the key, the computer receives it 0:09:11.709,0:09:17.510 immediately, and very high throughput, as[br]high as you are on a regular keyboard. 0:09:17.510,0:09:20.329 So they're actually fantastic[br]accessibility devices, right.
They're 0:09:20.329,0:09:23.950 completely compatible with original[br]keyboards. And if all you need is an 0:09:23.950,0:09:28.600 ergonomic keyboard, then you're in luck[br]because it's a very good accessibility 0:09:28.600,0:09:34.480 device. I'm going to talk about two[br]things, computers, but also Android 0:09:34.480,0:09:39.750 devices, so let's start with Android[br]devices. Yes, the built-in voice 0:09:39.750,0:09:43.340 recognition in Android is really[br]incredible. So even though the microphones 0:09:43.340,0:09:47.000 on the devices aren't great, Google has[br]just collected so much data from so many 0:09:47.000,0:09:51.590 different sources that they've built like[br]better than human accuracy for their 0:09:51.590,0:09:56.570 voice recognition. The voice accessibility[br]interface is kind of so-so; we'll talk 0:09:56.570,0:09:59.649 about that in a bit. That's the interface[br]where you can control the Android device 0:09:59.649,0:10:04.230 entirely by voice. For other input[br]mechanisms, you could use like a sip and 0:10:04.230,0:10:09.010 puff device or you could use physical[br]styluses. That's something that I do a 0:10:09.010,0:10:13.320 lot, actually, because for me, my fingers[br]get sore. And if I can hold a stylus in my 0:10:13.320,0:10:19.220 hand and kind of not use my fingers, then[br]that's very effective. And the Elecom 0:10:19.220,0:10:23.750 styluses from a Japanese company are the[br]lightest I've found and they don't require 0:10:23.750,0:10:30.131 a lot of force. So the ones at the top[br]there, they're like 12 grams and the 0:10:30.131,0:10:34.160 one on the bottom is 4.7 grams. And you[br]need almost no force to use them. So very 0:10:34.160,0:10:38.040 nice. On the left there you can see the[br]Android speech recognition is built into 0:10:38.040,0:10:41.860 the keyboard now. Right. You can just[br]press that and start speaking.
It 0:10:41.860,0:10:46.160 supports different languages, and it's[br]very accurate, it's very nice. And 0:10:46.160,0:10:51.470 actually, when I was working at Google for[br]a bit, I talked to the speech recognition 0:10:51.470,0:10:54.470 team as like: Why are you doing on[br]server speech recognition? You should do 0:10:54.470,0:10:58.029 it on the devices. But of course, Android[br]devices are, they're all very different 0:10:58.029,0:11:02.529 and many of them are not very powerful. So[br]they were having trouble getting 0:11:02.529,0:11:06.450 satisfactory speech recognition on the[br]device. So for a long time, there's some 0:11:06.450,0:11:10.630 server latency, server lag that you do[br]speech recognition and you wait a bit. And 0:11:10.630,0:11:14.190 then sometime this year, I just was using[br]speech recognition and it became so much 0:11:14.190,0:11:18.360 faster. I was extremely excited and I[br]looked into it and yeah, they just 0:11:18.360,0:11:22.000 switched on my device. At least they[br]switched on the On device speech recognition 0:11:22.000,0:11:25.710 model. And so now it's incredibly fast and[br]also incredibly accurate. I'm a huge fan 0:11:25.710,0:11:30.949 of it. On the right hand side. We can[br]actually see the voice access interface. 0:11:30.949,0:11:34.899 So this is meant to allow you to use a[br]phone entirely by voice. Again, while I 0:11:34.899,0:11:37.940 was at Google, I tried the the beta[br]version before it was publicly released 0:11:37.940,0:11:43.529 and I was like, this is pretty bad, mostly[br]because it did, it lacked completeness. 0:11:43.529,0:11:47.209 There would be things on the screen that[br]would not be selected. So here we see show 0:11:47.209,0:11:52.510 labels. And then I can I can say like four,[br]five, six, whatever, to tap on that 0:11:52.510,0:11:57.070 thing. But as you can see at the bottom,[br]there was like a Twitter Web app link and 0:11:57.070,0:12:00.140 there's no number on it. 
So if I want to[br]click on that, I'm out of luck. And this 0:12:00.140,0:12:06.500 is actually a problem in the design of the[br]accessibility interface that it only, it 0:12:06.500,0:12:11.519 doesn't expose the full DOM. It exposes[br]only a subset of it. And so an 0:12:11.519,0:12:18.959 accessibility mechanism can't ever see[br]those other things. And furthermore, the 0:12:18.959,0:12:22.279 way the Google speech recognition works,[br]they have to reestablish a new connection 0:12:22.279,0:12:26.480 every 30 seconds. And if you're in the[br]middle of speaking, it will just throw 0:12:26.480,0:12:29.959 away whatever you were saying because it[br]just decided it had to reconnect, which is 0:12:29.959,0:12:34.610 really unfortunate. They later released[br]that publicly and then sometime this year 0:12:34.610,0:12:39.860 they did the update, which is pretty nice.[br]It now has like a mouse grid, which lets, 0:12:39.860,0:12:44.050 which solves a lot of the completeness[br]problems. Like you can, you can use a grid 0:12:44.050,0:12:50.040 to narrow down somewhere on the screen and[br]then tap there. But the server issues and 0:12:50.040,0:12:54.870 the expert use is still not good, like, if[br]I want to turn it, if I want to do 0:12:54.870,0:12:59.540 something with the mouse grid, I have to[br]say "mouse grid on. 6. 5. mouse grid off". 0:12:59.540,0:13:02.899 And I can't combine those together. So[br]there's a lot of latency and it's not 0:13:02.899,0:13:09.611 really that fun to use, but better than[br]nothing? Absolutely! I just want to really 0:13:09.611,0:13:13.149 briefly show you as well that this same[br]feature of like being able to select links 0:13:13.149,0:13:17.209 on a screen is available on desktops. This[br]is a plug in for Chrome called Vimium. And 0:13:17.209,0:13:22.670 it's very powerful because you can then[br]combine this with keyboards or other input 0:13:22.670,0:13:26.650 mechanisms. And this one is complete. 
It[br]uses the entire DOM and anything you can 0:13:26.650,0:13:31.130 click on will be highlighted. So very[br]nice. I just want to give a quick example 0:13:31.130,0:13:35.380 of me using some of these systems. So I've[br]been trying to learn Japanese and there's 0:13:35.380,0:13:39.130 a couple of highly regarded websites for[br]this, but they're not consistent. When I 0:13:39.130,0:13:43.829 use the browser show labels, you[br]know, the thing to press for "next page" or 0:13:43.829,0:13:47.970 something like that, or, you know, "I[br]give up" or whatever it is, it keeps 0:13:47.970,0:13:51.980 changing. So the letters that are being[br]used keep changing. And that's because of 0:13:51.980,0:13:56.500 the dynamic way that they're generating[br]the HTML. So not really very useful. What 0:13:56.500,0:14:01.160 I do instead is I use a program called[br]Anki and that has very simple shortcuts in 0:14:01.160,0:14:06.410 its desktop app. One, two, three, four. So[br]it's nice to use and consistent and it 0:14:06.410,0:14:11.530 syncs with an Android app and then I can[br]use my stylus on the Android device. So it 0:14:11.530,0:14:16.450 works pretty well. But even so, as you can[br]see from the chart at the bottom there, 0:14:16.450,0:14:20.220 there are many days when I can't use this,[br]even though I would like to, because I've 0:14:20.220,0:14:25.770 overused my hands or overused my voice.[br]When I'm using voice recognition all day, 0:14:25.770,0:14:28.649 every day, I do tend to lose my voice. And[br]as you can see from the graph, sometimes I 0:14:28.649,0:14:33.700 lose it for a week or two at a time. So[br]same thing with any accessibility 0:14:33.700,0:14:38.410 interface, you know, you've got to use[br]many different techniques and 0:14:38.410,0:14:44.259 it's never perfect, it's just the best you[br]can do at that moment. Something else I 0:14:44.259,0:14:49.770 like to do is read books.
I read a lot of[br]books and I love e-book readers, the 0:14:49.770,0:14:54.139 dedicated e-ink displays. You can read them[br]in sunlight, they last forever, battery 0:14:54.139,0:14:59.060 wise. Unfortunately, it's hard to add other[br]input mechanisms to them. They don't have 0:14:59.060,0:15:03.569 microphones or other sensors and you can't[br]really install custom software on them. 0:15:03.569,0:15:07.250 But for Android based devices and there[br]are also like e-book reading apps for 0:15:07.250,0:15:10.399 Android devices, they have everything you[br]can install custom software and they have 0:15:10.399,0:15:15.569 microphones and many other sensors. So I[br]made two apps that allow you to read 0:15:15.569,0:15:21.319 e-books with an e-book reader. The first[br]one is Voice Next Page. It's based on one 0:15:21.319,0:15:25.759 of my speech recognition engines called[br]Silvius, and it does do server based 0:15:25.759,0:15:29.290 recognition. So you have to capture all[br]the audio, use 300 kilobits a second to 0:15:29.290,0:15:35.560 send it to the server and recognize things[br]like next page, previous page. However, it 0:15:35.560,0:15:40.329 doesn't cut out every 30 seconds. It keeps[br]going. So that's that's one win for it I 0:15:40.329,0:15:46.470 guess. And it is published in the Play[br]store. Huge thanks to Sarah Leventhal, who 0:15:46.470,0:15:49.670 did a lot of the implementation. Very[br]complicated to make an accessibility app 0:15:49.670,0:15:55.819 on Android. But we persevered and it works[br]quite nicely. So I'm going to actually 0:15:55.819,0:16:03.149 show you an example of voice next page.[br]This over here is my phone on the left 0:16:03.149,0:16:08.649 hand side just captured so that you guys[br]can see it. So here's the Voice Next Page. 0:16:08.649,0:16:13.820 And basically the connection is green. I[br]can do, the server is up and running and 0:16:13.820,0:16:19.700 so on. 
I just press start and then I'll[br]switch to an Android reading app and say, 0:16:19.700,0:16:23.120 "next page", "previous page". I won't speak[br]otherwise because it will capture 0:16:23.120,0:16:26.400 everything I'm saying. 0:16:32.910,0:16:34.880 Next Page 0:16:36.090,0:16:37.640 Next Page 0:16:38.310,0:16:40.100 Previous Page 0:16:41.520,0:16:42.860 Center 0:16:43.680,0:16:45.030 Center 0:16:46.620,0:16:48.120 Foreground 0:16:49.155,0:16:50.845 Stop listening 0:16:54.960,0:16:58.680 So that's a demo of[br]Voice Next Page, and it's 0:16:58.680,0:17:03.259 extremely helpful. I built it a couple of[br]years ago along with Sarah, and I use it a 0:17:03.259,0:17:07.800 lot. So, yeah, you can go ahead and[br]download it if you guys wanna try it out. 0:17:07.800,0:17:12.530 And the other one is called Blink Next[br]Page. So the idea for this, I got this 0:17:12.530,0:17:18.260 idea from a research paper this year that[br]was studying eyelid gestures. I didn't use 0:17:18.260,0:17:24.210 any of their code, but it's a great idea.[br]So the way this works is you detect blinks 0:17:24.210,0:17:28.590 by using the Android camera and then you[br]can trigger an action like turning pages 0:17:28.590,0:17:34.330 in an e-book reader. This actually doesn't[br]need any networking. It's able to use the 0:17:34.330,0:17:38.960 on-device face recognition models from[br]Google, and it is still under development. 0:17:38.960,0:17:44.630 So it's not on the Play Store yet, but it[br]is working. And, you know, please contact 0:17:44.630,0:17:54.430 me if you want to try it. So just give me[br]one moment to set that demo up here. So 0:17:54.430,0:18:00.590 I'm going to use... The main problem with[br]this current implementation is that it 0:18:00.590,0:18:07.030 uses two devices. So that was easier to[br]implement. And I use two devices anyway. 0:18:07.030,0:18:14.040 But obviously I want a one-device version[br]if I'm actually going to use it for 0:18:14.040,0:18:18.281 anything.
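A minimal sketch of the blink-to-page-turn logic, under stated assumptions: it takes per-frame eye-open probabilities of the kind a face-detection model could report, and fires one action when a gesture has been held for about half a second, the hold time given in the demo. The gesture mapping follows the talk (both eyes for next page, left eye for back, right eye held for moving forward quickly). The threshold value, the frame format, and the names `detect_gestures`, `CLOSED_BELOW`, and `ACTIONS` are hypothetical and are not taken from Blink Next Page itself.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical values for illustration; a real app would tune these.
CLOSED_BELOW = 0.3   # an eye counts as closed below this open-probability
HOLD_SECONDS = 0.5   # the talk mentions holding the blink for half a second

# (timestamp in seconds, left-eye open probability, right-eye open probability)
Frame = Tuple[float, float, float]

# Gesture-to-action mapping as described in the demo.
ACTIONS = {
    "both": "next_page",
    "left": "previous_page",
    "right": "fast_forward",
}


def eye_state(left_open: float, right_open: float) -> str:
    """Reduce two probabilities to one of: open, both, left, right."""
    left_closed = left_open < CLOSED_BELOW
    right_closed = right_open < CLOSED_BELOW
    if left_closed and right_closed:
        return "both"
    if left_closed:
        return "left"
    if right_closed:
        return "right"
    return "open"


def detect_gestures(frames: Iterable[Frame]) -> Iterator[Tuple[float, str]]:
    """Yield (timestamp, action) once for each gesture held long enough."""
    current = "open"   # state seen in the previous frame
    since = 0.0        # timestamp at which the current state began
    fired = False      # whether this held gesture already triggered
    for t, left_open, right_open in frames:
        state = eye_state(left_open, right_open)
        if state != current:
            # State changed: restart the hold timer.
            current, since, fired = state, t, False
        elif state != "open" and not fired and t - since >= HOLD_SECONDS:
            fired = True
            yield (t, ACTIONS[state])


if __name__ == "__main__":
    demo = [(0.0, 0.9, 0.9), (0.25, 0.1, 0.1), (0.5, 0.1, 0.1),
            (0.75, 0.1, 0.1), (1.0, 0.9, 0.9)]
    print(list(detect_gestures(demo)))
```

Requiring a hold filters out ordinary quick blinks, which is one way to reduce false positives; firing only once per hold keeps a long blink from turning several pages. Repeating the action while the right eye stays closed is left out of this sketch.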
So here's how this works. This[br]device I point at me, at my eyes, the 0:18:18.281,0:18:24.010 other device I put wherever it's[br]convenient to read, oops, sorry, and if I blink 0:18:24.010,0:18:28.780 my eyes, the phone will buzz once it[br]detects that I blinked my eyes and it will 0:18:28.780,0:18:35.410 turn the page automatically on the other[br]Android device. Now I have to blink both 0:18:35.410,0:18:41.500 my eyes for half a second. If I want to go[br]backwards, I can blink just my left eye. 0:18:41.500,0:18:49.510 And if I want to go forwards like quickly,[br]I can blink my right eye and hold it. (background buzzing) 0:18:49.510,0:18:54.640 Anyway, it does have some false positives.[br]That's why like you can go backwards in 0:18:54.640,0:18:59.790 case it detects that you've accidentally[br]flipped the page. And lighting is also 0:18:59.790,0:19:03.560 very important. Like if I have a light[br]behind me, then this is not going to be 0:19:03.560,0:19:07.760 able to identify whether my eyes are open[br]or closed properly. So it has some 0:19:07.760,0:19:19.150 limitations, but very simple to use. So[br]I'm a big fan. OK, so that's enough about 0:19:19.150,0:19:23.760 Android devices, let's talk very briefly[br]about desktop computers. So if you're 0:19:23.760,0:19:27.450 going to use a desktop computer, of[br]course, try using that show labels plugin 0:19:27.450,0:19:33.210 in a browser. For native apps you can try[br]Dragon NaturallySpeaking, which is fine if 0:19:33.210,0:19:37.190 you're just like using basic things. But[br]if you're trying to do complicated things, 0:19:37.190,0:19:40.830 you should definitely use a voice coding[br]system. You could also consider using eye 0:19:40.830,0:19:45.810 tracking to replace a mouse. I personally,[br]I don't use that. I find it hurts my eyes, 0:19:45.810,0:19:50.400 but I do use a trackball with very little[br]force and a Wacom tablet.
Some people will 0:19:50.400,0:19:55.640 even scroll up and down by humming, for[br]example, but I don't have that setup. 0:19:55.640,0:20:00.600 There's a bunch of nice talks out there on[br]voice coding. The top left is Tavis Rudd's 0:20:00.600,0:20:06.110 talk from many years ago that got many of[br]us interested. Emily Shea gave a talk 0:20:06.110,0:20:10.971 there about best practices for voice[br]coding. And then I gave a talk a couple of 0:20:10.971,0:20:16.470 years ago at the Hope 11 conference, which[br]you can also check out. It's mostly out of 0:20:16.470,0:20:21.560 date by now, but it's still interesting.[br]So there are a lot of voice coding 0:20:21.560,0:20:27.660 systems. The sort of grandfather of them[br]all is Dragonfly. It's become a grammar 0:20:27.660,0:20:35.370 standard. With Caster, if you're willing to[br]memorize lots of unusual words, you can 0:20:35.370,0:20:40.950 become much better, much faster than I[br]currently am at voice coding. aenea is how 0:20:40.950,0:20:45.710 you originally used Dragon to work on a[br]Linux machine, for example, because Dragon 0:20:45.710,0:20:52.620 only runs on Windows. Talon is a closed-[br]source program, but it's very 0:20:52.620,0:20:56.790 powerful. Has a big user base, especially[br]for macOS. There are ports now. And Talon 0:20:56.790,0:21:04.640 used to use Dragon, but it's now using a[br]speech system from Facebook. Silvius is 0:21:04.640,0:21:09.640 the system that I created. The models are[br]not very accurate, but it's a nice 0:21:09.640,0:21:12.910 architecture where there's client-server,[br]so it makes it easy to build things like 0:21:12.910,0:21:18.130 Voice Next Page. So Voice Next Page[br]was using Silvius. And then the most 0:21:18.130,0:21:22.390 recent one I think on this list is kaldi-[br]active-grammar, which is extremely 0:21:22.390,0:21:26.420 powerful and extremely customizable. And[br]it's also open source. It works on all 0:21:26.420,0:21:29.590 platforms.
So I really highly recommend[br]that. So let's talk a bit more about 0:21:29.590,0:21:35.300 kaldi-active-grammar. But first, for voice[br]coding, I've already mentioned, you have 0:21:35.300,0:21:38.890 to be careful how you use your voice[br]right. Breathe from your belly. Don't 0:21:38.890,0:21:42.180 tighten your muscles and breathe from your[br]chest. Try to speak normally. And I'm not 0:21:42.180,0:21:45.230 particularly good at this. Like you'll[br]hear me when I'm speaking commands that my 0:21:45.230,0:21:50.550 inflection changes. So I do tend to[br]overuse my voice, but you just have to be 0:21:50.550,0:21:53.780 conscious of that. The microphone hardware[br]does matter. I do recommend like a blue 0:21:53.780,0:21:59.801 yeti on a microphone arm that you can pull[br]and put close to your face like this. I 0:21:59.801,0:22:04.340 will use this one for my speaking demo[br]and. Yeah. And the other thing is your 0:22:04.340,0:22:08.190 grammar is fully customizable. So if you[br]keep saying a word and the system doesn't 0:22:08.190,0:22:14.190 recognize it, just change it to another[br]word. And it's complete in the sense you 0:22:14.190,0:22:17.680 can type any key on the keyboard. And the[br]most important thing for expert use or 0:22:17.680,0:22:22.120 customizability is that you can do[br]chaining. So with the voice coding system, 0:22:22.120,0:22:27.040 you can say multiple commands at once. If[br]there's, and it's a huge time saving, 0:22:27.040,0:22:32.140 you'll see what I mean when I give a quick[br]demo. When I do voice coding, I'm a very 0:22:32.140,0:22:39.150 heavy vim and tmux user. You know, there[br]have been I've worked with many people 0:22:39.150,0:22:41.870 before, so I have some cheat sheet[br]information there. So if you're 0:22:41.870,0:22:45.130 interested, you can go check that out. But[br]yeah, let's just do a quick demo of voice 0:22:45.130,0:22:54.350 coding here. "Turn this mic on". "Desk left[br]two". 
"Control delta", "open new terminal". 0:22:54.350,0:22:59.930 "Charlie delta space slash tango mike papa[br]enter". "Command vim". "Hotel hotel point 0:22:59.930,0:23:08.720 charlie papa papa, enter". "India , hash[br]word include space langel", "india oscar word 0:23:08.720,0:23:16.030 stream rangel, enter, enter", "india noi[br]tango space word mean", "no mike arch india 0:23:16.030,0:23:23.750 noi space len ren space lace enter enter[br]race up tab word print fox scratch nope code 0:23:23.750,0:23:31.080 standard charlie oscar uniform tango space[br]langel langel space quote. Sentence hello, 0:23:31.080,0:23:40.250 voice coding bang, scratch six delta india[br]noi golf, bang, backslash, noi quote 0:23:40.250,0:23:46.340 semicolon act sky fox mike romeo noi oscar[br]word return space number zero semicolon 0:23:46.340,0:23:53.450 act vim save and quit. Golf plus plus[br]space hotel hotel tab minus oscar space 0:23:53.450,0:24:03.840 hotel hotel enter. Point slash hotel hotel[br]enter. Desk right. So that's just a quick 0:24:03.840,0:24:09.010 example of voice coding, you can use it to[br]write any programing language, you can use 0:24:09.010,0:24:13.881 it to control anything on your desktop.[br]It's very powerful. It has a bit of a 0:24:13.881,0:24:18.990 learning curve, but it's very powerful. So[br]the creator of kaldi-active-grammar is 0:24:18.990,0:24:26.050 also named David. I'm named David, but[br]just a coincidence. And he says of kaldi- 0:24:26.050,0:24:31.260 active-grammar, that I haven't typed with[br]the keyboard in many years and kaldi- 0:24:31.260,0:24:35.640 active-grammar is bootstrapped in that I[br]have been developing it entirely using the 0:24:35.640,0:24:42.490 previous versions of it. So, David has a[br]medical condition that means he has very 0:24:42.490,0:24:48.270 low dexterity, so it's hard for him to use[br]a keyboard. 
And yet he basically got 0:24:48.270,0:24:53.000 kaldi-active-grammar working through the[br]skin of his teeth or something and then 0:24:53.000,0:24:58.710 continues to develop it using it. And[br]yeah, I'm a huge fan of the project. I 0:24:58.710,0:25:02.640 haven't contributed much, but I did give[br]some of the hardware resources like GPU 0:25:02.640,0:25:08.100 and CPU compute resources to allow[br]training to happen. But I would also like 0:25:08.100,0:25:12.970 to show you a video of David using kaldi-[br]active-grammar, just, so you can see it as 0:25:12.970,0:25:20.780 well. So, the other thing about David is,[br]that he has a speech impediment or a 0:25:20.780,0:25:25.000 speech, I don't know, an accent or[br]whatever. So it's difficult to, for a 0:25:25.000,0:25:28.060 normal speech recognition system, to[br]understand him. And you might have trouble 0:25:28.060,0:25:31.050 understanding him here. But you can see in[br]the lower right, what the speech system 0:25:31.050,0:25:37.390 understands what he's saying. Oh, I[br]realized, that I do need to switch 0:25:37.390,0:25:41.502 something in OBS, so that you guys can[br]hear it. Sorry. There you go. 0:25:41.502,0:26:03.430 (Other) David using kaldi-active-grammar system (not understandable) 0:26:03.430,0:26:05.900 Here, you get the idea and hopefully, you 0:26:05.900,0:26:10.530 guys were able to hear that. If not, you[br]can also find this on the website that I'm 0:26:10.530,0:26:18.350 going to show you at the end. One other[br]thing, I want to show you about this is, 0:26:18.350,0:26:23.010 David has actually set up this humming to[br]scroll, which I think is pretty cool. Of 0:26:23.010,0:26:28.260 course, I've gone and turned off the OBS[br]there. But he's just doing hmmm and it's 0:26:28.260,0:26:33.240 understanding that and scrolling down. So,[br]something that I'm able to do with my 0:26:33.240,0:26:41.730 trackball, but he's using his voice for,[br]so pretty cool. So I'm almost done here. 
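The chaining shown in the voice coding demo, where one utterance such as "charlie delta space slash tango mike papa enter" becomes a run of keystrokes, can be sketched roughly as follows. This is an illustration only: the vocabulary is a small hypothetical subset built on the standard NATO alphabet, and `utterance_to_keys` is a made-up helper, not the grammar or API of Silvius, Dragonfly, or kaldi-active-grammar.

```python
# Hypothetical vocabulary for illustration; real grammars are much larger
# and, as the talk stresses, every spoken word can be customized.
NATO = (
    "alpha bravo charlie delta echo foxtrot golf hotel india juliett kilo "
    "lima mike november oscar papa quebec romeo sierra tango uniform victor "
    "whiskey xray yankee zulu"
)
# Each alphabet word types its own first letter.
LETTERS = {word: word[0] for word in NATO.split()}

SYMBOLS = {
    "space": " ",
    "slash": "/",
    "point": ".",
    "minus": "-",
    "plus": "+",
    "semicolon": ";",
    "bang": "!",
    "enter": "\n",
}


def utterance_to_keys(utterance: str) -> str:
    """Translate one chained utterance into the characters to type."""
    tokens = iter(utterance.lower().split())
    keys = []
    for token in tokens:
        if token == "word":
            # "word X" types the next spoken word verbatim.
            literal = next(tokens, None)
            if literal is None:
                raise ValueError("'word' must be followed by a word to type")
            keys.append(literal)
        elif token in LETTERS:
            keys.append(LETTERS[token])
        elif token in SYMBOLS:
            keys.append(SYMBOLS[token])
        else:
            raise ValueError(f"unknown command word: {token!r}")
    return "".join(keys)


if __name__ == "__main__":
    typed = utterance_to_keys("charlie delta space slash tango mike papa enter")
    print(repr(typed))  # 'cd /tmp\n'
```

The point of chaining is that the whole phrase is recognized and executed in one go, so the per-utterance latency is paid once rather than once per key.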
0:26:41.730,0:26:46.550 In summary, good input accessibility means[br]you need completeness, consistency and 0:26:46.550,0:26:49.591 customization. You need to be able to do[br]any action that you could do with the 0:26:49.591,0:26:55.110 other input mechanisms. And doing the same[br]input should have the same action. And 0:26:55.110,0:27:00.210 remember, your users will become experts,[br]so the system needs to be designed for 0:27:00.210,0:27:05.640 that. For e-book reading: Yes, I'm trying[br]to allow anyone to read, even if they're 0:27:05.640,0:27:10.860 experiencing some severe physical or motor[br]impairment, because I think that gives you 0:27:10.860,0:27:15.031 a lot of power to be able to turn the[br]pages and read your favorite books. And 0:27:15.031,0:27:19.270 for speech recognition, yeah, Android[br]speech recognition is very good. Silvius 0:27:19.270,0:27:23.490 accuracy is not so good, but it's easy to[br]use quickly for experimentation and to 0:27:23.490,0:27:28.150 make other types of things like Voice Next[br]Page. And please do check out kaldi- 0:27:28.150,0:27:33.850 active-grammar if you have some serious[br]need for voice recognition. Lastly, I put 0:27:33.850,0:27:39.050 all of this onto a website, voxhub.io, so[br]you can see Voice Next Page, Blink Next 0:27:39.050,0:27:42.100 Page, kaldi-active-grammar and so on, just[br]instructions for how to use it and how to 0:27:42.100,0:27:47.130 set it up. So please do check that out.[br]And tons of acknowledgments, lots of 0:27:47.130,0:27:50.030 people that have helped me along the way,[br]but I want to especially call out 0:27:50.030,0:27:53.700 Professor Sang-Mook Lee, who actually[br]invited me to Korea a couple of times to 0:27:53.700,0:27:58.140 give talks - a big inspiration. And of[br]course, David Zurow, who has actually been 0:27:58.140,0:28:02.900 able to bootstrap into a fully voice[br]coding environment. So that's all I have 0:28:02.900,0:28:07.300 for today. Thank you very much. 
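The spoken spelling heard in the demo, such as "Charlie delta space slash tango mike papa" for `cd /tmp`, can be pictured as a lookup table that each user customizes. The Python below is only a minimal sketch under stated assumptions, not how kaldi-active-grammar or Silvius implement spelling: the names `NATO_WORDS`, `SYMBOLS`, `build_table` and `spell` are hypothetical, the symbol words are a small illustrative subset, and "arch" and "noi" are taken to be the speaker's replacements for "alpha" and "november".

```python
# Hypothetical spelling table: standard NATO words, each mapped to its first letter.
NATO_WORDS = (
    "alpha bravo charlie delta echo foxtrot golf hotel india juliett kilo "
    "lima mike november oscar papa quebec romeo sierra tango uniform victor "
    "whiskey xray yankee zulu"
).split()

# Illustrative symbol words only; a real grammar defines many more commands.
SYMBOLS = {"space": " ", "slash": "/", "point": "."}


def build_table(overrides):
    """Return a spoken-word -> letter table with per-user overrides applied.

    `overrides` maps a letter to the word the user prefers to say for it.
    """
    table = {word: word[0] for word in NATO_WORDS}
    for letter, spoken in overrides.items():
        # Drop the default word for this letter, then register the replacement.
        table = {w: c for w, c in table.items() if c != letter}
        table[spoken] = letter
    return table


def spell(utterance, table):
    """Translate a space-separated run of spelling words into text."""
    chars = []
    for word in utterance.lower().split():
        if word in SYMBOLS:
            chars.append(SYMBOLS[word])
        elif word in table:
            chars.append(table[word])
        else:
            raise ValueError(f"unknown spelling word: {word!r}")
    return "".join(chars)


# Overrides assumed from the talk: "arch" replaces "alpha", "noi" replaces "november".
table = build_table({"a": "arch", "n": "noi"})
print(spell("charlie delta space slash tango mike papa", table))
print(spell("mike arch india noi", table))
```

Keeping the overrides separate from the base alphabet is one way to get both consistency (the same word always yields the same character) and customization (a word the recognizer mishears can be swapped without touching the rest).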
0:28:07.300,0:28:15.600 Herald: Alright, I suppose I'm back on the[br]air, so let me see. I want to remind 0:28:15.600,0:28:21.780 everyone before we go into the Q&A that[br]you can ask your questions for this talk 0:28:21.780,0:28:25.880 on IRC, the link is under the video, or[br]you can use Twitter or the Fediverse with 0:28:25.880,0:28:34.380 the hashtag #rc3two. Again, I'll hold it[br]up here, "rc3two". 0:28:34.380,0:28:38.680 Thanks for your talk, David. That was[br]really interesting. 0:28:38.680,0:28:47.160 I, yeah, I think we have a couple[br]of questions from the Signal Angels. 0:28:47.160,0:28:50.600 Before that, I just wanted to say I've[br]recently spent some time playing with 0:28:50.600,0:28:56.900 the VoiceOver system in iOS and that[br]can now actually tell you what is on a 0:28:56.900,0:29:03.210 photo, which is kind of amazing. Oh, by[br]the way, I can't hear you here on the 0:29:03.210,0:29:05.470 Mumble.[br]David: Yeah. Sorry, I wasn't saying 0:29:05.470,0:29:10.440 anything. Yeah, no, so I focused[br]mostly on input accessibility, right? 0:29:10.440,0:29:13.890 Which is like how do you get data to the[br]computer. But there's been huge 0:29:13.890,0:29:16.610 improvements in the other way around as[br]well, right? The computer doing VoiceOver 0:29:16.610,0:29:19.150 things.[br]Herald: So we have about, let's see, 0:29:19.150,0:29:25.010 five or six minutes left at least for Q&A. We[br]have a question by Toby++, he asks: "Your 0:29:25.010,0:29:29.080 next page application looks cool. Do you[br]have statistics of how many people use it 0:29:29.080,0:29:35.650 or found it on the App Store?"[br]David: Not very many. Voice Next Page 0:29:35.650,0:29:40.950 has only been advertised so far as a little[br]academic poster. So I've gotten a few 0:29:40.950,0:29:46.310 people to use it. But I run eight[br]concurrent workers and we've never hit 0:29:46.310,0:29:51.560 more than that. 
(laughs) So not super popular,[br]but I do hope that some people will see it 0:29:51.560,0:29:54.891 because of this talk and go and check it out.[br]Herald: That's cool. Next question. How 0:29:54.891,0:30:00.000 error prone are the speech recognition[br]systems at all? E.g., can you do coding 0:30:00.000,0:30:06.490 while doing workouts?[br]David: So one thing about speech 0:30:06.490,0:30:09.640 recognition is that it's very sensitive to the[br]microphone, so when you're doing it 0:30:09.640,0:30:38.270 Technical malfunction. We'll be back soon. 0:30:38.270,0:30:40.650 David (cont.): Any mistakes, right? 0:30:40.650,0:30:43.830 That's the thing about having low latency,[br]you just say something and you watch it 0:30:43.830,0:30:47.870 and you make sure that it was what you[br]wanted to say. I don't know exactly how 0:30:47.870,0:30:52.010 many words per minute I can say with voice[br]coding, but I can say it much faster than 0:30:52.010,0:30:55.500 regular speech. So I'd say at least like[br]200, maybe 300 words per minute. 0:30:55.500,0:30:57.050 So it's actually a very high bandwidth[br]mechanism. 0:30:57.050,0:31:02.590 Herald: That's really awesome. A question from[br]peppyjndivos: "Any advice for software 0:31:02.590,0:31:07.760 authors to make their stuff more[br]accessible?" 0:31:07.760,0:31:15.420 David: There are good web accessibility[br]guidelines. So if you're just making a 0:31:15.420,0:31:19.240 website or something, I would definitely[br]follow those. They tend to be focused more 0:31:19.240,0:31:24.350 on people that are blind because that is,[br]you know, it's more of an obvious fail, 0:31:24.350,0:31:29.880 like they just can't interact at all with[br]your website. 
But things like, you know, 0:31:29.880,0:31:36.580 if Duolingo, for example, had used the[br]same, like, the same accessibility access 0:31:36.580,0:31:40.360 tag on their, like, next button, then it[br]would always be the same letter for me and 0:31:40.360,0:31:46.400 I wouldn't have to be like Fox-Charlie,[br]Fox-Delta, Fox-something - it changes all the 0:31:46.400,0:31:51.850 time. So I think consistency is very[br]important. And integrating with any 0:31:51.850,0:31:57.690 existing accessibility APIs is also very[br]important - Web APIs, Android APIs and so 0:31:57.690,0:32:01.730 on, because, you know, we can't make every[br]program out there like voice compatible. 0:32:01.730,0:32:05.360 We just have to meet in the middle where[br]they interact at the keyboard layer or the 0:32:05.360,0:32:08.490 accessibility layer.[br]Herald: Awesome. AmericN has a question and 0:32:08.490,0:32:13.730 wonders if these systems use[br]approaches similar to stenography with mnemonics 0:32:13.730,0:32:18.530 or if there are any projects working with[br]that in mind. 0:32:18.530,0:32:26.830 David: A very good question. So, the first[br]thing everyone uses is the NATO phonetic 0:32:26.830,0:32:32.900 alphabet to spell letters, for example,[br]Alpha, Bravo, Charlie. Some people then 0:32:32.900,0:32:38.910 will substitute shorter words for ones that[br]are too long, like November. I use noi. 0:32:38.910,0:32:41.690 Sometimes the speech system doesn't[br]understand you. Whenever I said Alpha, 0:32:41.690,0:32:45.620 Dragon was like, oh, you're saying[br]"offer". So I changed it. It's Arch for 0:32:45.620,0:32:53.300 me, Arch, Brav, Char. And also, most of[br]these grammars are in a common grammar 0:32:53.300,0:32:56.640 format. They are written in Python and[br]they're compatible with Dragonfly. So you 0:32:56.640,0:33:00.920 can grab a grammar for, I don't know, for[br]Aenea and get it to work with kaldi- 0:33:00.920,0:33:04.550 active-grammar with very little effort. 
I[br]actually have a grammar that works on both 0:33:04.550,0:33:10.970 Aenea and kaldi-active-grammar, and that's[br]what I use. So there's a bit of a lingua 0:33:10.970,0:33:14.060 franca, I guess, you can kind of guess[br]what other people are using. But at the 0:33:14.060,0:33:19.190 same time there's a lot of customization,[br]you know, because people change words, 0:33:19.190,0:33:23.160 they add their own commands, they change[br]words based on what the speech system 0:33:23.160,0:33:27.150 understands.[br]Herald: Alright, LEB asks, is there an online 0:33:27.150,0:33:32.130 community you can propose for[br]accessibility technologies? 0:33:32.130,0:33:40.460 David: There's an amazing forum for anything[br]related to voice coding. All the 0:33:40.460,0:33:51.560 developers of new voice coding software[br]are there. Sorry, I just need to drink. So 0:33:51.560,0:33:56.760 it's a really fantastic resource. I do[br]link to it from voxhub.io. I believe it's 0:33:56.760,0:34:01.690 at the bottom of the kaldi-active-grammar[br]page. So you can definitely check that 0:34:01.690,0:34:07.450 out. For general accessibility, I don't[br]know, I could recommend the accessibility 0:34:07.450,0:34:11.530 mailing list at Google, but that's only if[br]you work at Google. Other than that, yeah, 0:34:11.530,0:34:16.240 I think it depends on your community,[br]right? I think if you're looking for web 0:34:16.240,0:34:20.220 accessibility, you could go for some[br]Mozilla mailing list and so on. If you're 0:34:20.220,0:34:24.509 looking for desktop accessibility, then[br]maybe you could go find some stuff about 0:34:24.509,0:34:29.579 the Windows Speech API. (unintelligible)[br]Herald: One last question from Joe Neilson. 0:34:29.579,0:34:34.730 Could there be legal issues if you make an[br]e-book into audio? I'm not sure what that 0:34:34.730,0:34:42.849 refers to.[br]David: Yeah. 
So if you are, like, if 0:34:42.849,0:34:45.780 you're using a screen reader and you[br]try to get it to read out the 0:34:45.780,0:34:55.059 contents of an e-book, right? So[br]most of the time there are fair use 0:34:55.059,0:35:02.609 exceptions in copyright law, even in the[br]US, and making a copy yourself for 0:35:02.609,0:35:08.661 personal purposes so that you can access[br]it is usually considered fair use. If you 0:35:08.661,0:35:14.079 were trying to commercialize it or make[br]money off of that or like, I don't know, 0:35:14.079,0:35:18.270 you're a famous streamer and all you do is[br]highlight text and have it read out, 0:35:18.270,0:35:21.280 then maybe, but I would say that[br]definitely falls under fair use. 0:35:21.280,0:35:26.740 Herald: Alright. So I guess that's it for[br]the talk. I think we're hitting the timing 0:35:26.740,0:35:30.380 mark really well. Thank you so much,[br]David, for that. That was really, really 0:35:30.380,0:35:36.160 interesting. I learned a lot, and thanks[br]everyone for watching, and stay on. I think 0:35:36.160,0:35:40.369 there might be some news coming up. Thanks,[br]everyone. 0:35:40.369,0:35:55.640 rc3 postroll music 0:35:55.640,0:36:18.549 Subtitles created by c3subtitles.de[br]in the year 2020. Join, and help us!