-
rc3 preroll music
-
Herald: All right, so again, let's
introduce the next talk, "Accessible inputs
-
for readers, coders and hackers", the talk
by David Williams-King about custom off-,
-
well, not off-the-shelf, but custom
accessibility solutions. He will give you
-
some demonstrations, and that includes his
own custom-made voice input and a blink
-
system. Here is David Williams-King!
-
David: Thank you for the introduction.
Let's go ahead and get started. So, yeah,
-
I'm talking about accessibility,
particularly accessible input for readers,
-
coders and hackers. So what do I mean by
accessibility? I mean people that have
-
physical or motor impairments. This could
be due to repetitive strain injury, carpal
-
tunnel, all kinds of medical conditions.
If you have this type of thing, you
-
probably can't use a normal computer
keyboard, computer mouse or even a phone
-
touch screen. However, technology does
allow users to interact with these devices
-
just using different forms of input. And
it's really valuable to these people
-
because, you know, being able to interact
with the device provides some agency: they
-
can do things on their own, and it
provides a means of communication with the
-
outside world. So it's an important
problem to look at. And it's what I care
-
about a lot. Let's talk a bit about me for
a moment. I'm a systems security person. I
-
did a PhD in cybersecurity at Columbia. If
you're interested in low level software
-
defenses, you can look that up. And I'm
currently the CTO at a startup called
-
Elpha Secure. I started developing medical
issues in around 2014. And as a result of
-
that, in an ongoing fashion, I can only
type a few thousand keystrokes per day.
-
Roughly fifteen thousand is my maximum.
That sounds like a lot, but imagine you're
-
typing at a hundred words per minute.
That's five hundred characters per minute,
-
which means it takes you 30 minutes to hit
fifteen thousand characters. So
-
essentially I can work like the
equivalent of a fast programmer for
-
half an hour. And then after that I would
be unable to use my hands for anything,
-
including like preparing food for myself
or opening, closing doors and so on. So I
-
have to be very careful about my hand use,
and I actually have a little program, which
-
you can see on the slide there, that
measures the keystrokes for me so I can
-
tell when I'm going over. So what do I
do? Well, I do a lot of pair programming,
-
for sure. I log into the same machine as
other people and we work together. I'm
-
also a very heavy user of speech
recognition, and I gave a talk
-
about voice coding with speech recognition
at the HOPE 11 conference. So you can go
-
check that out if you're interested. So
when I talk about accessible input, I mean
-
different ways that a human can provide
input to a computer. So ergonomic
-
keyboards are a simple one. Speech
recognition, eye tracking or gaze tracking,
-
so the computer can see where you're looking
or where you're pointing your head, and
-
maybe use that to replace a mouse; that's
head gestures, I suppose. And there's
-
always this distinction between bespoke,
like custom input mechanisms and somewhat
-
mainstream ones. So I'll give you some
examples. You've probably heard of Stephen
-
Hawking. He was a very famous professor, and
he was actually a bit of an extreme case.
-
He was diagnosed with ALS when he
was 21. So his physical
-
abilities degraded over the years
because he lived for many decades after
-
that and he went through many
communication mechanisms. Initially his
-
speech changed so that it was only
intelligible to his family and close
-
friends, but he was still able to speak.
And then after that he would work with a
-
human interpreter and raise his eyebrows
to pick various letters. And keep
-
in mind, this is like the 60s or 70s,
right? So computers were not really where
-
they are today. Later he would operate a
switch with one hand, just like on, off, on,
-
off, kind of Morse code, and select from a
bank of words. And that was around 15
-
words per minute. Eventually, he was
unable to move his hand, so a team of
-
engineers from Intel worked with him and
they figured out, they were trying to do
-
like brain scans and all kinds of stuff.
But again, this was like in the eighties,
-
so there was not too much they could
do. So they basically just created some
-
custom software to detect muscle movements
in his cheek. And he used that with
-
predictive words, the same way
that a smartphone keyboard will
-
predict which word you want to say next.
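To show how word prediction cuts down the number of selections someone has to make, here is a minimal sketch of a bigram predictor. It assumes nothing about the software Stephen Hawking or any phone keyboard actually used. It counts which word followed which in some training text, then suggests the most frequent followers, optionally filtered by the letters already chosen. The class name `NextWordPredictor` and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


class NextWordPredictor:
    """Hypothetical bigram model: suggests words that often followed the previous word."""

    def __init__(self, corpus: str) -> None:
        words = corpus.lower().split()
        self.unigrams: Counter = Counter(words)
        self.bigrams: Dict[str, Counter] = defaultdict(Counter)
        for prev, nxt in zip(words, words[1:]):
            self.bigrams[prev][nxt] += 1

    def suggest(self, previous: str, prefix: str = "", k: int = 3) -> List[str]:
        # Use bigram counts when the previous word is known, else overall word frequency.
        counts = self.bigrams.get(previous.lower()) or self.unigrams
        matches = [(w, c) for w, c in counts.items() if w.startswith(prefix.lower())]
        # Most frequent first; ties broken alphabetically so results are deterministic.
        matches.sort(key=lambda wc: (-wc[1], wc[0]))
        return [w for w, _ in matches[:k]]


predictor = NextWordPredictor("the cat sat on the mat the cat ran")
print(predictor.suggest("the"))       # ['cat', 'mat']
print(predictor.suggest("the", "m"))  # ['mat']
```

With a single switch or cheek sensor, the user then only has to pick one of a few suggestions instead of spelling out every letter, and that is where the speed-up comes from.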
Stephen Hawking used something similar to
-
that, except instead of swiping on a
phone, he was moving his cheek muscles. So
-
that's obviously a sequence of highly
customized input mechanisms, each one
-
very, very specialized for that
person. I also want to talk about someone
-
else named Professor Sang-Mook Lee, whom
I've met. That's me there, back when I had more of a
-
beard than I do now. He's a professor
at Seoul National University in South
-
Korea. And he's sometimes called the
Korean Stephen Hawking, because he's a big
-
advocate for people with disabilities.
Anyway, as for what he uses, you can
-
see a little orange device near his mouth
there. It's called a sip-and-puff mouse,
-
so he can blow into it and suck air
through it and also move it around. And
-
that acts as a mouse cursor on the Android
device in front of him. It will move the
-
cursor around and click when he
blows air and so on. So that combined
-
with speech recognition, lets him use
mainstream Android hardware. He still has
-
access to, you know, email apps and like
web browsers and Maps and everything
-
that comes on a normal Android device. So
he's way more capable than Stephen
-
Hawking was. Stephen Hawking
could communicate, but just to a person at
-
a very slow rate. Right. Part of it's due
to the nature of his injury. But it's also
-
a testament to how far the technology has
improved. So let's talk a little bit about
-
what makes good accessibility. I think
performance is very important, right? You
-
want high accuracy: you don't want typos. And
low latency: I don't want to speak and
-
then five seconds later have words appear.
It's too long, especially if I have to
-
make corrections. Right. And you want high
throughput, which we already talked about.
-
Oh, I forgot to mention Stephen Hawking
had like 15 words per minute. A normal
-
person speaks at about 150. So that's
a big difference. (laughs) The higher
-
throughput you can get, the better. And
for input accessibility, I think (and this
-
is not scientific, this is just what I've
learned from using and observing
-
many of these systems myself) it's
important to get completeness, consistency
-
and customization. For completeness I
mean: can I do any action? So
-
Professor Sang-Mook Lee's orange
mouth input device, the sip-and-puff, is
-
quite powerful, but it doesn't let him do
every action. For example, for some reason
-
when he gets an incoming call, the
input doesn't work. So he has to call over
-
a person physically to tap the accept call
button or the reject call button, which is
-
really annoying. Right. If you don't have
completeness, you can't be fully
-
independent. Consistency, very important
as well. The same way we develop motor
-
memory, or muscle memory, for a keyboard,
you develop memory for any types of
-
patterns that you do. But if the thing you
say or the thing you do keeps changing in
-
order to do the same action, that's not
good. And finally, customization. So the
-
learning curve for beginners is important
for any accessibility device, but
-
designing for expert use is almost more
important because anyone who uses an
-
accessibility interface becomes an expert
at it. The example I like to give is
-
screen readers like a blind person using a
screen reader on a phone. They will crank
-
up the speed at which the speech is being
produced. And I actually met someone who
-
made his speech 16 times faster than
normal human speech. I could not
-
understand it at all; it sounded like "brbrbrbr". But
he could understand it perfectly. And that's just
-
because he used it so much that he's
become an expert at its use. Let's analyze
-
ergonomic keyboards just for a moment,
because it's fun. You know, they are kind
-
of like a normal keyboard. You'll
have a slow pace when you're
-
starting to learn them. But once you're
good at it, you have very good accuracy,
-
instantaneous, low latency. Right? You
press the key, the computer receives it
-
immediately. And very high throughput: it's
as high as it is on a regular keyboard.
-
So they're actually fantastic
accessibility devices, right? They're
-
completely compatible with regular
keyboards. And if all you need is an
-
ergonomic keyboard, then you're in luck
because it's a very good accessibility
-
device. I'm going to talk about two
things, computers, but also Android
-
devices, so let's start with Android
devices. Yes, the built-in voice
-
recognition in Android is really
incredible. So even though the microphones
-
on the devices aren't great, Google has
just collected so much data from so many
-
different sources that they've built
better-than-human accuracy for their
-
voice recognition. The voice accessibility
interface is kind of so-so; we'll talk
-
about that in a bit. That's the interface
where you can control the Android device
-
entirely by voice. For other input
mechanisms, you could use a sip-and-puff
-
device, or you could use physical
styluses. That's something that I do a
-
lot, actually, because for me, my fingers
get sore. And if I can hold a stylus in my
-
hand and kind of not use my fingers, then
that's very effective. And the Elecom
-
styluses, from a Japanese company, are the
lightest I've found, and they don't require
-
a lot of force. So the ones at the top
there are like 12 grams, and the
-
one on the bottom is 4.7 grams. And you
need almost no force to use them. So, very
-
nice. On the left there you can see the
Android speech recognition is built into
-
the keyboard now. Right. You can just
press that and start speaking. It
-
supports different languages, and it's
very accurate, it's very nice. And
-
actually, when I was working at Google for
a bit, I talked to the speech recognition
-
team, like: "Why are you doing
server-side speech recognition? You should do
-
it on the devices." But of course, Android
devices are all very different,
-
and many of them are not very powerful. So
they were having trouble getting
-
satisfactory speech recognition on the
device. So for a long time, there was some
-
server latency, server lag: you do
speech recognition and you wait a bit. And
-
then sometime this year, I just was using
speech recognition and it became so much
-
faster. I was extremely excited and I
looked into it and yeah, they had just
-
switched on, at least on my device, the
on-device speech recognition
-
model. And so now it's incredibly fast and
also incredibly accurate. I'm a huge fan
-
of it. On the right-hand side, we can
actually see the Voice Access interface.
-
So this is meant to allow you to use a
phone entirely by voice. Again, while I
-
was at Google, I tried the beta
version before it was publicly released,
-
and I was like, this is pretty bad, mostly
because it lacked completeness.
-
There would be things on the screen that
would not be selectable. So here we see "show
-
labels". And then I can say, like, four,
five, six, whatever, to tap on that
-
thing. But as you can see at the bottom,
there's a Twitter web app link, and
-
there's no number on it. So if I want to
click on that, I'm out of luck. And this
-
is actually a problem in the design of the
accessibility interface: it
-
doesn't expose the full DOM. It exposes
only a subset of it. And so an
-
accessibility mechanism can't ever see
those other things. And furthermore, the
-
way the Google speech recognition works,
they have to reestablish a new connection
-
every 30 seconds. And if you're in the
middle of speaking, it will just throw
-
away whatever you were saying because it
just decided it had to reconnect, which is
-
really unfortunate. They later released
that publicly and then sometime this year
-
they did the update, which is pretty nice.
It now has a mouse grid, which
-
solves a lot of the completeness
problems. You can use a grid
-
to narrow down somewhere on the screen and
then tap there. But the server issues and
-
the expert use are still not good. If
I want to do
-
something with the mouse grid, I have to
say "mouse grid on. 6. 5. mouse grid off".
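The mouse grid finds a target by repeatedly choosing a cell. The minimal sketch below assumes a 3x3 grid numbered 1 to 9 row by row, like a phone keypad. The talk does not say how Voice Access actually lays out or numbers its grid. The sketch shows how a spoken sequence such as "6, 5" maps to a tap point, and it also shows what chaining would look like: the whole sequence is handled as one utterance. `Rect`, `refine` and `tap_point` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float


def refine(rect: Rect, cell: int) -> Rect:
    """Zoom into one cell of a 3x3 grid numbered 1-9, left to right, top to bottom."""
    if not 1 <= cell <= 9:
        raise ValueError(f"grid cell must be 1-9, got {cell}")
    row, col = divmod(cell - 1, 3)
    w, h = rect.w / 3, rect.h / 3
    return Rect(rect.x + col * w, rect.y + row * h, w, h)


def tap_point(screen: Rect, cells: Iterable[int]) -> Tuple[float, float]:
    """Apply each spoken cell number in turn and return the centre of the final cell."""
    rect = screen
    for cell in cells:
        rect = refine(rect, cell)
    return (rect.x + rect.w / 2, rect.y + rect.h / 2)


# One chained utterance such as "grid 6 5" instead of separate commands.
print(tap_point(Rect(0, 0, 900, 900), [6, 5]))  # (750.0, 450.0)
```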
-
And I can't combine those together. So
there's a lot of latency and it's not
-
really that fun to use, but better than
nothing? Absolutely! I just want to really
-
briefly show you as well that this same
feature of being able to select links
-
on a screen is available on desktops. This
is a plug-in for Chrome called Vimium. And
-
it's very powerful because you can then
combine this with keyboards or other input
-
mechanisms. And this one is complete. It
uses the entire DOM and anything you can
-
click on will be highlighted. So very
nice. I just want to give a quick example
-
of me using some of these systems. So I've
been trying to learn Japanese and there's
-
a couple of highly regarded websites for
this, but they're not consistent. When I
-
use the browser's show-labels feature, the label for, you
know, the thing to press for next page or
-
something like that, or "I
give up", or whatever it is, keeps
-
changing. So the letters that are being
used keep changing. And that's because of
-
the dynamic way that they're generating
the HTML. So not really very useful. What
-
I do instead is I use a program called
Anki and that has very simple shortcuts in
-
its desktop app: one, two, three, four. So
it's nice to use and consistent, and it
-
syncs with an Android app, and then I can
use my stylus on the Android device. So it
-
works pretty well. But even so, as you can
see from the chart in the bottom there,
-
there are many days when I can't use this,
even though I would like to, because I've
-
overused my hands or overused my voice.
When I'm using voice recognition all day,
-
every day, I do tend to lose my voice. And
as you can see from the graph, sometimes I
-
lose it for a week or two at a time. So
same thing with any accessibility
-
interface, you know, you've got to use
many different techniques, and
-
it's never perfect; it's just the best you
can do at that moment. Something else I
-
like to do is read books. I read a lot of
books and I love e-book readers, the
-
dedicated e-ink displays. You can read them
in sunlight, and they last forever
-
battery-wise. Unfortunately, it's hard to add other
input mechanisms to them. They don't have
-
microphones or other sensors and you can't
really install custom software on them.
-
But Android-based devices (and there
are also e-book reading apps for
-
Android devices) have everything: you
can install custom software, and they have
-
microphones and many other sensors. So I
made two apps that let you turn pages
-
in an e-book reader hands-free. The first
one is Voice Next Page. It's based on one
-
of my speech recognition engines called
Silvius, and it does do server-based
-
recognition. So you have to capture all
the audio, use 300 kilobits a second to
-
send it to the server, and recognize things
like "next page", "previous page". However, it
-
doesn't cut out every 30 seconds. It keeps
going. So that's one win for it, I
-
guess. And it is published in the Play
Store. Huge thanks to Sarah Leventhal, who
-
did a lot of the implementation. Very
complicated to make an accessibility app
-
on Android. But we persevered and it works
quite nicely. So I'm going to actually
-
show you an example of voice next page.
This over here is my phone, on the
-
left-hand side, just captured so that you guys
can see it. So here's the Voice Next Page.
-
And basically the connection is green:
the server is up and running and
-
so on. I just press start, and then I'll
switch to an Android reading app and say
-
"next page", "previous page". I won't speak
otherwise, because it will capture
-
everything I'm saying.
-
Next Page
-
Next Page
-
Previous Page
-
Center
-
Center
-
Foreground
-
Stop listening
-
So that's a demo of
Voice Next Page, and it's
-
extremely helpful. I built it a couple of
years ago along with Sarah, and I use it a
-
lot. So, yeah, you can go ahead and
download it if you guys wanna try it out.
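Because the app hears everything, it has to pick page commands out of the running transcript and ignore all other speech. The minimal sketch below shows that keyword-spotting step. It assumes the recognizer delivers plain-text transcripts. The talk describes neither the actual Silvius output format nor how Voice Next Page turns the page on Android, so the actions here are placeholder callbacks. `make_dispatcher` is a hypothetical helper, and the phrases are the ones spoken in the demo.

```python
import re
from typing import Callable, Dict, List


def make_dispatcher(actions: Dict[str, Callable[[], None]]) -> Callable[[str], List[str]]:
    """Build a function that finds known command phrases in a transcript and runs them."""
    # Longest phrases first, so a longer command wins over a shorter one it contains.
    phrases = sorted(actions, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b")

    def dispatch(transcript: str) -> List[str]:
        fired = [m.group(1) for m in pattern.finditer(transcript.lower())]
        for phrase in fired:
            actions[phrase]()  # placeholder: a real app would turn the page here
        return fired

    return dispatch


turns: List[int] = []
dispatch = make_dispatcher({
    "next page": lambda: turns.append(+1),
    "previous page": lambda: turns.append(-1),
})
dispatch("next page")
dispatch("um, where was I, previous page")
dispatch("I was just talking about something else")  # ignored
print(turns)  # [1, -1]
```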
-
And the other one is called Blink Next
Page. So I got the idea for this
-
from a research paper this year that
was studying eyelid gestures. I didn't use
-
any of their code, but it's a great idea.
So the way this works is you detect blinks
-
by using the Android camera and then you
can trigger an action like turning pages
-
in an e-book reader. This actually doesn't
need any networking. It's able to use the
-
on-device face recognition models from
Google, and it is still under development.
-
So it's not on the Play Store yet, but it
is working. And, you know, please contact
-
me if you want to try it. So just give me
one moment to set that demo up here. So
-
I'm going to use... The main problem with
this current implementation is that it
-
uses two devices. That was easier to
implement, and I use two devices anyway.
-
But obviously I want a one-device version
if I'm actually going to use it for
-
anything. So here's how this works. This
device I point at me, at my eyes; the
-
other device I put wherever it's
convenient to read (oops, sorry). And if I blink
-
my eyes, the phone will buzz once it
detects that I blinked, and it will
-
turn the page automatically on the other
Android device. Now, I have to blink both
-
my eyes for half a second. If I want to go
backwards, I can blink just my left eye.
-
And if I want to go forwards like quickly,
I can blink my right eye and hold it. (background buzzing)
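The gestures described here can be sketched as a small state machine over per-frame eye-open probabilities. Google's ML Kit face detection, for example, can report a left-eye and a right-eye open probability when classification is enabled. This is a minimal sketch, not Blink Next Page's actual code. Only the half-second for a two-eye blink comes from the talk. The probability threshold, the wink duration and the repeat interval are hypothetical, as are the names `eye_state` and `BlinkClassifier`.

```python
from dataclasses import dataclass
from typing import List

CLOSED_BELOW = 0.3  # hypothetical: eye-open probability below this counts as closed
BOTH_HOLD_S = 0.5   # both eyes closed this long -> next page (duration from the talk)
WINK_MIN_S = 0.2    # hypothetical: shortest left-eye wink that counts as "previous"
REPEAT_S = 0.5      # hypothetical: page-flip interval while the right eye is held shut


def eye_state(left_open: float, right_open: float) -> str:
    """Collapse two per-eye open probabilities into one of four states."""
    left_closed = left_open < CLOSED_BELOW
    right_closed = right_open < CLOSED_BELOW
    if left_closed and right_closed:
        return "both"
    if left_closed:
        return "left"
    if right_closed:
        return "right"
    return "open"


@dataclass
class BlinkClassifier:
    state: str = "open"
    since: float = 0.0  # timestamp (seconds) when the current state began
    fired: int = 0      # page turns already emitted during the current state

    def update(self, t: float, left_open: float, right_open: float) -> List[str]:
        """Feed one camera frame; return any page actions it triggers."""
        new_state = eye_state(left_open, right_open)
        if new_state != self.state:
            actions = []
            # A left wink only counts once both eyes are open again, so the
            # first frames of a two-eye blink are not mistaken for "previous".
            if self.state == "left" and new_state == "open" and t - self.since >= WINK_MIN_S:
                actions.append("previous")
            self.state, self.since, self.fired = new_state, t, 0
            return actions
        held = t - self.since
        if self.state == "both" and held >= BOTH_HOLD_S and self.fired == 0:
            self.fired = 1
            return ["next"]
        if self.state == "right":
            # Holding the right eye shut keeps flipping forward every REPEAT_S seconds.
            due = int(held // REPEAT_S)
            actions = ["next"] * max(0, due - self.fired)
            self.fired = max(self.fired, due)
            return actions
        return []


frames = [(0.0, .9, .9), (0.1, .1, .1), (0.7, .1, .1), (0.8, .9, .9),  # both eyes ~0.6 s
          (1.0, .1, .9), (1.4, .9, .9),                                # left wink
          (2.0, .9, .1), (2.6, .9, .1), (3.1, .9, .1), (3.2, .9, .9)]  # right eye held
clf = BlinkClassifier()
print([a for f in frames for a in clf.update(*f)])  # ['next', 'previous', 'next', 'next']
```

Whether "left" means the user's left eye or the left side of the image depends on the camera and the detector, so that mapping needs checking on a real device. Lighting, as noted below, directly affects how reliable the probabilities are.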
-
Anyway, it does have some false positives.
That's why you can go backwards, in
-
case it mistakenly detects a blink and
flips the page. And lighting is also
-
very important. Like if I have a light
behind me, then this is not going to be
-
able to identify whether my eyes are open
or closed properly. So it has some
-
limitations, but very simple to use. So
I'm a big fan. OK, so that's enough about
-
Android devices, let's talk very briefly
about desktop computers. So if you're
-
going to use a desktop computer, of
course, try using that show labels plugin
-
in a browser. For native apps you can try
Dragon NaturallySpeaking, which is fine if
-
you're just like using basic things. But
if you're trying to do complicated things,
-
you should definitely use a voice coding
system. You could also consider using eye
-
tracking to replace a mouse. I personally
don't use that; I find it hurts my eyes,
-
but I do use a trackball that needs very little
force, and a Wacom tablet. Some people will
-
even scroll up and down by humming, for
example, but I don't have that setup.
-
There's a bunch of nice talks out there on
voice coding. The top left is Tavis Rudd's
-
talk from many years ago that got many of
us interested. Emily Shea gave a talk
-
there about best practices for voice
coding. And then I gave a talk a couple of
-
years ago at the HOPE 11 conference, which
you can also check out. It's mostly out of
-
date by now, but it's still interesting.
So there are a lot of voice coding
-
systems, the sort of grandfather of them
all is Dragonfly. It's become a grammar
-
standard. With Caster, if you're willing to
memorize lots of unusual words, you can
-
become much better, much faster than I
currently am at voice coding. Aenea is how
-
you originally used Dragon to work on a
Linux machine, for example, because Dragon
-
only runs on Windows. Talon is a
closed-source program, but it's very
-
powerful. It has a big user base, especially
on macOS, and there are ports now. And Talon
-
used to use Dragon, but it's now using a
speech system from Facebook. Silvius is
-
the system that I created. The models are
not very accurate, but it's a nice
-
client-server architecture,
so it makes it easy to build things like
-
Voice Next Page. So Voice Next Page
was using Silvius. And then the most
-
recent one, I think, on this list is
kaldi-active-grammar, which is extremely
-
powerful and extremely customizable. And
it's also open source. It works on all
-
platforms. So I really highly recommend
that. So let's talk a bit more about
-
kaldi-active-grammar. But first, for voice
coding, as I've already mentioned, you have
-
to be careful how you use your voice,
right? Breathe from your belly. Don't
-
tighten your muscles and breathe from your
chest. Try to speak normally. And I'm not
-
particularly good at this. You'll
hear, when I'm speaking commands, that my
-
inflection changes. So I do tend to
overuse my voice, but you just have to be
-
conscious of that. The microphone hardware
does matter. I do recommend something like a Blue
-
Yeti on a microphone arm that you can pull
and put close to your face, like this. I
-
will use this one for my speaking demo.
Yeah. And the other thing is, your
-
grammar is fully customizable. So if you
keep saying a word and the system doesn't
-
recognize it, just change it to another
word. And it's complete in the sense that you
-
can type any key on the keyboard. And the
most important thing for expert use, or
-
customizability, is that you can do
chaining. So with the voice coding system,
-
you can say multiple commands at once,
and it's a huge time saver;
-
you'll see what I mean when I give a quick
demo. When I do voice coding, I'm a very
-
heavy vim and tmux user. You know,
I've worked with many people
-
before, so I have some cheat sheet
information there. So if you're
-
interested, you can go check that out. But
yeah, let's just do a quick demo of voice
-
coding here. "Turn this mic on". "Desk left
two". "Control delta", "open new terminal".
-
"Charlie delta space slash tango mike papa
enter". "Command vim". "Hotel hotel point
-
charlie papa papa, enter". "India, hash
word include space langel", "india oscar word
-
stream rangel, enter, enter", "india noi
tango space word mean", "no mike arch india
-
noi space len ren space lace enter enter
race up tab word print fox scratch nope code
-
standard charlie oscar uniform tango space
langel langel space quote. Sentence hello,
-
voice coding bang, scratch six delta india
noi golf, bang, backslash, noi quote
-
semicolon act sky fox mike romeo noi oscar
word return space number zero semicolon
-
act vim save and quit. Golf plus plus
space hotel hotel tab minus oscar space
-
hotel hotel enter. Point slash hotel hotel
enter. Desk right. So that's just a quick
-
example of voice coding. You can use it to
write any programming language; you can use
-
it to control anything on your desktop.
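Chaining works because each spoken word maps to a small piece of output, so one utterance can carry many commands. The minimal sketch below illustrates the idea with a tiny made-up vocabulary. The letter words follow the NATO alphabet plus the speaker's "arch" and "noi". The symbol words ("langel", "len", "lace" and so on) are inferred from the demo above and may not match his actual grammar. Real Dragonfly-style grammars are far richer. `LETTERS`, `SYMBOLS` and `expand` are hypothetical.

```python
from typing import Dict, List

LETTERS: Dict[str, str] = {
    "arch": "a", "charlie": "c", "delta": "d", "hotel": "h", "india": "i",
    "mike": "m", "noi": "n", "oscar": "o", "papa": "p", "tango": "t",
}
SYMBOLS: Dict[str, str] = {  # meanings inferred from the demo, not from the real grammar
    "space": " ", "enter": "\n", "slash": "/", "hash": "#", "point": ".",
    "bang": "!", "semicolon": ";", "langel": "<", "rangel": ">",
    "len": "(", "ren": ")", "lace": "{", "race": "}",
}


def expand(utterance: str) -> str:
    """Turn one chained utterance into the text it would type."""
    out: List[str] = []
    tokens = utterance.lower().split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "word" and i + 1 < len(tokens):
            out.append(tokens[i + 1])  # "word X" types X literally
            i += 2
            continue
        if tok in LETTERS:
            out.append(LETTERS[tok])
        elif tok in SYMBOLS:
            out.append(SYMBOLS[tok])
        else:
            raise ValueError(f"not in grammar: {tok!r}")
        i += 1
    return "".join(out)


print(repr(expand("charlie delta space slash tango mike papa enter")))  # 'cd /tmp\n'
print(expand("hash word include space langel india oscar word stream rangel"))
# #include <iostream>
```

Unknown words raise an error instead of being typed, so a misrecognition shows up immediately. A real recognizer would normally constrain its output to the grammar's words in the first place.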
It's very powerful. It has a bit of a
-
learning curve, but it's very powerful. So
the creator of kaldi-active-grammar is
-
also named David. I'm named David, but
that's just a coincidence. And he says of
-
kaldi-active-grammar: "I haven't typed with
a keyboard in many years, and
-
kaldi-active-grammar is bootstrapped in that I
have been developing it entirely using
-
previous versions of it." So, David has a
medical condition that means he has very
-
low dexterity, so it's hard for him to use
a keyboard. And yet he basically got
-
kaldi-active-grammar working by the
skin of his teeth or something, and he
-
continues to develop it using it. And
yeah, I'm a huge fan of the project. I
-
haven't contributed much, but I did give
some of the hardware resources like GPU
-
and CPU compute resources to allow
training to happen. But I would also like
-
to show you a video of David using kaldi-
active-grammar, just, so you can see it as
-
well. So, the other thing about David is
that he has a speech impediment or a
-
speech, I don't know, an accent or
whatever. So it's difficult for a
-
normal speech recognition system to
understand him. And you might have trouble
-
understanding him here. But you can see in
the lower right what the speech system
-
understands him to be saying. Oh, I
realize that I do need to switch
-
something in OBS, so that you guys can
hear it. Sorry. There you go.
-
(Video: the other David using the kaldi-active-grammar system; speech not understandable)
-
There, you get the idea, and hopefully you
-
guys were able to hear that. If not, you
can also find this on the website that I'm
-
going to show you at the end. One other
thing I want to show you about this is that
-
David has actually set up humming to
scroll, which I think is pretty cool. Of
-
course, I've gone and turned off the audio in OBS
there. But he's just doing "hmmm" and it's
-
understanding that and scrolling down. So,
it's something that I do with my
-
trackball, but he's using his voice for it,
so pretty cool. So I'm almost done here.
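The talk does not show how David Zurow's hum-to-scroll is implemented. The sketch below, which is purely an assumption, shows one plausible approach: treat a hum as a sustained sound that is loud enough and has a clear low pitch. Each short audio frame is checked for energy and for strong autocorrelation at a period corresponding to 80 to 300 Hz. Scrolling starts after several consecutive humming frames. The sample rate, thresholds and names (`dominant_pitch`, `HumScroller`) are hypothetical. A real system would use an audio library and tuned values.

```python
import math
from typing import Sequence

SAMPLE_RATE = 16000  # assumed microphone sample rate (Hz)


def rms(frame: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def dominant_pitch(frame: Sequence[float], fmin: float = 80.0, fmax: float = 300.0,
                   min_strength: float = 0.5) -> float:
    """Return the pitch (Hz) of the strongest periodicity in [fmin, fmax], or 0.0 if none."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    best_lag, best_corr = 0, 0.0
    for lag in range(int(SAMPLE_RATE / fmax), min(int(SAMPLE_RATE / fmin), len(frame) - 1) + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    # Require the repetition to carry a large share of the frame's energy (a tonal sound).
    if best_lag == 0 or best_corr / energy < min_strength:
        return 0.0
    return SAMPLE_RATE / best_lag


class HumScroller:
    """Report True (scroll) once enough consecutive frames look like humming."""

    def __init__(self, min_rms: float = 0.05, min_frames: int = 5) -> None:
        self.min_rms = min_rms
        self.min_frames = min_frames
        self.run = 0  # consecutive humming frames seen so far

    def process(self, frame: Sequence[float]) -> bool:
        humming = rms(frame) >= self.min_rms and dominant_pitch(frame) > 0.0
        self.run = self.run + 1 if humming else 0
        return self.run >= self.min_frames


hum = [0.5 * math.sin(2 * math.pi * 150 * n / SAMPLE_RATE) for n in range(512)]
scroller = HumScroller()
print([scroller.process(hum) for _ in range(6)])  # [False, False, False, False, True, True]
print(scroller.process([0.0] * 512))              # False
```

Requiring several frames in a row is a simple guard against brief noises triggering a scroll. It also means scrolling starts a fraction of a second after the hum begins.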
-
In summary, good input accessibility means
you need completeness, consistency and
-
customization. You need to be able to do
any action that you could do with the
-
other input mechanisms. And doing the same
input should have the same action. And
-
remember, your users will become experts,
so the system needs to be designed for
-
that. For e-book reading, I'm trying
to allow anyone to read, even if they're
-
experiencing some severe physical or motor
impairment, because I think that gives you
-
a lot of power to be able to turn the
pages and read your favorite books. And
-
for speech recognition, yeah, Android
speech recognition is very good. Silvius's
-
accuracy is not so good, but it's easy to
use quickly for experimentation and to
-
make other types of things like Voice Next
Page. And please do check out kaldi-
-
active-grammar if you have some serious
need for voice recognition. Lastly, I put
-
all of this onto a website, voxhub.io, so
you can see Voice Next Page, Blink Next
-
Page, kaldi-active-grammar and so on, just
instructions for how to use it and how to
-
set it up. So please do check that out.
And tons of acknowledgments, lots of
-
people that have helped me along the way,
but I want to especially call out
-
Professor Sang-Mook Lee, who actually
invited me to Korea a couple of times to
-
give talks - a big inspiration. And of
course, David Zurow, who has actually been
-
able to bootstrap into a fully voice
coding environment. So that's all I have
-
for today. Thank you very much.
-
Herald: Alright, I suppose I'm back on the
air, so let me see. I want to remind
-
everyone before we go into the Q&A that
you can ask your questions for this talk
-
on IRC, the link is under the video, or
you can use Twitter or the Fediverse with
-
the hashtag #rc3two. Again, I'll hold it
up here, "rc3two".
-
Thanks for your talk, David. That was
really interesting. I think we have a couple
-
of questions from the Signal Angels.
-
Before that, I just wanted to say I've
recently spent some time playing with
-
the VoiceOver system in iOS, and that
can now actually tell you what is in a
-
photo, which is kind of amazing. Oh, by
the way, I can't hear you here on the
-
Mumble.
David: Yeah. Sorry, I wasn't saying
-
anything. Yeah, so I focused
mostly on input accessibility, right?
-
Which is like how do you get data to the
computer. But there's been huge
-
improvements in the other way around as
well, right? The computer doing VoiceOver
-
things.
Herald: So we have about let's see,
-
five or six minutes left, at least, for Q&A. We
have a question by Toby++, he asks: "Your
-
next page application looks cool. Do you
have statistics of how many people use it
-
or found it on the App Store?"
David: Not very many. The Voice Next Page
-
was advertised only so far as a little
academic poster. So I've gotten a few
-
people to use it. But I run eight
concurrent workers and we've never hit
-
more than that. (laughs) So not super popular,
but I do hope that some people will see it
-
because of this talk and go check it out.
Herald: That's cool. Next question. How
-
error-prone are the speech recognition
systems in general? E.g., can you do coding
-
while doing workouts?
David: So one thing about speech
-
recognition is that it's very sensitive to the
microphone, so when you're doing it...
-
(Technical malfunction. We'll be back soon.)
-
David (cont.): Any mistakes, right?
-
That's the thing about having low latency,
you just say something and you watch it
-
and you make sure that it was what you
wanted to say. I don't know exactly how
-
many words per minute I can say with voice
coding, but I can say it much faster than
-
regular speech. So I'd say at least like
200, maybe 300 words per minute.
-
So it's actually a very high bandwidth
mechanism.
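The throughput figures quoted in the talk can be compared with a little arithmetic. This sketch assumes the common typing-speed convention of five characters per word, which is how 100 words per minute becomes 500 characters per minute. The rates are the ones quoted in the talk, and the names are illustrative.

```python
CHARS_PER_WORD = 5  # common typing-speed convention: one "word" = 5 characters


def minutes_to_spend(budget_chars: int, words_per_minute: float) -> float:
    """How long a daily keystroke budget lasts at a given typing speed."""
    return budget_chars / (words_per_minute * CHARS_PER_WORD)


RATES_WPM = {  # figures quoted in the talk
    "Stephen Hawking (switch + word prediction)": 15,
    "typing, fast programmer": 100,
    "normal speech": 150,
    "voice coding (speaker's estimate, low end)": 200,
}

print(minutes_to_spend(15_000, 100))  # 30.0, i.e. the "half an hour" of typing per day
for name, wpm in RATES_WPM.items():
    print(f"{name}: {wpm} wpm, {wpm / RATES_WPM['normal speech']:.1f}x normal speech")
```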
-
Herald: That's really awesome. A question from
peppyjndivos: "Any advice for software
-
authors to make their stuff more
accessible?"
-
David: There are good web accessibility
guidelines. So if you're just making a
-
website or something, I would definitely
follow those. They tend to be focused more
-
on people that are blind, because that is,
you know, more of an obvious failure:
-
they just can't interact at all with
your website. But things like, you know,
-
if Duolingo, for example, had used the
same accessibility
-
tag on their next button, then it
would always be the same letter for me, and
-
I wouldn't have to say Fox-Charlie,
Fox-Delta, Fox-something; it changes all the
-
time. So I think consistency is very
important. And integrating with any
-
existing accessibility APIs is also very
important: web APIs, Android APIs, and so
-
on, because, you know, we can't make every
program out there voice-compatible.
-
We just have to meet in the middle where
they interact at the keyboard layer or the
-
accessibility layer.
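One way a page or a hint tool could keep labels consistent is to derive each label from a stable identifier of the element, such as an accessible name or element id, rather than from the element's position on the page. The minimal sketch below assumes such identifiers exist and uses a hash to pick a two-letter hint from home-row letters. It is not how Vimium or Voice Access actually assign labels. `stable_hints` and the other names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable

HINT_CHARS = "asdfghjkl"       # home-row letters, a common choice for link hints
SLOTS = len(HINT_CHARS) ** 2   # 81 possible two-letter hints


def hint_for_slot(slot: int) -> str:
    first, second = divmod(slot, len(HINT_CHARS))
    return HINT_CHARS[first] + HINT_CHARS[second]


def stable_hints(element_keys: Iterable[str]) -> Dict[str, str]:
    """Map each element's stable key to a two-letter hint that does not depend on page order."""
    keys = sorted(set(element_keys))  # sorting makes collision handling order-independent
    if len(keys) > SLOTS:
        raise ValueError("too many elements for two-letter hints")
    used = set()
    hints = {}
    for key in keys:
        slot = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % SLOTS
        while slot in used:  # linear probing on the rare collision
            slot = (slot + 1) % SLOTS
        used.add(slot)
        hints[key] = hint_for_slot(slot)
    return hints


a = stable_hints(["next", "give up", "home"])
b = stable_hints(["home", "next", "give up"])  # same elements, different page order
print(a == b)  # True
```

A newly added element can still displace an existing hint when their hashes collide, so this approach reduces label churn rather than eliminating it.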
Herald: Awesome. AmericN has a question:
-
they wonder if these systems use
approaches similar to stenography, with mnemonics,
-
or if there are any projects working with
that in mind.
-
David: A very good question. So, the first
thing everyone uses is the NATO phonetic
-
alphabet to spell letters, for example,
Alpha, Bravo, Charlie. Some people then
-
will substitute shorter words for letters whose names
are too long, like November. I use noi.
-
Sometimes the speech system doesn't
understand you. Whenever I said Alpha,
-
Dragon was like, oh, you're saying
"offer". So I changed it. It's Arch for
-
me: Arch, Brav, Char. And also, most of
these grammars are in a common grammar
-
format. They are written in Python and
they're compatible with Dragonfly. So you
-
can grab a grammar for, I don't know, for
Aenea and get it to work with kaldi-
-
active-grammar with very little effort. I
actually have a grammar that works on both
-
Aenea and kaldi-active-grammar, and that's
what I use. So there's a bit of lingua
-
franca, I guess, you can kind of guess
what other people are using. But at the
-
same time there's a lot of customization,
you know, because people change words,
-
they add their own commands, they change
words based on what the speech system
-
understands.
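The substitutions described here (arch, brav, char, noi, or replacing "alpha" because Dragon kept hearing "offer") amount to editing a lookup table. The main thing to check is that no two letters end up sharing a spoken word and that no letter word collides with another command. The minimal sketch below assumes plain Python dictionaries rather than any particular grammar library. `build_alphabet`, `spell` and the reserved-word set are hypothetical.

```python
import string
from typing import Dict, Iterable

# Spellings such as "juliet" / "xray" vary between grammars.
NATO_WORDS = ("alpha bravo charlie delta echo foxtrot golf hotel india juliet kilo lima "
              "mike november oscar papa quebec romeo sierra tango uniform victor whiskey "
              "xray yankee zulu").split()
NATO: Dict[str, str] = dict(zip(string.ascii_lowercase, NATO_WORDS))

# Substitutions mentioned in the talk: shorter, or easier for the recognizer.
MY_OVERRIDES = {"a": "arch", "b": "brav", "c": "char", "n": "noi"}


def build_alphabet(overrides: Dict[str, str], reserved: Iterable[str] = ()) -> Dict[str, str]:
    """Return a spoken-word -> letter table, rejecting ambiguous choices."""
    reserved_words = set(reserved)
    spoken_to_letter: Dict[str, str] = {}
    for letter, word in {**NATO, **overrides}.items():
        if word in reserved_words:
            raise ValueError(f"{word!r} for {letter!r} clashes with a command word")
        if word in spoken_to_letter:
            raise ValueError(f"{word!r} used for both {spoken_to_letter[word]!r} and {letter!r}")
        spoken_to_letter[word] = letter
    return spoken_to_letter


def spell(utterance: str, table: Dict[str, str]) -> str:
    return "".join(table[w] for w in utterance.lower().split())


table = build_alphabet(MY_OVERRIDES, reserved={"space", "enter", "word"})
print(spell("mike arch india noi", table))  # main
```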
Herald: Alright, LEB asks, is there an online
-
community you can recommend for
accessibility technologies?
-
David: There's an amazing forum for anything
related to voice coding. All the
-
developers of new voice coding software
are there. Sorry, I just need to drink. So
-
it's a really fantastic resource. I do
link to it from voxhub.io. I believe it's
-
at the bottom of the kaldi-active-grammar
page. So you can definitely check that
-
out. For general accessibility, I don't
know, I could recommend the accessibility
-
mailing list at Google, but that's only if
you work at Google. Other than that, yeah,
-
I think it depends on your community,
right? I think if you're looking for web
-
accessibility, you could go for some
Mozilla mailing list and so on. If you're
-
looking for desktop accessibility, then
maybe you could go find some stuff about
-
the Windows Speech API. (unintelligible)
Herald: One last question from Joe Neilson.
-
Could there be legal issues if you make an
e-book into audio? I'm not sure what that
-
refers to.
David: Yeah. So if you are like doing, if
-
you're using a screen reader and you're
like, you try to get it to read out the
-
contents of an e-book, right? So
most of the time there are fair use
-
exceptions for copyright law, even in the
US, and making a copy yourself for
-
personal purposes so that you can access
it is usually considered fair use. If you
-
were trying to commercialize it or make
money off of that or like, I don't know,
-
you're a famous streamer and all you do is
highlight text and have it read it out,
-
then maybe, but otherwise I would say it
definitely falls under fair use.
-
Herald: Alright. So I guess that's it for
the talk. I think we're hitting the timing
-
mark really well. Thank you so much,
David, for that. That was really, really
-
interesting. I learned a lot and thanks
everyone for watching and stay on. I think
-
there might be some news coming up. Thanks,
everyone.
-
rc3 postroll music
-
Subtitles created by c3subtitles.de
in the year 2020. Join, and help us!