WEBVTT 00:00:06.361 --> 00:00:08.084 Marita Cheng: When I was growing up, 00:00:08.085 --> 00:00:11.595 I had a family friend who became blind in his 20s. 00:00:12.438 --> 00:00:15.602 When we went out as a family, he would say to me, 00:00:15.603 --> 00:00:19.881 "Rita hold my hand, hold my arm, and tell me what you see." 00:00:19.882 --> 00:00:23.682 So I'd say, "There's some flowers here to the left. 00:00:23.683 --> 00:00:25.681 There's a gate here to the right. 00:00:25.682 --> 00:00:28.212 There's a mountain over in the distance." 00:00:28.213 --> 00:00:31.493 And he would say, "What color are those flowers? 00:00:32.403 --> 00:00:34.753 Can I use my hand, reach out, and touch them? 00:00:34.754 --> 00:00:36.322 Could you lead my hand to that?" 00:00:36.323 --> 00:00:39.622 I'd say, "Oh, they're pink, they're blue." 00:00:39.623 --> 00:00:42.833 And he'd say, "Tell me more, tell me more about what you can see. 00:00:42.834 --> 00:00:44.062 Share it with me." 00:00:44.063 --> 00:00:48.592 About eight months ago, Alberto and I decided to create an app 00:00:48.593 --> 00:00:52.663 to enable blind people to recognize their surroundings. 00:00:52.667 --> 00:00:55.959 We used something called convolutional neural networks, 00:00:55.972 --> 00:01:00.283 which is a computer system that's been trained on millions of images. 00:01:00.284 --> 00:01:04.242 It learns the features of a dog. It learns what a flower looks like. 00:01:04.244 --> 00:01:07.964 It learns a fork, a knife, everyday objects. 00:01:09.667 --> 00:01:14.418 Using this system, we created something called "Aipoly" 00:01:14.454 --> 00:01:17.734 that recognizes over 1,000 everyday objects. 00:01:17.735 --> 00:01:21.598 So, a blind person just needs to walk around with their phone, 00:01:21.599 --> 00:01:26.924 and put it over various objects, and it will say the name of the object. 
00:01:27.444 --> 00:01:32.773 Using VoiceOver, the phone can relay the word on the screen 00:01:32.774 --> 00:01:34.113 to that blind person, 00:01:34.114 --> 00:01:36.316 so they know exactly what's in front of them. 00:01:36.317 --> 00:01:38.193 (Applause) 00:01:46.103 --> 00:01:49.103 Since we released our application in January, 00:01:49.104 --> 00:01:52.634 we've had over 100,000 downloads around the world. 00:01:53.536 --> 00:01:58.444 The app has been so popular we've translated it into seven languages. 00:01:59.167 --> 00:02:02.376 Alberto Rizzoli: After experiencing the technology the first time, 00:02:02.377 --> 00:02:04.764 our users kept asking us for more. 00:02:05.555 --> 00:02:09.384 We asked them 00:02:09.386 --> 00:02:14.225 to think of our technology as a superpower for a moment, 00:02:14.226 --> 00:02:19.425 something they could effortlessly evoke at any time, 00:02:19.426 --> 00:02:22.346 and gain understanding of what was in front of them. 00:02:23.080 --> 00:02:25.705 And surprisingly, 00:02:25.706 --> 00:02:29.545 nobody really wants X-ray vision or telescopic goggles, 00:02:29.546 --> 00:02:33.166 but what everyone wants is more information. 00:02:33.167 --> 00:02:34.495 It's not surprising 00:02:34.496 --> 00:02:40.134 because 60% of the information that we perceive comes through sight. 00:02:40.135 --> 00:02:41.846 It is the main tool that we use 00:02:41.847 --> 00:02:45.994 to understand our surroundings and, often, to make decisions. 00:02:45.995 --> 00:02:50.356 If you're blind, you must rely on other senses like touch or hearing, 00:02:50.357 --> 00:02:53.465 and you miss out on the lightning-fast identification 00:02:53.466 --> 00:02:58.717 that our brain and eyes do every second of every day, if you're a sighted person. 00:02:59.077 --> 00:03:02.154 We went 00:03:02.155 --> 00:03:05.024 to the Santa Clara Valley Blind Center, 00:03:05.025 --> 00:03:07.606 and we tried to build this superpower. 
00:03:07.607 --> 00:03:10.577 We tried to see what kind of information people wanted, 00:03:10.578 --> 00:03:13.565 and it's simple things like whether a dish is clean or not, 00:03:13.566 --> 00:03:17.655 whether you can cross the street, what product am I looking at? 00:03:17.656 --> 00:03:19.396 Things that can lead to a decision, 00:03:19.397 --> 00:03:23.797 from a simple gaze, to an understanding of the situation in front of you. 00:03:24.557 --> 00:03:28.887 We asked what form factor people preferred, and we built it. 00:03:29.707 --> 00:03:31.485 So we put together 00:03:31.486 --> 00:03:37.147 some jawbone conductive headphones, a pair of sunglasses, and a tiny camera, 00:03:38.177 --> 00:03:42.046 and we asked our friends to think of a common situation 00:03:42.047 --> 00:03:45.056 in which they had to make many small decisions, 00:03:45.057 --> 00:03:47.857 and we told them we will be giving them the prototype, 00:03:47.858 --> 00:03:50.166 and taking them in the middle of that situation. 00:03:50.167 --> 00:03:52.003 Let's see how it went. (Video starts) 00:03:52.004 --> 00:03:53.351 [We asked blind individuals 00:03:53.352 --> 00:03:56.089 [what are the hardest things to do when visually impaired] 00:03:56.090 --> 00:03:58.656 I mean it takes me forever to go grocery-shopping. 00:03:58.657 --> 00:04:02.316 Even with someone helping me that has known me for years. 00:04:02.317 --> 00:04:05.827 I'll say, "What's in that cabinet?" 00:04:07.097 --> 00:04:10.397 Or develop a system, right to left, top to bottom. 00:04:10.398 --> 00:04:13.058 [So we took them grocery shopping with our technology] 00:04:13.358 --> 00:04:14.728 Computer: Oranges. 00:04:16.048 --> 00:04:18.618 Man: This is great. I'm really liking it. 00:04:19.567 --> 00:04:23.237 Apples, grapes, carrots. 00:04:23.238 --> 00:04:25.626 I'm looking, I'm looking. 00:04:25.627 --> 00:04:27.576 Computer: Lilies. Man: Lilies. 00:04:27.577 --> 00:04:31.317 Computer: Bouquet. Man: A bouquet, ahh! 
00:04:31.318 --> 00:04:33.211 Computer: Roses, flowers. 00:04:33.212 --> 00:04:35.177 Man: Can I take these home? 00:04:35.178 --> 00:04:36.778 This is great. 00:04:36.792 --> 00:04:38.710 Computer: Roses. Man: Roses. 00:04:38.711 --> 00:04:41.367 Computer: Bouquet, tulips. Man: Tulips. 00:04:41.368 --> 00:04:44.019 Computer: Pineapple. Woman: It's a pineapple. 00:04:45.089 --> 00:04:46.830 Computer: Mango. Woman: Mango. 00:04:46.831 --> 00:04:48.389 Computer: M&Ms. Woman: M&Ms. 00:04:48.375 --> 00:04:50.750 Computer: Tic Tac. Woman: It just said, "Tic Tac." 00:04:50.759 --> 00:04:52.377 Computer: Tic Tac. Woman: Tic Tac. 00:04:52.378 --> 00:04:54.309 Computer: Paper note, calendar. 00:04:54.310 --> 00:04:55.719 Woman: Calendar, you got it. 00:04:56.329 --> 00:04:59.318 Wow, I didn't know what that was at all. 00:04:59.319 --> 00:05:01.288 Computer: Pretzels. Woman: Pretzels. 00:05:01.289 --> 00:05:03.609 Computer: Pretzels. Woman: It said, "Pretzels." 00:05:03.610 --> 00:05:05.409 Computer: Lipton tea. Woman 2: Lipton? 00:05:05.410 --> 00:05:06.589 Tea? 00:05:06.590 --> 00:05:08.548 Computer: Lipton teabags. 00:05:08.549 --> 00:05:12.209 Woman 2: It's like I'm seeing it, but I'm not, it's seeing it for me. 00:05:12.210 --> 00:05:14.971 Computer: Coffee mate. Woman 2: Mate; coffee mate. 00:05:16.281 --> 00:05:18.739 It didn't say "coffee," but it kept saying "mate." 00:05:18.740 --> 00:05:21.369 Computer: Mate, mate. 00:05:22.190 --> 00:05:24.731 Man 2: I put on the glasses, 00:05:24.732 --> 00:05:27.869 and right away, it told me there was an apple, there were oranges, 00:05:27.870 --> 00:05:32.561 and there was this, and there was that, and it's like, "This is great!" 00:05:33.420 --> 00:05:34.799 Instant love. 00:05:34.800 --> 00:05:35.966 (Video ends) 00:05:35.967 --> 00:05:37.301 (Applause) 00:05:45.251 --> 00:05:48.759 AR: That little pair of glasses connected to their phones 00:05:48.760 --> 00:05:52.589 could identify 4,000 to 5,000 objects in real time. 
00:05:52.590 --> 00:05:56.080 That's about the capacity of a five-year-old child. 00:05:57.440 --> 00:06:03.519 A simple accessory can now expand a person's perception 00:06:03.520 --> 00:06:05.900 to thousands of new possibilities. 00:06:06.630 --> 00:06:11.456 This is the power of marrying artificial and human intelligence, 00:06:12.316 --> 00:06:14.666 and the potential is still vastly untapped. 00:06:14.667 --> 00:06:18.209 This isn't going to be a revolution just because GPUs are getting faster, 00:06:18.210 --> 00:06:20.550 or the research is getting more open, 00:06:20.551 --> 00:06:24.250 but because the barriers of entry to impacting millions of lives 00:06:24.251 --> 00:06:27.472 for artificial intelligence are getting lower and lower. 00:06:28.142 --> 00:06:31.110 The Paralympic Games are starting in a few weeks, 00:06:31.111 --> 00:06:36.731 an event where sheer force of will, training, and technology 00:06:36.732 --> 00:06:40.122 turn people with a disability into superhumans. 00:06:40.532 --> 00:06:46.961 And so, too, will our ability to think, perceive, make decisions, and learn 00:06:46.962 --> 00:06:48.951 increase exponentially. 00:06:48.952 --> 00:06:52.472 You will be building the tools to make this happen. 00:06:54.032 --> 00:06:57.291 So tomorrow, with your morning coffee, 00:06:57.292 --> 00:07:00.652 take 40 minutes and try out a tutorial on deep learning. 00:07:01.212 --> 00:07:03.232 Build yourself a small superpower. 00:07:03.233 --> 00:07:05.902 All it takes is your laptop, and a bunch of data, 00:07:05.903 --> 00:07:07.763 like your holiday pictures. 00:07:08.063 --> 00:07:12.593 Superpower engineer - that's a great dream job. 00:07:14.073 --> 00:07:18.283 The good news is that the world needs many, many more of them 00:07:18.632 --> 00:07:22.303 so we can't wait to see what you will be building next. 00:07:22.603 --> 00:07:23.721 Thank you. 00:07:23.722 --> 00:07:25.114 (Applause)