1 00:00:00,760 --> 00:00:03,816 If you ask evolutionary biologists 2 00:00:03,840 --> 00:00:07,136 when did humans become humans, 3 00:00:07,160 --> 00:00:09,256 some of them will say that, 4 00:00:09,280 --> 00:00:12,176 well, at some point we started standing on our feet, 5 00:00:12,200 --> 00:00:15,520 became biped and became the masters of our environment. 6 00:00:16,880 --> 00:00:22,176 Others will say that because our brain started growing much bigger, 7 00:00:22,200 --> 00:00:25,280 that we were able to have much more complex cognitive processes. 8 00:00:26,680 --> 00:00:31,056 And others might argue that it's because we developed language 9 00:00:31,080 --> 00:00:33,360 that allowed us to evolve as a species. 10 00:00:34,960 --> 00:00:39,496 Interestingly, those three phenomena are all connected. 11 00:00:39,520 --> 00:00:42,296 We are not sure how or in which order, 12 00:00:42,320 --> 00:00:43,856 but they are all linked 13 00:00:43,880 --> 00:00:47,640 with the change of shape of a little bone in the back of your neck 14 00:00:48,840 --> 00:00:52,000 that changed the angle between our head and our body. 15 00:00:52,760 --> 00:00:55,816 That means we were able to stand upright 16 00:00:55,840 --> 00:00:58,976 but also for our brain to evolve in the back 17 00:00:59,000 --> 00:01:04,616 and for our voice box to grow from seven centimeters for primates 18 00:01:04,640 --> 00:01:07,760 to 11 and up to 17 centimeters for humans. 19 00:01:09,080 --> 00:01:11,480 And this is called the descent of the larynx. 20 00:01:12,280 --> 00:01:14,240 And the larynx is the site of your voice. 21 00:01:16,160 --> 00:01:20,320 When baby humans are born today, their larynx is not descended yet. 22 00:01:21,160 --> 00:01:23,480 That only happens at about three months old. 23 00:01:25,040 --> 00:01:27,296 So, metaphorically, each of us here 24 00:01:27,320 --> 00:01:30,560 has relived the evolution of our whole species.
25 00:01:32,880 --> 00:01:34,136 And talking about babies, 26 00:01:34,160 --> 00:01:36,800 when you were starting to develop in your mother's womb, 27 00:01:37,600 --> 00:01:41,816 the first sensation that you had coming from the outside world, 28 00:01:41,840 --> 00:01:45,776 at only three weeks old, when you were about the size of a shrimp, 29 00:01:45,800 --> 00:01:48,496 was through the tactile sensation 30 00:01:48,520 --> 00:01:51,320 coming from the vibrations of your mother's voice. 31 00:01:52,440 --> 00:01:57,416 So, as we can see, the human voice is quite meaningful and important 32 00:01:57,440 --> 00:02:00,416 at the level of the species, 33 00:02:00,440 --> 00:02:01,976 at the level of the society -- 34 00:02:02,000 --> 00:02:04,936 this is how we communicate and create bonds, 35 00:02:04,960 --> 00:02:08,416 and at the personal and interpersonal levels -- 36 00:02:08,440 --> 00:02:11,576 with our voice, we share much more than words and data, 37 00:02:11,600 --> 00:02:13,736 we share basically who we are. 38 00:02:13,760 --> 00:02:18,136 And our voice is indistinguishable from how other people see us. 39 00:02:18,160 --> 00:02:20,080 It is a mask that we wear in society. 40 00:02:22,000 --> 00:02:24,810 But our relationship with our own voice is far from obvious. 41 00:02:25,840 --> 00:02:30,616 We rarely use our voice for ourselves; we use it as a gift to give to others. 42 00:02:30,640 --> 00:02:33,816 It is how we touch each other. 43 00:02:33,840 --> 00:02:35,776 It's a dialectical grooming. 44 00:02:35,800 --> 00:02:38,216 But what do we think about our own voice? 45 00:02:38,240 --> 00:02:39,976 So please raise your hand 46 00:02:40,000 --> 00:02:44,016 if you don't like the sound of your voice when you hear it on a recording machine. 47 00:02:44,040 --> 00:02:45,256 (Laughter) 48 00:02:45,280 --> 00:02:46,496 Yeah, thank you, indeed, 49 00:02:46,520 --> 00:02:50,416 most people report not liking the sound of their voice recording.
50 00:02:50,440 --> 00:02:51,656 So what does that mean? 51 00:02:51,680 --> 00:02:54,376 Let's try to understand that in the next 10 minutes. 52 00:02:54,400 --> 00:02:57,496 I'm a researcher at the MIT Media Lab, 53 00:02:57,520 --> 00:02:59,360 part of the Opera of the Future group, 54 00:03:00,200 --> 00:03:03,256 and my research focuses on the relationship 55 00:03:03,280 --> 00:03:06,360 people have with their own voice and with the voices of others. 56 00:03:07,760 --> 00:03:11,816 I study what we can learn from listening to voices, 57 00:03:11,840 --> 00:03:13,216 from the various fields, 58 00:03:13,240 --> 00:03:17,120 from neurology to biology, cognitive sciences, linguistics. 59 00:03:18,680 --> 00:03:21,736 In our group we create tools and experiences 60 00:03:21,760 --> 00:03:26,496 to help people gain a better applied understanding of their voice 61 00:03:26,520 --> 00:03:29,576 in order to reduce the biases, 62 00:03:29,600 --> 00:03:31,696 to become better listeners, 63 00:03:31,720 --> 00:03:34,696 to create more healthy relationships 64 00:03:34,720 --> 00:03:36,720 or just to understand themselves better. 65 00:03:38,240 --> 00:03:42,520 And this really has to come with a holistic approach on the voice. 66 00:03:43,640 --> 00:03:47,256 Because, think about all the applications and implications 67 00:03:47,280 --> 00:03:50,160 that the voice may have, as we discover more about it. 68 00:03:51,160 --> 00:03:54,296 Your voice is a very complex phenomenon. 69 00:03:54,320 --> 00:03:57,640 It requires a synchronization of more than 100 muscles in your body. 70 00:03:58,480 --> 00:04:00,856 And by listening to the voice, 71 00:04:00,880 --> 00:04:05,656 we can understand possible failures of what happens inside. 
72 00:04:05,680 --> 00:04:06,880 For example: 73 00:04:07,840 --> 00:04:11,336 listening to very specific types of turbulences 74 00:04:11,360 --> 00:04:13,656 and nonlinearity of the voice 75 00:04:13,680 --> 00:04:17,136 can help predict very early stages of Parkinson's, 76 00:04:17,160 --> 00:04:18,560 just through a phone call. 77 00:04:19,519 --> 00:04:22,140 Listening to the breathlessness of the voice 78 00:04:22,170 --> 00:04:23,950 can help detect heart disease. 79 00:04:25,880 --> 00:04:30,496 And we also know that the changes of tempo inside individual words 80 00:04:30,520 --> 00:04:32,800 are a very good marker of depression. 81 00:04:34,320 --> 00:04:37,376 Your voice is also very linked with your hormone levels. 82 00:04:37,400 --> 00:04:40,016 Third parties listening to female voices 83 00:04:40,040 --> 00:04:43,056 were able to very accurately place the speaker 84 00:04:43,080 --> 00:04:44,400 on their menstrual cycle. 85 00:04:45,560 --> 00:04:47,080 Just with acoustic information. 86 00:04:48,800 --> 00:04:52,376 And now with technology listening to us all the time, 87 00:04:52,400 --> 00:04:55,376 Alexa from Amazon Echo 88 00:04:55,400 --> 00:04:57,816 might be able to predict if you're pregnant 89 00:04:57,840 --> 00:04:59,536 even before you know it. 90 00:04:59,560 --> 00:05:00,896 So think about -- 91 00:05:00,920 --> 00:05:02,096 (Laughter) 92 00:05:02,120 --> 00:05:04,280 Think about the ethical implications of that. 93 00:05:05,720 --> 00:05:08,816 Your voice is also very linked to how you create relationships. 94 00:05:08,840 --> 00:05:12,216 You have a different voice for every person you talk to. 95 00:05:12,240 --> 00:05:15,736 If I take a little snippet of your voice and I analyze it, 96 00:05:15,760 --> 00:05:19,136 I can know whether you're talking to your mother, to your brother, 97 00:05:19,160 --> 00:05:20,776 your friend or your boss. 98 00:05:20,800 --> 00:05:25,976 We can also use, as a predictor, the vocal posture.
99 00:05:26,000 --> 00:05:29,936 Meaning, how you decide to place your voice when you talk to someone. 100 00:05:29,960 --> 00:05:33,176 And your vocal posture, when you talk to your spouse, 101 00:05:33,200 --> 00:05:36,880 can help predict not only if, but also when you will divorce. 102 00:05:38,560 --> 00:05:41,000 So there is a lot to learn from listening to voices. 103 00:05:42,080 --> 00:05:44,456 And I believe this has to start with understanding 104 00:05:44,480 --> 00:05:46,496 that we have more than one voice. 105 00:05:46,520 --> 00:05:50,376 So, I'm going to talk about three voices that most of us possess, 106 00:05:50,400 --> 00:05:52,720 in a model of what I call the mask. 107 00:05:53,520 --> 00:05:55,496 So when you look at the mask, 108 00:05:55,520 --> 00:05:57,800 what you see is a projection of a character. 109 00:05:58,360 --> 00:06:00,496 Let's call that your outward voice. 110 00:06:00,520 --> 00:06:03,336 This is also the most classic way to think about the voice, 111 00:06:03,360 --> 00:06:06,256 it's a way of projecting yourself in the world. 112 00:06:06,280 --> 00:06:09,536 The mechanism for this projection is well understood. 113 00:06:09,560 --> 00:06:11,896 Your lungs contract your diaphragm 114 00:06:11,920 --> 00:06:15,176 and that creates a self-sustained vibration of your vocal fold, 115 00:06:15,200 --> 00:06:16,696 that creates a sound. 116 00:06:16,720 --> 00:06:19,896 And then the way you open and close the cavities in your mouth, 117 00:06:19,920 --> 00:06:22,616 your vocal tract is going to transform the sound. 118 00:06:22,640 --> 00:06:24,856 So everyone has the same mechanism. 119 00:06:24,880 --> 00:06:26,656 But voices are quite unique. 120 00:06:26,680 --> 00:06:32,456 It's because very subtle differences in size, physiology, in hormone levels 121 00:06:32,480 --> 00:06:36,120 are going to make very subtle differences in your outward voice.
122 00:06:36,840 --> 00:06:38,856 And your brain is very good 123 00:06:38,880 --> 00:06:42,680 at picking up those subtle differences from other people's outward voices. 124 00:06:43,760 --> 00:06:47,016 In our lab, we are working on teaching machines 125 00:06:47,040 --> 00:06:49,456 to understand those subtle differences. 126 00:06:49,480 --> 00:06:55,336 And we use deep learning to create a real-time speaker identification system 127 00:06:55,360 --> 00:07:00,136 to help raise awareness on the use of the shared vocal space -- 128 00:07:00,160 --> 00:07:02,560 so who talks and who never talks during meetings -- 129 00:07:03,480 --> 00:07:05,280 to increase group intelligence. 130 00:07:05,960 --> 00:07:10,080 And one of the difficulties with that is that your voice is also not static. 131 00:07:11,000 --> 00:07:14,136 We already said that it changes with every person you talk to 132 00:07:14,160 --> 00:07:17,096 but it also changes generally throughout your life. 133 00:07:17,120 --> 00:07:19,336 At the beginning and at the end of the journey, 134 00:07:19,360 --> 00:07:22,056 male and female voices are very similar. 135 00:07:22,080 --> 00:07:23,616 It's very hard to distinguish 136 00:07:23,640 --> 00:07:26,760 the voice of a very young girl from the voice of a very young boy. 137 00:07:28,280 --> 00:07:33,376 But in between, your voice becomes a marker of your fluid identity. 138 00:07:33,400 --> 00:07:37,336 Generally, for male voices there's a big change at puberty. 139 00:07:37,360 --> 00:07:38,696 And then for female voices, 140 00:07:38,720 --> 00:07:41,960 there is a change at each pregnancy and a big change at menopause. 141 00:07:43,320 --> 00:07:47,456 So all of that is the voice other people hear when you talk. 142 00:07:47,480 --> 00:07:50,680 So why is it that we're so unfamiliar with it? 143 00:07:51,600 --> 00:07:55,016 Why is it that it's not the voice that we hear? 144 00:07:55,040 --> 00:07:56,256 So, let's think about it. 
145 00:07:56,280 --> 00:07:59,800 When you wear a mask, you actually don't see the mask. 146 00:08:00,640 --> 00:08:04,560 And when you try to observe it, what you will see is inside of the mask. 147 00:08:05,240 --> 00:08:06,920 And that's your inward voice. 148 00:08:08,560 --> 00:08:10,576 So to understand why it's different, 149 00:08:10,600 --> 00:08:14,280 let's try to understand the mechanism of perception of this inward voice. 150 00:08:15,560 --> 00:08:18,496 Because your body has many ways of filtering it differently 151 00:08:18,520 --> 00:08:20,536 from the outward voice. 152 00:08:20,560 --> 00:08:24,056 So to perceive this voice, it first has to travel to your ears. 153 00:08:24,080 --> 00:08:26,656 And your outward voice travels through the air 154 00:08:26,680 --> 00:08:30,176 while your inward voice travels through your bones. 155 00:08:30,200 --> 00:08:31,800 This is called bone conduction. 156 00:08:32,640 --> 00:08:37,655 Because of this, your inward voice is going to sound in a lower register 157 00:08:37,679 --> 00:08:42,400 and also more musically harmonical than your outward voice. 158 00:08:43,400 --> 00:08:47,296 Once it travels there, it has to access your inner ear. 159 00:08:47,320 --> 00:08:49,736 And there's this other mechanism taking place here. 160 00:08:49,760 --> 00:08:51,856 It's a mechanical filter, 161 00:08:51,880 --> 00:08:55,416 it's a little partition that comes and protects your inner ear 162 00:08:55,440 --> 00:08:58,176 each time you produce a sound. 163 00:08:58,200 --> 00:09:00,400 So it also reduces what you hear. 164 00:09:01,240 --> 00:09:04,200 And then there is a third filter, it's a biological filter. 165 00:09:04,880 --> 00:09:09,136 Your cochlea -- it's a part of your inner ear that processes the sound -- 166 00:09:09,160 --> 00:09:11,216 is made out of living cells. 
167 00:09:11,240 --> 00:09:14,336 And those living cells are going to trigger differently 168 00:09:14,360 --> 00:09:16,936 according to how often they hear the sound. 169 00:09:16,960 --> 00:09:18,360 It's a habituation effect. 170 00:09:19,400 --> 00:09:20,936 So because of this, 171 00:09:20,960 --> 00:09:24,016 as your voice is the sound you hear the most in your life, 172 00:09:24,040 --> 00:09:26,520 you actually hear it less than other sounds. 173 00:09:27,280 --> 00:09:29,296 Finally, we have a fourth filter. 174 00:09:29,320 --> 00:09:30,840 It's a neurological filter. 175 00:09:31,760 --> 00:09:34,216 Neurologists found out recently 176 00:09:34,240 --> 00:09:37,016 that when you open your mouth to create a sound, 177 00:09:37,040 --> 00:09:39,840 your own auditory cortex shuts down. 178 00:09:42,400 --> 00:09:45,296 So you hear your voice 179 00:09:45,320 --> 00:09:49,600 but your brain actually never listens to the sound of your voice. 180 00:09:52,040 --> 00:09:54,576 Well, evolutionarily that might make sense, 181 00:09:54,600 --> 00:09:57,576 because we know cognitively what we are going to sound like 182 00:09:57,600 --> 00:10:00,760 so maybe we don't need to spend energy analyzing the signal. 183 00:10:01,560 --> 00:10:05,216 And this is called a corollary discharge 184 00:10:05,240 --> 00:10:07,736 and it happens for every motion that your body does. 185 00:10:07,760 --> 00:10:09,976 The exact definition of a corollary discharge 186 00:10:10,000 --> 00:10:15,256 is a copy of a motor command that is sent by the brain. 187 00:10:15,280 --> 00:10:17,976 This copy doesn't create any motion itself 188 00:10:18,000 --> 00:10:21,976 but instead is sent to other regions of the brain 189 00:10:22,000 --> 00:10:24,160 to inform them of the impending motion. 190 00:10:26,120 --> 00:10:29,600 And for the voice, this corollary discharge also has a different name. 191 00:10:30,480 --> 00:10:32,576 It is your inner voice. 
192 00:10:32,600 --> 00:10:34,416 So let's recapitulate. 193 00:10:34,440 --> 00:10:36,496 We have the mask, the outward voice, 194 00:10:36,520 --> 00:10:39,616 the inside of the mask, your inward voice, 195 00:10:39,640 --> 00:10:41,656 and then you have your inner voice. 196 00:10:41,680 --> 00:10:43,736 And I like to see this one as the puppeteer 197 00:10:43,760 --> 00:10:46,200 that holds the strings of the whole system. 198 00:10:47,440 --> 00:10:49,056 Your inner voice is 199 00:10:49,080 --> 00:10:51,920 the one you hear when you read a text silently, 200 00:10:53,320 --> 00:10:55,560 when you rehearse for an important conversation. 201 00:10:56,760 --> 00:10:58,376 Sometimes it's hard to turn it off, 202 00:10:58,400 --> 00:11:02,456 it's really hard to look at the text written in your native language, 203 00:11:02,480 --> 00:11:04,560 without having this inner voice read it. 204 00:11:05,800 --> 00:11:08,136 It's also the voice that refuses to stop singing 205 00:11:08,160 --> 00:11:09,976 the stupid song you have in your head. 206 00:11:10,000 --> 00:11:11,200 (Laughter) 207 00:11:13,280 --> 00:11:16,976 And for some people it's actually impossible to control it. 208 00:11:17,000 --> 00:11:19,736 And that's the case of schizophrenic patients, 209 00:11:19,760 --> 00:11:21,776 who have auditory hallucinations. 210 00:11:21,800 --> 00:11:25,136 Who can't distinguish at all between voices coming from inside 211 00:11:25,160 --> 00:11:26,656 and outside their head. 212 00:11:26,680 --> 00:11:30,056 So in our lab, we are also working on small devices 213 00:11:30,080 --> 00:11:32,616 to help those people make those distinctions 214 00:11:32,640 --> 00:11:35,280 and know if a voice is internal or external. 215 00:11:36,760 --> 00:11:41,096 You can also think about the inner voice as the voice that speaks in your dream. 216 00:11:41,120 --> 00:11:43,216 This inner voice can take many forms.
217 00:11:43,240 --> 00:11:47,456 And in your dreams, you actually unleash the potential of this inner voice. 218 00:11:47,480 --> 00:11:49,536 That's another work we are doing in our lab: 219 00:11:49,560 --> 00:11:52,720 trying to access this inner voice in dreams. 220 00:11:54,280 --> 00:11:56,576 So even if you can't always control it, 221 00:11:56,600 --> 00:11:59,136 the inner voice -- you can always engage with it 222 00:11:59,160 --> 00:12:01,496 through dialogue, through inner dialogues. 223 00:12:01,520 --> 00:12:03,336 And you can even see this inner voice 224 00:12:03,360 --> 00:12:06,000 as the missing link between thought and actions. 225 00:12:08,640 --> 00:12:11,776 So I hope I've left you with a better appreciation, 226 00:12:11,800 --> 00:12:15,256 a new appreciation of all of your voices 227 00:12:15,280 --> 00:12:17,816 and the role it plays inside and outside of you -- 228 00:12:17,840 --> 00:12:22,256 as your voice is a very critical determinant of what makes you human 229 00:12:22,280 --> 00:12:24,560 and of how you interact with the world. 230 00:12:25,120 --> 00:12:26,336 Thank you. 231 00:12:26,360 --> 00:12:29,200 (Applause)