1
00:00:00,000 --> 00:00:05,640
Will artificial intelligence make language learning obsolete? If in ten

2
00:00:05,640 --> 00:00:11,190
years we have a device that lets us communicate with people who speak other languages

3
00:00:11,190 --> 00:00:19,140
efficiently and without obstacles, will there still be people who learn languages? This is

4
00:00:19,140 --> 00:00:24,480
a question I often ask myself, and it gives me existential crises because I think: "Oh my god,

5
00:00:24,480 --> 00:00:30,360
in ten years my work will no longer make sense! Nobody will learn Italian!" Now,

6
00:00:30,360 --> 00:00:36,240
no one can know how things are going to go, but I discussed it with my dad in the latest episode

7
00:00:36,240 --> 00:00:41,220
of my podcast. And why with my dad? Because my dad (who just retired)

8
00:00:41,220 --> 00:00:47,610
worked for his entire career in artificial intelligence, and specifically in

9
00:00:47,610 --> 00:00:53,112
voice recognition. You know, when you talk to your devices, if you do?

10
00:00:53,112 --> 00:00:57,030
(automatic voice) In English it is "subscribe to Italian podcasts". Now

11
00:00:57,030 --> 00:00:59,280
I can act as an interpreter in a foreign language.

12
00:00:59,280 --> 00:01:00,330
Q: Hi, how are you doing?

13
00:01:00,330 --> 00:01:02,877
A: Hi, how are you? Q: I'm fine, and you?

14
00:01:02,877 --> 00:01:05,790
A: I'm fine, and you? Q: I'm not well because I

15
00:01:05,790 --> 00:01:08,021
haven't subscribed to Podcast Italiano yet. A: I'm sick because I haven't

16
00:01:08,021 --> 00:01:14,340
subscribed to an italian podcast yet. Q: What's an italian podcast? Do you mean 'Podcast

17
00:01:14,340 --> 00:01:15,840
Italiano', by any chance?

18
00:01:15,840 --> 00:01:20,040
A: What is an Italian podcast? Do you mean by any chance 'Italian Podcast'?

19
00:01:20,040 --> 00:01:23,340
Q: Yes, of course, Podcast Italiano, on YouTube! A: Yes, of course italian podcast on youtube.

20
00:01:23,340 --> 00:01:27,390
Q: Of course, but do you know there's a podcast version of it as well?

21
00:01:27,390 --> 00:01:34,410
A: Sure, but did you know that there is also a podcast version?

22
00:01:34,410 --> 00:01:40,320
This technology already exists. Maybe it's not perfect, but in ten years? I don't

23
00:01:40,320 --> 00:01:45,300
know... Anyway, my father worked on speech recognition, that is, the part

24
00:01:45,300 --> 00:01:52,560
where the machine understands human language. We did two episodes,

25
00:01:52,560 --> 00:01:56,850
the first about his career in the world of artificial intelligence and the

26
00:01:56,850 --> 00:02:02,250
second specifically about the linguistic applications of artificial intelligence

27
00:02:02,250 --> 00:02:09,570
and neural networks, which are so fashionable today. I'll now leave you with an excerpt from the

28
00:02:09,570 --> 00:02:14,310
second episode in which we talk about these things. I hope you like it.

29
00:02:14,310 --> 00:02:22,080
Q: I wanted to talk a little now about... artificial intelligence in the linguistic

30
00:02:22,080 --> 00:02:28,170
field. It seems to me that there are mainly three uses: translation,

31
00:02:28,170 --> 00:02:37,230
voice recognition and voice synthesis, that is, making a machine speak, right?

32
00:02:37,230 --> 00:02:38,910
A: Exactly. Q: So,

33
00:02:38,910 --> 00:02:44,490
I wanted to start with speech recognition itself, which you worked on for a

34
00:02:44,490 --> 00:02:53,160
long time, and ask you how it works. You told me that once upon a time they simply put linguistic

35
00:02:53,160 --> 00:02:59,400
knowledge into the machine and then this approach was completely abandoned.

36
00:02:59,400 --> 00:03:07,890
A: Exactly. Through, let's say, all of the 90s, even up to 2000, automatic recognition

37
00:03:07,890 --> 00:03:16,770
systems had the expertise of human experts built in.
There was, for example, phonetic knowledge

38
00:03:16,770 --> 00:03:23,340
of the language, that is, what the basic sounds of the language are and how they are organized among

39
00:03:23,340 --> 00:03:30,600
themselves; lexical knowledge, that is, how these sounds form words; then there was

40
00:03:30,600 --> 00:03:39,060
syntactic knowledge, that is, how words form correct sentences of the language, and this...

41
00:03:39,060 --> 00:03:45,840
this knowledge was introduced by human experts, by phoneticians and linguists

42
00:03:45,840 --> 00:03:51,920
who inserted this information into the code (or in any case into the computer's knowledge).

43
00:03:51,920 --> 00:03:54,682
Q: So, in short, the whole grammar of a language, because it is grammar...

44
00:03:54,682 --> 00:03:58,920
A: Yes, in fact we spoke of grammars; the grammars of a language

45
00:03:58,920 --> 00:04:03,300
were inserted into... into the computer. Q: So there were people with a

46
00:04:03,300 --> 00:04:09,460
grammar book who translated rules into computer instructions?

47
00:04:09,460 --> 00:04:14,380
A: Yes, yes, exactly. At the beginning I myself brought an Italian grammar book to the office.

48
00:04:14,380 --> 00:04:18,190
Q: Serianni's? A: The one I had in high school, I don't know whose it was.

49
00:04:18,190 --> 00:04:24,040
Anyway, this went on for a very long time. Then they... they started using

50
00:04:24,040 --> 00:04:31,030
statistical methods too, at least for the lower-level part, the sound. But then lately,

51
00:04:31,030 --> 00:04:40,690
I would say from 2013 onwards, all this has literally disappeared, in the sense that

52
00:04:40,690 --> 00:04:47,860
neural network models called end-to-end models have arrived, i.e.
models that go from start to

53
00:04:47,860 --> 00:04:55,570
finish, and in speech recognition these models start from the signal that comes out of the microphone, from

54
00:04:55,570 --> 00:05:03,845
the waveform, and arrive at the words. So a waveform goes in and a sequence of words comes out.

55
00:05:03,845 --> 00:05:13,920
Q: And so... so everything in between, syntax, phonetics, morphology, vocabulary, everything...

56
00:05:13,920 --> 00:05:19,080
everything in between happens magically? A: It happens magically in the interaction between these

57
00:05:19,080 --> 00:05:25,560
neurons. These end-to-end models are even more complicated than the neural networks

58
00:05:25,560 --> 00:05:34,620
I have described. And yet all this... this human and linguistic knowledge has disappeared.

59
00:05:34,620 --> 00:05:39,840
Well, maybe it is present in the neural network, but it is an opaque model, a so-called

60
00:05:39,840 --> 00:05:46,080
black box, and therefore we do not know whether the neural network has used it or

61
00:05:46,080 --> 00:05:50,550
not. In its learning, will it have rediscovered phonetics, will it have rediscovered

62
00:05:50,550 --> 00:05:56,580
linguistics? We don't really know. Q: So there is no way to understand what it is

63
00:05:56,580 --> 00:06:03,180
learning and what it "thinks", in quotation marks? A: No, I would say... I would say no. This is

64
00:06:03,180 --> 00:06:09,150
perhaps one of the limitations of these neural networks, which... which perhaps is also a limitation of

65
00:06:09,150 --> 00:06:19,110
biological neural networks, in the sense that they cannot be inspected. Q: So they're not... they're not very transparent.

66
00:06:19,110 --> 00:06:22,230
And so the only thing you need for these models

67
00:06:22,230 --> 00:06:27,870
is audio that is transcribed, right? A: Transcribed into words.
And it takes

68
00:06:27,870 --> 00:06:36,090
many hours; we are talking about thousands of hours of transcribed recordings, and the more there is, the

69
00:06:36,090 --> 00:06:45,120
better it will work. It also takes days and days of computation on very powerful computers

70
00:06:45,120 --> 00:06:51,510
to train the neural network, but eventually this network begins to figure out how to relate

71
00:06:51,510 --> 00:06:56,190
this strange input coming out of the microphone to words. That is in the case of recognition;

72
00:06:56,190 --> 00:07:00,570
in the case of translation it correlates the words in one language with the words in the other, even with

73
00:07:00,570 --> 00:07:06,150
totally different characters; this does not matter. Q: Sure. And what about synthesis,

74
00:07:06,150 --> 00:07:11,520
that is, making the machine speak, how does that work? The correlation is between what and what?

75
00:07:11,520 --> 00:07:17,730
A: Yes, it is exactly the other way around. The examples are pairs where the input is a sequence

76
00:07:17,730 --> 00:07:23,535
of words and the output is a waveform. Q: But is the... waveform created from scratch?

77
00:07:23,535 --> 00:07:25,050
A: From scratch. Q: Because once upon a time,

78
00:07:25,050 --> 00:07:30,510
as perhaps you told me, they used actual blocks of words, bits of

79
00:07:30,510 --> 00:07:34,830
words that were recombined in various ways. A: Exactly, that was what happened in,

80
00:07:34,830 --> 00:07:42,540
let's say, classic synthesis, with knowledge introduced by humans, in which humans classified

81
00:07:42,540 --> 00:07:48,510
many pieces of recordings that were then concatenated: so-called concatenative synthesis.

82
00:07:48,510 --> 00:07:51,270
Q: Which can still be heard, for example, on some trains...

83
00:07:51,270 --> 00:07:54,315
A: Otherwise... but I would say that... Q: Is it still used a lot?

84
00:07:54,315 --> 00:07:58,620
A: Yes, yes, yes, in the Italian railway systems the synthesis is still that of

85
00:07:58,620 --> 00:08:03,420
the 90s. In my opinion, in systems like Amazon's the newer methods are already used, though.

86
00:08:03,420 --> 00:08:10,640
Q: So if we have a system where... the first step is recognizing the voice, so from sound to

87
00:08:10,640 --> 00:08:17,990
text, then translation, which translates into another language, so into another text, translated,

88
00:08:17,990 --> 00:08:25,100
and then we have synthesis, which reads the translation aloud, in fact we have... we have an

89
00:08:25,100 --> 00:08:30,950
interpreter, we have an interpreter. So do you think that translators and interpreters, translators

90
00:08:30,950 --> 00:08:36,425
already now but interpreters in the future, will be at risk, for example at conferences?

91
00:08:36,425 --> 00:08:41,870
A: Unfortunately for human translators and interpreters, I think yes, it will happen,

92
00:08:41,870 --> 00:08:48,770
or at least it will greatly reduce the scope... the opportunities for... work. In the sense that the

93
00:08:48,770 --> 00:08:56,540
more, let's say, routine things will disappear sooner. I believe that the translation of technical manuals

94
00:08:56,540 --> 00:09:01,790
or product manuals is already done almost entirely automatically, even though there are still some

95
00:09:01,790 --> 00:09:09,950
errors in it. And then gradually interpreting work too, maybe... Maybe the two alternatives will coexist

96
00:09:09,950 --> 00:09:14,960
for some time: the automatic one, cheaper and less precise, and

97
00:09:14,960 --> 00:09:21,920
the human one, more accurate and more expensive.
Q: And I think that even now it is starting to be

98
00:09:21,920 --> 00:09:27,560
a problem for some, for some translators, perhaps because clearly if

99
00:09:27,560 --> 00:09:33,710
the translator only has to correct work done by a machine, the pay,

100
00:09:33,710 --> 00:09:39,824
the pay will be... it will be lower. A: Of course, and perhaps the work is less enjoyable too.

101
00:09:39,824 --> 00:09:45,410
Q: Less enjoyable too, yes. Going back to the topic of languages, do you think that learning languages

102
00:09:45,410 --> 00:09:52,040
will still be relevant in 10 or 15 years? This is a question I often ask myself. I don't know if there will be

103
00:09:52,920 --> 00:09:59,730
a device that will allow us to communicate with people who speak different languages, or a

104
00:09:59,730 --> 00:10:07,260
chip in the brain, or even something less futuristic, let's say. Will it still be relevant?

105
00:10:07,260 --> 00:10:15,000
A: Well, I think that at least for a long time it will not become obsolete, in the sense that one learns

106
00:10:15,000 --> 00:10:24,660
a language for many reasons, but surely it is one thing to be able to speak and communicate in a language

107
00:10:24,660 --> 00:10:32,040
without any device, and another to always have a device in hand, or a device

108
00:10:32,040 --> 00:10:38,610
that acts as a mediator. I imagine that for work, or tourism, or other fairly

109
00:10:38,610 --> 00:10:46,080
occasional purposes these devices will certainly be used. Or maybe even at certain conferences, which

110
00:10:46,080 --> 00:10:52,590
are occasional meetings of people of different... of many nationalities, they could be used.
But if

111
00:10:52,590 --> 00:11:00,270
one wants, in fact, to learn a language also to enter the culture of a nation, of a country,

112
00:11:00,960 --> 00:11:10,110
this will remain totally irreplaceable, unless, as you mentioned, they implant

113
00:11:10,110 --> 00:11:16,530
artificial neural networks in the brain. Then they implant a memory expansion with the language:

114
00:11:16,530 --> 00:11:20,820
you buy it and they bring it to you. But this, in my opinion, is really a little too far off.

115
00:11:20,820 --> 00:11:26,340
And that was the excerpt, I hope you enjoyed it. If you want to hear the whole episode, or the

116
00:11:26,340 --> 00:11:31,500
two episodes we did, you'll find the link below. Oh, and if you didn't know,

117
00:11:31,500 --> 00:11:38,550
yes, I have a podcast for those who are learning Italian, and that's why I go by Podcast Italiano,

118
00:11:38,550 --> 00:11:42,480
I know it's a bit weird. But I think these episodes in particular can be of interest to everyone,

119
00:11:42,480 --> 00:11:49,080
even if you are Italian, because, in short, my father is an expert in the field. So let me

120
00:11:49,080 --> 00:11:54,240
know what you think: would you learn a language if there were a technology like the one that already

121
00:11:54,240 --> 00:12:01,140
exists but much more efficient? Maybe integrated into a device in our brain,

122
00:12:01,140 --> 00:12:06,370
or into a device that is, I don't know, a little more efficient than a mobile phone,

123
00:12:06,370 --> 00:12:11,890
which still acts as a bit of an obstacle and gets in the way between me and another person? Or do

124
00:12:11,890 --> 00:12:17,260
you think that you will continue to learn languages, perhaps out of love for a culture or because you

125
00:12:17,260 --> 00:12:22,180
like learning languages? And what do you think most people will do?
Let me

126
00:12:22,180 --> 00:12:27,010
know what you think. As you can tell, I didn't have a real video for this week, but

127
00:12:27,010 --> 00:12:33,430
we'll be back next week with our usual schedule. Until next time! Bye bye.