1 00:00:06,070 --> 00:00:07,120 Hi, my name's John. 2 00:00:07,510 --> 00:00:10,140 I lead the search and machine learning teams at Google. 3 00:00:12,130 --> 00:00:14,230 I think it's amazingly inspiring 4 00:00:14,230 --> 00:00:16,214 that people all over the world 5 00:00:16,215 --> 00:00:19,160 turn to search engines to ask trivial questions 6 00:00:19,160 --> 00:00:20,930 and incredibly important questions. 7 00:00:20,930 --> 00:00:23,450 So it's a huge responsibility to give them 8 00:00:23,450 --> 00:00:24,864 the best answers that we can. 9 00:00:26,710 --> 00:00:30,610 Hi, my name's Akshaya and I work on the Bing search team. 10 00:00:30,910 --> 00:00:33,190 There are many times where we will start looking 11 00:00:33,190 --> 00:00:35,800 into artificial intelligence and machine learning, 12 00:00:35,830 --> 00:00:39,010 but we have to address how are the users going to use this, 13 00:00:39,140 --> 00:00:42,390 because at the end of the day, we want to make an impact to society. 14 00:00:43,780 --> 00:00:45,400 Let's ask a simple question. 15 00:00:45,820 --> 00:00:48,070 How long does it take to travel to Mars? 16 00:00:49,330 --> 00:00:50,950 Where did these results come from 17 00:00:51,370 --> 00:00:54,100 and why was this listed before the other one? 18 00:00:55,700 --> 00:00:58,150 Okay, let's dive in and see how the search engine 19 00:00:58,150 --> 00:00:59,860 turned your request into a result. 20 00:01:00,690 --> 00:01:03,360 The first thing you need to know is when you do a search, 21 00:01:03,430 --> 00:01:06,480 the search engine isn't actually going out to the World Wide Web 22 00:01:06,480 --> 00:01:08,010 to run your search in real time. 23 00:01:08,140 --> 00:01:10,610 And that's because there's over a billion websites 24 00:01:10,610 --> 00:01:14,140 on the internet and hundreds more are being created every single minute. 
25 00:01:14,140 --> 00:01:16,210 So if the search engine had to look through 26 00:01:16,240 --> 00:01:18,690 every single site to find the one you wanted, 27 00:01:18,690 --> 00:01:20,120 it would just take forever. 28 00:01:20,500 --> 00:01:21,940 So to make your search faster, 29 00:01:21,970 --> 00:01:24,940 search engines are constantly scanning the web in advance 30 00:01:25,420 --> 00:01:28,560 to record the information that might help with your search later. 31 00:01:28,930 --> 00:01:31,270 That way, when you search about travel to Mars, 32 00:01:31,630 --> 00:01:33,700 the search engine already has what it needs 33 00:01:33,700 --> 00:01:35,728 to give you an answer in real time. 34 00:01:36,250 --> 00:01:37,540 Here's how it works. 35 00:01:37,900 --> 00:01:42,010 The internet is a web of pages connected to each other by hyperlinks. 36 00:01:42,400 --> 00:01:44,680 Search engines are constantly running a program 37 00:01:44,680 --> 00:01:47,380 called a spider that crawls through these web pages 38 00:01:47,380 --> 00:01:49,040 to collect information about them. 39 00:01:49,780 --> 00:01:51,550 Each time it finds a hyperlink, 40 00:01:52,090 --> 00:01:55,000 it follows it until it has visited every page 41 00:01:55,030 --> 00:01:57,240 it can find on the entire internet. 42 00:01:57,335 --> 00:01:59,170 For each page the spider visits, 43 00:01:59,200 --> 00:02:02,320 it records any information it might need for a search 44 00:02:02,500 --> 00:02:05,650 by adding it to a special database called a search index. 45 00:02:07,166 --> 00:02:09,530 Now, let's go back to that search from earlier 46 00:02:09,590 --> 00:02:11,990 and see if we can figure out how the search engine 47 00:02:11,990 --> 00:02:13,333 came up with the results.
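[Editor's sketch] The spider and search index described above can be illustrated with a toy example. This is a minimal sketch, not how any real search engine is implemented. It assumes a tiny in-memory "web" stored in a dictionary instead of real network requests. The names `TOY_WEB` and `crawl` and the `.example` addresses are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing hyperlinks).
# A real spider would download pages over the network instead.
TOY_WEB = {
    "a.example": ("travel to mars takes months", ["b.example", "c.example"]),
    "b.example": ("mars rover news", ["a.example"]),
    "c.example": ("dog parks nearby", []),
    "d.example": ("unlinked page about mars", []),  # no page links here
}


def crawl(start_url, web):
    """Follow hyperlinks breadth-first from start_url and build a search index."""
    index = {}                 # word -> set of URLs whose text contains the word
    seen = {start_url}         # pages already queued, so each is visited once
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = web[url]
        # Record every word on this page in the index.
        for word in text.split():
            index.setdefault(word, set()).add(url)
        # Follow each hyperlink that leads to a page not yet seen.
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("a.example", TOY_WEB)
print(sorted(index["mars"]))  # ['a.example', 'b.example']
```

Note that `d.example` never enters the index because no other page links to it: a spider can only record pages it can reach by following hyperlinks.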
48 00:02:13,640 --> 00:02:16,460 When you ask how long does it take to travel to Mars, 49 00:02:16,640 --> 00:02:18,860 the search engine looks up each of those words 50 00:02:18,920 --> 00:02:21,410 in the search index to immediately get a list 51 00:02:21,410 --> 00:02:24,500 of all the pages on the internet containing those words. 52 00:02:24,890 --> 00:02:26,870 But just looking for these search terms 53 00:02:26,870 --> 00:02:28,760 could return millions of pages, 54 00:02:28,760 --> 00:02:31,110 so the search engine needs to be able to determine 55 00:02:31,110 --> 00:02:33,120 the best matches to show you first. 56 00:02:33,340 --> 00:02:36,010 This is where it gets tricky because the search engine 57 00:02:36,010 --> 00:02:38,040 may need to guess what you're looking for. 58 00:02:38,930 --> 00:02:41,360 Each search engine uses its own algorithm 59 00:02:41,360 --> 00:02:44,230 to rank the pages based on what it thinks you want. 60 00:02:44,930 --> 00:02:47,660 The search engine's ranking algorithm might check 61 00:02:47,990 --> 00:02:50,360 if your search term shows up in the page title, 62 00:02:50,900 --> 00:02:53,820 it might check if all of the words show up next to each other, 63 00:02:54,520 --> 00:02:57,020 or any number of other calculations 64 00:02:57,020 --> 00:02:58,610 that help it better determine 65 00:02:58,670 --> 00:03:01,420 which pages you'll want to see and which you won't. 66 00:03:02,960 --> 00:03:04,960 Google invented the most famous algorithm 67 00:03:04,960 --> 00:03:08,530 for choosing the most relevant results for a search by taking into account 68 00:03:08,560 --> 00:03:11,230 how many other web pages linked to a given page. 69 00:03:11,830 --> 00:03:14,140 The idea is that if lots of websites think 70 00:03:14,140 --> 00:03:15,660 that a web page is interesting, 71 00:03:15,660 --> 00:03:17,940 then it's probably the one you're looking for.
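[Editor's sketch] The lookup and ranking steps described above might look roughly like this in code. It is a simplified illustration under stated assumptions, not the algorithm any real search engine uses. It counts inbound links directly, which is only the basic idea behind link-based ranking and not the full published algorithm. The page data and the names `PAGES`, `build_index`, `inbound_counts`, and `search` are hypothetical.

```python
# Hypothetical pages: URL -> (title, body text, outgoing links).
PAGES = {
    "nasa.example": ("Travel to Mars", "how long does it take to travel to mars", []),
    "blog.example": ("My space blog", "travel to mars is long", ["nasa.example"]),
    "news.example": ("Space news", "mars travel plans", ["nasa.example", "blog.example"]),
}


def build_index(pages):
    """Map each word to the set of pages whose title or body contains it."""
    index = {}
    for url, (title, body, _links) in pages.items():
        for word in (title + " " + body).lower().split():
            index.setdefault(word, set()).add(url)
    return index


def inbound_counts(pages):
    """Count how many other pages link to each page."""
    counts = {url: 0 for url in pages}
    for _title, _body, links in pages.values():
        for target in links:
            if target in counts:
                counts[target] += 1
    return counts


def search(query, pages):
    """Return pages containing every query word, best match first."""
    words = query.lower().split()
    index = build_index(pages)
    # Start with all pages, then keep only those containing each query word.
    matches = set(pages)
    for word in words:
        matches &= index.get(word, set())
    links_in = inbound_counts(pages)

    def score(url):
        title_words = pages[url][0].lower().split()
        title_hits = sum(1 for word in words if word in title_words)
        # Rank by inbound links first; title matches break ties.
        return (links_in[url], title_hits)

    return sorted(matches, key=score, reverse=True)


results = search("travel mars", PAGES)
print(results)  # ['nasa.example', 'blog.example', 'news.example']
```

All three pages contain both query words, so the order comes entirely from the ranking step: `nasa.example` has two inbound links, `blog.example` has one, and `news.example` has none.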
72 00:03:18,190 --> 00:03:20,020 This algorithm is called PageRank, 73 00:03:20,590 --> 00:03:22,330 not because it ranks web pages, 74 00:03:22,570 --> 00:03:25,210 but because it was named after its inventor, Larry Page, 75 00:03:25,480 --> 00:03:27,333 who's one of the founders of Google. 76 00:03:27,940 --> 00:03:30,520 Because a website often makes money when you visit it, 77 00:03:30,820 --> 00:03:32,950 spammers are constantly trying to find ways 78 00:03:32,950 --> 00:03:35,741 to game the search algorithm so that their pages 79 00:03:35,742 --> 00:03:37,931 are listed higher in the results. 80 00:03:38,260 --> 00:03:40,750 Search engines regularly update their algorithms 81 00:03:40,750 --> 00:03:44,296 to prevent fake or untrustworthy sites from reaching the top. 82 00:03:44,680 --> 00:03:47,350 Ultimately, it's up to you to keep an eye out 83 00:03:47,500 --> 00:03:49,450 for these pages that are untrustworthy 84 00:03:49,690 --> 00:03:52,990 by looking at the web address and making sure it's a reliable source. 85 00:03:53,680 --> 00:03:55,390 Search programs are always evolving 86 00:03:55,420 --> 00:03:58,420 to improve the algorithms so they return better results, 87 00:03:58,540 --> 00:04:00,460 faster results than their competitors. 88 00:04:01,000 --> 00:04:03,100 Today's search engines even use information 89 00:04:03,100 --> 00:04:06,820 that you haven't explicitly provided to help you narrow down your search. 90 00:04:07,150 --> 00:04:10,120 So, for example, if you did a search for dog parks, 91 00:04:10,240 --> 00:04:12,190 many search engines would give you results 92 00:04:12,190 --> 00:04:13,840 for all the dog parks nearby, 93 00:04:14,080 --> 00:04:16,260 even though you didn't type in your location.
94 00:04:17,800 --> 00:04:20,530 Modern search engines also understand more 95 00:04:20,530 --> 00:04:22,060 than just the words on a page, 96 00:04:22,300 --> 00:04:24,970 but what they actually mean in order to find the best one 97 00:04:24,970 --> 00:04:26,750 that matches what you're looking for. 98 00:04:27,130 --> 00:04:29,980 For example, if you search for fast pitcher, 99 00:04:30,280 --> 00:04:32,300 it will know you're looking for an athlete. 100 00:04:32,500 --> 00:04:34,450 But if you search for large pitcher, 101 00:04:34,450 --> 00:04:36,730 it will find you options for your kitchen. 102 00:04:38,420 --> 00:04:41,910 To understand the words better, we use something called machine learning, 103 00:04:41,910 --> 00:04:43,985 a type of artificial intelligence. 104 00:04:43,985 --> 00:04:46,050 It enables search algorithms to search out 105 00:04:46,090 --> 00:04:48,400 not just individual letters or words in the page, 106 00:04:48,400 --> 00:04:51,280 but understand the underlying meaning of the words. 107 00:04:53,690 --> 00:04:55,850 The internet is growing exponentially, 108 00:04:56,210 --> 00:04:59,810 but if the teams that design search engines do our jobs right, 109 00:05:00,080 --> 00:05:04,090 the information you want should always be just a few keystrokes away.