WEBVTT 00:00:06.070 --> 00:00:07.120 Hi, my name's John. 00:00:07.510 --> 00:00:09.880 I lead the search and machine learning teams at Google. 00:00:12.130 --> 00:00:14.230 I think it's amazingly inspiring 00:00:14.230 --> 00:00:16.214 that people all over the world 00:00:16.215 --> 00:00:19.240 turn to search engines to ask trivial questions 00:00:19.240 --> 00:00:20.530 and incredibly important questions. 00:00:20.560 --> 00:00:23.290 So it's a huge responsibility to give them 00:00:23.290 --> 00:00:24.594 the best answers that we can. 00:00:26.710 --> 00:00:30.610 Hi, my name's Akshaya and I work on the Bing search team. 00:00:31.090 --> 00:00:33.190 There are many times where we will start looking 00:00:33.190 --> 00:00:35.800 into artificial intelligence and machine learning, 00:00:35.830 --> 00:00:39.010 but we have to address how are the users going to use this, 00:00:39.140 --> 00:00:42.220 because at the end of the day, we want to make an impact to society. 00:00:43.780 --> 00:00:45.400 Let's ask a simple question. 00:00:45.820 --> 00:00:48.070 How long does it take to travel to Mars? 00:00:49.330 --> 00:00:50.950 Where did these results come from 00:00:51.370 --> 00:00:54.100 and why was this listed before the other one? 00:00:55.700 --> 00:00:58.150 Okay, let's dive in and see how the search engine 00:00:58.150 --> 00:00:59.860 turned your request into a result. 00:01:01.090 --> 00:01:03.310 The first thing you need to know is when you do a search, 00:01:03.430 --> 00:01:06.310 the search engine isn't actually going out to the World Wide Web 00:01:06.310 --> 00:01:07.840 to run your search in real time. 00:01:08.520 --> 00:01:10.450 And that's because there's over a billion websites 00:01:10.450 --> 00:01:13.780 on the internet and hundreds more are being created every single minute. 
00:01:14.680 --> 00:01:16.210 So if the search engine had to look through 00:01:16.240 --> 00:01:18.220 every single site to find the one you wanted, 00:01:18.510 --> 00:01:19.390 it would just take forever. 00:01:20.500 --> 00:01:21.940 So to make your search faster, 00:01:21.970 --> 00:01:24.940 search engines are constantly scanning the web in advance 00:01:25.420 --> 00:01:28.140 to record the information that might help with your search later. 00:01:28.930 --> 00:01:31.270 That way, when you search about travel to Mars, 00:01:31.630 --> 00:01:33.700 the search engine already has what it needs 00:01:33.700 --> 00:01:35.230 to give you an answer in real time. 00:01:36.250 --> 00:01:37.540 Here's how it works. 00:01:37.900 --> 00:01:42.010 The internet is a web of pages connected to each other by hyperlinks. 00:01:42.400 --> 00:01:44.680 Search engines are constantly running a program 00:01:44.680 --> 00:01:47.380 called a spider that crawls through these web pages 00:01:47.380 --> 00:01:49.040 to collect information about them. 00:01:49.780 --> 00:01:51.550 Each time it finds a hyperlink, 00:01:52.090 --> 00:01:55.000 it follows it until it has visited every page 00:01:55.030 --> 00:01:57.240 it can find on the entire internet. 00:01:57.335 --> 00:01:59.170 For each page the spider visits, 00:01:59.200 --> 00:02:02.320 it records any information it might need for a search 00:02:02.500 --> 00:02:05.650 by adding it to a special database called a search index. 00:02:07.520 --> 00:02:09.530 Now, let's go back to that search from earlier 00:02:09.590 --> 00:02:11.720 and see if we can figure out how the search engine 00:02:11.750 --> 00:02:13.093 came up with the results. 
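To make the spider and the search index a little more concrete, here is a minimal sketch in Python. It is not how any real search engine is built. The miniature "web" in `TOY_WEB`, the `.example` addresses, and the `crawl` function are all made up for illustration, and the pages live in a dictionary instead of being fetched over the network.

```python
from collections import deque

# Hypothetical miniature "web": address -> (page text, outgoing hyperlinks).
TOY_WEB = {
    "a.example": ("how long does it take to travel to mars",
                  ["b.example", "c.example"]),
    "b.example": ("mars travel time depends on the launch window",
                  ["a.example"]),
    "c.example": ("dog parks near you", ["d.example"]),
    "d.example": ("travel tips for long trips", []),
}


def crawl(start_url, web):
    """Follow hyperlinks from start_url and build a word -> pages index."""
    index = {}                 # the "search index": word -> set of addresses
    seen = {start_url}         # pages already queued, so none is visited twice
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = web[url]
        # Record every word on the page in the index.
        for word in text.split():
            index.setdefault(word, set()).add(url)
        # Follow each hyperlink that leads to a page not seen yet.
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


search_index = crawl("a.example", TOY_WEB)
print(sorted(search_index["mars"]))  # ['a.example', 'b.example']
```

Because the index is built ahead of time, answering "which pages mention mars?" later is a single dictionary lookup and does not require visiting any page again.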
00:02:13.640 --> 00:02:16.460 When you ask how long does it take to travel to Mars, 00:02:16.640 --> 00:02:18.860 the search engine looks up each of those words 00:02:18.920 --> 00:02:21.410 in the search index to immediately get a list 00:02:21.410 --> 00:02:24.500 of all the pages on the internet containing those words. 00:02:25.130 --> 00:02:26.870 But just looking for these search terms 00:02:26.870 --> 00:02:28.760 could return millions of pages, 00:02:29.060 --> 00:02:30.980 so the search engine needs to be able to determine 00:02:30.980 --> 00:02:32.990 the best matches to show you first. 00:02:33.860 --> 00:02:35.810 This is where it gets tricky because the search engine 00:02:35.840 --> 00:02:37.610 may need to guess what you're looking for. 00:02:38.930 --> 00:02:41.360 Each search engine uses its own algorithm 00:02:41.360 --> 00:02:44.230 to rank the pages based on what it thinks you want. 00:02:44.930 --> 00:02:47.660 The search engine's ranking algorithm might check 00:02:47.990 --> 00:02:50.360 if your search term shows up in the page title, 00:02:51.040 --> 00:02:53.660 it might check if all of the words show up next to each other, 00:02:54.520 --> 00:02:57.020 or any number of other calculations 00:02:57.020 --> 00:02:58.610 that help it better determine 00:02:58.670 --> 00:03:01.420 which pages you'll want to see and which you won't. 00:03:03.280 --> 00:03:04.960 Google invented the most famous algorithm 00:03:04.960 --> 00:03:08.530 for choosing the most relevant results for a search by taking into account 00:03:08.560 --> 00:03:11.230 how many other web pages link to a given page. 00:03:11.830 --> 00:03:14.140 The idea is that if lots of websites think 00:03:14.140 --> 00:03:15.400 that a web page is interesting, 00:03:15.660 --> 00:03:17.170 then it's probably the one you're looking for. 
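The lookup and ranking steps just described can be sketched roughly as follows. This is a deliberately simplified illustration, not the algorithm any real search engine uses: the data in `SEARCH_INDEX`, `TITLES`, and `LINKS` is invented, the `search` function is hypothetical, and the score uses only two of the signals mentioned (words in the page title and a plain count of inbound links).

```python
# Hypothetical data: a tiny search index, page titles, and a link graph.
SEARCH_INDEX = {
    "travel": {"a.example", "b.example", "d.example"},
    "mars": {"a.example", "b.example"},
}
TITLES = {
    "a.example": "Travel to Mars",
    "b.example": "Launch windows",
    "d.example": "Trip tips",
}
LINKS = {  # page -> pages it links to
    "a.example": ["b.example"],
    "b.example": ["a.example"],
    "d.example": ["a.example"],
}


def search(query, index, titles, links):
    """Return pages containing every query word, best match first."""
    words = query.lower().split()
    # Look up each word in the index and keep pages that contain all of them.
    postings = [index.get(word, set()) for word in words]
    matches = set.intersection(*postings) if postings else set()

    # Count how many other pages link to each page.
    inbound = {}
    for targets in links.values():
        for target in targets:
            inbound[target] = inbound.get(target, 0) + 1

    def sort_key(url):
        title_words = titles[url].lower().split()
        title_hits = sum(word in title_words for word in words)
        # Negative values so that more title hits and more links sort first;
        # the address itself breaks ties so the order is deterministic.
        return (-title_hits, -inbound.get(url, 0), url)

    return sorted(matches, key=sort_key)


print(search("travel mars", SEARCH_INDEX, TITLES, LINKS))
# ['a.example', 'b.example']
```

Here `a.example` comes first because both query words appear in its title and two other pages link to it, while `d.example` is left out because it never mentions Mars.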
00:03:18.190 --> 00:03:20.020 This algorithm is called PageRank, 00:03:20.590 --> 00:03:22.330 not because it ranks web pages, 00:03:22.570 --> 00:03:25.210 but because it was named after its inventor, Larry Page, 00:03:25.480 --> 00:03:27.023 who's one of the founders of Google. 00:03:27.940 --> 00:03:30.520 Because a website often makes money when you visit it, 00:03:30.820 --> 00:03:32.950 spammers are constantly trying to find ways 00:03:32.950 --> 00:03:35.741 to game the search algorithm so that their pages 00:03:35.742 --> 00:03:37.931 are listed higher in the results. 00:03:38.260 --> 00:03:40.750 Search engines regularly update their algorithms 00:03:40.750 --> 00:03:44.296 to prevent fake or untrustworthy sites from reaching the top. 00:03:44.680 --> 00:03:47.350 Ultimately, it's up to you to keep an eye out 00:03:47.500 --> 00:03:49.450 for these pages that are untrustworthy 00:03:49.690 --> 00:03:52.990 by looking at the web address and making sure it's a reliable source. 00:03:53.680 --> 00:03:55.390 Search programs are always evolving 00:03:55.420 --> 00:03:58.420 to improve the algorithms so they return better results, 00:03:58.540 --> 00:04:00.460 faster results than their competitors. 00:04:01.000 --> 00:04:03.100 Today's search engines even use information 00:04:03.100 --> 00:04:06.310 that you haven't explicitly provided to help you narrow down your search. 00:04:07.150 --> 00:04:10.120 So, for example, if you did a search for dog parks, 00:04:10.510 --> 00:04:12.190 many search engines would give you results 00:04:12.190 --> 00:04:13.840 for all the dog parks nearby, 00:04:14.080 --> 00:04:15.880 even though you didn't type in your location. 00:04:17.800 --> 00:04:20.530 Modern search engines also understand more 00:04:20.530 --> 00:04:22.060 than just the words on a page, 00:04:22.300 --> 00:04:24.970 but what they actually mean in order to find the best one 00:04:24.970 --> 00:04:26.560 that matches what you're looking for. 
00:04:27.130 --> 00:04:29.980 For example, if you search for fast pitcher, 00:04:30.280 --> 00:04:32.000 it will know you're looking for an athlete. 00:04:32.500 --> 00:04:34.180 But if you search for large pitcher, 00:04:34.450 --> 00:04:36.730 it will find you options for your kitchen. 00:04:38.810 --> 00:04:40.000 To understand the words better, 00:04:40.000 --> 00:04:41.440 we use something called machine learning, 00:04:41.800 --> 00:04:43.390 a type of artificial intelligence. 00:04:43.760 --> 00:04:46.050 It enables search algorithms to search out 00:04:46.090 --> 00:04:48.190 not just individual letters or words in the page, 00:04:48.400 --> 00:04:51.280 but understand the underlying meaning of the words. 00:04:53.690 --> 00:04:55.850 The internet is growing exponentially, 00:04:56.210 --> 00:04:59.810 but if the teams that design search engines do our jobs right, 00:05:00.080 --> 00:05:04.090 the information you want should always be just a few keystrokes away.