0:00:17.903,0:00:19.617 Hello, my name is Christian Rudder, 0:00:19.641,0:00:21.850 and I was one of the founders of OkCupid. 0:00:21.874,0:00:24.792 It's now one of the biggest[br]dating sites in the United States. 0:00:24.816,0:00:27.207 Like almost everyone at the site,[br]I was a math major. 0:00:27.231,0:00:30.671 As you may expect, we're known[br]for the analytic approach we take to love. 0:00:30.695,0:00:32.333 We call it our matching algorithm. 0:00:32.357,0:00:34.945 Basically, OkCupid's matching[br]algorithm helps us decide 0:00:34.969,0:00:36.845 whether two people should go on a date. 0:00:36.869,0:00:38.741 We built our entire business around it. 0:00:38.765,0:00:40.725 Now, algorithm is a fancy word, 0:00:40.749,0:00:43.234 and people like to drop it[br]like it's this big thing. 0:00:43.258,0:00:45.546 But really, an algorithm[br]is just a systematic, 0:00:45.570,0:00:47.793 step-by-step way to solve a problem. 0:00:47.817,0:00:49.994 It doesn't have to be fancy at all. 0:00:50.018,0:00:51.169 Here in this lesson, 0:00:51.193,0:00:54.201 I'm going to explain how we arrived[br]at our particular algorithm, 0:00:54.225,0:00:55.636 so you can see how it's done. 0:00:55.660,0:00:57.594 Now, why are algorithms even important? 0:00:57.618,0:00:59.198 Why does this lesson even exist? 0:00:59.222,0:01:02.642 Well, notice one very significant[br]phrase I used earlier: 0:01:02.666,0:01:05.005 they are a step-by-step[br]way to solve a problem, 0:01:05.029,0:01:08.447 and as you probably know, computers[br]excel at step-by-step processes. 0:01:08.471,0:01:10.060 A computer without an algorithm 0:01:10.084,0:01:12.808 is basically an expensive paperweight. 0:01:12.832,0:01:15.821 And since computers are such[br]a pervasive part of everyday life, 0:01:15.845,0:01:17.392 algorithms are everywhere. 0:01:18.590,0:01:21.787 The math behind OkCupid's matching[br]algorithm is surprisingly simple.
0:01:21.811,0:01:25.813 It's just some addition, multiplication,[br]and a square root or two. 0:01:25.837,0:01:27.527 The tricky part in designing it 0:01:27.551,0:01:30.116 was figuring out how to take[br]something mysterious, 0:01:30.140,0:01:31.290 human attraction, 0:01:31.314,0:01:34.098 and break it into components[br]that a computer can work with. 0:01:34.122,0:01:36.675 The first thing we needed[br]to match people up was data, 0:01:36.699,0:01:38.691 something for the algorithm to work with. 0:01:38.715,0:01:41.873 The best way to get data quickly[br]from people is to just ask for it. 0:01:41.897,0:01:44.624 So we decided that OkCupid[br]should ask users questions, 0:01:44.648,0:01:47.005 stuff like, "Do you want[br]to have kids one day?" 0:01:47.029,0:01:48.787 "How often do you brush your teeth?" 0:01:48.811,0:01:50.203 "Do you like scary movies?" 0:01:50.675,0:01:52.752 And big stuff like,[br]"Do you believe in God?" 0:01:53.843,0:01:56.907 Now, a lot of the questions[br]are good for matching like with like, 0:01:56.931,0:01:59.087 that is, when both people[br]answer the same way. 0:01:59.111,0:02:01.659 For example, two people[br]who are both into scary movies 0:02:01.683,0:02:05.004 are probably a better match[br]than one person who is and one who isn't. 0:02:05.028,0:02:06.521 But what about a question like, 0:02:06.545,0:02:08.607 "Do you like to be[br]the center of attention?" 0:02:08.631,0:02:11.259 If both people in a relationship[br]are saying yes to this, 0:02:11.283,0:02:13.376 they're going to have massive problems. 0:02:13.400,0:02:14.645 We realized this early on, 0:02:14.669,0:02:17.938 and so we decided we needed[br]a bit more data from each question. 0:02:17.962,0:02:20.725 We had to ask people to specify[br]not only their own answer, 0:02:20.749,0:02:23.014 but the answer they wanted[br]from someone else. 0:02:23.038,0:02:24.539 That worked really well. 0:02:24.563,0:02:26.167 But we needed one more dimension.
0:02:26.191,0:02:28.834 Some questions tell you more[br]about a person than others. 0:02:28.858,0:02:32.253 For example, a question[br]about politics, something like, 0:02:32.277,0:02:34.565 "Which is worse:[br]book burning or flag burning?" 0:02:34.589,0:02:37.399 might reveal more about someone[br]than their taste in movies. 0:02:37.423,0:02:40.042 And it doesn't make sense[br]to weight all questions equally, 0:02:40.066,0:02:41.662 so we added one final data point. 0:02:41.686,0:02:43.710 For everything that OkCupid asks you, 0:02:43.734,0:02:46.563 you have a chance to tell us[br]the role it plays in your life. 0:02:46.587,0:02:48.906 And this ranges[br]from irrelevant to mandatory. 0:02:49.446,0:02:52.668 So now, for every question,[br]we have three things for our algorithm: 0:02:52.692,0:02:54.044 first, your answer; 0:02:54.617,0:02:58.757 second, how you want someone else --[br]your potential match -- to answer; 0:02:58.781,0:03:01.569 and third, how important[br]the question is to you. 0:03:02.710,0:03:03.962 With all this information, 0:03:03.986,0:03:07.104 OkCupid can figure out[br]how well two people will get along. 0:03:07.128,0:03:10.134 The algorithm crunches the numbers[br]and gives us a result. 0:03:10.158,0:03:11.310 As a practical example, 0:03:11.334,0:03:13.859 let's look at how we'd match you[br]with another person. 0:03:13.883,0:03:15.072 Let's call him "B." 0:03:16.023,0:03:19.505 Your match percentage with B is based[br]on questions you've both answered. 0:03:19.529,0:03:21.954 Let's call that set[br]of common questions "s." 0:03:22.559,0:03:24.908 As a very simple example,[br]let's use a small set "s" 0:03:24.932,0:03:26.573 with just two questions in common, 0:03:26.597,0:03:28.425 and compute a match from that. 0:03:28.449,0:03:30.120 Here are our two example questions. 0:03:30.144,0:03:32.525 The first one, let's say, is,[br]"How messy are you?"
0:03:32.549,0:03:34.645 And the answer possibilities are: 0:03:34.669,0:03:38.030 very messy, average and very organized. 0:03:38.054,0:03:40.114 And let's say you answered[br]"very organized," 0:03:40.138,0:03:42.898 and you'd like someone else[br]to answer "very organized," 0:03:42.922,0:03:45.178 and the question is very important to you. 0:03:45.202,0:03:46.694 Basically, you're a neat freak. 0:03:46.718,0:03:49.586 You're neat, you want someone else[br]to be neat, and that's it. 0:03:49.610,0:03:51.625 And let's say B is a little bit different. 0:03:51.649,0:03:53.688 He answered "very organized" for himself, 0:03:53.712,0:03:56.719 but he wants someone else[br]to answer "average," 0:03:56.743,0:03:59.145 and the question is only[br]a little important to him. 0:03:59.169,0:04:02.062 Let's look at the second question,[br]from our previous example: 0:04:02.086,0:04:04.142 "Do you like to be[br]the center of attention?" 0:04:04.166,0:04:05.680 The answers are "yes" and "no." 0:04:05.704,0:04:08.699 You've answered "no," you want[br]someone else to answer "no," 0:04:08.723,0:04:11.114 and the question is only[br]a little important to you. 0:04:11.138,0:04:12.759 Now B, he's answered "yes." 0:04:12.783,0:04:14.559 He wants someone else to answer "no," 0:04:14.583,0:04:16.857 because he wants the spotlight on him, 0:04:16.881,0:04:19.311 and the question is somewhat[br]important to him. 0:04:19.335,0:04:21.334 So, let's try to compute all of this. 0:04:21.972,0:04:24.475 Our first step: since we use[br]computers to do this, 0:04:24.499,0:04:26.366 we need to assign numerical values 0:04:26.390,0:04:29.017 to ideas like "somewhat[br]important" and "very important," 0:04:29.041,0:04:31.252 because computers need[br]everything in numbers. 0:04:31.276,0:04:33.679 We at OkCupid decided[br]on the following scale: 0:04:33.703,0:04:35.649 "Irrelevant" is worth 0. 0:04:36.173,0:04:38.062 "A little important" is worth 1.
0:04:38.538,0:04:40.347 "Somewhat important" is worth 10. 0:04:40.831,0:04:42.585 "Very important" is 50. 0:04:42.609,0:04:46.221 And "absolutely mandatory" is 250. 0:04:46.245,0:04:48.876 Next, the algorithm makes[br]two simple calculations. 0:04:48.900,0:04:52.146 The first is: How much did[br]B's answers satisfy you? 0:04:52.170,0:04:55.963 That is, how many of the possible points[br]did B score on your scale? 0:04:55.987,0:04:59.199 Well, you indicated that B's answer[br]to the first question, 0:04:59.223,0:05:00.389 about messiness, 0:05:00.413,0:05:01.763 was very important to you. 0:05:01.787,0:05:04.017 It's worth 50 points and B got that right. 0:05:04.375,0:05:06.112 The second question is worth only 1, 0:05:06.136,0:05:08.414 because you said[br]it was only a little important. 0:05:08.438,0:05:09.635 B got that wrong, 0:05:09.659,0:05:12.441 so B's answers were 50[br]out of 51 possible points. 0:05:12.465,0:05:15.073 That's 98 percent satisfactory. Pretty good. 0:05:15.097,0:05:19.046 The second calculation is:[br]How much did you satisfy B? 0:05:19.070,0:05:22.329 Well, B placed 1 point on your answer[br]to the messiness question 0:05:22.353,0:05:24.306 and 10 on your answer to the second. 0:05:24.745,0:05:28.132 Of those 11, that's 1 plus 10,[br]you earned 10 -- 0:05:28.156,0:05:30.751 you gave B the answer he wanted[br]on the second question. 0:05:30.775,0:05:35.017 So your answers were 10 out of 11,[br]or 91 percent, satisfactory to B. 0:05:35.041,0:05:36.192 That's not bad. 0:05:36.216,0:05:38.723 The final step is to take[br]these two satisfaction percentages 0:05:38.747,0:05:40.613 and get one number for the both of you. 0:05:40.637,0:05:43.248 To do this, the algorithm[br]multiplies your scores, 0:05:43.272,0:05:44.937 then takes the nth root, 0:05:44.961,0:05:47.144 where "n" is the number of scores being multiplied.
0:05:47.168,0:05:49.998 Because we're combining two scores,[br]yours and B's, 0:05:50.022,0:05:51.863 n is 2, 0:05:51.887,0:05:55.552 so the match percentage[br]equals the square root 0:05:55.576,0:05:58.472 of 98 percent times 91 percent. 0:05:58.496,0:06:00.280 That's about 94 percent. 0:06:00.304,0:06:03.508 That 94 percent is your match[br]percentage with B. 0:06:03.532,0:06:06.775 It's a mathematical expression[br]of how happy you'd be with each other, 0:06:06.799,0:06:07.982 based on what we know. 0:06:08.006,0:06:09.792 Now, why does the algorithm multiply, 0:06:09.816,0:06:12.585 as opposed to, say, average[br]the two scores together, 0:06:12.609,0:06:14.279 and then do the square-root business? 0:06:14.303,0:06:16.832 In general, this formula[br]is called the geometric mean. 0:06:16.856,0:06:19.483 It's a great way to combine[br]values that have wide ranges 0:06:19.507,0:06:21.422 and represent very different properties. 0:06:21.446,0:06:23.859 In other words, it's perfect[br]for romantic matching. 0:06:23.883,0:06:27.130 You've got wide ranges and you've got[br]tons of different data points, 0:06:27.154,0:06:30.592 like I said, about movies, politics,[br]religion -- everything. 0:06:30.616,0:06:32.454 Intuitively, too, this makes sense. 0:06:32.478,0:06:35.253 Two people satisfying[br]each other 50 percent 0:06:35.277,0:06:39.229 should be a better match[br]than two others who satisfy each other 0 and 100 percent, 0:06:39.253,0:06:41.067 because affection needs to be mutual. 0:06:41.091,0:06:43.582 After adding a little correction[br]for margin of error, 0:06:43.606,0:06:46.177 in the case where we have[br]a small number of questions, 0:06:46.201,0:06:47.518 like we do in this example, 0:06:47.542,0:06:48.714 we're good to go. 0:06:48.738,0:06:50.650 Any time OkCupid matches two people, 0:06:50.674,0:06:52.706 it goes through the steps[br]we just outlined.
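The two satisfaction calculations and the geometric-mean step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OkCupid's actual code: the names `Response`, `satisfaction` and `match_percentage` are hypothetical, each person is assumed to record a set of acceptable answers per question, and B is assumed to accept only "average" on the messiness question, which is what the 10-out-of-11 figure implies. The margin-of-error correction isn't spelled out in the lesson, so it's left out.

```python
import math
from dataclasses import dataclass

# Importance scale from the lesson: how many points a question is worth.
IMPORTANCE_POINTS = {
    "irrelevant": 0,
    "a little important": 1,
    "somewhat important": 10,
    "very important": 50,
    "absolutely mandatory": 250,
}


@dataclass
class Response:
    """One person's data for one question (hypothetical structure)."""
    answer: str           # your own answer
    acceptable: set[str]  # answers you'd accept from a match (assumed to be a set)
    importance: str       # a key of IMPORTANCE_POINTS


def satisfaction(judge: dict[str, Response], other: dict[str, Response]) -> float:
    """Share of judge's possible points that other's answers earn,
    counted over the questions both people answered (the set s)."""
    s = judge.keys() & other.keys()
    possible = earned = 0
    for q in s:
        points = IMPORTANCE_POINTS[judge[q].importance]
        possible += points
        if other[q].answer in judge[q].acceptable:
            earned += points
    return earned / possible if possible else 0.0


def match_percentage(a: dict[str, Response], b: dict[str, Response]) -> float:
    """Geometric mean of the two satisfaction scores: square root of their product."""
    return math.sqrt(satisfaction(a, b) * satisfaction(b, a))


# The lesson's two-question example.
you = {
    "messy": Response("very organized", {"very organized"}, "very important"),
    "attention": Response("no", {"no"}, "a little important"),
}
b = {
    # Assumption: B accepts only "average" from a match on this question.
    "messy": Response("very organized", {"average"}, "a little important"),
    "attention": Response("yes", {"no"}, "somewhat important"),
}

print(f"B satisfies you: {satisfaction(you, b):.0%}")  # 50 of 51 -> 98%
print(f"You satisfy B:   {satisfaction(b, you):.0%}")  # 10 of 11 -> 91%
print(f"Match:           {match_percentage(you, b):.0%}")  # -> 94%

# Why multiply rather than average: both pairs average 50%,
# but only the mutual pair keeps a nonzero geometric mean.
for x, y in [(0.5, 0.5), (0.0, 1.0)]:
    print(f"{x:.0%}/{y:.0%}: mean {(x + y) / 2:.0%}, geometric {math.sqrt(x * y):.0%}")
```

The final loop makes the "affection needs to be mutual" point concrete: an ordinary average scores the 50/50 pair and the 0/100 pair identically at 50 percent, while the geometric mean keeps the first at 50 percent and drops the second to 0.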
0:06:52.730,0:06:54.999 First it collects data about your answers, 0:06:55.023,0:06:58.008 then it compares your choices[br]and preferences to other people's 0:06:58.032,0:06:59.999 in simple mathematical ways. 0:07:00.023,0:07:02.946 This, the ability to take[br]real-world phenomena 0:07:02.970,0:07:05.385 and make them something[br]a microchip can understand, 0:07:05.409,0:07:08.686 is, I think, the most important skill[br]anyone can have these days. 0:07:08.710,0:07:11.133 Just as you use sentences[br]to tell a story to a person, 0:07:11.157,0:07:13.641 you use algorithms[br]to tell a story to a computer. 0:07:14.349,0:07:17.382 If you learn the language,[br]you can go out and tell your stories. 0:07:17.406,0:07:19.159 I hope this will help you do that.