English subtitles

Inside OKCupid: The math of online dating - Christian Rudder

Showing Revision 4 created 05/05/2016 by Krystian Aparta.

  1. Hello, my name is Christian Rudder,
  2. and I was one of the founders of OkCupid.
  3. It's now one of the biggest
    dating sites in the United States.
  4. Like most everyone at the site,
    I was a math major.
  5. As you may expect, we're known
    for the analytic approach we take to love.
  6. We call it our matching algorithm.
  7. Basically, OkCupid's matching
    algorithm helps us decide
  8. whether two people should go on a date.
  9. We built our entire business around it.
  10. Now, algorithm is a fancy word,
  11. and people like to drop it
    like it's this big thing.
  12. But really, an algorithm
    is just a systematic,
  13. step-by-step way to solve a problem.
  14. It doesn't have to be fancy at all.
  15. Here in this lesson,
  16. I'm going to explain how we arrived
    at our particular algorithm,
  17. so you can see how it's done.
  18. Now, why are algorithms even important?
  19. Why does this lesson even exist?
  20. Well, notice one very significant
    phrase I used above:
  21. they are a step-by-step
    way to solve a problem,
  22. and as you probably know, computers
    excel at step-by-step processes.
  23. A computer without an algorithm
  24. is basically an expensive paperweight.
  25. And since computers are such
    a pervasive part of everyday life,
  26. algorithms are everywhere.
  27. The math behind OkCupid's matching
    algorithm is surprisingly simple.
  28. It's just some addition, multiplication,
    a little bit of square roots.
  29. The tricky part in designing it
  30. was figuring out how to take
    something mysterious,
  31. human attraction,
  32. and break it into components
    that a computer can work with.
  33. The first thing we needed
    to match people up was data,
  34. something for the algorithm to work with.
  35. The best way to get data quickly
    from people is to just ask for it.
  36. So we decided that OkCupid
    should ask users questions,
  37. stuff like, "Do you want
    to have kids one day?"
  38. "How often do you brush your teeth?"
  39. "Do you like scary movies?"
  40. And big stuff like,
    "Do you believe in God?"
  41. Now, a lot of the questions
    are good for matching like with like,
  42. that is, when both people
    answer the same way.
  43. For example, two people
    who are both into scary movies
  44. are probably a better match
    than one person who is and one who isn't.
  45. But what about a question like,
  46. "Do you like to be
    the center of attention?"
  47. If both people in a relationship
    are saying yes to this,
  48. they're going to have massive problems.
  49. We realized this early on,
  50. and so we decided we needed
    a bit more data from each question.
  51. We had to ask people to specify
    not only their own answer,
  52. but the answer they wanted
    from someone else.
  53. That worked really well.
  54. But we needed one more dimension.
  55. Some questions tell you more
    about a person than others.
  56. For example, a question
    about politics, something like,
  57. "Which is worse:
    book burning or flag burning?"
  58. might reveal more about someone
    than their taste in movies.
  59. And it doesn't make sense
    to weigh all things equally,
  60. so we added one final data point.
  61. For everything that OkCupid asks you,
  62. you have a chance to tell us
    the role it plays in your life.
  63. And this ranges
    from irrelevant to mandatory.
  64. So now, for every question,
    we have three things for our algorithm:
  65. first, your answer;
  66. second, how you want someone else --
    your potential match -- to answer;
  67. and third, how important
    the question is to you at all.
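
The three data points per question can be pictured as one small record. The following is a minimal sketch, not OkCupid's actual code: the `Response` class, its field names, the `satisfies` helper, and the sample answers are all hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """One user's reply to one question (hypothetical structure)."""
    answer: str         # the user's own answer
    wanted: frozenset   # answers the user would accept from a match
    importance: str     # a label ranging from "irrelevant" to "mandatory"


def satisfies(theirs: Response, mine: Response) -> bool:
    """Return True if their answer is one that I said I would accept."""
    return theirs.answer in mine.wanted


# Illustrative data only: two users answering "Do you like scary movies?"
you = Response("yes", frozenset({"yes"}), "mandatory")
other = Response("no", frozenset({"yes", "no"}), "irrelevant")

print(satisfies(other, you))  # the other person's "no" is not an answer you accept
print(satisfies(you, other))  # your "yes" is an answer the other person accepts
```

Keeping "my answer" and "the answers I want" as separate fields is what lets a question match opposites as well as like with like.
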
  68. With all this information,
  69. OkCupid can figure out
    how well two people will get along.
  70. The algorithm crunches the numbers
    and gives us a result.
  71. As a practical example,
  72. let's look at how we'd match you
    with another person.
  73. Let's call him "B."
  74. Your match percentage with B is based
    on questions you've both answered.
  75. Let's call that set
    of common questions "s."
  76. As a very simple example,
    we use a small set "s"
  77. with just two questions in common,
  78. and compute a match from that.
  79. Here are our two example questions.
  80. The first one, let's say, is,
    "How messy are you?"
  81. And the answer possibilities are:
  82. very messy, average and very organized.
  83. And let's say you answered
    "very organized,"
  84. and you'd like someone else
    to answer "very organized,"
  85. and the question is very important to you.
  86. Basically, you're a neat freak.
  87. You're neat, you want someone else
    to be neat, and that's it.
  88. And let's say B is a little bit different.
  89. He answered "very organized" for himself,
  90. but "average" is OK with him
    as an answer from someone else,
  91. and the question is only
    a little important to him.
  92. Let's look at the second question,
    from our previous example:
  93. "Do you like to be
    the center of attention?"
  94. The answers are "yes" and "no."
  95. You've answered "no," you want
    someone else to answer "no,"
  96. and the question is only
    a little important to you.
  97. Now B, he's answered "yes."
  98. He wants someone else to answer "no,"
  99. because he wants the spotlight on him,
  100. and the question is somewhat
    important to him.
  101. So, let's try to compute all of this.
  102. Our first step is, since we use
    computers to do this,
  103. we need to assign numerical values
  104. to ideas like "somewhat
    important" and "very important,"
  105. because computers need
    everything in numbers.
  106. We at OkCupid decided
    on the following scale:
  107. "Irrelevant" is worth 0.
  108. "A little important" is worth 1.
  109. "Somewhat important" is worth 10.
  110. "Very important" is 50.
  111. And "absolutely mandatory" is 250.
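
As a small sketch, the scale can be written as a lookup table. The point values are the ones stated in the talk; the names `IMPORTANCE_POINTS` and `possible_points` are hypothetical.

```python
# Point values as stated in the talk; the names here are illustrative.
IMPORTANCE_POINTS = {
    "irrelevant": 0,
    "a little important": 1,
    "somewhat important": 10,
    "very important": 50,
    "absolutely mandatory": 250,
}


def possible_points(importance_labels):
    """Total points at stake across a list of importance labels."""
    return sum(IMPORTANCE_POINTS[label] for label in importance_labels)


# One "very important" question plus one "a little important" question.
print(possible_points(["very important", "a little important"]))  # 51
```

The scale is deliberately steep: a single "very important" question carries as many points as fifty "a little important" ones.
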
  112. Next, the algorithm makes
    two simple calculations.
  113. The first is: How much did
    B's answers satisfy you?
  114. That is, how many possible points
    did B score on your scale?
  115. Well, you indicated that B's answer
    to the first question,
  116. about messiness,
  117. was very important to you.
  118. It's worth 50 points and B got that right.
  119. The second question is worth only 1,
  120. because you said
    it was only a little important.
  121. B got that wrong,
  122. so B's answers earned 50
    out of 51 possible points.
  123. That's 98 percent satisfactory. Pretty good.
  124. The second question the algorithm
    looks at is: How much did you satisfy B?
  125. Well, B placed 1 point on your answer
    to the messiness question
  126. and 10 on your answer to the second.
  127. Of those 11, that's 1 plus 10,
    you earned 10 --
  128. your answer satisfied B
    on the second question.
  129. So your answers were 10 out of 11,
    or 91 percent, satisfactory to B.
  130. That's not bad.
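
The two calculations above can be sketched in a few lines. This is an illustration under stated assumptions rather than OkCupid's implementation: the function name `satisfaction` and the `(points, satisfied)` pair format are hypothetical, and returning 0.0 when no points are at stake is an assumption the talk does not cover.

```python
def satisfaction(graded):
    """Fraction of possible points earned.

    graded: list of (points, satisfied) pairs, one per shared question,
    where points is the importance value the grader assigned and
    satisfied says whether the other person's answer was acceptable.
    """
    possible = sum(points for points, _ in graded)
    if possible == 0:
        return 0.0  # assumption: nothing at stake counts as no satisfaction
    earned = sum(points for points, ok in graded if ok)
    return earned / possible


# How much B satisfied you: messiness (50, right), attention (1, wrong).
b_on_your_scale = satisfaction([(50, True), (1, False)])
# How much you satisfied B: messiness (1, wrong), attention (10, right).
you_on_b_scale = satisfaction([(1, False), (10, True)])

print(f"{b_on_your_scale:.0%}")  # 98%
print(f"{you_on_b_scale:.0%}")   # 91%
```
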
  131. The final step is to take
    these two match percentages
  132. and get one number for the both of you.
  133. To do this, the algorithm
    multiplies your scores,
  134. then takes the nth root,
  135. where "n" is the number of questions.
  136. Because the number of questions
    in s, our sample set,
  137. is only 2,
  138. we have: match percentage
    equals the square root
  139. of 98 percent times 91 percent.
  140. That equals 94 percent.
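
Written out with the unrounded fractions from the two-question example, the arithmetic is:

```latex
\[
\text{match} = \sqrt{\frac{50}{51} \times \frac{10}{11}}
             = \sqrt{\frac{500}{561}}
             \approx 0.944 \approx 94\%
\]
```
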
  141. That 94 percent is your match
    percentage with B.
  142. It's a mathematical expression
    of how happy you'd be with each other,
  143. based on what we know.
  144. Now, why does the algorithm multiply,
  145. as opposed to, say, average
    the two match scores together,
  146. and do the square-root business?
  147. In general, this formula
    is called the geometric mean.
  148. It's a great way to combine
    values that have wide ranges
  149. and represent very different properties.
  150. In other words, it's perfect
    for romantic matching.
  151. You've got wide ranges and you've got
    tons of different data points,
  152. like I said, about movies, politics,
    religion -- everything.
  153. Intuitively, too, this makes sense.
  154. Two people satisfying
    each other 50 percent
  155. should be a better match
    than two others who satisfy 0 and 100,
  156. because affection needs to be mutual.
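
A quick comparison shows why multiplying behaves differently from averaging. This is a minimal sketch for two satisfaction scores; the function and variable names are illustrative.

```python
import math


def arithmetic_mean(a, b):
    """Ordinary average of two scores."""
    return (a + b) / 2


def geometric_mean(a, b):
    """Square root of the product of two scores."""
    return math.sqrt(a * b)


mutual = (0.5, 0.5)     # each person satisfies the other 50 percent
one_sided = (0.0, 1.0)  # one person is fully satisfied, the other not at all

print(arithmetic_mean(*mutual), geometric_mean(*mutual))        # 0.5 0.5
print(arithmetic_mean(*one_sided), geometric_mean(*one_sided))  # 0.5 0.0
```

The plain average cannot tell the two couples apart, while the geometric mean drops the one-sided pair to zero.
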
  157. After adding a little correction
    for margin of error,
  158. in the case where we have
    a small number of questions,
  159. like we do in this example,
  160. we're good to go.
  161. Any time OkCupid matches two people,
  162. it goes through the steps
    we just outlined.
  163. First it collects data about your answers,
  164. then it compares your choices
    and preferences to other people's
  165. in simple, mathematical ways.
  166. This, the ability to take
    real-world phenomena
  167. and make them something
    a microchip can understand,
  168. is, I think, the most important skill
    anyone can have these days.
  169. Like you use sentences
    to tell a story to a person,
  170. you use algorithms
    to tell a story to a computer.
  171. If you learn the language,
    you can go out and tell your stories.
  172. I hope this will help you do that.