Got a YouTube account?

New: enable viewer-created translations and captions on your YouTube channel!

English subtitles

← Getting To Point B - Intro to Algorithms

Get Embed Code
3 Languages

Showing Revision 3 created 05/24/2016 by Udacity Robot.

  1. Our next guest is Andrew Goldberg.
  2. He's a principal researcher at Microsoft Research, Silicon Valley,
  3. and his work focuses on creating algorithms for real-world problems.
  4. One of the interesting problems he's studied
  5. is how to find the shortest path in really, really large networks in microseconds,
  6. so he's going to tell us a little bit about algorithms for doing that.
  7. [Andrew Goldberg] [Principal Researcher, Microsoft Research] Recently I've been
  8. working a lot on shortest path algorithms,
  9. especially motivated by GPS navigation applications,
  10. so basically how to get from A to B.
  11. That's great.
  12. When you analyze this problem, that problem itself is fairly classically studied.
  13. What are the new wrinkles in it that come up, and why do they come up?
  14. So, the new wrinkles came up when GPS navigation
  15. became very, very widely used,
  16. and also GPS maps became continent sized and detailed, digital maps.
  17. And then you really wanted to solve the problem much faster than in linear time.
  18. Basically when Bing Maps or Google Maps gets a request
  19. it doesn't have the time to look at the whole map,
  20. so linear time algorithms like your classical Dijkstra's algorithm are not good enough,
  21. and the new wrinkle was preprocessing,
  22. so you want to preprocess your graph to be able to answer queries
  23. very, very quickly, sort of in polylogarithmic time if you want to put your theory hat on.
  24. During the last 15 years or so there was a lot of research
  25. both at our group and also in many other places,
  26. and there have been very nice algorithms developed
  27. which 10 years ago I wouldn't have believed that it's possible,
  28. but basically these algorithms can answer queries in microseconds
  29. on continental-sized networks.
  30. Wow, and it doesn't pre-store all possible pair wise--
  31. No, so just to give you an idea of scale
  32. a continent-sized network has tens of millions of nodes,
  33. so 10 billion squared is too big even for today's huge disks.
  34. Are there any relationships between the kinds of graphs that you get
  35. in highways and the kinds of graphs that would come out of a social network?
  36. We did recent variation studies under submission now,
  37. and we studied some of the publicly available networks,
  38. and the sub-labeling algorithm I want to talk about next
  39. has actually worked fairly well for this kind of network
  40. like quarter networks and so on.
  41. But for example, it doesn't work so well for small world kind of networks.
  42. Okay, interesting.
  43. All right, good, so if you wouldn't mind telling me a little bit about the algorithm,
  44. I think that would be really interesting.
  45. Let's talk about implementing just the distance oracle,
  46. so basically given 2 points, you want to tell the distance between those 2 points.
  47. The algorithm first preprocesses the graph,
  48. and for each vertex it computes labels,
  49. and let's say for simplicity the graph is undirected.
  50. Then a label of a vertex is a set of vertices
  51. which we call hubs and distances to the hubs from the vertex.
  52. Each vertex has a label, and these labels must have the following property.
  53. If you take any 2 vertices, the set of hubs has to intersect,
  54. and the intersection mark contains a vertex on the shortest path between them.
  55. And why it's important is that for this vertex
  56. if you sum up the distances to the 2 hubs, which you have,
  57. you will get the shortest path distance.
  58. The sum of results, so all the hubs' shortest path distance is stored?
  59. No, for each vertex you have a set of hubs.
  60. The distance from this vertex to each hub.
  61. [Male] Oh, I see, and they have to share a hub.
  62. On the shortest path.
  63. That is also on the shortest path, I see.
  64. Right, so this is a fairly strong property, so the easiest way
  65. to do it is you say, okay, for each vertex
  66. all other vertices are at hubs.
  67. [Male] Then you're guaranteed.
  68. And then the property holds, but then your queue at a time is order of n,
  69. and what you really want is small labels,
  70. and it turns out that some graphs have more labels,
  71. and the reason why this works well in road networks
  72. is that we can compute labels
  73. for, say, the graph of Western Europe with about 18 million vertices.
  74. We can compute labels of size about 70.
  75. 70, 7-0.>>[Andrew Goldberg] Yes.
  76. Out of how many million did you say? 16 million?
  77. 18.>>[Male] 18 million.
  78. You only need 70.
  79. It's very, very fast.
  80. If you think about that if you sort these hubs by node ID
  81. we have 2 arrays, and you just need to intersect these 2 arrays of size 70,
  82. which you can do like a merge sort that is very good locality.
  83. This time becomes below a microsecond.
  84. It's very, very, fast.
  85. But this is not a very intuitive concept to me so what are the 70--
  86. so we're talking about like individual intersections in Europe, right?
  87. This is like Piccadilly Square or something like that
  88. or the northeast corner of Piccadilly Square,
  89. and you only need to know from there
  90. the distance to 70 other places in Europe.
  91. This is the amazing thing, and that's why people thought
  92. that this wouldn't be practical because
  93. if you think about it there are probably thousands,
  94. but the surprising thing was that you only needed a much smaller number,
  95. so if you think about this long range,
  96. it's basically intersections of major highways,
  97. and there are not that many of those, and sort of locally
  98. it's intersections of state highways and more locally
  99. it's intersections of major streets.
  100. The bad graphs of this kind of algorithm are grids,
  101. but fortunately there are no big grids in the maps,
  102. and even though there are grids like in Manhattan
  103. it's only 10 avenues wide, so it's not very big,
  104. and there is Broadway, which breaks the symmetry,
  105. so things are not as bad as one might think.
  106. That's fascinating.
  107. Of the 70 places, I could imagine that some fraction of them are
  108. for really long distance travel, like between countries,
  109. and some other fraction of them are for within the country
  110. but far distance, and then another fraction of them
  111. are for within the region of the country, and is it sort of equal buckets for each,
  112. or for really far distance you only need a small number, but for local you need more?
  113. What's the distribution like?
  114. It's roughly uniform as you increase the scale exponentially.
  115. [Male] Wow, wow.
  116. That really depends on where in the country you are
  117. because in that densely populated area you need more local things.
  118. If you are in the middle of nowhere--
  119. [Male] There's only one way out.