
Title:
Getting To Point B  Intro to Algorithms

Description:

Our next guest is Andrew Goldberg.

He's a principal researcher at Microsoft Research, Silicon Valley,

and his work focuses on creating algorithms for realworld problems.

One of the interesting problems he's studied

is how to find the shortest path in really, really large networks in microseconds,

so he's going to tell us a little bit about algorithms for doing that.

[Andrew Goldberg] [Principal Researcher, Microsoft Research] Recently I've been

working a lot on shortest path algorithms,

especially motivated by GPS navigation applications,

so basically how to get from A to B.

That's great.

When you analyze this problem, that problem itself is fairly classically studied.

What are the new wrinkles in it that come up, and why do they come up?

So, the new wrinkles came up when GPS navigation

became very, very widely used,

and also GPS maps became continent sized and detailed, digital maps.

And then you really wanted to solve the problem much faster than in linear time.

Basically when Bing Maps or Google Maps gets a request

it doesn't have the time to look at the whole map,

so linear time algorithms like your classical Dijkstra's algorithm are not good enough,

and the new wrinkle was preprocessing,

so you want to preprocess your graph to be able to answer queries

very, very quickly, sort of in polylogarithmic time if you want to put your theory hat on.

During the last 15 years or so there was a lot of research

both at our group and also in many other places,

and there have been very nice algorithms developed

which 10 years ago I wouldn't have believed that it's possible,

but basically these algorithms can answer queries in microseconds

on continentalsized networks.

Wow, and it doesn't prestore all possible pair wise

No, so just to give you an idea of scale

a continentsized network has tens of millions of nodes,

so 10 billion squared is too big even for today's huge disks.

Are there any relationships between the kinds of graphs that you get

in highways and the kinds of graphs that would come out of a social network?

We did recent variation studies under submission now,

and we studied some of the publicly available networks,

and the sublabeling algorithm I want to talk about next

has actually worked fairly well for this kind of network

like quarter networks and so on.

But for example, it doesn't work so well for small world kind of networks.

Okay, interesting.

All right, good, so if you wouldn't mind telling me a little bit about the algorithm,

I think that would be really interesting.

Let's talk about implementing just the distance oracle,

so basically given 2 points, you want to tell the distance between those 2 points.

The algorithm first preprocesses the graph,

and for each vertex it computes labels,

and let's say for simplicity the graph is undirected.

Then a label of a vertex is a set of vertices

which we call hubs and distances to the hubs from the vertex.

Each vertex has a label, and these labels must have the following property.

If you take any 2 vertices, the set of hubs has to intersect,

and the intersection mark contains a vertex on the shortest path between them.

And why it's important is that for this vertex

if you sum up the distances to the 2 hubs, which you have,

you will get the shortest path distance.

The sum of results, so all the hubs' shortest path distance is stored?

No, for each vertex you have a set of hubs.

The distance from this vertex to each hub.

[Male] Oh, I see, and they have to share a hub.

On the shortest path.

That is also on the shortest path, I see.

Right, so this is a fairly strong property, so the easiest way

to do it is you say, okay, for each vertex

all other vertices are at hubs.

[Male] Then you're guaranteed.

And then the property holds, but then your queue at a time is order of n,

and what you really want is small labels,

and it turns out that some graphs have more labels,

and the reason why this works well in road networks

is that we can compute labels

for, say, the graph of Western Europe with about 18 million vertices.

We can compute labels of size about 70.

70, 70.>>[Andrew Goldberg] Yes.

Out of how many million did you say? 16 million?

18.>>[Male] 18 million.

You only need 70.

It's very, very fast.

If you think about that if you sort these hubs by node ID

we have 2 arrays, and you just need to intersect these 2 arrays of size 70,

which you can do like a merge sort that is very good locality.

This time becomes below a microsecond.

It's very, very, fast.

But this is not a very intuitive concept to me so what are the 70

so we're talking about like individual intersections in Europe, right?

This is like Piccadilly Square or something like that

or the northeast corner of Piccadilly Square,

and you only need to know from there

the distance to 70 other places in Europe.

This is the amazing thing, and that's why people thought

that this wouldn't be practical because

if you think about it there are probably thousands,

but the surprising thing was that you only needed a much smaller number,

so if you think about this long range,

it's basically intersections of major highways,

and there are not that many of those, and sort of locally

it's intersections of state highways and more locally

it's intersections of major streets.

The bad graphs of this kind of algorithm are grids,

but fortunately there are no big grids in the maps,

and even though there are grids like in Manhattan

it's only 10 avenues wide, so it's not very big,

and there is Broadway, which breaks the symmetry,

so things are not as bad as one might think.

That's fascinating.

Of the 70 places, I could imagine that some fraction of them are

for really long distance travel, like between countries,

and some other fraction of them are for within the country

but far distance, and then another fraction of them

are for within the region of the country, and is it sort of equal buckets for each,

or for really far distance you only need a small number, but for local you need more?

What's the distribution like?

It's roughly uniform as you increase the scale exponentially.

[Male] Wow, wow.

That really depends on where in the country you are

because in that densely populated area you need more local things.

If you are in the middle of nowhere

[Male] There's only one way out.