0:00:00.000,0:00:11.000


0:00:11.000,0:00:14.000
This presentation is delivered by the Stanford Center for Professional

0:00:14.000,0:00:25.000
Development.

0:00:25.000,0:00:27.000
Okay, we got volume and everything. All

0:00:27.000,0:00:28.000
right, here we go. Lots of you think your

0:00:28.000,0:00:32.000
learning things that aren't gonna actually someday help you in your future

0:00:32.000,0:00:34.000
career. Let's say you plan to be president. You

0:00:34.000,0:00:38.000
think, ''Should I take 106b; should I not take 106b?'' Watch this interview and

0:00:38.000,0:00:39.000
you will see. [Video]

0:00:39.000,0:00:41.000
We have questions, and we ask

0:00:41.000,0:00:45.000
our candidates questions, and this one is from Larry Schwimmer. What

0:00:45.000,0:00:48.000
- you

0:00:48.000,0:00:52.000
guys think I'm kidding; it's right here.

0:00:52.000,0:00:59.000
What is the most efficient way to sort a million 32-bit integers?

0:00:59.000,0:01:01.000
Well - I'm

0:01:01.000,0:01:07.000
sorry. Maybe we should - that's not a - I think the bubble sort

0:01:07.000,0:01:09.000
would be the wrong way to go. Come on; who told him this?

0:01:09.000,0:01:14.000
There you go. So, apparently -

0:01:14.000,0:01:15.000
the question was

0:01:15.000,0:01:16.000
actually

0:01:16.000,0:01:19.000
by a former student of mine, in

0:01:19.000,0:01:22.000
fact, Larry Schwimmer. So apparently, he's been prepped well. As part of going on the campaign trail, not only do you

0:01:22.000,0:01:25.000
have to know all your policy statements, you also have to have your

0:01:25.000,0:01:29.000
N2 and login sorts kept straight in your mind, so - just so

0:01:29.000,0:01:31.000
you know. So let me give you

0:01:31.000,0:01:35.000
some examples of things that are graphs, just to get you thinking about

0:01:35.000,0:01:36.000


0:01:36.000,0:01:39.000
the data structure that we're gonna be working with and playing around with

0:01:39.000,0:01:40.000
today. Sometimes you

0:01:40.000,0:01:44.000
think of graphs of being those, like, bar graphs, right, your graphing

0:01:44.000,0:01:46.000
histograms. This is actually a different kind of graph. The computer scientist's form of

0:01:46.000,0:01:50.000
graph is just a generalized, recursive data structure

0:01:50.000,0:01:51.000
that, kind of,

0:01:51.000,0:01:54.000
starts with something that looks a little bit like a tree and extends it to

0:01:54.000,0:01:55.000
this notion that there are

0:01:55.000,0:01:57.000
any number of nodes, in

0:01:57.000,0:02:00.000
this case, represented as the cities in this airline route map,

0:02:00.000,0:02:04.000
and then there are connections between them, sometimes called edges or arcs that wire up

0:02:04.000,0:02:08.000
the connections that in the case, for example, of this graph are showing you

0:02:08.000,0:02:10.000
which cities have connecting flights. You have a direct

0:02:10.000,0:02:15.000
flight that takes you from Las Vegas to Phoenix, right, one from Phoenix to Oklahoma

0:02:15.000,0:02:15.000
City.

0:02:15.000,0:02:18.000
There is not a direct flight from Phoenix to Kansas City, but there's a path

0:02:18.000,0:02:21.000
by which you could get there by taking a sequence

0:02:21.000,0:02:25.000
of connecting flights, right, going from Phoenix, to Oklahoma City, to Kansas City is

0:02:25.000,0:02:26.000
one way to get there.

0:02:26.000,0:02:30.000
And so the entire, kind of, map of airline routes forms a graph of the city,

0:02:30.000,0:02:34.000
and there are a lot of interesting questions that having represented that data you might want to

0:02:34.000,0:02:35.000
answer about finding

0:02:35.000,0:02:36.000
flights that are

0:02:36.000,0:02:40.000
cheap, or meet your time specification, or get you where you want to go,

0:02:40.000,0:02:47.000
and that this data would be, kind of, primary in solving that problem.

0:02:47.000,0:02:48.000
Another one,

0:02:48.000,0:02:49.000
right, another

0:02:49.000,0:02:52.000
idea of something that's represented as a graph is something that would support

0:02:52.000,0:02:54.000
the notion of word ladders, those little puzzles where you're

0:02:54.000,0:02:56.000
given the word ''chord,''

0:02:56.000,0:03:00.000
and you want to change it into the word ''worm,'' and you have these rules that say, well, you

0:03:00.000,0:03:02.000
can change one letter at a time.

0:03:02.000,0:03:04.000
And so the

0:03:04.000,0:03:07.000
connections in the case of the word letter have to do with words that are

0:03:07.000,0:03:12.000
one character different from their neighbors, have direct connections or arcs

0:03:12.000,0:03:16.000
between them. Each of the nodes themselves represents a word, and then

0:03:16.000,0:03:19.000
paths between that represent word ladders. That if you can go from here

0:03:19.000,0:03:22.000
through a sequence of steps directly connecting your way, each of those stepping

0:03:22.000,0:03:26.000
stones is a word in the chain that makes a word ladder. So

0:03:26.000,0:03:29.000
things you might want to solve there is what's the shortest word ladder? How many different

0:03:29.000,0:03:31.000
word ladders - how many different ways can I go from this word to that one

0:03:31.000,0:03:36.000
could be answered by searching around in a space like this.

0:03:36.000,0:03:39.000
Any kind of prerequisite structure,

0:03:39.000,0:03:42.000
such as the one for the major that might say you need to take this class before you take

0:03:42.000,0:03:45.000
that class, and this class before that class, right,

0:03:45.000,0:03:46.000
forms a

0:03:46.000,0:03:49.000
structure that also fits into the main graph.

0:03:49.000,0:03:52.000
In this case, there's a connection to those arcs.

0:03:52.000,0:03:54.000
That the prerequisite is such that

0:03:54.000,0:03:58.000
you take 106a, and it leads into 106b, and that connection doesn't really go in

0:03:58.000,0:04:01.000
the reverse, right? You don't start in 106b and move backwards.

0:04:01.000,0:04:05.000
Whereas in the case, for example, of the word ladder, all of these arcs are what we called undirected,

0:04:05.000,0:04:09.000
right? You can traverse from wood to word and back again, right?

0:04:09.000,0:04:10.000
They're equally

0:04:10.000,0:04:12.000


0:04:12.000,0:04:15.000
visitable or neighbors, in this case. So there may be a situation where you have directed

0:04:15.000,0:04:20.000
arcs where they really imply a one-way directionality through the paths.

0:04:20.000,0:04:22.000
Another one, lots and lots of

0:04:22.000,0:04:23.000
arrows that

0:04:23.000,0:04:25.000
tell you about the lifecycle of

0:04:25.000,0:04:26.000


0:04:26.000,0:04:30.000
Stanford relationships all captured on one

0:04:30.000,0:04:31.000
slide

0:04:31.000,0:04:34.000
in, sort of, the flowchart. So flowcharts have a lot of things about directionality. Like, you're

0:04:34.000,0:04:37.000
in this state, and you can move to this state, so that's your neighboring state, and

0:04:37.000,0:04:39.000
the arcs in there connect those things.

0:04:39.000,0:04:43.000
And so, as you will see, like, you get to start over here and, like, be single and love

0:04:43.000,0:04:47.000
it, and then you can, kind of, flirt with various degrees of seriousness,

0:04:47.000,0:04:50.000
and then eventually there evolves a trombone and serenade, which, you know,

0:04:50.000,0:04:53.000
you can't go back to flirting aimlessly after the trombone and serenade; there's no

0:04:53.000,0:04:55.000
arc that direction.

0:04:55.000,0:04:57.000
When somebody gets the trombone out,

0:04:57.000,0:04:59.000
you know, you take them seriously,

0:04:59.000,0:05:02.000
or you completely blow them off. There's the [inaudible] there.

0:05:02.000,0:05:04.000
And then there's dating, and then there's

0:05:04.000,0:05:08.000
two-phase commit, which is a very important part of any computer scientist's education,

0:05:08.000,0:05:10.000
and then, potentially, other

0:05:10.000,0:05:13.000
tracks of children and what not. But it does

0:05:13.000,0:05:15.000
show you, kind of, places you can go, right?

0:05:15.000,0:05:18.000
It turns out you actually don't get to go from flirt aimlessly to have baby, so

0:05:18.000,0:05:20.000
write that down -

0:05:20.000,0:05:23.000
not allowed in this graph. And

0:05:23.000,0:05:28.000
then it turns out some things that you've already seen are actually graphs in disguise.

0:05:28.000,0:05:32.000
I didn't tell you when I gave you the maze problem and its solution, the solving and

0:05:32.000,0:05:33.000
building of a maze,

0:05:33.000,0:05:37.000
that, in fact, what you're working on though is something that is a graph.

0:05:37.000,0:05:39.000
That a perfect maze, right,

0:05:39.000,0:05:40.000
so we started with this

0:05:40.000,0:05:44.000
fully disconnected maze. If you remember how the Aldis/Broder algorithm worked, it's like

0:05:44.000,0:05:47.000
you have all these nodes that have no connections between them,

0:05:47.000,0:05:51.000
and then you went bopping around breaking down walls, which is effectively

0:05:51.000,0:05:53.000
connecting up the nodes to each other,

0:05:53.000,0:05:57.000
and we kept doing that until actually all of the nodes were connected. So, in effect, we

0:05:57.000,0:06:00.000
were building a graph out of this big grid of cells

0:06:00.000,0:06:03.000
by knocking down those walls to build the connections, and when we were done, then

0:06:03.000,0:06:07.000
we made it so that it was possible to trace paths starting from that lower corner

0:06:07.000,0:06:10.000
and working our way through the neighbors all the way over to there.

0:06:10.000,0:06:13.000
So in the case of this, if you imagine that each of these cells was broken out in this little

0:06:13.000,0:06:16.000
thing, there'd be these neighboring connections where this guy has this

0:06:16.000,0:06:19.000
neighbor but no other ones because actually it has no

0:06:19.000,0:06:22.000
other directions you can move from there, but some of them, for example like this

0:06:22.000,0:06:23.000
one,

0:06:23.000,0:06:26.000
has a full set of four neighbors you can reach

0:06:26.000,0:06:28.000
depending on which walls are intact or not.

0:06:28.000,0:06:29.000
And so the

0:06:29.000,0:06:31.000
creation of that was creating a graph out

0:06:31.000,0:06:34.000
of a set of disconnected nodes, and then solving it is searching that

0:06:34.000,0:06:38.000
graph for a path. Here's one that is a little horrifying that came from the newspaper, which is social networking, kind of, a very big [inaudible] of, like, on the net who do you know, and who do they know, and how do those connections, like, linked in maybe help you get a job. This is one that was actually based on the liaisons, let's say, of a group of high school students, right, and it turns out there was a lot of more interconnected in that graph than I would expect, or appreciate, or approve of but interesting to, kind of, look at. So

0:06:38.000,0:06:39.000
let's talk, like, concretely.

0:06:39.000,0:06:43.000
How are we gonna make stuff work on graphs? Graphs turn out to actually - by showing you that, I'm trying to

0:06:43.000,0:06:46.000
get you this idea that a lot of things are really just graphs in disguise. That a lot

0:06:46.000,0:06:50.000
of the problems you may want to solve, if you can represent it as a graph, then things

0:06:50.000,0:06:53.000
you learn about how to manipulate graphs will help you to solve that kind of

0:06:53.000,0:06:54.000
problem.

0:06:54.000,0:06:59.000
So the basic idea of a graph, it is a recursive data structure based on the node where

0:06:59.000,0:07:02.000
nodes have connections to other nodes.

0:07:02.000,0:07:04.000
Okay. So that's, kind of, like a tree,

0:07:04.000,0:07:07.000
but it actually is more freeform than a tree, right?

0:07:07.000,0:07:11.000
In a tree, right, there's that single root node that has, kind of, a special place, and

0:07:11.000,0:07:15.000
there's this restriction about the connections that there's a single path

0:07:15.000,0:07:17.000
from every other node in the tree

0:07:17.000,0:07:20.000
starting from the root that leads to it. So you never can get there by

0:07:20.000,0:07:24.000
multiple ways, and there's no cycles, and it's all connected, and things like

0:07:24.000,0:07:25.000
that.

0:07:25.000,0:07:26.000
So, in terms of a graph,

0:07:26.000,0:07:29.000
it just doesn't place those restrictions. It says there's a node.

0:07:29.000,0:07:32.000
It has any number of pointers to other nodes. It potentially could even

0:07:32.000,0:07:36.000
have zero pointers to other's nodes, so it could be on a little island of its own out

0:07:36.000,0:07:38.000
here that has no incoming or outgoing flights, right? You

0:07:38.000,0:07:40.000
have to swim if you want to get there.

0:07:40.000,0:07:42.000


0:07:42.000,0:07:45.000
And then there's also not a rule that says that you can't have

0:07:45.000,0:07:47.000
multiple ways to get to the same

0:07:47.000,0:07:50.000
node. So if you're at B, and you're interested in getting to C,

0:07:50.000,0:07:52.000
you could take the direction connection here,

0:07:52.000,0:07:54.000
but you could also take a longer path

0:07:54.000,0:07:58.000
that bops through A and hits to C, and so there's more than one way to get there,

0:07:58.000,0:07:59.000


0:07:59.000,0:08:02.000
which would not be true in a tree structure where it has that constraint,

0:08:02.000,0:08:05.000
but that adds some interesting challenges to solving problems in the graphs

0:08:05.000,0:08:08.000
because there actually are so many different ways you can

0:08:08.000,0:08:11.000
get to the same place. We have to be careful avoiding

0:08:11.000,0:08:13.000
getting into circularities where we're

0:08:13.000,0:08:16.000
doing that, and also not redundantly doing work we've already done before because

0:08:16.000,0:08:18.000
we've visited it previously.

0:08:18.000,0:08:21.000
There can be cycles where

0:08:21.000,0:08:22.000
you can

0:08:22.000,0:08:24.000
completely come around. I this one, actually,

0:08:24.000,0:08:26.000
doesn't have a cycle in it.

0:08:26.000,0:08:27.000
Nope, it doesn't,

0:08:27.000,0:08:31.000
but if I add it - I could add a link, for example, from

0:08:31.000,0:08:34.000
C back to B, and then there would be this place you could just, kind of, keep

0:08:34.000,0:08:36.000
rolling around in the B, A, C

0:08:36.000,0:08:38.000
area.

0:08:38.000,0:08:41.000
So we call each of those guys, the circles here, the nodes, right?

0:08:41.000,0:08:44.000
The connection between them arcs. We

0:08:44.000,0:08:47.000
will talk about the arcs as being directed and undirected in different situations

0:08:47.000,0:08:51.000
where we imply that there's a directionality; you can travel the arc only one-way,

0:08:51.000,0:08:54.000
or in cases where it's undirected, it means it goes both ways.

0:08:54.000,0:08:56.000
We'll talk about two nodes being connected,

0:08:56.000,0:08:58.000
meaning there is a direct connection, an arc, between them,

0:08:58.000,0:09:02.000
and if they are not directly connected, there may possibly be a path

0:09:02.000,0:09:05.000
between them. A sequence of arcs forms a path that you can follow to get

0:09:05.000,0:09:07.000
from one node to another,

0:09:07.000,0:09:11.000
and then cycles just meaning a path that revisits one of the nodes that

0:09:11.000,0:09:14.000
was previously traveled on that path. So I've got

0:09:14.000,0:09:16.000
a little terminology.

0:09:16.000,0:09:18.000
Let's talk a little bit about how we're gonna represent it.

0:09:18.000,0:09:22.000
So before we start making stuff happen on them, it's, like, well, how does this -

0:09:22.000,0:09:26.000
in C++, right, what's the best way, or what are the reasonable ways you

0:09:26.000,0:09:27.000
might represent

0:09:27.000,0:09:30.000
this kind of connected structure?

0:09:30.000,0:09:31.000
So if I have

0:09:31.000,0:09:36.000
this four-node graph, A, B, C, D over where with the connections that are shown with

0:09:36.000,0:09:38.000
direction, in this case, I have directed arcs.

0:09:38.000,0:09:41.000
That one way to think of it is that what a

0:09:41.000,0:09:45.000
graph really is is a set of nodes and a set of arcs, and by set, I mean, sort of, just the

0:09:45.000,0:09:48.000
generalization. There might be a vector of arcs; there might be a vector of

0:09:48.000,0:09:51.000
nodes, whatever, but some collection of arcs, some collection of nodes,

0:09:51.000,0:09:53.000
and that the

0:09:53.000,0:09:56.000
nodes might have information about what location they represent, or

0:09:56.000,0:09:59.000
what word, or what entity, you know, in a social network that they are,

0:09:59.000,0:10:03.000
and then the arcs would show who is directly connected to whom.

0:10:03.000,0:10:06.000
And so one way to manage that, right, is just to have two independent collections - a collection

0:10:06.000,0:10:08.000
of nodes and a

0:10:08.000,0:10:11.000
collection of arcs, and that when I need to know things about connections, I'll have to, kind of, use both

0:10:11.000,0:10:12.000
in tandem.

0:10:12.000,0:10:14.000
So if I want to know if A is connected to B,

0:10:14.000,0:10:18.000
I might actually have to just walk through the entire set of arcs trying to find

0:10:18.000,0:10:22.000
one that connects A and B as its endpoints. If I want to see if

0:10:22.000,0:10:25.000
C is connected to B, same thing, just walk down the whole set.

0:10:25.000,0:10:27.000


0:10:27.000,0:10:29.000


0:10:29.000,0:10:31.000
That's probably the simplest thing to do, right, is to just, kind of,

0:10:31.000,0:10:35.000
have them just be maintained as the two things that are used in conjunction.

0:10:35.000,0:10:37.000
More likely, what you're gonna want to do

0:10:37.000,0:10:39.000
is provide some access that

0:10:39.000,0:10:42.000
when at a current node, you're currently trying to do some processing

0:10:42.000,0:10:47.000
from the Node B, it would be convenient if you could easily access those nodes

0:10:47.000,0:10:51.000
that emanate or outgo from the B node. That's typically the thing you're trying

0:10:51.000,0:10:52.000
to do is

0:10:52.000,0:10:55.000
at B try to visit its neighbors,

0:10:55.000,0:10:58.000
and rather that having to, kind of, pluck them out of the entire collection,

0:10:58.000,0:11:01.000
it might be handy if they were already, kind of, stored in such a way to make it

0:11:01.000,0:11:03.000
easy for you to get to that.

0:11:03.000,0:11:06.000
So the two ways that try to represent adjacency

0:11:06.000,0:11:10.000
in a more efficient way are the adjacency list and the adjacency matrix.

0:11:10.000,0:11:14.000
So in an adjacency list representation, we have a way of associating with each node,

0:11:14.000,0:11:17.000
probably just by storing it in the node structure itself,

0:11:17.000,0:11:21.000
those arcs that outgo from there. So, in this case, A has as direct

0:11:21.000,0:11:22.000
connection to the B

0:11:22.000,0:11:23.000
and C Nodes,

0:11:23.000,0:11:26.000
and so I would have that information stored with the A Node, which is my

0:11:26.000,0:11:29.000
vector or set of outgoing connections.

0:11:29.000,0:11:33.000
Similarly, B has one outgoing connection, which goes to D. C has an outgoing connection to D, and

0:11:33.000,0:11:37.000
then D has outgoing connections that lead back to A and

0:11:37.000,0:11:40.000
to B. So at a particular node I could say, well, who's my neighbors?

0:11:40.000,0:11:41.000
It would be

0:11:41.000,0:11:46.000
easily accessible and not have to be retrieved, or searched, or filtered.

0:11:46.000,0:11:48.000
Another way of doing that

0:11:48.000,0:11:51.000
is to realize that, in a sense, what we have is this, kind of,

0:11:51.000,0:11:53.000
end by end grid

0:11:53.000,0:11:56.000
where each node is potentially connected to

0:11:56.000,0:12:00.000
the other N -1 nodes in that graph. And so if I just build

0:12:00.000,0:12:03.000
a 2x2 matrix that's labeled with

0:12:03.000,0:12:06.000
all the node names on this side and all the node names across the top, that the

0:12:06.000,0:12:08.000
intersection of this says

0:12:08.000,0:12:11.000
is there an arc that leads away from A and

0:12:11.000,0:12:14.000
ends at B? Yes, there is. Is there one from A to C?

0:12:14.000,0:12:18.000
And in the places where there is not an existing connection, a self loop or

0:12:18.000,0:12:19.000


0:12:19.000,0:12:23.000
just a connection that doesn't exist, right, I have an empty slot. So maybe these would be Booleans

0:12:23.000,0:12:25.000
true and false or something, or maybe there'd be some more information here

0:12:25.000,0:12:27.000
about the distance

0:12:27.000,0:12:30.000
or other associated arc properties for that

0:12:30.000,0:12:31.000
particular connection.

0:12:31.000,0:12:34.000
In this case, it's actually very easy then to actually do the really quick

0:12:34.000,0:12:35.000
search of

0:12:35.000,0:12:38.000
I'm at A and I want to know if I can get to B, then it's really just a

0:12:38.000,0:12:41.000
matter of reaching right into the

0:12:41.000,0:12:45.000
slot in constant time in that matrix to pull it out, and

0:12:45.000,0:12:47.000
so

0:12:47.000,0:12:49.000
this one involves a lot of searching to find things. This one involves, perhaps, a

0:12:49.000,0:12:52.000
little bit of searching. If I want to know if A is connected to B, I might have to look

0:12:52.000,0:12:55.000
through all its outgoing connections, right? This one gives me that direct

0:12:55.000,0:12:56.000
access,

0:12:56.000,0:13:00.000
but the tradeoffs here has to do with where the space starts building up, right?

0:13:00.000,0:13:03.000
A full NxN matrix could be very big,

0:13:03.000,0:13:06.000
and sometimes we're allocating space as though there are a full set of

0:13:06.000,0:13:09.000
connections between all the nodes, every node connected to every other,

0:13:09.000,0:13:12.000
and in the cases where a lot of those things are false, there's a lot of

0:13:12.000,0:13:13.000
capacity

0:13:13.000,0:13:16.000
that we have set aside that we're not really tapping into.

0:13:16.000,0:13:19.000
And so in the case of a graph that we call dense,

0:13:19.000,0:13:23.000
where there's a lot of connections, it will use a lot of that capacity, and it might make sense

0:13:23.000,0:13:27.000
in a graph that's more sparse, where there's a few connections. So you have

0:13:27.000,0:13:31.000
thousands of nodes, but maybe only three or four on average are connected to one.

0:13:31.000,0:13:35.000
Then having allocated 1,000 slots of which to only fill in three

0:13:35.000,0:13:36.000
might become

0:13:36.000,0:13:40.000
rather space inefficient. You might prefer an adjacency list representation that

0:13:40.000,0:13:41.000
can get you

0:13:41.000,0:13:44.000
to the ones you're actually using versus the other,

0:13:44.000,0:13:47.000
but in terms of the adjacency matrices is very fast,

0:13:47.000,0:13:48.000
right? A lot of space thrown at this

0:13:48.000,0:13:51.000
gives you this immediate 0,1 access to know

0:13:51.000,0:13:55.000
who your neighbors are. So

0:13:55.000,0:13:59.000
we're gonna use the adjacency list, kind of, strategy here

0:13:59.000,0:14:02.000
for the code we're gonna do today, and it's, kind

0:14:02.000,0:14:07.000
of, a good compromise between the two alternatives that we've seen there.

0:14:07.000,0:14:08.000
So I have a struct node.

0:14:08.000,0:14:11.000
It has some information for the node. Given that I'm not being very specific

0:14:11.000,0:14:14.000
about what the graph is, I'm just gonna, kind of, leave that unspecified. It might be that it's

0:14:14.000,0:14:17.000
the city name. It might be that it's a word. It might be that it's a person in a

0:14:17.000,0:14:19.000
social network, you know,

0:14:19.000,0:14:23.000
some information that represents what this node is an entity,

0:14:23.000,0:14:28.000
and then there's gonna be a set of arcs or a set of nodes it's connected to,

0:14:28.000,0:14:32.000
and so I'm using vector here. I could be using set. I could be using a

0:14:32.000,0:14:35.000
raw array, or a link list, or any of these things.

0:14:35.000,0:14:38.000
I need to use some unbounded collection though because there's no guarantee that

0:14:38.000,0:14:42.000
they will be 0, or 1, or 2 the way there is in a link list or a tree where there's a

0:14:42.000,0:14:45.000
specified number of outgoing links. There can be any number of them, so I'm just leaving

0:14:45.000,0:14:46.000


0:14:46.000,0:14:47.000
a

0:14:47.000,0:14:51.000
variable size collection here to do that work for me.

0:14:51.000,0:14:52.000
The graph itself,

0:14:52.000,0:14:55.000
so I have this idea of all the nodes in the graph, right, would each get a new

0:14:55.000,0:14:58.000
node structure and then be wired up to the other nodes that they are connected

0:14:58.000,0:15:00.000
to,

0:15:00.000,0:15:02.000
but then when I'm trying to operate on that graph,

0:15:02.000,0:15:06.000
I can't just take one pointer and say here's a pointer to some node in the

0:15:06.000,0:15:09.000
graph and say this is - and from here you have accessed everything. In the way

0:15:09.000,0:15:11.000
that a tree,

0:15:11.000,0:15:13.000
the only thing we need to keep track of is the pointer to the root,

0:15:13.000,0:15:17.000
or a link list, the only thing we keep a pointer to, typically, is that frontmost

0:15:17.000,0:15:19.000
cell, and from there, we can reach everything else.

0:15:19.000,0:15:23.000
There is no special head or root cell in a graph.

0:15:23.000,0:15:27.000
A graph is this

0:15:27.000,0:15:30.000
being, kind of, a crazy collection without a lot of rules about

0:15:30.000,0:15:35.000
how they are connected. In fact, it's not even guaranteed that they totally are

0:15:35.000,0:15:36.000
connected.

0:15:36.000,0:15:38.000
That I can't

0:15:38.000,0:15:41.000
guarantee that if I had a pointer, for example, to C,

0:15:41.000,0:15:44.000
right, I may or may not be able to reach all the nodes from there.

0:15:44.000,0:15:49.000
If I had a pointer to A, right, A can get to C and B, and then also

0:15:49.000,0:15:52.000
down to D, but, for example, there could just be an E node over here that was only

0:15:52.000,0:15:55.000
connected to C or not connected to anything at all for that matter, connected

0:15:55.000,0:15:57.000
to F in their own little island.

0:15:57.000,0:16:01.000
That there is no way to identify some special node from which you could reach

0:16:01.000,0:16:02.000
all the nodes.

0:16:02.000,0:16:04.000
There isn't a special root node,

0:16:04.000,0:16:07.000
and you can't even just arbitrarily pick one to be the root because actually there's no

0:16:07.000,0:16:10.000
guarantee that it will be connected to all of the others. So it would not give you

0:16:10.000,0:16:13.000
access to the full entirety of your graph if you just picked on and said I want anything

0:16:13.000,0:16:16.000
reachable from here, it might not get you the whole thing. So,

0:16:16.000,0:16:18.000
typically, what you need to really operate on is

0:16:18.000,0:16:21.000
the entire collection of node stars. You'll have a set or a vector

0:16:21.000,0:16:24.000
that contains all of the nodes that are in the graph,

0:16:24.000,0:16:26.000
and then within that, there's a bunch of other connections that are

0:16:26.000,0:16:35.000
being made on the graph connectivity that you're also exploring.

0:16:35.000,0:16:37.000
So let's make it a little bit more concrete

0:16:37.000,0:16:39.000


0:16:39.000,0:16:40.000
and talk a little bit about

0:16:40.000,0:16:43.000
what really is a node; what really is an arc?

0:16:43.000,0:16:46.000
There may be, actually, be some more information I need to store than just a

0:16:46.000,0:16:48.000
node is connected to other nodes.

0:16:48.000,0:16:51.000
So in the case of a list or a tree,

0:16:51.000,0:16:54.000
the next field and the left and the right field are really just pointers for the

0:16:54.000,0:16:56.000
purpose of organization.

0:16:56.000,0:16:58.000
That's the pointer to the next cell.

0:16:58.000,0:17:02.000
There's no data that is really associated with that link itself.

0:17:02.000,0:17:05.000
That's not true in the case of a graph.

0:17:05.000,0:17:09.000
That often what the links that connect up the nodes

0:17:09.000,0:17:12.000
actually do have a real role to play in terms of being data.

0:17:12.000,0:17:14.000
It may be that what they tell you is

0:17:14.000,0:17:18.000
what road you're taking here, or how long this path is, or how much this

0:17:18.000,0:17:22.000
flight costs, or what time this flight leaves, or how full this flight already is,

0:17:22.000,0:17:24.000
or who knows, you know, just other information

0:17:24.000,0:17:27.000
that is more than just this nodes connected to that one. It's, like, how is it

0:17:27.000,0:17:30.000
connected; what is it connected by, and what are the details of how that

0:17:30.000,0:17:31.000
connection

0:17:31.000,0:17:32.000


0:17:32.000,0:17:34.000
is being represented?

0:17:34.000,0:17:38.000
So it's likely that what you really want is not just pointers to other nodes but

0:17:38.000,0:17:42.000
information about that actual link stored with an arc structure.

0:17:42.000,0:17:44.000
So, typically, we're gonna have

0:17:44.000,0:17:45.000
an arc

0:17:45.000,0:17:48.000
which has information about that arc, how long, how much it costs, you know,

0:17:48.000,0:17:50.000
what flight number it is,

0:17:50.000,0:17:52.000
that will have pointers to the start and end node that it is connecting.

0:17:52.000,0:17:54.000
The node

0:17:54.000,0:17:57.000
then has a collection of arcs,

0:17:57.000,0:18:00.000
and those arcs are expected to, in the adjacency list form, will all be arcs

0:18:00.000,0:18:02.000
that start at this node.

0:18:02.000,0:18:05.000
So their start node is equal to the one that's holding onto to them, and then

0:18:05.000,0:18:06.000
they have an end

0:18:06.000,0:18:10.000
pointer that points to the node at the other end of the arc. Well,

0:18:10.000,0:18:13.000
this gets us into a little C++ bind

0:18:13.000,0:18:16.000
because I've described these data structures

0:18:16.000,0:18:17.000
in a way

0:18:17.000,0:18:22.000
that they both depend on each other; that an arc has pointers to the start and the end

0:18:22.000,0:18:22.000
node.

0:18:22.000,0:18:27.000
A node has a collection of arcs, which is probably a vector or a set of arc pointers,

0:18:27.000,0:18:28.000
and so

0:18:28.000,0:18:31.000
I'm starting to define the structure for arc,

0:18:31.000,0:18:33.000
and then I want to talk about node in it,

0:18:33.000,0:18:35.000
and I think, oh, okay, well, then I better define node first because C++ always likes to

0:18:35.000,0:18:38.000
see things in order. Remember, it always wants to see A,

0:18:38.000,0:18:42.000
and then B if B uses A, and so in terms of how your functions and data

0:18:42.000,0:18:45.000
structures work, you've always gotten this idea like, well, go do the one first. Well, if I say,

0:18:45.000,0:18:48.000
okay, arc's gonna need node. I start to write arc, and I say, oh, I

0:18:48.000,0:18:50.000
need to talk about node. Well, I better put node up here,

0:18:50.000,0:18:54.000
right? But then when I'm starting to write node, I start talking about arc, and they have a

0:18:54.000,0:18:57.000
circular reference to each other, right? They both

0:18:57.000,0:19:00.000
want to depend on the other one before we've gotten around to telling the

0:19:00.000,0:19:01.000
compiler about it.

0:19:01.000,0:19:04.000
One of them has to go first, right? What are we gonna do?

0:19:04.000,0:19:08.000
What we need to do is use the C++ forward reference

0:19:08.000,0:19:09.000
as a

0:19:09.000,0:19:12.000
little hint to the compiler

0:19:12.000,0:19:13.000
that

0:19:13.000,0:19:17.000
we are gonna be defining a Node T structure in a little bit;

0:19:17.000,0:19:18.000
we're gonna get to it,

0:19:18.000,0:19:22.000
but first we're gonna define the Arc T structure that uses that Node T, and then

0:19:22.000,0:19:25.000
we're gonna get around to it. So we're gonna give it, like, a little heads up that says

0:19:25.000,0:19:29.000
there later will be an Act 2 of this play. We're gonna introduce a

0:19:29.000,0:19:31.000
character called Node T.

0:19:31.000,0:19:33.000
So you can now

0:19:33.000,0:19:34.000
start talking about Node

0:19:34.000,0:19:37.000
T in some simple ways

0:19:37.000,0:19:40.000
based on your agreement to the compiler that you plan on telling it more

0:19:40.000,0:19:41.000
about Node T later.

0:19:41.000,0:19:45.000
So the struct Node T says there will be a struct called Node T.

0:19:45.000,0:19:47.000
Okay, the compiler says, okay, I'll write that down.

0:19:47.000,0:19:51.000
You start defining struct Arc T, and it says, oh, I see you have these pointers

0:19:51.000,0:19:54.000
to the node T. Okay. Well, you told me there'd be a struct like that; I guess that'll work out.

0:19:54.000,0:19:58.000
And now you told it what struct Node T is. It's, like, oh, I have a vector

0:19:58.000,0:19:59.000
with these pointers to arc,

0:19:59.000,0:20:02.000
and then it says, okay, now I see the whole picture, and

0:20:02.000,0:20:03.000
it all worked out.

0:20:03.000,0:20:08.000
So it's just a little bit of a

0:20:08.000,0:20:11.000
requirement of how the C++ compiler likes to see stuff in order without ever having

0:20:11.000,0:20:16.000
to go backwards to check anything out again. Okay. So

0:20:16.000,0:20:19.000
I got myself some node structures, got

0:20:19.000,0:20:21.000
some arc structures; they're all

0:20:21.000,0:20:22.000
working together.

0:20:22.000,0:20:24.000
I'm gonna try to do some traversals.

0:20:24.000,0:20:26.000
Try to do some operations that

0:20:26.000,0:20:29.000
work their way around the graph.

0:20:29.000,0:20:30.000
So

0:20:30.000,0:20:33.000
tree traversals, right, the pre, and post, and in order

0:20:33.000,0:20:36.000
are ways that you start at the root, and you visit all the nodes on the tree,

0:20:36.000,0:20:40.000
a very common need to, kind of, process, to delete nodes,

0:20:40.000,0:20:43.000
to print the nodes, to whatnot. So we might want to do the same thing on graphs. I've got a

0:20:43.000,0:20:47.000
graph out there, and I'd like to go exploring it. Maybe I just want to print, and if

0:20:47.000,0:20:50.000
I'm in the Gates building, what are all the places that I can reach from

0:20:50.000,0:20:53.000
the Gates building? Where can I go to; what

0:20:53.000,0:20:54.000
options do I have?

0:20:54.000,0:20:57.000
What I'm gonna do is a traversal

0:20:57.000,0:21:01.000
starting at a node and, kind of, working my way outward to the visible, reachable

0:21:01.000,0:21:04.000
nodes that I can follow those paths to get there.

0:21:04.000,0:21:07.000
We're gonna look at two different ways to do this.

0:21:07.000,0:21:09.000
One is Depth-first and one is Breadth-first,

0:21:09.000,0:21:13.000
and I'll show you both algorithms, and they

0:21:13.000,0:21:15.000
will visit the same nodes;

0:21:15.000,0:21:19.000
when all is done and said, they will visit all the reachable nodes starting from a

0:21:19.000,0:21:20.000
position,

0:21:20.000,0:21:21.000
but they will visit them in different order.

0:21:21.000,0:21:25.000
So just like the pre/post in order tree traversals, they visit all the same

0:21:25.000,0:21:28.000
notes. It's just a matter of when they get around to doing it

0:21:28.000,0:21:29.000
is how

0:21:29.000,0:21:32.000
we distinguish depth and breadth-first traversals.

0:21:32.000,0:21:34.000
Because we have this graph structure

0:21:34.000,0:21:40.000
that has this loose connectivity and the possibility of cycles and multiple paths

0:21:40.000,0:21:41.000
to the same node,

0:21:41.000,0:21:44.000
we are gonna have to be a little bit careful at how we do our work to make

0:21:44.000,0:21:48.000
sure that we don't end up getting stuck in some infinite loop when we keep

0:21:48.000,0:21:51.000
going around the ABC cycle in a particular graph.

0:21:51.000,0:21:55.000
That we need to realize when we're revisiting something we've seen before,

0:21:55.000,0:21:56.000
and then not

0:21:56.000,0:21:57.000


0:21:57.000,0:21:59.000
trigger, kind of, an

0:21:59.000,0:22:03.000
exploration we've already done.

0:22:03.000,0:22:06.000
So let's look at depth-first first.

0:22:06.000,0:22:09.000
This is probably the simpler of the two. They're both, actually, pretty simple.

0:22:09.000,0:22:13.000
Depth-first traversal uses recursion to do its work,

0:22:13.000,0:22:18.000
and the idea is that you pick a starting node - let's say I want to start at Gates,

0:22:18.000,0:22:21.000
and that maybe what I'm trying to find is all the reachable nodes from Gates.

0:22:21.000,0:22:24.000
If I don't have a starting node, then I can just pick one arbitrarily from the

0:22:24.000,0:22:27.000
graph because there is actually no root node of special status,

0:22:27.000,0:22:30.000
and the idea is to go deep. That

0:22:30.000,0:22:33.000
in the case of, for example, a maze, it might be that I

0:22:33.000,0:22:36.000
choose to go north, and then I just keep going north. I just go north, north, north, north, north until

0:22:36.000,0:22:40.000
I run into a dead wall, and then I say, oh, I have to go east, and I go east, east, east, east, east. The idea is to

0:22:40.000,0:22:42.000
just go as far away from

0:22:42.000,0:22:46.000
the original state as I can, just keep going outward, outward, outward, outward,

0:22:46.000,0:22:47.000
outward, outward, outward,

0:22:47.000,0:22:50.000
and eventually I'm gonna run into some dead end, and that dead end could be I

0:22:50.000,0:22:52.000
get to a node that has no outgoing arcs,

0:22:52.000,0:22:54.000
or it could be that I get to a node

0:22:54.000,0:22:57.000
that whose only arcs come back to places I've already been,

0:22:57.000,0:22:58.000
so it cycles back on itself, in

0:22:58.000,0:23:01.000
which case, that also is really a dead end.

0:23:01.000,0:23:03.000
So you go deep.

0:23:03.000,0:23:06.000
You go north, and you see as far as you can get, and

0:23:06.000,0:23:10.000
only after you've, kind of, fully explored everything reachable from starting in

0:23:10.000,0:23:11.000
the north direction

0:23:11.000,0:23:13.000
do you backtrack,

0:23:13.000,0:23:16.000
unmake that decision, and say, well, how about not north; how about east, right? And then

0:23:16.000,0:23:19.000
find everything you can get to from the east, and

0:23:19.000,0:23:22.000
after, kind of, going all through that, come back and try south, and so on. So all of the

0:23:22.000,0:23:23.000
options you have

0:23:23.000,0:23:26.000
exhaustively getting through all your

0:23:26.000,0:23:26.000
neighbors

0:23:26.000,0:23:29.000
in this, kind of, go-deep strategy. Well,

0:23:29.000,0:23:32.000
the important thing is you realize if you've got this recursion going, so what

0:23:32.000,0:23:35.000
you're actually doing, basically, is saying I'm gonna step to my neighbor to the

0:23:35.000,0:23:40.000
right, and I'm gonna explore all the things, so do a depth-first search from there

0:23:40.000,0:23:43.000
to find all the places you can reach, and that one says, well, I'm gonna step through this spot

0:23:43.000,0:23:45.000
and go to one of my neighbors.

0:23:45.000,0:23:48.000
And so we can't do that infinitely without getting into trouble. We need

0:23:48.000,0:23:53.000
to have a base case that stops that recursion, and the two cases that I've

0:23:53.000,0:23:56.000
talked about - one is just when you have no neighbors left to go.

0:23:56.000,0:23:57.000
So when you

0:23:57.000,0:23:59.000


0:23:59.000,0:24:04.000
get to a node that actually, kind of, is a dead end, or that neighbor only has visited

0:24:04.000,0:24:05.000
neighbors

0:24:05.000,0:24:09.000
that surround it. So a

0:24:09.000,0:24:10.000
very simple little piece of

0:24:10.000,0:24:13.000
code that depends on recursion doing its job.

0:24:13.000,0:24:17.000
In this case, we need to know which nodes we have visited, right? We're gonna have to

0:24:17.000,0:24:19.000
have some kind of strategy for saying I

0:24:19.000,0:24:21.000
have been here before.

0:24:21.000,0:24:24.000
In this case, I'm using just a set because that's an easy thing to do.

0:24:24.000,0:24:27.000
I make a set of node pointers, and I update it

0:24:27.000,0:24:29.000
each time I see a new node; I add it to

0:24:29.000,0:24:34.000
that set, and if I ever get to a node who is already previously visited

0:24:34.000,0:24:38.000
from that, there's no reason to do any work from there; we can immediately

0:24:38.000,0:24:41.000
stop. So

0:24:41.000,0:24:43.000
starting that step would be empty.

0:24:43.000,0:24:46.000
If the visit already contains the node that we're currently trying to visit,

0:24:46.000,0:24:50.000
then there's nothing to do. Otherwise, we go ahead and add it, and there's a little node

0:24:50.000,0:24:53.000
here that says we'll do something would occur. What am I trying to do here? Am I trying to print those nodes? Am

0:24:53.000,0:24:56.000
I trying to draw pictures on those nodes, highlight those nodes, you know,

0:24:56.000,0:25:00.000
something I'm doing with them. I don't know what it is, but this is the structure for the depth-first search

0:25:00.000,0:25:01.000
here, and

0:25:01.000,0:25:02.000
then

0:25:02.000,0:25:04.000
for all of the neighbors,

0:25:04.000,0:25:07.000
using the vector of outgoing connections,

0:25:07.000,0:25:08.000
then I run a depth-first search

0:25:08.000,0:25:11.000
on each of those neighbors

0:25:11.000,0:25:14.000
passing that visited set by reference

0:25:14.000,0:25:16.000
through the whole operation. So I will

0:25:16.000,0:25:20.000
always be adding to and modifying this one set, so I will

0:25:20.000,0:25:24.000
be able to know where I am and where I still need to go.

0:25:24.000,0:25:27.000
So if we trace this guy in action -

0:25:27.000,0:25:31.000
if I start arbitrarily from A, I say I'd like to begin with A.

0:25:31.000,0:25:35.000
Then the depth-first search is gonna pick a neighbor, and let's say I happen to just work

0:25:35.000,0:25:38.000
over my neighbors in alphabetical order.

0:25:38.000,0:25:41.000
It would say okay, well, I've picked B. Let's go to everywhere we can get to from B. So A

0:25:41.000,0:25:42.000
gets marked as visited.

0:25:42.000,0:25:44.000
I move onto B; I say, well,

0:25:44.000,0:25:48.000
B, search everywhere you can to from B, and B says okay, well, I need to pick a

0:25:48.000,0:25:50.000
neighbor, how about I pick

0:25:50.000,0:25:51.000
C? And says

0:25:51.000,0:25:53.000
and now,

0:25:53.000,0:25:56.000
having visited C, go everywhere you can get to from C. Well,

0:25:56.000,0:25:59.000
C has no neighbors, right, no outgoing connections whatsoever.

0:25:59.000,0:26:02.000
So C says, well, okay, everywhere you can get to from C is C,

0:26:02.000,0:26:04.000
you know, I'm done.

0:26:04.000,0:26:09.000
That will cause it to backtrack to this place, to B, and B says okay, I explored everywhere I can get to

0:26:09.000,0:26:10.000
from C;

0:26:10.000,0:26:13.000
what other neighbors do I have? B has no other neighbors. So, in fact, B backtracks all the way

0:26:13.000,0:26:15.000
back to A.

0:26:15.000,0:26:19.000
So now A says okay, well, I went everywhere I can get to from B,

0:26:19.000,0:26:20.000
right? Let me look at my next neighbor.

0:26:20.000,0:26:23.000
My next neighbor is C. Ah,

0:26:23.000,0:26:25.000
but C, already visited,

0:26:25.000,0:26:27.000
so there's no reason to

0:26:27.000,0:26:29.000
go crazy on that thing. So, in fact, I can

0:26:29.000,0:26:33.000
immediately see that I've visited that, so I don't have any further path to look at

0:26:33.000,0:26:33.000
there.

0:26:33.000,0:26:35.000
I'll hit D then

0:26:35.000,0:26:37.000
as the next one in sequence. So

0:26:37.000,0:26:40.000
at this point, I've done everything reachable from the B arm, everything

0:26:40.000,0:26:43.000
reachable from the C arm, and now I'm starting on the D arm. I said okay, where can I get to

0:26:43.000,0:26:44.000
from D?

0:26:44.000,0:26:46.000
Well, I can get to E. All right,

0:26:46.000,0:26:48.000
where can I get to from E?

0:26:48.000,0:26:52.000
E goes to F, so the idea is you think of it going deep. It's going as far away as it

0:26:52.000,0:26:56.000
can, as deep as possible, before it, kind of, unwinds

0:26:56.000,0:26:58.000
getting to the F that has no neighbors

0:26:58.000,0:27:01.000
and saying okay. Well, how about - where can I get to

0:27:01.000,0:27:04.000
also from E? I can get to G.

0:27:04.000,0:27:07.000
From G where can I get to? Nowhere,

0:27:07.000,0:27:09.000
so we, kind of, unwind the

0:27:09.000,0:27:12.000
D arm, and then we will eventually get back to and hit the H.

0:27:12.000,0:27:16.000
So the order they actually were hit in this case happened to be alphabetical. I

0:27:16.000,0:27:18.000
assigned them such that that would come out that way.

0:27:18.000,0:27:21.000
That's not really a property of what depth-first search does,

0:27:21.000,0:27:24.000
but think of it as like it's going as deep as possible, as far away as possible,

0:27:24.000,0:27:28.000
and so it makes a choice - it's pretty much like the recursive backtrackers that we've seen. In

0:27:28.000,0:27:31.000
fact, they make a choice, commit to it, and just move forward on it, never, kind of,

0:27:31.000,0:27:35.000
giving a backwards glance, and only if the whole process

0:27:35.000,0:27:35.000
bottoms out

0:27:35.000,0:27:39.000
does it come back and start unmaking decisions and try alternatives,

0:27:39.000,0:27:43.000
eventually exploring everything reachable from A

0:27:43.000,0:27:46.000
when all is done and said. So if

0:27:46.000,0:27:49.000
you, for example, use that on the maze,

0:27:49.000,0:27:52.000
right, the depth-first search on a maze is very much the same strategy we're talking about

0:27:52.000,0:27:53.000
here.

0:27:53.000,0:27:56.000
Instead of the way we did it was actually breadth-first search, if you

0:27:56.000,0:27:57.000
remember back to that assignment,

0:27:57.000,0:28:00.000
the depth-first search alternative is actually doing it the -

0:28:00.000,0:28:01.000
just go deep,

0:28:01.000,0:28:04.000
make a choice, go with it, run all the way until it bottoms out,

0:28:04.000,0:28:12.000
and only then back up and try some other alternatives.

0:28:12.000,0:28:15.000
The breadth-first traversal, gonna hit

0:28:15.000,0:28:16.000
the same nodes

0:28:16.000,0:28:19.000
but just gonna hit them in a different way.

0:28:19.000,0:28:21.000
If my goal were to actually hit all of them,

0:28:21.000,0:28:25.000
then either of them is gonna be a fine strategy. There may be some reason why

0:28:25.000,0:28:25.000


0:28:25.000,0:28:29.000
I'm hoping to stop the search early, and that actually might dictate why I would prefer

0:28:29.000,0:28:31.000
which ones to look at first.

0:28:31.000,0:28:35.000
The breadth-first traversal is going to explore things in terms of distance,

0:28:35.000,0:28:40.000
in this case, expressed in terms of number of hops away from the starting

0:28:40.000,0:28:40.000
node.

0:28:40.000,0:28:44.000
So it's gonna look at everything that's one hop away, and then go back and look at

0:28:44.000,0:28:46.000
everything two hops away, and then three hops away.

0:28:46.000,0:28:49.000
And so if the goal, for example, was to find a node

0:28:49.000,0:28:51.000
in a shortest path connection between it,

0:28:51.000,0:28:55.000
then breadth-first search, by looking at all the ones one hop away, if it was a one hop

0:28:55.000,0:28:56.000
path, it'll find it,

0:28:56.000,0:28:59.000
and if it's a two hop path, it'll find it on that next round, and then the three hop

0:28:59.000,0:29:00.000
path,

0:29:00.000,0:29:02.000
and there might be paths that are 10, and 20, and 100

0:29:02.000,0:29:03.000
steps long,

0:29:03.000,0:29:06.000
but if I find it earlier, I won't go looking at them. It actually might prove to

0:29:06.000,0:29:09.000
be a more efficient way to get to the shortest solution as opposed to depth-first

0:29:09.000,0:29:11.000
search which go off and try all these deep

0:29:11.000,0:29:14.000
paths that may not ever end up where I want it to be

0:29:14.000,0:29:16.000
before it eventually made its way to the short

0:29:16.000,0:29:20.000
solution I was wanting to find. So the thing

0:29:20.000,0:29:21.000
is we have the starting node.

0:29:21.000,0:29:24.000
We're gonna visit all the immediate neighbors.

0:29:24.000,0:29:27.000
So, basically, a loop over the

0:29:27.000,0:29:28.000
immediate one,

0:29:28.000,0:29:31.000
and then while we're doing that, we're gonna actually, kind of, be gathering up that

0:29:31.000,0:29:33.000
next generation.

0:29:33.000,0:29:33.000


0:29:33.000,0:29:37.000
Sometimes that's called the Frontier, sort of, advancing the frontier

0:29:37.000,0:29:39.000
of the nodes to explore next,

0:29:39.000,0:29:42.000
so all the nodes that are two hops away,

0:29:42.000,0:29:44.000
and then while we're processing the ones that are two hops away, we're gonna

0:29:44.000,0:29:47.000
be gathering the frontier

0:29:47.000,0:29:50.000
that is three hops away. And so at each stage we'll be, kind of, processing a

0:29:50.000,0:29:54.000
generation but while assembling the generation N+1

0:29:54.000,0:29:58.000
in case we need to go further to find those things.

0:29:58.000,0:30:00.000
We'll use an auxiliary data structure during

0:30:00.000,0:30:04.000
this. That management of the frontier, keeping track of the nodes that are, kind

0:30:04.000,0:30:05.000
of,

0:30:05.000,0:30:07.000
staged for the

0:30:07.000,0:30:09.000
subsequent iterations,

0:30:09.000,0:30:12.000
the queue is a perfect data structure for that. If I put the

0:30:12.000,0:30:14.000
initial neighbors into a queue,

0:30:14.000,0:30:19.000
as I dequeue them, if I put their neighbors on the back of the queue, so enqueue them

0:30:19.000,0:30:21.000
behind all the one hop neighbors, then

0:30:21.000,0:30:24.000
if I do that with code, as I'm processing Generation 1 off the front of the queue, I'll be

0:30:24.000,0:30:27.000
stacking Generation 2 in the back of the queue,

0:30:27.000,0:30:30.000
and then when all of Generation 1 is exhausted, I'll be processing Generation

0:30:30.000,0:30:31.000
2,

0:30:31.000,0:30:35.000
but, kind of, enqueuing Generation 3 behind it.

0:30:35.000,0:30:39.000
Same deal that we had with depth-first search though is we will have to be

0:30:39.000,0:30:41.000
careful about cycles will pull past that

0:30:41.000,0:30:44.000
when there's more than one possibility of how we get to something, we don't

0:30:44.000,0:30:46.000
want to, kind of, keep going around that cycle

0:30:46.000,0:30:52.000
or do a lot of redundant work visiting nodes that we've already seen. So

0:30:52.000,0:30:55.000
I have the same idea of keeping track of a set

0:30:55.000,0:30:57.000
that knows what

0:30:57.000,0:31:00.000
we've previously visited,

0:31:00.000,0:31:01.000
and the

0:31:01.000,0:31:05.000
initial node gets enqueued just, kind of, by itself, just like Generation 0 gets

0:31:05.000,0:31:07.000
put into the queue,

0:31:07.000,0:31:10.000
and then while the queue is not empty, so while there's still neighbors I haven't

0:31:10.000,0:31:11.000
yet visited,

0:31:11.000,0:31:13.000
I dequeue the frontmost one,

0:31:13.000,0:31:16.000
if it hasn't already been visited on some previous iteration, then I mark it as

0:31:16.000,0:31:17.000
visited,

0:31:17.000,0:31:22.000
and then I enqueue all of its children or neighbors at the back of the queue.

0:31:22.000,0:31:26.000
And so, as of right now, I'm processing Generation 0, that means all of Generation 1, so the

0:31:26.000,0:31:29.000
neighbors that are one hop away, get placed in the queue, and then

0:31:29.000,0:31:34.000
the next iteration coming out here will pull off a Generation 1,

0:31:34.000,0:31:37.000
and then enqueue all of these two hop neighbors.

0:31:37.000,0:31:40.000
Come back and other Generation 1's will get pulled off, enqueue all their two hop

0:31:40.000,0:31:44.000
neighbors, and so the two hop generation will, kind of, be built up behind, and then

0:31:44.000,0:31:48.000
once the last of the one hop generation gets managed,

0:31:48.000,0:31:51.000
start processing the Generation 2 and so on.

0:31:51.000,0:31:55.000
And so this will keep going while the queue has some

0:31:55.000,0:31:58.000
contents in there. So that means some neighbors

0:31:58.000,0:32:01.000
we have connections to that we haven't yet had a chance to explore, and

0:32:01.000,0:32:05.000
either we will find out that they have all been visited, so this will end when

0:32:05.000,0:32:09.000
all of the nodes that are in the queue have been visited or

0:32:09.000,0:32:14.000
there are no more neighbors to enqueue. So we've gotten to those dead ends or those cycles

0:32:14.000,0:32:25.000
that mean we've reach everything that was reachable from that start node.

0:32:25.000,0:32:28.000
And so I've traced this guy

0:32:28.000,0:32:30.000
to see it doing its work.

0:32:30.000,0:32:33.000
So if I start again with A,

0:32:33.000,0:32:36.000
and, again, assuming that I'm gonna process my nodes

0:32:36.000,0:32:40.000
in alphabetical order when I have children, so I will through and I

0:32:40.000,0:32:44.000
will enqueue, so the queue right now has just node A in it. I pull it off. I say

0:32:44.000,0:32:45.000
have I visited? No, I have not.

0:32:45.000,0:32:47.000
So then I mark it as visited,

0:32:47.000,0:32:52.000
and now for each of its neighboring nodes, B, C, D, H, I'm gonna put those into

0:32:52.000,0:32:54.000
the queue,

0:32:54.000,0:32:57.000
and so then they're loaded up, and then I come back around, I say do I still have stuff in the queue? I

0:32:57.000,0:33:01.000
do, so let's take out the first thing that was put in; it was B. Okay,

0:33:01.000,0:33:05.000
and I'll say okay, well, where can you get to from B? You can get to C, so I'm gonna put C in the queue. Now,

0:33:05.000,0:33:09.000
C is actually already in the queue, but we're gonna pull it out

0:33:09.000,0:33:12.000
earlier in terms it'll get handled by the

0:33:12.000,0:33:14.000
visiting status later.

0:33:14.000,0:33:16.000
So, okay, we'll put C, D, and E in

0:33:16.000,0:33:20.000
and then put C behind it. So right now, we'll have C, D, H, and C again.

0:33:20.000,0:33:24.000
It'll find the C that's in the front there. It'll say where can you get to from C?

0:33:24.000,0:33:24.000
Nowhere,

0:33:24.000,0:33:26.000
so no additional

0:33:26.000,0:33:28.000
Generation 2's are added by that one.

0:33:28.000,0:33:31.000
I'll look at D, and I'll say okay, well, what Generation 2's do we have from D? It says, well, I can

0:33:31.000,0:33:32.000
get to E. All right, so put E on the

0:33:32.000,0:33:34.000
back of the

0:33:34.000,0:33:36.000


0:33:36.000,0:33:39.000
queue. Look at H; where can H get to? H can get to G,

0:33:39.000,0:33:42.000
so go ahead and put G on the back of the queue.

0:33:42.000,0:33:45.000
So now, it comes back around to the things that are two hops away.

0:33:45.000,0:33:47.000
The first one it's gonna pull off is C. It's

0:33:47.000,0:33:49.000
gonna say, well, C is already visited,

0:33:49.000,0:33:53.000
so there's no need to redundantly visit that or do anything with that.

0:33:53.000,0:33:54.000
It passes over C. It

0:33:54.000,0:33:56.000
finds the E

0:33:56.000,0:33:58.000
that's behind that,

0:33:58.000,0:34:00.000
and then it finds the G that was behind the

0:34:00.000,0:34:03.000
H. The E also enqueued the F

0:34:03.000,0:34:07.000
that was four hops, and the very last one, right, output will be that F that was

0:34:07.000,0:34:09.000
only reachable by taking

0:34:09.000,0:34:12.000
three hops away from the thing. So this one's, kind of,

0:34:12.000,0:34:13.000
working radially. The

0:34:13.000,0:34:16.000
idea that I'm looking at all the things that I can get to, kind of, in one hop

0:34:16.000,0:34:19.000
are my first generation, then two hops, then three hops,

0:34:19.000,0:34:23.000
and so it's like throwing a stone in the water and watching it ripple out

0:34:23.000,0:34:26.000
that the visiting

0:34:26.000,0:34:29.000
is managed in terms of, kind of, growing length of path,

0:34:29.000,0:34:32.000
and number of hops that it took to get there.

0:34:32.000,0:34:37.000
So this is the strategy that you used for maze solving, if you remember.

0:34:37.000,0:34:37.000
That

0:34:37.000,0:34:42.000
what you were tracking was the queue of paths where the path was

0:34:42.000,0:34:44.000
AB, or ADC, or

0:34:44.000,0:34:46.000
ADE, and that they grew

0:34:46.000,0:34:47.000


0:34:47.000,0:34:50.000
step wise. That, kind of, each iteration through the

0:34:50.000,0:34:54.000
traversal there where you're saying, okay, well, I've seen all the paths that are one hop. Now I've seen all the ones of two,

0:34:54.000,0:34:56.000
now the ones three, and four, and five,

0:34:56.000,0:34:58.000
and when you keep doing that, eventually you get to the hop,

0:34:58.000,0:35:01.000
you know, 75, which leads you

0:35:01.000,0:35:05.000
all the way to the goal, and since you have looked at all the paths that were

0:35:05.000,0:35:09.000
74 and shorter, the first path that you dequeue that leads to your goal

0:35:09.000,0:35:12.000
must be the best or the shortest overall

0:35:12.000,0:35:13.000
because

0:35:13.000,0:35:15.000
of the order you processed them.

0:35:15.000,0:35:18.000
That's not true in depth-first search, right? That depth-first search might find a much

0:35:18.000,0:35:21.000
longer and more circuitous path that led to the goal

0:35:21.000,0:35:25.000
when there is a shorter one that would be eventually found if I had a graph that

0:35:25.000,0:35:28.000
looked,

0:35:28.000,0:35:36.000
you know, had this long path, let's say,

0:35:36.000,0:35:41.000
and so this is goal node, let's say, and this is the start node that

0:35:41.000,0:35:44.000
there could be a two hop path, kind of, off to this angle,

0:35:44.000,0:35:47.000
but if this was the first one explored in the depth-first, right,

0:35:47.000,0:35:50.000
it could eventually, kind of, work its way all the way around through this

0:35:50.000,0:35:52.000
eight hop path, and say, okay, well, I found it, right?

0:35:52.000,0:35:55.000
It would be the first one we found, but there's no guarantee that would be the

0:35:55.000,0:35:58.000
shortest in depth-first search, right? It would just happen to be the one that based on

0:35:58.000,0:36:01.000
its arbitrary choice of which

0:36:01.000,0:36:04.000
neighbors to pursue it happened to get to that there could be a shorter

0:36:04.000,0:36:08.000
one that would be found later in the traversal. That's not true in

0:36:08.000,0:36:12.000
the breadth-first search strategy because it will be looking at this, and then these two,

0:36:12.000,0:36:16.000
and then these two, and, kind of, working its way outward down those edges.

0:36:16.000,0:36:19.000
So as soon as we get to one that hits the goal, we

0:36:19.000,0:36:25.000
know we have found the shortest we can have. Question?

0:36:25.000,0:36:28.000
In that case,

0:36:28.000,0:36:33.000
you could return, okay, we need two

0:36:33.000,0:36:34.000
jumps to get to

0:36:34.000,0:36:35.000
the goal,

0:36:35.000,0:36:36.000
like, as far

0:36:36.000,0:36:40.000
winning the goal. Yeah. How do we remember which path led us there?

0:36:40.000,0:36:42.000
So if you were really doing that, you know, in terms of depth-first search, what

0:36:42.000,0:36:46.000
you'd probably be tracking is what's the path? Probably a stack that said here's the stack

0:36:46.000,0:36:49.000
that led to that path. Let me hold

0:36:49.000,0:36:53.000
onto this, right, and I will compare it to these alternatives. So I'll try it on Neighbor 1,

0:36:53.000,0:36:54.000
get the stack back from that; it's the best.

0:36:54.000,0:36:58.000
Try it on Neighbor 2, get that stack back, see which one is smaller and use

0:36:58.000,0:37:01.000
that as the better choice, and so, eventually, when I tried it out this side, I'd get

0:37:01.000,0:37:03.000
the stack back that was just two,

0:37:03.000,0:37:06.000
and I could say, yeah, that was better than this ten step path I had over here, so I

0:37:06.000,0:37:09.000
will prefer that. So just using your standard, kind of, backtracking with

0:37:09.000,0:37:13.000
using your return values, incorporating them, and deciding which was better.

0:37:13.000,0:37:14.000
And so in this case,

0:37:14.000,0:37:16.000
the depth-first search

0:37:16.000,0:37:19.000
can't really be pruned very easily

0:37:19.000,0:37:24.000
because it has to, kind of, explore the whole thing to know that this

0:37:24.000,0:37:26.000
short little path right over here happened to be just

0:37:26.000,0:37:29.000
arbitrarily examined last, whereas, the breadth-first search actually does, kind of, because

0:37:29.000,0:37:31.000
it's prioritizing by

0:37:31.000,0:37:35.000
order of length. If the goal was shortest path, it will find it sooner and be able to

0:37:35.000,0:37:35.000
stop

0:37:35.000,0:37:39.000
when it found it rather than look at these longer paths. So it won't end up -

0:37:39.000,0:37:42.000
if there were hundreds of other paths over here, breadth-first search won't actually

0:37:42.000,0:37:47.000
have to ever reach them or explore them.

0:37:47.000,0:37:52.000
So there are a lot of things that actually end up being graph search in disguise. That's why

0:37:52.000,0:37:54.000
a, kind of, a graph is a great data structure to, kind of, get to in

0:37:54.000,0:37:57.000
106b because there are lots of problems

0:37:57.000,0:38:01.000
that, in the end, if you can model them using the same

0:38:01.000,0:38:03.000
setup of nodes, and arcs, and traversals,

0:38:03.000,0:38:06.000
that you can answer interesting questions about that data by applying

0:38:06.000,0:38:09.000
the same graph search techniques.

0:38:09.000,0:38:12.000
So as long as you want to know things like, well, which nodes are reachable from this node at

0:38:12.000,0:38:12.000
all?

0:38:12.000,0:38:14.000
And then

0:38:14.000,0:38:17.000
that just means, okay, I'm here and I want to get to these places; what places could

0:38:17.000,0:38:18.000
I get to? Or

0:38:18.000,0:38:21.000
knowing what courses I could take

0:38:21.000,0:38:25.000
that would lead me to a CS degree, it's, kind of, like well, looking for the ones

0:38:25.000,0:38:29.000
that lead to having all your graduation requirements

0:38:29.000,0:38:32.000
satisfied, and each of those arcs could be a course you could take, right, how close did

0:38:32.000,0:38:35.000
that get you to the goal? How can I get to the

0:38:35.000,0:38:40.000
place I'd like to be, or what are the places that this course satisfies

0:38:40.000,0:38:41.000
requirements that lead to?

0:38:41.000,0:38:44.000
Knowing things about

0:38:44.000,0:38:47.000
connectivity is, kind of, useful in things like the, kind of, bacon numbers or

0:38:47.000,0:38:50.000
erdish numbers where people want to know, well, how close am I to greatness by -

0:38:50.000,0:38:54.000
have I been in a movie with somebody who was in a movie, who was in a movie with Kevin Bacon? And,

0:38:54.000,0:38:55.000
for me, it turns out

0:38:55.000,0:38:57.000
zero; I have been in no movies. But have

0:38:57.000,0:38:59.000
you written a paper with someone who then

0:38:59.000,0:39:02.000
connects through a chain to some other famous person, right? Tells you about

0:39:02.000,0:39:06.000
your, kind of, social network distance or something, or I'm linked in. You're trying to get a

0:39:06.000,0:39:09.000
new job, and you want to find somebody to introduce you to

0:39:09.000,0:39:13.000
the head of this cool company. It's, like, figuring who you know who knows them, right?

0:39:13.000,0:39:18.000
It's finding paths through a social network graph.

0:39:18.000,0:39:19.000
Finding out things like

0:39:19.000,0:39:22.000
are there paths with cycles? What's the shortest path

0:39:22.000,0:39:25.000
through the cycle, longest path through the cycle often tells you about things about

0:39:25.000,0:39:27.000
redundancies in that structure

0:39:27.000,0:39:31.000
that, for example, when you're building a network infrastructure, sometimes you do have

0:39:31.000,0:39:34.000
cycles in it because you actually want to have multiple ways to get to things in case

0:39:34.000,0:39:37.000
there's a breakdown. Like if there's a freeway accident, you want to have another way

0:39:37.000,0:39:38.000
to get around it,

0:39:38.000,0:39:41.000
but you are interested in, maybe, trying to keep your cost down is trying to find

0:39:41.000,0:39:47.000
out where are the right places to put in those redundancies, those cycles that

0:39:47.000,0:39:50.000
give you the best benefit with the least cost.

0:39:50.000,0:39:54.000
Trying to find things like a continuous path that visits all the nodes once and exactly

0:39:54.000,0:39:57.000
once, especially doing so efficiently, picking a short

0:39:57.000,0:40:00.000
overall path for that. We talked a little bit about that in terms of what's called the

0:40:00.000,0:40:02.000
traveling salesman problem.

0:40:02.000,0:40:04.000
You are trying to get from - you have ten

0:40:04.000,0:40:06.000
cities to visit on your book tour. You don't want to spend all your lifetime

0:40:06.000,0:40:08.000
on a plane. What's the

0:40:08.000,0:40:13.000
path of crisscrossing the country that gets you to all ten of those with the least

0:40:13.000,0:40:15.000
time painfully

0:40:15.000,0:40:17.000
spent on an airline? And so

0:40:17.000,0:40:20.000
those are just graph search problems. You have these nodes. You have these arcs.

0:40:20.000,0:40:21.000
How can you

0:40:21.000,0:40:23.000
traverse them and visit

0:40:23.000,0:40:25.000
and come up with

0:40:25.000,0:40:27.000
a good solution. There's other

0:40:27.000,0:40:30.000
problems I think that are kind of neat. Word letters, we used to give out an assignment on word letters. We

0:40:30.000,0:40:31.000
didn't this time, but I thought it was an

0:40:31.000,0:40:34.000
interesting problem to think about. The idea of

0:40:34.000,0:40:36.000
how it is you can transform

0:40:36.000,0:40:41.000
one letter at a time from one word to another is very much a

0:40:41.000,0:40:45.000
graph problem behind the scenes, and that largely a breadth-first

0:40:45.000,0:40:47.000
traversal is exactly what you're looking for there. How

0:40:47.000,0:40:51.000
can I transform this with the shortest number of steps is, kind of,

0:40:51.000,0:40:54.000
working radially out from the starting word.

0:40:54.000,0:40:54.000


0:40:54.000,0:40:58.000
Your boggle board turns out, really, is just a graph. If

0:40:58.000,0:41:01.000
you think of

0:41:01.000,0:41:07.000
having this sequence of letters

0:41:07.000,0:41:12.000
that you're trying to work through,

0:41:12.000,0:41:16.000
that really each of these letters can actually just be thought of as a node,

0:41:16.000,0:41:20.000
and then the connections it has are to its eight neighbors, and that finding

0:41:20.000,0:41:24.000
words in there, especially finding paths, starting from

0:41:24.000,0:41:27.000
this node that lead away, that don't revisit, so, in fact, doing breadth-first

0:41:27.000,0:41:31.000
or depth-first search. You most likely did depth-first search when you were writing the

0:41:31.000,0:41:32.000
implementation of boggle,

0:41:32.000,0:41:35.000
but that's, kind of, a deep search on I found an A. I found a P. I found a P, and, kind

0:41:35.000,0:41:40.000
of, keeps going as long as that path looks viable. So, in this case, that the arc

0:41:40.000,0:41:41.000
information we're using is

0:41:41.000,0:41:44.000
having a sub of those letters, do they form a prefix that could be leading

0:41:44.000,0:41:45.000
somewhere good,

0:41:45.000,0:41:47.000
and then the, kind of, recursion bottoming out when

0:41:47.000,0:41:52.000
we have run ourselves into a corner where we have no unvisited nodes

0:41:52.000,0:41:53.000
that neighbor us,

0:41:53.000,0:41:57.000
or only nodes that would extend to a prefix that no longer forms any

0:41:57.000,0:41:58.000
viable words

0:41:58.000,0:42:01.000
as a search problem.

0:42:01.000,0:42:04.000
So it feels like everything you've done this quarter has actually been a graph in disguise; I just didn't

0:42:04.000,0:42:08.000
tell you. The maze problem is very much a graph search problem, building a

0:42:08.000,0:42:10.000
graph and searching it.

0:42:10.000,0:42:13.000
Another one that I think is, kind of, neat to think about is, kind of, related to the idea of word letters

0:42:13.000,0:42:18.000
is that often when you mistype a word, a word

0:42:18.000,0:42:19.000
processor will suggest to you

0:42:19.000,0:42:22.000
here's a word that you might have meant instead. You type a word that it doesn't

0:42:22.000,0:42:24.000
have in its lexicon, and it says, well,

0:42:24.000,0:42:25.000
that could just be

0:42:25.000,0:42:28.000
a word I don't know, but it also could mean that you actually mistyped something, and so

0:42:28.000,0:42:31.000
what you want to look at is having neighboring nodes,

0:42:31.000,0:42:33.000
sort of, diagramming a graph

0:42:33.000,0:42:35.000
of all the words you know of

0:42:35.000,0:42:39.000
and then the, kind of, permutations, the tiny little twiddles to that word that

0:42:39.000,0:42:41.000
are neighbors to it so that

0:42:41.000,0:42:44.000
you could describe a set of suggestions

0:42:44.000,0:42:47.000
as being those things that are within a certain

0:42:47.000,0:42:50.000
hop distance from the original mistyped word. And

0:42:50.000,0:42:53.000
so you could say, yeah, things that have two letters different are probably

0:42:53.000,0:42:56.000
close enough, but things that have three or more letters different aren't actually

0:42:56.000,0:42:58.000
likely to be

0:42:58.000,0:43:00.000
the mistyped word,

0:43:00.000,0:43:04.000
and then that, kind of, leads to supporting things like wildcard searches

0:43:04.000,0:43:08.000
where you're trying to remember whether it's E-R-A-T or A-R-T-E at the

0:43:08.000,0:43:10.000
end of desperate.

0:43:10.000,0:43:14.000
Those things could be modeled as, kind of, like they search through a graph space where

0:43:14.000,0:43:16.000
you're modeling those letters, and then there's, like, a

0:43:16.000,0:43:20.000
branching where you try all of the letters coming out of D-E-S-P, seeing if any

0:43:20.000,0:43:23.000
of them lead to an R-A-T-E.

0:43:23.000,0:43:26.000
So the last thing I wanted

0:43:26.000,0:43:28.000
to just bring up, and this is the one that's going into this week's assignment. I'm just

0:43:28.000,0:43:31.000
gonna actually show it to you to, kind of, get you thinking about it because it's

0:43:31.000,0:43:35.000
actually the problem at hand that you're gonna be solving

0:43:35.000,0:43:36.000
is that

0:43:36.000,0:43:41.000
the one of where I'm interested in finding the shortest path, but I'm gonna add in

0:43:41.000,0:43:43.000
this new idea that

0:43:43.000,0:43:45.000
all hops are not created equal.

0:43:45.000,0:43:49.000
That a flight from San Francisco to Seattle is a shorter distance than the

0:43:49.000,0:43:52.000
one that goes from San Francisco to Boston,

0:43:52.000,0:43:53.000
and if I'm interested in getting

0:43:53.000,0:43:57.000
to someplace, it's not just that I would like to take a direct flight. I'd also

0:43:57.000,0:44:00.000
like to take the set of flights that involves the least, kind of, going out of my way,

0:44:00.000,0:44:03.000
and going through other cities, and other directions.

0:44:03.000,0:44:07.000
And so in this case, a breadth-first search is a little bit too simplistic.

0:44:07.000,0:44:10.000
That if I just look at all the places I can get to in one hop,

0:44:10.000,0:44:11.000


0:44:11.000,0:44:15.000
those are not really equally

0:44:15.000,0:44:16.000
weighted

0:44:16.000,0:44:19.000
in terms of my goal of minimizing my total path distance,

0:44:19.000,0:44:22.000
but if I, instead, prioritize -

0:44:22.000,0:44:25.000
prioritize in your mind, immediately conjure up images of the

0:44:25.000,0:44:26.000
priority queue,

0:44:26.000,0:44:28.000
to look at paths that are,

0:44:28.000,0:44:32.000
if I'm trying to minimize total distance, the shortest path, even if they

0:44:32.000,0:44:36.000
represent several hops, if they're still shorter - if I have three flights, and each are

0:44:36.000,0:44:37.000
100 miles,

0:44:37.000,0:44:40.000
then they represent a flight of 300 total miles,

0:44:40.000,0:44:44.000
and that's still preferred to one that actually is 2,000 miles or 7,000 miles

0:44:44.000,0:44:45.000
going to Asia

0:44:45.000,0:44:49.000
alternatively. And so if I worked on a breadth-first that's been modified by a total

0:44:49.000,0:44:50.000
distance,

0:44:50.000,0:44:51.000
that it would be a way

0:44:51.000,0:44:54.000
to work out the shortest path

0:44:54.000,0:44:55.000
in terms of total weight

0:44:55.000,0:44:58.000
from San Francisco to my destination

0:44:58.000,0:45:01.000
by a

0:45:01.000,0:45:03.000
modified breadth-first, kind of, weighted strategy,

0:45:03.000,0:45:06.000
and that is exactly what you're gonna be doing in this assignment. I'm not gonna

0:45:06.000,0:45:08.000
give away too much by, kind of,

0:45:08.000,0:45:11.000
delving too much in it, but the handout does a pretty good job of talking you through what

0:45:11.000,0:45:15.000
you're doing, but it's a, kind of, neat way to think about that's how the GPS, the MapQuest, and things like

0:45:15.000,0:45:16.000
that are working is

0:45:16.000,0:45:17.000
you're at a site;

0:45:17.000,0:45:21.000
you have a goal, and you have this information that you're trying to

0:45:21.000,0:45:24.000
guide that search, and you're gonna use heuristics about

0:45:24.000,0:45:28.000
appears to be the, kind of, optimal path to, kind of, pick that, and go with it,

0:45:28.000,0:45:31.000
and see where it leads you. So

0:45:31.000,0:45:32.000
that will be your

0:45:32.000,0:45:40.000
task for this week. What we'll talk about on Friday is caching.