Lecture 23 | Programming Abstractions (Stanford)

Edit subtitles

0:00 - 0:11
0:11 - 0:14

This presentation is delivered by the Stanford Center for Professional
0:14 - 0:25

Development.
0:25 - 0:27

Okay, we got volume and everything. All
0:27 - 0:28

right, here we go. Lots of you think your
0:28 - 0:32

learning things that aren't gonna actually someday help you in your future
0:32 - 0:34

career. Let's say you plan to be president. You
0:34 - 0:38

think, ''Should I take 106b; should I not take 106b?'' Watch this interview and
0:38 - 0:39

you will see. [Video]
0:39 - 0:41

We have questions, and we ask
0:41 - 0:45

our candidates questions, and this one is from Larry Schwimmer. What
0:45 - 0:48

- you
0:48 - 0:52

guys think I'm kidding; it's right here.
0:52 - 0:59

What is the most efficient way to sort a million 32-bit integers?
0:59 - 1:01

Well - I'm
1:01 - 1:07

sorry. Maybe we should - that's not a - I think the bubble sort
1:07 - 1:09

would be the wrong way to go. Come on; who told him this?
1:09 - 1:14

There you go. So, apparently -
1:14 - 1:15

the question was
1:15 - 1:16

actually
1:16 - 1:19

by a former student of mine, in
1:19 - 1:22

fact, Larry Schwimmer. So apparently, he's been prepped well. As part of going on the campaign trail, not only do you
1:22 - 1:25

have to know all your policy statements, you also have to have your
1:25 - 1:29

N2 and login sorts kept straight in your mind, so - just so
1:29 - 1:31

you know. So let me give you
1:31 - 1:35

some examples of things that are graphs, just to get you thinking about
1:35 - 1:36
1:36 - 1:39

the data structure that we're gonna be working with and playing around with
1:39 - 1:40

today. Sometimes you
1:40 - 1:44

think of graphs of being those, like, bar graphs, right, your graphing
1:44 - 1:46

histograms. This is actually a different kind of graph. The computer scientist's form of
1:46 - 1:50

graph is just a generalized, recursive data structure
1:50 - 1:51

that, kind of,
1:51 - 1:54

starts with something that looks a little bit like a tree and extends it to
1:54 - 1:55

this notion that there are
1:55 - 1:57

any number of nodes, in
1:57 - 2:00

this case, represented as the cities in this airline route map,
2:00 - 2:04

and then there are connections between them, sometimes called edges or arcs that wire up
2:04 - 2:08

the connections that in the case, for example, of this graph are showing you
2:08 - 2:10

which cities have connecting flights. You have a direct
2:10 - 2:15

flight that takes you from Las Vegas to Phoenix, right, one from Phoenix to Oklahoma
2:15 - 2:15

City.
2:15 - 2:18

There is not a direct flight from Phoenix to Kansas City, but there's a path
2:18 - 2:21

by which you could get there by taking a sequence
2:21 - 2:25

of connecting flights, right, going from Phoenix, to Oklahoma City, to Kansas City is
2:25 - 2:26

one way to get there.
2:26 - 2:30

And so the entire, kind of, map of airline routes forms a graph of the city,
2:30 - 2:34

and there are a lot of interesting questions that having represented that data you might want to
2:34 - 2:35

answer about finding
2:35 - 2:36

flights that are
2:36 - 2:40

cheap, or meet your time specification, or get you where you want to go,
2:40 - 2:47

and that this data would be, kind of, primary in solving that problem.
2:47 - 2:48

Another one,
2:48 - 2:49

right, another
2:49 - 2:52

idea of something that's represented as a graph is something that would support
2:52 - 2:54

the notion of word ladders, those little puzzles where you're
2:54 - 2:56

given the word ''chord,''
2:56 - 3:00

and you want to change it into the word ''worm,'' and you have these rules that say, well, you
3:00 - 3:02

can change one letter at a time.
3:02 - 3:04

And so the
3:04 - 3:07

connections in the case of the word letter have to do with words that are
3:07 - 3:12

one character different from their neighbors, have direct connections or arcs
3:12 - 3:16

between them. Each of the nodes themselves represents a word, and then
3:16 - 3:19

paths between that represent word ladders. That if you can go from here
3:19 - 3:22

through a sequence of steps directly connecting your way, each of those stepping
3:22 - 3:26

stones is a word in the chain that makes a word ladder. So
3:26 - 3:29

things you might want to solve there is what's the shortest word ladder? How many different
3:29 - 3:31

word ladders - how many different ways can I go from this word to that one
3:31 - 3:36

could be answered by searching around in a space like this.
3:36 - 3:39

Any kind of prerequisite structure,
3:39 - 3:42

such as the one for the major that might say you need to take this class before you take
3:42 - 3:45

that class, and this class before that class, right,
3:45 - 3:46

forms a
3:46 - 3:49

structure that also fits into the main graph.
3:49 - 3:52

In this case, there's a connection to those arcs.
3:52 - 3:54

That the prerequisite is such that
3:54 - 3:58

you take 106a, and it leads into 106b, and that connection doesn't really go in
3:58 - 4:01

the reverse, right? You don't start in 106b and move backwards.
4:01 - 4:05

Whereas in the case, for example, of the word ladder, all of these arcs are what we called undirected,
4:05 - 4:09

right? You can traverse from wood to word and back again, right?
4:09 - 4:10

They're equally
4:10 - 4:12
4:12 - 4:15

visitable or neighbors, in this case. So there may be a situation where you have directed
4:15 - 4:20

arcs where they really imply a one-way directionality through the paths.
4:20 - 4:22

Another one, lots and lots of
4:22 - 4:23

arrows that
4:23 - 4:25

tell you about the lifecycle of
4:25 - 4:26
4:26 - 4:30

Stanford relationships all captured on one
4:30 - 4:31

slide
4:31 - 4:34

in, sort of, the flowchart. So flowcharts have a lot of things about directionality. Like, you're
4:34 - 4:37

in this state, and you can move to this state, so that's your neighboring state, and
4:37 - 4:39

the arcs in there connect those things.
4:39 - 4:43

And so, as you will see, like, you get to start over here and, like, be single and love
4:43 - 4:47

it, and then you can, kind of, flirt with various degrees of seriousness,
4:47 - 4:50

and then eventually there evolves a trombone and serenade, which, you know,
4:50 - 4:53

you can't go back to flirting aimlessly after the trombone and serenade; there's no
4:53 - 4:55

arc that direction.
4:55 - 4:57

When somebody gets the trombone out,
4:57 - 4:59

you know, you take them seriously,
4:59 - 5:02

or you completely blow them off. There's the [inaudible] there.
5:02 - 5:04

And then there's dating, and then there's
5:04 - 5:08

two-phase commit, which is a very important part of any computer scientist's education,
5:08 - 5:10

and then, potentially, other
5:10 - 5:13

tracks of children and what not. But it does
5:13 - 5:15

show you, kind of, places you can go, right?
5:15 - 5:18

It turns out you actually don't get to go from flirt aimlessly to have baby, so
5:18 - 5:20

write that down -
5:20 - 5:23

not allowed in this graph. And
5:23 - 5:28

then it turns out some things that you've already seen are actually graphs in disguise.
5:28 - 5:32

I didn't tell you when I gave you the maze problem and its solution, the solving and
5:32 - 5:33

building of a maze,
5:33 - 5:37

that, in fact, what you're working on though is something that is a graph.
5:37 - 5:39

That a perfect maze, right,
5:39 - 5:40

so we started with this
5:40 - 5:44

fully disconnected maze. If you remember how the Aldis/Broder algorithm worked, it's like
5:44 - 5:47

you have all these nodes that have no connections between them,
5:47 - 5:51

and then you went bopping around breaking down walls, which is effectively
5:51 - 5:53

connecting up the nodes to each other,
5:53 - 5:57

and we kept doing that until actually all of the nodes were connected. So, in effect, we
5:57 - 6:00

were building a graph out of this big grid of cells
6:00 - 6:03

by knocking down those walls to build the connections, and when we were done, then
6:03 - 6:07

we made it so that it was possible to trace paths starting from that lower corner
6:07 - 6:10

and working our way through the neighbors all the way over to there.
6:10 - 6:13

So in the case of this, if you imagine that each of these cells was broken out in this little
6:13 - 6:16

thing, there'd be these neighboring connections where this guy has this
6:16 - 6:19

neighbor but no other ones because actually it has no
6:19 - 6:22

other directions you can move from there, but some of them, for example like this
6:22 - 6:23

one,
6:23 - 6:26

has a full set of four neighbors you can reach
6:26 - 6:28

depending on which walls are intact or not.
6:28 - 6:29

And so the
6:29 - 6:31

creation of that was creating a graph out
6:31 - 6:34

of a set of disconnected nodes, and then solving it is searching that
6:34 - 6:38

graph for a path. Here's one that is a little horrifying that came from the newspaper, which is social networking, kind of, a very big [inaudible] of, like, on the net who do you know, and who do they know, and how do those connections, like, linked in maybe help you get a job. This is one that was actually based on the liaisons, let's say, of a group of high school students, right, and it turns out there was a lot of more interconnected in that graph than I would expect, or appreciate, or approve of but interesting to, kind of, look at. So
6:38 - 6:39

let's talk, like, concretely.
6:39 - 6:43

How are we gonna make stuff work on graphs? Graphs turn out to actually - by showing you that, I'm trying to
6:43 - 6:46

get you this idea that a lot of things are really just graphs in disguise. That a lot
6:46 - 6:50

of the problems you may want to solve, if you can represent it as a graph, then things
6:50 - 6:53

you learn about how to manipulate graphs will help you to solve that kind of
6:53 - 6:54

problem.
6:54 - 6:59

So the basic idea of a graph, it is a recursive data structure based on the node where
6:59 - 7:02

nodes have connections to other nodes.
7:02 - 7:04

Okay. So that's, kind of, like a tree,
7:04 - 7:07

but it actually is more freeform than a tree, right?
7:07 - 7:11

In a tree, right, there's that single root node that has, kind of, a special place, and
7:11 - 7:15

there's this restriction about the connections that there's a single path
7:15 - 7:17

from every other node in the tree
7:17 - 7:20

starting from the root that leads to it. So you never can get there by
7:20 - 7:24

multiple ways, and there's no cycles, and it's all connected, and things like
7:24 - 7:25

that.
7:25 - 7:26

So, in terms of a graph,
7:26 - 7:29

it just doesn't place those restrictions. It says there's a node.
7:29 - 7:32

It has any number of pointers to other nodes. It potentially could even
7:32 - 7:36

have zero pointers to other's nodes, so it could be on a little island of its own out
7:36 - 7:38

here that has no incoming or outgoing flights, right? You
7:38 - 7:40

have to swim if you want to get there.
7:40 - 7:42
7:42 - 7:45

And then there's also not a rule that says that you can't have
7:45 - 7:47

multiple ways to get to the same
7:47 - 7:50

node. So if you're at B, and you're interested in getting to C,
7:50 - 7:52

you could take the direction connection here,
7:52 - 7:54

but you could also take a longer path
7:54 - 7:58

that bops through A and hits to C, and so there's more than one way to get there,
7:58 - 7:59
7:59 - 8:02

which would not be true in a tree structure where it has that constraint,
8:02 - 8:05

but that adds some interesting challenges to solving problems in the graphs
8:05 - 8:08

because there actually are so many different ways you can
8:08 - 8:11

get to the same place. We have to be careful avoiding
8:11 - 8:13

getting into circularities where we're
8:13 - 8:16

doing that, and also not redundantly doing work we've already done before because
8:16 - 8:18

we've visited it previously.
8:18 - 8:21

There can be cycles where
8:21 - 8:22

you can
8:22 - 8:24

completely come around. I this one, actually,
8:24 - 8:26

doesn't have a cycle in it.
8:26 - 8:27

Nope, it doesn't,
8:27 - 8:31

but if I add it - I could add a link, for example, from
8:31 - 8:34

C back to B, and then there would be this place you could just, kind of, keep
8:34 - 8:36

rolling around in the B, A, C
8:36 - 8:38

area.
8:38 - 8:41

So we call each of those guys, the circles here, the nodes, right?
8:41 - 8:44

The connection between them arcs. We
8:44 - 8:47

will talk about the arcs as being directed and undirected in different situations
8:47 - 8:51

where we imply that there's a directionality; you can travel the arc only one-way,
8:51 - 8:54

or in cases where it's undirected, it means it goes both ways.
8:54 - 8:56

We'll talk about two nodes being connected,
8:56 - 8:58

meaning there is a direct connection, an arc, between them,
8:58 - 9:02

and if they are not directly connected, there may possibly be a path
9:02 - 9:05

between them. A sequence of arcs forms a path that you can follow to get
9:05 - 9:07

from one node to another,
9:07 - 9:11

and then cycles just meaning a path that revisits one of the nodes that
9:11 - 9:14

was previously traveled on that path. So I've got
9:14 - 9:16

a little terminology.
9:16 - 9:18

Let's talk a little bit about how we're gonna represent it.
9:18 - 9:22

So before we start making stuff happen on them, it's, like, well, how does this -
9:22 - 9:26

in C++, right, what's the best way, or what are the reasonable ways you
9:26 - 9:27

might represent
9:27 - 9:30

this kind of connected structure?
9:30 - 9:31

So if I have
9:31 - 9:36

this four-node graph, A, B, C, D over where with the connections that are shown with
9:36 - 9:38

direction, in this case, I have directed arcs.
9:38 - 9:41

That one way to think of it is that what a
9:41 - 9:45

graph really is is a set of nodes and a set of arcs, and by set, I mean, sort of, just the
9:45 - 9:48

generalization. There might be a vector of arcs; there might be a vector of
9:48 - 9:51

nodes, whatever, but some collection of arcs, some collection of nodes,
9:51 - 9:53

and that the
9:53 - 9:56

nodes might have information about what location they represent, or
9:56 - 9:59

what word, or what entity, you know, in a social network that they are,
9:59 - 10:03

and then the arcs would show who is directly connected to whom.
10:03 - 10:06

And so one way to manage that, right, is just to have two independent collections - a collection
10:06 - 10:08

of nodes and a
10:08 - 10:11

collection of arcs, and that when I need to know things about connections, I'll have to, kind of, use both
10:11 - 10:12

in tandem.
10:12 - 10:14

So if I want to know if A is connected to B,
10:14 - 10:18

I might actually have to just walk through the entire set of arcs trying to find
10:18 - 10:22

one that connects A and B as its endpoints. If I want to see if
10:22 - 10:25

C is connected to B, same thing, just walk down the whole set.
10:25 - 10:27
10:27 - 10:29
10:29 - 10:31

That's probably the simplest thing to do, right, is to just, kind of,
10:31 - 10:35

have them just be maintained as the two things that are used in conjunction.
10:35 - 10:37

More likely, what you're gonna want to do
10:37 - 10:39

is provide some access that
10:39 - 10:42

when at a current node, you're currently trying to do some processing
10:42 - 10:47

from the Node B, it would be convenient if you could easily access those nodes
10:47 - 10:51

that emanate or outgo from the B node. That's typically the thing you're trying
10:51 - 10:52

to do is
10:52 - 10:55

at B try to visit its neighbors,
10:55 - 10:58

and rather that having to, kind of, pluck them out of the entire collection,
10:58 - 11:01

it might be handy if they were already, kind of, stored in such a way to make it
11:01 - 11:03

easy for you to get to that.
11:03 - 11:06

So the two ways that try to represent adjacency
11:06 - 11:10

in a more efficient way are the adjacency list and the adjacency matrix.
11:10 - 11:14

So in an adjacency list representation, we have a way of associating with each node,
11:14 - 11:17

probably just by storing it in the node structure itself,
11:17 - 11:21

those arcs that outgo from there. So, in this case, A has as direct
11:21 - 11:22

connection to the B
11:22 - 11:23

and C Nodes,
11:23 - 11:26

and so I would have that information stored with the A Node, which is my
11:26 - 11:29

vector or set of outgoing connections.
11:29 - 11:33

Similarly, B has one outgoing connection, which goes to D. C has an outgoing connection to D, and
11:33 - 11:37

then D has outgoing connections that lead back to A and
11:37 - 11:40

to B. So at a particular node I could say, well, who's my neighbors?
11:40 - 11:41

It would be
11:41 - 11:46

easily accessible and not have to be retrieved, or searched, or filtered.
11:46 - 11:48

Another way of doing that
11:48 - 11:51

is to realize that, in a sense, what we have is this, kind of,
11:51 - 11:53

end by end grid
11:53 - 11:56

where each node is potentially connected to
11:56 - 12:00

the other N -1 nodes in that graph. And so if I just build
12:00 - 12:03

a 2x2 matrix that's labeled with
12:03 - 12:06

all the node names on this side and all the node names across the top, that the
12:06 - 12:08

intersection of this says
12:08 - 12:11

is there an arc that leads away from A and
12:11 - 12:14

ends at B? Yes, there is. Is there one from A to C?
12:14 - 12:18

And in the places where there is not an existing connection, a self loop or
12:18 - 12:19
12:19 - 12:23

just a connection that doesn't exist, right, I have an empty slot. So maybe these would be Booleans
12:23 - 12:25

true and false or something, or maybe there'd be some more information here
12:25 - 12:27

about the distance
12:27 - 12:30

or other associated arc properties for that
12:30 - 12:31

particular connection.
12:31 - 12:34

In this case, it's actually very easy then to actually do the really quick
12:34 - 12:35

search of
12:35 - 12:38

I'm at A and I want to know if I can get to B, then it's really just a
12:38 - 12:41

matter of reaching right into the
12:41 - 12:45

slot in constant time in that matrix to pull it out, and
12:45 - 12:47

so
12:47 - 12:49

this one involves a lot of searching to find things. This one involves, perhaps, a
12:49 - 12:52

little bit of searching. If I want to know if A is connected to B, I might have to look
12:52 - 12:55

through all its outgoing connections, right? This one gives me that direct
12:55 - 12:56

access,
12:56 - 13:00

but the tradeoffs here has to do with where the space starts building up, right?
13:00 - 13:03

A full NxN matrix could be very big,
13:03 - 13:06

and sometimes we're allocating space as though there are a full set of
13:06 - 13:09

connections between all the nodes, every node connected to every other,
13:09 - 13:12

and in the cases where a lot of those things are false, there's a lot of
13:12 - 13:13

capacity
13:13 - 13:16

that we have set aside that we're not really tapping into.
13:16 - 13:19

And so in the case of a graph that we call dense,
13:19 - 13:23

where there's a lot of connections, it will use a lot of that capacity, and it might make sense
13:23 - 13:27

in a graph that's more sparse, where there's a few connections. So you have
13:27 - 13:31

thousands of nodes, but maybe only three or four on average are connected to one.
13:31 - 13:35

Then having allocated 1,000 slots of which to only fill in three
13:35 - 13:36

might become
13:36 - 13:40

rather space inefficient. You might prefer an adjacency list representation that
13:40 - 13:41

can get you
13:41 - 13:44

to the ones you're actually using versus the other,
13:44 - 13:47

but in terms of the adjacency matrices is very fast,
13:47 - 13:48

right? A lot of space thrown at this
13:48 - 13:51

gives you this immediate 0,1 access to know
13:51 - 13:55

who your neighbors are. So
13:55 - 13:59

we're gonna use the adjacency list, kind of, strategy here
13:59 - 14:02

for the code we're gonna do today, and it's, kind
14:02 - 14:07

of, a good compromise between the two alternatives that we've seen there.
14:07 - 14:08

So I have a struct node.
14:08 - 14:11

It has some information for the node. Given that I'm not being very specific
14:11 - 14:14

about what the graph is, I'm just gonna, kind of, leave that unspecified. It might be that it's
14:14 - 14:17

the city name. It might be that it's a word. It might be that it's a person in a
14:17 - 14:19

social network, you know,
14:19 - 14:23

some information that represents what this node is an entity,
14:23 - 14:28

and then there's gonna be a set of arcs or a set of nodes it's connected to,
14:28 - 14:32

and so I'm using vector here. I could be using set. I could be using a
14:32 - 14:35

raw array, or a link list, or any of these things.
14:35 - 14:38

I need to use some unbounded collection though because there's no guarantee that
14:38 - 14:42

they will be 0, or 1, or 2 the way there is in a link list or a tree where there's a
14:42 - 14:45

specified number of outgoing links. There can be any number of them, so I'm just leaving
14:45 - 14:46
14:46 - 14:47

a
14:47 - 14:51

variable size collection here to do that work for me.
14:51 - 14:52

The graph itself,
14:52 - 14:55

so I have this idea of all the nodes in the graph, right, would each get a new
14:55 - 14:58

node structure and then be wired up to the other nodes that they are connected
14:58 - 15:00

to,
15:00 - 15:02

but then when I'm trying to operate on that graph,
15:02 - 15:06

I can't just take one pointer and say here's a pointer to some node in the
15:06 - 15:09

graph and say this is - and from here you have accessed everything. In the way
15:09 - 15:11

that a tree,
15:11 - 15:13

the only thing we need to keep track of is the pointer to the root,
15:13 - 15:17

or a link list, the only thing we keep a pointer to, typically, is that frontmost
15:17 - 15:19

cell, and from there, we can reach everything else.
15:19 - 15:23

There is no special head or root cell in a graph.
15:23 - 15:27

A graph is this
15:27 - 15:30

being, kind of, a crazy collection without a lot of rules about
15:30 - 15:35

how they are connected. In fact, it's not even guaranteed that they totally are
15:35 - 15:36

connected.
15:36 - 15:38

That I can't
15:38 - 15:41

guarantee that if I had a pointer, for example, to C,
15:41 - 15:44

right, I may or may not be able to reach all the nodes from there.
15:44 - 15:49

If I had a pointer to A, right, A can get to C and B, and then also
15:49 - 15:52

down to D, but, for example, there could just be an E node over here that was only
15:52 - 15:55

connected to C or not connected to anything at all for that matter, connected
15:55 - 15:57

to F in their own little island.
15:57 - 16:01

That there is no way to identify some special node from which you could reach
16:01 - 16:02

all the nodes.
16:02 - 16:04

There isn't a special root node,
16:04 - 16:07

and you can't even just arbitrarily pick one to be the root because actually there's no
16:07 - 16:10

guarantee that it will be connected to all of the others. So it would not give you
16:10 - 16:13

access to the full entirety of your graph if you just picked on and said I want anything
16:13 - 16:16

reachable from here, it might not get you the whole thing. So,
16:16 - 16:18

typically, what you need to really operate on is
16:18 - 16:21

the entire collection of node stars. You'll have a set or a vector
16:21 - 16:24

that contains all of the nodes that are in the graph,
16:24 - 16:26

and then within that, there's a bunch of other connections that are
16:26 - 16:35

being made on the graph connectivity that you're also exploring.
16:35 - 16:37

So let's make it a little bit more concrete
16:37 - 16:39
16:39 - 16:40

and talk a little bit about
16:40 - 16:43

what really is a node; what really is an arc?
16:43 - 16:46

There may be, actually, be some more information I need to store than just a
16:46 - 16:48

node is connected to other nodes.
16:48 - 16:51

So in the case of a list or a tree,
16:51 - 16:54

the next field and the left and the right field are really just pointers for the
16:54 - 16:56

purpose of organization.
16:56 - 16:58

That's the pointer to the next cell.
16:58 - 17:02

There's no data that is really associated with that link itself.
17:02 - 17:05

That's not true in the case of a graph.
17:05 - 17:09

That often what the links that connect up the nodes
17:09 - 17:12

actually do have a real role to play in terms of being data.
17:12 - 17:14

It may be that what they tell you is
17:14 - 17:18

what road you're taking here, or how long this path is, or how much this
17:18 - 17:22

flight costs, or what time this flight leaves, or how full this flight already is,
17:22 - 17:24

or who knows, you know, just other information
17:24 - 17:27

that is more than just this nodes connected to that one. It's, like, how is it
17:27 - 17:30

connected; what is it connected by, and what are the details of how that
17:30 - 17:31

connection
17:31 - 17:32
17:32 - 17:34

is being represented?
17:34 - 17:38

So it's likely that what you really want is not just pointers to other nodes but
17:38 - 17:42

information about that actual link stored with an arc structure.
17:42 - 17:44

So, typically, we're gonna have
17:44 - 17:45

an arc
17:45 - 17:48

which has information about that arc, how long, how much it costs, you know,
17:48 - 17:50

what flight number it is,
17:50 - 17:52

that will have pointers to the start and end node that it is connecting.
17:52 - 17:54

The node
17:54 - 17:57

then has a collection of arcs,
17:57 - 18:00

and those arcs are expected to, in the adjacency list form, will all be arcs
18:00 - 18:02

that start at this node.
18:02 - 18:05

So their start node is equal to the one that's holding onto to them, and then
18:05 - 18:06

they have an end
18:06 - 18:10

pointer that points to the node at the other end of the arc. Well,
18:10 - 18:13

this gets us into a little C++ bind
18:13 - 18:16

because I've described these data structures
18:16 - 18:17

in a way
18:17 - 18:22

that they both depend on each other; that an arc has pointers to the start and the end
18:22 - 18:22

node.
18:22 - 18:27

A node has a collection of arcs, which is probably a vector or a set of arc pointers,
18:27 - 18:28

and so
18:28 - 18:31

I'm starting to define the structure for arc,
18:31 - 18:33

and then I want to talk about node in it,
18:33 - 18:35

and I think, oh, okay, well, then I better define node first because C++ always likes to
18:35 - 18:38

see things in order. Remember, it always wants to see A,
18:38 - 18:42

and then B if B uses A, and so in terms of how your functions and data
18:42 - 18:45

structures work, you've always gotten this idea like, well, go do the one first. Well, if I say,
18:45 - 18:48

okay, arc's gonna need node. I start to write arc, and I say, oh, I
18:48 - 18:50

need to talk about node. Well, I better put node up here,
18:50 - 18:54

right? But then when I'm starting to write node, I start talking about arc, and they have a
18:54 - 18:57

circular reference to each other, right? They both
18:57 - 19:00

want to depend on the other one before we've gotten around to telling the
19:00 - 19:01

compiler about it.
19:01 - 19:04

One of them has to go first, right? What are we gonna do?
19:04 - 19:08

What we need to do is use the C++ forward reference
19:08 - 19:09

as a
19:09 - 19:12

little hint to the compiler
19:12 - 19:13

that
19:13 - 19:17

we are gonna be defining a Node T structure in a little bit;
19:17 - 19:18

we're gonna get to it,
19:18 - 19:22

but first we're gonna define the Arc T structure that uses that Node T, and then
19:22 - 19:25

we're gonna get around to it. So we're gonna give it, like, a little heads up that says
19:25 - 19:29

there later will be an Act 2 of this play. We're gonna introduce a
19:29 - 19:31

character called Node T.
19:31 - 19:33

So you can now
19:33 - 19:34

start talking about Node
19:34 - 19:37

T in some simple ways
19:37 - 19:40

based on your agreement to the compiler that you plan on telling it more
19:40 - 19:41

about Node T later.
19:41 - 19:45

So the struct Node T says there will be a struct called Node T.
19:45 - 19:47

Okay, the compiler says, okay, I'll write that down.
19:47 - 19:51

You start defining struct Arc T, and it says, oh, I see you have these pointers
19:51 - 19:54

to the node T. Okay. Well, you told me there'd be a struct like that; I guess that'll work out.
19:54 - 19:58

And now you told it what struct Node T is. It's, like, oh, I have a vector
19:58 - 19:59

with these pointers to arc,
19:59 - 20:02

and then it says, okay, now I see the whole picture, and
20:02 - 20:03

it all worked out.
20:03 - 20:08

So it's just a little bit of a
20:08 - 20:11

requirement of how the C++ compiler likes to see stuff in order without ever having
20:11 - 20:16

to go backwards to check anything out again. Okay. So
20:16 - 20:19

I got myself some node structures, got
20:19 - 20:21

some arc structures; they're all
20:21 - 20:22

working together.
20:22 - 20:24

I'm gonna try to do some traversals.
20:24 - 20:26

Try to do some operations that
20:26 - 20:29

work their way around the graph.
20:29 - 20:30

So
20:30 - 20:33

tree traversals, right, the pre, and post, and in order
20:33 - 20:36

are ways that you start at the root, and you visit all the nodes on the tree,
20:36 - 20:40

a very common need to, kind of, process, to delete nodes,
20:40 - 20:43

to print the nodes, to whatnot. So we might want to do the same thing on graphs. I've got a
20:43 - 20:47

graph out there, and I'd like to go exploring it. Maybe I just want to print, and if
20:47 - 20:50

I'm in the Gates building, what are all the places that I can reach from
20:50 - 20:53

the Gates building? Where can I go to; what
20:53 - 20:54

options do I have?
20:54 - 20:57

What I'm gonna do is a traversal
20:57 - 21:01

starting at a node and, kind of, working my way outward to the visible, reachable
21:01 - 21:04

nodes that I can follow those paths to get there.
21:04 - 21:07

We're gonna look at two different ways to do this.
21:07 - 21:09

One is Depth-first and one is Breadth-first,
21:09 - 21:13

and I'll show you both algorithms, and they
21:13 - 21:15

will visit the same nodes;
21:15 - 21:19

when all is done and said, they will visit all the reachable nodes starting from a
21:19 - 21:20

position,
21:20 - 21:21

but they will visit them in different order.
21:21 - 21:25

So just like the pre/post in order tree traversals, they visit all the same
21:25 - 21:28

notes. It's just a matter of when they get around to doing it
21:28 - 21:29

is how
21:29 - 21:32

we distinguish depth and breadth-first traversals.
21:32 - 21:34

Because we have this graph structure
21:34 - 21:40

that has this loose connectivity and the possibility of cycles and multiple paths
21:40 - 21:41

to the same node,
21:41 - 21:44

we are gonna have to be a little bit careful at how we do our work to make
21:44 - 21:48

sure that we don't end up getting stuck in some infinite loop when we keep
21:48 - 21:51

going around the ABC cycle in a particular graph.
21:51 - 21:55

That we need to realize when we're revisiting something we've seen before,
21:55 - 21:56

and then not
21:56 - 21:57
21:57 - 21:59

trigger, kind of, an
21:59 - 22:03

exploration we've already done.
22:03 - 22:06

So let's look at depth-first first.
22:06 - 22:09

This is probably the simpler of the two. They're both, actually, pretty simple.
22:09 - 22:13

Depth-first traversal uses recursion to do its work,
22:13 - 22:18

and the idea is that you pick a starting node - let's say I want to start at Gates,
22:18 - 22:21

and that maybe what I'm trying to find is all the reachable nodes from Gates.
22:21 - 22:24

If I don't have a starting node, then I can just pick one arbitrarily from the
22:24 - 22:27

graph because there is actually no root node of special status,
22:27 - 22:30

and the idea is to go deep. That
22:30 - 22:33

in the case of, for example, a maze, it might be that I
22:33 - 22:36

choose to go north, and then I just keep going north. I just go north, north, north, north, north until
22:36 - 22:40

I run into a dead wall, and then I say, oh, I have to go east, and I go east, east, east, east, east. The idea is to
22:40 - 22:42

just go as far away from
22:42 - 22:46

the original state as I can, just keep going outward, outward, outward, outward,
22:46 - 22:47

outward, outward, outward,
22:47 - 22:50

and eventually I'm gonna run into some dead end, and that dead end could be I
22:50 - 22:52

get to a node that has no outgoing arcs,
22:52 - 22:54

or it could be that I get to a node
22:54 - 22:57

that whose only arcs come back to places I've already been,
22:57 - 22:58

so it cycles back on itself, in
22:58 - 23:01

which case, that also is really a dead end.
23:01 - 23:03

So you go deep.
23:03 - 23:06

You go north, and you see as far as you can get, and
23:06 - 23:10

only after you've, kind of, fully explored everything reachable from starting in
23:10 - 23:11

the north direction
23:11 - 23:13

do you backtrack,
23:13 - 23:16

unmake that decision, and say, well, how about not north; how about east, right? And then
23:16 - 23:19

find everything you can get to from the east, and
23:19 - 23:22

after, kind of, going all through that, come back and try south, and so on. So all of the
23:22 - 23:23

options you have
23:23 - 23:26

exhaustively getting through all your
23:26 - 23:26

neighbors
23:26 - 23:29

in this, kind of, go-deep strategy. Well,
23:29 - 23:32

the important thing is you realize if you've got this recursion going, so what
23:32 - 23:35

you're actually doing, basically, is saying I'm gonna step to my neighbor to the
23:35 - 23:40

right, and I'm gonna explore all the things, so do a depth-first search from there
23:40 - 23:43

to find all the places you can reach, and that one says, well, I'm gonna step through this spot
23:43 - 23:45

and go to one of my neighbors.
23:45 - 23:48

And so we can't do that infinitely without getting into trouble. We need
23:48 - 23:53

to have a base case that stops that recursion, and the two cases that I've
23:53 - 23:56

talked about - one is just when you have no neighbors left to go.
23:56 - 23:57

So when you
23:57 - 23:59
23:59 - 24:04

get to a node that actually, kind of, is a dead end, or that neighbor only has visited
24:04 - 24:05

neighbors
24:05 - 24:09

that surround it. So a
24:09 - 24:10

very simple little piece of
24:10 - 24:13

code that depends on recursion doing its job.
24:13 - 24:17

In this case, we need to know which nodes we have visited, right? We're gonna have to
24:17 - 24:19

have some kind of strategy for saying I
24:19 - 24:21

have been here before.
24:21 - 24:24

In this case, I'm using just a set because that's an easy thing to do.
24:24 - 24:27

I make a set of node pointers, and I update it
24:27 - 24:29

each time I see a new node; I add it to
24:29 - 24:34

that set, and if I ever get to a node who is already previously visited
24:34 - 24:38

from that, there's no reason to do any work from there; we can immediately
24:38 - 24:41

stop. So
24:41 - 24:43

starting that step would be empty.
24:43 - 24:46

If the visit already contains the node that we're currently trying to visit,
24:46 - 24:50

then there's nothing to do. Otherwise, we go ahead and add it, and there's a little node
24:50 - 24:53

here that says we'll do something would occur. What am I trying to do here? Am I trying to print those nodes? Am
24:53 - 24:56

I trying to draw pictures on those nodes, highlight those nodes, you know,
24:56 - 25:00

something I'm doing with them. I don't know what it is, but this is the structure for the depth-first search
25:00 - 25:01

here, and
25:01 - 25:02

then
25:02 - 25:04

for all of the neighbors,
25:04 - 25:07

using the vector of outgoing connections,
25:07 - 25:08

then I run a depth-first search
25:08 - 25:11

on each of those neighbors
25:11 - 25:14

passing that visited set by reference
25:14 - 25:16

through the whole operation. So I will
25:16 - 25:20

always be adding to and modifying this one set, so I will
25:20 - 25:24

be able to know where I am and where I still need to go.
25:24 - 25:27

So if we trace this guy in action -
25:27 - 25:31

if I start arbitrarily from A, I say I'd like to begin with A.
25:31 - 25:35

Then the depth-first search is gonna pick a neighbor, and let's say I happen to just work
25:35 - 25:38

over my neighbors in alphabetical order.
25:38 - 25:41

It would say okay, well, I've picked B. Let's go to everywhere we can get to from B. So A
25:41 - 25:42

gets marked as visited.
25:42 - 25:44

I move onto B; I say, well,
25:44 - 25:48

B, search everywhere you can to from B, and B says okay, well, I need to pick a
25:48 - 25:50

neighbor, how about I pick
25:50 - 25:51

C? And says
25:51 - 25:53

and now,
25:53 - 25:56

having visited C, go everywhere you can get to from C. Well,
25:56 - 25:59

C has no neighbors, right, no outgoing connections whatsoever.
25:59 - 26:02

So C says, well, okay, everywhere you can get to from C is C,
26:02 - 26:04

you know, I'm done.
26:04 - 26:09

That will cause it to backtrack to this place, to B, and B says okay, I explored everywhere I can get to
26:09 - 26:10

from C;
26:10 - 26:13

what other neighbors do I have? B has no other neighbors. So, in fact, B backtracks all the way
26:13 - 26:15

back to A.
26:15 - 26:19

So now A says okay, well, I went everywhere I can get to from B,
26:19 - 26:20

right? Let me look at my next neighbor.
26:20 - 26:23

My next neighbor is C. Ah,
26:23 - 26:25

but C, already visited,
26:25 - 26:27

so there's no reason to
26:27 - 26:29

go crazy on that thing. So, in fact, I can
26:29 - 26:33

immediately see that I've visited that, so I don't have any further path to look at
26:33 - 26:33

there.
26:33 - 26:35

I'll hit D then
26:35 - 26:37

as the next one in sequence. So
26:37 - 26:40

at this point, I've done everything reachable from the B arm, everything
26:40 - 26:43

reachable from the C arm, and now I'm starting on the D arm. I said okay, where can I get to
26:43 - 26:44

from D?
26:44 - 26:46

Well, I can get to E. All right,
26:46 - 26:48

where can I get to from E?
26:48 - 26:52

E goes to F, so the idea is you think of it going deep. It's going as far away as it
26:52 - 26:56

can, as deep as possible, before it, kind of, unwinds
26:56 - 26:58

getting to the F that has no neighbors
26:58 - 27:01

and saying okay. Well, how about - where can I get to
27:01 - 27:04

also from E? I can get to G.
27:04 - 27:07

From G where can I get to? Nowhere,
27:07 - 27:09

so we, kind of, unwind the
27:09 - 27:12

D arm, and then we will eventually get back to and hit the H.
27:12 - 27:16

So the order they actually were hit in this case happened to be alphabetical. I
27:16 - 27:18

assigned them such that that would come out that way.
27:18 - 27:21

That's not really a property of what depth-first search does,
27:21 - 27:24

but think of it as like it's going as deep as possible, as far away as possible,
27:24 - 27:28

and so it makes a choice - it's pretty much like the recursive backtrackers that we've seen. In
27:28 - 27:31

fact, they make a choice, commit to it, and just move forward on it, never, kind of,
27:31 - 27:35

giving a backwards glance, and only if the whole process
27:35 - 27:35

bottoms out
27:35 - 27:39

does it come back and start unmaking decisions and try alternatives,
27:39 - 27:43

eventually exploring everything reachable from A
27:43 - 27:46

when all is done and said. So if
27:46 - 27:49

you, for example, use that on the maze,
27:49 - 27:52

right, the depth-first search on a maze is very much the same strategy we're talking about
27:52 - 27:53

here.
27:53 - 27:56

Instead of the way we did it was actually breadth-first search, if you
27:56 - 27:57

remember back to that assignment,
27:57 - 28:00

the depth-first search alternative is actually doing it the -
28:00 - 28:01

just go deep,
28:01 - 28:04

make a choice, go with it, run all the way until it bottoms out,
28:04 - 28:12

and only then back up and try some other alternatives.
28:12 - 28:15

The breadth-first traversal, gonna hit
28:15 - 28:16

the same nodes
28:16 - 28:19

but just gonna hit them in a different way.
28:19 - 28:21

If my goal were to actually hit all of them,
28:21 - 28:25

then either of them is gonna be a fine strategy. There may be some reason why
28:25 - 28:25
28:25 - 28:29

I'm hoping to stop the search early, and that actually might dictate why I would prefer
28:29 - 28:31

which ones to look at first.
28:31 - 28:35

The breadth-first traversal is going to explore things in terms of distance,
28:35 - 28:40

in this case, expressed in terms of number of hops away from the starting
28:40 - 28:40

node.
28:40 - 28:44

So it's gonna look at everything that's one hop away, and then go back and look at
28:44 - 28:46

everything two hops away, and then three hops away.
28:46 - 28:49

And so if the goal, for example, was to find a node
28:49 - 28:51

in a shortest path connection between it,
28:51 - 28:55

then breadth-first search, by looking at all the ones one hop away, if it was a one hop
28:55 - 28:56

path, it'll find it,
28:56 - 28:59

and if it's a two hop path, it'll find it on that next round, and then the three hop
28:59 - 29:00

path,
29:00 - 29:02

and there might be paths that are 10, and 20, and 100
29:02 - 29:03

steps long,
29:03 - 29:06

but if I find it earlier, I won't go looking at them. It actually might prove to
29:06 - 29:09

be a more efficient way to get to the shortest solution as opposed to depth-first
29:09 - 29:11

search which go off and try all these deep
29:11 - 29:14

paths that may not ever end up where I want it to be
29:14 - 29:16

before it eventually made its way to the short
29:16 - 29:20

solution I was wanting to find. So the thing
29:20 - 29:21

is we have the starting node.
29:21 - 29:24

We're gonna visit all the immediate neighbors.
29:24 - 29:27

So, basically, a loop over the
29:27 - 29:28

immediate one,
29:28 - 29:31

and then while we're doing that, we're gonna actually, kind of, be gathering up that
29:31 - 29:33

next generation.
29:33 - 29:33
29:33 - 29:37

Sometimes that's called the Frontier, sort of, advancing the frontier
29:37 - 29:39

of the nodes to explore next,
29:39 - 29:42

so all the nodes that are two hops away,
29:42 - 29:44

and then while we're processing the ones that are two hops away, we're gonna
29:44 - 29:47

be gathering the frontier
29:47 - 29:50

that is three hops away. And so at each stage we'll be, kind of, processing a
29:50 - 29:54

generation but while assembling the generation N+1
29:54 - 29:58

in case we need to go further to find those things.
29:58 - 30:00

We'll use an auxiliary data structure during
30:00 - 30:04

this. That management of the frontier, keeping track of the nodes that are, kind
30:04 - 30:05

of,
30:05 - 30:07

staged for the
30:07 - 30:09

subsequent iterations,
30:09 - 30:12

the queue is a perfect data structure for that. If I put the
30:12 - 30:14

initial neighbors into a queue,
30:14 - 30:19

as I dequeue them, if I put their neighbors on the back of the queue, so enqueue them
30:19 - 30:21

behind all the one hop neighbors, then
30:21 - 30:24

if I do that with code, as I'm processing Generation 1 off the front of the queue, I'll be
30:24 - 30:27

stacking Generation 2 in the back of the queue,
30:27 - 30:30

and then when all of Generation 1 is exhausted, I'll be processing Generation
30:30 - 30:31

2,
30:31 - 30:35

but, kind of, enqueuing Generation 3 behind it.
30:35 - 30:39

Same deal that we had with depth-first search though is we will have to be
30:39 - 30:41

careful about cycles will pull past that
30:41 - 30:44

when there's more than one possibility of how we get to something, we don't
30:44 - 30:46

want to, kind of, keep going around that cycle
30:46 - 30:52

or do a lot of redundant work visiting nodes that we've already seen. So
30:52 - 30:55

I have the same idea of keeping track of a set
30:55 - 30:57

that knows what
30:57 - 31:00

we've previously visited,
31:00 - 31:01

and the
31:01 - 31:05

initial node gets enqueued just, kind of, by itself, just like Generation 0 gets
31:05 - 31:07

put into the queue,
31:07 - 31:10

and then while the queue is not empty, so while there's still neighbors I haven't
31:10 - 31:11

yet visited,
31:11 - 31:13

I dequeue the frontmost one,
31:13 - 31:16

if it hasn't already been visited on some previous iteration, then I mark it as
31:16 - 31:17

visited,
31:17 - 31:22

and then I enqueue all of its children or neighbors at the back of the queue.
31:22 - 31:26

And so, as of right now, I'm processing Generation 0, that means all of Generation 1, so the
31:26 - 31:29

neighbors that are one hop away, get placed in the queue, and then
31:29 - 31:34

the next iteration coming out here will pull off a Generation 1,
31:34 - 31:37

and then enqueue all of these two hop neighbors.
31:37 - 31:40

Come back and other Generation 1's will get pulled off, enqueue all their two hop
31:40 - 31:44

neighbors, and so the two hop generation will, kind of, be built up behind, and then
31:44 - 31:48

once the last of the one hop generation gets managed,
31:48 - 31:51

start processing the Generation 2 and so on.
31:51 - 31:55

And so this will keep going while the queue has some
31:55 - 31:58

contents in there. So that means some neighbors
31:58 - 32:01

we have connections to that we haven't yet had a chance to explore, and
32:01 - 32:05

either we will find out that they have all been visited, so this will end when
32:05 - 32:09

all of the nodes that are in the queue have been visited or
32:09 - 32:14

there are no more neighbors to enqueue. So we've gotten to those dead ends or those cycles
32:14 - 32:25

that mean we've reach everything that was reachable from that start node.
32:25 - 32:28

And so I've traced this guy
32:28 - 32:30

to see it doing its work.
32:30 - 32:33

So if I start again with A,
32:33 - 32:36

and, again, assuming that I'm gonna process my nodes
32:36 - 32:40

in alphabetical order when I have children, so I will through and I
32:40 - 32:44

will enqueue, so the queue right now has just node A in it. I pull it off. I say
32:44 - 32:45

have I visited? No, I have not.
32:45 - 32:47

So then I mark it as visited,
32:47 - 32:52

and now for each of its neighboring nodes, B, C, D, H, I'm gonna put those into
32:52 - 32:54

the queue,
32:54 - 32:57

and so then they're loaded up, and then I come back around, I say do I still have stuff in the queue? I
32:57 - 33:01

do, so let's take out the first thing that was put in; it was B. Okay,
33:01 - 33:05

and I'll say okay, well, where can you get to from B? You can get to C, so I'm gonna put C in the queue. Now,
33:05 - 33:09

C is actually already in the queue, but we're gonna pull it out
33:09 - 33:12

earlier in terms it'll get handled by the
33:12 - 33:14

visiting status later.
33:14 - 33:16

So, okay, we'll put C, D, and E in
33:16 - 33:20

and then put C behind it. So right now, we'll have C, D, H, and C again.
33:20 - 33:24

It'll find the C that's in the front there. It'll say where can you get to from C?
33:24 - 33:24

Nowhere,
33:24 - 33:26

so no additional
33:26 - 33:28

Generation 2's are added by that one.
33:28 - 33:31

I'll look at D, and I'll say okay, well, what Generation 2's do we have from D? It says, well, I can
33:31 - 33:32

get to E. All right, so put E on the
33:32 - 33:34

back of the
33:34 - 33:36
33:36 - 33:39

queue. Look at H; where can H get to? H can get to G,
33:39 - 33:42

so go ahead and put G on the back of the queue.
33:42 - 33:45

So now, it comes back around to the things that are two hops away.
33:45 - 33:47

The first one it's gonna pull off is C. It's
33:47 - 33:49

gonna say, well, C is already visited,
33:49 - 33:53

so there's no need to redundantly visit that or do anything with that.
33:53 - 33:54

It passes over C. It
33:54 - 33:56

finds the E
33:56 - 33:58

that's behind that,
33:58 - 34:00

and then it finds the G that was behind the
34:00 - 34:03

H. The E also enqueued the F
34:03 - 34:07

that was four hops, and the very last one, right, output will be that F that was
34:07 - 34:09

only reachable by taking
34:09 - 34:12

three hops away from the thing. So this one's, kind of,
34:12 - 34:13

working radially. The
34:13 - 34:16

idea that I'm looking at all the things that I can get to, kind of, in one hop
34:16 - 34:19

are my first generation, then two hops, then three hops,
34:19 - 34:23

and so it's like throwing a stone in the water and watching it ripple out
34:23 - 34:26

that the visiting
34:26 - 34:29

is managed in terms of, kind of, growing length of path,
34:29 - 34:32

and number of hops that it took to get there.
34:32 - 34:37

So this is the strategy that you used for maze solving, if you remember.
34:37 - 34:37

That
34:37 - 34:42

what you were tracking was the queue of paths where the path was
34:42 - 34:44

AB, or ADC, or
34:44 - 34:46

ADE, and that they grew
34:46 - 34:47
34:47 - 34:50

step wise. That, kind of, each iteration through the
34:50 - 34:54

traversal there where you're saying, okay, well, I've seen all the paths that are one hop. Now I've seen all the ones of two,
34:54 - 34:56

now the ones three, and four, and five,
34:56 - 34:58

and when you keep doing that, eventually you get to the hop,
34:58 - 35:01

you know, 75, which leads you
35:01 - 35:05

all the way to the goal, and since you have looked at all the paths that were
35:05 - 35:09

74 and shorter, the first path that you dequeue that leads to your goal
35:09 - 35:12

must be the best or the shortest overall
35:12 - 35:13

because
35:13 - 35:15

of the order you processed them.
35:15 - 35:18

That's not true in depth-first search, right? That depth-first search might find a much
35:18 - 35:21

longer and more circuitous path that led to the goal
35:21 - 35:25

when there is a shorter one that would be eventually found if I had a graph that
35:25 - 35:28

looked,
35:28 - 35:36

you know, had this long path, let's say,
35:36 - 35:41

and so this is goal node, let's say, and this is the start node that
35:41 - 35:44

there could be a two hop path, kind of, off to this angle,
35:44 - 35:47

but if this was the first one explored in the depth-first, right,
35:47 - 35:50

it could eventually, kind of, work its way all the way around through this
35:50 - 35:52

eight hop path, and say, okay, well, I found it, right?
35:52 - 35:55

It would be the first one we found, but there's no guarantee that would be the
35:55 - 35:58

shortest in depth-first search, right? It would just happen to be the one that based on
35:58 - 36:01

its arbitrary choice of which
36:01 - 36:04

neighbors to pursue it happened to get to that there could be a shorter
36:04 - 36:08

one that would be found later in the traversal. That's not true in
36:08 - 36:12

the breadth-first search strategy because it will be looking at this, and then these two,
36:12 - 36:16

and then these two, and, kind of, working its way outward down those edges.
36:16 - 36:19

So as soon as we get to one that hits the goal, we
36:19 - 36:25

know we have found the shortest we can have. Question?
36:25 - 36:28

In that case,
36:28 - 36:33

you could return, okay, we need two
36:33 - 36:34

jumps to get to
36:34 - 36:35

the goal,
36:35 - 36:36

like, as far
36:36 - 36:40

winning the goal. Yeah. How do we remember which path led us there?
36:40 - 36:42

So if you were really doing that, you know, in terms of depth-first search, what
36:42 - 36:46

you'd probably be tracking is what's the path? Probably a stack that said here's the stack
36:46 - 36:49

that led to that path. Let me hold
36:49 - 36:53

onto this, right, and I will compare it to these alternatives. So I'll try it on Neighbor 1,
36:53 - 36:54

get the stack back from that; it's the best.
36:54 - 36:58

Try it on Neighbor 2, get that stack back, see which one is smaller and use
36:58 - 37:01

that as the better choice, and so, eventually, when I tried it out this side, I'd get
37:01 - 37:03

the stack back that was just two,
37:03 - 37:06

and I could say, yeah, that was better than this ten step path I had over here, so I
37:06 - 37:09

will prefer that. So just using your standard, kind of, backtracking with
37:09 - 37:13

using your return values, incorporating them, and deciding which was better.
37:13 - 37:14

And so in this case,
37:14 - 37:16

the depth-first search
37:16 - 37:19

can't really be pruned very easily
37:19 - 37:24

because it has to, kind of, explore the whole thing to know that this
37:24 - 37:26

short little path right over here happened to be just
37:26 - 37:29

arbitrarily examined last, whereas, the breadth-first search actually does, kind of, because
37:29 - 37:31

it's prioritizing by
37:31 - 37:35

order of length. If the goal was shortest path, it will find it sooner and be able to
37:35 - 37:35

stop
37:35 - 37:39

when it found it rather than look at these longer paths. So it won't end up -
37:39 - 37:42

if there were hundreds of other paths over here, breadth-first search won't actually
37:42 - 37:47

have to ever reach them or explore them.
37:47 - 37:52

So there are a lot of things that actually end up being graph search in disguise. That's why
37:52 - 37:54

a, kind of, a graph is a great data structure to, kind of, get to in
37:54 - 37:57

106b because there are lots of problems
37:57 - 38:01

that, in the end, if you can model them using the same
38:01 - 38:03

setup of nodes, and arcs, and traversals,
38:03 - 38:06

that you can answer interesting questions about that data by applying
38:06 - 38:09

the same graph search techniques.
38:09 - 38:12

So as long as you want to know things like, well, which nodes are reachable from this node at
38:12 - 38:12

all?
38:12 - 38:14

And then
38:14 - 38:17

that just means, okay, I'm here and I want to get to these places; what places could
38:17 - 38:18

I get to? Or
38:18 - 38:21

knowing what courses I could take
38:21 - 38:25

that would lead me to a CS degree, it's, kind of, like well, looking for the ones
38:25 - 38:29

that lead to having all your graduation requirements
38:29 - 38:32

satisfied, and each of those arcs could be a course you could take, right, how close did
38:32 - 38:35

that get you to the goal? How can I get to the
38:35 - 38:40

place I'd like to be, or what are the places that this course satisfies
38:40 - 38:41

requirements that lead to?
38:41 - 38:44

Knowing things about
38:44 - 38:47

connectivity is, kind of, useful in things like the, kind of, bacon numbers or
38:47 - 38:50

erdish numbers where people want to know, well, how close am I to greatness by -
38:50 - 38:54

have I been in a movie with somebody who was in a movie, who was in a movie with Kevin Bacon? And,
38:54 - 38:55

for me, it turns out
38:55 - 38:57

zero; I have been in no movies. But have
38:57 - 38:59

you written a paper with someone who then
38:59 - 39:02

connects through a chain to some other famous person, right? Tells you about
39:02 - 39:06

your, kind of, social network distance or something, or I'm linked in. You're trying to get a
39:06 - 39:09

new job, and you want to find somebody to introduce you to
39:09 - 39:13

the head of this cool company. It's, like, figuring who you know who knows them, right?
39:13 - 39:18

It's finding paths through a social network graph.
39:18 - 39:19

Finding out things like
39:19 - 39:22

are there paths with cycles? What's the shortest path
39:22 - 39:25

through the cycle, longest path through the cycle often tells you about things about
39:25 - 39:27

redundancies in that structure
39:27 - 39:31

that, for example, when you're building a network infrastructure, sometimes you do have
39:31 - 39:34

cycles in it because you actually want to have multiple ways to get to things in case
39:34 - 39:37

there's a breakdown. Like if there's a freeway accident, you want to have another way
39:37 - 39:38

to get around it,
39:38 - 39:41

but you are interested in, maybe, trying to keep your cost down is trying to find
39:41 - 39:47

out where are the right places to put in those redundancies, those cycles that
39:47 - 39:50

give you the best benefit with the least cost.
39:50 - 39:54

Trying to find things like a continuous path that visits all the nodes once and exactly
39:54 - 39:57

once, especially doing so efficiently, picking a short
39:57 - 40:00

overall path for that. We talked a little bit about that in terms of what's called the
40:00 - 40:02

traveling salesman problem.
40:02 - 40:04

You are trying to get from - you have ten
40:04 - 40:06

cities to visit on your book tour. You don't want to spend all your lifetime
40:06 - 40:08

on a plane. What's the
40:08 - 40:13

path of crisscrossing the country that gets you to all ten of those with the least
40:13 - 40:15

time painfully
40:15 - 40:17

spent on an airline? And so
40:17 - 40:20

those are just graph search problems. You have these nodes. You have these arcs.
40:20 - 40:21

How can you
40:21 - 40:23

traverse them and visit
40:23 - 40:25

and come up with
40:25 - 40:27

a good solution. There's other
40:27 - 40:30

problems I think that are kind of neat. Word letters, we used to give out an assignment on word letters. We
40:30 - 40:31

didn't this time, but I thought it was an
40:31 - 40:34

interesting problem to think about. The idea of
40:34 - 40:36

how it is you can transform
40:36 - 40:41

one letter at a time from one word to another is very much a
40:41 - 40:45

graph problem behind the scenes, and that largely a breadth-first
40:45 - 40:47

traversal is exactly what you're looking for there. How
40:47 - 40:51

can I transform this with the shortest number of steps is, kind of,
40:51 - 40:54

working radially out from the starting word.
40:54 - 40:54
40:54 - 40:58

Your boggle board turns out, really, is just a graph. If
40:58 - 41:01

you think of
41:01 - 41:07

having this sequence of letters
41:07 - 41:12

that you're trying to work through,
41:12 - 41:16

that really each of these letters can actually just be thought of as a node,
41:16 - 41:20

and then the connections it has are to its eight neighbors, and that finding
41:20 - 41:24

words in there, especially finding paths, starting from
41:24 - 41:27

this node that lead away, that don't revisit, so, in fact, doing breadth-first
41:27 - 41:31

or depth-first search. You most likely did depth-first search when you were writing the
41:31 - 41:32

implementation of boggle,
41:32 - 41:35

but that's, kind of, a deep search on I found an A. I found a P. I found a P, and, kind
41:35 - 41:40

of, keeps going as long as that path looks viable. So, in this case, that the arc
41:40 - 41:41

information we're using is
41:41 - 41:44

having a sub of those letters, do they form a prefix that could be leading
41:44 - 41:45

somewhere good,
41:45 - 41:47

and then the, kind of, recursion bottoming out when
41:47 - 41:52

we have run ourselves into a corner where we have no unvisited nodes
41:52 - 41:53

that neighbor us,
41:53 - 41:57

or only nodes that would extend to a prefix that no longer forms any
41:57 - 41:58

viable words
41:58 - 42:01

as a search problem.
42:01 - 42:04

So it feels like everything you've done this quarter has actually been a graph in disguise; I just didn't
42:04 - 42:08

tell you. The maze problem is very much a graph search problem, building a
42:08 - 42:10

graph and searching it.
42:10 - 42:13

Another one that I think is, kind of, neat to think about is, kind of, related to the idea of word letters
42:13 - 42:18

is that often when you mistype a word, a word
42:18 - 42:19

processor will suggest to you
42:19 - 42:22

here's a word that you might have meant instead. You type a word that it doesn't
42:22 - 42:24

have in its lexicon, and it says, well,
42:24 - 42:25

that could just be
42:25 - 42:28

a word I don't know, but it also could mean that you actually mistyped something, and so
42:28 - 42:31

what you want to look at is having neighboring nodes,
42:31 - 42:33

sort of, diagramming a graph
42:33 - 42:35

of all the words you know of
42:35 - 42:39

and then the, kind of, permutations, the tiny little twiddles to that word that
42:39 - 42:41

are neighbors to it so that
42:41 - 42:44

you could describe a set of suggestions
42:44 - 42:47

as being those things that are within a certain
42:47 - 42:50

hop distance from the original mistyped word. And
42:50 - 42:53

so you could say, yeah, things that have two letters different are probably
42:53 - 42:56

close enough, but things that have three or more letters different aren't actually
42:56 - 42:58

likely to be
42:58 - 43:00

the mistyped word,
43:00 - 43:04

and then that, kind of, leads to supporting things like wildcard searches
43:04 - 43:08

where you're trying to remember whether it's E-R-A-T or A-R-T-E at the
43:08 - 43:10

end of desperate.
43:10 - 43:14

Those things could be modeled as, kind of, like they search through a graph space where
43:14 - 43:16

you're modeling those letters, and then there's, like, a
43:16 - 43:20

branching where you try all of the letters coming out of D-E-S-P, seeing if any
43:20 - 43:23

of them lead to an R-A-T-E.
43:23 - 43:26

So the last thing I wanted
43:26 - 43:28

to just bring up, and this is the one that's going into this week's assignment. I'm just
43:28 - 43:31

gonna actually show it to you to, kind of, get you thinking about it because it's
43:31 - 43:35

actually the problem at hand that you're gonna be solving
43:35 - 43:36

is that
43:36 - 43:41

the one of where I'm interested in finding the shortest path, but I'm gonna add in
43:41 - 43:43

this new idea that
43:43 - 43:45

all hops are not created equal.
43:45 - 43:49

That a flight from San Francisco to Seattle is a shorter distance than the
43:49 - 43:52

one that goes from San Francisco to Boston,
43:52 - 43:53

and if I'm interested in getting
43:53 - 43:57

to someplace, it's not just that I would like to take a direct flight. I'd also
43:57 - 44:00

like to take the set of flights that involves the least, kind of, going out of my way,
44:00 - 44:03

and going through other cities, and other directions.
44:03 - 44:07

And so in this case, a breadth-first search is a little bit too simplistic.
44:07 - 44:10

That if I just look at all the places I can get to in one hop,
44:10 - 44:11
44:11 - 44:15

those are not really equally
44:15 - 44:16

weighted
44:16 - 44:19

in terms of my goal of minimizing my total path distance,
44:19 - 44:22

but if I, instead, prioritize -
44:22 - 44:25

prioritize in your mind, immediately conjure up images of the
44:25 - 44:26

priority queue,
44:26 - 44:28

to look at paths that are,
44:28 - 44:32

if I'm trying to minimize total distance, the shortest path, even if they
44:32 - 44:36

represent several hops, if they're still shorter - if I have three flights, and each are
44:36 - 44:37

100 miles,
44:37 - 44:40

then they represent a flight of 300 total miles,
44:40 - 44:44

and that's still preferred to one that actually is 2,000 miles or 7,000 miles
44:44 - 44:45

going to Asia
44:45 - 44:49

alternatively. And so if I worked on a breadth-first that's been modified by a total
44:49 - 44:50

distance,
44:50 - 44:51

that it would be a way
44:51 - 44:54

to work out the shortest path
44:54 - 44:55

in terms of total weight
44:55 - 44:58

from San Francisco to my destination
44:58 - 45:01

by a
45:01 - 45:03

modified breadth-first, kind of, weighted strategy,
45:03 - 45:06

and that is exactly what you're gonna be doing in this assignment. I'm not gonna
45:06 - 45:08

give away too much by, kind of,
45:08 - 45:11

delving too much in it, but the handout does a pretty good job of talking you through what
45:11 - 45:15

you're doing, but it's a, kind of, neat way to think about that's how the GPS, the MapQuest, and things like
45:15 - 45:16

that are working is
45:16 - 45:17

you're at a site;
45:17 - 45:21

you have a goal, and you have this information that you're trying to
45:21 - 45:24

guide that search, and you're gonna use heuristics about
45:24 - 45:28

appears to be the, kind of, optimal path to, kind of, pick that, and go with it,
45:28 - 45:31

and see where it leads you. So
45:31 - 45:32

that will be your
45:32 - 45:40

task for this week. What we'll talk about on Friday is caching.

Title:: Lecture 23 | Programming Abstractions (Stanford)
Description:: Lecture 23 by Julie Zelenski for the Programming Abstractions Course (CS106B) in the Stanford Computer Science Department.

Julie shows a YouTube video of Barack Obama answering a question about what kind of sorting algorithm he would use to sort a list of data. She also gives several examples of problems that are capable of being solved with sorting. She goes on to say that if you think about certain problems as graphs, finding the solution is sometimes easier as well as easier to understand. To be able to do this, she explains how to represent graphs in C++.

Complete Playlist for the Course:
http://www.youtube.com/view_play_list?p=FE6E58F856038C69

CS 106B Course Website:
http://cs106b.stanford.edu

Stanford Center for Professional Development:
http://scpd.stanford.edu/

Stanford University:
http://www.stanford.edu/

Stanford University Channel on YouTube:
http://www.youtube.com/stanforduniversity/

more » « less
Video Language:: English
Duration:: 45:51

	N. Ueda edited English subtitles for Lecture 23 \| Programming Abstractions (Stanford)
	Eunjeong_Kim added a translation
	Eunjeong_Kim added a translation

English subtitles

Revisions

Revision 3

N. Ueda

Lecture 23 | Programming Abstractions (Stanford)

Revisions

Our website uses cookies

Operating cookies (Required)