Lecture 23 | Programming Abstractions (Stanford)
-
0:00 - 0:11
-
0:11 - 0:14This presentation is delivered by the Stanford Center for Professional
-
0:14 - 0:25Development.
-
0:25 - 0:27Okay, we got volume and everything. All
-
0:27 - 0:28right, here we go. Lots of you think you're
-
0:28 - 0:32learning things that aren't gonna actually someday help you in your future
-
0:32 - 0:34career. Let's say you plan to be president. You
-
0:34 - 0:38think, "Should I take 106b; should I not take 106b?" Watch this interview and
-
0:38 - 0:39you will see. [Video]
-
0:39 - 0:41We have questions, and we ask
-
0:41 - 0:45our candidates questions, and this one is from Larry Schwimmer. What
-
0:45 - 0:48- you
-
0:48 - 0:52guys think I'm kidding; it's right here.
-
0:52 - 0:59What is the most efficient way to sort a million 32-bit integers?
-
0:59 - 1:01Well - I'm
-
1:01 - 1:07sorry. Maybe we should - that's not a - I think the bubble sort
-
1:07 - 1:09would be the wrong way to go. Come on; who told him this?
-
1:09 - 1:14There you go. So, apparently -
-
1:14 - 1:15the question was
-
1:15 - 1:16actually
-
1:16 - 1:19by a former student of mine, in
-
1:19 - 1:22fact, Larry Schwimmer. So apparently, he's been prepped well. As part of going on the campaign trail, not only do you
-
1:22 - 1:25have to know all your policy statements, you also have to have your
-
1:25 - 1:29N-squared and N log N sorts kept straight in your mind, so - just so
-
1:29 - 1:31you know. So let me give you
-
1:31 - 1:35some examples of things that are graphs, just to get you thinking about
-
1:35 - 1:36
-
1:36 - 1:39the data structure that we're gonna be working with and playing around with
-
1:39 - 1:40today. Sometimes you
-
1:40 - 1:44think of graphs as being those, like, bar graphs, right, your graphing
-
1:44 - 1:46histograms. This is actually a different kind of graph. The computer scientist's form of
-
1:46 - 1:50graph is just a generalized, recursive data structure
-
1:50 - 1:51that, kind of,
-
1:51 - 1:54starts with something that looks a little bit like a tree and extends it to
-
1:54 - 1:55this notion that there are
-
1:55 - 1:57any number of nodes, in
-
1:57 - 2:00this case, represented as the cities in this airline route map,
-
2:00 - 2:04and then there are connections between them, sometimes called edges or arcs that wire up
-
2:04 - 2:08the connections that in the case, for example, of this graph are showing you
-
2:08 - 2:10which cities have connecting flights. You have a direct
-
2:10 - 2:15flight that takes you from Las Vegas to Phoenix, right, one from Phoenix to Oklahoma
-
2:15 - 2:15City.
-
2:15 - 2:18There is not a direct flight from Phoenix to Kansas City, but there's a path
-
2:18 - 2:21by which you could get there by taking a sequence
-
2:21 - 2:25of connecting flights, right, going from Phoenix, to Oklahoma City, to Kansas City is
-
2:25 - 2:26one way to get there.
-
2:26 - 2:30And so the entire, kind of, map of airline routes forms a graph of the cities,
-
2:30 - 2:34and there are a lot of interesting questions that having represented that data you might want to
-
2:34 - 2:35answer about finding
-
2:35 - 2:36flights that are
-
2:36 - 2:40cheap, or meet your time specification, or get you where you want to go,
-
2:40 - 2:47and that this data would be, kind of, primary in solving that problem.
-
2:47 - 2:48Another one,
-
2:48 - 2:49right, another
-
2:49 - 2:52idea of something that's represented as a graph is something that would support
-
2:52 - 2:54the notion of word ladders, those little puzzles where you're
-
2:54 - 2:56given the word "chord,"
-
2:56 - 3:00and you want to change it into the word "worm," and you have these rules that say, well, you
-
3:00 - 3:02can change one letter at a time.
-
3:02 - 3:04And so the
-
3:04 - 3:07connections in the case of the word ladder have to do with words that are
-
3:07 - 3:12one character different from their neighbors, have direct connections or arcs
-
3:12 - 3:16between them. Each of the nodes themselves represents a word, and then
-
3:16 - 3:19paths between them represent word ladders. That if you can go from here
-
3:19 - 3:22through a sequence of steps directly connecting your way, each of those stepping
-
3:22 - 3:26stones is a word in the chain that makes a word ladder. So
-
3:26 - 3:29things you might want to solve there is what's the shortest word ladder? How many different
-
3:29 - 3:31word ladders - how many different ways can I go from this word to that one
-
3:31 - 3:36could be answered by searching around in a space like this.
-
3:36 - 3:39Any kind of prerequisite structure,
-
3:39 - 3:42such as the one for the major that might say you need to take this class before you take
-
3:42 - 3:45that class, and this class before that class, right,
-
3:45 - 3:46forms a
-
3:46 - 3:49structure that also fits into the main graph.
-
3:49 - 3:52In this case, there's a directionality to those arcs.
-
3:52 - 3:54That the prerequisite is such that
-
3:54 - 3:58you take 106a, and it leads into 106b, and that connection doesn't really go in
-
3:58 - 4:01the reverse, right? You don't start in 106b and move backwards.
-
4:01 - 4:05Whereas in the case, for example, of the word ladder, all of these arcs are what we called undirected,
-
4:05 - 4:09right? You can traverse from wood to word and back again, right?
-
4:09 - 4:10They're equally
-
4:10 - 4:12
-
4:12 - 4:15visitable or neighbors, in this case. So there may be a situation where you have directed
-
4:15 - 4:20arcs where they really imply a one-way directionality through the paths.
-
4:20 - 4:22Another one, lots and lots of
-
4:22 - 4:23arrows that
-
4:23 - 4:25tell you about the lifecycle of
-
4:25 - 4:26
-
4:26 - 4:30Stanford relationships all captured on one
-
4:30 - 4:31slide
-
4:31 - 4:34in, sort of, the flowchart. So flowcharts have a lot of things about directionality. Like, you're
-
4:34 - 4:37in this state, and you can move to this state, so that's your neighboring state, and
-
4:37 - 4:39the arcs in there connect those things.
-
4:39 - 4:43And so, as you will see, like, you get to start over here and, like, be single and love
-
4:43 - 4:47it, and then you can, kind of, flirt with various degrees of seriousness,
-
4:47 - 4:50and then eventually there evolves a trombone and serenade, which, you know,
-
4:50 - 4:53you can't go back to flirting aimlessly after the trombone and serenade; there's no
-
4:53 - 4:55arc that direction.
-
4:55 - 4:57When somebody gets the trombone out,
-
4:57 - 4:59you know, you take them seriously,
-
4:59 - 5:02or you completely blow them off. There's the [inaudible] there.
-
5:02 - 5:04And then there's dating, and then there's
-
5:04 - 5:08two-phase commit, which is a very important part of any computer scientist's education,
-
5:08 - 5:10and then, potentially, other
-
5:10 - 5:13tracks of children and what not. But it does
-
5:13 - 5:15show you, kind of, places you can go, right?
-
5:15 - 5:18It turns out you actually don't get to go from flirt aimlessly to have baby, so
-
5:18 - 5:20write that down -
-
5:20 - 5:23not allowed in this graph. And
-
5:23 - 5:28then it turns out some things that you've already seen are actually graphs in disguise.
-
5:28 - 5:32I didn't tell you when I gave you the maze problem and its solution, the solving and
-
5:32 - 5:33building of a maze,
-
5:33 - 5:37that, in fact, what you're working on though is something that is a graph.
-
5:37 - 5:39That a perfect maze, right,
-
5:39 - 5:40so we started with this
-
5:40 - 5:44fully disconnected maze. If you remember how the Aldous-Broder algorithm worked, it's like
-
5:44 - 5:47you have all these nodes that have no connections between them,
-
5:47 - 5:51and then you went bopping around breaking down walls, which is effectively
-
5:51 - 5:53connecting up the nodes to each other,
-
5:53 - 5:57and we kept doing that until actually all of the nodes were connected. So, in effect, we
-
5:57 - 6:00were building a graph out of this big grid of cells
-
6:00 - 6:03by knocking down those walls to build the connections, and when we were done, then
-
6:03 - 6:07we made it so that it was possible to trace paths starting from that lower corner
-
6:07 - 6:10and working our way through the neighbors all the way over to there.
-
6:10 - 6:13So in the case of this, if you imagine that each of these cells was broken out in this little
-
6:13 - 6:16thing, there'd be these neighboring connections where this guy has this
-
6:16 - 6:19neighbor but no other ones because actually it has no
-
6:19 - 6:22other directions you can move from there, but some of them, for example like this
-
6:22 - 6:23one,
-
6:23 - 6:26has a full set of four neighbors you can reach
-
6:26 - 6:28depending on which walls are intact or not.
-
6:28 - 6:29And so the
-
6:29 - 6:31creation of that was creating a graph out
-
6:31 - 6:34of a set of disconnected nodes, and then solving it is searching that
-
6:34 - 6:38graph for a path. Here's one that is a little horrifying that came from the newspaper, which is social networking, kind of, a very big [inaudible] of, like, on the net who do you know, and who do they know, and how do those connections, like, LinkedIn, maybe help you get a job. This is one that was actually based on the liaisons, let's say, of a group of high school students, right, and it turns out there was a lot more interconnection in that graph than I would expect, or appreciate, or approve of, but interesting to, kind of, look at. So
-
6:38 - 6:39let's talk, like, concretely.
-
6:39 - 6:43How are we gonna make stuff work on graphs? Graphs turn out to actually - by showing you that, I'm trying to
-
6:43 - 6:46get you this idea that a lot of things are really just graphs in disguise. That a lot
-
6:46 - 6:50of the problems you may want to solve, if you can represent it as a graph, then things
-
6:50 - 6:53you learn about how to manipulate graphs will help you to solve that kind of
-
6:53 - 6:54problem.
-
6:54 - 6:59So the basic idea of a graph, it is a recursive data structure based on the node where
-
6:59 - 7:02nodes have connections to other nodes.
-
7:02 - 7:04Okay. So that's, kind of, like a tree,
-
7:04 - 7:07but it actually is more freeform than a tree, right?
-
7:07 - 7:11In a tree, right, there's that single root node that has, kind of, a special place, and
-
7:11 - 7:15there's this restriction about the connections that there's a single path
-
7:15 - 7:17from every other node in the tree
-
7:17 - 7:20starting from the root that leads to it. So you never can get there by
-
7:20 - 7:24multiple ways, and there's no cycles, and it's all connected, and things like
-
7:24 - 7:25that.
-
7:25 - 7:26So, in terms of a graph,
-
7:26 - 7:29it just doesn't place those restrictions. It says there's a node.
-
7:29 - 7:32It has any number of pointers to other nodes. It potentially could even
-
7:32 - 7:36have zero pointers to other nodes, so it could be on a little island of its own out
-
7:36 - 7:38here that has no incoming or outgoing flights, right? You
-
7:38 - 7:40have to swim if you want to get there.
-
7:40 - 7:42
-
7:42 - 7:45And then there's also not a rule that says that you can't have
-
7:45 - 7:47multiple ways to get to the same
-
7:47 - 7:50node. So if you're at B, and you're interested in getting to C,
-
7:50 - 7:52you could take the direct connection here,
-
7:52 - 7:54but you could also take a longer path
-
7:54 - 7:58that bops through A and hits to C, and so there's more than one way to get there,
-
7:58 - 7:59
-
7:59 - 8:02which would not be true in a tree structure where it has that constraint,
-
8:02 - 8:05but that adds some interesting challenges to solving problems in the graphs
-
8:05 - 8:08because there actually are so many different ways you can
-
8:08 - 8:11get to the same place. We have to be careful avoiding
-
8:11 - 8:13getting into circularities where we're
-
8:13 - 8:16doing that, and also not redundantly doing work we've already done before because
-
8:16 - 8:18we've visited it previously.
-
8:18 - 8:21There can be cycles where
-
8:21 - 8:22you can
-
8:22 - 8:24completely come around. I think this one, actually,
-
8:24 - 8:26doesn't have a cycle in it.
-
8:26 - 8:27Nope, it doesn't,
-
8:27 - 8:31but if I add it - I could add a link, for example, from
-
8:31 - 8:34C back to B, and then there would be this place you could just, kind of, keep
-
8:34 - 8:36rolling around in the B, A, C
-
8:36 - 8:38area.
-
8:38 - 8:41So we call each of those guys, the circles here, the nodes, right?
-
8:41 - 8:44The connections between them are the arcs. We
-
8:44 - 8:47will talk about the arcs as being directed and undirected in different situations
-
8:47 - 8:51where we imply that there's a directionality; you can travel the arc only one-way,
-
8:51 - 8:54or in cases where it's undirected, it means it goes both ways.
-
8:54 - 8:56We'll talk about two nodes being connected,
-
8:56 - 8:58meaning there is a direct connection, an arc, between them,
-
8:58 - 9:02and if they are not directly connected, there may possibly be a path
-
9:02 - 9:05between them. A sequence of arcs forms a path that you can follow to get
-
9:05 - 9:07from one node to another,
-
9:07 - 9:11and then cycles just meaning a path that revisits one of the nodes that
-
9:11 - 9:14was previously traveled on that path. So I've got
-
9:14 - 9:16a little terminology.
-
9:16 - 9:18Let's talk a little bit about how we're gonna represent it.
-
9:18 - 9:22So before we start making stuff happen on them, it's, like, well, how does this -
-
9:22 - 9:26in C++, right, what's the best way, or what are the reasonable ways you
-
9:26 - 9:27might represent
-
9:27 - 9:30this kind of connected structure?
-
9:30 - 9:31So if I have
-
9:31 - 9:36this four-node graph, A, B, C, D over here, with the connections that are shown with
-
9:36 - 9:38direction, in this case, I have directed arcs.
-
9:38 - 9:41That one way to think of it is that what a
-
9:41 - 9:45graph really is is a set of nodes and a set of arcs, and by set, I mean, sort of, just the
-
9:45 - 9:48generalization. There might be a vector of arcs; there might be a vector of
-
9:48 - 9:51nodes, whatever, but some collection of arcs, some collection of nodes,
-
9:51 - 9:53and that the
-
9:53 - 9:56nodes might have information about what location they represent, or
-
9:56 - 9:59what word, or what entity, you know, in a social network that they are,
-
9:59 - 10:03and then the arcs would show who is directly connected to whom.
-
10:03 - 10:06And so one way to manage that, right, is just to have two independent collections - a collection
-
10:06 - 10:08of nodes and a
-
10:08 - 10:11collection of arcs, and that when I need to know things about connections, I'll have to, kind of, use both
-
10:11 - 10:12in tandem.
-
10:12 - 10:14So if I want to know if A is connected to B,
-
10:14 - 10:18I might actually have to just walk through the entire set of arcs trying to find
-
10:18 - 10:22one that connects A and B as its endpoints. If I want to see if
-
10:22 - 10:25C is connected to B, same thing, just walk down the whole set.
-
10:25 - 10:27
-
10:27 - 10:29
-
10:29 - 10:31That's probably the simplest thing to do, right, is to just, kind of,
-
10:31 - 10:35have them just be maintained as the two things that are used in conjunction.
-
10:35 - 10:37More likely, what you're gonna want to do
-
10:37 - 10:39is provide some access that
-
10:39 - 10:42when at a current node, you're currently trying to do some processing
-
10:42 - 10:47from the Node B, it would be convenient if you could easily access those nodes
-
10:47 - 10:51that emanate or outgo from the B node. That's typically the thing you're trying
-
10:51 - 10:52to do is
-
10:52 - 10:55at B try to visit its neighbors,
-
10:55 - 10:58and rather than having to, kind of, pluck them out of the entire collection,
-
10:58 - 11:01it might be handy if they were already, kind of, stored in such a way to make it
-
11:01 - 11:03easy for you to get to that.
-
11:03 - 11:06So the two ways that try to represent adjacency
-
11:06 - 11:10in a more efficient way are the adjacency list and the adjacency matrix.
-
11:10 - 11:14So in an adjacency list representation, we have a way of associating with each node,
-
11:14 - 11:17probably just by storing it in the node structure itself,
-
11:17 - 11:21those arcs that outgo from there. So, in this case, A has a direct
-
11:21 - 11:22connection to the B
-
11:22 - 11:23and C Nodes,
-
11:23 - 11:26and so I would have that information stored with the A Node, which is my
-
11:26 - 11:29vector or set of outgoing connections.
-
11:29 - 11:33Similarly, B has one outgoing connection, which goes to D. C has an outgoing connection to D, and
-
11:33 - 11:37then D has outgoing connections that lead back to A and
-
11:37 - 11:40to B. So at a particular node I could say, well, who's my neighbors?
-
11:40 - 11:41It would be
-
11:41 - 11:46easily accessible and not have to be retrieved, or searched, or filtered.
-
11:46 - 11:48Another way of doing that
-
11:48 - 11:51is to realize that, in a sense, what we have is this, kind of,
-
11:51 - 11:53N-by-N grid
-
11:53 - 11:56where each node is potentially connected to
-
11:56 - 12:00the other N -1 nodes in that graph. And so if I just build
-
12:00 - 12:03a 2-D matrix that's labeled with
-
12:03 - 12:06all the node names on this side and all the node names across the top, that the
-
12:06 - 12:08intersection of this says
-
12:08 - 12:11is there an arc that leads away from A and
-
12:11 - 12:14ends at B? Yes, there is. Is there one from A to C?
-
12:14 - 12:18And in the places where there is not an existing connection, a self loop or
-
12:18 - 12:19
-
12:19 - 12:23just a connection that doesn't exist, right, I have an empty slot. So maybe these would be Booleans
-
12:23 - 12:25true and false or something, or maybe there'd be some more information here
-
12:25 - 12:27about the distance
-
12:27 - 12:30or other associated arc properties for that
-
12:30 - 12:31particular connection.
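To make that concrete, here is a minimal sketch in standard C++ of an adjacency matrix for the little four-node A, B, C, D graph being described; the enum of node names and the plain 2-D bool array are illustrative choices, not the lecture's actual code:

```cpp
#include <iostream>

// Hypothetical four-node graph A, B, C, D with directed arcs
// A->B, A->C, B->D, C->D, D->A, D->B (as described in the lecture).
enum Node { A, B, C, D, NUM_NODES };

int main() {
    // adj[from][to] is true if there is a directed arc from -> to.
    bool adj[NUM_NODES][NUM_NODES] = {false};
    adj[A][B] = adj[A][C] = true;
    adj[B][D] = true;
    adj[C][D] = true;
    adj[D][A] = adj[D][B] = true;

    // Constant-time lookup: is there an arc from A to B? From B to A?
    std::cout << std::boolalpha << adj[A][B] << std::endl;  // true
    std::cout << std::boolalpha << adj[B][A] << std::endl;  // false
    return 0;
}
```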
-
12:31 - 12:34In this case, it's actually very easy then to actually do the really quick
-
12:34 - 12:35search of
-
12:35 - 12:38I'm at A and I want to know if I can get to B, then it's really just a
-
12:38 - 12:41matter of reaching right into the
-
12:41 - 12:45slot in constant time in that matrix to pull it out, and
-
12:45 - 12:47so
-
12:47 - 12:49this one involves a lot of searching to find things. This one involves, perhaps, a
-
12:49 - 12:52little bit of searching. If I want to know if A is connected to B, I might have to look
-
12:52 - 12:55through all its outgoing connections, right? This one gives me that direct
-
12:55 - 12:56access,
-
12:56 - 13:00but the tradeoff here has to do with where the space starts building up, right?
-
13:00 - 13:03A full NxN matrix could be very big,
-
13:03 - 13:06and sometimes we're allocating space as though there are a full set of
-
13:06 - 13:09connections between all the nodes, every node connected to every other,
-
13:09 - 13:12and in the cases where a lot of those things are false, there's a lot of
-
13:12 - 13:13capacity
-
13:13 - 13:16that we have set aside that we're not really tapping into.
-
13:16 - 13:19And so in the case of a graph that we call dense,
-
13:19 - 13:23where there's a lot of connections, it will use a lot of that capacity, and it might make sense.
-
13:23 - 13:27In a graph that's more sparse, where there are only a few connections - so you have
-
13:27 - 13:31thousands of nodes, but maybe only three or four on average are connected to each one -
-
13:31 - 13:35Then having allocated 1,000 slots of which to only fill in three
-
13:35 - 13:36might become
-
13:36 - 13:40rather space inefficient. You might prefer an adjacency list representation that
-
13:40 - 13:41can get you
-
13:41 - 13:44to the ones you're actually using versus the other,
-
13:44 - 13:47but in terms of access, the adjacency matrix is very fast,
-
13:47 - 13:48right? A lot of space thrown at this
-
13:51 - 13:55gives you this immediate O(1) access to know
-
13:51 - 13:55who your neighbors are. So
-
13:55 - 13:59we're gonna use the adjacency list, kind of, strategy here
-
13:59 - 14:02for the code we're gonna do today, and it's, kind
-
14:02 - 14:07of, a good compromise between the two alternatives that we've seen there.
-
14:07 - 14:08So I have a struct node.
-
14:08 - 14:11It has some information for the node. Given that I'm not being very specific
-
14:11 - 14:14about what the graph is, I'm just gonna, kind of, leave that unspecified. It might be that it's
-
14:14 - 14:17the city name. It might be that it's a word. It might be that it's a person in a
-
14:17 - 14:19social network, you know,
-
14:19 - 14:23some information that represents what this node is as an entity,
-
14:23 - 14:28and then there's gonna be a set of arcs or a set of nodes it's connected to,
-
14:28 - 14:32and so I'm using vector here. I could be using set. I could be using a
-
14:32 - 14:35raw array, or a link list, or any of these things.
-
14:35 - 14:38I need to use some unbounded collection though because there's no guarantee that
-
14:38 - 14:42they will be 0, or 1, or 2 the way there is in a linked list or a tree where there's a
-
14:42 - 14:45specified number of outgoing links. There can be any number of them, so I'm just leaving
-
14:45 - 14:46
-
14:46 - 14:47a
-
14:47 - 14:51variable size collection here to do that work for me.
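As a rough sketch of that structure in standard C++ (std::vector and std::string stand in for the course's library classes, and the field names are just placeholders):

```cpp
#include <string>
#include <vector>

// One node in the graph: some data describing what the node is,
// plus pointers to the nodes it connects to (the adjacency-list idea).
struct nodeT {
    std::string name;                 // city, word, person - whatever the node represents
    std::vector<nodeT *> connected;   // nodes reachable by one outgoing connection
};
```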
-
14:51 - 14:52The graph itself,
-
14:52 - 14:55so I have this idea of all the nodes in the graph, right, would each get a new
-
14:55 - 14:58node structure and then be wired up to the other nodes that they are connected
-
14:58 - 15:00to,
-
15:00 - 15:02but then when I'm trying to operate on that graph,
-
15:02 - 15:06I can't just take one pointer and say here's a pointer to some node in the
-
15:06 - 15:09graph and say this is - and from here you have accessed everything. In the way
-
15:09 - 15:11that a tree,
-
15:11 - 15:13the only thing we need to keep track of is the pointer to the root,
-
15:13 - 15:17or a linked list, the only thing we keep a pointer to, typically, is that frontmost
-
15:17 - 15:19cell, and from there, we can reach everything else.
-
15:19 - 15:23There is no special head or root cell in a graph.
-
15:23 - 15:27A graph ends up
-
15:27 - 15:30being, kind of, a crazy collection without a lot of rules about
-
15:30 - 15:35how they are connected. In fact, it's not even guaranteed that they totally are
-
15:35 - 15:36connected.
-
15:36 - 15:38That I can't
-
15:38 - 15:41guarantee that if I had a pointer, for example, to C,
-
15:41 - 15:44right, I may or may not be able to reach all the nodes from there.
-
15:44 - 15:49If I had a pointer to A, right, A can get to C and B, and then also
-
15:49 - 15:52down to D, but, for example, there could just be an E node over here that was only
-
15:52 - 15:55connected to C or not connected to anything at all for that matter, connected
-
15:55 - 15:57to F in their own little island.
-
15:57 - 16:01That there is no way to identify some special node from which you could reach
-
16:01 - 16:02all the nodes.
-
16:02 - 16:04There isn't a special root node,
-
16:04 - 16:07and you can't even just arbitrarily pick one to be the root because actually there's no
-
16:07 - 16:10guarantee that it will be connected to all of the others. So it would not give you
-
16:10 - 16:13access to the full entirety of your graph if you just picked one and said I want anything
-
16:13 - 16:16reachable from here, it might not get you the whole thing. So,
-
16:16 - 16:18typically, what you need to really operate on is
-
16:18 - 16:21the entire collection of node pointers. You'll have a set or a vector
-
16:21 - 16:24that contains all of the nodes that are in the graph,
-
16:24 - 16:26and then within that, there's a bunch of other connections that are
-
16:26 - 16:35being made on the graph connectivity that you're also exploring.
-
16:35 - 16:37So let's make it a little bit more concrete
-
16:37 - 16:39
-
16:39 - 16:40and talk a little bit about
-
16:40 - 16:43what really is a node; what really is an arc?
-
16:43 - 16:46There may, actually, be some more information I need to store than just a
-
16:46 - 16:48node is connected to other nodes.
-
16:48 - 16:51So in the case of a list or a tree,
-
16:51 - 16:54the next field and the left and the right field are really just pointers for the
-
16:54 - 16:56purpose of organization.
-
16:56 - 16:58That's the pointer to the next cell.
-
16:58 - 17:02There's no data that is really associated with that link itself.
-
17:02 - 17:05That's not true in the case of a graph.
-
17:05 - 17:09That often the links that connect up the nodes
-
17:09 - 17:12actually do have a real role to play in terms of being data.
-
17:12 - 17:14It may be that what they tell you is
-
17:14 - 17:18what road you're taking here, or how long this path is, or how much this
-
17:18 - 17:22flight costs, or what time this flight leaves, or how full this flight already is,
-
17:22 - 17:24or who knows, you know, just other information
-
17:24 - 17:27that is more than just this node's connected to that one. It's, like, how is it
-
17:27 - 17:30connected; what is it connected by, and what are the details of how that
-
17:30 - 17:31connection
-
17:31 - 17:32
-
17:32 - 17:34is being represented?
-
17:34 - 17:38So it's likely that what you really want is not just pointers to other nodes but
-
17:38 - 17:42information about that actual link stored with an arc structure.
-
17:42 - 17:44So, typically, we're gonna have
-
17:44 - 17:45an arc
-
17:45 - 17:48which has information about that arc, how long, how much it costs, you know,
-
17:48 - 17:50what flight number it is,
-
17:50 - 17:52that will have pointers to the start and end node that it is connecting.
-
17:52 - 17:54The node
-
17:54 - 17:57then has a collection of arcs,
-
17:57 - 18:00and those arcs, in the adjacency list form, are expected to all be arcs
-
18:00 - 18:02that start at this node.
-
18:02 - 18:05So their start node is equal to the one that's holding onto them, and then
-
18:05 - 18:06they have an end
-
18:06 - 18:10pointer that points to the node at the other end of the arc. Well,
-
18:10 - 18:13this gets us into a little C++ bind
-
18:13 - 18:16because I've described these data structures
-
18:16 - 18:17in a way
-
18:17 - 18:22that they both depend on each other; that an arc has pointers to the start and the end
-
18:22 - 18:22node.
-
18:22 - 18:27A node has a collection of arcs, which is probably a vector or a set of arc pointers,
-
18:27 - 18:28and so
-
18:28 - 18:31I'm starting to define the structure for arc,
-
18:31 - 18:33and then I want to talk about node in it,
-
18:33 - 18:35and I think, oh, okay, well, then I better define node first because C++ always likes to
-
18:35 - 18:38see things in order. Remember, it always wants to see A,
-
18:38 - 18:42and then B if B uses A, and so in terms of how your functions and data
-
18:42 - 18:45structures work, you've always gotten this idea like, well, go do the one first. Well, if I say,
-
18:45 - 18:48okay, arc's gonna need node. I start to write arc, and I say, oh, I
-
18:48 - 18:50need to talk about node. Well, I better put node up here,
-
18:50 - 18:54right? But then when I'm starting to write node, I start talking about arc, and they have a
-
18:54 - 18:57circular reference to each other, right? They both
-
18:57 - 19:00want to depend on the other one before we've gotten around to telling the
-
19:00 - 19:01compiler about it.
-
19:01 - 19:04One of them has to go first, right? What are we gonna do?
-
19:04 - 19:08What we need to do is use the C++ forward reference
-
19:08 - 19:09as a
-
19:09 - 19:12little hint to the compiler
-
19:12 - 19:13that
-
19:13 - 19:17we are gonna be defining a Node T structure in a little bit;
-
19:17 - 19:18we're gonna get to it,
-
19:18 - 19:22but first we're gonna define the Arc T structure that uses that Node T, and then
-
19:22 - 19:25we're gonna get around to it. So we're gonna give it, like, a little heads up that says
-
19:25 - 19:29there later will be an Act 2 of this play. We're gonna introduce a
-
19:29 - 19:31character called Node T.
-
19:31 - 19:33So you can now
-
19:33 - 19:34start talking about Node
-
19:34 - 19:37T in some simple ways
-
19:37 - 19:40based on your agreement to the compiler that you plan on telling it more
-
19:40 - 19:41about Node T later.
-
19:41 - 19:45So the struct Node T says there will be a struct called Node T.
-
19:45 - 19:47Okay, the compiler says, okay, I'll write that down.
-
19:47 - 19:51You start defining struct Arc T, and it says, oh, I see you have these pointers
-
19:51 - 19:54to the node T. Okay. Well, you told me there'd be a struct like that; I guess that'll work out.
-
19:54 - 19:58And now you told it what struct Node T is. It's, like, oh, I have a vector
-
19:58 - 19:59with these pointers to arc,
-
19:59 - 20:02and then it says, okay, now I see the whole picture, and
-
20:02 - 20:03it all worked out.
-
20:03 - 20:08So it's just a little bit of a
-
20:08 - 20:11requirement of how the C++ compiler likes to see stuff in order without ever having
-
20:11 - 20:16to go backwards to check anything out again. Okay. So
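A sketch of how those mutually dependent structures might be laid out with the forward reference, again with standard-library containers standing in for the course's classes; the cost field is just one example of data an arc might carry:

```cpp
#include <string>
#include <vector>

struct nodeT;                         // forward reference: "there will be a nodeT later"

struct arcT {
    nodeT *start;                     // node this arc leaves from
    nodeT *end;                       // node this arc arrives at
    double cost;                      // example arc data: distance, price, time, ...
};

struct nodeT {                        // now the definition the compiler was promised
    std::string name;
    std::vector<arcT *> arcs;         // outgoing arcs, all of which start at this node
};

struct graphT {
    std::vector<nodeT *> allNodes;    // the whole collection, since no single node reaches everything
};
```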
-
20:16 - 20:19I got myself some node structures, got
-
20:19 - 20:21some arc structures; they're all
-
20:21 - 20:22working together.
-
20:22 - 20:24I'm gonna try to do some traversals.
-
20:24 - 20:26Try to do some operations that
-
20:26 - 20:29work their way around the graph.
-
20:29 - 20:30So
-
20:30 - 20:33tree traversals, right, the pre, and post, and in order
-
20:33 - 20:36are ways that you start at the root, and you visit all the nodes on the tree,
-
20:36 - 20:40a very common need to, kind of, process, to delete nodes,
-
20:40 - 20:43to print the nodes, to whatnot. So we might want to do the same thing on graphs. I've got a
-
20:43 - 20:47graph out there, and I'd like to go exploring it. Maybe I just want to print, and if
-
20:47 - 20:50I'm in the Gates building, what are all the places that I can reach from
-
20:50 - 20:53the Gates building? Where can I go to; what
-
20:53 - 20:54options do I have?
-
20:54 - 20:57What I'm gonna do is a traversal
-
20:57 - 21:01starting at a node and, kind of, working my way outward to the visible, reachable
-
21:01 - 21:04nodes that I can follow those paths to get there.
-
21:04 - 21:07We're gonna look at two different ways to do this.
-
21:07 - 21:09One is Depth-first and one is Breadth-first,
-
21:09 - 21:13and I'll show you both algorithms, and they
-
21:13 - 21:15will visit the same nodes;
-
21:15 - 21:19when all is done and said, they will visit all the reachable nodes starting from a
-
21:19 - 21:20position,
-
21:20 - 21:21but they will visit them in different order.
-
21:21 - 21:25So just like the pre/post in order tree traversals, they visit all the same
-
21:25 - 21:28nodes. It's just a matter of when they get around to doing it
-
21:28 - 21:29is how
-
21:29 - 21:32we distinguish depth and breadth-first traversals.
-
21:32 - 21:34Because we have this graph structure
-
21:34 - 21:40that has this loose connectivity and the possibility of cycles and multiple paths
-
21:40 - 21:41to the same node,
-
21:41 - 21:44we are gonna have to be a little bit careful at how we do our work to make
-
21:44 - 21:48sure that we don't end up getting stuck in some infinite loop when we keep
-
21:48 - 21:51going around the ABC cycle in a particular graph.
-
21:51 - 21:55That we need to realize when we're revisiting something we've seen before,
-
21:55 - 21:56and then not
-
21:56 - 21:57
-
21:57 - 21:59trigger, kind of, an
-
21:59 - 22:03exploration we've already done.
-
22:03 - 22:06So let's look at depth-first first.
-
22:06 - 22:09This is probably the simpler of the two. They're both, actually, pretty simple.
-
22:09 - 22:13Depth-first traversal uses recursion to do its work,
-
22:13 - 22:18and the idea is that you pick a starting node - let's say I want to start at Gates,
-
22:18 - 22:21and that maybe what I'm trying to find is all the reachable nodes from Gates.
-
22:21 - 22:24If I don't have a starting node, then I can just pick one arbitrarily from the
-
22:24 - 22:27graph because there is actually no root node of special status,
-
22:27 - 22:30and the idea is to go deep. That
-
22:30 - 22:33in the case of, for example, a maze, it might be that I
-
22:33 - 22:36choose to go north, and then I just keep going north. I just go north, north, north, north, north until
-
22:36 - 22:40I run into a dead wall, and then I say, oh, I have to go east, and I go east, east, east, east, east. The idea is to
-
22:40 - 22:42just go as far away from
-
22:42 - 22:46the original state as I can, just keep going outward, outward, outward, outward,
-
22:46 - 22:47outward, outward, outward,
-
22:47 - 22:50and eventually I'm gonna run into some dead end, and that dead end could be I
-
22:50 - 22:52get to a node that has no outgoing arcs,
-
22:52 - 22:54or it could be that I get to a node
-
22:54 - 22:57that whose only arcs come back to places I've already been,
-
22:57 - 22:58so it cycles back on itself, in
-
22:58 - 23:01which case, that also is really a dead end.
-
23:01 - 23:03So you go deep.
-
23:03 - 23:06You go north, and you see as far as you can get, and
-
23:06 - 23:10only after you've, kind of, fully explored everything reachable from starting in
-
23:10 - 23:11the north direction
-
23:11 - 23:13do you backtrack,
-
23:13 - 23:16unmake that decision, and say, well, how about not north; how about east, right? And then
-
23:16 - 23:19find everything you can get to from the east, and
-
23:19 - 23:22after, kind of, going all through that, come back and try south, and so on. So all of the
-
23:22 - 23:23options you have
-
23:23 - 23:26exhaustively getting through all your
-
23:26 - 23:26neighbors
-
23:26 - 23:29in this, kind of, go-deep strategy. Well,
-
23:29 - 23:32the important thing is you realize if you've got this recursion going, so what
-
23:32 - 23:35you're actually doing, basically, is saying I'm gonna step to my neighbor to the
-
23:35 - 23:40right, and I'm gonna explore all the things, so do a depth-first search from there
-
23:40 - 23:43to find all the places you can reach, and that one says, well, I'm gonna step through this spot
-
23:43 - 23:45and go to one of my neighbors.
-
23:45 - 23:48And so we can't do that infinitely without getting into trouble. We need
-
23:48 - 23:53to have a base case that stops that recursion, and the two cases that I've
-
23:53 - 23:56talked about - one is just when you have no neighbors left to go.
-
23:56 - 23:57So when you
-
23:57 - 23:59
-
23:59 - 24:04get to a node that actually, kind of, is a dead end, or that neighbor only has visited
-
24:04 - 24:05neighbors
-
24:05 - 24:09that surround it. So a
-
24:09 - 24:10very simple little piece of
-
24:10 - 24:13code that depends on recursion doing its job.
-
24:13 - 24:17In this case, we need to know which nodes we have visited, right? We're gonna have to
-
24:17 - 24:19have some kind of strategy for saying I
-
24:19 - 24:21have been here before.
-
24:21 - 24:24In this case, I'm using just a set because that's an easy thing to do.
-
24:24 - 24:27I make a set of node pointers, and I update it
-
24:27 - 24:29each time I see a new node; I add it to
-
24:29 - 24:34that set, and if I ever get to a node that was already previously visited
-
24:34 - 24:38from that, there's no reason to do any work from there; we can immediately
-
24:38 - 24:41stop. So
-
24:41 - 24:43starting out, that set would be empty.
-
24:43 - 24:46If the visited set already contains the node that we're currently trying to visit,
-
24:46 - 24:50then there's nothing to do. Otherwise, we go ahead and add it, and there's a little note
-
24:50 - 24:53here that says this is where whatever we want to do would occur. What am I trying to do here? Am I trying to print those nodes? Am
-
24:53 - 24:56I trying to draw pictures on those nodes, highlight those nodes, you know,
-
24:56 - 25:00something I'm doing with them. I don't know what it is, but this is the structure for the depth-first search
-
25:00 - 25:01here, and
-
25:01 - 25:02then
-
25:02 - 25:04for all of the neighbors,
-
25:04 - 25:07using the vector of outgoing connections,
-
25:07 - 25:08then I run a depth-first search
-
25:08 - 25:11on each of those neighbors
-
25:11 - 25:14passing that visited set by reference
-
25:14 - 25:16through the whole operation. So I will
-
25:16 - 25:20always be adding to and modifying this one set, so I will
-
25:20 - 25:24be able to know where I am and where I still need to go.
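Here is a minimal sketch of that depth-first traversal in standard C++; printing the node's name stands in for the unspecified "do something," and the nodeT structure just repeats the earlier sketch:

```cpp
#include <iostream>
#include <set>
#include <string>
#include <vector>

struct nodeT {
    std::string name;
    std::vector<nodeT *> connected;   // outgoing neighbors
};

// Visit node, then recursively explore each neighbor as deeply as
// possible; the shared visited set stops cycles and repeat visits.
void depthFirstSearch(nodeT *node, std::set<nodeT *> &visited) {
    if (visited.count(node) > 0) return;    // already seen: dead end for this branch
    visited.insert(node);
    std::cout << node->name << std::endl;   // "do something" with the node
    for (nodeT *neighbor : node->connected) {
        depthFirstSearch(neighbor, visited);
    }
}
```

You would start it off with an empty set, something like: std::set<nodeT *> visited; depthFirstSearch(startNode, visited);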
-
25:24 - 25:27So if we trace this guy in action -
-
25:27 - 25:31if I start arbitrarily from A, I say I'd like to begin with A.
-
25:31 - 25:35Then the depth-first search is gonna pick a neighbor, and let's say I happen to just work
-
25:35 - 25:38over my neighbors in alphabetical order.
-
25:38 - 25:41It would say okay, well, I've picked B. Let's go to everywhere we can get to from B. So A
-
25:41 - 25:42gets marked as visited.
-
25:42 - 25:44I move onto B; I say, well,
-
25:44 - 25:48B, search everywhere you can to from B, and B says okay, well, I need to pick a
-
25:48 - 25:50neighbor, how about I pick
-
25:50 - 25:51C? And it says
-
25:51 - 25:53and now,
-
25:53 - 25:56having visited C, go everywhere you can get to from C. Well,
-
25:56 - 25:59C has no neighbors, right, no outgoing connections whatsoever.
-
25:59 - 26:02So C says, well, okay, everywhere you can get to from C is C,
-
26:02 - 26:04you know, I'm done.
-
26:04 - 26:09That will cause it to backtrack to this place, to B, and B says okay, I explored everywhere I can get to
-
26:09 - 26:10from C;
-
26:10 - 26:13what other neighbors do I have? B has no other neighbors. So, in fact, B backtracks all the way
-
26:13 - 26:15back to A.
-
26:15 - 26:19So now A says okay, well, I went everywhere I can get to from B,
-
26:19 - 26:20right? Let me look at my next neighbor.
-
26:20 - 26:23My next neighbor is C. Ah,
-
26:23 - 26:25but C, already visited,
-
26:25 - 26:27so there's no reason to
-
26:27 - 26:29go crazy on that thing. So, in fact, I can
-
26:29 - 26:33immediately see that I've visited that, so I don't have any further path to look at
-
26:33 - 26:33there.
-
26:33 - 26:35I'll hit D then
-
26:35 - 26:37as the next one in sequence. So
-
26:37 - 26:40at this point, I've done everything reachable from the B arm, everything
-
26:40 - 26:43reachable from the C arm, and now I'm starting on the D arm. I said okay, where can I get to
-
26:43 - 26:44from D?
-
26:44 - 26:46Well, I can get to E. All right,
-
26:46 - 26:48where can I get to from E?
-
26:48 - 26:52E goes to F, so the idea is you think of it going deep. It's going as far away as it
-
26:52 - 26:56can, as deep as possible, before it, kind of, unwinds
-
26:56 - 26:58getting to the F that has no neighbors
-
26:58 - 27:01and saying okay. Well, how about - where can I get to
-
27:01 - 27:04also from E? I can get to G.
-
27:04 - 27:07From G where can I get to? Nowhere,
-
27:07 - 27:09so we, kind of, unwind the
-
27:09 - 27:12D arm, and then we will eventually get back to and hit the H.
-
27:12 - 27:16So the order they actually were hit in this case happened to be alphabetical. I
-
27:16 - 27:18assigned them such that that would come out that way.
-
27:18 - 27:21That's not really a property of what depth-first search does,
-
27:21 - 27:24but think of it as like it's going as deep as possible, as far away as possible,
-
27:24 - 27:28and so it makes a choice - it's pretty much like the recursive backtrackers that we've seen. In
-
27:28 - 27:31fact, they make a choice, commit to it, and just move forward on it, never, kind of,
-
27:31 - 27:35giving a backwards glance, and only if the whole process
-
27:35 - 27:35bottoms out
-
27:35 - 27:39does it come back and start unmaking decisions and try alternatives,
-
27:39 - 27:43eventually exploring everything reachable from A
-
27:43 - 27:46when all is done and said. So if
-
27:46 - 27:49you, for example, use that on the maze,
-
27:49 - 27:52right, the depth-first search on a maze is very much the same strategy we're talking about
-
27:52 - 27:53here.
-
27:53 - 27:56Instead - the way we did it was actually breadth-first search, if you
-
27:56 - 27:57remember back to that assignment -
-
27:57 - 28:00the depth-first search alternative is actually doing it the -
-
28:00 - 28:01just go deep,
-
28:01 - 28:04make a choice, go with it, run all the way until it bottoms out,
-
28:04 - 28:12and only then back up and try some other alternatives.
-
28:12 - 28:15The breadth-first traversal, gonna hit
-
28:15 - 28:16the same nodes
-
28:16 - 28:19but just gonna hit them in a different way.
-
28:19 - 28:21If my goal were to actually hit all of them,
-
28:21 - 28:25then either of them is gonna be a fine strategy. There may be some reason why
-
28:25 - 28:25
-
28:25 - 28:29I'm hoping to stop the search early, and that actually might dictate why I would prefer
-
28:29 - 28:31which ones to look at first.
-
28:31 - 28:35The breadth-first traversal is going to explore things in terms of distance,
-
28:35 - 28:40in this case, expressed in terms of number of hops away from the starting
-
28:40 - 28:40node.
-
28:40 - 28:44So it's gonna look at everything that's one hop away, and then go back and look at
-
28:44 - 28:46everything two hops away, and then three hops away.
-
28:46 - 28:49And so if the goal, for example, was to find a node
-
28:49 - 28:51in a shortest path connection between it,
-
28:51 - 28:55then breadth-first search, by looking at all the ones one hop away, if it was a one hop
-
28:55 - 28:56path, it'll find it,
-
28:56 - 28:59and if it's a two hop path, it'll find it on that next round, and then the three hop
-
28:59 - 29:00path,
-
29:00 - 29:02and there might be paths that are 10, and 20, and 100
-
29:02 - 29:03steps long,
-
29:03 - 29:06but if I find it earlier, I won't go looking at them. It actually might prove to
-
29:06 - 29:09be a more efficient way to get to the shortest solution as opposed to depth-first
-
29:09 - 29:11search which go off and try all these deep
-
29:11 - 29:14paths that may not ever end up where I want it to be
-
29:14 - 29:16before it eventually made its way to the short
-
29:16 - 29:20solution I was wanting to find. So the thing
-
29:20 - 29:21is we have the starting node.
-
29:21 - 29:24We're gonna visit all the immediate neighbors.
-
29:24 - 29:27So, basically, a loop over the
-
29:27 - 29:28immediate ones,
-
29:28 - 29:31and then while we're doing that, we're gonna actually, kind of, be gathering up that
-
29:31 - 29:33next generation.
-
29:33 - 29:33
-
29:33 - 29:37Sometimes that's called the Frontier, sort of, advancing the frontier
-
29:37 - 29:39of the nodes to explore next,
-
29:39 - 29:42so all the nodes that are two hops away,
-
29:42 - 29:44and then while we're processing the ones that are two hops away, we're gonna
-
29:44 - 29:47be gathering the frontier
-
29:47 - 29:50that is three hops away. And so at each stage we'll be, kind of, processing a
-
29:50 - 29:54generation but while assembling the generation N+1
-
29:54 - 29:58in case we need to go further to find those things.
-
29:58 - 30:00We'll use an auxiliary data structure during
-
30:00 - 30:04this. That management of the frontier, keeping track of the nodes that are, kind
-
30:04 - 30:05of,
-
30:05 - 30:07staged for the
-
30:07 - 30:09subsequent iterations,
-
30:09 - 30:12the queue is a perfect data structure for that. If I put the
-
30:12 - 30:14initial neighbors into a queue,
-
30:14 - 30:19as I dequeue them, if I put their neighbors on the back of the queue, so enqueue them
-
30:19 - 30:21behind all the one hop neighbors, then
-
30:21 - 30:24if I do that with code, as I'm processing Generation 1 off the front of the queue, I'll be
-
30:24 - 30:27stacking Generation 2 in the back of the queue,
-
30:27 - 30:30and then when all of Generation 1 is exhausted, I'll be processing Generation
-
30:30 - 30:312,
-
30:31 - 30:35but, kind of, enqueuing Generation 3 behind it.
-
30:35 - 30:39Same deal that we had with depth-first search though is we will have to be
-
30:39 - 30:41careful about cycles. That is,
-
30:41 - 30:44when there's more than one possibility of how we get to something, we don't
-
30:44 - 30:46want to, kind of, keep going around that cycle
-
30:46 - 30:52or do a lot of redundant work visiting nodes that we've already seen. So
-
30:52 - 30:55I have the same idea of keeping track of a set
-
30:55 - 30:57that knows what
-
30:57 - 31:00we've previously visited,
-
31:00 - 31:01and the
-
31:01 - 31:05initial node gets enqueued just, kind of, by itself, just like Generation 0 gets
-
31:05 - 31:07put into the queue,
-
31:07 - 31:10and then while the queue is not empty, so while there's still neighbors I haven't
-
31:10 - 31:11yet visited,
-
31:11 - 31:13I dequeue the frontmost one,
-
31:13 - 31:16if it hasn't already been visited on some previous iteration, then I mark it as
-
31:16 - 31:17visited,
-
31:17 - 31:22and then I enqueue all of its children or neighbors at the back of the queue.
-
31:22 - 31:26And so, as of right now, I'm processing Generation 0, that means all of Generation 1, so the
-
31:26 - 31:29neighbors that are one hop away, get placed in the queue, and then
-
31:29 - 31:34the next iteration coming out here will pull off a Generation 1,
-
31:34 - 31:37and then enqueue all of these two hop neighbors.
-
31:37 - 31:40Come back and other Generation 1's will get pulled off, enqueue all their two hop
-
31:40 - 31:44neighbors, and so the two hop generation will, kind of, be built up behind, and then
-
31:44 - 31:48once the last of the one hop generation gets managed,
-
31:48 - 31:51start processing the Generation 2 and so on.
-
31:51 - 31:55And so this will keep going while the queue has some
-
31:55 - 31:58contents in there. So that means some neighbors
-
31:58 - 32:01we have connections to that we haven't yet had a chance to explore, and
-
32:01 - 32:05either we will find out that they have all been visited, so this will end when
-
32:05 - 32:09all of the nodes that are in the queue have been visited or
-
32:09 - 32:14there are no more neighbors to enqueue. So we've gotten to those dead ends or those cycles
-
32:14 - 32:25that mean we've reached everything that was reachable from that start node.
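A corresponding sketch of the breadth-first traversal, using a queue to stage each next generation (same standard-library stand-ins and placeholder names as before):

```cpp
#include <iostream>
#include <queue>
#include <set>
#include <string>
#include <vector>

struct nodeT {
    std::string name;
    std::vector<nodeT *> connected;   // outgoing neighbors
};

// Process nodes in order of how many hops they are from the start:
// the queue holds the frontier, generation by generation.
void breadthFirstSearch(nodeT *start) {
    std::set<nodeT *> visited;
    std::queue<nodeT *> frontier;
    frontier.push(start);                       // Generation 0
    while (!frontier.empty()) {
        nodeT *node = frontier.front();
        frontier.pop();
        if (visited.count(node) > 0) continue;  // duplicate entry in the queue: skip it
        visited.insert(node);
        std::cout << node->name << std::endl;   // "do something" with the node
        for (nodeT *neighbor : node->connected) {
            frontier.push(neighbor);            // stage the next generation behind this one
        }
    }
}
```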
-
32:25 - 32:28And so I've traced this guy
-
32:28 - 32:30to see it doing its work.
-
32:30 - 32:33So if I start again with A,
-
32:33 - 32:36and, again, assuming that I'm gonna process my nodes
-
32:36 - 32:40in alphabetical order when I have children, so I will go through and I
-
32:40 - 32:44will enqueue, so the queue right now has just node A in it. I pull it off. I say
-
32:44 - 32:45have I visited? No, I have not.
-
32:45 - 32:47So then I mark it as visited,
-
32:47 - 32:52and now for each of its neighboring nodes, B, C, D, H, I'm gonna put those into
-
32:52 - 32:54the queue,
-
32:54 - 32:57and so then they're loaded up, and then I come back around, I say do I still have stuff in the queue? I
-
32:57 - 33:01do, so let's take out the first thing that was put in; it was B. Okay,
-
33:01 - 33:05and I'll say okay, well, where can you get to from B? You can get to C, so I'm gonna put C in the queue. Now,
-
33:05 - 33:09C is actually already in the queue, but we're gonna pull it out
-
33:09 - 33:12earlier; the duplicate will get handled by the
-
33:12 - 33:14visiting status later.
-
33:14 - 33:16So, okay, we've got C, D, and H in
-
33:16 - 33:20and then put C behind it. So right now, we'll have C, D, H, and C again.
-
33:20 - 33:24It'll find the C that's in the front there. It'll say where can you get to from C?
-
33:24 - 33:24Nowhere,
-
33:24 - 33:26so no additional
-
33:26 - 33:28Generation 2's are added by that one.
-
33:28 - 33:31I'll look at D, and I'll say okay, well, what Generation 2's do we have from D? It says, well, I can
-
33:31 - 33:32get to E. All right, so put E on the
-
33:32 - 33:34back of the
-
33:34 - 33:36
-
33:36 - 33:39queue. Look at H; where can H get to? H can get to G,
-
33:39 - 33:42so go ahead and put G on the back of the queue.
-
33:42 - 33:45So now, it comes back around to the things that are two hops away.
-
33:45 - 33:47The first one it's gonna pull off is C. It's
-
33:47 - 33:49gonna say, well, C is already visited,
-
33:49 - 33:53so there's no need to redundantly visit that or do anything with that.
-
33:53 - 33:54It passes over C. It
-
33:54 - 33:56finds the E
-
33:56 - 33:58that's behind that,
-
33:58 - 34:00and then it finds the G that was behind the
-
34:00 - 34:03H. The E also enqueued the F
-
34:03 - 34:07that was three hops, and the very last one, right, output will be that F that was
-
34:07 - 34:09only reachable by taking
-
34:09 - 34:12three hops away from the thing. So this one's, kind of,
-
34:12 - 34:13working radially. The
-
34:13 - 34:16idea that I'm looking at all the things that I can get to, kind of, in one hop
-
34:16 - 34:19are my first generation, then two hops, then three hops,
-
34:19 - 34:23and so it's like throwing a stone in the water and watching it ripple out
-
34:23 - 34:26that the visiting
-
34:26 - 34:29is managed in terms of, kind of, growing length of path,
-
34:29 - 34:32and number of hops that it took to get there.
-
34:32 - 34:37So this is the strategy that you used for maze solving, if you remember.
-
34:37 - 34:37That
-
34:37 - 34:42what you were tracking was the queue of paths where the path was
-
34:42 - 34:44AB, or ADC, or
-
34:44 - 34:46ADE, and that they grew
-
34:46 - 34:47
-
34:47 - 34:50stepwise. That, kind of, each iteration through the
-
34:50 - 34:54traversal there where you're saying, okay, well, I've seen all the paths that are one hop. Now I've seen all the ones of two,
-
34:54 - 34:56now the ones three, and four, and five,
-
34:56 - 34:58and when you keep doing that, eventually you get to the hop,
-
34:58 - 35:01you know, 75, which leads you
-
35:01 - 35:05all the way to the goal, and since you have looked at all the paths that were
-
35:05 - 35:0974 and shorter, the first path that you dequeue that leads to your goal
-
35:09 - 35:12must be the best or the shortest overall
-
35:12 - 35:13because
-
35:13 - 35:15of the order you processed them.
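As a sketch of that queue-of-paths idea (same assumed nodeT structure; this is not the assignment's actual code), a breadth-first shortest-path search might look like this:

```cpp
#include <queue>
#include <set>
#include <string>
#include <vector>

struct nodeT {
    std::string name;
    std::vector<nodeT *> connected;
};

// Returns a shortest path from start to goal as a sequence of nodes,
// or an empty vector if the goal is unreachable. Paths come off the
// queue in order of length, so the first one to reach the goal wins.
std::vector<nodeT *> shortestPath(nodeT *start, nodeT *goal) {
    std::set<nodeT *> expanded;
    std::queue<std::vector<nodeT *>> paths;
    paths.push({start});
    while (!paths.empty()) {
        std::vector<nodeT *> path = paths.front();
        paths.pop();
        nodeT *last = path.back();
        if (last == goal) return path;            // first hit is the shortest
        if (expanded.count(last) > 0) continue;   // already grew paths from this node
        expanded.insert(last);
        for (nodeT *neighbor : last->connected) {
            std::vector<nodeT *> extended = path;
            extended.push_back(neighbor);         // one hop longer
            paths.push(extended);
        }
    }
    return {};                                    // no path exists
}
```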
-
35:15 - 35:18That's not true in depth-first search, right? That depth-first search might find a much
-
35:18 - 35:21longer and more circuitous path that led to the goal
-
35:21 - 35:25when there is a shorter one that would be eventually found if I had a graph that
-
35:25 - 35:28looked,
-
35:28 - 35:36you know, had this long path, let's say,
-
35:36 - 35:41and so this is goal node, let's say, and this is the start node that
-
35:41 - 35:44there could be a two hop path, kind of, off to this angle,
-
35:44 - 35:47but if this was the first one explored in the depth-first, right,
-
35:47 - 35:50it could eventually, kind of, work its way all the way around through this
-
35:50 - 35:52eight hop path, and say, okay, well, I found it, right?
-
35:52 - 35:55It would be the first one we found, but there's no guarantee that would be the
-
35:55 - 35:58shortest in depth-first search, right? It would just happen to be the one that, based on
-
35:58 - 36:01its arbitrary choice of which
-
36:01 - 36:04neighbors to pursue, it happened to get to; there could be a shorter
-
36:04 - 36:08one that would be found later in the traversal. That's not true in
-
36:08 - 36:12the breadth-first search strategy because it will be looking at this, and then these two,
-
36:12 - 36:16and then these two, and, kind of, working its way outward down those edges.
-
36:16 - 36:19So as soon as we get to one that hits the goal, we
-
36:19 - 36:25know we have found the shortest we can have. Question?
-
36:25 - 36:28In that case,
-
36:28 - 36:33you could return, okay, we need two
-
36:33 - 36:34jumps to get to
-
36:34 - 36:35the goal,
-
36:35 - 36:36like, as far
-
36:36 - 36:40winning the goal. Yeah. How do we remember which path led us there?
-
36:40 - 36:42So if you were really doing that, you know, in terms of depth-first search, what
-
36:42 - 36:46you'd probably be tracking is what's the path? Probably a stack that said here's the stack
-
36:46 - 36:49that led to that path. Let me hold
-
36:49 - 36:53onto this, right, and I will compare it to these alternatives. So I'll try it on Neighbor 1,
-
36:53 - 36:54get the stack back from that; it's the best.
-
36:54 - 36:58Try it on Neighbor 2, get that stack back, see which one is smaller and use
-
36:58 - 37:01that as the better choice, and so, eventually, when I tried it out this side, I'd get
-
37:01 - 37:03the stack back that was just two,
-
37:03 - 37:06and I could say, yeah, that was better than this ten step path I had over here, so I
-
37:06 - 37:09will prefer that. So just using your standard, kind of, backtracking with
-
37:09 - 37:13using your return values, incorporating them, and deciding which was better.
-
37:13 - 37:14And so in this case,
-
37:14 - 37:16the depth-first search
-
37:16 - 37:19can't really be pruned very easily
-
37:19 - 37:24because it has to, kind of, explore the whole thing to know that this
-
37:24 - 37:26short little path right over here happened to be just
-
37:26 - 37:29arbitrarily examined last, whereas, the breadth-first search actually does, kind of, because
-
37:29 - 37:31it's prioritizing by
-
37:31 - 37:35order of length. If the goal was shortest path, it will find it sooner and be able to
-
37:35 - 37:35stop
-
37:35 - 37:39when it found it rather than look at these longer paths. So it won't end up -
-
37:39 - 37:42if there were hundreds of other paths over here, breadth-first search won't actually
-
37:42 - 37:47have to ever reach them or explore them.
-
37:47 - 37:52So there are a lot of things that actually end up being graph search in disguise. That's why
-
37:52 - 37:54a, kind of, a graph is a great data structure to, kind of, get to in
-
37:54 - 37:57106b because there are lots of problems
-
37:57 - 38:01that, in the end, if you can model them using the same
-
38:01 - 38:03setup of nodes, and arcs, and traversals,
-
38:03 - 38:06that you can answer interesting questions about that data by applying
-
38:06 - 38:09the same graph search techniques.
-
38:09 - 38:12So as long as you want to know things like, well, which nodes are reachable from this node at
-
38:12 - 38:12all?
-
38:12 - 38:14And then
-
38:14 - 38:17that just means, okay, I'm here and I want to get to these places; what places could
-
38:17 - 38:18I get to? Or
-
38:18 - 38:21knowing what courses I could take
-
38:21 - 38:25that would lead me to a CS degree, it's, kind of, like well, looking for the ones
-
38:25 - 38:29that lead to having all your graduation requirements
-
38:29 - 38:32satisfied, and each of those arcs could be a course you could take, right, how close did
-
38:32 - 38:35that get you to the goal? How can I get to the
-
38:35 - 38:40place I'd like to be, or what are the places that this course satisfies
-
38:40 - 38:41requirements that lead to?
-
38:41 - 38:44Knowing things about
-
38:44 - 38:47connectivity is, kind of, useful in things like the, kind of, Bacon numbers or
-
38:47 - 38:50Erdős numbers where people want to know, well, how close am I to greatness by -
-
38:50 - 38:54have I been in a movie with somebody who was in a movie, who was in a movie with Kevin Bacon? And,
-
38:54 - 38:55for me, it turns out
-
38:55 - 38:57zero; I have been in no movies. But have
-
38:57 - 38:59you written a paper with someone who then
-
38:59 - 39:02connects through a chain to some other famous person, right? Tells you about
-
39:02 - 39:06your, kind of, social network distance or something, or LinkedIn. You're trying to get a
-
39:06 - 39:09new job, and you want to find somebody to introduce you to
-
39:09 - 39:13the head of this cool company. It's, like, figuring who you know who knows them, right?
-
39:13 - 39:18It's finding paths through a social network graph.
-
39:18 - 39:19Finding out things like
-
39:19 - 39:22are there paths with cycles? What's the shortest path
-
39:22 - 39:25through the cycle, longest path through the cycle often tells you about things about
-
39:25 - 39:27redundancies in that structure
-
39:27 - 39:31that, for example, when you're building a network infrastructure, sometimes you do have
-
39:31 - 39:34cycles in it because you actually want to have multiple ways to get to things in case
-
39:34 - 39:37there's a breakdown. Like if there's a freeway accident, you want to have another way
-
39:37 - 39:38to get around it,
-
39:38 - 39:41but if you are interested in, maybe, trying to keep your cost down, it's trying to find
-
39:41 - 39:47out where are the right places to put in those redundancies, those cycles that
-
39:47 - 39:50give you the best benefit with the least cost.
-
39:50 - 39:54Trying to find things like a continuous path that visits all the nodes once and exactly
-
39:54 - 39:57once, especially doing so efficiently, picking a short
-
39:57 - 40:00overall path for that. We talked a little bit about that in terms of what's called the
-
40:00 - 40:02traveling salesman problem.
-
40:02 - 40:04You are trying to get from - you have ten
-
40:04 - 40:06cities to visit on your book tour. You don't want to spend all your lifetime
-
40:06 - 40:08on a plane. What's the
-
40:08 - 40:13path of crisscrossing the country that gets you to all ten of those with the least
-
40:13 - 40:15time painfully
-
40:15 - 40:17spent on an airline? And so
-
40:17 - 40:20those are just graph search problems. You have these nodes. You have these arcs.
-
40:20 - 40:21How can you
-
40:21 - 40:23traverse them and visit
-
40:23 - 40:25and come up with
-
40:25 - 40:27a good solution. There's other
-
40:27 - 40:30problems I think that are kind of neat. Word ladders, we used to give out an assignment on word ladders. We
-
40:30 - 40:31didn't this time, but I thought it was an
-
40:31 - 40:34interesting problem to think about. The idea of
-
40:34 - 40:36how it is you can transform
-
40:36 - 40:41one letter at a time from one word to another is very much a
-
40:41 - 40:45graph problem behind the scenes, and that largely a breadth-first
-
40:45 - 40:47traversal is exactly what you're looking for there. How
-
40:47 - 40:51can I transform this with the shortest number of steps is, kind of,
-
40:51 - 40:54working radially out from the starting word.
-
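[As a rough sketch of that radially expanding search, a word-ladder finder can keep a queue of partial ladders and always extend the shortest ones first, so the first ladder that reaches the destination word is a shortest one. This is an editorial illustration; the dictionary is a plain std::set standing in for a lexicon, and the function name is invented.]

```cpp
#include <queue>
#include <set>
#include <string>
#include <vector>

// Returns a shortest ladder from start to dest, or an empty vector if none exists.
std::vector<std::string> wordLadder(const std::string& start,
                                    const std::string& dest,
                                    const std::set<std::string>& dictionary) {
    std::queue<std::vector<std::string>> partialLadders;  // ladders waiting to be extended
    std::set<std::string> used;                           // words already placed on some ladder
    partialLadders.push({start});
    used.insert(start);
    while (!partialLadders.empty()) {
        std::vector<std::string> ladder = partialLadders.front();
        partialLadders.pop();
        std::string word = ladder.back();
        if (word == dest) return ladder;          // dequeued in order of length: shortest first
        // The "one hop" neighbors: change each position to each letter.
        for (size_t i = 0; i < word.size(); i++) {
            std::string neighbor = word;
            for (char ch = 'a'; ch <= 'z'; ch++) {
                neighbor[i] = ch;
                if (dictionary.count(neighbor) && !used.count(neighbor)) {
                    used.insert(neighbor);
                    std::vector<std::string> longer = ladder;
                    longer.push_back(neighbor);
                    partialLadders.push(longer);
                }
            }
        }
    }
    return {};
}
```

[Because the queue is first-in, first-out, ladders come out in order of increasing length, which is exactly the "working radially out from the starting word" behavior described above.]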
40:54 - 40:54
-
40:54 - 40:58Your Boggle board, it turns out, really is just a graph. If
-
40:58 - 41:01you think of
-
41:01 - 41:07having this sequence of letters
-
41:07 - 41:12that you're trying to work through,
-
41:12 - 41:16that really each of these letters can actually just be thought of as a node,
-
41:16 - 41:20and then the connections it has are to its eight neighbors, and that finding
-
41:20 - 41:24words in there, especially finding paths, starting from
-
41:24 - 41:27this node that lead away, that don't revisit, so, in fact, doing breadth-first
-
41:27 - 41:31or depth-first search. You most likely did depth-first search when you were writing the
-
41:31 - 41:32implementation of Boggle,
-
41:32 - 41:35but that's, kind of, a deep search on I found an A. I found a P. I found a P, and, kind
-
41:35 - 41:40of, keeps going as long as that path looks viable. So, in this case, the arc
-
41:40 - 41:41information we're using is
-
41:41 - 41:44whether the string of those letters so far forms a prefix that could be leading
-
41:44 - 41:45somewhere good,
-
41:45 - 41:47and then the, kind of, recursion bottoming out when
-
41:47 - 41:52we have run ourselves into a corner where we have no unvisited nodes
-
41:52 - 41:53that neighbor us,
-
41:53 - 41:57or only nodes that would extend to a prefix that no longer forms any
-
41:57 - 41:58viable words
-
41:58 - 42:01as a search problem.
-
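[Here is a sketch of that prefix-pruned depth-first search over a letter grid. It is not the official Boggle starter code: the board is just a vector of equal-length strings, the dictionary is a sorted std::set, and containsPrefix is an assumed helper standing in for a lexicon's prefix lookup.]

```cpp
#include <set>
#include <string>
#include <vector>

// Does any word in the sorted dictionary start with s?
bool containsPrefix(const std::set<std::string>& dict, const std::string& s) {
    auto it = dict.lower_bound(s);
    return it != dict.end() && it->compare(0, s.size(), s) == 0;
}

// Depth-first search from (row, col): extend the current path one letter at a
// time, prune as soon as the letters no longer prefix any word, and record
// every complete word found along the way.
void search(const std::vector<std::string>& board, int row, int col,
            std::vector<std::vector<bool>>& visited, std::string path,
            const std::set<std::string>& dict, std::set<std::string>& found) {
    int rows = board.size(), cols = board[0].size();
    if (row < 0 || row >= rows || col < 0 || col >= cols) return;
    if (visited[row][col]) return;             // this cell is already on the path
    path += board[row][col];
    if (!containsPrefix(dict, path)) return;   // dead end: no word starts this way
    if (dict.count(path)) found.insert(path);  // the path spells a complete word
    visited[row][col] = true;
    for (int dr = -1; dr <= 1; dr++) {         // recurse into all eight neighbors
        for (int dc = -1; dc <= 1; dc++) {
            if (dr != 0 || dc != 0) {
                search(board, row + dr, col + dc, visited, path, dict, found);
            }
        }
    }
    visited[row][col] = false;                 // un-choose: free this cell again
}
```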
42:01 - 42:04So it feels like everything you've done this quarter has actually been a graph in disguise; I just didn't
-
42:04 - 42:08tell you. The maze problem is very much a graph search problem, building a
-
42:08 - 42:10graph and searching it.
-
42:10 - 42:13Another one that I think is, kind of, neat to think about is, kind of, related to the idea of word ladders
-
42:13 - 42:18is that often when you mistype a word, a word
-
42:18 - 42:19processor will suggest to you
-
42:19 - 42:22here's a word that you might have meant instead. You type a word that it doesn't
-
42:22 - 42:24have in its lexicon, and it says, well,
-
42:24 - 42:25that could just be
-
42:25 - 42:28a word I don't know, but it also could mean that you actually mistyped something, and so
-
42:28 - 42:31what you want to look at is having neighboring nodes,
-
42:31 - 42:33sort of, diagramming a graph
-
42:33 - 42:35of all the words you know of
-
42:35 - 42:39and then the, kind of, permutations, the tiny little twiddles to that word that
-
42:39 - 42:41are neighbors to it so that
-
42:41 - 42:44you could describe a set of suggestions
-
42:44 - 42:47as being those things that are within a certain
-
42:47 - 42:50hop distance from the original mistyped word. And
-
42:50 - 42:53so you could say, yeah, things that have two letters different are probably
-
42:53 - 42:56close enough, but things that have three or more letters different aren't actually
-
42:56 - 42:58likely to be
-
42:58 - 43:00the mistyped word,
-
43:00 - 43:04and then that, kind of, leads to supporting things like wildcard searches
-
43:04 - 43:08where you're trying to remember whether it's E-R-A-T or A-R-T-E at the
-
43:08 - 43:10end of desperate.
-
43:10 - 43:14Those things could be modeled as, kind of, a search through a graph space where
-
43:14 - 43:16you're modeling those letters, and then there's, like, a
-
43:16 - 43:20branching where you try all of the letters coming out of D-E-S-P, seeing if any
-
43:20 - 43:23of them lead to an R-A-T-E.
-
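[One hedged way to phrase that "within a couple of twiddles" idea in code is a breadth-first exploration over one-letter substitutions, cut off after a small number of hops. The names below are invented for the illustration, and a real spelling corrector would also consider insertions, deletions, and transpositions rather than this exhaustive substitution-only search.]

```cpp
#include <queue>
#include <set>
#include <string>
#include <utility>

// Collect dictionary words reachable from the mistyped word within maxHops
// one-letter substitutions. Exhaustive, so only sensible for small maxHops.
std::set<std::string> suggestionsFor(const std::string& typo,
                                     const std::set<std::string>& dictionary,
                                     int maxHops) {
    std::set<std::string> suggestions;
    std::queue<std::pair<std::string, int>> toExplore;  // candidate word and its hop count
    std::set<std::string> seen;
    toExplore.push({typo, 0});
    seen.insert(typo);
    while (!toExplore.empty()) {
        std::pair<std::string, int> cur = toExplore.front();
        toExplore.pop();
        const std::string& word = cur.first;
        int hops = cur.second;
        if (hops > 0 && dictionary.count(word)) suggestions.insert(word);
        if (hops == maxHops) continue;          // don't wander further than allowed
        for (size_t i = 0; i < word.size(); i++) {
            std::string neighbor = word;
            for (char ch = 'a'; ch <= 'z'; ch++) {
                neighbor[i] = ch;               // one-letter twiddle
                if (!seen.count(neighbor)) {
                    seen.insert(neighbor);
                    toExplore.push({neighbor, hops + 1});
                }
            }
        }
    }
    return suggestions;
}
```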
43:23 - 43:26So the last thing I wanted
-
43:26 - 43:28to just bring up, and this is the one that's going into this week's assignment. I'm just
-
43:28 - 43:31gonna actually show it to you to, kind of, get you thinking about it because it's
-
43:31 - 43:35actually the problem at hand that you're gonna be solving
-
43:35 - 43:36is
-
43:36 - 43:41the one where I'm interested in finding the shortest path, but I'm gonna add in
-
43:41 - 43:43this new idea that
-
43:43 - 43:45all hops are not created equal.
-
43:45 - 43:49That a flight from San Francisco to Seattle is a shorter distance than the
-
43:49 - 43:52one that goes from San Francisco to Boston,
-
43:52 - 43:53and if I'm interested in getting
-
43:53 - 43:57to someplace, it's not just that I would like to take a direct flight. I'd also
-
43:57 - 44:00like to take the set of flights that involves the least, kind of, going out of my way,
-
44:00 - 44:03and going through other cities, and other directions.
-
44:03 - 44:07And so in this case, a breadth-first search is a little bit too simplistic.
-
44:07 - 44:10That if I just look at all the places I can get to in one hop,
-
44:10 - 44:11
-
44:11 - 44:15those are not really equally
-
44:15 - 44:16weighted
-
44:16 - 44:19in terms of my goal of minimizing my total path distance,
-
44:19 - 44:22but if I, instead, prioritize -
-
44:22 - 44:25prioritize should, in your mind, immediately conjure up images of the
-
44:25 - 44:26priority queue,
-
44:26 - 44:28to look at paths that are,
-
44:28 - 44:32if I'm trying to minimize total distance, the shortest path, even if they
-
44:32 - 44:36represent several hops, if they're still shorter - if I have three flights, and each are
-
44:36 - 44:37100 miles,
-
44:37 - 44:40then they represent a flight of 300 total miles,
-
44:40 - 44:44and that's still preferred to one that actually is 2,000 miles or 7,000 miles
-
44:44 - 44:45going to Asia
-
44:45 - 44:49alternatively. And so if I worked with a breadth-first search that's been modified by total
-
44:49 - 44:50distance,
-
44:50 - 44:51that it would be a way
-
44:51 - 44:54to work out the shortest path
-
44:54 - 44:55in terms of total weight
-
44:55 - 44:58from San Francisco to my destination
-
44:58 - 45:01by a
-
45:01 - 45:03modified breadth-first, kind of, weighted strategy,
-
45:03 - 45:06and that is exactly what you're gonna be doing in this assignment. I'm not gonna
-
45:06 - 45:08give away too much by, kind of,
-
45:08 - 45:11delving too much in it, but the handout does a pretty good job of talking you through what
-
45:11 - 45:15you're doing, but it's a, kind of, neat way to think about how the GPS, the MapQuest, and things like
-
45:15 - 45:16that are working:
-
45:16 - 45:17you're at a site;
-
45:17 - 45:21you have a goal, and you have this information that you're trying to
-
45:21 - 45:24guide that search, and you're gonna use heuristics about
-
45:24 - 45:28what appears to be the, kind of, optimal path to, kind of, pick that, and go with it,
-
45:28 - 45:31and see where it leads you. So
-
45:31 - 45:32that will be your
-
45:32 - 45:40task for this week. What we'll talk about on Friday is caching.
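[For readers who want to see the prioritized idea spelled out, here is a sketch of a weighted search that keeps a priority queue ordered by total distance so far, which is essentially Dijkstra's algorithm. It only illustrates the strategy described above; the assignment handout, not this sketch, defines exactly what you need to implement, and the WeightedGraph alias and function name are assumptions for the example.]

```cpp
#include <functional>
#include <limits>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// g[city] is a list of (neighbor, distance-in-miles) pairs.
using WeightedGraph = std::map<std::string, std::vector<std::pair<std::string, double>>>;

double shortestDistance(const WeightedGraph& g, const std::string& start,
                        const std::string& dest) {
    // Min-heap of (distance so far, city): the cheapest partial path comes out first.
    using Entry = std::pair<double, std::string>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> frontier;
    std::map<std::string, double> best;          // best known distance to each settled city
    frontier.push({0.0, start});
    while (!frontier.empty()) {
        Entry top = frontier.top();
        frontier.pop();
        double dist = top.first;
        std::string city = top.second;
        if (best.count(city)) continue;          // already settled via a shorter path
        best[city] = dist;
        if (city == dest) return dist;           // first time dequeued = shortest distance
        auto it = g.find(city);
        if (it == g.end()) continue;
        for (const auto& arc : it->second) {     // extend the path along every outgoing arc
            if (!best.count(arc.first)) {
                frontier.push({dist + arc.second, arc.first});
            }
        }
    }
    return std::numeric_limits<double>::infinity();   // unreachable
}
```

[The priority queue is what makes a three-hop, 300-mile route come out ahead of a single 2,000-mile hop: paths are dequeued by total mileage, not by number of hops.]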
- Title:
- Lecture 23 | Programming Abstractions (Stanford)
- Description:
-
Lecture 23 by Julie Zelenski for the Programming Abstractions Course (CS106B) in the Stanford Computer Science Department.
Julie shows a YouTube video of Barack Obama answering a question about what kind of sorting algorithm he would use to sort a list of data. She also gives several examples of problems that are capable of being solved with graphs. She goes on to say that if you think about certain problems as graphs, finding the solution is sometimes easier as well as easier to understand. To be able to do this, she explains how to represent graphs in C++.
Complete Playlist for the Course:
http://www.youtube.com/view_play_list?p=FE6E58F856038C69
CS 106B Course Website:
http://cs106b.stanford.edu
Stanford Center for Professional Development:
http://scpd.stanford.edu/
Stanford University:
http://www.stanford.edu/
Stanford University Channel on YouTube:
http://www.youtube.com/stanforduniversity/
- Video Language:
- English
- Duration:
- 45:51