0:00:00.000,0:00:11.000 0:00:11.000,0:00:14.000 This presentation is delivered by the Stanford Center for Professional 0:00:14.000,0:00:25.000 Development. 0:00:25.000,0:00:27.000 Okay, we got volume and everything. All 0:00:27.000,0:00:28.000 right, here we go. Lots of you think your 0:00:28.000,0:00:32.000 learning things that aren't gonna actually someday help you in your future 0:00:32.000,0:00:34.000 career. Let's say you plan to be president. You 0:00:34.000,0:00:38.000 think, ''Should I take 106b; should I not take 106b?'' Watch this interview and 0:00:38.000,0:00:39.000 you will see. [Video] 0:00:39.000,0:00:41.000 We have questions, and we ask 0:00:41.000,0:00:45.000 our candidates questions, and this one is from Larry Schwimmer. What 0:00:45.000,0:00:48.000 - you 0:00:48.000,0:00:52.000 guys think I'm kidding; it's right here. 0:00:52.000,0:00:59.000 What is the most efficient way to sort a million 32-bit integers? 0:00:59.000,0:01:01.000 Well - I'm 0:01:01.000,0:01:07.000 sorry. Maybe we should - that's not a - I think the bubble sort 0:01:07.000,0:01:09.000 would be the wrong way to go. Come on; who told him this? 0:01:09.000,0:01:14.000 There you go. So, apparently - 0:01:14.000,0:01:15.000 the question was 0:01:15.000,0:01:16.000 actually 0:01:16.000,0:01:19.000 by a former student of mine, in 0:01:19.000,0:01:22.000 fact, Larry Schwimmer. So apparently, he's been prepped well. As part of going on the campaign trail, not only do you 0:01:22.000,0:01:25.000 have to know all your policy statements, you also have to have your 0:01:25.000,0:01:29.000 N2 and login sorts kept straight in your mind, so - just so 0:01:29.000,0:01:31.000 you know. So let me give you 0:01:31.000,0:01:35.000 some examples of things that are graphs, just to get you thinking about 0:01:35.000,0:01:36.000 0:01:36.000,0:01:39.000 the data structure that we're gonna be working with and playing around with 0:01:39.000,0:01:40.000 today. Sometimes you 0:01:40.000,0:01:44.000 think of graphs of being those, like, bar graphs, right, your graphing 0:01:44.000,0:01:46.000 histograms. This is actually a different kind of graph. The computer scientist's form of 0:01:46.000,0:01:50.000 graph is just a generalized, recursive data structure 0:01:50.000,0:01:51.000 that, kind of, 0:01:51.000,0:01:54.000 starts with something that looks a little bit like a tree and extends it to 0:01:54.000,0:01:55.000 this notion that there are 0:01:55.000,0:01:57.000 any number of nodes, in 0:01:57.000,0:02:00.000 this case, represented as the cities in this airline route map, 0:02:00.000,0:02:04.000 and then there are connections between them, sometimes called edges or arcs that wire up 0:02:04.000,0:02:08.000 the connections that in the case, for example, of this graph are showing you 0:02:08.000,0:02:10.000 which cities have connecting flights. You have a direct 0:02:10.000,0:02:15.000 flight that takes you from Las Vegas to Phoenix, right, one from Phoenix to Oklahoma 0:02:15.000,0:02:15.000 City. 0:02:15.000,0:02:18.000 There is not a direct flight from Phoenix to Kansas City, but there's a path 0:02:18.000,0:02:21.000 by which you could get there by taking a sequence 0:02:21.000,0:02:25.000 of connecting flights, right, going from Phoenix, to Oklahoma City, to Kansas City is 0:02:25.000,0:02:26.000 one way to get there. 0:02:26.000,0:02:30.000 And so the entire, kind of, map of airline routes forms a graph of the city, 0:02:30.000,0:02:34.000 and there are a lot of interesting questions that having represented that data you might want to 0:02:34.000,0:02:35.000 answer about finding 0:02:35.000,0:02:36.000 flights that are 0:02:36.000,0:02:40.000 cheap, or meet your time specification, or get you where you want to go, 0:02:40.000,0:02:47.000 and that this data would be, kind of, primary in solving that problem. 0:02:47.000,0:02:48.000 Another one, 0:02:48.000,0:02:49.000 right, another 0:02:49.000,0:02:52.000 idea of something that's represented as a graph is something that would support 0:02:52.000,0:02:54.000 the notion of word ladders, those little puzzles where you're 0:02:54.000,0:02:56.000 given the word ''chord,'' 0:02:56.000,0:03:00.000 and you want to change it into the word ''worm,'' and you have these rules that say, well, you 0:03:00.000,0:03:02.000 can change one letter at a time. 0:03:02.000,0:03:04.000 And so the 0:03:04.000,0:03:07.000 connections in the case of the word letter have to do with words that are 0:03:07.000,0:03:12.000 one character different from their neighbors, have direct connections or arcs 0:03:12.000,0:03:16.000 between them. Each of the nodes themselves represents a word, and then 0:03:16.000,0:03:19.000 paths between that represent word ladders. That if you can go from here 0:03:19.000,0:03:22.000 through a sequence of steps directly connecting your way, each of those stepping 0:03:22.000,0:03:26.000 stones is a word in the chain that makes a word ladder. So 0:03:26.000,0:03:29.000 things you might want to solve there is what's the shortest word ladder? How many different 0:03:29.000,0:03:31.000 word ladders - how many different ways can I go from this word to that one 0:03:31.000,0:03:36.000 could be answered by searching around in a space like this. 0:03:36.000,0:03:39.000 Any kind of prerequisite structure, 0:03:39.000,0:03:42.000 such as the one for the major that might say you need to take this class before you take 0:03:42.000,0:03:45.000 that class, and this class before that class, right, 0:03:45.000,0:03:46.000 forms a 0:03:46.000,0:03:49.000 structure that also fits into the main graph. 0:03:49.000,0:03:52.000 In this case, there's a connection to those arcs. 0:03:52.000,0:03:54.000 That the prerequisite is such that 0:03:54.000,0:03:58.000 you take 106a, and it leads into 106b, and that connection doesn't really go in 0:03:58.000,0:04:01.000 the reverse, right? You don't start in 106b and move backwards. 0:04:01.000,0:04:05.000 Whereas in the case, for example, of the word ladder, all of these arcs are what we called undirected, 0:04:05.000,0:04:09.000 right? You can traverse from wood to word and back again, right? 0:04:09.000,0:04:10.000 They're equally 0:04:10.000,0:04:12.000 0:04:12.000,0:04:15.000 visitable or neighbors, in this case. So there may be a situation where you have directed 0:04:15.000,0:04:20.000 arcs where they really imply a one-way directionality through the paths. 0:04:20.000,0:04:22.000 Another one, lots and lots of 0:04:22.000,0:04:23.000 arrows that 0:04:23.000,0:04:25.000 tell you about the lifecycle of 0:04:25.000,0:04:26.000 0:04:26.000,0:04:30.000 Stanford relationships all captured on one 0:04:30.000,0:04:31.000 slide 0:04:31.000,0:04:34.000 in, sort of, the flowchart. So flowcharts have a lot of things about directionality. Like, you're 0:04:34.000,0:04:37.000 in this state, and you can move to this state, so that's your neighboring state, and 0:04:37.000,0:04:39.000 the arcs in there connect those things. 0:04:39.000,0:04:43.000 And so, as you will see, like, you get to start over here and, like, be single and love 0:04:43.000,0:04:47.000 it, and then you can, kind of, flirt with various degrees of seriousness, 0:04:47.000,0:04:50.000 and then eventually there evolves a trombone and serenade, which, you know, 0:04:50.000,0:04:53.000 you can't go back to flirting aimlessly after the trombone and serenade; there's no 0:04:53.000,0:04:55.000 arc that direction. 0:04:55.000,0:04:57.000 When somebody gets the trombone out, 0:04:57.000,0:04:59.000 you know, you take them seriously, 0:04:59.000,0:05:02.000 or you completely blow them off. There's the [inaudible] there. 0:05:02.000,0:05:04.000 And then there's dating, and then there's 0:05:04.000,0:05:08.000 two-phase commit, which is a very important part of any computer scientist's education, 0:05:08.000,0:05:10.000 and then, potentially, other 0:05:10.000,0:05:13.000 tracks of children and what not. But it does 0:05:13.000,0:05:15.000 show you, kind of, places you can go, right? 0:05:15.000,0:05:18.000 It turns out you actually don't get to go from flirt aimlessly to have baby, so 0:05:18.000,0:05:20.000 write that down - 0:05:20.000,0:05:23.000 not allowed in this graph. And 0:05:23.000,0:05:28.000 then it turns out some things that you've already seen are actually graphs in disguise. 0:05:28.000,0:05:32.000 I didn't tell you when I gave you the maze problem and its solution, the solving and 0:05:32.000,0:05:33.000 building of a maze, 0:05:33.000,0:05:37.000 that, in fact, what you're working on though is something that is a graph. 0:05:37.000,0:05:39.000 That a perfect maze, right, 0:05:39.000,0:05:40.000 so we started with this 0:05:40.000,0:05:44.000 fully disconnected maze. If you remember how the Aldis/Broder algorithm worked, it's like 0:05:44.000,0:05:47.000 you have all these nodes that have no connections between them, 0:05:47.000,0:05:51.000 and then you went bopping around breaking down walls, which is effectively 0:05:51.000,0:05:53.000 connecting up the nodes to each other, 0:05:53.000,0:05:57.000 and we kept doing that until actually all of the nodes were connected. So, in effect, we 0:05:57.000,0:06:00.000 were building a graph out of this big grid of cells 0:06:00.000,0:06:03.000 by knocking down those walls to build the connections, and when we were done, then 0:06:03.000,0:06:07.000 we made it so that it was possible to trace paths starting from that lower corner 0:06:07.000,0:06:10.000 and working our way through the neighbors all the way over to there. 0:06:10.000,0:06:13.000 So in the case of this, if you imagine that each of these cells was broken out in this little 0:06:13.000,0:06:16.000 thing, there'd be these neighboring connections where this guy has this 0:06:16.000,0:06:19.000 neighbor but no other ones because actually it has no 0:06:19.000,0:06:22.000 other directions you can move from there, but some of them, for example like this 0:06:22.000,0:06:23.000 one, 0:06:23.000,0:06:26.000 has a full set of four neighbors you can reach 0:06:26.000,0:06:28.000 depending on which walls are intact or not. 0:06:28.000,0:06:29.000 And so the 0:06:29.000,0:06:31.000 creation of that was creating a graph out 0:06:31.000,0:06:34.000 of a set of disconnected nodes, and then solving it is searching that 0:06:34.000,0:06:38.000 graph for a path. Here's one that is a little horrifying that came from the newspaper, which is social networking, kind of, a very big [inaudible] of, like, on the net who do you know, and who do they know, and how do those connections, like, linked in maybe help you get a job. This is one that was actually based on the liaisons, let's say, of a group of high school students, right, and it turns out there was a lot of more interconnected in that graph than I would expect, or appreciate, or approve of but interesting to, kind of, look at. So 0:06:38.000,0:06:39.000 let's talk, like, concretely. 0:06:39.000,0:06:43.000 How are we gonna make stuff work on graphs? Graphs turn out to actually - by showing you that, I'm trying to 0:06:43.000,0:06:46.000 get you this idea that a lot of things are really just graphs in disguise. That a lot 0:06:46.000,0:06:50.000 of the problems you may want to solve, if you can represent it as a graph, then things 0:06:50.000,0:06:53.000 you learn about how to manipulate graphs will help you to solve that kind of 0:06:53.000,0:06:54.000 problem. 0:06:54.000,0:06:59.000 So the basic idea of a graph, it is a recursive data structure based on the node where 0:06:59.000,0:07:02.000 nodes have connections to other nodes. 0:07:02.000,0:07:04.000 Okay. So that's, kind of, like a tree, 0:07:04.000,0:07:07.000 but it actually is more freeform than a tree, right? 0:07:07.000,0:07:11.000 In a tree, right, there's that single root node that has, kind of, a special place, and 0:07:11.000,0:07:15.000 there's this restriction about the connections that there's a single path 0:07:15.000,0:07:17.000 from every other node in the tree 0:07:17.000,0:07:20.000 starting from the root that leads to it. So you never can get there by 0:07:20.000,0:07:24.000 multiple ways, and there's no cycles, and it's all connected, and things like 0:07:24.000,0:07:25.000 that. 0:07:25.000,0:07:26.000 So, in terms of a graph, 0:07:26.000,0:07:29.000 it just doesn't place those restrictions. It says there's a node. 0:07:29.000,0:07:32.000 It has any number of pointers to other nodes. It potentially could even 0:07:32.000,0:07:36.000 have zero pointers to other's nodes, so it could be on a little island of its own out 0:07:36.000,0:07:38.000 here that has no incoming or outgoing flights, right? You 0:07:38.000,0:07:40.000 have to swim if you want to get there. 0:07:40.000,0:07:42.000 0:07:42.000,0:07:45.000 And then there's also not a rule that says that you can't have 0:07:45.000,0:07:47.000 multiple ways to get to the same 0:07:47.000,0:07:50.000 node. So if you're at B, and you're interested in getting to C, 0:07:50.000,0:07:52.000 you could take the direction connection here, 0:07:52.000,0:07:54.000 but you could also take a longer path 0:07:54.000,0:07:58.000 that bops through A and hits to C, and so there's more than one way to get there, 0:07:58.000,0:07:59.000 0:07:59.000,0:08:02.000 which would not be true in a tree structure where it has that constraint, 0:08:02.000,0:08:05.000 but that adds some interesting challenges to solving problems in the graphs 0:08:05.000,0:08:08.000 because there actually are so many different ways you can 0:08:08.000,0:08:11.000 get to the same place. We have to be careful avoiding 0:08:11.000,0:08:13.000 getting into circularities where we're 0:08:13.000,0:08:16.000 doing that, and also not redundantly doing work we've already done before because 0:08:16.000,0:08:18.000 we've visited it previously. 0:08:18.000,0:08:21.000 There can be cycles where 0:08:21.000,0:08:22.000 you can 0:08:22.000,0:08:24.000 completely come around. I this one, actually, 0:08:24.000,0:08:26.000 doesn't have a cycle in it. 0:08:26.000,0:08:27.000 Nope, it doesn't, 0:08:27.000,0:08:31.000 but if I add it - I could add a link, for example, from 0:08:31.000,0:08:34.000 C back to B, and then there would be this place you could just, kind of, keep 0:08:34.000,0:08:36.000 rolling around in the B, A, C 0:08:36.000,0:08:38.000 area. 0:08:38.000,0:08:41.000 So we call each of those guys, the circles here, the nodes, right? 0:08:41.000,0:08:44.000 The connection between them arcs. We 0:08:44.000,0:08:47.000 will talk about the arcs as being directed and undirected in different situations 0:08:47.000,0:08:51.000 where we imply that there's a directionality; you can travel the arc only one-way, 0:08:51.000,0:08:54.000 or in cases where it's undirected, it means it goes both ways. 0:08:54.000,0:08:56.000 We'll talk about two nodes being connected, 0:08:56.000,0:08:58.000 meaning there is a direct connection, an arc, between them, 0:08:58.000,0:09:02.000 and if they are not directly connected, there may possibly be a path 0:09:02.000,0:09:05.000 between them. A sequence of arcs forms a path that you can follow to get 0:09:05.000,0:09:07.000 from one node to another, 0:09:07.000,0:09:11.000 and then cycles just meaning a path that revisits one of the nodes that 0:09:11.000,0:09:14.000 was previously traveled on that path. So I've got 0:09:14.000,0:09:16.000 a little terminology. 0:09:16.000,0:09:18.000 Let's talk a little bit about how we're gonna represent it. 0:09:18.000,0:09:22.000 So before we start making stuff happen on them, it's, like, well, how does this - 0:09:22.000,0:09:26.000 in C++, right, what's the best way, or what are the reasonable ways you 0:09:26.000,0:09:27.000 might represent 0:09:27.000,0:09:30.000 this kind of connected structure? 0:09:30.000,0:09:31.000 So if I have 0:09:31.000,0:09:36.000 this four-node graph, A, B, C, D over where with the connections that are shown with 0:09:36.000,0:09:38.000 direction, in this case, I have directed arcs. 0:09:38.000,0:09:41.000 That one way to think of it is that what a 0:09:41.000,0:09:45.000 graph really is is a set of nodes and a set of arcs, and by set, I mean, sort of, just the 0:09:45.000,0:09:48.000 generalization. There might be a vector of arcs; there might be a vector of 0:09:48.000,0:09:51.000 nodes, whatever, but some collection of arcs, some collection of nodes, 0:09:51.000,0:09:53.000 and that the 0:09:53.000,0:09:56.000 nodes might have information about what location they represent, or 0:09:56.000,0:09:59.000 what word, or what entity, you know, in a social network that they are, 0:09:59.000,0:10:03.000 and then the arcs would show who is directly connected to whom. 0:10:03.000,0:10:06.000 And so one way to manage that, right, is just to have two independent collections - a collection 0:10:06.000,0:10:08.000 of nodes and a 0:10:08.000,0:10:11.000 collection of arcs, and that when I need to know things about connections, I'll have to, kind of, use both 0:10:11.000,0:10:12.000 in tandem. 0:10:12.000,0:10:14.000 So if I want to know if A is connected to B, 0:10:14.000,0:10:18.000 I might actually have to just walk through the entire set of arcs trying to find 0:10:18.000,0:10:22.000 one that connects A and B as its endpoints. If I want to see if 0:10:22.000,0:10:25.000 C is connected to B, same thing, just walk down the whole set. 0:10:25.000,0:10:27.000 0:10:27.000,0:10:29.000 0:10:29.000,0:10:31.000 That's probably the simplest thing to do, right, is to just, kind of, 0:10:31.000,0:10:35.000 have them just be maintained as the two things that are used in conjunction. 0:10:35.000,0:10:37.000 More likely, what you're gonna want to do 0:10:37.000,0:10:39.000 is provide some access that 0:10:39.000,0:10:42.000 when at a current node, you're currently trying to do some processing 0:10:42.000,0:10:47.000 from the Node B, it would be convenient if you could easily access those nodes 0:10:47.000,0:10:51.000 that emanate or outgo from the B node. That's typically the thing you're trying 0:10:51.000,0:10:52.000 to do is 0:10:52.000,0:10:55.000 at B try to visit its neighbors, 0:10:55.000,0:10:58.000 and rather that having to, kind of, pluck them out of the entire collection, 0:10:58.000,0:11:01.000 it might be handy if they were already, kind of, stored in such a way to make it 0:11:01.000,0:11:03.000 easy for you to get to that. 0:11:03.000,0:11:06.000 So the two ways that try to represent adjacency 0:11:06.000,0:11:10.000 in a more efficient way are the adjacency list and the adjacency matrix. 0:11:10.000,0:11:14.000 So in an adjacency list representation, we have a way of associating with each node, 0:11:14.000,0:11:17.000 probably just by storing it in the node structure itself, 0:11:17.000,0:11:21.000 those arcs that outgo from there. So, in this case, A has as direct 0:11:21.000,0:11:22.000 connection to the B 0:11:22.000,0:11:23.000 and C Nodes, 0:11:23.000,0:11:26.000 and so I would have that information stored with the A Node, which is my 0:11:26.000,0:11:29.000 vector or set of outgoing connections. 0:11:29.000,0:11:33.000 Similarly, B has one outgoing connection, which goes to D. C has an outgoing connection to D, and 0:11:33.000,0:11:37.000 then D has outgoing connections that lead back to A and 0:11:37.000,0:11:40.000 to B. So at a particular node I could say, well, who's my neighbors? 0:11:40.000,0:11:41.000 It would be 0:11:41.000,0:11:46.000 easily accessible and not have to be retrieved, or searched, or filtered. 0:11:46.000,0:11:48.000 Another way of doing that 0:11:48.000,0:11:51.000 is to realize that, in a sense, what we have is this, kind of, 0:11:51.000,0:11:53.000 end by end grid 0:11:53.000,0:11:56.000 where each node is potentially connected to 0:11:56.000,0:12:00.000 the other N -1 nodes in that graph. And so if I just build 0:12:00.000,0:12:03.000 a 2x2 matrix that's labeled with 0:12:03.000,0:12:06.000 all the node names on this side and all the node names across the top, that the 0:12:06.000,0:12:08.000 intersection of this says 0:12:08.000,0:12:11.000 is there an arc that leads away from A and 0:12:11.000,0:12:14.000 ends at B? Yes, there is. Is there one from A to C? 0:12:14.000,0:12:18.000 And in the places where there is not an existing connection, a self loop or 0:12:18.000,0:12:19.000 0:12:19.000,0:12:23.000 just a connection that doesn't exist, right, I have an empty slot. So maybe these would be Booleans 0:12:23.000,0:12:25.000 true and false or something, or maybe there'd be some more information here 0:12:25.000,0:12:27.000 about the distance 0:12:27.000,0:12:30.000 or other associated arc properties for that 0:12:30.000,0:12:31.000 particular connection. 0:12:31.000,0:12:34.000 In this case, it's actually very easy then to actually do the really quick 0:12:34.000,0:12:35.000 search of 0:12:35.000,0:12:38.000 I'm at A and I want to know if I can get to B, then it's really just a 0:12:38.000,0:12:41.000 matter of reaching right into the 0:12:41.000,0:12:45.000 slot in constant time in that matrix to pull it out, and 0:12:45.000,0:12:47.000 so 0:12:47.000,0:12:49.000 this one involves a lot of searching to find things. This one involves, perhaps, a 0:12:49.000,0:12:52.000 little bit of searching. If I want to know if A is connected to B, I might have to look 0:12:52.000,0:12:55.000 through all its outgoing connections, right? This one gives me that direct 0:12:55.000,0:12:56.000 access, 0:12:56.000,0:13:00.000 but the tradeoffs here has to do with where the space starts building up, right? 0:13:00.000,0:13:03.000 A full NxN matrix could be very big, 0:13:03.000,0:13:06.000 and sometimes we're allocating space as though there are a full set of 0:13:06.000,0:13:09.000 connections between all the nodes, every node connected to every other, 0:13:09.000,0:13:12.000 and in the cases where a lot of those things are false, there's a lot of 0:13:12.000,0:13:13.000 capacity 0:13:13.000,0:13:16.000 that we have set aside that we're not really tapping into. 0:13:16.000,0:13:19.000 And so in the case of a graph that we call dense, 0:13:19.000,0:13:23.000 where there's a lot of connections, it will use a lot of that capacity, and it might make sense 0:13:23.000,0:13:27.000 in a graph that's more sparse, where there's a few connections. So you have 0:13:27.000,0:13:31.000 thousands of nodes, but maybe only three or four on average are connected to one. 0:13:31.000,0:13:35.000 Then having allocated 1,000 slots of which to only fill in three 0:13:35.000,0:13:36.000 might become 0:13:36.000,0:13:40.000 rather space inefficient. You might prefer an adjacency list representation that 0:13:40.000,0:13:41.000 can get you 0:13:41.000,0:13:44.000 to the ones you're actually using versus the other, 0:13:44.000,0:13:47.000 but in terms of the adjacency matrices is very fast, 0:13:47.000,0:13:48.000 right? A lot of space thrown at this 0:13:48.000,0:13:51.000 gives you this immediate 0,1 access to know 0:13:51.000,0:13:55.000 who your neighbors are. So 0:13:55.000,0:13:59.000 we're gonna use the adjacency list, kind of, strategy here 0:13:59.000,0:14:02.000 for the code we're gonna do today, and it's, kind 0:14:02.000,0:14:07.000 of, a good compromise between the two alternatives that we've seen there. 0:14:07.000,0:14:08.000 So I have a struct node. 0:14:08.000,0:14:11.000 It has some information for the node. Given that I'm not being very specific 0:14:11.000,0:14:14.000 about what the graph is, I'm just gonna, kind of, leave that unspecified. It might be that it's 0:14:14.000,0:14:17.000 the city name. It might be that it's a word. It might be that it's a person in a 0:14:17.000,0:14:19.000 social network, you know, 0:14:19.000,0:14:23.000 some information that represents what this node is an entity, 0:14:23.000,0:14:28.000 and then there's gonna be a set of arcs or a set of nodes it's connected to, 0:14:28.000,0:14:32.000 and so I'm using vector here. I could be using set. I could be using a 0:14:32.000,0:14:35.000 raw array, or a link list, or any of these things. 0:14:35.000,0:14:38.000 I need to use some unbounded collection though because there's no guarantee that 0:14:38.000,0:14:42.000 they will be 0, or 1, or 2 the way there is in a link list or a tree where there's a 0:14:42.000,0:14:45.000 specified number of outgoing links. There can be any number of them, so I'm just leaving 0:14:45.000,0:14:46.000 0:14:46.000,0:14:47.000 a 0:14:47.000,0:14:51.000 variable size collection here to do that work for me. 0:14:51.000,0:14:52.000 The graph itself, 0:14:52.000,0:14:55.000 so I have this idea of all the nodes in the graph, right, would each get a new 0:14:55.000,0:14:58.000 node structure and then be wired up to the other nodes that they are connected 0:14:58.000,0:15:00.000 to, 0:15:00.000,0:15:02.000 but then when I'm trying to operate on that graph, 0:15:02.000,0:15:06.000 I can't just take one pointer and say here's a pointer to some node in the 0:15:06.000,0:15:09.000 graph and say this is - and from here you have accessed everything. In the way 0:15:09.000,0:15:11.000 that a tree, 0:15:11.000,0:15:13.000 the only thing we need to keep track of is the pointer to the root, 0:15:13.000,0:15:17.000 or a link list, the only thing we keep a pointer to, typically, is that frontmost 0:15:17.000,0:15:19.000 cell, and from there, we can reach everything else. 0:15:19.000,0:15:23.000 There is no special head or root cell in a graph. 0:15:23.000,0:15:27.000 A graph is this 0:15:27.000,0:15:30.000 being, kind of, a crazy collection without a lot of rules about 0:15:30.000,0:15:35.000 how they are connected. In fact, it's not even guaranteed that they totally are 0:15:35.000,0:15:36.000 connected. 0:15:36.000,0:15:38.000 That I can't 0:15:38.000,0:15:41.000 guarantee that if I had a pointer, for example, to C, 0:15:41.000,0:15:44.000 right, I may or may not be able to reach all the nodes from there. 0:15:44.000,0:15:49.000 If I had a pointer to A, right, A can get to C and B, and then also 0:15:49.000,0:15:52.000 down to D, but, for example, there could just be an E node over here that was only 0:15:52.000,0:15:55.000 connected to C or not connected to anything at all for that matter, connected 0:15:55.000,0:15:57.000 to F in their own little island. 0:15:57.000,0:16:01.000 That there is no way to identify some special node from which you could reach 0:16:01.000,0:16:02.000 all the nodes. 0:16:02.000,0:16:04.000 There isn't a special root node, 0:16:04.000,0:16:07.000 and you can't even just arbitrarily pick one to be the root because actually there's no 0:16:07.000,0:16:10.000 guarantee that it will be connected to all of the others. So it would not give you 0:16:10.000,0:16:13.000 access to the full entirety of your graph if you just picked on and said I want anything 0:16:13.000,0:16:16.000 reachable from here, it might not get you the whole thing. So, 0:16:16.000,0:16:18.000 typically, what you need to really operate on is 0:16:18.000,0:16:21.000 the entire collection of node stars. You'll have a set or a vector 0:16:21.000,0:16:24.000 that contains all of the nodes that are in the graph, 0:16:24.000,0:16:26.000 and then within that, there's a bunch of other connections that are 0:16:26.000,0:16:35.000 being made on the graph connectivity that you're also exploring. 0:16:35.000,0:16:37.000 So let's make it a little bit more concrete 0:16:37.000,0:16:39.000 0:16:39.000,0:16:40.000 and talk a little bit about 0:16:40.000,0:16:43.000 what really is a node; what really is an arc? 0:16:43.000,0:16:46.000 There may be, actually, be some more information I need to store than just a 0:16:46.000,0:16:48.000 node is connected to other nodes. 0:16:48.000,0:16:51.000 So in the case of a list or a tree, 0:16:51.000,0:16:54.000 the next field and the left and the right field are really just pointers for the 0:16:54.000,0:16:56.000 purpose of organization. 0:16:56.000,0:16:58.000 That's the pointer to the next cell. 0:16:58.000,0:17:02.000 There's no data that is really associated with that link itself. 0:17:02.000,0:17:05.000 That's not true in the case of a graph. 0:17:05.000,0:17:09.000 That often what the links that connect up the nodes 0:17:09.000,0:17:12.000 actually do have a real role to play in terms of being data. 0:17:12.000,0:17:14.000 It may be that what they tell you is 0:17:14.000,0:17:18.000 what road you're taking here, or how long this path is, or how much this 0:17:18.000,0:17:22.000 flight costs, or what time this flight leaves, or how full this flight already is, 0:17:22.000,0:17:24.000 or who knows, you know, just other information 0:17:24.000,0:17:27.000 that is more than just this nodes connected to that one. It's, like, how is it 0:17:27.000,0:17:30.000 connected; what is it connected by, and what are the details of how that 0:17:30.000,0:17:31.000 connection 0:17:31.000,0:17:32.000 0:17:32.000,0:17:34.000 is being represented? 0:17:34.000,0:17:38.000 So it's likely that what you really want is not just pointers to other nodes but 0:17:38.000,0:17:42.000 information about that actual link stored with an arc structure. 0:17:42.000,0:17:44.000 So, typically, we're gonna have 0:17:44.000,0:17:45.000 an arc 0:17:45.000,0:17:48.000 which has information about that arc, how long, how much it costs, you know, 0:17:48.000,0:17:50.000 what flight number it is, 0:17:50.000,0:17:52.000 that will have pointers to the start and end node that it is connecting. 0:17:52.000,0:17:54.000 The node 0:17:54.000,0:17:57.000 then has a collection of arcs, 0:17:57.000,0:18:00.000 and those arcs are expected to, in the adjacency list form, will all be arcs 0:18:00.000,0:18:02.000 that start at this node. 0:18:02.000,0:18:05.000 So their start node is equal to the one that's holding onto to them, and then 0:18:05.000,0:18:06.000 they have an end 0:18:06.000,0:18:10.000 pointer that points to the node at the other end of the arc. Well, 0:18:10.000,0:18:13.000 this gets us into a little C++ bind 0:18:13.000,0:18:16.000 because I've described these data structures 0:18:16.000,0:18:17.000 in a way 0:18:17.000,0:18:22.000 that they both depend on each other; that an arc has pointers to the start and the end 0:18:22.000,0:18:22.000 node. 0:18:22.000,0:18:27.000 A node has a collection of arcs, which is probably a vector or a set of arc pointers, 0:18:27.000,0:18:28.000 and so 0:18:28.000,0:18:31.000 I'm starting to define the structure for arc, 0:18:31.000,0:18:33.000 and then I want to talk about node in it, 0:18:33.000,0:18:35.000 and I think, oh, okay, well, then I better define node first because C++ always likes to 0:18:35.000,0:18:38.000 see things in order. Remember, it always wants to see A, 0:18:38.000,0:18:42.000 and then B if B uses A, and so in terms of how your functions and data 0:18:42.000,0:18:45.000 structures work, you've always gotten this idea like, well, go do the one first. Well, if I say, 0:18:45.000,0:18:48.000 okay, arc's gonna need node. I start to write arc, and I say, oh, I 0:18:48.000,0:18:50.000 need to talk about node. Well, I better put node up here, 0:18:50.000,0:18:54.000 right? But then when I'm starting to write node, I start talking about arc, and they have a 0:18:54.000,0:18:57.000 circular reference to each other, right? They both 0:18:57.000,0:19:00.000 want to depend on the other one before we've gotten around to telling the 0:19:00.000,0:19:01.000 compiler about it. 0:19:01.000,0:19:04.000 One of them has to go first, right? What are we gonna do? 0:19:04.000,0:19:08.000 What we need to do is use the C++ forward reference 0:19:08.000,0:19:09.000 as a 0:19:09.000,0:19:12.000 little hint to the compiler 0:19:12.000,0:19:13.000 that 0:19:13.000,0:19:17.000 we are gonna be defining a Node T structure in a little bit; 0:19:17.000,0:19:18.000 we're gonna get to it, 0:19:18.000,0:19:22.000 but first we're gonna define the Arc T structure that uses that Node T, and then 0:19:22.000,0:19:25.000 we're gonna get around to it. So we're gonna give it, like, a little heads up that says 0:19:25.000,0:19:29.000 there later will be an Act 2 of this play. We're gonna introduce a 0:19:29.000,0:19:31.000 character called Node T. 0:19:31.000,0:19:33.000 So you can now 0:19:33.000,0:19:34.000 start talking about Node 0:19:34.000,0:19:37.000 T in some simple ways 0:19:37.000,0:19:40.000 based on your agreement to the compiler that you plan on telling it more 0:19:40.000,0:19:41.000 about Node T later. 0:19:41.000,0:19:45.000 So the struct Node T says there will be a struct called Node T. 0:19:45.000,0:19:47.000 Okay, the compiler says, okay, I'll write that down. 0:19:47.000,0:19:51.000 You start defining struct Arc T, and it says, oh, I see you have these pointers 0:19:51.000,0:19:54.000 to the node T. Okay. Well, you told me there'd be a struct like that; I guess that'll work out. 0:19:54.000,0:19:58.000 And now you told it what struct Node T is. It's, like, oh, I have a vector 0:19:58.000,0:19:59.000 with these pointers to arc, 0:19:59.000,0:20:02.000 and then it says, okay, now I see the whole picture, and 0:20:02.000,0:20:03.000 it all worked out. 0:20:03.000,0:20:08.000 So it's just a little bit of a 0:20:08.000,0:20:11.000 requirement of how the C++ compiler likes to see stuff in order without ever having 0:20:11.000,0:20:16.000 to go backwards to check anything out again. Okay. So 0:20:16.000,0:20:19.000 I got myself some node structures, got 0:20:19.000,0:20:21.000 some arc structures; they're all 0:20:21.000,0:20:22.000 working together. 0:20:22.000,0:20:24.000 I'm gonna try to do some traversals. 0:20:24.000,0:20:26.000 Try to do some operations that 0:20:26.000,0:20:29.000 work their way around the graph. 0:20:29.000,0:20:30.000 So 0:20:30.000,0:20:33.000 tree traversals, right, the pre, and post, and in order 0:20:33.000,0:20:36.000 are ways that you start at the root, and you visit all the nodes on the tree, 0:20:36.000,0:20:40.000 a very common need to, kind of, process, to delete nodes, 0:20:40.000,0:20:43.000 to print the nodes, to whatnot. So we might want to do the same thing on graphs. I've got a 0:20:43.000,0:20:47.000 graph out there, and I'd like to go exploring it. Maybe I just want to print, and if 0:20:47.000,0:20:50.000 I'm in the Gates building, what are all the places that I can reach from 0:20:50.000,0:20:53.000 the Gates building? Where can I go to; what 0:20:53.000,0:20:54.000 options do I have? 0:20:54.000,0:20:57.000 What I'm gonna do is a traversal 0:20:57.000,0:21:01.000 starting at a node and, kind of, working my way outward to the visible, reachable 0:21:01.000,0:21:04.000 nodes that I can follow those paths to get there. 0:21:04.000,0:21:07.000 We're gonna look at two different ways to do this. 0:21:07.000,0:21:09.000 One is Depth-first and one is Breadth-first, 0:21:09.000,0:21:13.000 and I'll show you both algorithms, and they 0:21:13.000,0:21:15.000 will visit the same nodes; 0:21:15.000,0:21:19.000 when all is done and said, they will visit all the reachable nodes starting from a 0:21:19.000,0:21:20.000 position, 0:21:20.000,0:21:21.000 but they will visit them in different order. 0:21:21.000,0:21:25.000 So just like the pre/post in order tree traversals, they visit all the same 0:21:25.000,0:21:28.000 notes. It's just a matter of when they get around to doing it 0:21:28.000,0:21:29.000 is how 0:21:29.000,0:21:32.000 we distinguish depth and breadth-first traversals. 0:21:32.000,0:21:34.000 Because we have this graph structure 0:21:34.000,0:21:40.000 that has this loose connectivity and the possibility of cycles and multiple paths 0:21:40.000,0:21:41.000 to the same node, 0:21:41.000,0:21:44.000 we are gonna have to be a little bit careful at how we do our work to make 0:21:44.000,0:21:48.000 sure that we don't end up getting stuck in some infinite loop when we keep 0:21:48.000,0:21:51.000 going around the ABC cycle in a particular graph. 0:21:51.000,0:21:55.000 That we need to realize when we're revisiting something we've seen before, 0:21:55.000,0:21:56.000 and then not 0:21:56.000,0:21:57.000 0:21:57.000,0:21:59.000 trigger, kind of, an 0:21:59.000,0:22:03.000 exploration we've already done. 0:22:03.000,0:22:06.000 So let's look at depth-first first. 0:22:06.000,0:22:09.000 This is probably the simpler of the two. They're both, actually, pretty simple. 0:22:09.000,0:22:13.000 Depth-first traversal uses recursion to do its work, 0:22:13.000,0:22:18.000 and the idea is that you pick a starting node - let's say I want to start at Gates, 0:22:18.000,0:22:21.000 and that maybe what I'm trying to find is all the reachable nodes from Gates. 0:22:21.000,0:22:24.000 If I don't have a starting node, then I can just pick one arbitrarily from the 0:22:24.000,0:22:27.000 graph because there is actually no root node of special status, 0:22:27.000,0:22:30.000 and the idea is to go deep. That 0:22:30.000,0:22:33.000 in the case of, for example, a maze, it might be that I 0:22:33.000,0:22:36.000 choose to go north, and then I just keep going north. I just go north, north, north, north, north until 0:22:36.000,0:22:40.000 I run into a dead wall, and then I say, oh, I have to go east, and I go east, east, east, east, east. The idea is to 0:22:40.000,0:22:42.000 just go as far away from 0:22:42.000,0:22:46.000 the original state as I can, just keep going outward, outward, outward, outward, 0:22:46.000,0:22:47.000 outward, outward, outward, 0:22:47.000,0:22:50.000 and eventually I'm gonna run into some dead end, and that dead end could be I 0:22:50.000,0:22:52.000 get to a node that has no outgoing arcs, 0:22:52.000,0:22:54.000 or it could be that I get to a node 0:22:54.000,0:22:57.000 that whose only arcs come back to places I've already been, 0:22:57.000,0:22:58.000 so it cycles back on itself, in 0:22:58.000,0:23:01.000 which case, that also is really a dead end. 0:23:01.000,0:23:03.000 So you go deep. 0:23:03.000,0:23:06.000 You go north, and you see as far as you can get, and 0:23:06.000,0:23:10.000 only after you've, kind of, fully explored everything reachable from starting in 0:23:10.000,0:23:11.000 the north direction 0:23:11.000,0:23:13.000 do you backtrack, 0:23:13.000,0:23:16.000 unmake that decision, and say, well, how about not north; how about east, right? And then 0:23:16.000,0:23:19.000 find everything you can get to from the east, and 0:23:19.000,0:23:22.000 after, kind of, going all through that, come back and try south, and so on. So all of the 0:23:22.000,0:23:23.000 options you have 0:23:23.000,0:23:26.000 exhaustively getting through all your 0:23:26.000,0:23:26.000 neighbors 0:23:26.000,0:23:29.000 in this, kind of, go-deep strategy. Well, 0:23:29.000,0:23:32.000 the important thing is you realize if you've got this recursion going, so what 0:23:32.000,0:23:35.000 you're actually doing, basically, is saying I'm gonna step to my neighbor to the 0:23:35.000,0:23:40.000 right, and I'm gonna explore all the things, so do a depth-first search from there 0:23:40.000,0:23:43.000 to find all the places you can reach, and that one says, well, I'm gonna step through this spot 0:23:43.000,0:23:45.000 and go to one of my neighbors. 0:23:45.000,0:23:48.000 And so we can't do that infinitely without getting into trouble. We need 0:23:48.000,0:23:53.000 to have a base case that stops that recursion, and the two cases that I've 0:23:53.000,0:23:56.000 talked about - one is just when you have no neighbors left to go. 0:23:56.000,0:23:57.000 So when you 0:23:57.000,0:23:59.000 0:23:59.000,0:24:04.000 get to a node that actually, kind of, is a dead end, or that neighbor only has visited 0:24:04.000,0:24:05.000 neighbors 0:24:05.000,0:24:09.000 that surround it. So a 0:24:09.000,0:24:10.000 very simple little piece of 0:24:10.000,0:24:13.000 code that depends on recursion doing its job. 0:24:13.000,0:24:17.000 In this case, we need to know which nodes we have visited, right? We're gonna have to 0:24:17.000,0:24:19.000 have some kind of strategy for saying I 0:24:19.000,0:24:21.000 have been here before. 0:24:21.000,0:24:24.000 In this case, I'm using just a set because that's an easy thing to do. 0:24:24.000,0:24:27.000 I make a set of node pointers, and I update it 0:24:27.000,0:24:29.000 each time I see a new node; I add it to 0:24:29.000,0:24:34.000 that set, and if I ever get to a node who is already previously visited 0:24:34.000,0:24:38.000 from that, there's no reason to do any work from there; we can immediately 0:24:38.000,0:24:41.000 stop. So 0:24:41.000,0:24:43.000 starting that step would be empty. 0:24:43.000,0:24:46.000 If the visit already contains the node that we're currently trying to visit, 0:24:46.000,0:24:50.000 then there's nothing to do. Otherwise, we go ahead and add it, and there's a little node 0:24:50.000,0:24:53.000 here that says we'll do something would occur. What am I trying to do here? Am I trying to print those nodes? Am 0:24:53.000,0:24:56.000 I trying to draw pictures on those nodes, highlight those nodes, you know, 0:24:56.000,0:25:00.000 something I'm doing with them. I don't know what it is, but this is the structure for the depth-first search 0:25:00.000,0:25:01.000 here, and 0:25:01.000,0:25:02.000 then 0:25:02.000,0:25:04.000 for all of the neighbors, 0:25:04.000,0:25:07.000 using the vector of outgoing connections, 0:25:07.000,0:25:08.000 then I run a depth-first search 0:25:08.000,0:25:11.000 on each of those neighbors 0:25:11.000,0:25:14.000 passing that visited set by reference 0:25:14.000,0:25:16.000 through the whole operation. So I will 0:25:16.000,0:25:20.000 always be adding to and modifying this one set, so I will 0:25:20.000,0:25:24.000 be able to know where I am and where I still need to go. 0:25:24.000,0:25:27.000 So if we trace this guy in action - 0:25:27.000,0:25:31.000 if I start arbitrarily from A, I say I'd like to begin with A. 0:25:31.000,0:25:35.000 Then the depth-first search is gonna pick a neighbor, and let's say I happen to just work 0:25:35.000,0:25:38.000 over my neighbors in alphabetical order. 0:25:38.000,0:25:41.000 It would say okay, well, I've picked B. Let's go to everywhere we can get to from B. So A 0:25:41.000,0:25:42.000 gets marked as visited. 0:25:42.000,0:25:44.000 I move onto B; I say, well, 0:25:44.000,0:25:48.000 B, search everywhere you can to from B, and B says okay, well, I need to pick a 0:25:48.000,0:25:50.000 neighbor, how about I pick 0:25:50.000,0:25:51.000 C? And says 0:25:51.000,0:25:53.000 and now, 0:25:53.000,0:25:56.000 having visited C, go everywhere you can get to from C. Well, 0:25:56.000,0:25:59.000 C has no neighbors, right, no outgoing connections whatsoever. 0:25:59.000,0:26:02.000 So C says, well, okay, everywhere you can get to from C is C, 0:26:02.000,0:26:04.000 you know, I'm done. 0:26:04.000,0:26:09.000 That will cause it to backtrack to this place, to B, and B says okay, I explored everywhere I can get to 0:26:09.000,0:26:10.000 from C; 0:26:10.000,0:26:13.000 what other neighbors do I have? B has no other neighbors. So, in fact, B backtracks all the way 0:26:13.000,0:26:15.000 back to A. 0:26:15.000,0:26:19.000 So now A says okay, well, I went everywhere I can get to from B, 0:26:19.000,0:26:20.000 right? Let me look at my next neighbor. 0:26:20.000,0:26:23.000 My next neighbor is C. Ah, 0:26:23.000,0:26:25.000 but C, already visited, 0:26:25.000,0:26:27.000 so there's no reason to 0:26:27.000,0:26:29.000 go crazy on that thing. So, in fact, I can 0:26:29.000,0:26:33.000 immediately see that I've visited that, so I don't have any further path to look at 0:26:33.000,0:26:33.000 there. 0:26:33.000,0:26:35.000 I'll hit D then 0:26:35.000,0:26:37.000 as the next one in sequence. So 0:26:37.000,0:26:40.000 at this point, I've done everything reachable from the B arm, everything 0:26:40.000,0:26:43.000 reachable from the C arm, and now I'm starting on the D arm. I said okay, where can I get to 0:26:43.000,0:26:44.000 from D? 0:26:44.000,0:26:46.000 Well, I can get to E. All right, 0:26:46.000,0:26:48.000 where can I get to from E? 0:26:48.000,0:26:52.000 E goes to F, so the idea is you think of it going deep. It's going as far away as it 0:26:52.000,0:26:56.000 can, as deep as possible, before it, kind of, unwinds 0:26:56.000,0:26:58.000 getting to the F that has no neighbors 0:26:58.000,0:27:01.000 and saying okay. Well, how about - where can I get to 0:27:01.000,0:27:04.000 also from E? I can get to G. 0:27:04.000,0:27:07.000 From G where can I get to? Nowhere, 0:27:07.000,0:27:09.000 so we, kind of, unwind the 0:27:09.000,0:27:12.000 D arm, and then we will eventually get back to and hit the H. 0:27:12.000,0:27:16.000 So the order they actually were hit in this case happened to be alphabetical. I 0:27:16.000,0:27:18.000 assigned them such that that would come out that way. 0:27:18.000,0:27:21.000 That's not really a property of what depth-first search does, 0:27:21.000,0:27:24.000 but think of it as like it's going as deep as possible, as far away as possible, 0:27:24.000,0:27:28.000 and so it makes a choice - it's pretty much like the recursive backtrackers that we've seen. In 0:27:28.000,0:27:31.000 fact, they make a choice, commit to it, and just move forward on it, never, kind of, 0:27:31.000,0:27:35.000 giving a backwards glance, and only if the whole process 0:27:35.000,0:27:35.000 bottoms out 0:27:35.000,0:27:39.000 does it come back and start unmaking decisions and try alternatives, 0:27:39.000,0:27:43.000 eventually exploring everything reachable from A 0:27:43.000,0:27:46.000 when all is done and said. So if 0:27:46.000,0:27:49.000 you, for example, use that on the maze, 0:27:49.000,0:27:52.000 right, the depth-first search on a maze is very much the same strategy we're talking about 0:27:52.000,0:27:53.000 here. 0:27:53.000,0:27:56.000 Instead of the way we did it was actually breadth-first search, if you 0:27:56.000,0:27:57.000 remember back to that assignment, 0:27:57.000,0:28:00.000 the depth-first search alternative is actually doing it the - 0:28:00.000,0:28:01.000 just go deep, 0:28:01.000,0:28:04.000 make a choice, go with it, run all the way until it bottoms out, 0:28:04.000,0:28:12.000 and only then back up and try some other alternatives. 0:28:12.000,0:28:15.000 The breadth-first traversal, gonna hit 0:28:15.000,0:28:16.000 the same nodes 0:28:16.000,0:28:19.000 but just gonna hit them in a different way. 0:28:19.000,0:28:21.000 If my goal were to actually hit all of them, 0:28:21.000,0:28:25.000 then either of them is gonna be a fine strategy. There may be some reason why 0:28:25.000,0:28:25.000 0:28:25.000,0:28:29.000 I'm hoping to stop the search early, and that actually might dictate why I would prefer 0:28:29.000,0:28:31.000 which ones to look at first. 0:28:31.000,0:28:35.000 The breadth-first traversal is going to explore things in terms of distance, 0:28:35.000,0:28:40.000 in this case, expressed in terms of number of hops away from the starting 0:28:40.000,0:28:40.000 node. 0:28:40.000,0:28:44.000 So it's gonna look at everything that's one hop away, and then go back and look at 0:28:44.000,0:28:46.000 everything two hops away, and then three hops away. 0:28:46.000,0:28:49.000 And so if the goal, for example, was to find a node 0:28:49.000,0:28:51.000 in a shortest path connection between it, 0:28:51.000,0:28:55.000 then breadth-first search, by looking at all the ones one hop away, if it was a one hop 0:28:55.000,0:28:56.000 path, it'll find it, 0:28:56.000,0:28:59.000 and if it's a two hop path, it'll find it on that next round, and then the three hop 0:28:59.000,0:29:00.000 path, 0:29:00.000,0:29:02.000 and there might be paths that are 10, and 20, and 100 0:29:02.000,0:29:03.000 steps long, 0:29:03.000,0:29:06.000 but if I find it earlier, I won't go looking at them. It actually might prove to 0:29:06.000,0:29:09.000 be a more efficient way to get to the shortest solution as opposed to depth-first 0:29:09.000,0:29:11.000 search which go off and try all these deep 0:29:11.000,0:29:14.000 paths that may not ever end up where I want it to be 0:29:14.000,0:29:16.000 before it eventually made its way to the short 0:29:16.000,0:29:20.000 solution I was wanting to find. So the thing 0:29:20.000,0:29:21.000 is we have the starting node. 0:29:21.000,0:29:24.000 We're gonna visit all the immediate neighbors. 0:29:24.000,0:29:27.000 So, basically, a loop over the 0:29:27.000,0:29:28.000 immediate one, 0:29:28.000,0:29:31.000 and then while we're doing that, we're gonna actually, kind of, be gathering up that 0:29:31.000,0:29:33.000 next generation. 0:29:33.000,0:29:33.000 0:29:33.000,0:29:37.000 Sometimes that's called the Frontier, sort of, advancing the frontier 0:29:37.000,0:29:39.000 of the nodes to explore next, 0:29:39.000,0:29:42.000 so all the nodes that are two hops away, 0:29:42.000,0:29:44.000 and then while we're processing the ones that are two hops away, we're gonna 0:29:44.000,0:29:47.000 be gathering the frontier 0:29:47.000,0:29:50.000 that is three hops away. And so at each stage we'll be, kind of, processing a 0:29:50.000,0:29:54.000 generation but while assembling the generation N+1 0:29:54.000,0:29:58.000 in case we need to go further to find those things. 0:29:58.000,0:30:00.000 We'll use an auxiliary data structure during 0:30:00.000,0:30:04.000 this. That management of the frontier, keeping track of the nodes that are, kind 0:30:04.000,0:30:05.000 of, 0:30:05.000,0:30:07.000 staged for the 0:30:07.000,0:30:09.000 subsequent iterations, 0:30:09.000,0:30:12.000 the queue is a perfect data structure for that. If I put the 0:30:12.000,0:30:14.000 initial neighbors into a queue, 0:30:14.000,0:30:19.000 as I dequeue them, if I put their neighbors on the back of the queue, so enqueue them 0:30:19.000,0:30:21.000 behind all the one hop neighbors, then 0:30:21.000,0:30:24.000 if I do that with code, as I'm processing Generation 1 off the front of the queue, I'll be 0:30:24.000,0:30:27.000 stacking Generation 2 in the back of the queue, 0:30:27.000,0:30:30.000 and then when all of Generation 1 is exhausted, I'll be processing Generation 0:30:30.000,0:30:31.000 2, 0:30:31.000,0:30:35.000 but, kind of, enqueuing Generation 3 behind it. 0:30:35.000,0:30:39.000 Same deal that we had with depth-first search though is we will have to be 0:30:39.000,0:30:41.000 careful about cycles will pull past that 0:30:41.000,0:30:44.000 when there's more than one possibility of how we get to something, we don't 0:30:44.000,0:30:46.000 want to, kind of, keep going around that cycle 0:30:46.000,0:30:52.000 or do a lot of redundant work visiting nodes that we've already seen. So 0:30:52.000,0:30:55.000 I have the same idea of keeping track of a set 0:30:55.000,0:30:57.000 that knows what 0:30:57.000,0:31:00.000 we've previously visited, 0:31:00.000,0:31:01.000 and the 0:31:01.000,0:31:05.000 initial node gets enqueued just, kind of, by itself, just like Generation 0 gets 0:31:05.000,0:31:07.000 put into the queue, 0:31:07.000,0:31:10.000 and then while the queue is not empty, so while there's still neighbors I haven't 0:31:10.000,0:31:11.000 yet visited, 0:31:11.000,0:31:13.000 I dequeue the frontmost one, 0:31:13.000,0:31:16.000 if it hasn't already been visited on some previous iteration, then I mark it as 0:31:16.000,0:31:17.000 visited, 0:31:17.000,0:31:22.000 and then I enqueue all of its children or neighbors at the back of the queue. 0:31:22.000,0:31:26.000 And so, as of right now, I'm processing Generation 0, that means all of Generation 1, so the 0:31:26.000,0:31:29.000 neighbors that are one hop away, get placed in the queue, and then 0:31:29.000,0:31:34.000 the next iteration coming out here will pull off a Generation 1, 0:31:34.000,0:31:37.000 and then enqueue all of these two hop neighbors. 0:31:37.000,0:31:40.000 Come back and other Generation 1's will get pulled off, enqueue all their two hop 0:31:40.000,0:31:44.000 neighbors, and so the two hop generation will, kind of, be built up behind, and then 0:31:44.000,0:31:48.000 once the last of the one hop generation gets managed, 0:31:48.000,0:31:51.000 start processing the Generation 2 and so on. 0:31:51.000,0:31:55.000 And so this will keep going while the queue has some 0:31:55.000,0:31:58.000 contents in there. So that means some neighbors 0:31:58.000,0:32:01.000 we have connections to that we haven't yet had a chance to explore, and 0:32:01.000,0:32:05.000 either we will find out that they have all been visited, so this will end when 0:32:05.000,0:32:09.000 all of the nodes that are in the queue have been visited or 0:32:09.000,0:32:14.000 there are no more neighbors to enqueue. So we've gotten to those dead ends or those cycles 0:32:14.000,0:32:25.000 that mean we've reach everything that was reachable from that start node. 0:32:25.000,0:32:28.000 And so I've traced this guy 0:32:28.000,0:32:30.000 to see it doing its work. 0:32:30.000,0:32:33.000 So if I start again with A, 0:32:33.000,0:32:36.000 and, again, assuming that I'm gonna process my nodes 0:32:36.000,0:32:40.000 in alphabetical order when I have children, so I will through and I 0:32:40.000,0:32:44.000 will enqueue, so the queue right now has just node A in it. I pull it off. I say 0:32:44.000,0:32:45.000 have I visited? No, I have not. 0:32:45.000,0:32:47.000 So then I mark it as visited, 0:32:47.000,0:32:52.000 and now for each of its neighboring nodes, B, C, D, H, I'm gonna put those into 0:32:52.000,0:32:54.000 the queue, 0:32:54.000,0:32:57.000 and so then they're loaded up, and then I come back around, I say do I still have stuff in the queue? I 0:32:57.000,0:33:01.000 do, so let's take out the first thing that was put in; it was B. Okay, 0:33:01.000,0:33:05.000 and I'll say okay, well, where can you get to from B? You can get to C, so I'm gonna put C in the queue. Now, 0:33:05.000,0:33:09.000 C is actually already in the queue, but we're gonna pull it out 0:33:09.000,0:33:12.000 earlier in terms it'll get handled by the 0:33:12.000,0:33:14.000 visiting status later. 0:33:14.000,0:33:16.000 So, okay, we'll put C, D, and E in 0:33:16.000,0:33:20.000 and then put C behind it. So right now, we'll have C, D, H, and C again. 0:33:20.000,0:33:24.000 It'll find the C that's in the front there. It'll say where can you get to from C? 0:33:24.000,0:33:24.000 Nowhere, 0:33:24.000,0:33:26.000 so no additional 0:33:26.000,0:33:28.000 Generation 2's are added by that one. 0:33:28.000,0:33:31.000 I'll look at D, and I'll say okay, well, what Generation 2's do we have from D? It says, well, I can 0:33:31.000,0:33:32.000 get to E. All right, so put E on the 0:33:32.000,0:33:34.000 back of the 0:33:34.000,0:33:36.000 0:33:36.000,0:33:39.000 queue. Look at H; where can H get to? H can get to G, 0:33:39.000,0:33:42.000 so go ahead and put G on the back of the queue. 0:33:42.000,0:33:45.000 So now, it comes back around to the things that are two hops away. 0:33:45.000,0:33:47.000 The first one it's gonna pull off is C. It's 0:33:47.000,0:33:49.000 gonna say, well, C is already visited, 0:33:49.000,0:33:53.000 so there's no need to redundantly visit that or do anything with that. 0:33:53.000,0:33:54.000 It passes over C. It 0:33:54.000,0:33:56.000 finds the E 0:33:56.000,0:33:58.000 that's behind that, 0:33:58.000,0:34:00.000 and then it finds the G that was behind the 0:34:00.000,0:34:03.000 H. The E also enqueued the F 0:34:03.000,0:34:07.000 that was four hops, and the very last one, right, output will be that F that was 0:34:07.000,0:34:09.000 only reachable by taking 0:34:09.000,0:34:12.000 three hops away from the thing. So this one's, kind of, 0:34:12.000,0:34:13.000 working radially. The 0:34:13.000,0:34:16.000 idea that I'm looking at all the things that I can get to, kind of, in one hop 0:34:16.000,0:34:19.000 are my first generation, then two hops, then three hops, 0:34:19.000,0:34:23.000 and so it's like throwing a stone in the water and watching it ripple out 0:34:23.000,0:34:26.000 that the visiting 0:34:26.000,0:34:29.000 is managed in terms of, kind of, growing length of path, 0:34:29.000,0:34:32.000 and number of hops that it took to get there. 0:34:32.000,0:34:37.000 So this is the strategy that you used for maze solving, if you remember. 0:34:37.000,0:34:37.000 That 0:34:37.000,0:34:42.000 what you were tracking was the queue of paths where the path was 0:34:42.000,0:34:44.000 AB, or ADC, or 0:34:44.000,0:34:46.000 ADE, and that they grew 0:34:46.000,0:34:47.000 0:34:47.000,0:34:50.000 step wise. That, kind of, each iteration through the 0:34:50.000,0:34:54.000 traversal there where you're saying, okay, well, I've seen all the paths that are one hop. Now I've seen all the ones of two, 0:34:54.000,0:34:56.000 now the ones three, and four, and five, 0:34:56.000,0:34:58.000 and when you keep doing that, eventually you get to the hop, 0:34:58.000,0:35:01.000 you know, 75, which leads you 0:35:01.000,0:35:05.000 all the way to the goal, and since you have looked at all the paths that were 0:35:05.000,0:35:09.000 74 and shorter, the first path that you dequeue that leads to your goal 0:35:09.000,0:35:12.000 must be the best or the shortest overall 0:35:12.000,0:35:13.000 because 0:35:13.000,0:35:15.000 of the order you processed them. 0:35:15.000,0:35:18.000 That's not true in depth-first search, right? That depth-first search might find a much 0:35:18.000,0:35:21.000 longer and more circuitous path that led to the goal 0:35:21.000,0:35:25.000 when there is a shorter one that would be eventually found if I had a graph that 0:35:25.000,0:35:28.000 looked, 0:35:28.000,0:35:36.000 you know, had this long path, let's say, 0:35:36.000,0:35:41.000 and so this is goal node, let's say, and this is the start node that 0:35:41.000,0:35:44.000 there could be a two hop path, kind of, off to this angle, 0:35:44.000,0:35:47.000 but if this was the first one explored in the depth-first, right, 0:35:47.000,0:35:50.000 it could eventually, kind of, work its way all the way around through this 0:35:50.000,0:35:52.000 eight hop path, and say, okay, well, I found it, right? 0:35:52.000,0:35:55.000 It would be the first one we found, but there's no guarantee that would be the 0:35:55.000,0:35:58.000 shortest in depth-first search, right? It would just happen to be the one that based on 0:35:58.000,0:36:01.000 its arbitrary choice of which 0:36:01.000,0:36:04.000 neighbors to pursue it happened to get to that there could be a shorter 0:36:04.000,0:36:08.000 one that would be found later in the traversal. That's not true in 0:36:08.000,0:36:12.000 the breadth-first search strategy because it will be looking at this, and then these two, 0:36:12.000,0:36:16.000 and then these two, and, kind of, working its way outward down those edges. 0:36:16.000,0:36:19.000 So as soon as we get to one that hits the goal, we 0:36:19.000,0:36:25.000 know we have found the shortest we can have. Question? 0:36:25.000,0:36:28.000 In that case, 0:36:28.000,0:36:33.000 you could return, okay, we need two 0:36:33.000,0:36:34.000 jumps to get to 0:36:34.000,0:36:35.000 the goal, 0:36:35.000,0:36:36.000 like, as far 0:36:36.000,0:36:40.000 winning the goal. Yeah. How do we remember which path led us there? 0:36:40.000,0:36:42.000 So if you were really doing that, you know, in terms of depth-first search, what 0:36:42.000,0:36:46.000 you'd probably be tracking is what's the path? Probably a stack that said here's the stack 0:36:46.000,0:36:49.000 that led to that path. Let me hold 0:36:49.000,0:36:53.000 onto this, right, and I will compare it to these alternatives. So I'll try it on Neighbor 1, 0:36:53.000,0:36:54.000 get the stack back from that; it's the best. 0:36:54.000,0:36:58.000 Try it on Neighbor 2, get that stack back, see which one is smaller and use 0:36:58.000,0:37:01.000 that as the better choice, and so, eventually, when I tried it out this side, I'd get 0:37:01.000,0:37:03.000 the stack back that was just two, 0:37:03.000,0:37:06.000 and I could say, yeah, that was better than this ten step path I had over here, so I 0:37:06.000,0:37:09.000 will prefer that. So just using your standard, kind of, backtracking with 0:37:09.000,0:37:13.000 using your return values, incorporating them, and deciding which was better. 0:37:13.000,0:37:14.000 And so in this case, 0:37:14.000,0:37:16.000 the depth-first search 0:37:16.000,0:37:19.000 can't really be pruned very easily 0:37:19.000,0:37:24.000 because it has to, kind of, explore the whole thing to know that this 0:37:24.000,0:37:26.000 short little path right over here happened to be just 0:37:26.000,0:37:29.000 arbitrarily examined last, whereas, the breadth-first search actually does, kind of, because 0:37:29.000,0:37:31.000 it's prioritizing by 0:37:31.000,0:37:35.000 order of length. If the goal was shortest path, it will find it sooner and be able to 0:37:35.000,0:37:35.000 stop 0:37:35.000,0:37:39.000 when it found it rather than look at these longer paths. So it won't end up - 0:37:39.000,0:37:42.000 if there were hundreds of other paths over here, breadth-first search won't actually 0:37:42.000,0:37:47.000 have to ever reach them or explore them. 0:37:47.000,0:37:52.000 So there are a lot of things that actually end up being graph search in disguise. That's why 0:37:52.000,0:37:54.000 a, kind of, a graph is a great data structure to, kind of, get to in 0:37:54.000,0:37:57.000 106b because there are lots of problems 0:37:57.000,0:38:01.000 that, in the end, if you can model them using the same 0:38:01.000,0:38:03.000 setup of nodes, and arcs, and traversals, 0:38:03.000,0:38:06.000 that you can answer interesting questions about that data by applying 0:38:06.000,0:38:09.000 the same graph search techniques. 0:38:09.000,0:38:12.000 So as long as you want to know things like, well, which nodes are reachable from this node at 0:38:12.000,0:38:12.000 all? 0:38:12.000,0:38:14.000 And then 0:38:14.000,0:38:17.000 that just means, okay, I'm here and I want to get to these places; what places could 0:38:17.000,0:38:18.000 I get to? Or 0:38:18.000,0:38:21.000 knowing what courses I could take 0:38:21.000,0:38:25.000 that would lead me to a CS degree, it's, kind of, like well, looking for the ones 0:38:25.000,0:38:29.000 that lead to having all your graduation requirements 0:38:29.000,0:38:32.000 satisfied, and each of those arcs could be a course you could take, right, how close did 0:38:32.000,0:38:35.000 that get you to the goal? How can I get to the 0:38:35.000,0:38:40.000 place I'd like to be, or what are the places that this course satisfies 0:38:40.000,0:38:41.000 requirements that lead to? 0:38:41.000,0:38:44.000 Knowing things about 0:38:44.000,0:38:47.000 connectivity is, kind of, useful in things like the, kind of, bacon numbers or 0:38:47.000,0:38:50.000 erdish numbers where people want to know, well, how close am I to greatness by - 0:38:50.000,0:38:54.000 have I been in a movie with somebody who was in a movie, who was in a movie with Kevin Bacon? And, 0:38:54.000,0:38:55.000 for me, it turns out 0:38:55.000,0:38:57.000 zero; I have been in no movies. But have 0:38:57.000,0:38:59.000 you written a paper with someone who then 0:38:59.000,0:39:02.000 connects through a chain to some other famous person, right? Tells you about 0:39:02.000,0:39:06.000 your, kind of, social network distance or something, or I'm linked in. You're trying to get a 0:39:06.000,0:39:09.000 new job, and you want to find somebody to introduce you to 0:39:09.000,0:39:13.000 the head of this cool company. It's, like, figuring who you know who knows them, right? 0:39:13.000,0:39:18.000 It's finding paths through a social network graph. 0:39:18.000,0:39:19.000 Finding out things like 0:39:19.000,0:39:22.000 are there paths with cycles? What's the shortest path 0:39:22.000,0:39:25.000 through the cycle, longest path through the cycle often tells you about things about 0:39:25.000,0:39:27.000 redundancies in that structure 0:39:27.000,0:39:31.000 that, for example, when you're building a network infrastructure, sometimes you do have 0:39:31.000,0:39:34.000 cycles in it because you actually want to have multiple ways to get to things in case 0:39:34.000,0:39:37.000 there's a breakdown. Like if there's a freeway accident, you want to have another way 0:39:37.000,0:39:38.000 to get around it, 0:39:38.000,0:39:41.000 but you are interested in, maybe, trying to keep your cost down is trying to find 0:39:41.000,0:39:47.000 out where are the right places to put in those redundancies, those cycles that 0:39:47.000,0:39:50.000 give you the best benefit with the least cost. 0:39:50.000,0:39:54.000 Trying to find things like a continuous path that visits all the nodes once and exactly 0:39:54.000,0:39:57.000 once, especially doing so efficiently, picking a short 0:39:57.000,0:40:00.000 overall path for that. We talked a little bit about that in terms of what's called the 0:40:00.000,0:40:02.000 traveling salesman problem. 0:40:02.000,0:40:04.000 You are trying to get from - you have ten 0:40:04.000,0:40:06.000 cities to visit on your book tour. You don't want to spend all your lifetime 0:40:06.000,0:40:08.000 on a plane. What's the 0:40:08.000,0:40:13.000 path of crisscrossing the country that gets you to all ten of those with the least 0:40:13.000,0:40:15.000 time painfully 0:40:15.000,0:40:17.000 spent on an airline? And so 0:40:17.000,0:40:20.000 those are just graph search problems. You have these nodes. You have these arcs. 0:40:20.000,0:40:21.000 How can you 0:40:21.000,0:40:23.000 traverse them and visit 0:40:23.000,0:40:25.000 and come up with 0:40:25.000,0:40:27.000 a good solution. There's other 0:40:27.000,0:40:30.000 problems I think that are kind of neat. Word letters, we used to give out an assignment on word letters. We 0:40:30.000,0:40:31.000 didn't this time, but I thought it was an 0:40:31.000,0:40:34.000 interesting problem to think about. The idea of 0:40:34.000,0:40:36.000 how it is you can transform 0:40:36.000,0:40:41.000 one letter at a time from one word to another is very much a 0:40:41.000,0:40:45.000 graph problem behind the scenes, and that largely a breadth-first 0:40:45.000,0:40:47.000 traversal is exactly what you're looking for there. How 0:40:47.000,0:40:51.000 can I transform this with the shortest number of steps is, kind of, 0:40:51.000,0:40:54.000 working radially out from the starting word. 0:40:54.000,0:40:54.000 0:40:54.000,0:40:58.000 Your boggle board turns out, really, is just a graph. If 0:40:58.000,0:41:01.000 you think of 0:41:01.000,0:41:07.000 having this sequence of letters 0:41:07.000,0:41:12.000 that you're trying to work through, 0:41:12.000,0:41:16.000 that really each of these letters can actually just be thought of as a node, 0:41:16.000,0:41:20.000 and then the connections it has are to its eight neighbors, and that finding 0:41:20.000,0:41:24.000 words in there, especially finding paths, starting from 0:41:24.000,0:41:27.000 this node that lead away, that don't revisit, so, in fact, doing breadth-first 0:41:27.000,0:41:31.000 or depth-first search. You most likely did depth-first search when you were writing the 0:41:31.000,0:41:32.000 implementation of boggle, 0:41:32.000,0:41:35.000 but that's, kind of, a deep search on I found an A. I found a P. I found a P, and, kind 0:41:35.000,0:41:40.000 of, keeps going as long as that path looks viable. So, in this case, that the arc 0:41:40.000,0:41:41.000 information we're using is 0:41:41.000,0:41:44.000 having a sub of those letters, do they form a prefix that could be leading 0:41:44.000,0:41:45.000 somewhere good, 0:41:45.000,0:41:47.000 and then the, kind of, recursion bottoming out when 0:41:47.000,0:41:52.000 we have run ourselves into a corner where we have no unvisited nodes 0:41:52.000,0:41:53.000 that neighbor us, 0:41:53.000,0:41:57.000 or only nodes that would extend to a prefix that no longer forms any 0:41:57.000,0:41:58.000 viable words 0:41:58.000,0:42:01.000 as a search problem. 0:42:01.000,0:42:04.000 So it feels like everything you've done this quarter has actually been a graph in disguise; I just didn't 0:42:04.000,0:42:08.000 tell you. The maze problem is very much a graph search problem, building a 0:42:08.000,0:42:10.000 graph and searching it. 0:42:10.000,0:42:13.000 Another one that I think is, kind of, neat to think about is, kind of, related to the idea of word letters 0:42:13.000,0:42:18.000 is that often when you mistype a word, a word 0:42:18.000,0:42:19.000 processor will suggest to you 0:42:19.000,0:42:22.000 here's a word that you might have meant instead. You type a word that it doesn't 0:42:22.000,0:42:24.000 have in its lexicon, and it says, well, 0:42:24.000,0:42:25.000 that could just be 0:42:25.000,0:42:28.000 a word I don't know, but it also could mean that you actually mistyped something, and so 0:42:28.000,0:42:31.000 what you want to look at is having neighboring nodes, 0:42:31.000,0:42:33.000 sort of, diagramming a graph 0:42:33.000,0:42:35.000 of all the words you know of 0:42:35.000,0:42:39.000 and then the, kind of, permutations, the tiny little twiddles to that word that 0:42:39.000,0:42:41.000 are neighbors to it so that 0:42:41.000,0:42:44.000 you could describe a set of suggestions 0:42:44.000,0:42:47.000 as being those things that are within a certain 0:42:47.000,0:42:50.000 hop distance from the original mistyped word. And 0:42:50.000,0:42:53.000 so you could say, yeah, things that have two letters different are probably 0:42:53.000,0:42:56.000 close enough, but things that have three or more letters different aren't actually 0:42:56.000,0:42:58.000 likely to be 0:42:58.000,0:43:00.000 the mistyped word, 0:43:00.000,0:43:04.000 and then that, kind of, leads to supporting things like wildcard searches 0:43:04.000,0:43:08.000 where you're trying to remember whether it's E-R-A-T or A-R-T-E at the 0:43:08.000,0:43:10.000 end of desperate. 0:43:10.000,0:43:14.000 Those things could be modeled as, kind of, like they search through a graph space where 0:43:14.000,0:43:16.000 you're modeling those letters, and then there's, like, a 0:43:16.000,0:43:20.000 branching where you try all of the letters coming out of D-E-S-P, seeing if any 0:43:20.000,0:43:23.000 of them lead to an R-A-T-E. 0:43:23.000,0:43:26.000 So the last thing I wanted 0:43:26.000,0:43:28.000 to just bring up, and this is the one that's going into this week's assignment. I'm just 0:43:28.000,0:43:31.000 gonna actually show it to you to, kind of, get you thinking about it because it's 0:43:31.000,0:43:35.000 actually the problem at hand that you're gonna be solving 0:43:35.000,0:43:36.000 is that 0:43:36.000,0:43:41.000 the one of where I'm interested in finding the shortest path, but I'm gonna add in 0:43:41.000,0:43:43.000 this new idea that 0:43:43.000,0:43:45.000 all hops are not created equal. 0:43:45.000,0:43:49.000 That a flight from San Francisco to Seattle is a shorter distance than the 0:43:49.000,0:43:52.000 one that goes from San Francisco to Boston, 0:43:52.000,0:43:53.000 and if I'm interested in getting 0:43:53.000,0:43:57.000 to someplace, it's not just that I would like to take a direct flight. I'd also 0:43:57.000,0:44:00.000 like to take the set of flights that involves the least, kind of, going out of my way, 0:44:00.000,0:44:03.000 and going through other cities, and other directions. 0:44:03.000,0:44:07.000 And so in this case, a breadth-first search is a little bit too simplistic. 0:44:07.000,0:44:10.000 That if I just look at all the places I can get to in one hop, 0:44:10.000,0:44:11.000 0:44:11.000,0:44:15.000 those are not really equally 0:44:15.000,0:44:16.000 weighted 0:44:16.000,0:44:19.000 in terms of my goal of minimizing my total path distance, 0:44:19.000,0:44:22.000 but if I, instead, prioritize - 0:44:22.000,0:44:25.000 prioritize in your mind, immediately conjure up images of the 0:44:25.000,0:44:26.000 priority queue, 0:44:26.000,0:44:28.000 to look at paths that are, 0:44:28.000,0:44:32.000 if I'm trying to minimize total distance, the shortest path, even if they 0:44:32.000,0:44:36.000 represent several hops, if they're still shorter - if I have three flights, and each are 0:44:36.000,0:44:37.000 100 miles, 0:44:37.000,0:44:40.000 then they represent a flight of 300 total miles, 0:44:40.000,0:44:44.000 and that's still preferred to one that actually is 2,000 miles or 7,000 miles 0:44:44.000,0:44:45.000 going to Asia 0:44:45.000,0:44:49.000 alternatively. And so if I worked on a breadth-first that's been modified by a total 0:44:49.000,0:44:50.000 distance, 0:44:50.000,0:44:51.000 that it would be a way 0:44:51.000,0:44:54.000 to work out the shortest path 0:44:54.000,0:44:55.000 in terms of total weight 0:44:55.000,0:44:58.000 from San Francisco to my destination 0:44:58.000,0:45:01.000 by a 0:45:01.000,0:45:03.000 modified breadth-first, kind of, weighted strategy, 0:45:03.000,0:45:06.000 and that is exactly what you're gonna be doing in this assignment. I'm not gonna 0:45:06.000,0:45:08.000 give away too much by, kind of, 0:45:08.000,0:45:11.000 delving too much in it, but the handout does a pretty good job of talking you through what 0:45:11.000,0:45:15.000 you're doing, but it's a, kind of, neat way to think about that's how the GPS, the MapQuest, and things like 0:45:15.000,0:45:16.000 that are working is 0:45:16.000,0:45:17.000 you're at a site; 0:45:17.000,0:45:21.000 you have a goal, and you have this information that you're trying to 0:45:21.000,0:45:24.000 guide that search, and you're gonna use heuristics about 0:45:24.000,0:45:28.000 appears to be the, kind of, optimal path to, kind of, pick that, and go with it, 0:45:28.000,0:45:31.000 and see where it leads you. So 0:45:31.000,0:45:32.000 that will be your 0:45:32.000,0:45:40.000 task for this week. What we'll talk about on Friday is caching.