YouTube

Got a YouTube account?

New: enable viewer-created translations and captions on your YouTube channel!

English subtitles

← Modifying the Search Engine Solution - Intro to Computer Science

Get Embed Code
2 Languages

Showing Revision 5 created 05/25/2016 by Udacity Robot.

  1. So the answers is, we need to change the three of these procedures.
  2. We need to change crawl web, we need to change add to
  3. index, and we need to change lookup. We don't need to change get
  4. all links at all, that we can keep exactly the same as it
  5. was. It just returns the list of links, it doesn't depend on the
  6. index. We don't need to change add page to index. This is
  7. a little more surprising since it depends on index, but because of the
  8. way we wrote add page to index, it calls add to index. So
  9. it doesn't depend how we actually represent the index. It's going to go through
  10. all the words add them to index by calling add to
  11. index. So we don't actually have to change that code. We
  12. do need to change the other two, so let's start with
  13. crawl web and figure out what we need to do to
  14. change this, to use a dictionary. And the change is actually
  15. going to be really small, right? The index is here, in the
  16. old version we initialize index to a list, to the empty
  17. list And all we do with index is pass it in
  18. to add page to index. So to change that to use
  19. a dictionary, all we need to do is change the square brackets
  20. to be curly brackets. So now, instead of starting with an empty
  21. list, we're going to start with an empty dictionary. So that's the only
  22. change we need to make to crawl_web. The change to add index
  23. is going to be a little more complicated. And we can see from
  24. the code to crawl_web, what happens with each page was that we
  25. call add to index, passing in index, which is now a dictionary.
  26. Let's look at add_page_to_index. I claimed that we didn't need to
  27. change that. Here's the code to add_page_to_index. And it takes the index.
  28. It goes through the words. It adds each word to the index.
  29. We can do this just the same whether index was a list
  30. or a dictionary. We don't need to change add page to
  31. index, we are going to need to change add to index. So
  32. we are going to need to change the code to add to
  33. index and lets try to figure out how. So before we had
  34. add to index, that takes an index, a keyword and a URL. We'll
  35. still take the same parameters but what we had to do when it
  36. was a list was go through all the entries in the index, check
  37. for each one, if it matches the keyword we're looking for. If we
  38. find that it does, then we add the URL. If we get to
  39. the end without finding it, then we append a new entry, which is
  40. the keyword with a list of URLs containing just the first URL. So
  41. let's figure out how to change this to work with the hash table index.
  42. So, the great thing about the hash table is we don't
  43. need to loop through anything now. We know exactly where it is
  44. from the hash table. With the dictionary, the built in in-operation
  45. gives us that. So instead of looping, now we can check right
  46. away if the keyword is in the index. So what's going to
  47. happen if we found the keyword in the index, that means that
  48. we can look it up. This will look up in the dictionary
  49. the entry that corresponds to the index. That's going to be the list
  50. of URLs that we have. And so all we need to do now is append to that entry the
  51. new URL. If it's not in the index already, well
  52. we need to do something different. What we did before
  53. was we added a new element to the index list
  54. using append. We don't want to do that now. We
  55. want to add a new key value paired to the
  56. dictionary. So we're going to do that by using the assignment.
  57. And the entry that we're adding is the list containing
  58. just this URL. So you can delete everything else here.
  59. Add the new entry to the keyword. So this is a
  60. lot simpler. We have less code. And it's going to run a lot
  61. faster. We don't have to loop through anything. Because of the hash
  62. table, we can right away look up whether the keyword is in
  63. the index, we can find if it is, what the value
  64. is, by using the dictionary lookup, and we can append the newer
  65. URL to the list of the URLs associated with that keyword. If
  66. it's not found, we can create a new entry, using the dictionary
  67. syntax, like this, that contains just that URL. So,
  68. now we've got a much simpler way to add
  69. to index. I hope you understand this. If you
  70. do, you should be able to define lookup yourself.