English subtitles

Get All Links - Intro to Computer Science

Showing Revision 5 created 05/25/2016 by Udacity Robot.

  1. So let's recap the code we have at the end
  2. of unit two. So we defined a procedure, get_next_target, that
  3. would take a page, search for the first linked target
  4. on that page, return that as the value of URL.
  5. That would be the first output, and also return the
  6. position where the end of the quote is so we
  7. know how to continue. And then we define the procedure
  8. print_all_links that keeps going as long as we can.
  9. As long as there are more URLs on the page. It
  10. will find the next target. Store these in the variables URL and
  11. endpos to keep track of the location of the end of the string. If
  12. there is a URL, what we did was just print it out
  13. and then we update the page to keep going. What
  14. we want to do to change this is instead of just printing out
  15. each URL as we find it, we want to collect the URLs. We
  16. want to have a way to use the URLs so we can use
  17. them to keep crawling to find new
  18. pages. The structure we've been learning about in this
  19. unit is the way to do that. What we want to do is keep all the
  20. URLs in a list. At the end of this procedure, instead of printing the links
  21. as we go, we want to have a list of all the links that we found.
  22. So this is what the current print_all_links procedure does.
  23. It takes the page as its input and its output
  24. is nothing. It doesn't return anything. All it does
  25. is do some work and print out all these links. But
  26. we can't actually use them at the end, because
  27. it doesn't return anything. So what we want to do is
  28. change this. Instead of print_all_links, what we want is to
  29. get_all_links. We want to actually have the links in a way
  30. that we can use them. So what we want,
  31. instead of printing all links is to actually get the
  32. links. So we'll change the name of our procedure
  33. to get_all_links. And instead of outputting None, what we want to
  34. do is output a list of links. And that
  35. should be the list that corresponds to the things that
  36. we were printing before, but now instead of just
  37. printing them, we want to output them as a list.