Got a YouTube account?

New: enable viewer-created translations and captions on your YouTube channel!

English subtitles

← Parse Function - Design of Computer Programs

Get Embed Code
1 Language

Showing Revision 2 created 05/24/2016 by Udacity Robot.

  1. Here we go. Here's a function "parse."
  2. It takes a start_symbol like expression.
  3. It takes a text like 3x + b.
  4. It takes a grammar defined with our grammar function.
  5. The first thing we do is define the tokenizer.
  6. The tokenizer has two jobs.
  7. First it has to handle white space that occurs before the token,
  8. and it does that by looking up in grammar under the key consisting of just a space.
  9. In the definition of grammar we arrange to store the white space parameter under that key.
  10. This says build up a regular expression that will parse off
  11. the appropriate amount of white space--some, all, none, whatever is appropriate for your grammar.
  12. Then parse off whatever was defined but the atom that we're looking at next.
  13. We'll see when tokenizer is used how we go ahead and do that.
  14. Then parse sequence says we're just going to go through a sequence.
  15. This is a of atoms. We're going to initialize our result to be the empty list.
  16. Then we're going to go through, try to parse an atom one at a time.
  17. If we get back nothing for a remainder, then Fail.
  18. Otherwise, append to the result the tree that we built up by doing that parse
  19. and continue on in the loop.
  20. Notice that we're updating the text variable,
  21. so we're taking the remainder each time and parsing the next atom
  22. from the remainder of the previous atom.
  23. Now parse_atom handles two cases.
  24. If the atom is something that's in the grammar like bxb expression,
  25. we map it into this tuple of alternatives by looking it up in the grammar,
  26. getting that list of alternatives.
  27. For each alternative, we try to parse.
  28. If we have a successful match--that is if the remainder is not None, indicating failure--
  29. if there's some sort of remainder left over, then we want to return the result.
  30. We return the remainder we got, and we build up a tree structure consisting
  31. of the atoms that we're trying to parse plus the tree of whatever we got back.
  32. If we exhaust all the alternatives and we can't come up with anything,
  33. then we return Fail, which says no tree and no remainder.
  34. Otherwise if the atom is not in the grammar,
  35. then it must be a regular expression.
  36. We take the tokenizer that we built up before.
  37. We insert the atom, which is a regular expression, into that tokenizer and match it against the text.
  38. If there's not a match, then we Fail.
  39. If there is a match, we pull out the matching part. That's going to be the tree.
  40. That's the token that we matched.
  41. We go ahead and we take the rest of the text after the match and return that as a remainder.
  42. This is the only place where the text actually advances,
  43. in this one spot where we're matching tokens against the input text.
  44. These two routines are internal routines inside the definition of parse,
  45. and here's the body of parse that just says parse this atom--
  46. the start symbol against the text.