Got a YouTube account?

New: enable viewer-created translations and captions on your YouTube channel!

English subtitles

← Parsing - Design of Computer Programs

Get Embed Code
1 Language

Showing Revision 2 created 05/24/2016 by Udacity Robot.

  1. Now I'm going to define the parser, and I want it to have this signature.
  2. I'm going to define a function pare, which takes a symbol,
  3. saying I want to parse something like an expression.
  4. It takes text that we're going to parse, and it takes the grammar
  5. that defines that symbol and all the other symbols.
  6. I want it to return--when we had regular expressions we looked at returning a set of remainders.
  7. I'm going to have this return a single remainder.
  8. The idea is that we want the author of the grammar to have some control over what's going on
  9. and make this grammar be unambiguous so that there is a single result for each parse
  10. rather than having to keep a set of results.
  11. The idea is that for each symbol we're going to consider the alternatives in left-to-right order.
  12. If we're asked to parse an expression, we'll say can we parse this alternative.
  13. If we can, if we can parse in succession a term and then a plus or minus
  14. then an expression and then we'll commit to that.
  15. We'll say that's the result we're going to return.
  16. That's the single remainder, and we're not going to even explore the other alternatives.
  17. Because we have this left-to-right choice, that's why we decide to write the expressions in this order.
  18. We don't write them in the opposite order because if we wrote the rule this way
  19. and we were trying to parse the expression a + 3,
  20. we'd be asked to parse this, we'd try to parse a term, we'd say, yes, a is a term,
  21. and then we'd stop, and we'd say that's the end.
  22. We wouldn't even consider parsing off the term plus the expression.
  23. This is no good.
  24. It's up to the author of the grammar to write rules in the correct order
  25. so that the longest thing gets tried first.
  26. Our regular expression functions are what is known as recognizers.
  27. They told us--yes or no--is a string part of a language.
  28. Here what we're doing is different. It's a parser.
  29. It tells you--yes and no--is it part of the language, but then it also gives you
  30. an internal structure, a tree structure of the parts of the expression.
  31. Here it would be a + 3.
  32. If we had m * x + b,
  33. then that would parse into a structure that had m * x + b.
  34. Here I said we were a remainder, but actually I want to return a two element tuple
  35. of the tree followed by the remainder.
  36. Here's what we're going for.
  37. If I asked to parse the expression a * x with the grammar G,
  38. I want to get back this tree structure. It looks kind of complicated.
  39. All it's really saying is that we have an a in the first element, then the multiply sign,
  40. and then an x in the third element, and there's no remainder.
  41. We parse the whole string.
  42. That's what it says, but there's all these intermediate parts here because that's the way the grammar is defined.
  43. I should say here that this is a tree, remainder result.
  44. We're going to use the convention that Fail corresponds to a value of None.
  45. Let's think then about what we have to do to be able to parse a text.
  46. I can think of four cases that we have to deal with.
  47. One, we want to be able to parse an expression or a symbol like the expression keyword.
  48. To do that we've got to be able to look that up in the grammar G.
  49. We've also got to be able to parse a regular expression, plus or minus,
  50. and we've seen how to do that before, so we reduce that to a solved problem.
  51. We're going to have to be able to parse a tuple of alternatives.
  52. Here one alternative or another alternative done in left-to-right order.
  53. Finally, we're going to have to be able to parse a list of atoms representing a sequence--1, 2, 3.
  54. Now, I'm going to tell you the plan for how we're going to implement this as code
  55. within the function parse.
  56. So this first case I'm going to handle with a subroutine called "parse_atom."
  57. This is an atomic expression. We should be able to handle that.
  58. The second part, the regular expression, is a type of atom so it occurs within the parse_atom routine.
  59. We're going to define a variable called "tokenizer" to help us do that.
  60. Then we're going to use the built-in re.match along with the tokenizer
  61. to recognize those regular expressions.
  62. Then for the alternatives, a tuple of alternatives is what you get when you look up
  63. the symbol, like exp, in the grammar.
  64. I could have had a separate function called parse_alternatives
  65. but instead I just decided to have that be part of parse_atom,
  66. because it was so simple it was just an immediate step
  67. to go from the symbol to the list of alternatives.
  68. Finally, to parse this sequence of atoms, I have a subroutine or function called "parse_sequence."