Return to Video

04-46 Parse Trees

  • 0:00 - 0:02
    So now we have all the machinery we need
  • 0:02 - 0:04
    to tell if a string is valid.
  • 0:04 - 0:06
    However, it's going to turn out that's not enough.
  • 0:06 - 0:09
    Remember those upside down parse trees we talked about earlier?
  • 0:09 - 0:12
    We really wanted those, as well,
  • 0:12 - 0:15
    for our HTML and JavaScript programs
  • 0:15 - 0:19
    in order to interpret them--to get at their meaning correctly.
  • 0:19 - 0:22
    So here, I've written a pretty standard arithmetic expression grammar.
  • 0:22 - 0:24
    An expression can be a number
  • 0:24 - 0:26
    or an expression plus an expression
  • 0:26 - 0:28
    or an expression minus an expression or maybe
  • 0:28 - 0:30
    a negated expression, like negative 3.
  • 0:30 - 0:32
    And we'll want to build up parse trees for this.
  • 0:32 - 0:35
    Now this time, I've written the tokens as plus and minus and not
  • 0:35 - 0:37
    instead of the symbols, + or -.
  • 0:37 - 0:39
    That's our choice; we can do it either way we want.
  • 0:39 - 0:41
    And the particular format I'm going to pick
  • 0:41 - 0:44
    for our abstract syntax tree is
  • 0:44 - 0:47
    nested tuples--or nested lists, in Python.
  • 0:47 - 0:51
    So if we end up using this rule: expression goes to number,
  • 0:51 - 0:54
    we're just going to return the tuple:
  • 0:54 - 0:56
    ("number", 5)--if the input was 5.
  • 0:56 - 1:00
    Similarly, if the input is something like: not 5,
  • 1:00 - 1:05
    We'll end up returning: ("not", of the "number", 5).
  • 1:05 - 1:07
    Note the nesting.
  • 1:07 - 1:09
    So let's say I call this number: number 1.
  • 1:09 - 1:11
    We really want to return this tuple:
  • 1:11 - 1:14
    "number"--in quotes, just as a string, so we know what it is--
  • 1:14 - 1:16
    followed by the value of the token.
  • 1:16 - 1:20
    If this was Thing Number 2 in our reduction rule--
  • 1:20 - 1:25
    not expression--I'd really want this to be filled with a 2.
  • 1:25 - 1:27
    If over here, this was a 3,
  • 1:27 - 1:29
    I would want to return "binop".
  • 1:29 - 1:32
    That stands for Binary Operator, binary just meaning "two things".
  • 1:32 - 1:34
    So things like: Plus, Minus, Times, and Divide--
  • 1:34 - 1:36
    those are arithmetic operations that take two arguments--
  • 1:36 - 1:38
    one on the left, and one on the right.
  • 1:38 - 1:42
    We call those Binary Operators, as a class, just to save space.
  • 1:42 - 1:44
    But whatever this third expression was,
  • 1:44 - 1:47
    that's what I'd want this subtree--
  • 1:47 - 1:49
    this subpart of my tuple--to be.
  • 1:49 - 1:51
    So just as we've seen before how we can
  • 1:51 - 1:55
    encode token rules in Python
  • 1:55 - 1:58
    and do some processing, like chopping off the quotes
  • 1:58 - 2:00
    after we've specified how the token works,
  • 2:00 - 2:02
    using regular expressions,
  • 2:02 - 2:05
    it's going to turn out that there's a similar way
  • 2:05 - 2:07
    for us to do that for grammar rules in Python.
  • 2:07 - 2:10
    Now let me explain a little bit about what's going on.
  • 2:10 - 2:12
    This format is totally arbitrary,
  • 2:12 - 2:14
    but it's going to be very easy for us to use
  • 2:14 - 2:16
    for assignments and to test your knowledge.
  • 2:16 - 2:19
    For tokens, we used a "t_" to mean
  • 2:19 - 2:21
    I'm defining a rule for a token.
  • 2:21 - 2:23
    For parsing rules,
  • 2:23 - 2:25
    we're going to use a "p_"
  • 2:25 - 2:27
    to define the name of a parsing rule.
  • 2:27 - 2:29
    And then here--just to help us out--
  • 2:29 - 2:32
    we're going to write down what the left-hand side of the rule is.
  • 2:32 - 2:34
    This is how you parse an expression
  • 2:34 - 2:36
    when that expression is a number.
  • 2:36 - 2:39
    And just as out token rules were, in some sense,
  • 2:39 - 2:42
    under-the-hood functions of this object, (t),
  • 2:42 - 2:46
    our parsing rules are under-the-hood functions of this object, (p).
  • 2:46 - 2:49
    And this is the parse tree--
  • 2:49 - 2:51
    or, more accurately, a number of parse trees.
  • 2:51 - 2:53
    Here's our rule, written out,
  • 2:53 - 2:55
    and this is very similar to:
  • 2:55 - 3:00
    (exp --> number)--except that there's no great way to write the arrow,
  • 3:00 - 3:02
    so instead, we'll just write a colon, by convention.
  • 3:02 - 3:05
    But you should view this as the arrow.
  • 3:05 - 3:08
    So this is: expression can be rewritten as NUMBER,
  • 3:08 - 3:10
    and we just put it in quotes, like a string, and then down here
  • 3:10 - 3:13
    we have to tell Python--or tell our parsing library--
  • 3:13 - 3:16
    how to build up the abstract syntax tree.
  • 3:16 - 3:20
    p[0] is our returned parse tree.
  • 3:20 - 3:25
    The numbering here is every one of these elements of our grammar rule--
  • 3:25 - 3:28
    except the colon gets a number.
  • 3:28 - 3:30
    So the expression on the left is zero,
  • 3:30 - 3:32
    This NUMBER over here is 1.
  • 3:32 - 3:35
    So the parse tree I want associated with this expression,
  • 3:35 - 3:37
    when we're all done,
  • 3:37 - 3:41
    is a tuple that I make, by combining the word "number"
  • 3:41 - 3:43
    with the value of this actual token.
  • 3:43 - 3:45
    Let me show you another one of these,
  • 3:45 - 3:47
    and then it'll be a little clearer.
  • 3:47 - 3:49
    So here, once again, I start with the (p_).
  • 3:49 - 3:51
    We're going to do that for all of our parsing rules.
  • 3:51 - 3:54
    Here's what I'm telling you how to parse; I'm telling you how to parse an expression.
  • 3:54 - 3:57
    There might be multiple different ways to parse an expression.
  • 3:57 - 3:59
    It could be a number, it could be a (not) expression.
  • 3:59 - 4:02
    So we use another underscore, in being a little more specific.
  • 4:02 - 4:05
    And then down here I've written out my grammar rule
  • 4:05 - 4:07
    in almost English--and again, this colon
  • 4:07 - 4:10
    is like the arrow that we would normally draw.
  • 4:10 - 4:12
    And then below that, I have written out
  • 4:12 - 4:15
    how to construct the final abstract syntax tree.
  • 4:15 - 4:18
    This expression is number zero, this (not) is number 1,
  • 4:18 - 4:21
    This expression is number 2, so we want our
  • 4:21 - 4:23
    parse tree for number zero
  • 4:23 - 4:26
    to be the tuple I make, by putting the word "not"--
  • 4:26 - 4:30
    so that I know what it is--in front of the parse tree for number 2.
  • 4:30 - 4:34
    If we were to see the input: NOT 5
  • 4:34 - 4:38
    executing these two rules, in the right order--
  • 4:38 - 4:40
    this one first, and then that one--
  • 4:40 - 4:42
    would give us this tree:
  • 4:42 - 4:45
    "not", ("number", 5).
  • 4:45 - 4:47
    Note the nesting.
  • 4:47 - 4:49
    I could alternatively draw it as--
  • 4:49 -
    This is just a Python was of encoding this visual representation.
Title:
04-46 Parse Trees
Description:

dummy description

more » « less
Video Language:
English
Team:
Udacity
Project:
CS262 - Programming Languages
Duration:
04:54
Amara Bot added a translation

English subtitles

Revisions