English subtitles

← 03-05 Concept Inventory

dummy description

Get Embed Code
1 Language

Showing Revision 1 created 04/29/2012 by Amara Bot.

  1. I would like to start out with an inventory of the concepts that are going to be used.
  2. So far we have the notion of a pattern or a regular expression,
  3. of a text to match against, of a result, which will also be a string of some kind.
  4. It doesn't seem like there's all that many more concepts.
  5. One thing that it looks like we'll need is some notion of a partial result,
  6. and some notion of control over the iteration.
  7. What do I mean by that?
  8. Well, some of the matches are easy.
  9. If we match search for a literal 'def' within 'abcdef,'
  10. it's easy to imagine just going down the line of this string and saying
  11. does it match here? No. Here? No. Here? No. Here? Yes. We'll return that result.
  12. But what if we're matching with the pattern--let's say we have the expression 'a*b+'--
  13. any number of a's followed by one or more b's.
  14. In our API notation, we would write that as seq(start(lit('a') plus(lit('b')).
  15. Let's say we're matching that against the string 'aaab.'
  16. Now if we had a control structure that says sequence, look to match the first,
  17. and then look at the second, and if the first--star(lit('a'))--only had one possible result,
  18. then it would say, yes, it matches here right at the start, now look for something after that.
  19. Does it match plus(lit('b'))? No, it doesn't.
  20. I've got to have iteration over the possible results.
  21. I have to say that star(lit('a')) can match in more than one location.
  22. It can match with zero instances of a, with 1, with 2, with 3,
  23. and it's only after 3 that then we can match the second part, find the b,
  24. and then find that the whole expression matches.
  25. That's going to be the tricky part is worrying about this control
  26. when one part of a sequence doesn't match or similarly when we have an alternative
  27. between two possibilities--a or b or alt(a, b).
  28. This trickiness seems like it's going to be difficult,
  29. but it all resolves itself after we make one good choice.
  30. It takes some experience to see what that choice can be,
  31. but if we decide to represent these partial results as a set of remainders of the text,
  32. then everything falls into place. What do I mean by remainder?
  33. I mean everything else after the match.
  34. If we match a literal "a" the remainder after we match zero characters of this string
  35. would be "aaab"--three a's followed by a b.
  36. The remainder after we match one a would be two a's followed by a b and so on.
  37. What I'm going to do is define an auxiliary function called "match set,"
  38. and it takes a pattern and a text, and it returns this set of remainders.
  39. When given just this pattern here as the input, star(lit('a')), and this text as the text
  40. then the remainder would be the set consisting of three a's and a b,
  41. two a's and a b, one a and a b, or just b.
  42. In other words, star(lit('a')) could have consumed 0, 1, 2, or 3 a's,
  43. and that's the remainder that's left over.
  44. So the result will be this set.