English subtitles

← 03-03 Regular Expressions

dummy description

Get Embed Code
1 Language

Showing Revision 1 created 04/29/2012 by Amara Bot.

  1. We'll start with a language that many of you may be familiar with--
  2. the language of regular expressions.
  3. You've seen them if you've taken CS101.
  4. Maybe you've seen them elsewhere.
  5. In any event, we'll give an overview of them.
  6. There's a language for regular expressions.
  7. They can be expressed as strings.
  8. For example, the string abc* describes a language,
  9. and that language consists of any number of a's followed by any number of b's
  10. followed by any number of c's.
  11. Elements of this language include the strings abc, aaa, bcc.
  12. Stars can be any number, so it could be zero of them.
  13. Say, just b would be an example.
  14. The empty string all by itself would be an example.
  15. Ccccc would be an example. An so on.
  16. Now, there's a whole language of symbols like + and ? and so on for regular expressions.
  17. To make sense of them, we have to be able to describe what are the possible grammars
  18. and then what are the possible languages that those grammar correspond to?
  19. A grammar is a description of a language,
  20. and a language is a set of strings.
  21. Now, this form of description of the grammar as a long sequence of characters
  22. is convenient when you're quickly typing something in,
  23. but it can be difficult to work with. Grammar expressions get long.
  24. So we're going to describe the possible grammars in a format that's more compositional.
  25. In other words, what I'm going to describe is an API,
  26. which stands for application programming interface.
  27. This is the interface that a programmer uses rather than the UI or user interface
  28. that a user uses when you click with your mouse.
  29. We'll describe a series of functions calls that can be used
  30. to describe the grammar of a regular expression.
  31. We'll say that a regular expression can be built up by these types of calls.
  32. First, a literal of some string S.
  33. For example, if we say lit(a) then that describes the language consisting
  34. of just the character string "a" and nothing else.
  35. We have the API call seq(x, y).
  36. We could say seq(lit(a), lit(b)), and that would consist of just the string "ab."
  37. So far not very interesting.
  38. Then we could say alt(x, y,).
  39. Similarly, alt(lit(a), lit(b)), and that would consist of two possibilities--
  40. either the string "a" or the string "b."
  41. We'll use the standard notation for the name of our API call here.
  42. Star(x) stands for any number of repetitions--zero or more.
  43. Star(lit(a)) would be the empty string or "a" or "aa" and so on.
  44. We can say oneof(c) and then string of possible characters.
  45. That's that same as the alternative of all the individual characters.
  46. Oneof('abc') matches "a" or "b" or "c."
  47. It's a constrained version of the alt function.
  48. We'll use the symbol "eol," standing for "end of line"
  49. to match only the end of a character string and nowhere else.
  50. What matches is the empty string, but it matches only at the end.
  51. The only example we can give is "eol" itself,
  52. and we can give an example of seq(lit('a'), eol),
  53. and that matches exactly the "a" and nothing else at the end.
  54. Then we'll add "dot," which matches any possible character--
  55. a, b, c, any other character in the alphabet.