1 00:00:00,000 --> 00:00:02,000 As we saw in that last quiz, 2 00:00:02,000 --> 00:00:08,000 it's not quite clear what to do when our token definitions overlap. 3 00:00:08,000 --> 00:00:13,000 The 7-character sequence "hello" 4 00:00:13,000 --> 00:00:15,000 matches our regular expression for word 5 00:00:15,000 --> 00:00:20,000 but also matches our regular expression for string. 6 00:00:20,000 --> 00:00:23,000 This is a problem not just with computer languages 7 00:00:23,000 --> 00:00:25,000 but also with natural languages. 8 00:00:25,000 --> 00:00:29,000 As the hypothetical owner of this restaurant would notice, 9 00:00:29,000 --> 00:00:31,000 we don't just serve hamburgers, we serve people 10 00:00:31,000 --> 00:00:34,000 could be interpreted the wrong way. 11 00:00:34,000 --> 00:00:37,000 Presumably those hamburgers are soylent green flavored. 12 00:00:37,000 --> 00:00:40,000 We want to have definitive rules for figuring out 13 00:00:40,000 --> 00:00:43,000 which of these we prefer. 14 00:00:43,000 --> 00:00:46,000 In fact, we're going to use a very simple rule. 15 00:00:46,000 --> 00:00:50,000 The first one you list wins, 16 00:00:50,000 --> 00:00:53,000 the one closer to the top of the file, 17 00:00:53,000 --> 00:00:58,000 so this is our big winner and is going to take priority over string. 18 00:00:58,000 --> 00:01:02,000 If you're making a lexical analyzer for HTML or JavaScript, 19 00:01:02,000 --> 00:01:08,000 ordering your token definitions is of prime importance. 20 00:01:08,000 --> 00:01:12,000 Let's investigate this issue in the form of a quiz. 21 00:01:12,000 --> 00:01:16,000 Suppose we have the input string hello, "world," 22 00:01:16,000 --> 00:01:19,000 and we really want that to yield word, 23 00:01:19,000 --> 00:01:23,000 the word hello, followed by a string. 24 00:01:23,000 --> 00:01:25,000 I'm going to list 3 rules for you, 25 00:01:25,000 --> 00:01:28,000 and I want you to tell me which one has to come last 26 00:01:28,000 --> 00:01:31,000 for us to get the desired effect. 27 00:01:31,000 --> 00:01:34,000 And here, because you've seen it all before, I'm eliding some of the details 28 00:01:34,000 --> 00:01:38,000 like the colon, token, blah, blah, blah. 29 00:01:38,000 --> 00:01:41,000 Instead what I'd like you to do is tell me 30 00:01:41,000 --> 00:01:44,000 which one of these functions, which one of these rules, 31 00:01:44,000 --> 00:01:49,000 would have to come last, bearing in mind that the one that comes first 32 00:01:49,000 --> 00:01:54,000 wins all ties in order for hello, "world" to break down into 33 00:01:54,000 --> 00:01:57,000 a word followed by a string.