1 00:00:06,833 --> 00:00:08,725 "Some are born great, 2 00:00:08,725 --> 00:00:10,620 some achieve greatness, 3 00:00:10,620 --> 00:00:15,711 and others have greatness thrust upon them", quoth William Shakespeare. 4 00:00:15,711 --> 00:00:17,301 Or did he? 5 00:00:17,301 --> 00:00:21,969 Some people question whether Shakespeare really wrote the works that bear his name, 6 00:00:21,969 --> 00:00:24,889 or whether he even existed at all. 7 00:00:24,889 --> 00:00:28,809 They speculate that Shakespeare was a pseudonym for another writer, 8 00:00:28,809 --> 00:00:30,236 or a group of writers. 9 00:00:30,236 --> 00:00:32,348 Proposed candidates for the real Shakespeare 10 00:00:32,348 --> 00:00:37,945 include other famous playwrights, politicians and even some prominent women. 11 00:00:37,945 --> 00:00:41,436 Could it be true that the greatest writer in the English language 12 00:00:41,436 --> 00:00:44,941 was as fictional as his plays? 13 00:00:44,941 --> 00:00:47,867 Most Shakespeare scholars dismiss these theories 14 00:00:47,867 --> 00:00:51,439 based on historical and biographical evidence. 15 00:00:51,439 --> 00:00:55,514 But there is another way to test whether Shakespeare's famous lines 16 00:00:55,514 --> 00:00:58,500 were actually written by someone else. 17 00:00:58,500 --> 00:01:00,689 Linguistics, the study of language, 18 00:01:00,689 --> 00:01:04,090 can tell us a great deal about the way we speak and write 19 00:01:04,090 --> 00:01:09,585 by examining syntax, grammar, semantics and vocabulary. 20 00:01:09,585 --> 00:01:11,416 And in the late 1800s, 21 00:01:11,416 --> 00:01:15,447 a Polish philosopher named Wincenty Lutosławski 22 00:01:15,447 --> 00:01:18,226 formalized a method known as stylometry, 23 00:01:18,226 --> 00:01:23,428 applying this knowledge to investigate questions of literary authorship. 24 00:01:23,428 --> 00:01:25,395 So how does stylometry work? 
25 00:01:25,395 --> 00:01:29,279 The idea is that each writer's style has certain characteristics 26 00:01:29,279 --> 00:01:33,613 that remain fairly uniform among individual works. 27 00:01:33,613 --> 00:01:37,094 Examples of characteristics include average sentence length, 28 00:01:37,094 --> 00:01:38,953 the arrangement of words, 29 00:01:38,953 --> 00:01:42,487 and even the number of occurrences of a particular word. 30 00:01:42,487 --> 00:01:47,566 Let's look at use of the word "thee" and visualize it as a dimension, or axis. 31 00:01:47,566 --> 00:01:50,554 Each of Shakespeare's works can be placed on that axis, 32 00:01:50,554 --> 00:01:54,668 like a data point, based on the number of occurrences of that word. 33 00:01:54,668 --> 00:01:57,235 In statistics, the tightness of these points 34 00:01:57,235 --> 00:02:02,498 gives us what is known as the variance, an expected range for our data. 35 00:02:02,498 --> 00:02:07,995 But this is only a single characteristic in a very high-dimensional space. 36 00:02:07,995 --> 00:02:11,340 With a dimension-reduction tool called Principal Component Analysis, 37 00:02:11,340 --> 00:02:16,131 we can reduce the multidimensional space into simple principal components 38 00:02:16,131 --> 00:02:19,905 that collectively measure the variance in Shakespeare's works. 39 00:02:19,905 --> 00:02:22,396 We can then test the works of our candidates 40 00:02:22,396 --> 00:02:24,867 against those principal components. 41 00:02:24,867 --> 00:02:26,055 For example, 42 00:02:26,055 --> 00:02:30,394 if enough works of Francis Bacon fall within the Shakespearean variance, 43 00:02:30,394 --> 00:02:32,263 that would be pretty strong evidence 44 00:02:32,263 --> 00:02:37,045 that Francis Bacon and Shakespeare are actually the same person. 45 00:02:37,045 --> 00:02:39,161 What did the results show?
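The procedure just described (count marker words, place each work on those axes, reduce the axes to a principal component, then check whether a candidate's work falls inside the reference spread) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method the stylometrists actually used: the marker word list, the synthetic placeholder "texts", the two-standard-deviation cutoff `k`, and every function name below are hypothetical, and the first principal component is found with a simple power iteration rather than a full decomposition.

```python
import math
import re
from statistics import mean, pstdev

# Hypothetical marker words, chosen only for illustration.
MARKERS = ["thee", "thou", "the", "and"]


def features(text, markers=MARKERS):
    """Relative frequency of each marker word: one axis per word."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1
    return [words.count(m) / total for m in markers]


def center(rows):
    """Subtract each column's mean; return centered rows and the means."""
    means = [mean(col) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows], means


def first_component(centered, iterations=200):
    """Leading eigenvector of the covariance matrix, by power iteration."""
    n, d = len(centered), len(centered[0])
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def project(row, means, component):
    """Coordinate of one work along the principal component."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))


def within_range(candidate_text, reference_texts, k=2.0):
    """True if the candidate lies within k standard deviations of the
    reference works along their first principal component (k is assumed)."""
    rows = [features(t) for t in reference_texts]
    centered, means = center(rows)
    pc = first_component(centered)
    scores = [project(r, means, pc) for r in rows]
    score = project(features(candidate_text), means, pc)
    return abs(score - mean(scores)) <= k * pstdev(scores)


# Synthetic placeholder "works" (repeated words, not real quotations).
reference = [
    "thee " * 10 + "thou " * 5 + "word " * 85,
    "thee " * 12 + "thou " * 6 + "word " * 82,
    "thee " * 8 + "thou " * 4 + "word " * 88,
]
candidate_close = "thee " * 11 + "thou " * 5 + "word " * 84
candidate_far = "word " * 100

print(within_range(candidate_close, reference))
print(within_range(candidate_far, reference))
```

A real study would use many more features and several principal components, and a single in-range work would not by itself show shared authorship; the sketch only makes the "axis, variance, projection" idea concrete.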
46 00:02:39,161 --> 00:02:42,477 Well, the stylometrists who carried this out have concluded 47 00:02:42,477 --> 00:02:46,557 that Shakespeare is none other than Shakespeare. 48 00:02:46,557 --> 00:02:49,191 The Bard is the Bard. 49 00:02:49,191 --> 00:02:54,370 The pretenders' works just don't match up with Shakespeare's signature style. 50 00:02:54,370 --> 00:02:57,642 However, our intrepid statisticians did find 51 00:02:57,642 --> 00:03:00,884 some compelling evidence of collaborations. 52 00:03:00,884 --> 00:03:03,138 For instance, one recent study concluded 53 00:03:03,138 --> 00:03:08,216 that Shakespeare worked with playwright Christopher Marlowe on "Henry VI," 54 00:03:08,216 --> 00:03:10,624 parts one and two. 55 00:03:10,624 --> 00:03:15,642 Shakespeare's identity is only one of the many problems stylometry can resolve. 56 00:03:15,642 --> 00:03:18,308 It can help us determine when a work was written, 57 00:03:18,308 --> 00:03:21,040 whether an ancient text is a forgery, 58 00:03:21,040 --> 00:03:23,685 whether a student has committed plagiarism, 59 00:03:23,685 --> 00:03:29,020 or if that email you just received is high priority or spam. 60 00:03:29,020 --> 00:03:31,551 And does the timeless poetry of Shakespeare's lines 61 00:03:31,551 --> 00:03:34,475 just boil down to numbers and statistics? 62 00:03:34,475 --> 00:03:35,885 Not quite. 63 00:03:35,885 --> 00:03:40,900 Stylometric analysis may reveal what makes Shakespeare's works structurally distinct, 64 00:03:40,900 --> 00:03:45,525 but it cannot capture the beauty of the sentiments and emotions they express, 65 00:03:45,525 --> 00:03:48,509 or why they affect us the way they do. 66 00:03:48,509 --> 00:03:50,826 At least, not yet.