WEBVTT

00:00:06.833 --> 00:00:08.725
"Some are born great,

00:00:08.725 --> 00:00:10.620
some achieve greatness,

00:00:10.620 --> 00:00:15.711
and others have greatness thrust upon them", quoth William Shakespeare.

00:00:15.711 --> 00:00:17.301
Or did he?

00:00:17.301 --> 00:00:21.969
Some people question whether Shakespeare really wrote the works that bear his name,

00:00:21.969 --> 00:00:24.889
or whether he even existed at all.

00:00:24.889 --> 00:00:28.809
They speculate that Shakespeare was a pseudonym for another writer,

00:00:28.809 --> 00:00:30.236
or a group of writers.

00:00:30.236 --> 00:00:32.348
Proposed candidates for the real Shakespeare

00:00:32.348 --> 00:00:37.945
include other famous playwrights, politicians and even some prominent women.

00:00:37.945 --> 00:00:41.436
Could it be true that the greatest writer in the English language

00:00:41.436 --> 00:00:44.941
was as fictional as his plays?

00:00:44.941 --> 00:00:47.867
Most Shakespeare scholars dismiss these theories

00:00:47.867 --> 00:00:51.439
based on historical and biographical evidence.

00:00:51.439 --> 00:00:55.514
But there is another way to test whether Shakespeare's famous lines

00:00:55.514 --> 00:00:58.500
were actually written by someone else.

00:00:58.500 --> 00:01:00.689
Linguistics, the study of language,

00:01:00.689 --> 00:01:04.090
can tell us a great deal about the way we speak and write

00:01:04.090 --> 00:01:09.585
by examining syntax, grammar, semantics and vocabulary.

00:01:09.585 --> 00:01:11.416
And in the late 1800s,

00:01:11.416 --> 00:01:15.447
a Polish philosopher named Wincenty Lutosławski

00:01:15.447 --> 00:01:18.226
formalized a method known as stylometry,

00:01:18.226 --> 00:01:23.428
applying this knowledge to investigate questions of literary authorship.

00:01:23.428 --> 00:01:25.395
So how does stylometry work?

00:01:25.395 --> 00:01:29.279
The idea is that each writer's style has certain characteristics

00:01:29.279 --> 00:01:33.613
that remain fairly uniform among individual works.

00:01:33.613 --> 00:01:37.094
Examples of characteristics include average sentence length,

00:01:37.094 --> 00:01:38.953
the arrangement of words,

00:01:38.953 --> 00:01:42.487
and even the number of occurrences of a particular word.

00:01:42.487 --> 00:01:47.566
Let's look at use of the word "thee" and visualize it as a dimension, or axis.

00:01:47.566 --> 00:01:50.554
Each of Shakespeare's works can be placed on that axis,

00:01:50.554 --> 00:01:54.668
like a data point, based on the number of occurrences of that word.

00:01:54.668 --> 00:01:57.235
In statistics, the tightness of these points

00:01:57.235 --> 00:02:02.498
gives us what is known as the variance, an expected range for our data.

00:02:02.498 --> 00:02:07.995
But this is only a single characteristic in a very high-dimensional space.

00:02:07.995 --> 00:02:11.340
With a clustering tool called Principal Component Analysis,

00:02:11.340 --> 00:02:16.131
we can reduce the multidimensional space into simple principal components

00:02:16.131 --> 00:02:19.905
that collectively measure the variance in Shakespeare's works.

00:02:19.905 --> 00:02:22.396
We can then test the works of our candidates

00:02:22.396 --> 00:02:24.867
against those principal components.

00:02:24.867 --> 00:02:26.055
For example,

00:02:26.055 --> 00:02:30.394
if enough works of Francis Bacon fall within the Shakespearean variance,

00:02:30.394 --> 00:02:32.263
that would be pretty strong evidence

00:02:32.263 --> 00:02:37.045
that Francis Bacon and Shakespeare are actually the same person.

00:02:37.045 --> 00:02:39.161
What did the results show?

00:02:39.161 --> 00:02:42.477
Well, the stylometrists who carried this out have concluded

00:02:42.477 --> 00:02:46.557
that Shakespeare is none other than Shakespeare.

00:02:46.557 --> 00:02:49.191
The Bard is the Bard.

00:02:49.191 --> 00:02:54.370
The pretenders' works just don't match up with Shakespeare's signature style.

00:02:54.370 --> 00:02:57.642
However, our intrepid statisticians did find

00:02:57.642 --> 00:03:00.884
some compelling evidence of collaborations.

00:03:00.884 --> 00:03:03.138
For instance, one recent study concluded

00:03:03.138 --> 00:03:08.216
that Shakespeare worked with playwright Christopher Marlowe on "Henry VI,"

00:03:08.216 --> 00:03:10.624
parts one and two.

00:03:10.624 --> 00:03:15.642
Shakespeare's identity is only one of the many problems stylometry can resolve.

00:03:15.642 --> 00:03:18.308
It can help us determine when a work was written,

00:03:18.308 --> 00:03:21.040
whether an ancient text is a forgery,

00:03:21.040 --> 00:03:23.685
whether a student has committed plagiarism,

00:03:23.685 --> 00:03:29.020
or if that email you just received is of a high priority or spam.

00:03:29.020 --> 00:03:31.551
And does the timeless poetry of Shakespeare's lines

00:03:31.551 --> 00:03:34.475
just boil down to numbers and statistics?

00:03:34.475 --> 00:03:35.885
Not quite.

00:03:35.885 --> 00:03:40.900
Stylometric analysis may reveal what makes Shakespeare's works structurally distinct,

00:03:40.900 --> 00:03:45.525
but it cannot capture the beauty of the sentiments and emotions they express,

00:03:45.525 --> 00:03:48.509
or why they affect us the way they do.

00:03:48.509 --> 00:03:50.826
At least, not yet.
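
The stylometry steps described in the narration (placing each work on word-frequency axes, reducing those axes with Principal Component Analysis, then checking whether a candidate's work falls inside the spread of the known works) can be sketched in Python with NumPy. This is only an illustrative sketch, not the method used by the studies the video mentions. The marker words, the synthetic toy texts, the choice of two components, and the two-standard-deviation threshold are all assumptions made for the example.

```python
import re
import numpy as np

# Hypothetical marker words, chosen only for illustration.
MARKERS = ["thee", "thou", "hath"]


def word_rates(text, markers=MARKERS):
    """Occurrences of each marker word per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = max(len(words), 1)  # avoid dividing by zero on empty text
    return np.array([1000.0 * words.count(m) / total for m in markers])


def fit_pca(X, k):
    """Return the feature mean and the top-k principal axes of X (one row per work)."""
    mean = X.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal axes,
    # ordered from largest to smallest variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def project(x, mean, axes):
    """Coordinates of one work (or a matrix of works) on the principal axes."""
    return (x - mean) @ axes.T


def within_spread(candidate, known, n_std=2.0):
    """True if the candidate lies within n_std sample standard deviations
    of the known works on every retained component (assumed threshold)."""
    center = known.mean(axis=0)
    spread = known.std(axis=0, ddof=1)
    return bool(np.all(np.abs(candidate - center) <= n_std * spread))


def toy_text(thee, thou, hath, total=1000):
    """Build a synthetic 'work' with the given marker counts (toy data only)."""
    filler = total - thee - thou - hath
    return " ".join(["thee"] * thee + ["thou"] * thou + ["hath"] * hath + ["word"] * filler)


# Four synthetic "known" works standing in for an author's corpus.
known_texts = [toy_text(10, 8, 5), toy_text(12, 9, 6), toy_text(9, 7, 4), toy_text(11, 10, 6)]
X = np.array([word_rates(t) for t in known_texts])
mean, axes = fit_pca(X, k=2)
known_scores = project(X, mean, axes)

# Two synthetic candidates: one stylistically close, one far away.
similar = project(word_rates(toy_text(10, 9, 5)), mean, axes)
different = project(word_rates(toy_text(2, 1, 0)), mean, axes)
print(within_spread(similar, known_scores))    # True
print(within_spread(different, known_scores))  # False
```

With real texts, `word_rates` would be applied to whole plays and to many more features than three words. The fixed threshold in `within_spread` is a simplification of the statistical tests an actual study would use.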