[Script Info]
Title:

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.41,0:00:03.99,Default,,0000,0000,0000,,In this segment I'm going to show you that dependency syntax
Dialogue: 0,0:00:03.99,0:00:09.04,Default,,0000,0000,0000,,is a very natural representation for relation extraction applications.
Dialogue: 0,0:00:10.70,0:00:16.50,Default,,0000,0000,0000,,One domain in which a lot of work has been done on relation extraction is in the biomedical text domain.
Dialogue: 0,0:00:16.50,0:00:19.41,Default,,0000,0000,0000,,So here for example, we have the sentence
Dialogue: 0,0:00:19.41,0:00:26.20,Default,,0000,0000,0000,,“The results demonstrated that KaiC interacts rhythmically with SasA, KaiA, and KaiB.”
Dialogue: 0,0:00:26.20,0:00:30.56,Default,,0000,0000,0000,,And what we’d like to get out of that is a protein interaction event.
Dialogue: 0,0:00:30.56,0:00:34.63,Default,,0000,0000,0000,,So here’s the “interacts” that indicates the relation,
Dialogue: 0,0:00:34.63,0:00:36.75,Default,,0000,0000,0000,,and these are the proteins involved.
Dialogue: 0,0:00:36.75,0:00:40.16,Default,,0000,0000,0000,,And there are a bunch of other proteins involved as well.
Dialogue: 0,0:00:40.54,0:00:48.22,Default,,0000,0000,0000,,Well, the point we get out of here is that if we can have this kind of dependency syntax,
Dialogue: 0,0:00:48.22,0:00:55.21,Default,,0000,0000,0000,,then it's very easy starting from here to follow along the arguments of the subject and the preposition “with”
Dialogue: 0,0:00:55.21,0:00:59.37,Default,,0000,0000,0000,,and to easily see the relation that we’d like to get out.
Dialogue: 0,0:00:59.37,0:01:01.71,Default,,0000,0000,0000,,And if we're just a little bit cleverer,
Dialogue: 0,0:01:01.71,0:01:05.81,Default,,0000,0000,0000,,we can then also follow along the conjunction relations
Dialogue: 0,0:01:05.81,0:01:12.97,Default,,0000,0000,0000,,and see that KaiC is also interacting with these other two proteins.
Dialogue: 0,0:01:14.26,0:01:17.36,Default,,0000,0000,0000,,And that's something that a lot of people have worked on.
Dialogue: 0,0:01:17.36,0:01:24.36,Default,,0000,0000,0000,,In particular, one representation that’s being widely used for relation extraction applications in biomedicine
Dialogue: 0,0:01:24.36,0:01:27.80,Default,,0000,0000,0000,,is the Stanford dependencies representation.
Dialogue: 0,0:01:27.80,0:01:33.64,Default,,0000,0000,0000,,So the basic form of this representation is as a projective dependency tree.
Dialogue: 0,0:01:33.64,0:01:40.70,Default,,0000,0000,0000,,And it was designed that way so it could be easily generated by postprocessing of phrase structure trees.
Dialogue: 0,0:01:40.70,0:01:44.08,Default,,0000,0000,0000,,So if you have a notion of headedness in the phrase structure tree,
Dialogue: 0,0:01:44.08,0:01:49.64,Default,,0000,0000,0000,,the Stanford dependency software provides a set of matching pattern rules
Dialogue: 0,0:01:49.64,0:01:55.29,Default,,0000,0000,0000,,that will then type the dependency relations and give you out a Stanford dependency tree.
Dialogue: 0,0:01:55.29,0:02:02.00,Default,,0000,0000,0000,,But Stanford dependencies can also be, and now increasingly are generated directly
Dialogue: 0,0:02:02.00,0:02:06.75,Default,,0000,0000,0000,,by dependency parsers such as the MaltParser that we looked at recently.
Dialogue: 0,0:02:07.32,0:02:11.47,Default,,0000,0000,0000,,Okay, so this is roughly what the representation looks like.
Dialogue: 0,0:02:11.47,0:02:13.30,Default,,0000,0000,0000,,So it's just as we saw before,
Dialogue: 0,0:02:13.30,0:02:17.86,Default,,0000,0000,0000,,with the words connected by typed dependency arcs.
Dialogue: 0,0:02:19.66,0:02:24.24,Default,,0000,0000,0000,,But something that has been explored in the Stanford dependencies framework
Dialogue: 0,0:02:24.24,0:02:27.77,Default,,0000,0000,0000,,is, starting from that basic dependencies representation,
Dialogue: 0,0:02:27.77,0:02:34.05,Default,,0000,0000,0000,,let’s make some changes to it to facilitate relation extraction applications.
Dialogue: 0,0:02:34.05,0:02:38.48,Default,,0000,0000,0000,,And the idea here is to emphasize the relationships
Dialogue: 0,0:02:38.48,0:02:43.30,Default,,0000,0000,0000,,between content words that are useful for relation extraction applications.
Dialogue: 0,0:02:43.30,0:02:45.39,Default,,0000,0000,0000,,Let me give a couple of examples.
Dialogue: 0,0:02:45.39,0:02:51.55,Default,,0000,0000,0000,,So, one example is that commonly you’ll have a content word like “based”
Dialogue: 0,0:02:51.55,0:02:56.60,Default,,0000,0000,0000,,and where the company here is based, Los Angeles,
Dialogue: 0,0:02:56.60,0:03:01.03,Default,,0000,0000,0000,,and it’s separated by this preposition “in”, a function word.
Dialogue: 0,0:03:01.03,0:03:07.10,Default,,0000,0000,0000,,And you can think of these function words as really functioning like case markers in a lot of other languages.
Dialogue: 0,0:03:07.10,0:03:11.41,Default,,0000,0000,0000,,So it’d seem more useful if we directly connected “based” and “LA”,
Dialogue: 0,0:03:11.41,0:03:15.03,Default,,0000,0000,0000,,and we introduced the relationship of “prep_in”.
Dialogue: 0,0:03:15.91,0:03:20.73,Default,,0000,0000,0000,,And so that’s what we do, and we simplify the structure.
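The preposition collapsing just described can be sketched roughly as follows. This is a minimal illustration, not the Stanford dependencies software itself: the function name `collapse_prepositions` and the (head, relation, dependent) triple format are assumptions made for this example, and the edge list is hand-written rather than parser output. Words are plain strings here, so a repeated word would be ambiguous; a fuller version would key on token positions.

```python
def collapse_prepositions(edges):
    """Fold each prep + pobj pair into one prep_<word> edge.

    edges: list of (head, relation, dependent) triples over word strings.
    This is an illustrative sketch, not the actual Stanford converter.
    """
    # Words that are attached to their head by a "prep" edge.
    prep_words = {dep for _head, rel, dep in edges if rel == "prep"}

    # For each such preposition, collect the objects it governs.
    objects_of = {}
    for head, rel, dep in edges:
        if rel == "pobj" and head in prep_words:
            objects_of.setdefault(head, []).append(dep)

    collapsed = []
    for head, rel, dep in edges:
        if rel == "prep" and dep in objects_of:
            # Connect the content word directly to the preposition's object.
            for obj in objects_of[dep]:
                collapsed.append((head, "prep_" + dep.lower(), obj))
        elif rel == "pobj" and head in prep_words:
            # Already folded into a prep_<word> edge above.
            continue
        else:
            collapsed.append((head, rel, dep))
    return collapsed


# Hand-written toy edges for the "based in LA" fragment (an assumption).
basic = [
    ("makes", "nsubj", "Bell"),
    ("based", "prep", "in"),
    ("in", "pobj", "LA"),
]
print(collapse_prepositions(basic))
```

On this toy input the function word "in" disappears as a node and "based" is linked straight to "LA" by a prep_in edge, while the unrelated nsubj edge is left alone.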
Dialogue: 0,0:03:20.73,0:03:22.98,Default,,0000,0000,0000,,But there are some other places, too,
Dialogue: 0,0:03:22.98,0:03:29.65,Default,,0000,0000,0000,,in which we can do a better job at representing the semantics with some modifications of the graph structure.
Dialogue: 0,0:03:29.65,0:03:34.87,Default,,0000,0000,0000,,And so a particular place of that is these coordination relationships.
Dialogue: 0,0:03:34.87,0:03:40.39,Default,,0000,0000,0000,,So we very directly got here that “Bell makes products”.
Dialogue: 0,0:03:40.39,0:03:44.16,Default,,0000,0000,0000,,But we’d also like to get out that Bell distributes products,
Dialogue: 0,0:03:44.16,0:03:51.82,Default,,0000,0000,0000,,and one way we could do that is by recognizing this “and” relationship
Dialogue: 0,0:03:51.82,0:04:01.82,Default,,0000,0000,0000,,and saying “Okay, well that means that ‘Bell’ should also be the subject of ‘distributing’
Dialogue: 0,0:04:03.16,0:04:07.49,Default,,0000,0000,0000,,and what they distribute is ‘products.’”
Dialogue: 0,0:04:09.43,0:04:11.32,Default,,0000,0000,0000,,And similarly down here,
Dialogue: 0,0:04:11.32,0:04:21.10,Default,,0000,0000,0000,,we can recognize that they’re computer products as well as electronic products.
Dialogue: 0,0:04:21.78,0:04:24.61,Default,,0000,0000,0000,,So we can make those changes to the graph,
Dialogue: 0,0:04:24.61,0:04:28.12,Default,,0000,0000,0000,,and get a reduced graph representation.
Dialogue: 0,0:04:28.60,0:04:33.49,Default,,0000,0000,0000,,Now, once you do this, there are some things that are not as simple.
Dialogue: 0,0:04:33.49,0:04:38.86,Default,,0000,0000,0000,,In particular, if you look at this structure, it’s no longer a dependency tree
Dialogue: 0,0:04:38.86,0:04:43.02,Default,,0000,0000,0000,,because we have multiple arcs pointing at this node,
Dialogue: 0,0:04:43.02,0:04:46.13,Default,,0000,0000,0000,,and multiple arcs pointing at this node.
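The coordination step above, and the reason the result stops being a tree, can be sketched like this. It is only a rough illustration under stated assumptions: the helper names `propagate_conjuncts` and `incoming_arc_counts` are made up for this example, the edges are hand-written, the plain "conj" label is a simplification, and copying arguments to every conjunct is assumed to be valid for this sentence rather than in general.

```python
from collections import Counter

# Relations that a conjoined verb is assumed to share with the first verb.
SHARED = ("nsubj", "dobj")


def propagate_conjuncts(edges):
    """Copy the subject and object of a verb onto each of its conjuncts.

    edges: list of (head, relation, dependent) triples over word strings.
    """
    result = list(edges)
    for head, rel, dep in edges:
        if rel != "conj":
            continue
        # dep is coordinated with head, so give it head's shared arguments.
        for head2, rel2, dep2 in edges:
            if head2 == head and rel2 in SHARED:
                new_edge = (dep, rel2, dep2)
                if new_edge not in result:
                    result.append(new_edge)
    return result


def incoming_arc_counts(edges):
    """Count how many arcs point at each word."""
    return Counter(dep for _head, _rel, dep in edges)


# Hand-written toy edges for "Bell makes and distributes products".
basic = [
    ("makes", "nsubj", "Bell"),
    ("makes", "dobj", "products"),
    ("makes", "conj", "distributes"),
]
enhanced = propagate_conjuncts(basic)
counts = incoming_arc_counts(enhanced)
print(enhanced)
print(counts)
```

After propagation both "Bell" and "products" have two incoming arcs, one from each verb. In a tree every word has at most one head, so this is exactly the sense in which the reduced representation is a graph and no longer a dependency tree.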
Dialogue: 0,0:04:47.25,0:04:48.57,Default,,0000,0000,0000,,But on the other hand,
Dialogue: 0,0:04:48.57,0:04:54.59,Default,,0000,0000,0000,,the relations that we’d like to extract are represented much more directly.
Dialogue: 0,0:04:54.59,0:04:58.01,Default,,0000,0000,0000,,And let me just show you one graph that gives an indication of this.
Dialogue: 0,0:04:58.65,0:05:06.42,Default,,0000,0000,0000,,So, this was a graph that was originally put together by Jari Björne et al,
Dialogue: 0,0:05:06.42,0:05:12.46,Default,,0000,0000,0000,,who were the team that won the BioNLP 2009 shared tasks in relation extraction
Dialogue: 0,0:05:12.46,0:05:17.50,Default,,0000,0000,0000,,using, as the representational substrate, Stanford dependencies.
Dialogue: 0,0:05:17.50,0:05:20.68,Default,,0000,0000,0000,,And what they wanted to illustrate with this graph
Dialogue: 0,0:05:20.68,0:05:25.23,Default,,0000,0000,0000,,is how much more effective dependency structures were
Dialogue: 0,0:05:25.23,0:05:30.86,Default,,0000,0000,0000,,at linking up the words that you wanted to extract in a relation,
Dialogue: 0,0:05:30.86,0:05:34.76,Default,,0000,0000,0000,,than simply looking for words in the linear context.
Dialogue: 0,0:05:35.43,0:05:40.40,Default,,0000,0000,0000,,So, here what we have is that this is the distance
Dialogue: 0,0:05:40.92,0:05:45.89,Default,,0000,0000,0000,,which can be measured either by just counting words to the left or right,
Dialogue: 0,0:05:45.89,0:05:50.04,Default,,0000,0000,0000,,or by counting the number of dependency arcs that you have to follow.
Dialogue: 0,0:05:50.04,0:05:53.32,Default,,0000,0000,0000,,And this is the percent of time that it occurred.
Dialogue: 0,0:05:53.32,0:05:56.34,Default,,0000,0000,0000,,And so what you see is, if you just look at linear distance,
Dialogue: 0,0:05:56.34,0:06:02.89,Default,,0000,0000,0000,,there are lots of times that there are arguments and relations that you want to connect out
Dialogue: 0,0:06:02.89,0:06:06.22,Default,,0000,0000,0000,,that are four, five, six, seven, eight words away.
Dialogue: 0,0:06:06.22,0:06:11.73,Default,,0000,0000,0000,,In fact, there’s even a pretty large residue here of well over ten percent
Dialogue: 0,0:06:11.73,0:06:16.77,Default,,0000,0000,0000,,where the linear distance away in words is greater than ten words.
Dialogue: 0,0:06:16.77,0:06:21.18,Default,,0000,0000,0000,,If on the other hand though, you are trying to identify,
Dialogue: 0,0:06:21.18,0:06:25.64,Default,,0000,0000,0000,,relate the arguments of relations by looking at the dependency distance,
Dialogue: 0,0:06:25.64,0:06:30.46,Default,,0000,0000,0000,,then what you’d discover is that the vast majority of the arguments
Dialogue: 0,0:06:30.46,0:06:35.43,Default,,0000,0000,0000,,are very close-by neighbors in terms of dependency distance.
Dialogue: 0,0:06:35.43,0:06:42.07,Default,,0000,0000,0000,,So, about 47 percent of them are direct dependencies and another 30 percent are at distance two.
Dialogue: 0,0:06:42.07,0:06:48.51,Default,,0000,0000,0000,,So take those together and that’s greater than three quarters of the dependencies that you want to find.
Dialogue: 0,0:06:48.51,0:06:51.54,Default,,0000,0000,0000,,And then this number trails away quickly.
Dialogue: 0,0:06:51.54,0:06:59.43,Default,,0000,0000,0000,,So there are virtually no arguments of relations that aren’t fairly close together in dependency distance
Dialogue: 0,0:06:59.43,0:07:02.62,Default,,0000,0000,0000,,and it’s precisely because of this reason that you can get
Dialogue: 0,0:07:02.62,0:07:09.62,Default,,0000,0000,0000,,a lot of mileage in doing relation extraction by having a representation like dependency syntax.
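The two distance measures compared in that graph can be sketched as follows, using the KaiC sentence from the start of the segment. This is an illustration under assumptions: the function names, the punctuation-free token list, and the hand-written edge list (with prep_with and conj_and labels in the collapsed style) are all made up for this example, and no parser was run to produce them.

```python
from collections import deque


def dependency_distance(edges, source, target):
    """Number of dependency arcs on the shortest path between two words.

    Arcs are followed in either direction (breadth-first search).
    Returns None if the two words are not connected.
    """
    neighbors = {}
    for head, _rel, dep in edges:
        neighbors.setdefault(head, set()).add(dep)
        neighbors.setdefault(dep, set()).add(head)

    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def linear_distance(tokens, source, target):
    """Number of word positions separating two words in the sentence."""
    return abs(tokens.index(source) - tokens.index(target))


# Clause from the lecture's example sentence, punctuation left out.
tokens = ["KaiC", "interacts", "rhythmically", "with", "SasA", "KaiA", "and", "KaiB"]

# Hand-written edges in the collapsed style (an assumption, not parser output).
edges = [
    ("interacts", "nsubj", "KaiC"),
    ("interacts", "advmod", "rhythmically"),
    ("interacts", "prep_with", "SasA"),
    ("SasA", "conj_and", "KaiA"),
    ("SasA", "conj_and", "KaiB"),
]

print(linear_distance(tokens, "interacts", "KaiB"))
print(dependency_distance(edges, "interacts", "KaiB"))
```

Here "KaiB" sits six word positions from "interacts" but only two arcs away in the graph, which is the same pattern the BioNLP graph shows in aggregate.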
Dialogue: 0,0:07:11.45,0:07:16.05,Default,,0000,0000,0000,,Okay, I hope that’s given you some idea of why knowing about syntax is useful,
Dialogue: 0,0:07:16.05,9:59:59.99,Default,,0000,0000,0000,,when you want to do various semantic tasks in natural language processing.