In this segment I'm going to show you that dependency syntax is a very natural representation for relation extraction applications. One domain in which a lot of work has been done on relation extraction is biomedical text.
So here, for example, we have the sentence “The results demonstrated that KaiC interacts rhythmically with SasA, KaiA, and KaiB.” What we'd like to get out of that is a protein interaction event. Here it's “interacts” that indicates the relation, and these are the proteins involved; there are a bunch of other proteins involved as well.
The point we get out of this is that if we have this kind of dependency syntax, it's very easy to start from “interacts”, follow the arcs to its subject and to the preposition “with”, and read off the relation that we'd like to extract. And if we're just a little bit cleverer, we can also follow along the conjunction relations and see that KaiC is also interacting with the other two proteins.
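To make that concrete, here is a minimal sketch of that traversal, with the typed dependencies for the example sentence written out by hand in the collapsed style (nsubj, prep_with, conj_and); the traversal code is purely illustrative and not from any particular system:

    # Hand-written typed dependencies for:
    # "KaiC interacts rhythmically with SasA, KaiA, and KaiB."
    # Each entry is (head, relation, dependent).
    edges = [
        ("interacts", "nsubj",     "KaiC"),
        ("interacts", "advmod",    "rhythmically"),
        ("interacts", "prep_with", "SasA"),
        ("SasA",      "conj_and",  "KaiA"),
        ("SasA",      "conj_and",  "KaiB"),
    ]

    def dependents(head, rel):
        return [d for (h, r, d) in edges if h == head and r == rel]

    # Follow the subject and the "with" argument of the trigger word...
    agents = dependents("interacts", "nsubj")
    partners = dependents("interacts", "prep_with")
    # ...and, being a little cleverer, follow the conjunctions too.
    for p in list(partners):
        partners += dependents(p, "conj_and")

    for a in agents:
        for p in partners:
            print(f"interaction({a}, {p})")
    # interaction(KaiC, SasA), interaction(KaiC, KaiA), interaction(KaiC, KaiB)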
And that's something that a lot of people have worked on. In particular, one representation that has been widely used for relation extraction applications in biomedicine is the Stanford dependencies representation.
The basic form of this representation is a projective dependency tree, and it was designed that way so it could be easily generated by postprocessing phrase structure trees. If you have a notion of headedness in the phrase structure tree, the Stanford dependencies software provides a set of matching pattern rules that will type the dependency relations and give you a Stanford dependency tree. But Stanford dependencies can also be, and now increasingly are, generated directly by dependency parsers such as the MaltParser that we looked at recently.
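As an aside, if you want to try this yourself: the Stanford dependencies scheme has since evolved into Universal Dependencies, and the Stanford NLP group's stanza parser produces that successor representation directly. A minimal sketch, with package and model names as of this writing:

    import stanza

    stanza.download("en")  # fetch the English models on first use
    nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")

    doc = nlp("KaiC interacts rhythmically with SasA.")
    for sent in doc.sentences:
        for word in sent.words:
            # word.head is a 1-based index into the sentence; 0 means root.
            head = sent.words[word.head - 1].text if word.head > 0 else "ROOT"
            print(f"{word.deprel}({head}, {word.text})")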
Okay, so this is roughly what the representation looks like: just as we saw before, the words are connected by typed dependency arcs.
But something that has been explored in the Stanford dependencies framework is to start from that basic dependency representation and make some changes to it to facilitate relation extraction applications. The idea is to emphasize the relationships between content words, which are the ones useful for relation extraction.
Let me give a couple of examples. One is that you'll commonly have a content word like “based”, and the place where the company is based, Los Angeles, is separated from it by the preposition “in”, a function word. You can think of these function words as really functioning like case markers in a lot of other languages. So it seems more useful to connect “based” and “LA” directly and introduce the relation “prep_in”. And that's what we do, and we simplify the structure.
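To show what that transformation amounts to, here is a toy sketch of the collapsing step over (head, relation, dependent) triples; the real Stanford dependencies software uses a much larger rule set and keys on token positions rather than word strings:

    # Collapse prep(X, in) + pobj(in, Y) into prep_in(X, Y).
    def collapse_prepositions(edges):
        out = []
        for head, rel, dep in edges:
            if rel == "prep":
                # Connect the preposition's object directly to the head,
                # folding the preposition itself into the relation name.
                for h2, r2, d2 in edges:
                    if h2 == dep and r2 == "pobj":
                        out.append((head, f"prep_{dep}", d2))
            elif rel == "pobj":
                pass  # absorbed into the collapsed edge above
            else:
                out.append((head, rel, dep))
        return out

    # "the company is based in LA" (fragment, hand-annotated)
    edges = [
        ("based", "nsubjpass", "company"),
        ("based", "prep",      "in"),
        ("in",    "pobj",      "LA"),
    ]
    print(collapse_prepositions(edges))
    # [('based', 'nsubjpass', 'company'), ('based', 'prep_in', 'LA')]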
But there are some other places, too, where we can do a better job of representing the semantics with some modifications of the graph structure. A particular case is coordination relationships. Here we very directly get that “Bell makes products”. But we'd also like to get out that Bell distributes products, and one way we can do that is by recognizing the “and” relationship and saying, okay, that means “Bell” should also be the subject of “distributes”, and what they distribute is “products”. And similarly down here, we can recognize that they're computer products as well as electronic products. So we can make those changes to the graph and get a reduced graph representation.
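Here is the same kind of toy sketch for that propagation step, again over hand-written triples in the collapsed style; real systems are more selective about which dependents may be shared across a conjunction:

    # "Bell makes and distributes electronic and computer products."
    edges = [
        ("makes",      "nsubj",    "Bell"),
        ("makes",      "dobj",     "products"),
        ("makes",      "conj_and", "distributes"),
        ("products",   "amod",     "electronic"),
        ("electronic", "conj_and", "computer"),
    ]

    def propagate_conjuncts(edges):
        new = list(edges)
        for h, r, d in edges:
            if r != "conj_and":
                continue
            # Copy the first conjunct's dependents onto the second...
            for h2, r2, d2 in edges:
                if h2 == h and r2 != "conj_and":
                    new.append((d, r2, d2))
            # ...and let the second conjunct fill the roles the first fills.
            for h2, r2, d2 in edges:
                if d2 == h and r2 != "conj_and":
                    new.append((h2, r2, d))
        return new

    for edge in propagate_conjuncts(edges):
        print(edge)
    # Adds nsubj(distributes, Bell), dobj(distributes, products),
    # and amod(products, computer) to the original edges.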
Now, once you do this, there are some things that are not as simple. In particular, if you look at this structure, it's no longer a dependency tree, because some nodes now have multiple arcs pointing at them. But on the other hand, the relations that we'd like to extract are represented much more directly.
And let me just show you one graph that gives an indication of this. This graph was originally put together by Jari Björne et al., the team that won the BioNLP 2009 shared task on relation extraction using Stanford dependencies as the representational substrate. What they wanted to illustrate with this graph is how much more effective dependency structures are at linking up the words that you want to extract in a relation than simply looking at words in the linear context.
So here, this axis is the distance, which can be measured either by just counting words to the left or right, or by counting the number of dependency arcs that you have to follow, and this axis is the percentage of the time that distance occurred. And what you see is that if you just look at linear distance, there are lots of times that the arguments of relations you want to connect up are four, five, six, seven, eight words away. In fact, there's even a pretty large residue, well over ten percent, where the linear distance is greater than ten words.
If, on the other hand, you try to relate the arguments of relations by looking at dependency distance, then what you discover is that the vast majority of the arguments are very close-by neighbors in terms of dependency distance. About 47 percent of them are direct dependencies, and another 30 percent are at distance two. Take those together, and that's greater than three quarters of the dependencies that you want to find, and the numbers trail away quickly from there. So there are virtually no arguments of relations that aren't fairly close together in dependency distance, and it's precisely for this reason that you can get a lot of mileage in doing relation extraction by having a representation like dependency syntax.
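As a sketch of how you would measure those two distances for a single candidate pair, assuming the networkx library and treating the arcs as undirected for path-finding:

    import networkx as nx

    # "The results demonstrated that KaiC interacts rhythmically with SasA"
    tokens = ["The", "results", "demonstrated", "that", "KaiC",
              "interacts", "rhythmically", "with", "SasA"]
    edges = [  # hand-annotated, collapsed style as before
        ("demonstrated", "nsubj",     "results"),
        ("results",      "det",       "The"),
        ("demonstrated", "ccomp",     "interacts"),
        ("interacts",    "mark",      "that"),
        ("interacts",    "nsubj",     "KaiC"),
        ("interacts",    "advmod",    "rhythmically"),
        ("interacts",    "prep_with", "SasA"),
    ]

    g = nx.Graph()  # undirected: we only care about path lengths
    g.add_edges_from((h, d) for h, _, d in edges)

    def linear_distance(a, b):
        return abs(tokens.index(a) - tokens.index(b))

    def dependency_distance(a, b):
        return nx.shortest_path_length(g, a, b)

    print(linear_distance("interacts", "SasA"))      # 3 words apart
    print(dependency_distance("interacts", "SasA"))  # 1 arc away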
Okay, I hope that's given you some idea of why knowing about syntax is useful when you want to do various semantic tasks in natural language processing.