-
Title:
Unit 22 15 Translation Example
-
Description:
-
Here's an example of translation. And this is a phrase-based translation model, one that doesn't happen to use any syntactic trees.
-
And in this case we're using an example going from German to English.
-
And in this model we break up the probability of the translation into three components.
-
First, a segmentation model: how do we break up the German into phrases?
-
Here the sentence has been broken up into five phrases.
-
Then a translation model. For each phrase, what's a good translation into English?
-
And then a distortion model, asking, "What order would be a good order to put each of these phrases into?"
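-
Before looking at each component, here's a minimal sketch of how that three-way factorization multiplies out, assuming each model is handed to us as a simple table or function. All names here are illustrative, not from the lecture:
```python
# A minimal sketch of the three-factor scoring, assuming the segmentation
# and translation models are plain dictionaries and the distortion model
# is a function over integers; everything here is illustrative.

def phrase_model_score(phrases, translations, distortions,
                       p_seg, p_trans, p_dist):
    """Product of segmentation, translation, and distortion probabilities."""
    score = 1.0
    for g in phrases:                        # segmentation model
        score *= p_seg[g]
    for g, e in zip(phrases, translations):  # translation model
        score *= p_trans[(g, e)]
    for d in distortions:                    # distortion model
        score *= p_dist(d)
    return score
```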
-
So let's look at each of those in turn. First, the segmentation model.
-
We have a database of phrases that we picked out, maybe through a process similar to what went on with the Chinese menu,
-
where we looked for coherent phrases that occurred frequently, and so we're able to assign probabilities to them.
-
So now we have probabilities. What's the probability that "morgen" is a phrase by itself, and that "fliege" is a phrase by itself?
-
We would also consider the probability that they form a phrase together,
-
and come up with a high-probability segmentation.
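-
As a small illustration of that comparison, with invented numbers standing in for whatever the phrase database actually contains:
```python
# Hypothetical segmentation probabilities; the numbers are invented.
p_seg = {
    "morgen": 0.30,          # "morgen" as a phrase by itself
    "fliege": 0.25,          # "fliege" as a phrase by itself
    "morgen fliege": 0.02,   # the two words together as one phrase
}

# Probability of segmenting the two words separately vs. together:
p_separate = p_seg["morgen"] * p_seg["fliege"]   # 0.075
p_together = p_seg["morgen fliege"]              # 0.02
# With these made-up numbers, the separate segmentation wins.
```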
-
Next, the translation model. That's going between the two sides of the Chinese menu.
-
How often, when we saw the phrase "morgen", did it correspond to the phrase "tomorrow" in English?
-
And so on for the other phrases. So far, that's all pretty straightforward.
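-
In code, the translation model is just a table of conditional probabilities over phrase pairs, something like the following; the entries and values are invented for illustration:
```python
# Hypothetical phrase table: P(english phrase | german phrase), estimated
# from how often each pairing was observed; all values are invented.
p_trans = {
    ("morgen", "tomorrow"): 0.85,
    ("fliege", "will fly"): 0.40,
    ("fliege", "fly"): 0.30,
    ("nach Kanada", "in Canada"): 0.55,
    ("zur Konferenz", "to the conference"): 0.60,
}

print(p_trans[("morgen", "tomorrow")])   # 0.85
```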
-
And then we have the distortion model, saying, "In what order should we put these phrases? Should we swap any of them around?"
-
And we measure that just by looking at the beginning and the ending of each phrase.
-
So a_i is the beginning of the i-th phrase, and b_(i-1) is the ending of the (i-1)-th phrase.
-
We measure those positions in the German, but we assign the indexes, the i's, by looking at the order of the phrases in the English.
-
So we say the last phrase is this phrase, "in Canada," but that corresponds to this one here,
-
and so the beginning of that phrase is at position three; and the next-to-last phrase is this one, which corresponds to "zur Konferenz."
-
And the end of that phrase is at position seven. And so the distortion there, from three to seven, is a distortion of four.
-
And our distortion model, then, would just be a probability distribution over those integers.
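-
Using the lecture's convention that the distortion of the i-th phrase is a_i minus b_(i-1), the arithmetic of the example works out like this (the boundary positions three and seven are read off the slide):
```python
# Distortion per the lecture's convention: d_i = a_i - b_(i-1), where a_i is
# the German-side position where the i-th English phrase begins and b_(i-1)
# is the position where the (i-1)-th English phrase ends.

def distortion(a_i, b_prev):
    return a_i - b_prev

# The slide's example: "in Canada" begins at position 3 in the German, and
# the previous English phrase, "zur Konferenz," ends at position 7.
d = distortion(3, 7)
print(d, abs(d))   # -4 4: a shift of magnitude four, the "distortion of four"
```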
-
So it's not doing anything fancy in terms of saying what type of phrases occur before or after what other types of phrases.
-
It's just saying, "Are they shifted to the right or to the left? And are they shifted a small amount or a large amount?"
-
And I should note that in this model, if we had a one-to-one translation where things weren't switched around,
-
say, if the original German sentence had "zur Konferenz" before "nach Kanada," and we translated it into English like this,
-
then the b_(i-1) would be five, and the a_i (imagine this phrase being swapped over here) would also be five.
-
In that case the distortion would be zero. And so for a pair of languages where the words line up
-
very closely between the source and the target, we'd have a high probability
-
mass at zero distortion, and lower probability for other distortions.
-
For a language pair where lots of things are swapped over long distances, a more volatile type of translation,
-
we'd expect the probability mass to be lower at zero distortion, and higher for larger distortions.
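-
One simple way to realize such a distribution, purely as an assumption on my part since the lecture doesn't commit to a particular form, is an exponential penalty that decays away from zero:
```python
# An assumed functional form (not specified in the lecture): probability
# falls off exponentially with the size of the shift.
def p_dist(d, alpha=0.5):
    return alpha ** abs(d)   # unnormalized; peaks at d = 0

print(p_dist(0))    # 1.0
print(p_dist(-4))   # 0.0625
# A larger alpha flattens the curve, modeling a more "volatile" language
# pair where big reorderings are common.
```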
-
So this is a very simple model. It takes into account only segmentation, translation between phrases,
-
and just the simplest model of distortion. You can imagine a more complex model based on trees and other components.
-
And I should note that this is just the translation part of the model.
-
And then to make the final choice, we would want to multiply out all these probabilities,
-
but we would also want to take into account the probability of the generated English sentence.
-
Is this a good sentence in English? And we have a probability model for that.
-
That's a monolingual model rather than a bilingual model. And the process of coming up with the best translation, then,
-
is just a search through all possible segmentations, all possible translations, all possible distortions,
-
multiply these probabilities together with the monolingual probability, and find the one that gives you the highest value,
-
and that'll be your best translation. And the tricky part is just coming up with a search technique that can enumerate
-
many of those possibilities quickly, and choose a good one.
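-
Putting the pieces together, here's a brute-force sketch of that search, with toy tables and a constant stand-in for the monolingual model. Every table entry, probability, and helper name below is invented for illustration, and a real decoder would use something like beam search rather than full enumeration:
```python
from itertools import permutations, product

# Toy tables; every probability here is invented for illustration.
P_SEG = {"morgen": 0.30, "fliege": 0.25, "ich": 0.40,
         "nach Kanada": 0.20, "zur Konferenz": 0.20}
P_TRANS = {("morgen", "tomorrow"): 0.85, ("fliege", "will fly"): 0.40,
           ("ich", "I"): 0.90, ("nach Kanada", "in Canada"): 0.55,
           ("zur Konferenz", "to the conference"): 0.60}

def segmentations(words, start=0):
    """Yield segmentations as lists of (phrase, a, b) boundary triples."""
    if start == len(words):
        yield []
        return
    for end in range(start + 1, len(words) + 1):
        phrase = " ".join(words[start:end])
        if phrase in P_SEG:
            for rest in segmentations(words, end):
                yield [(phrase, start, end)] + rest

def p_dist(d, alpha=0.5):
    return alpha ** abs(d)    # assumed form, peaked at zero distortion

def p_lm(english):
    return 0.1                # constant placeholder monolingual model

def decode(german):
    best, best_score = None, 0.0
    for seg in segmentations(german.split()):
        # Translation options for each German phrase.
        choices = [[e for (g, e) in P_TRANS if g == phrase]
                   for (phrase, _, _) in seg]
        if any(not c for c in choices):
            continue
        for trans in product(*choices):          # all phrase translations
            for order in permutations(range(len(seg))):  # all orderings
                english = " ".join(trans[i] for i in order)
                score = p_lm(english)
                prev_end = 0                     # assume b_0 = 0
                for i in order:
                    phrase, a, b = seg[i]
                    score *= P_SEG[phrase] * P_TRANS[(phrase, trans[i])]
                    score *= p_dist(a - prev_end)  # lecture's d = a_i - b_(i-1)
                    prev_end = b
                if score > best_score:
                    best, best_score = english, score
    return best

# With the constant placeholder in place of a real monolingual model, the
# zero-distortion (word-for-word order) candidate wins; it's the English
# language model that would pull the output toward fluent English order.
print(decode("morgen fliege ich nach Kanada zur Konferenz"))
```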