Here's an example of translation. And this is a phrase-based translation that doesn't happen to use any syntactic trees.
And in this case we're using an example going from German to English.
And in this model we break up the probability of the translation into three components.
First, a segmentation model: how do we break up the German into phrases?
Here the sentence has been broken up into five phrases.
Then a translation model. For each phrase, what's a good translation into English?
And then a distortion model, saying, "Each of these phrases, what order would be a good order to put them into?"
So let's look at each of those in turn. First, the segmentation model.
We have a database of phrases that we picked out--maybe through a similar process to what went on in the Chinese menu,
where we looked for coherent phrases that occurred frequently, and so we're able to apply probabilities to them.
So now we have a probability. What's the probability that "morgen" is a phrase and that "fliege" is a phrase by itself?
We would also consider the probability that they're considered a phrase together,
and come up with a high probability segmentation.
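To make the segmentation step concrete, here is a minimal sketch in Python. The phrase probabilities in PHRASE_PROB are invented for illustration and the function names are hypothetical; a real system would estimate the probabilities from a corpus. The sketch enumerates every way of cutting the sentence into contiguous phrases and keeps the one whose phrase probabilities multiply to the highest value.

```python
from itertools import combinations

# Hypothetical phrase probabilities; the numbers are made up for illustration.
PHRASE_PROB = {
    ("morgen",): 0.6,
    ("fliege",): 0.3,
    ("ich",): 0.5,
    ("morgen", "fliege"): 0.05,
    ("fliege", "ich"): 0.1,
    ("nach", "kanada"): 0.7,
    ("zur", "konferenz"): 0.8,
}


def segmentations(words):
    """Yield every split of `words` into contiguous phrases (tuples of words)."""
    n = len(words)
    for k in range(n):
        # Choose k cut points among the n - 1 gaps between words.
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [tuple(words[i:j]) for i, j in zip(bounds, bounds[1:])]


def segmentation_prob(phrases):
    """Product of phrase probabilities; unknown phrases get probability 0."""
    p = 1.0
    for phrase in phrases:
        p *= PHRASE_PROB.get(phrase, 0.0)
    return p


def best_segmentation(words):
    """Return the highest-probability segmentation by exhaustive enumeration."""
    return max(segmentations(words), key=segmentation_prob)


words = "morgen fliege ich nach kanada zur konferenz".split()
best = best_segmentation(words)
print(best)
```

With these made-up numbers the winner is the five-phrase split, because "morgen" and "fliege" as separate phrases score higher than the two of them taken together.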
Next, the translation model. That's going between the two sides of the Chinese menu.
How often, when we saw the phrase "morgen", did it correspond to the phrase "tomorrow" in English?
And so on for the other phrases. So far, that's all pretty straightforward.
And then we have the distortion model, saying, "In what order should we put these phrases? Should we swap them around in any order?"
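The "how often did it correspond" question can be sketched as relative-frequency counting. The aligned phrase pairs below are invented for illustration, and translation_table is a hypothetical helper name, not something from the lecture.

```python
from collections import Counter, defaultdict

# Hypothetical aligned (German phrase, English phrase) observations.
ALIGNED_PAIRS = [
    ("morgen", "tomorrow"),
    ("morgen", "tomorrow"),
    ("morgen", "tomorrow"),
    ("morgen", "morning"),
    ("nach kanada", "in canada"),
    ("nach kanada", "to canada"),
]


def translation_table(pairs):
    """Estimate P(english | german) as count(german, english) / count(german)."""
    counts = defaultdict(Counter)
    for german, english in pairs:
        counts[german][english] += 1
    table = {}
    for german, english_counts in counts.items():
        total = sum(english_counts.values())
        table[german] = {e: c / total for e, c in english_counts.items()}
    return table


table = translation_table(ALIGNED_PAIRS)
print(table["morgen"]["tomorrow"])  # 0.75, from 3 of the 4 toy observations
```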
And we measure that just by looking at the beginning and the ending of each phrase.
So a_i is the beginning of the i-th phrase, and b_(i-1) is the ending of the (i-1)-th phrase.
We measure those positions in the German, but we take the indexes, the i's, from the order of the phrases in the English.
So we say the last phrase in the English is this one, "in Canada," which corresponds to "nach Kanada" here,
and so the beginning of that phrase is at number three, and the next to last phrase is this one, that corresponds to "zur Konferenz."
And the end of that phrase is at seven. And so the distortion there from three to seven is a distortion of four.
And our distortion model, then, would just be a probability distribution over those integers.
So it's not doing anything fancy in terms of saying what type of phrases occur before or after what other types of phrases.
It's just saying, "Are they shifted to the right or to the left? And are they shifted a small amount or a large amount?"
And I should note that in this model, if we had a one to one translation where things weren't switched--
So say, if the original German sentence had "zur Konferenz" before "nach Kanada," and we translated it into English like this,
then the Bi minus one would be five, and the Ai--imagine this being swapped over here--would also be five.
In that case the distortion would be zero. And so for a language where the words line up
very closely between the source and the target language--for those pairs of languages--then we'd have a high probability
mass under a zero distortion, and lower probability for other distortions.
For a language pair where lots of things are swapped far apart, a more volatile type of translation,
then we'd expect the probability mass to be lower for zero distortion, and higher for higher distortions.
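Here is a small sketch of that distortion calculation. It assumes positions are word boundaries counted from zero, which is the reading that gives "three" and "seven" in the example. The spans for "morgen", "ich" and "fliege" and the two-sided geometric distribution in distortion_prob are my own assumptions for illustration; the lecture only says the model is some probability distribution over the integers.

```python
# German spans as (start, end) boundary positions, listed in ENGLISH phrase order.
# Boundaries: 0 morgen 1 fliege 2 ich 3 nach 4 Kanada 5 zur 6 Konferenz 7
# Assumed alignment: morgen, ich, fliege, zur Konferenz, nach Kanada.
SWAPPED = [(0, 1), (2, 3), (1, 2), (5, 7), (3, 5)]
# A hypothetical alignment where nothing is reordered.
MONOTONE = [(0, 1), (1, 2), (2, 3), (3, 5), (5, 7)]


def distortions(spans):
    """Return a_i - b_(i-1) for each phrase, taking b_0 = 0."""
    result = []
    prev_end = 0
    for start, end in spans:
        result.append(start - prev_end)
        prev_end = end
    return result


def distortion_prob(d, alpha):
    """Assumed two-sided geometric distribution over integers (sums to 1).

    Small alpha puts most of the mass on d = 0; alpha near 1 spreads it out.
    """
    return (1 - alpha) / (1 + alpha) * alpha ** abs(d)


print(distortions(SWAPPED))   # [0, 1, -2, 3, -4]
print(distortions(MONOTONE))  # [0, 0, 0, 0, 0]
print(distortion_prob(0, alpha=0.2) > distortion_prob(0, alpha=0.8))  # True
```

The last value for the swapped case is 3 - 7 = -4, the distortion of four from the example, with the sign recording that the phrase moved to the left. A low alpha would suit a closely aligned language pair, a high alpha a more volatile one.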
So this is a very simple model. It takes into account only segmentation, translation between phrases,
and just the simplest model of distortion. You can imagine a more complex model based on trees and other components.
And I should note that this is just the translation part of the model.
And then to make the final choice, we would want to multiply out all these probabilities,
but we would also want to take into account the probability of the generated English sentence.
Is this a good sentence in English? And we have a probability model for that.
That's a monolingual model rather than a bilingual model. And the process of coming up with the best translation, then,
is just a search through all possible segmentations, all possible translations, all possible distortions,
multiplying these probabilities by the monolingual probability, and finding the one that gives you the highest value,
and that'll be your best translation. And the tricky part is just coming up with a search technique that can enumerate
through many of those possibilities quickly, and choose a good one.
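The whole search can be sketched as a brute-force loop over a three-word toy sentence. Every table and number below is invented for illustration, the bigram language model and the distortion distribution are assumptions, and the function names are hypothetical. This version enumerates everything, which only works for tiny inputs; a practical decoder would need a faster search, such as beam search.

```python
from itertools import combinations, permutations, product

# All tables are toy values made up for this sketch.
SEG_PROB = {("morgen",): 0.6, ("fliege",): 0.3, ("ich",): 0.5, ("fliege", "ich"): 0.1}
TRANS_PROB = {
    ("morgen",): {"tomorrow": 0.8, "morning": 0.2},
    ("fliege",): {"will fly": 0.7, "fly": 0.3},
    ("ich",): {"I": 1.0},
    ("fliege", "ich"): {"I will fly": 1.0},
}
BIGRAM_PROB = {
    ("<s>", "tomorrow"): 0.2, ("tomorrow", "I"): 0.3, ("I", "will"): 0.3,
    ("will", "fly"): 0.2, ("I", "fly"): 0.1, ("fly", "</s>"): 0.3,
}
UNSEEN = 1e-4  # assumed floor for bigrams missing from the toy table
ALPHA = 0.5    # assumed distortion parameter


def segmentations(n):
    """Yield every split of n words as a list of (start, end) spans."""
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield list(zip(bounds, bounds[1:]))


def distortion_prob(d):
    """Assumed two-sided geometric distribution over integer distortions."""
    return (1 - ALPHA) / (1 + ALPHA) * ALPHA ** abs(d)


def language_model_prob(sentence):
    """Monolingual bigram probability of the English sentence."""
    tokens = ["<s>", *sentence.split(), "</s>"]
    p = 1.0
    for pair in zip(tokens, tokens[1:]):
        p *= BIGRAM_PROB.get(pair, UNSEEN)
    return p


def decode(words):
    """Try every segmentation, ordering and translation; keep the best score."""
    best_sentence, best_score = None, 0.0
    for spans in segmentations(len(words)):
        phrases = [tuple(words[a:b]) for a, b in spans]
        if any(p not in TRANS_PROB for p in phrases):
            continue  # no translation known for some phrase
        seg_p = 1.0
        for p in phrases:
            seg_p *= SEG_PROB[p]
        for order in permutations(range(len(spans))):
            dist_p, prev_end = 1.0, 0
            for i in order:
                start, end = spans[i]
                dist_p *= distortion_prob(start - prev_end)
                prev_end = end
            options = [TRANS_PROB[phrases[i]].items() for i in order]
            for choice in product(*options):
                trans_p = 1.0
                for _, p in choice:
                    trans_p *= p
                sentence = " ".join(english for english, _ in choice)
                score = seg_p * trans_p * dist_p * language_model_prob(sentence)
                if score > best_score:
                    best_sentence, best_score = sentence, score
    return best_sentence, best_score


sentence, score = decode("morgen fliege ich".split())
print(sentence)  # tomorrow I will fly
```

Note how the monolingual model does the heavy lifting here: word orders like "tomorrow will fly I" are possible under the translation part, but they pick up unseen bigrams and score far lower.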