Here's an example of translation. This is a phrase-based translation model that doesn't happen to use any syntactic trees.
And in this case we're using an example going from German to English.
And in this model we break up the probability of the translation into three components.
First, a segmentation model: how do we break up the German into phrases?
Here the sentence has been broken up into five phrases.
Then a translation model. For each phrase, what's a good translation into English?
And then a distortion model, saying, "For each of these phrases, what would be a good order to put them into?"
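To make the three components concrete, here's the example written out as data in a minimal Python sketch. The full sentence and its exact five-phrase segmentation are reconstructed from the phrases mentioned in this talk ("morgen", "fliege", "nach Kanada", "zur Konferenz", "in Canada"), so treat the details as illustrative.

```python
# The lecture's example written out as the three choices the model makes.
# The full sentence and segmentation are reconstructed, not quoted verbatim.

german = "morgen fliege ich nach Kanada zur Konferenz"

# 1. Segmentation: break the German into five phrases.
segmentation = [["morgen"], ["fliege"], ["ich"],
                ["nach", "Kanada"], ["zur", "Konferenz"]]

# 2. Translation: pick an English phrase for each German phrase.
translations = {"morgen": "tomorrow", "fliege": "will fly", "ich": "I",
                "nach Kanada": "in Canada", "zur Konferenz": "to the conference"}

# 3. Distortion: pick an order for the English phrases.
english = ["tomorrow", "I", "will fly", "to the conference", "in Canada"]
```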
So let's look at each of those in turn. First, the segmentation model.
We have a database of phrases that we picked out--maybe through a similar process to what went on in the Chinese menu,
where we looked for coherent phrases that occurred frequently, and so we're able to assign probabilities to them.
So now we have a probability. What's the probability that "morgen" is a phrase and that "fliege" is a phrase by itself?
We would also consider the probability that they form a phrase together,
and come up with a high probability segmentation.
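As a minimal sketch, assume that database is a table mapping candidate phrases to probabilities; scoring a segmentation is then just the product over its phrases. The table and all its numbers here are invented for illustration.

```python
# Hypothetical phrase probabilities; the numbers are invented.
PHRASE_PROB = {
    ("morgen",): 0.8,            # "morgen" as a phrase by itself
    ("fliege",): 0.7,            # "fliege" as a phrase by itself
    ("morgen", "fliege"): 0.05,  # the two words treated as one phrase
}

def segmentation_prob(phrases, floor=1e-6):
    """Score one way of breaking the German into phrases: the product of
    each phrase's probability, with a small floor for unseen phrases."""
    prob = 1.0
    for phrase in phrases:
        prob *= PHRASE_PROB.get(tuple(phrase), floor)
    return prob
```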
Next, the translation model. That's going between the two sides of the Chinese menu.
How often, when we saw the phrase "morgen", did it correspond to the phrase "tomorrow" in English?
And so on for the other phrases. So far, that's all pretty straightforward.
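A sketch of that lookup, assuming hypothetical phrase-pair counts harvested from an aligned corpus; the probability is just the relative frequency, and the counts are invented.

```python
from collections import Counter

# Hypothetical counts of phrase pairs, as if read off the two sides of
# the "Chinese menu" of aligned phrases.
PAIR_COUNTS = Counter({
    ("morgen", "tomorrow"): 720,
    ("morgen", "in the morning"): 180,
    ("nach Kanada", "in Canada"): 450,
})

def translation_prob(german_phrase, english_phrase):
    """Relative frequency: of all the times we saw this German phrase,
    how often did it correspond to this English phrase?"""
    total = sum(count for (g, _), count in PAIR_COUNTS.items()
                if g == german_phrase)
    return PAIR_COUNTS[(german_phrase, english_phrase)] / total if total else 0.0
```

For example, translation_prob("morgen", "tomorrow") comes out to 720/900 = 0.8 with these made-up counts.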
And then we have the distortion model, saying, "In what order should we put these phrases? Should we swap any of them around?"
And we measure that just by looking at the beginning and the ending of each phrase.
So a_i is the beginning of the i-th phrase, and b_(i-1) is the ending of the (i-1)-th phrase.
We measure those positions in the German, but we assign the indexes, the i's, by the order of the phrases in the English.
So we say, "This is the last phrase, is this phrase, 'in Canada,'" but that corresponds to this one here,
and so the beginning of that phrase is at number three, and the next to last phrase is this one, that corresponds to "zero confidence."
And the end of that phrase is at seven. And so the distortion there from three to seven is a distortion of four.
And our distortion model, then, would just be a probability distribution over those integers.
So it's not doing anything fancy in terms of saying what type of phrases occur before or after what other types of phrases.
It's just saying, "Are they shifted to the right or to the left? And are they shifted a small amount or a large amount?"
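Here's that computation as a sketch, following the convention used in the talk (so "in Canada" starting at three against "to the conference" ending at seven gives a distortion of four; some formulations in the literature keep the sign and subtract one more, but this matches the numbers above).

```python
# a_i is the German start position of the i-th English phrase,
# b_(i-1) is the German end position of the previous one, and the
# distortion is the size of the gap between them.

def distortion_values(german_spans):
    """german_spans: (start, end) German word positions for each phrase,
    listed in the order the phrases appear in the English."""
    ds = []
    for i in range(1, len(german_spans)):
        a_i = german_spans[i][0]         # start of phrase i in the German
        b_prev = german_spans[i - 1][1]  # end of phrase i-1 in the German
        ds.append(abs(a_i - b_prev))
    return ds
```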
And I should note that in this model, if we had a one-to-one translation where things weren't switched around--
say, if the original German sentence had "zur Konferenz" before "nach Kanada," and we translated it into English in the same order--
then the b_(i-1) would be five, and the a_i--imagine this being swapped over here--would also be five.
In that case the distortion would be zero. And so for a pair of languages where the words line up
very closely between the source and the target, we'd have high probability
mass at a distortion of zero, and lower probability for other distortions.
For a language pair where lots of things get swapped over long distances--a more volatile kind of translation--
we'd expect the probability mass to be lower at zero distortion, and higher for larger distortions.
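One simple distribution with exactly that behavior is a two-sided geometric over the integer shifts. This is a sketch with an assumed decay parameter alpha, not anything specified in the talk: small alpha concentrates the mass at zero distortion (language pairs that line up closely), larger alpha spreads it onto bigger shifts (more reordering).

```python
def distortion_prob(d, alpha=0.5):
    """P(d) = (1 - alpha) / (1 + alpha) * alpha**abs(d), which peaks at
    d = 0 and sums to 1 over all integers d."""
    return (1 - alpha) / (1 + alpha) * alpha ** abs(d)
```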
So this is a very simple model. It takes into account only segmentation, translation between phrases,
and the simplest possible model of distortion. You can imagine a more complex model based on trees and other components.
And I should note that this is just the translation part of the model.
And then to make the final choice, we would want to multiply out all these probabilities,
but we would also want to take into account the probability of the generated English sentence.
Is this a good sentence in English? And we have a probability model for that.
That's a monolingual model rather than a bilingual model. And the process of coming up with the best translation, then,
is just a search through all possible segmentations, all possible translations, all possible distortions,
multiply these probabilities together with the monolingual probability, and find the one that gives you the highest value,
and that'll be your best translation. And the tricky part is just coming up with a search technique that can work
through many of those possibilities quickly and choose a good one.
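Here's that search in its most naive, brute-force form, as a sketch. A real decoder would use beam search over partial hypotheses rather than enumerating everything; the candidate tuples and language_model_prob below are hypothetical stand-ins for the component models sketched above.

```python
def best_translation(candidates, language_model_prob):
    """candidates: iterable of (english_sentence, p_segmentation,
    p_translation, p_distortion) tuples, one per combined choice of
    segmentation, phrase translations, and ordering."""
    def score(candidate):
        english, p_seg, p_trans, p_dist = candidate
        # Bilingual components times the monolingual English model.
        return p_seg * p_trans * p_dist * language_model_prob(english)
    return max(candidates, key=score)[0]
```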