Gap Between Theory and Practice: Noise Sensitive Word Alignment in Machine Translation

Tsuyoshi Okita; Yvette Graham; Andy Way

Proceedings of the First Workshop on Applications of Pattern Analysis, PMLR 11:119-126, 2010.

Abstract

Word alignment is to estimate a lexical translation probability p(e|f), or to estimate the correspondence g(e,f) where a function g outputs either 0 or 1, between a source word f and a target word e for given bilingual sentences. In practice, this formulation does not consider the existence of 'noise' (or outlier) which may cause problems depending on the corpus. N-to-m mapping objects, such as paraphrases, non-literal translations, and multi-word expressions, may appear as both noise and also as valid training data. From this perspective, this paper tries to answer the following two questions: 1) how to detect stable patterns where noise seems legitimate, and 2) how to reduce such noise, where applicable, by supplying extra information as prior knowledge to a word aligner.

Cite this Paper

BibTeX


@InProceedings{pmlr-v11-okita10a,
  title = 	 {Gap Between Theory and Practice: Noise Sensitive Word Alignment in Machine Translation},
  author = 	 {Okita, Tsuyoshi and Graham, Yvette and Way, Andy},
  booktitle = 	 {Proceedings of the First Workshop on Applications of Pattern Analysis},
  pages = 	 {119--126},
  year = 	 {2010},
  editor = 	 {Diethe, Tom and Cristianini, Nello and Shawe-Taylor, John},
  volume = 	 {11},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Cumberland Lodge, Windsor, UK},
  month = 	 {01--03 Sep},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v11/okita10a/okita10a.pdf},
  url = 	 {https://proceedings.mlr.press/v11/okita10a.html},
  abstract = 	 {Word alignment is to estimate a lexical translation probability \emph{p}(\emph{e}|\emph{f}), or to estimate the correspondence \emph{g}(\emph{e},\emph{f}) where a function \emph{g} outputs either 0 or 1, between a source word \emph{f} and a target word \emph{e} for given bilingual sentences. In practice, this formulation does not consider the existence of 'noise' (or outlier) which may cause problems depending on the corpus. \emph{N}-to-\emph{m} mapping objects, such as paraphrases, non-literal translations, and multi-word expressions, may appear as both noise and also as valid training data. From this perspective, this paper tries to answer the following two questions: 1) how to detect stable patterns where noise seems legitimate, and 2) how to reduce such noise, where applicable, by supplying extra information as prior knowledge to a word aligner.}
}

Endnote

%0 Conference Paper
%T Gap Between Theory and Practice: Noise Sensitive Word Alignment in Machine Translation
%A Tsuyoshi Okita
%A Yvette Graham
%A Andy Way
%B Proceedings of the First Workshop on Applications of Pattern Analysis
%C Proceedings of Machine Learning Research
%D 2010
%E Tom Diethe
%E Nello Cristianini
%E John Shawe-Taylor
%F pmlr-v11-okita10a
%I PMLR
%P 119--126
%U https://proceedings.mlr.press/v11/okita10a.html
%V 11
%X Word alignment is to estimate a lexical translation probability p(e|f), or to estimate the correspondence g(e,f) where a function g outputs either 0 or 1, between a source word f and a target word e for given bilingual sentences. In practice, this formulation does not consider the existence of 'noise' (or outlier) which may cause problems depending on the corpus. N-to-m mapping objects, such as paraphrases, non-literal translations, and multi-word expressions, may appear as both noise and also as valid training data. From this perspective, this paper tries to answer the following two questions: 1) how to detect stable patterns where noise seems legitimate, and 2) how to reduce such noise, where applicable, by supplying extra information as prior knowledge to a word aligner.

RIS


TY  - CPAPER
TI  - Gap Between Theory and Practice: Noise Sensitive Word Alignment in Machine Translation
AU  - Tsuyoshi Okita
AU  - Yvette Graham
AU  - Andy Way
BT  - Proceedings of the First Workshop on Applications of Pattern Analysis
DA  - 2010/09/30
ED  - Tom Diethe
ED  - Nello Cristianini
ED  - John Shawe-Taylor
ID  - pmlr-v11-okita10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 11
SP  - 119
EP  - 126
L1  - http://proceedings.mlr.press/v11/okita10a/okita10a.pdf
UR  - https://proceedings.mlr.press/v11/okita10a.html
AB  - Word alignment is to estimate a lexical translation probability p(e|f), or to estimate the correspondence g(e,f) where a function g outputs either 0 or 1, between a source word f and a target word e for given bilingual sentences. In practice, this formulation does not consider the existence of 'noise' (or outlier) which may cause problems depending on the corpus. N-to-m mapping objects, such as paraphrases, non-literal translations, and multi-word expressions, may appear as both noise and also as valid training data. From this perspective, this paper tries to answer the following two questions: 1) how to detect stable patterns where noise seems legitimate, and 2) how to reduce such noise, where applicable, by supplying extra information as prior knowledge to a word aligner.
ER  -

APA


Okita, T., Graham, Y., & Way, A. (2010). Gap Between Theory and Practice: Noise Sensitive Word Alignment in Machine Translation. Proceedings of the First Workshop on Applications of Pattern Analysis, in Proceedings of Machine Learning Research 11:119-126. Available from https://proceedings.mlr.press/v11/okita10a.html.
