Machine Translation Project

Currently available commercial machine translation (MT) systems do not meet the requirements of a group of potential MT users whose MT needs are not fixed. The military is a typical example, but various types of civilian users (in international relief work, medical work, development work, the financial sector, economic advisory work, and so on) also fall into this group of "ad-hoc MT users". In operational terms, the requirements of this group can be characterized as follows:

  • Much of the translation needed by ad-hoc MT users is domain-specific, e.g., battle scenario message traffic, training manuals, medical diagnosis routines, intelligence reports and briefing slides, domain-specific newswire, and so on.
  • Often, languages for which ad-hoc MT users need high-quality translation (such as Korean, Arabic, Serbo-Croatian, Ukrainian, Somali, Haitian Creole, or Indonesian) are not viable languages for large-scale commercial development of MT.
  • Moreover, specific MT needs for ad-hoc MT users (specific with respect to domain and/or language) can arise at very short notice, in response to crises or opportunities that can occur anywhere in the world without warning.
  • Finally, unlike in many corporate settings, MT must be available to many ad-hoc MT users on laptop PCs under rugged conditions.

Thus, what is needed for ad-hoc MT users is not simply a particular MT system for a particular language and a particular domain, or even a suite of such tools. Instead, what is needed is an integrated approach to machine translation that includes a collection of components, tools and resources, which together meet the requirements outlined above. More specifically, what is needed is a combination of the following:

  • Available broad-coverage, acceptable-quality, cross-platform MT systems for certain key languages.
  • Available domain-specific, high-quality, cross-platform MT systems for certain key domains and languages.
  • The ability to quickly assemble cross-platform MT systems for new languages and/or new limited domains, exploiting existing (legacy) resources as much as possible and using advanced tools to create new resources where needed.

During this project, CoGenTex, Inc. and its subcontractors, the University of Pennsylvania and Systran Software, Inc., propose to develop the above components, tools, resources, and a methodology to use them.

In Phase I, we developed a modular framework with a "plug-and-play" architecture for assembling MT systems from off-the-shelf components. The core of the system is a lexicalized, syntax-oriented transfer component. The definition of the level of transfer also provides the interface definition for other software components, such as parsers and generators. We used two different parsers from the University of Pennsylvania, and the RealPro generator from CoGenTex.
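
As a rough illustration of the plug-and-play idea described above, the following minimal sketch shows how a shared transfer-level representation can serve as the interface between interchangeable parsers and generators. It is not the project's code: the project's implementation languages are C, C++, and/or Java, and Python is used here only for brevity. All names (Node, Parser, Generator, LexicalTransfer, Pipeline, the toy parser and generator, and the placeholder lexicon entries) are hypothetical, and the dependency-tree shape is an assumption about what a syntax-oriented transfer level might look like.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Node:
    """One node of a syntactic dependency tree (assumed transfer-level format)."""
    lexeme: str
    relation: str = "root"
    children: List["Node"] = field(default_factory=list)


class Parser(Protocol):
    """Any component that turns source text into a transfer-level tree."""
    def parse(self, sentence: str) -> Node: ...


class Generator(Protocol):
    """Any component that turns a transfer-level tree into target text."""
    def realize(self, tree: Node) -> str: ...


class LexicalTransfer:
    """Lexicalized transfer: rewrites each lexeme using a bilingual lexicon."""

    def __init__(self, lexicon: Dict[str, str]) -> None:
        self.lexicon = lexicon

    def transfer(self, tree: Node) -> Node:
        # Unknown lexemes are passed through unchanged (a simplifying assumption).
        target = self.lexicon.get(tree.lexeme, tree.lexeme)
        return Node(target, tree.relation,
                    [self.transfer(child) for child in tree.children])


class ToyVerbFinalParser:
    """Stand-in parser: expects exactly three tokens, 'SUBJECT OBJECT VERB'."""

    def parse(self, sentence: str) -> Node:
        subj, obj, verb = sentence.split()
        return Node(verb, "root", [Node(subj, "subj"), Node(obj, "obj")])


class ToySvoGenerator:
    """Stand-in generator: linearizes a tree in subject-verb-object order."""

    def realize(self, tree: Node) -> str:
        by_relation = {child.relation: child.lexeme for child in tree.children}
        return f"{by_relation['subj']} {tree.lexeme} {by_relation['obj']}"


class Pipeline:
    """Chains parser, transfer, and generator; each part can be swapped out."""

    def __init__(self, parser: Parser, transfer: LexicalTransfer,
                 generator: Generator) -> None:
        self.parser = parser
        self.transfer = transfer
        self.generator = generator

    def translate(self, sentence: str) -> str:
        source_tree = self.parser.parse(sentence)
        target_tree = self.transfer.transfer(source_tree)
        return self.generator.realize(target_tree)


# Placeholder source lexemes (not real words of any language).
toy_lexicon = {"S_DOG": "dogs", "S_CAT": "cats", "S_CHASE": "chase"}
pipeline = Pipeline(ToyVerbFinalParser(), LexicalTransfer(toy_lexicon),
                    ToySvoGenerator())
print(pipeline.translate("S_DOG S_CAT S_CHASE"))  # prints "dogs chase cats"
```

Because the parser and generator see only the Node structure, replacing either one with a different component leaves the transfer step untouched, which is the property the framework description relies on.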

In Phase II, we will add to this framework resources and specific functionality not currently present, and improve currently available resources and functionality.

Two types of results will issue from the Phase II effort.

  • We will produce an extensible plug-and-play translation framework which will come with trainable tools for transfer lexicon extraction, and a choice of parsers and a modifiable generation shell.
  • We will produce an operational prototype MT system for high-quality translation in the battlefield message domain, as well as an operational prototype broad-coverage MT system for acceptable-quality translation. Both systems will handle the language pairs Korean-to-English and English-to-Korean. Their subcomponents will include stand-alone English and Korean parsers and generators, as well as the associated lexicons and the bilingual transfer lexicon.

Because of the modular "plug-and-play" architecture of our framework as developed in Phase I, each of the tasks can be worked on independently, and the results can be easily integrated into the framework. The system can easily be upgraded when new and higher-quality components become available. All implementation work will be done in C, C++, and/or Java, assuring cross-platform compatibility. The principal target platform will be the PC. This approach will enable us to achieve:

  • Good-quality broad-coverage MT, by exploiting the newest natural language processing technology.
  • High-quality domain-specific MT.
  • Rapid development of new MT systems for new languages and/or domains.


  • Nasr, Alexis; Rambow, Owen; Palmer, Martha; and Rosenzweig, Joseph (1997). Enriching Lexical Transfer with Cross-Linguistic Semantic Features, or How to Do Interlingua Without Interlingua. In Proceedings of the Interlingua Workshop at the MT Summit, San Diego, CA.
  • Palmer, Martha; Rambow, Owen; and Nasr, Alexis (1998). Rapid Prototyping of Domain-Specific Machine Translation Systems. In Machine Translation and the Information Soup - Proceedings of the Third Conference of the Association for Machine Translation in the Americas (AMTA '98), Springer-Verlag (Lecture Notes in Artificial Intelligence No. 1529), Berlin.
  • Han, Chung-hye; Lavoie, Benoit; Palmer, Martha; Rambow, Owen; Kittredge, Richard; Korelsky, Tanya; Kim, Nari; and Kim, Myunghee (2000). Handling Structural Divergences and Recovering Dropped Arguments in a Korean-English Machine Translation System. In Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas (AMTA 2000), Misión Del Sol, Mexico.
  • Lavoie, Benoit; Kittredge, Richard; Korelsky, Tanya; and Rambow, Owen (2000). A Framework for MT and Multilingual NLG Systems Based on Uniform Lexico-Structural Processing. In Proceedings of ANLP/NAACL 2000, Seattle, Washington.
  • Lavoie, Benoit; White, Michael; and Korelsky, Tanya (2001). Inducing Lexico-Structural Transfer Rules from Parsed Bi-texts. In Proceedings of the ACL 2001 Workshop on Data-driven Machine Translation, Toulouse, France, pp. 17-24.
  • Lavoie, Benoit; White, Michael; and Korelsky, Tanya (2002). Learning Domain-Specific Transfer Rules: An Experiment with Korean to English Translation. In Proceedings of the COLING 2002 Workshop on Machine Translation in Asia, Taipei, Taiwan, pp. 60-66.