Natural Language Processing: Using Machine Translation in Creation of a German/English Translator
Jason Ji, 2004-2005

ABSTRACT - The field of machine translation, in which computers are used to provide translations between human languages, has been around for decades, and the dream of an ideal machine providing a perfect translation between languages has been around still longer. This project attempts to take the first steps towards that goal by creating a translator program that operates within an extremely limited scope to translate between English and German. There are several different strategies for machine translation, and this project will look into them, but the strategy taken in this project will be the researcher's own, with the general guideline of "thinking as a human." If humans can translate between languages, there must be something to how we do it, and hopefully that something, that thought process, can be transferred to the machine to provide quality translations.

BACKGROUND - There are several methods of machine translation, of varying difficulty and success. The best method to use depends on what sort of system is being created. A bilingual system translates between one pair of languages; a multilingual system translates between more than two languages.

The easiest translation method to code, yet probably the least successful, is known as the direct approach. The direct approach does what its name suggests: it takes input in the language being translated from (known as the "source language"), performs morphological analysis (whereby words are broken down and analyzed for things such as prefixes and past tense endings), performs a bilingual dictionary look-up to determine the words' meanings in the target language, performs a local reordering to fit the grammar structure of the target language, and produces the target language output.
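The four stages of the direct approach can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not part of this project's translator: the four-word LEXICON, the function names, the rule that a regular "-ed" ending marks a participle, and the single reordering rule that moves a participle to the end of the clause.

```python
# Toy bilingual dictionary: English lemma -> German word (hypothetical entries).
LEXICON = {
    "i": "ich",
    "have": "habe",
    "play": "spielen",
    "football": "Fußball",
}


def analyze(word):
    """Morphological analysis: split a regular '-ed' ending off a known stem."""
    if word.endswith("ed") and word[:-2] in LEXICON:
        return word[:-2], "participle"
    return word, "base"


def lookup(lemma, form):
    """Bilingual dictionary look-up, re-inflecting regular participles."""
    german = LEXICON.get(lemma, lemma)  # unknown words pass through unchanged
    if form == "participle" and german.endswith("en"):
        german = "ge" + german[:-2] + "t"  # spielen -> gespielt
    return german, form


def reorder(tagged):
    """Local reordering: move any participle to the end of the clause."""
    others = [word for word, form in tagged if form != "participle"]
    participles = [word for word, form in tagged if form == "participle"]
    return others + participles


def translate_direct(sentence):
    """Run the four stages in order and return the target-language string."""
    words = sentence.lower().rstrip(".!?").split()
    if not words:
        return ""
    tagged = [lookup(*analyze(word)) for word in words]
    output = reorder(tagged)
    output[0] = output[0].capitalize()
    return " ".join(output) + "."


print(translate_direct("I have played football."))
```

The sketch only works because the example sentence was chosen to fit its one reordering rule and its tiny dictionary; every word is still translated in isolation.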
The problem with this approach is that it is essentially a word-for-word translation with some reordering, which often results in mistranslations and incorrect grammar structures. Furthermore, a multilingual system built on the direct approach would require several different translation algorithms: one or two for each language pair.

The indirect approach involves some sort of intermediate representation of the source language before translating into the target language. In this way, linguistic analysis of the source language can be performed on the intermediate representation. The two main variants of the indirect approach are interlingua and transfer. The interlingua approach involves translating the source language into an intermediate language or representation that is not language dependent, and then translating into the target language without "looking back" at the source. Translating to the intermediary also enables semantic analysis, as the source language input can be analyzed more carefully to detect idioms, etc., which can be stored in the intermediary and then appropriately used to translate into the target language. The transfer method is similar, except that the intermediary is language dependent; that is to say, the French-English transfer intermediary would be different from the English-German one. Because an interlingua intermediary is not tied to any language pair, it can be used for multilingual systems.

[Figure placeholder: picture to be added.]

RESULTS - None yet.

ANALYSIS - The first problem to deal with in creating a machine translator is recognizing the words that are input into the system. A sentence or multiple sentences are input into the translator, and a string consisting of that entire sentence (or sentences) is passed to the translate() function. The system loops through the string, finding all space (' ') characters and punctuation characters (comma, period, etc.), and records their positions.
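A minimal Python sketch of this scanning pass follows. The function name scan, the PUNCTUATION set and the return format are assumptions made for illustration; the project's own translate() function is not reproduced here.

```python
# Characters treated as punctuation (an assumed, deliberately small set).
PUNCTUATION = ",.;:!?"


def scan(sentence):
    """Record the index of every space and the index and kind of every punctuation mark."""
    space_positions = []
    punctuation_marks = []  # (index, character) pairs
    for index, char in enumerate(sentence):
        if char == " ":
            space_positions.append(index)
        elif char in PUNCTUATION:
            punctuation_marks.append((index, char))
    return space_positions, punctuation_marks


spaces, marks = scan("I ate an apple, then left.")
print(spaces)  # [1, 5, 8, 15, 20]
print(marks)   # [(14, ','), (25, '.')]
```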
(It is important to note the position of each punctuation mark, as well as what kind of punctuation mark it is, because the existence and position of punctuation marks alter the meaning of a sentence.) The number of words in the sentence is taken to be the number of spaces plus one. By recording the position of each space, the string can then be broken up into words. The start position of each word is the position of the preceding space plus one (or the start of the string, for the first word), and the end position is the position of the next space (or the end of the string, for the last word). This means that punctuation at the end of any given word is placed into the String with that word, but this is not a problem: the location of each punctuation mark is already recorded, and the dictionary look-up of each word will first check that the last character of the word is a letter; if not, it will simply disregard the last character.

The next problem is the biggest problem of all: the actual translation itself. No code has been written for this yet, but development of pseudocode has begun. As previously mentioned, translation is a process. In order to write a translator program that follows the human translation process, the human process must first be recognized and broken down into programmable steps. This is no easy task. Humans with five years of experience in learning a language may already translate a given text so quickly, apart from the time needed to look up unfamiliar words, that the process goes by too fast to fully take note of.

The basic process is not entirely determined yet, but there has been some progress. The method for determining the process has been as follows: given a random sentence to translate, the sentence is first translated by a human, and then the steps taken are noted. Each successive sentence is more difficult to translate than the last. For example, the sentence "I ate an apple" is translated via the following process:

1) Find the subject and the verb. (I; ate)
2) Determine the tense and form of the verb.
(ate = past, Imperfekt form)
a) Translate subject and verb. (Ich; ass) (Note: "ass" is a real German verb form, written "aß" in standard orthography.)
3) Determine what the verb requires. (ate -> eat; requires a direct object)
4) Find what the verb requires in the sentence. (direct object comes after verb and article; apple)
5) Translate the article and the direct object. (ein; Apfel)
6) Consider the gender of the direct object, and change the article if necessary. (der Apfel is masculine, so the accusative article is einen; ein -> einen)

Result: Ich ass einen Apfel.

REFERENCES
1. http://dict.leo.org (dictionary)
2. "An Introduction To Machine Translation" (available online at http://ourworld.compuserve.com/homepages/WJHutchins/IntroMT-TOC.htm)
3. http://www.comp.leeds.ac.uk/ugadmit/cogsci/spchlan/machtran.htm (some info on machine translation)
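As a supplementary illustration, the six-step walk-through for "I ate an apple" in the Analysis section might be sketched in Python as shown here. This is only a sketch under strong assumptions, not the project's code (none has been written for this stage): every table is a tiny hypothetical stand-in for a real dictionary, the function name translate_svo is invented, and the sentence is assumed to have the fixed shape subject, verb, article, noun.

```python
# All tables are tiny, hypothetical stand-ins for a real dictionary.
PRONOUNS = {"i": "Ich"}
VERB_FORMS = {"ate": ("eat", "past")}           # surface form -> (lemma, tense)
VERB_NEEDS = {"eat": ["direct object"]}         # what each verb requires
GERMAN_VERBS = {("eat", "past", "Ich"): "ass"}  # spelling as used in the paper
ARTICLES = {"a": "ein", "an": "ein"}
NOUNS = {"apple": ("Apfel", "masculine")}       # noun -> (German noun, gender)
# Accusative forms of the indefinite article, by gender of the noun.
ACCUSATIVE_EIN = {"masculine": "einen", "feminine": "eine", "neuter": "ein"}


def translate_svo(sentence):
    """Follow the numbered steps for a subject-verb-article-noun sentence."""
    words = sentence.rstrip(".!?").split()
    # 1) Find the subject and the verb (assumed to be the first two words).
    subject, verb = words[0], words[1]
    # 2) Determine the tense and form of the verb.
    lemma, tense = VERB_FORMS[verb.lower()]
    # 2a) Translate subject and verb.
    german_subject = PRONOUNS[subject.lower()]
    german_verb = GERMAN_VERBS[(lemma, tense, german_subject)]
    output = [german_subject, german_verb]
    # 3) Determine what the verb requires.
    if "direct object" in VERB_NEEDS[lemma]:
        # 4) Find it in the sentence: the article and noun after the verb.
        article, noun = words[2], words[3]
        # 5) Translate the article and the direct object.
        german_article = ARTICLES[article.lower()]
        german_noun, gender = NOUNS[noun.lower()]
        # 6) Consider the gender; a direct object takes the accusative article.
        if german_article == "ein":
            german_article = ACCUSATIVE_EIN[gender]
        output += [german_article, german_noun]
    return " ".join(output) + "."


print(translate_svo("I ate an apple."))
```

Any word missing from the tables raises a KeyError, which makes plain how much dictionary and grammar knowledge each step depends on.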