January 9, 2003
Can one of humankind's newest tools help overcome one of its oldest problems?
Violetta Cavalli-Sforza hopes so. She is using advanced computer science to improve translation between English and Arabic, an effort she hopes will bring two cultures closer together and improve cross-cultural understanding.
Cavalli-Sforza, an assistant professor of computer science at SFSU and a self-described "nomad," is a firm believer in sharing ideas and technologies across cultures. Born in Milan, Italy, she has added English, French, Spanish and Arabic to the list of languages she speaks. While she has lived and worked in the United States for nearly 30 years, she places great value on travel and cross-cultural exchanges and will spend part of the spring and summer 2003 semesters traveling to North Africa. This will be neither her first nor her last visit to that part of the world. She has been to Egypt, Tunisia, and Morocco several times in the last few years, and plans to return.
With funding from the National Science Foundation and a Fulbright Fellowship, she'll work with Moroccan colleagues to improve computerized translations between Arabic and French, and in the process identify patterns that will lead to the best English/Arabic machine translations. She will also give a course on natural language processing techniques.
Translating between languages as different as Arabic and English is complex, for humans as well as machines. The best translations aren't simple word-for-word substitutions, but go beyond surface structure to analyze the core meaning and translate the concepts into the other language. Implementing this "knowledge-based" translation process requires tremendous effort to give the computer the knowledge it needs to translate correctly -- an expensive and time-consuming endeavor. Computers are still quite far from performing as well as human translators in most types of translation, but, Cavalli-Sforza says, "If a human translator can go over the output of a machine translation system and make just a few corrections, his or her productivity can be enhanced by having that tool available."
"Probably the main reason why translation is difficult is that, in any language, the same word can have different meanings depending on context," Cavalli-Sforza says. By using data in different languages, computational linguists capture word choices or expressions that are very context-dependent, and build a database of translation phrases.
"We look at parallel data -- the same text in two languages -- containing sentence-by-sentence translations," Cavalli-Sforza explains. "Put simply, we use bilingual dictionaries and special algorithms, including some statistical techniques, to match-up corresponding words and phrases of the sentences in the two languages. The computer remembers these matches and, when presented with a new sentence, retrieves the matches and pieces them together to produce a translation for the new sentence."
This "empirical" approach to machine translation is becoming more popular because it requires less human effort and can produce a working system in less time. In addition, the technology and data resources needed to develop it are constantly improving. The resulting systems can have a level of performance that approaches that of "knowledge-based" systems for significantly less cost.
While developing a prototype for Arabic/French machine translation, Cavalli-Sforza hopes to make inroads into Arabic/English translation difficulties. She will analyze structural similarities and differences and gain insight into which machine translation approaches may be most productive.
Cavalli-Sforza will also make a side trip to Algeria to work with a young investigator on using speech recognition software to teach Arabic to foreigners. The idea grew out of Cavalli-Sforza's own difficulty in learning to speak Arabic, when she realized it would be helpful to have a patient listener tell her when her pronunciation was incorrect as she read aloud. The seed funding for this collaboration comes from the Women in International Scientific Cooperation (WISC) program, which is funded by the National Science Foundation and administered through the American Association for the Advancement of Science. WISC provides travel and living support for a U.S. scientist to visit a partner country in order to establish a new research partnership where at least one of the partners is a woman.
For a variety of reasons, research is not a strong part of academic culture in all countries, Cavalli-Sforza says, so international exchanges are helpful for transferring new technologies and ideas that ultimately lead to improvements in education, commercial products or services. There is also the added benefit of working on communications software that helps people better understand one another.
"If I can do that with my work, then I will be thrilled," Cavalli-Sforza says. She adds, "It would be a big mistake to think that, in these exchanges, information flows only one way. Every time I have an opportunity to spend time interacting with a significantly different culture, I find much to learn and to admire, while at the same time gaining a new appreciation for my own culture."