Departmental Papers (CIS)

Date of this Version: December 1998

Document Type: Working Paper


Final Project Report under NSA grant MDA904-97-C-3055, December 1998.


The process of constructing translation lexicons from parallel texts (bitexts) can be broken down into three stages: mapping bitext correspondence, counting co-occurrences, and estimating a translation model. State-of-the-art techniques for each stage have already been developed, but only for bitexts involving fairly similar languages. Implementing each stage correctly and efficiently poses special challenges when the parallel texts involve two very different languages. This report describes our theoretical and empirical investigations into how existing techniques might be extended and applied to Chinese/English bitexts.
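
As a rough illustration of the second and third stages named above, the following minimal Python sketch counts co-occurrences over segment pairs that are assumed to be already aligned and tokenized (that is, stage one and Chinese word segmentation are taken as given), and then scores word pairs with the Dice coefficient. The Dice coefficient is a simple stand-in chosen for brevity, not the translation model developed in the report; the function names and the toy bitext are hypothetical.

```python
from collections import Counter
from itertools import product


def count_cooccurrences(aligned_segments):
    """Stage 2 (sketch): count, per word type, how many aligned segment
    pairs contain it, and how many contain each (source, target) pair."""
    cooc = Counter()
    src_freq = Counter()
    tgt_freq = Counter()
    for src_tokens, tgt_tokens in aligned_segments:
        # Use word types so a word repeated in one segment counts once.
        src_types = set(src_tokens)
        tgt_types = set(tgt_tokens)
        src_freq.update(src_types)
        tgt_freq.update(tgt_types)
        cooc.update(product(src_types, tgt_types))
    return cooc, src_freq, tgt_freq


def dice_scores(cooc, src_freq, tgt_freq):
    """Stage 3 (stand-in): Dice coefficient 2*c(s,t) / (c(s) + c(t)).
    Assumption: used here only as a simple association score."""
    return {
        (s, t): 2.0 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in cooc.items()
    }


def best_translations(scores):
    """Keep the highest-scoring target word for each source word."""
    best = {}
    for (s, t), score in scores.items():
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return best


# Hypothetical toy bitext: pre-segmented Chinese paired with English.
toy_bitext = [
    (["猫", "睡觉"], ["the", "cat", "sleeps"]),
    (["狗", "睡觉"], ["the", "dog", "sleeps"]),
    (["猫", "吃"], ["the", "cat", "eats"]),
]

cooc, src_freq, tgt_freq = count_cooccurrences(toy_bitext)
lexicon = best_translations(dice_scores(cooc, src_freq, tgt_freq))
for src_word, (tgt_word, score) in sorted(lexicon.items()):
    print(src_word, tgt_word, round(score, 3))
```

On this toy input, frequent function words such as "the" co-occur with every source word but are penalized by their high overall frequency, which is why a normalized score is used instead of raw counts. A real Chinese/English pipeline would also have to address word segmentation and segment alignment, which this sketch takes as given.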



Date Posted: 31 July 2008