Departmental Papers (CIS)

Date of this Version

October 2005

Document Type

Conference Paper

Comments

Postprint version. Published in Lecture Notes in Computer Science, Volume 3651, Natural Language Processing (IJCNLP 2005), pages 553-564.
Publisher URL: http://dx.doi.org/10.1007/11562214_49

Abstract

Recognizing Multi-word Expressions (MWEs) and determining their relative compositionality are crucial to Natural Language Processing. Various statistical techniques have been proposed to recognize MWEs. In this paper, we integrate all the existing statistical features and investigate a range of classifiers for their suitability for recognizing non-compositional Verb-Noun (V-N) collocations. In the task of ranking V-N collocations by their relative compositionality, we show that the correlation between the ranks computed by the classifier and the human ranking is significantly better than the correlation between the ranking produced by any individual feature and the human ranking. We also show that the properties ‘Distributed frequency of object’ (as defined in [27]) and ‘Nearest Mutual Information’ (as adapted from [18]) contribute greatly to the recognition of non-compositional MWEs of the V-N type and to the ranking of V-N collocations by their relative compositionality.
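
The ranking evaluation described in the abstract compares a system ranking of collocations against a human ranking. The abstract does not name the correlation measure, so the following is only a minimal sketch that assumes Spearman's rank correlation. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
from math import sqrt


def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of items tied with the item at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy, made-up scores for five hypothetical V-N collocations.
human_scores = [5.0, 4.0, 3.0, 2.0, 1.0]       # hypothetical human ratings
system_scores = [0.9, 0.7, 0.8, 0.2, 0.1]      # hypothetical classifier scores
print(spearman(human_scores, system_scores))
```

Under this assumption, a value closer to 1 means the system ranking agrees more closely with the human ranking, which is the sense in which one ranking can be said to correlate "better" than another.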

Date Posted: 14 April 2006