
Departmental Papers (CIS)
Title
Compression Without a Common Prior: An Information-Theoretic Justification for Ambiguity in Language
Date of this Version
1-2011
Document Type
Conference Paper
Recommended Citation
Brendan Juba, Adam T. Kalai, Sanjeev Khanna, and Madhu Sudan, "Compression Without a Common Prior: An Information-Theoretic Justification for Ambiguity in Language", Innovations in Computer Science, January 2011.
Abstract
Compression is a fundamental goal of both human language and digital communication, yet natural language is very different from the compression schemes employed by modern computers. We partly explain this difference by noting that information theory generally assumes a common prior probability distribution shared by the encoder and decoder, whereas human communication must be robust to the possibility that a speaker and listener hold different prior beliefs about what the speaker may say. We model this information-theoretically via the following question: what type of compression scheme would be effective when the encoder and decoder have (boundedly) different prior probability distributions? The resulting compression scheme resembles natural language to a far greater extent than existing digital communication protocols. We also use information theory to justify why ambiguity is necessary for the purpose of compression.
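The cost of a prior mismatch that the abstract alludes to can be illustrated with a standard information-theoretic calculation (a generic sketch, not the paper's actual compression scheme): if messages are drawn from the speaker's prior p but codeword lengths are chosen as if the distribution were the listener's prior q, the expected codeword length is the cross-entropy H(p, q), which exceeds the shared-prior optimum H(p) by exactly the Kullback-Leibler divergence KL(p || q).

```python
import math

def entropy(p):
    """Optimal expected code length (bits) when encoder and decoder share prior p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Expected code length when messages follow p but lengths are set by -log2 q(x)."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical priors for illustration: the speaker's prior p is skewed,
# while the listener's prior q is uniform over the same four messages.
p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]

shared = entropy(p)             # 1.75 bits with a common prior
mismatched = cross_entropy(p, q)  # 2.0 bits when coding against q
overhead = mismatched - shared    # 0.25 bits = KL(p || q), the price of the mismatch

print(f"shared prior: {shared} bits, mismatched: {mismatched} bits, overhead: {overhead} bits")
```

The overhead term KL(p || q) grows as the two priors diverge, which is why a scheme that must work for any (boundedly) different pair of priors, as in the paper's model, cannot simply code against one fixed distribution.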
Date Posted: 25 July 2012
Comments
Juba, B., Kalai, A., Khanna, S., & Sudan, M., Compression Without a Common Prior: An Information-Theoretic Justification for Ambiguity in Language, Innovations in Computer Science, Jan. 2011, http://conference.itcs.tsinghua.edu.cn/ICS2011/content/papers/23.html