Compression Without a Common Prior: An Information-Theoretic Justification for Ambiguity in Language

Penn collection
Departmental Papers (CIS)
Subject
Computer Sciences
Author
Juba, Brendan
Kalai, Adam T.
Khanna, Sanjeev
Sudan, Madhu
Abstract

Compression is a fundamental goal of both human language and digital communication, yet natural language is very different from the compression schemes employed by modern computers. We partly explain this difference using the fact that information theory generally assumes a common prior probability distribution shared by the encoder and decoder, whereas human communication has to be robust to the fact that a speaker and listener may have different prior beliefs about what the speaker may say. We model this information-theoretically using the following question: what type of compression scheme would be effective when the encoder and decoder have (boundedly) different prior probability distributions? The resulting compression scheme resembles natural language to a far greater extent than existing digital communication protocols. We also use information theory to justify why ambiguity is necessary for the purpose of compression.
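
The abstract only poses the modeling question; the sketch below is a minimal illustration of one natural scheme in this spirit, not necessarily the paper's exact protocol. Both parties share a random hash function (simulated here with a seeded SHA-256); the sender transmits a hash prefix whose length depends on its own prior p plus extra bits to absorb a multiplicative closeness bound alpha on the disagreement between priors; the receiver returns the consistent message that is most likely under its own, possibly different, prior q. The function names, the slack parameter, and the example priors are all illustrative assumptions.

```python
import hashlib
import math

def hash_bits(message: str, shared_seed: int, k: int) -> str:
    # Stand-in for the shared random hash both parties are assumed to hold:
    # the first k pseudo-random bits derived from (seed, message).
    digest = hashlib.sha256(f"{shared_seed}:{message}".encode()).digest()
    return "".join(f"{byte:08b}" for byte in digest)[:k]

def encode(message: str, p: dict, alpha: float, shared_seed: int,
           slack: int = 2) -> str:
    # Sender: codeword length is driven by the sender's OWN prior p,
    # padded with extra bits to tolerate alpha-bounded disagreement.
    k = (math.ceil(math.log2(1 / p[message]))
         + 2 * math.ceil(math.log2(alpha)) + slack)
    return hash_bits(message, shared_seed, k)

def decode(code: str, q: dict, shared_seed: int):
    # Receiver: every message whose hash prefix matches is a candidate;
    # the ambiguity is resolved by the receiver's OWN prior q.
    candidates = [m for m in q if hash_bits(m, shared_seed, len(code)) == code]
    return max(candidates, key=q.get) if candidates else None

# Hypothetical priors whose ratios stay within a factor alpha = 2.
p = {"yes": 0.5, "no": 0.3, "maybe": 0.15, "call me later": 0.05}  # sender
q = {"yes": 0.4, "no": 0.35, "maybe": 0.2, "call me later": 0.05}  # receiver
code = encode("maybe", p, alpha=2.0, shared_seed=2011)
print(len(code), decode(code, q, shared_seed=2011))  # short code; "maybe" w.h.p.
```

Note how the short codeword is deliberately ambiguous: in a large message space many messages would share the same few-bit prefix, and the receiver disambiguates using its prior, which mirrors the abstract's claim that ambiguity serves compression when no common prior is available.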

Date of presentation
2011-01-01
Conference name
Innovations in Computer Science (ICS 2011)
Conference dates
January 2011
Conference location
Tsinghua University, Beijing, China
Recommended citation
Juba, B., Kalai, A. T., Khanna, S., & Sudan, M. (2011). Compression Without a Common Prior: An Information-Theoretic Justification for Ambiguity in Language. In Innovations in Computer Science (ICS 2011). http://conference.itcs.tsinghua.edu.cn/ICS2011/content/papers/23.html