Lightweight Privacy for and from the Masses
Subject
Privacy
Private information retrieval
Secure computation
Abstract
Online services today rely on a massive amount of user data. Yet the data that users supply to or fetch from these services exposes their personal information, which in practice often leads to privacy failures. In this dissertation, we design protocols and build systems that allow users to supply or fetch data without putting their privacy at risk. While this is achievable in theory with general-purpose cryptographic tools, applying them at the scale of today’s applications, which often serve millions of users, is prohibitively expensive. Our insight is that the large user base, although often seen as a performance bottleneck, can be leveraged to obtain lightweight privacy. We consider two types of problems under the model of a central “powerful” server and many “weak” clients:
• How does the server aggregate private data of clients (or, more ambitiously, train machine learning models on it) without learning any individual client’s data? Here, the clients “push” private data to the server.
• How do the clients fetch data from a public database at the server while completely hiding from the server which data they want to fetch? Here, the clients privately “pull” data from the server.
For the first problem, we designed and built two systems:
1) Flamingo, a secure aggregation system for high-dimensional vector inputs that can be used to train neural networks on private data across hundreds of thousands of clients.
2) Armadillo, a system that shares Flamingo’s properties but additionally offers disruption resistance against adversarial clients.
The key design principles underlying both systems are distributing trust among the large number of clients and leveraging those clients to assist with secure computation. The main challenge we addressed was making clients lightweight enough to run on weak devices.
The second problem is closely related to a classical cryptography problem called Private Information Retrieval (PIR). We consider PIR under “the shuffle model”, where queries can be made anonymously by many clients. Under this model, we give the first single-server PIR with information-theoretic security and sublinear communication per query.
Advisor
Rabin, Tal