Lightweight Privacy for and from the Masses

Ma, Yiping

Files

Ma_upenngdas_0175C_16984.pdf (1.72 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Computer and Information Science

Discipline

Computer Sciences

Subject

Federated learning
Privacy
Private information retrieval
Secure computation

Copyright date

2025

Permalink

https://repository.upenn.edu/handle/20.500.14332/61340

Abstract

Online services today rely on a massive amount of user data. Yet the data that users supply to or fetch from these services exposes their personal information, which in practice often leads to privacy failures. In this dissertation, we design protocols and build systems that allow users to supply or fetch data without putting their privacy at risk. While this is achievable in theory with general-purpose cryptographic tools, applying them at the scale of today’s applications, which often serve millions of users, is prohibitively expensive. Our insight is that the large user base, although often seen as a performance bottleneck, can be leveraged to obtain lightweight privacy.

We consider two types of problems under the model of a central “powerful” server and many “weak” clients:

• How does the server aggregate (or, in more advanced settings, train machine learning models on) the private data of clients without learning any individual client’s data? Here, the clients “push” private data to the server.
• How do the clients fetch data from a public database at the server while completely hiding from the server which data they want to fetch? Here, the clients privately “pull” data from the server.

For the first problem, we designed and built two systems: 1) Flamingo, a secure aggregation system for high-dimensional vector inputs that can be used to train neural networks on private data across hundreds of thousands of clients. 2) Armadillo, a system that shares Flamingo's properties but additionally offers disruption resistance against adversarial clients. The key design principles underlying both systems are distributing trust among the large number of clients and leveraging those clients to assist with secure computation. The main challenge we addressed was making clients lightweight enough to run on weak devices.

The second problem is closely related to a classical cryptography problem called Private Information Retrieval (PIR). We consider PIR under “the shuffle model”, where queries can be made anonymously by many clients. Under this model, we give the first single-server PIR with information-theoretic security and sublinear communication per query.
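
As a rough illustration of the “push” problem, the sketch below shows a textbook pairwise-masking approach to secure aggregation. It is not the Flamingo or Armadillo protocol; it only shows the general idea that masks shared between pairs of clients cancel in the sum, so the server learns the total but not any individual input. The modulus, function names, and sample inputs are assumptions chosen for illustration, and the sketch omits key agreement, dropouts, and malicious clients.

```python
import secrets

MODULUS = 2**32  # assumed modulus; all arithmetic is done modulo this value


def pairwise_masks(num_clients, dim):
    """Return one mask vector per client; the masks sum to zero mod MODULUS."""
    masks = [[0] * dim for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            # Toy shortcut: sample the pairwise secret centrally. A real
            # protocol would have clients i and j derive it from a shared key.
            shared = [secrets.randbelow(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS  # i adds
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS  # j subtracts
    return masks


def mask_input(vector, mask):
    """Client side: hide the private vector under the client's mask."""
    return [(v + m) % MODULUS for v, m in zip(vector, mask)]


def aggregate(masked_vectors):
    """Server side: sum the masked vectors; the pairwise masks cancel."""
    dim = len(masked_vectors[0])
    return [sum(vec[k] for vec in masked_vectors) % MODULUS for k in range(dim)]


# Hypothetical private inputs for three clients.
inputs = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masks = pairwise_masks(len(inputs), dim=3)
masked = [mask_input(v, m) for v, m in zip(inputs, masks)]
total = aggregate(masked)
print(total)  # [111, 222, 333]
```

In this toy version, each client needs a secret with every other client, which costs work linear in the number of clients and is one reason a naive design does not keep clients lightweight at scale.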

Advisor

Angel, Sebastian
Rabin, Tal

Date of degree

2025

Collection

Dissertations and Theses