Accountability in distributed systems with incomplete information

Mingchen Zhao, University of Pennsylvania


Distributed systems play a critical role in people's daily lives. They provide functions such as routing telephone calls, processing stock market transactions, and recording medical information. However, nodes in distributed systems can fail for many reasons: for instance, they can suffer a software or hardware failure, an operator can configure or operate them incorrectly, or an attacker can compromise them. These faults can stop the distributed system from working correctly, and they may lead to serious problems. Because of this, there is a need for efficient and effective ways to handle such faults. Accountability is an attractive way to detect faults in distributed systems. It works by detecting any incorrect behavior of a node. More specifically, it can handle the most general class of faults, called "Byzantine faults"; other faults, such as crash faults and rational faults, are special cases. When a fault is detected, accountability can provide irrefutable evidence of it. Moreover, it offers a provable guarantee that any fault that affects a correct node can be detected. However, previous accountability techniques are difficult to apply to distributed systems with incomplete information. In this dissertation, we study two specific instances of this problem: 1) cases where some nodes work with confidential data, and 2) cases where the actions of a node cannot be easily observed. An example of the former is the Border Gateway Protocol (BGP), where information exchanged between neighboring ISPs may allow inferences about confidential business contracts. An example of the latter is a peer-assisted Content Distribution Network (CDN), where a client may be easily modified to submit fake download reports. Thus, new approaches are needed to provide accountability in these situations.
My work deals with this situation by extending accountability in two ways. First, it proposes to replace the original approach to log checking (deterministic replay) with zero-knowledge techniques, i.e., it no longer tries to reconstruct a particular execution, but rather checks the property that "actually matters" for the purposes of auditing: whether there exists a correct execution that explains all the I/O. Second, it proposes to check potentially unreliable information with a combination of a) heuristics to find suspicious actions, and b) a safe, low-cost countermeasure that can be applied to suspicious nodes without adverse consequences, so it can be used even when the degree of confidence is low. In principle, the two solutions are orthogonal and could be combined in the same system. The thesis presents two separate solutions for the first contribution: an application-specific one that is highly efficient, and another that is general but more expensive. It shows the second contribution in the context of a concrete system, Akamai NetSession, but the approach generalizes to other systems (as long as one can find other heuristics and other “safe” countermeasures). We apply the above techniques to BGP and to Akamai NetSession. Evaluations with real-world data show that our system can provide accountability with incomplete information, and that the computation, bandwidth, and storage costs are reasonable.

Subject Area

Computer Engineering|Computer science

Recommended Citation

Zhao, Mingchen, "Accountability in distributed systems with incomplete information" (2016). Dissertations available from ProQuest. AAI10190474.