AI-DRIVEN ADAPTIVE DISTRIBUTED SYSTEMS IN UNTRUSTED ENVIRONMENTS

Degree type
Doctor of Philosophy (PhD)
Graduate group
Computer and Information Science
Discipline
Computer Sciences
Subject
AI
Blockchains
Distributed Systems
Copyright date
2024
Author
Wu, Chenyuan
Abstract

The rapid rise of blockchains and “Web3” has sparked significant interest in designing and implementing distributed systems for untrusted environments. However, due to their rigid one-size-fits-all design, current systems fail to adapt to dynamic workloads, new hardware setups, and different user intentions. To address this problem, we examine the systems stack in a top-down, layered manner, from transaction management, through Byzantine fault-tolerant (BFT) consensus, to the underlying infrastructure. We propose i) AdaChain, the first learned system that adaptively manages blockchain transactions, ii) BFTBrain, a multi-protocol BFT consensus engine that dynamically adjusts itself using reinforcement learning, and iii) FlexChain, a flexible disaggregated infrastructure tailored for permissioned blockchains.

AdaChain addresses the challenge of transaction execution under dynamic workloads. Driven by reinforcement learning, it adaptively selects the transactional architecture that maximizes throughput for the current workload, and securely switches architectures at runtime when a better option is identified. Experiments show that AdaChain quickly converges to optimal setups, significantly outperforming fixed architectures in committed transactions with minimal overhead.

BFTBrain is a practical reinforcement-learning-based BFT consensus engine that provides significant operational benefits: it is a plug-and-play system suitable for a broad set of hardware and network configurations, and it adjusts effectively in real time to changing fault scenarios and workloads. BFTBrain’s key innovations include systematic performance modeling to guide protocol selection and decentralized learning that is resilient to adversarial data pollution. As a result, BFTBrain improves throughput over fixed protocols by 18% to 119% under dynamic conditions and outperforms state-of-the-art learning-based approaches by 44% to 154%.

FlexChain is a novel infrastructure layer for permissioned blockchains that physically disaggregates CPUs, DRAM, and storage devices to process different blockchain workloads efficiently. Disaggregation enables independent scaling of hardware resources, improving utilization and preventing fragmentation. Our evaluation shows that FlexChain provides independent compute and memory scalability while incurring at most 12.8% disaggregation overhead. FlexChain achieves almost identical throughput to state-of-the-art distributed approaches, with significantly lower memory and CPU consumption for compute-intensive and memory-intensive workloads, respectively.

Advisor
Loo, Boon Thau, BL
Date of degree
2024