AI-DRIVEN ADAPTIVE DISTRIBUTED SYSTEMS IN UNTRUSTED ENVIRONMENTS
Subject
Blockchains
Distributed Systems
Abstract
The rapid rise of blockchains and “Web3” has sparked significant interest in designing and implementing distributed systems for untrusted environments. However, because of their rigid, one-size-fits-all designs, current systems fail to adapt to dynamic workloads, new hardware setups, and differing user intentions. To address this problem, we examine the systems stack top-down, layer by layer: from transaction management, through Byzantine fault-tolerant (BFT) consensus, to the underlying infrastructure. We propose i) AdaChain, the first learned system that adaptively manages blockchain transactions, ii) BFTBrain, a multi-protocol BFT consensus engine that dynamically adjusts itself using reinforcement learning, and iii) FlexChain, a flexible disaggregated infrastructure tailored for permissioned blockchains.
AdaChain addresses the challenge of transaction execution under dynamic workloads. Driven by reinforcement learning, AdaChain adaptively selects the transactional architecture that maximizes throughput for the current workload, and it securely switches architectures at runtime when a better option is identified. Experiments show that AdaChain quickly converges to optimal setups and significantly outperforms fixed architectures in terms of committed transactions, with minimal overhead.
BFTBrain is a practical reinforcement-learning-based BFT consensus system that provides significant operational benefits: it is plug-and-play across a broad set of hardware and network configurations, and it adjusts effectively in real time to changing fault scenarios and workloads. BFTBrain’s key innovations include systematic performance modeling to guide protocol selection and decentralized learning that is resilient to adversarial data pollution. Beyond these operational benefits, BFTBrain improves throughput over fixed protocols by 18% to 119% under dynamic conditions and outperforms state-of-the-art learning-based approaches by 44% to 154%.
FlexChain is a novel infrastructure layer for permissioned blockchains that physically disaggregates CPUs, DRAM, and storage devices to process different blockchain workloads efficiently. Disaggregation enables independent scaling of hardware resources, improving utilization and preventing fragmentation. Our evaluation shows that FlexChain provides independent compute and memory scalability while incurring at most 12.8% disaggregation overhead. FlexChain achieves throughput nearly identical to that of state-of-the-art distributed approaches, with significantly lower memory consumption for compute-intensive workloads and significantly lower CPU consumption for memory-intensive workloads.
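The learning-driven selection that the abstract describes for AdaChain and BFTBrain can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the abstract says only that reinforcement learning is used, so the epsilon-greedy bandit below is a swapped-in stand-in, and the protocol names, throughput figures, and noise model are all hypothetical. It also assumes a stationary environment and honest measurements, whereas the systems described above target dynamic workloads and adversarial data pollution.

```python
import random


class EpsilonGreedySelector:
    """Toy selector: picks the option with the best observed mean reward,
    exploring a random option with probability epsilon."""

    def __init__(self, options, epsilon=0.1, seed=0):
        self.options = list(options)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {o: 0 for o in self.options}
        self.mean_reward = {o: 0.0 for o in self.options}

    def choose(self):
        # Try every option once before trusting the running means.
        for option in self.options:
            if self.counts[option] == 0:
                return option
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.options)  # explore
        return max(self.options, key=lambda o: self.mean_reward[o])  # exploit

    def update(self, option, reward):
        # Incremental update of the running mean reward for this option.
        self.counts[option] += 1
        n = self.counts[option]
        self.mean_reward[option] += (reward - self.mean_reward[option]) / n


def simulate(epochs=500, seed=0):
    """Run the selector against a simulated environment (assumed numbers)."""
    rng = random.Random(seed)
    # Hypothetical mean throughput (transactions/s) of three made-up protocols.
    true_throughput = {
        "protocol_a": 1000.0,
        "protocol_b": 1400.0,
        "protocol_c": 800.0,
    }
    selector = EpsilonGreedySelector(true_throughput, epsilon=0.1, seed=seed)
    for _ in range(epochs):
        choice = selector.choose()
        # Noisy throughput measurement for the chosen protocol in this epoch.
        observed = rng.gauss(true_throughput[choice], 50.0)
        selector.update(choice, observed)
    return selector


if __name__ == "__main__":
    result = simulate()
    for name in result.options:
        print(name, result.counts[name], round(result.mean_reward[name], 1))
```

In this toy setting the selector spends most epochs on the option with the highest simulated throughput while still sampling the others occasionally. A real deployment would additionally need to react to workload shifts and to tolerate manipulated measurements, which this sketch does not attempt.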