System Optimization for 3D-Stacking Memories
Abstract
Application-aware system optimization can exploit the channel-level parallelism of modern 3D-stacked memories such as HMC and HBM. It improves the performance of both domain-specific accelerators (DSAs) and general-purpose computer systems, with gains ranging from 2$\times$ to 8$\times$ over systems that are oblivious to application requirements. By co-optimizing the algorithm and the hardware architecture, and by using vertex-degree-aware algorithms and hardware allocation, the proposed graph accelerator achieves an 8$\times$ improvement, reaching 45.8 billion traversed edges per second on a 28nm Intel/Altera FPGA paired with a 4GB, 160 Gbps HMC. When the application and the memory subsystem, including the MMU and address mapping, are co-optimized, a general-purpose computer system achieves a 2$\times$ performance improvement over an application-oblivious system, as measured on an FPGA-based RISC-V prototype.