Accelerating FPGA Developments from C to Bitstreams by Partial Reconfiguration
Subject
FPGA
Latency Insensitive
Partial Reconfiguration
Streaming
Abstract
Divide-and-conquer and incremental compilation strategies are widely used in software compilation. Divide-and-conquer means that separate source files are compiled independently, in multiple threads, into object files, which are then linked into an executable; incremental compilation means that the tools only need to recompile the modified source files and quickly relink the object files. To enable these strategies for FPGAs, this dissertation presents an open-source framework called PRflow, which can shorten compilation times by an order of magnitude. PRflow supports different optimization levels to make better trade-offs among compile time, area, and performance. -O0 (PRflow RISCV) maps applications to a cluster of on-chip RISC-V cores within seconds for quick verification and debugging. -O1 (PRflow) compiles the separate parts of an application to partial FPGA bitstreams for different partially reconfigurable (PR) regions on the chip. The separate parts can be compiled in parallel within 24 minutes. The interconnections between them are set up by sending configuration packets to a network-on-chip (NoC), without re-routing physical wires. -O2 (PRflow DW) supports interconnect customization with a fixed-page-size overlay on top of a commercial FPGA to meet high inter-page bandwidth requirements, which can improve performance by up to 10× compared with -O1. -O3 (PRflow HiPR) supports overlay customization for arbitrary inter-page throughput and various page-size requirements, with incremental compile times similar to those of -O1 and -O2. HiPR extracts the interconnect information among separate sub-functions and generates a customized overlay with the PR regions defined. Users can then perform quick incremental compilation of specific sub-functions at the cost of an acceptable one-time overlay compilation overhead.
-O3 compiles applications with the most aggressive optimization strategies, similar to commercial tools. We demonstrate the PRflow framework on the Xilinx Alveo-U50 data-center card with an xcu50-fsvh2104-2-e FPGA chip (16nm FinFET) by mapping the complete Rosetta HLS benchmark set. PRflow reduces compilation times from 2–3 hours (state-of-the-art Vitis) to 10–24 minutes. We expect PRflow, based on the PR technique, to become an important compilation strategy as the increasing scale of FPGAs greatly slows down compilation.
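
The divide-and-conquer and incremental strategies described in the abstract can be illustrated with a minimal Python sketch. It is not PRflow's implementation: the function names (`plan_rebuild`, `compile_part`, `incremental_build`) and the hash-based change detection are assumptions made for illustration, and `compile_part` is a placeholder for a real per-region compile that would produce a partial bitstream.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(source: str) -> str:
    """Content hash used to detect whether a part's source has changed."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def plan_rebuild(sources: dict[str, str], cache: dict[str, str]) -> list[str]:
    """Return the names of parts whose source differs from the cached hash."""
    return sorted(
        name for name, text in sources.items() if cache.get(name) != digest(text)
    )


def compile_part(name: str, text: str) -> str:
    """Hypothetical stand-in for compiling one part to a partial bitstream."""
    return f"{name}.bit:{digest(text)[:8]}"


def incremental_build(
    sources: dict[str, str],
    cache: dict[str, str],
    artifacts: dict[str, str],
    workers: int = 4,
) -> list[str]:
    """Recompile only the stale parts, in parallel, and update the cache."""
    stale = plan_rebuild(sources, cache)
    # Divide and conquer: each stale part is compiled independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda n: compile_part(n, sources[n]), stale))
    # Incremental step: untouched parts keep their previous artifacts.
    for name, artifact in zip(stale, results):
        artifacts[name] = artifact
        cache[name] = digest(sources[name])
    return stale
```

On a first call every part is compiled; after one part's source is edited, a second call recompiles only that part and leaves the other artifacts untouched, which is the behavior the abstract attributes to incremental compilation.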