Reno: A rename-based instruction optimizer
RENO is a modified register renaming mechanism that performs optimizations on the dynamic instruction stream. RENO uses map table manipulations to implement the dynamic counterparts of several well-known static optimizations. RENO examines dynamic instructions as they flow through rename and optimizes some of them. An optimized instruction must be indistinguishable from an executed one. Before optimizing an instruction, RENO ensures that the value it would compute is already present in the physical register file. RENO then maps the output of the optimized instruction to that register. Because the map table points to the right value, the optimized instruction can safely bypass out-of-order execution. RENO combines four optimizations into a unified framework. Move elimination optimizes moves. Common sub-expression elimination optimizes redundant operations. Register allocation optimizes stack-pointer loads. Finally, constant propagation optimizes add-immediate instructions. Although it may seem superfluous to perform these optimizations in hardware, their static counterparts are inherently limited by: (a) separate, file-level compilation, (b) conservative information about memory dependencies, (c) inability to use resources that are not visible at the architectural level, and (d) the requirement that any transformation be correct along all possible paths. The dynamic RENO versions: (a) ignore compilation boundaries, (b) can optimize speculatively, (c) can access micro-architectural resources, and (d) need to ensure correctness along the current dynamic path only. Consequently, RENO is capable of optimizing an average of 22% of the dynamic instructions in highly optimized MediaBench and SPEC2000 integer programs. Despite this, RENO is a complement to, rather than a replacement for, static optimizations: RENO-optimized instructions must still be fetched and committed, whereas statically optimized instructions bypass the entire pipeline.
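The map-table idea described above can be illustrated with a toy rename stage. This is only a minimal sketch under stated assumptions, not the dissertation's design: the `Renamer` class, its `expr_table`, and the instruction encoding are hypothetical names introduced here, physical registers are never freed or reused (so no reference counting is modeled), and only move elimination and a simple form of redundancy elimination are shown.

```python
from dataclasses import dataclass, field


@dataclass
class Renamer:
    """Toy rename stage (hypothetical): maps architectural to physical registers."""
    map_table: dict = field(default_factory=dict)   # arch reg -> phys reg
    expr_table: dict = field(default_factory=dict)  # (op, phys srcs...) -> phys reg
    next_phys: int = 0

    def _alloc(self):
        # Assumption: an unbounded supply of physical registers, never recycled.
        p = self.next_phys
        self.next_phys += 1
        return p

    def _lookup(self, arch):
        # Give a live-in architectural register a physical register on first use.
        if arch not in self.map_table:
            self.map_table[arch] = self._alloc()
        return self.map_table[arch]

    def rename(self, op, dest, srcs):
        """Rename one instruction; return (phys_dest, needs_execution)."""
        phys_srcs = tuple(self._lookup(s) for s in srcs)
        if op == "mov":
            # Move elimination: the value already lives in the source's
            # physical register, so just point the destination at it.
            self.map_table[dest] = phys_srcs[0]
            return phys_srcs[0], False
        key = (op, *phys_srcs)
        if key in self.expr_table:
            # Redundant operation: same opcode on the same physical inputs,
            # so the result is already in the physical register file.
            p = self.expr_table[key]
            self.map_table[dest] = p
            return p, False
        # Otherwise allocate a fresh register and send the instruction to execute.
        p = self._alloc()
        self.map_table[dest] = p
        self.expr_table[key] = p
        return p, True


r = Renamer()
print(r.rename("add", "r3", ["r1", "r2"]))  # (2, True)  executes normally
print(r.rename("mov", "r4", ["r3"]))        # (2, False) move eliminated
print(r.rename("add", "r5", ["r1", "r2"]))  # (2, False) redundant add eliminated
```

Because physical registers are single-assignment in this sketch, an `expr_table` entry never goes stale. A real design would have to track sharing of physical registers before freeing them, which is omitted here.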
Removing instructions from the out-of-order execution stream improves processor performance via execution latency reduction and out-of-order bandwidth and capacity amplification. For a balanced 4-wide/128-instruction window pipeline, these effects convert a 22% optimization rate into average speedups of 8.3% (SPEC2000 integer) and 11.6% (MediaBench). In addition, RENO's out-of-order bandwidth and capacity amplification effects enable a new class of designs, which couple a wider in-order front end and back end with a narrower out-of-order core. Because the out-of-order core is difficult to scale, these designs deliver performance in a more complexity-effective way.
Petric, Vlad, "Reno: A rename-based instruction optimizer" (2007). Dissertations available from ProQuest. AAI3260967.