Energy efficient load latency tolerance: Single-thread performance for the multi-core era

Andrew D Hilton, University of Pennsylvania

Abstract

Around 2003, newly binding power constraints caused single-thread performance growth to slow dramatically, and the multi-core era was born with an emphasis on explicitly parallel software. Continued single-thread performance growth is still important in the multi-core context, but it must be achieved in an energy-efficient way. One significant impediment to performance growth in both out-of-order and in-order processors is the long latency of last-level cache misses. Prior work introduced the idea of load latency tolerance: the ability to dynamically remove miss-dependent instructions from critical execution structures, continue execution under the miss, and re-execute the miss-dependent instructions after the miss returns. However, previously proposed designs were unable to improve performance in an energy-efficient way; they introduced too many new large, complex structures and re-executed too many instructions. This dissertation describes a new load latency tolerant design that is energy-efficient and applicable to both in-order and out-of-order cores. Key novel features include the formulation of slice re-execution as an alternative use of multi-threading support, efficient schemes for register and memory state management, and new pruning mechanisms that drastically reduce the dynamic execution overheads of load latency tolerance. Area analysis shows that energy-efficient load latency tolerance increases the footprint of an out-of-order core by a few percent, while cycle-level simulation shows that it significantly improves the performance of memory-bound programs. Energy-efficient load latency tolerance is also more energy-efficient than, and synergistic with, existing performance techniques such as dynamic voltage and frequency scaling (DVFS).
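The general idea of load latency tolerance (set miss-dependent instructions aside, keep executing independent ones, then re-execute the deferred slice) can be illustrated with a small toy model. This is a hedged sketch of the generic concept only, not the design described in this dissertation: the names `Instr`, `is_miss_load`, and `defer_miss_slice` are hypothetical, and the "poisoned register" propagation used to find the miss-dependent slice is an assumed, commonly used way of identifying dependences. The model tracks only which instructions are deferred and ignores values, timing, and memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str                   # destination register (hypothetical toy ISA)
    srcs: tuple[str, ...] = ()  # source registers
    is_miss_load: bool = False  # True if this load misses the last-level cache


def defer_miss_slice(program: list[Instr]) -> tuple[list[int], list[int]]:
    """Split a program into instructions that execute under the miss and the
    miss-dependent slice that is set aside and re-executed when the miss returns.

    Toy model only: dependence on the miss is tracked with "poisoned"
    registers; actual values and timing are not simulated.
    """
    poisoned: set[str] = set()
    run_under_miss: list[int] = []
    slice_buffer: list[int] = []
    for i, ins in enumerate(program):
        if ins.is_miss_load or any(s in poisoned for s in ins.srcs):
            # Miss-dependent: remove from the execution path, remember for later.
            poisoned.add(ins.dest)
            slice_buffer.append(i)
        else:
            # Independent: execute now. A fresh write makes the register valid
            # again; a real design must also keep the replayed slice from
            # overwriting this newer value, which this model does not simulate.
            poisoned.discard(ins.dest)
            run_under_miss.append(i)
    # When the miss returns, the slice is re-executed in program order.
    return run_under_miss, slice_buffer


program = [
    Instr("r1", ("r9",), is_miss_load=True),  # 0: r1 = load [r9]  (misses)
    Instr("r2", ("r1",)),                     # 1: r2 = r1 + 4     (depends on miss)
    Instr("r3", ("r4",)),                     # 2: r3 = r4 + 1     (independent)
    Instr("r5", ("r2", "r3")),                # 3: r5 = r2 * r3    (depends via r2)
    Instr("r6", ("r3",)),                     # 4: r6 = r3 - 1     (independent)
]
print(defer_miss_slice(program))  # ([2, 4], [0, 1, 3])
```

In this framing, the length of the deferred slice is a rough proxy for the re-execution overhead that the abstract identifies as a weakness of earlier designs: every instruction in `slice_buffer` is handled twice.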

Subject Area

Computer Engineering | Computer Science

Recommended Citation

Hilton, Andrew D, "Energy efficient load latency tolerance: Single-thread performance for the multi-core era" (2010). Dissertations available from ProQuest. AAI3447628.
https://repository.upenn.edu/dissertations/AAI3447628
