Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization
The load-store unit is a performance critical component of a dynamically-scheduled processor. It is also a complex and non-scalable component. Several recently proposed techniques use some form of speculation to simplify the load-store unit and check this speculation by re-executing some of the loads prior to commit. We call such techniques load optimizations. One recent load optimization improves load queue (LQ) scalability by using re-execution rather than associative search to check speculative intra- and inter- thread memory ordering. A second technique improves store queue (SQ) scalability by speculatively filtering some load accesses and some store entries from it and re-executing loads to check that speculation. A third technique speculatively removes redundant loads from the execution engine; re-execution detects false eliminations. Unfortunately, the benefits of a load optimization are often mitigated by re-execution itself. Re-execution contends for cache bandwidth with store commit, and serializes load re-execution with subsequent store commit. If a given load optimization requires a sufficient number of load re-executions, the aggregate re-execution cost may overwhelm the benefits of the technique entirely and even cause drastic slowdowns. Store Vulnerability Window (SVW) is a new mechanism that significantly reduces the re-execution requirements of a given load optimization. SVW is based on monotonic store sequence numbering and an adaptation of Bloom filtering. The cost of a typical SVW implementation is a 1KB buffer and a 16-bit field per LQ entry. Across the three optimizations we study, SVW reduces re-executions by an average of 85%. This reduction relieves cache port contention and removes many of the dynamic serialization events that contribute the bulk of re-execution’s cost, allows these load optimizations to perform up to their full potential. For the speculative SQ, this means the chance to perform at all, as without SVW it posts significant slowdowns.