Departmental Papers (CIS)

Date of this Version


Document Type

Conference Paper


Copyright 2009 IEEE. Reprinted from:
Hilton, A.; Nagarakatte, S.; Roth, A., "iCFP: Tolerating all-level cache misses in in-order processors," High Performance Computer Architecture, 2009. HPCA 2009. IEEE 15th International Symposium on , vol., no., pp.431-442, 14-18 Feb. 2009

This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of the University of Pennsylvania's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to By choosing to view this document, you agree to all provisions of the copyright laws protecting it.


Growing concerns about power have revived interest in in-order pipelines. In-order pipelines sacrifice single-thread performance. Specifically, they do not allow execution to flow freely around data cache misses. As a result, they have difficulties overlapping independent misses with one another. Previously proposed techniques like Runahead execution and Multipass pipelining have attacked this problem. In this paper, we go a step further and introduce iCFP (in-order Continual Flow Pipeline), an adaptation of the CFP concept to an in-order processor. When iCFP encounters a primary data cache or 12 miss, it checkpoints the register file and transitions into an "advance " execution mode. Miss-independent instructions execute as usual and even update register state. Miss- dependent instructions are diverted into a slice buffer, un-blocking the pipeline latches. When the miss returns, iCFP "rallies" and executes the contents of the slice buffer, merging miss-dependent state with miss- independent state along the way. An enhanced register dependence tracking scheme and a novel store buffer design facilitate the merging process. Cycle-level simulations show that iCFP out-performs Runahead, Multipass, and SLTP, another non-blocking in-order pipeline design.


multiprocessing systems, pipeline processing, Runahead execution, all-level cache, in-order continual flow pipeline, in-order pipelines, in-order processors, miss-independent instructions, multipass pipelining, register dependence tracking scheme, register file



Date Posted: 08 June 2009