Martin, Milo

Email Address
ORCID
Disciplines
Research Projects
Organizational Units
Position
Introduction
Research Interests

Search Results

Now showing 1 - 10 of 44
  • Publication
    Improved Sequence-Based Speculation Techniques for Implementing Memory Consistency
    (2008-05-27) Blundell, Colin; Martin, Milo M.K.; Wenisch, Tom
    This work presents BMW, a new design for speculative implementations of memory consistency models in shared-memory multiprocessors. BMW obtains the same performance as prior proposals, but achieves this performance while avoiding several undesirable attributes of prior proposals: non-scalable structures, per-word valid bits in the data cache, modifications to the cache coherence protocol, and global arbitration. BMW uses a read and write bit per cache block and a standard invalidation-based cache coherence protocol to perform conflict detection while speculating. While speculating, stores to block not in the cache are placed into a coalescing store buffer until those misses return. Stores are written speculatively to the primary cache, and non-speculative state is maintained by cleaning dirty blocks before being written speculatively. Speculative blocks are invalidated on abort and marked as non-speculative on commit. This organization allows for fast, local commits while avoiding a non-scalable store queue.
  • Publication
    Litmus Tests for Comparing Memory Consistency Models: How Long Do They Need to Be?
    (2011-01-01) Alur, Rajeev; Mador-Haim, Sela; Martin, Milo
    Even though the general problem of comparing two memory models is infeasible, in this paper we show that checking the equivalence of two memory models becomes feasible when we consider a more restricted class of memory models. We define a class of memory models that is expressive enough to include most known hardware memory models, and we establish a bound of two threads and no more than six memory access instructions for contrasting litmus tests in this class of models. Thus, we can compare memory models in this class by checking a small number of litmus tests. We build a tool for comparing memory models based on this theorem and use the tool to explore and map the space of this class of models.
  • Publication
    Protocol Design With Concolic Snippets
    (2012-01-01) Alur, Rajeev; Deshmukh, Jyotirmoy; Martin, Milo; Mador-Haim, Sela; Raghavan, Arun; Udupa, Abhishek
    With the maturing of computer-aided verification technology, there is an emerging opportunity to develop design tools that can transform the way systems are designed. In this paper, we propose a new way to specify protocols using concolic snippets, that is, sample execution fragments that contain both concrete and symbolic values. While the purely symbolic extreme is simply an alternative representation of the traditional communicating extended finite-state-machines, and the purely concrete extreme is an instantiation of the "programming by examples" paradigm, our specification language allows the designer to specify the desired protocol using a mixture of symbolic state machines and concrete scenarios. Our synthesis engine generalizes the snippets into a transition function, which is then analyzed using a model checker with respect to high-level temporal-logic correctness requirements. We describe a prototype implementation for design of cache coherence protocols built using (1) a straightforward enumeration of all expressions for transition functions, (2) a check for consistency with respect to concolic snippets using the SMT solver CVC3, and (3) a check for correctness using the model checker Murø. We discuss our experience in designing classical cache coherence protocols using the proposed methodology.
  • Publication
    Unrestricted Transactional Memory: Supporting I/O and System Calls Within Transactions
    (2006-05-01) Lewis, E. Christopher; Blundell, Colin; Martin, Milo
    Hardware transactional memory has great potential to simplify the creation of correct and efficient multithreaded programs, enabling programmers to exploit the soon-to-be-ubiquitous multi-core designs. Transactions are simply segments of code that are guaranteed to execute without interference from other concurrently-executing threads. The hardware executes transactions in parallel, ensuring non-interference via abort/rollback/restart when conflicts are detected. Transactions thus provide both a simple programming interface and a highly-concurrent implementation that serializes only on data conflicts. A progression of recent work has broadened the utility of transactional memory by lifting the bound on the size and duration of transactions, called unbounded transactions. Nevertheless, two key challenges remain: (i) I/O and system calls cannot appear in transactions and (ii) existing unbounded transactional memory proposals require complex implementations. We describe a system for fully unrestricted transactions (i.e., they can contain I/O and system calls in addition to being unbounded in size and duration). We achieve this via two modes of transaction execution: restricted (which limits transaction size, duration, and content but is highly concurrent) and unrestricted (which is unbounded and can contain I/O and system calls but has limited concurrency because there can be only one unrestricted transaction executing at a time). Transactions transition to unrestricted mode only when necessary. We introduce unoptimized and optimized implementations in order to balance performance and design complexity.
  • Publication
    SoftBound: Highly Compatible and Complete Spatial Memory Safety for C
    (2009-01-01) Nagarakatte, Santosh; Martin, Milo; Zhao, Jianzhou; Zdancewic, Stephan A
    The serious bugs and security vulnerabilities facilitated by C/C++’s lack of bounds checking are well known. Yet, C and C++ remain in widespread use. Unfortunately, C’s arbitrary pointer arithmetic, conflation of pointers and arrays, and programmer-visible memory layout make retrofitting C/C++ with spatial safety guarantees extremely challenging. Existing approaches suffer from incompleteness, have high runtime overhead, or require non-trivial changes to the C source code. Thus far, these deficiencies have prevented widespread adoption of such techniques. This paper proposes SoftBound, a compile time transformation for enforcing complete spatial safety of C. SoftBound records base and bound information for every pointer as disjoint metadata. This decoupling enables SoftBound to provide complete spatial safety while requiring no changes to C source code. Moreover, SoftBound performs metadata manipulation only when loading or storing pointer values. A formal proof shows this is sufficient to provide complete spatial safety even in the presence of wild casts. SoftBound’s full checking mode provides complete spatial violation detection. To further reduce overheads, SoftBound has a store-only checking mode that successfully detects all the security vulnerabilities in a test suite while adding 15% or less overhead to half of the benchmarks.
  • Publication
    Core Ironclad
    (2013-01-01) Osera, Peter-Michael; Eisenberg, Richard A.; DeLozier, Christian; Nagarakatte, Santosh; Martin, Milo; Zdancewic, Stephan A
    Core Ironclad is a core calculus that models the salient features of Ironclad C++, a library-augmented type-safe subset of C++. We give an overview of the language including its definition and key design points. We then prove type safety for the language and use that result to show that the pointer lifetime invariant, a key property of Ironclad C++, holds within the system.
  • Publication
    RETCON: Transactional Repair Without Replay
    (2009-11-25) Blundell, Colin; Martin, Milo; Raghavan, Arun
    Over the past decade, there has been a surge of academic and industrial interest in optimistic concurrency, i.e., the speculative parallel execution of code regions (transactions or critical sections) with the semantics of isolation from one another. This work analyzes bottlenecks to the scalability of workloads that use optimistic concurrency. We find that one common source of performance degradation is updates to auxiliary program data in otherwise non-conflicting operations, e.g. reference count updates on shared object reads and hashtable size field increments on inserts of different elements. To eliminate the performance impact of conflicts on such auxiliary data, this work proposes RETCON, a hardware mechanism that tracks the relationship between input and output values symbolically and uses this symbolic information to transparently repair the output state of a transaction at commit. RETCON is inspired by instruction replay-based mechanisms but exploits simplifying properties of the nature of computations on auxiliary data to perform repair without replay. Our experiments show that RETCON provides significant speedups for workloads that exhibit conflicts on auxiliary data, including transforming a transactionalized version of the reference python interpreter from a workload that exhibits no scaling to one that exhibits near-linear scaling on 32 cores.
  • Publication
    CheckFence: Checking Consistency of Concurrent Data Types on Relaxed Memory Models
    (2007-06-01) Burckhardt, Sebastian; Alur, Rajeev; Martin, Milo
    Concurrency libraries can facilitate the development of multithreaded programs by providing concurrent implementations of familiar data types such as queues or sets. There exist many optimized algorithms that can achieve superior performance on multiprocessors by allowing concurrent data accesses without using locks. Unfortunately, such algorithms can harbor subtle concurrency bugs. Moreover, they require memory ordering fences to function correctly on relaxed memory models. To address these difficulties, we propose a verification approach that can exhaustively check all concurrent executions of a given test program on a relaxed memory model and can verify that they are observationally equivalent to a sequential execution. Our Check- Fence prototype automatically translates the C implementation code and the test program into a SAT formula, hands the latter to a standard SAT solver, and constructs counterexample traces if there exist incorrect executions. Applying CheckFence to five previously published algorithms, we were able to (1) find several bugs (some not previously known), and (2) determine how to place memory ordering fences for relaxed memory models.
  • Publication
    Multicore Acceleration for Priority Based Schedulers for Concurrency Bug Detection
    (2012-06-01) Nagarakatte, Santosh; Burckhardt, Sebastian; Martin, Milo; Musuvathi, Madan
    Testing multithreaded programs is difficult as threads can interleave in a nondeterministic fashion. Untested interleavings can cause failures, but testing all interleavings is infeasible. Many interleaving exploration strategies for bug detection have been proposed, but their relative effectiveness and performance remains unclear as they often lack publicly available implementations and have not been evaluated using common benchmarks. We describe NeedlePoint, an open-source framework that allows selection and comparison of a wide range of interleaving exploration policies for bug detection proposed by prior work. Our experience with NeedlePoint indicates that priority-based probabilistic concurrency testing (the PCT algorithm) finds bugs quickly, but it runs only one thread at a time, which destroys parallelism by serializing executions. To address this problem we propose a parallel version of the PCT algorithm (PPCT).We show that the new algorithm outperforms the original by a factor of 5x when testing parallel programs on an eight-core machine. We formally prove that parallel PCT provides the same probabilistic coverage guarantees as PCT. Moreover, PPCT is the first algorithm that runs multiple threads while providing coverage guarantees.
  • Publication
    NoSQ: Store-Load Communication without a Store Queue
    (2006-12-01) Sha, Tingting; Martin, Milo; Roth, Amir
    This paper presents NoSQ (short for No Store Queue), a microarchitecture that performs store-load communication without a store queue and without executing stores in the out-of-order engine. NoSQ implements store-load communication using speculative memory bypassing (SMB), the dynamic short-circuiting of DEF-store-load-USE chains to DEF-USE chains. Whereas previous proposals used SMB as an opportunistic complement to conventional store queue-based forwarding, NoSQ uses SMB as a store queue replacement. NoSQ relies on two supporting mechanisms. The first is an advanced store-load bypassing predictor that for a given dynamic load can predict whether that load will bypass and the identity of the communicating store. The second is an efficient verification mechanism for both bypassed and non-bypassed loads using in-order load re-execution with an SMB-aware store vulnerability window (SVW) filter. The primary benefit of NoSQ is a simple, fast datapath that does not contain store-load forwarding hardware; all loads get their values either from the data cache or from the register file. Experiments show that this simpler design - despite being more speculative - slightly outperforms a conventional store-queue based design on most benchmarks (by 2% on average).