Martin, Milo

Email Address
ORCID
Disciplines
Research Projects
Organizational Units
Position
Introduction
Research Interests

Search Results

Now showing 1 - 10 of 44
  • Publication
    CETS: Compiler Enforced Temporal Safety for C
    (2010-06-01) Nagarakatte, Santosh; Martin, Milo; Zhao, Jianzhou; Zdancewic, Stephan A
    Temporal memory safety errors, such as dangling pointer dereferences and double frees, are a prevalent source of software bugs in unmanaged languages such as C. Existing schemes that attempt to retrofit temporal safety for such languages have high runtime overheads and/or are incomplete, thereby limiting their effectiveness as debugging aids. This paper presents CETS, a compile-time transformation for detecting all violations of temporal safety in C programs. Inspired by existing approaches, CETS maintains a unique identifier with each object, associates this metadata with the pointers in a disjoint metadata space to retain memory layout compatibility, and checks that the object is still allocated on pointer dereferences. A formal proof shows that this is sufficient to provide temporal safety even in the presence of arbitrary casts if the program contains no spatial safety violations. Our CETS prototype employs both temporal check removal optimizations and traditional compiler optimizations to achieve a runtime overhead of just 48% on average. When combined with a spatial-checking system, the average overall overhead is 116% for complete memory safety.
  • Publication
    Improved Sequence-Based Speculation Techniques for Implementing Memory Consistency
    (2008-05-27) Blundell, Colin; Martin, Milo M.K.; Wenisch, Tom
    This work presents BMW, a new design for speculative implementations of memory consistency models in shared-memory multiprocessors. BMW obtains the same performance as prior proposals, but achieves this performance while avoiding several undesirable attributes of prior proposals: non-scalable structures, per-word valid bits in the data cache, modifications to the cache coherence protocol, and global arbitration. BMW uses a read and write bit per cache block and a standard invalidation-based cache coherence protocol to perform conflict detection while speculating. While speculating, stores to block not in the cache are placed into a coalescing store buffer until those misses return. Stores are written speculatively to the primary cache, and non-speculative state is maintained by cleaning dirty blocks before being written speculatively. Speculative blocks are invalidated on abort and marked as non-speculative on commit. This organization allows for fast, local commits while avoiding a non-scalable store queue.
  • Publication
    Simulating a $2M Commercial Server on a $2K PC
    (2003-02-01) Alameldeen, Alaa R; Martin, Milo; Mauer, Carl J; Moore, Kevin E; Xu, Min; Hill, Mark D; Wood, David A; Sorin, Daniel J
    The Internet has made database management systems and Web servers integral parts of today’s business and communications infrastructure. These and other commercial transaction-processing applications work with critical personal and business data—storing it, providing access to it, and manipulating it. As dependence on these applications increases, so does the need for them to run reliably and efficiently. Our group at the University of Wisconsin (www.cs.wisc.edu/multifacet/) researches innovative ways to improve the performance of the multiprocessor servers that run these important commercial applications. Execution-driven simulation is a design evaluation tool that models system hardware. These simulations capture actual program behavior and detailed system interactions. They are more flexible and less expensive than hardware prototypes, and they model important system details more accurately than analytic modeling does. However, the combination of large systems and demanding workloads is difficult to simulate, especially on the inexpensive machines available to most researchers. Commercial workloads, unlike simpler workloads, rely heavily on operating system services such as input/output, process scheduling, and interprocess communication. To run commercial workloads correctly, simulators must model these services. In addition, multiprocessor servers introduce the challenges of interactions among processors, large main memories, and many disks. To make effective use of limited simulation resources, researchers must balance three goals: • developing a representative approximation of large workloads, • achieving tractable simulation times, and • simulating a sufficient level of timing detail. We developed a simulation methodology to achieve these goals. Our methodology uses multiple simulations, pays careful attention to scaling effects on workload behavior, and extends VirtutechAB’s Simics full-system functional simulator with detailed timing models.
  • Publication
    Unrestricted Transactional Memory: Supporting I/O and System Calls Within Transactions
    (2006-05-01) Lewis, E. Christopher; Blundell, Colin; Martin, Milo
    Hardware transactional memory has great potential to simplify the creation of correct and efficient multithreaded programs, enabling programmers to exploit the soon-to-be-ubiquitous multi-core designs. Transactions are simply segments of code that are guaranteed to execute without interference from other concurrently-executing threads. The hardware executes transactions in parallel, ensuring non-interference via abort/rollback/restart when conflicts are detected. Transactions thus provide both a simple programming interface and a highly-concurrent implementation that serializes only on data conflicts. A progression of recent work has broadened the utility of transactional memory by lifting the bound on the size and duration of transactions, called unbounded transactions. Nevertheless, two key challenges remain: (i) I/O and system calls cannot appear in transactions and (ii) existing unbounded transactional memory proposals require complex implementations. We describe a system for fully unrestricted transactions (i.e., they can contain I/O and system calls in addition to being unbounded in size and duration). We achieve this via two modes of transaction execution: restricted (which limits transaction size, duration, and content but is highly concurrent) and unrestricted (which is unbounded and can contain I/O and system calls but has limited concurrency because there can be only one unrestricted transaction executing at a time). Transactions transition to unrestricted mode only when necessary. We introduce unoptimized and optimized implementations in order to balance performance and design complexity.
  • Publication
    Correctly Implementing Value Prediction in Microprocessors that Support Multithreading or Multiprocessing
    (2001-12-01) Martin, Milo; Sorin, Daniel J; Cain, Harold W; Hill, Mark D; Lipasti, Mikko H
    This paper explores the interaction of value prediction with thread-level parallelism techniques, including multithreading and multiprocessing, where correctness is defined by a memory consistency model. Value prediction subtly interacts with the memory consistency model by allowing data dependent instructions to be reordered. We find that predicting a value and later verifying that the value eventually calculated is the same as the value predicted is not always sufficient. We present an example of a multithreaded pointer manipulation that can generate a surprising and erroneous result when value prediction is implemented without considering memory consistency correctness. We show that this problem can occur with real software, and we discuss how to apply existing techniques to eliminate the problem in both sequentially consistent systems and systems that obey relaxed memory consistency models.
  • Publication
    Bandwidth Adaptive Snooping
    (2002-02-02) Martin, Milo; Sorin, Daniel J; Hill, Mark D; Wood, David A
    This paper advocates that cache coherence protocols use a bandwidth adaptive approach to adjust to varied system configurations (e.g., number of processors) and workload behaviors. We propose Bandwidth Adaptive Snooping Hybrid (BASH), a hybrid protocol that ranges from behaving like snooping (by broadcasting requests) when excess bandwidth is available to behaving like a directory protocol (by unicasting requests) when bandwidth is limited. BASH adapts dynamically by probabilistically deciding to broadcast or unicast on a per request basis using a local estimate of recent interconnection network utilization. Simulations of a microbenchmark and commercial and scientific workloads show that BASH robustly performs as well or better than the best of snooping and directory protocols as available bandwidth is varied. By mixing broadcasts and unicasts, BASH outperforms both snooping and directory protocols in the mid-range where a static choice of either is inefficient.
  • Publication
    Token Coherence: A New Framework for Shared-Memory Multiprocessors
    (2003-11-01) Martin, Milo; Hill, Mark D; Wood, David A
    Commercial workload and technology trends are pushing existing shared-memory multiprocessor coherence protocols in divergent directions. Token Coherence provides a framework for new coherence protocols that can reconcile these opposing trends.
  • Publication
    Litmus Tests for Comparing Memory Consistency Models: How Long Do They Need to Be?
    (2011-01-01) Alur, Rajeev; Mador-Haim, Sela; Martin, Milo
    Even though the general problem of comparing two memory models is infeasible, in this paper we show that checking the equivalence of two memory models becomes feasible when we consider a more restricted class of memory models. We define a class of memory models that is expressive enough to include most known hardware memory models, and we establish a bound of two threads and no more than six memory access instructions for contrasting litmus tests in this class of models. Thus, we can compare memory models in this class by checking a small number of litmus tests. We build a tool for comparing memory models based on this theorem and use the tool to explore and map the space of this class of models.
  • Publication
    Ironclad C++: A Library-Augmented Type-Safe Subset of C++
    (2013-03-28) DeLozier, Christian; Eisenberg, Richard A.; Nagarakatte, Santosh; Osera, Peter-Michael; Martin, Milo; Zdancewic, Stephan A
    C++ remains a widely used programming language, despite retaining many unsafe features from C. These unsafe features often lead to violations of type and memory safety, which manifest as buffer overflows, use-after-free vulnerabilities, or abstraction violations. Malicious attackers are able to exploit such violations to compromise application and system security. This paper introduces Ironclad C++, an approach to bring the benefits of type and memory safety to C++. Ironclad C++ is, in essence, a library-augmented type-safe subset of C++. All Ironclad C++ programs are valid C++ programs, and thus Ironclad C++ programs can be compiled using standard, off-the-shelf C++ compilers. However, not all valid C++ programs are valid Ironclad C++ programs. To determine whether or not a C++ program is a valid Ironclad C++ program, Ironclad C++ uses a syntactic source code validator that statically prevents the use of unsafe C++ features. For properties that are difficult to check statically Ironclad C++ applies dynamic checking to enforce memory safety using templated smart pointer classes. Drawing from years of research on enforcing memory safety, Ironclad C++ utilizes and improves upon prior techniques to significantly reduce the overhead of enforcing memory safety in C++. To demonstrate the effectiveness of this approach, we translate (with the assistance of a semi-automatic refactoring tool) and test a set of performance benchmarks, multiple bug-detection suites, and the open-source database leveldb. These benchmarks incur a performance overhead of 12% on average as compared to the unsafe original C++ code, which is small compared to prior approaches for providing comprehensive memory safety in C and C++.
  • Publication
    HardBound: Architectural Support for Spatial Safety of the C Programming Language
    (2008-03-01) Devietti, Joe; Martin, Milo; Blundell, Colin; Zdancewic, Stephan A
    The C programming language is at least as well known for its absence of spatial memory safety guarantees (i.e., lack of bounds checking) as it is for its high performance. C's unchecked pointer arithmetic and array indexing allow simple programming mistakes to lead to erroneous executions, silent data corruption, and security vulnerabilities. Many prior proposals have tackled enforcing spatial safety in C programs by checking pointer and array accesses. However, existing software-only proposals have significant drawbacks that may prevent wide adoption, including: unacceptably high runtime overheads, lack of completeness, incompatible pointer representations, or need for non-trivial changes to existing C source code and compiler infrastructure. Inspired by the promise of these software-only approaches, this paper proposes a hardware bounded pointer architectural primitive that supports cooperative hardware/software enforcement of spatial memory safety for C programs. This bounded pointer is a new hardware primitive datatype for pointers that leaves the standard C pointer representation intact, but augments it with bounds information maintained separately and invisibly by the hardware. The bounds are initialized by the software, and they are then propagated and enforced transparently by the hardware, which automatically checks a pointer's bounds before it is dereferenced. One mode of use requires instrumenting only malloc, which enables enforcement of per-allocation spatial safety for heap-allocated objects for existing binaries. When combined with simple intra-procedural compiler instrumentation, hardware bounded pointers enable a low-overhead approach for enforcing complete spatial memory safety in unmodified C programs.