Technical Reports (CIS)
Penn Engineering is the birthplace of the modern computer. It was here that the ENIAC, the world's first electronic, large-scale, general-purpose digital computer, was developed in 1946. Since this auspicious beginning more than five decades ago, the field of computer science at Penn has been marked by exciting innovations. Over the last few years, Penn CIS has grown in algorithms, theory of computation, networking, systems and architecture, and artificial intelligence. We are building on these successes to strengthen work on databases, graphics, programming languages, and security, and to deepen our interdisciplinary work in such areas as bioinformatics, cognitive science, robotics, and management.
Search results
Processing Data-Intensive Workflows in the Cloud (2012-01-01) Zhang, Zhuoyao
In recent years, large-scale data analysis has become critical to the success of modern enterprises. Meanwhile, with the emergence of cloud computing, companies are attracted to moving their data analytics tasks to the cloud because of its flexible, on-demand resource usage and pay-as-you-go pricing model. MapReduce has been widely recognized as an important tool for performing large-scale data analysis in the cloud. It provides a simple and fault-tolerant framework for users to process data-intensive analytics tasks in parallel across different physical machines. In this report, we survey alternative implementations of MapReduce, contrasting batch-oriented and pipelined execution models, and study how these models impact response times, completion time, and robustness. Next, we present three optimization strategies for MapReduce-style workflows, including (1) scan sharing across MapReduce programs, (2) workflow optimizations aimed at reducing intermediate data, and (3) scheduling policies that map workflow tasks to different machines in order to minimize completion times and monetary costs. We conclude with a brief comparison across these optimization strategies, and discuss their pros and cons as well as the performance implications of using more than one optimization strategy at a time. University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-12-07.

Lessons Learned From a PLTL-CS Program (2010-01-01)
The Peer-Led Team Learning (PLTL) approach has previously been shown to be effective in recruiting and retaining students, particularly under-represented students, in undergraduate introductory CS courses. In PLTL, small groups of students are led by an undergraduate peer and work together to solve problems related to CS.
At Columbia University, the Columbia Emerging Scholars Program has used PLTL in an effort to increase enrollment in CS courses beyond the introductory level, and to increase the number of students who select Computer Science as their major, by demonstrating that CS is necessarily a collaborative activity that focuses more on problem solving and algorithmic thinking than on programming. Over the past five semesters, 68 students have completed the program, and preliminary results indicate that this program has had a positive effect on increasing participation in the major. This paper discusses our experiences of building and expanding the Columbia Emerging Scholars program, and addresses such topics as recruiting, training, scheduling, student behavior, and evaluation. We expect that this paper will provide a valuable set of lessons learned to other educators who seek to launch or grow a PLTL program at their institution.

Maintaining Distributed Recursive Views Incrementally (2011-01-01) Nigam, Vivek; Loo, Boon Thau; Jia, Limin; Scedrov, Andre
Distributed logic programming languages, which allow both facts and programs to be distributed among different nodes in a network, have recently been proposed and used to declaratively program a wide range of distributed systems, such as network protocols and multi-agent systems. However, the distributed nature of the underlying systems poses serious challenges to developing efficient and correct algorithms for evaluating these programs. This paper proposes an efficient asynchronous algorithm to incrementally compute the changes to the states in response to insertions and deletions of base facts. Our algorithm is formally proven to be correct in the presence of message reordering in the system.
To our knowledge, this is the first formal proof of correctness for such an algorithm.

Fault Management in Distributed Systems (2010-01-05) Zhou, Wenchao
In the past decade, distributed systems have rapidly evolved, from simple client/server applications in local area networks, to Internet-scale peer-to-peer networks and large-scale cloud platforms deployed on tens of thousands of nodes across multiple administrative domains and geographical areas. Despite their growing popularity and interest, designing and implementing distributed systems remains challenging, due to their ever-increasing scale and the complexity and unpredictability of system executions. Fault management strengthens the robustness and security of distributed systems by detecting malfunctions or violations of desired properties, diagnosing the root causes, and maintaining verifiable evidence to demonstrate the diagnosis results. While its importance is well recognized, fault management in distributed systems is notoriously difficult. To address the problem, various mechanisms and systems have been proposed in the past few years. In this report, we present a survey of these mechanisms and systems, and taxonomize them according to the techniques adopted and their application domains. Based on four representative systems (Pip, Friday, PeerReview and TrInc), we discuss various aspects of fault management, including fault detection, fault diagnosis and evidence generation. Their strengths, limitations, and application domains are evaluated and compared in detail.

Spectrum Sharing in Dynamic Spectrum Access Networks: WPE-II Written Report (2009-06-22) Liu, Changbin
A study by the Federal Communications Commission shows that most of the spectrum in current wireless networks is unused most of the time, while some spectrum is heavily used.
Recently, dynamic spectrum access (DSA) has been proposed to solve this spectrum inefficiency problem by allowing users to opportunistically access unused spectrum. One important question in DSA is how to efficiently share spectrum among users so that spectrum utilization can be increased and wireless interference can be reduced. Spectrum sharing can be formalized as a graph coloring problem. In this report we focus on surveying spectrum sharing techniques in DSA networks and present four representative techniques in different taxonomy domains, including centralized, distributed with/without common control channel, and a real case study of DSA networks: DARPA neXt Generation (XG) radios. Their strengths and limitations are evaluated and compared in detail. Finally, we discuss the challenges in current spectrum sharing research and possible future directions.

Declarative Network Verification (2008-01-01) Wang, Anduo; Loo, Boon Thau; Basu, Prithwish; Sokolsky, Oleg
In this paper, we present our initial design and implementation of a declarative network verifier (DNV). DNV utilizes theorem proving, a well-established verification technique where logic-based axioms that automatically capture network semantics are generated, and a user-driven proof process is used to establish network correctness properties. DNV takes as input declarative networking specifications written in the Network Datalog (NDlog) query language, and maps them automatically into logical axioms that can be directly used in existing theorem provers to validate protocol correctness. DNV is a significant improvement over existing uses of theorem proving, which typically require several man-months to construct the system specifications. Moreover, NDlog, a high-level specification whose semantics are precisely compiled into DNV without loss, can be directly executed as an implementation, hence bridging specification, verification, and implementation.
To validate the use of DNV, we present case studies using DNV in conjunction with the PVS theorem prover to verify routing protocols, including eventual properties of protocols in dynamic settings.

Clifford Algebras, Clifford Groups, and a Generalization of the Quaternions: The Pin and Spin Groups (2013-11-09) Gallier, Jean H
One of the main goals of these notes is to explain how rotations in R^n are induced by the action of a certain group, Spin(n), on R^n, in a way that generalizes the action of the unit complex numbers, U(1), on R^2, and the action of the unit quaternions, SU(2), on R^3 (i.e., the action is defined in terms of multiplication in a larger algebra containing both the group Spin(n) and R^n). The group Spin(n), called a spinor group, is defined as a certain subgroup of units of an algebra, Cl_n, the Clifford algebra associated with R^n. Since the spinor groups are certain well-chosen subgroups of units of Clifford algebras, it is necessary to investigate Clifford algebras to get a firm understanding of spinor groups. These notes provide a tutorial on Clifford algebra and the groups Spin and Pin, including a study of the structure of the Clifford algebra Cl_{p,q} associated with a nondegenerate symmetric bilinear form of signature (p, q) and culminating in the beautiful "8-periodicity theorem" of Elie Cartan and Raoul Bott (with proofs).

The Simulation of Human Movement by Computer (1978-09-01) Badler, Norman I; O'Rourke, Joseph; Smoliar, Stephen W; Weber, Lynne
This paper is concerned with a software simulation of movement of the human body. This simulation is being designed to drive a system for computer animation as part of a larger program concerned with the translation of movement notation into animated graphics.
The simulation is based on a model of the human body as a network of special-purpose processors (one processor situated at each joint of the body), each with an instruction set designed around a set of "primitive movement concepts." We shall discuss the extent to which all these processors may employ the same architecture and the function of the network structure.

A Human Body Modelling System for Motion Studies (1977-08-01) Badler, Norman I; O'Rourke, Joseph
The need to visualize and interpret human body movement data from experiments and simulations has led to the development of a new, computerized, three-dimensional representation for the human body. Based on a skeleton of joints and segments, the model is manipulated by specifying joint positions with respect to arbitrary frames of reference. The external form is modelled as the union of overlapping spheres which define the surface of each segment. The properties of the segment and sphere model include: an ability to utilize any connected portion of the body in order to examine selected movements without computing movements of undesired parts, a naming mechanism for describing parts within a segment, and a collision detection algorithm for finding contacts or illegal intersections of the body with itself or other objects. One of the most attractive features of this model is the simple hidden surface removal algorithm. Since spheres always project onto a plane as disks, a solid, shaded, realistically-formed raster display of the model can be efficiently generated by a simple overlaying of the disks from the backmost to the frontmost. A three-dimensional animated display on a line-drawing device is based on drawing circles. Examples of the three-dimensional figure as viewed on these different display media are presented.
The flexibility of the representation is enhanced by a method for decomposing an object into spheres, given one or more of its cross-sections, so that the data input problem is significantly simplified, should other models be desired. Using data from existing simulation programs, movements of the model have been computed and displayed, yielding very satisfactory results. Various transportation-related applications are proposed.

Disk Generators for a Raster Display Device (1976-12-01) Badler, Norman I
A simple modification of Horn's circle drawing procedure yields a disk generator for a class of graphic devices capable of drawing rectangular areas. Another variation produces a disk one scan line at a time, allowing it to be drawn at the refresh rate of the display. The calculations involve only additions and binary shifts.
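
The idea of producing a disk one scan line at a time with only additions and binary shifts can be illustrated with a minimal sketch. The code below is not Horn's procedure or the report's modification of it; it assumes the well-known midpoint circle technique as a stand-in, and the function name `disk_spans` is hypothetical. Each returned span is a one-pixel-high rectangle, which would suit a device that draws rectangular areas.

```python
def disk_spans(radius):
    """Return (row, x_left, x_right) spans covering a disk centred at the origin.

    Assumption: this uses the midpoint circle technique, not the procedure
    from the report. The loop uses only integer additions, subtractions,
    comparisons and left shifts (no multiplication or square roots).
    """
    half_width = {}           # row -> widest half-width seen so far
    x, y = radius, 0
    d = 1 - radius            # decision variable for the next boundary pixel

    while x >= y:
        # Use eight-way symmetry: each boundary point fixes four rows.
        for row, half in ((y, x), (-y, x), (x, y), (-x, y)):
            if half > half_width.get(row, -1):
                half_width[row] = half
        y += 1
        if d < 0:
            d += (y << 1) + 1             # stay in the same column
        else:
            x -= 1
            d += ((y - x) << 1) + 1       # step one column inward

    # One span per scan line, in ascending row order.
    return [(row, -half, half) for row, half in sorted(half_width.items())]


if __name__ == "__main__":
    for row, left, right in disk_spans(3):
        print(" " * (left + 3) + "#" * (right - left + 1))
```

Iterating over the result in order yields one span per scan line, so a caller could hand each `(row, x_left, x_right)` triple to whatever rectangle-fill primitive the display provides.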