<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
<channel>
<title>Database Research Group (CIS)</title>
<copyright>Copyright (c) 2013 University of Pennsylvania. All rights reserved.</copyright>
<link>http://repository.upenn.edu/db_research</link>
<description>Recent documents in Database Research Group (CIS)</description>
<language>en-us</language>
<lastBuildDate>Fri, 03 May 2013 09:25:57 PDT</lastBuildDate>
<ttl>3600</ttl>








<item>
<title>Containment of Conjunctive Queries on Annotated Relations</title>
<link>http://repository.upenn.edu/db_research/46</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/46</guid>
<pubDate>Tue, 08 Dec 2009 08:47:39 PST</pubDate>
<description>
	<![CDATA[
	<p>We study containment and equivalence of (unions of) conjunctive queries on relations annotated with elements of a commutative semiring. Such relations and the semantics of positive relational queries on them were introduced in a recent paper as a generalization of set semantics, bag semantics, incomplete databases, and databases annotated with various kinds of provenance information. We obtain positive decidability results and complexity characterizations for databases with lineage, why-provenance, and provenance polynomial annotations, for both conjunctive queries and unions of conjunctive queries. At least one of these results is surprising given that provenance polynomial annotations seem “more expressive” than bag semantics and under the latter, containment of unions of conjunctive queries is known to be undecidable. The decision procedures rely on interesting variations on the notion of containment mappings. We also show that for any positive semiring (a very large class) and conjunctive queries without self-joins, equivalence is the same as isomorphism.</p>

	]]>
</description>

<author>Todd J. Green</author>


</item>






<item>
<title>Reconcilable Differences</title>
<link>http://repository.upenn.edu/db_research/45</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/45</guid>
<pubDate>Tue, 08 Dec 2009 08:47:36 PST</pubDate>
<description>
	<![CDATA[
	<p>Exact query reformulation using views in positive relational languages is well understood, and has a variety of applications in query optimization and data sharing.  Generalizations to larger fragments of the relational algebra (RA) --- specifically, support for the difference operator --- would increase the options available for query reformulation, and also apply to view adaptation (updating a materialized view in response to a modified view definition) and view maintenance.  Unfortunately, most questions about queries become undecidable in the presence of difference/negation.  We present a novel way of managing this difficulty via an excursion through a non-standard semantics, Z-relations, where tuples are annotated with positive or negative integers.</p>
<p>We show that under Z-semantics RA queries have a normal form as a single difference of positive queries and this leads to the decidability of equivalence. In most real-world settings with difference, it is possible to convert the queries to this normal form. We give a sound and complete algorithm that explores all reformulations of an RA query (under Z-semantics) using a set of RA views, finitely bounding the search space with a simple and natural cost model.  We investigate related complexity questions, and we also extend our results to queries with built-in predicates.</p>
<p>Z-relations are interesting in their own right because they capture updates and data uniformly.  However, our algorithm turns out to be sound and complete also for bag semantics, albeit necessarily only for a subclass of RA.  This subclass turns out to be quite large and covers generously the applications of interest to us.  We also show a subclass of RA where reformulation and evaluation under Z-semantics can be combined with duplicate elimination to obtain the answer under set semantics.</p>

	]]>
</description>

<author>Todd J. Green et al.</author>


</item>






<item>
<title>Modeling and Analysis of Multi-hop Control Networks</title>
<link>http://repository.upenn.edu/db_research/44</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/44</guid>
<pubDate>Mon, 06 Jul 2009 11:33:24 PDT</pubDate>
<description>
	<![CDATA[
	<p>We propose a mathematical framework, inspired by the WirelessHART specification, for modeling and analyzing multi-hop communication networks. The framework is designed for systems consisting of multiple control loops closed over a multi-hop communication network. We separate control, topology, routing, and scheduling and propose formal syntax and semantics for the dynamics of the composed system. The main technical contribution of the paper is an explicit translation of multi-hop control networks to switched systems. We describe a Mathematica notebook that automates the translation of multi-hop control networks to switched systems, and use this tool to show how techniques for analysis of switched systems can be used to address control and networking co-design challenges.</p>

	]]>
</description>

<author>Rajeev Alur et al.</author>


</item>






<item>
<title>A Substrate for In-Network Sensor Data Integration</title>
<link>http://repository.upenn.edu/db_research/43</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/43</guid>
<pubDate>Tue, 17 Mar 2009 06:44:31 PDT</pubDate>
<description>
	<![CDATA[
	<p>With the ultimate goal of extending the data integration paradigm and query processing capabilities to ad hoc wireless networks, sensors, and stream systems, we consider how to support communication between sets of nodes performing distributed joins in sensor networks. We develop a communication model that enables in-network join at a variety of locations, and which facilitates coordination among nodes in order to make optimization decisions. While we defer a discussion of the optimizer to future work, we experimentally compare a variety of strategies, including at-base and in-network joins. Results show significant performance gains versus prior work, as well as opportunities for optimization.</p>

	]]>
</description>

<author>Svilen Mihaylov et al.</author>


</item>






<item>
<title>Sideways Information Passing for Push-Style Query Processing</title>
<link>http://repository.upenn.edu/db_research/42</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/42</guid>
<pubDate>Fri, 31 Oct 2008 06:01:21 PDT</pubDate>
<description>
	<![CDATA[
	<p>In many modern data management settings, data is queried from a central node or nodes, but is stored at remote sources. In such a setting it is common to perform "push-style" query processing, using multi-threaded pipelined hash joins and bushy query plans to compute parts of the query in parallel; to avoid idling, the CPU can switch between them as delays are encountered. This works well for simple select-project-join queries, but increasingly, Web and integration applications require more complex queries with multiple joins and even nested subqueries. As we demonstrate in this paper, push-style execution of complex queries can be improved substantially via <em>sideways information passing</em>; push-style queries provide many opportunities for information passing that have not been studied in the past literature. We present adaptive information passing, a general runtime decision-making technique for reusing intermediate state from one query subresult to prune and reduce computation of other subresults. We develop two alternative schemes for performing <em>adaptive information passing</em>, which we study in several settings under a variety of workloads.</p>

	]]>
</description>

<author>Zachary G. Ives et al.</author>


</item>






<item>
<title>Annotated XML: Queries and Provenance</title>
<link>http://repository.upenn.edu/db_research/41</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/41</guid>
<pubDate>Fri, 11 Jul 2008 12:14:51 PDT</pubDate>
<description>
	<![CDATA[
	<p>We present a formal framework for capturing the provenance of data appearing in XQuery views of XML. Building on previous work on relations and their (positive) query languages, we decorate unordered XML with annotations from commutative semirings and show that these annotations suffice for a large positive fragment of XQuery applied to this data. In addition to tracking provenance metadata, the framework can be used to represent and process XML with repetitions, incomplete XML, and probabilistic XML, and provides a basis for enforcing access control policies in security applications.</p>
<p>Each of these applications builds on our semantics for XQuery, which we present in several steps: we generalize the semantics of the Nested Relational Calculus (NRC) to handle semiring-annotated complex values, we extend it with a recursive type and structural recursion operator for trees, and we define a semantics for XQuery on annotated XML by translation into this calculus.</p>

	]]>
</description>

<author>John N. Foster et al.</author>


</item>






<item>
<title>An Equational Chase for Path-Conjunctive Queries, Constraints, and Views</title>
<link>http://repository.upenn.edu/db_research/40</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/40</guid>
<pubDate>Thu, 28 Jun 2007 13:26:49 PDT</pubDate>
<description>
	<![CDATA[
	<p>We consider the class of <em>path-conjunctive</em> queries and constraints (dependencies) defined over complex values with dictionaries. This class includes the relational conjunctive queries and embedded dependencies, as well as many interesting examples of complex value and OODB queries and integrity constraints. We show that some important classical results on containment, dependency implication, and chasing extend and generalize to this class.</p>

	]]>
</description>

<author>Val Tannen et al.</author>


</item>






<item>
<title>Taming Web Sources with &quot;Minute-Made&quot; Wrappers</title>
<link>http://repository.upenn.edu/db_research/39</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/39</guid>
<pubDate>Thu, 28 Jun 2007 13:11:26 PDT</pubDate>
<description>
	<![CDATA[
	<p>The Web has become a major conduit to information repositories of all kinds. Today, more than 80% of information published on the Web is generated by underlying databases and this proportion keeps increasing. In some cases, database access is only granted through a Web gateway using forms as a query language and HTML as a display vehicle. In order to permit inter-operation (between Web sources and legacy databases or among Web sources themselves) there is a strong need for Web wrappers.</p>
<p>Web wrappers share some of the characteristics of standard database wrappers but usually the underlying data sources offer very limited query capabilities and the structure of the result (due to HTML shortcomings) might be loose and unstable. To overcome these problems, we divide the architecture of our Web wrappers into three components: (1) fetching the document, (2) extracting the information from its HTML formatting, and (3) mapping the information into a structure that can be used by applications (such as mediators).</p>

	]]>
</description>

<author>Fabien Azavant et al.</author>


</item>






<item>
<title>Transforming Databases with Recursive Data Structures</title>
<link>http://repository.upenn.edu/db_research/38</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/38</guid>
<pubDate>Thu, 28 Jun 2007 13:02:08 PDT</pubDate>
<description>
	<![CDATA[
	<p>This thesis examines the problems of performing structural transformations on databases involving complex data-structures and object-identities, and proposes an approach to specifying and implementing such transformations.</p>
<p>We start by looking at various applications of such <em>database transformations</em>, and at some of the more significant work in these areas. In particular we will look at work on transformations in the area of <em>database integration</em>, which has been one of the major motivating areas for this work. We will also look at various notions of correctness that have been proposed for database transformations, and show that the utility of such notions is limited by the dependence of transformations on certain implicit database constraints. We draw attention to the limitations of existing work on transformations, and argue that there is a need for a more general formalism for reasoning about database transformations and constraints.</p>
<p>We will also argue that, in order to ensure that database transformations are well-defined and meaningful, it is necessary to understand the information capacity of the data-models being transformed. To this end we give a thorough analysis of the information capacity of data-models supporting object identity, and will show that this is dependent on the operations supported by a query language for comparing object identities.</p>
<p>We introduce a declarative language, <em>WOL</em>, based on Horn-clause logic, for specifying database transformations and constraints. We also propose a method of implementing transformations specified in this language, by manipulating their clauses into a <em>normal form</em> which can then be translated into an underlying database programming language. Finally we will present a number of optimizations and techniques necessary in order to build a practical implementation based on these proposals, and will discuss the results of some of the trials that were carried out using a prototype of such a system.</p>

	]]>
</description>

<author>Anthony S. Kosky</author>


</item>






<item>
<title>Modeling and Merging Database Schemas</title>
<link>http://repository.upenn.edu/db_research/37</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/37</guid>
<pubDate>Thu, 28 Jun 2007 12:28:36 PDT</pubDate>
<description>
	<![CDATA[
	<p>We define a general model for database schemas which is basically functional and supports specialisation relationships. Despite its simplicity, our model is very general and expressive, so that database schemas and instances arising from a number of other data models can be translated into the model.</p>
<p>We define and investigate a representation for the observations that can be made by querying a database system, and, in particular, look at which observations are valid for a particular database schema, and when one observation implies the observability of another. We will also look at the correspondence between the instances of a database schema and the observations that can be made for the database.</p>
<p>We then go on to look at the problem of schema merging: we define an ordering on schemas representing their informational content and define the merge of a collection of schemas to be the least schema with the informational content of all the schemas being merged. However, we establish that one cannot, in general, find a meaningful binary merging operator which is associative, though we would clearly require this of any such operator. We rectify this situation by relaxing our definition of schemas, defining a class of weak schemas over which we can construct a satisfactory concept of merges. Further, we define a method of constructing a canonical proper schema with the same informational content as a weak schema whenever possible, thus giving us an adequate definition of the merge of a collection of proper schemas whenever such a merge can exist. In addition, we show that, if the schemas we are merging are translations from some other data model, our merging process "respects" the original data model.</p>

	]]>
</description>

<author>Anthony S. Kosky</author>


</item>






<item>
<title>Adding Structure to Unstructured Data</title>
<link>http://repository.upenn.edu/db_research/35</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/35</guid>
<pubDate>Tue, 26 Jun 2007 17:12:09 PDT</pubDate>
<description>
	<![CDATA[
	<p>We develop a new schema for unstructured data. Traditional schemas resemble the type systems of programming languages. For unstructured data, however, the underlying type may be much less constrained and hence an alternative way of expressing constraints on the data is needed. Here, we propose that both data and schema be represented as edge-labeled graphs. We develop notions of conformance between a graph database and a graph schema and show that there is a natural and efficiently computable ordering on graph schemas. We then examine certain subclasses of schemas and show that schemas are closed under query applications. Finally, we discuss how they may be used in query decomposition and optimization.</p>

	]]>
</description>

<author>Peter Buneman et al.</author>


</item>






<item>
<title>Optimizing Taxonomic Semantic Web Queries using Labeling Schemes</title>
<link>http://repository.upenn.edu/db_research/34</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/34</guid>
<pubDate>Tue, 26 Jun 2007 17:12:06 PDT</pubDate>
<description>
	<![CDATA[
	<p>This paper focuses on the optimization of the navigation through voluminous <em>subsumption</em> hierarchies of topics employed by Portal Catalogs like Netscape Open Directory (ODP). We advocate for the use of labeling schemes for modeling these hierarchies in order to efficiently answer queries such as subsumption check, descendants, ancestors or nearest common ancestor, which usually require costly transitive closure computations. We first give a qualitative comparison of three main families of schemes, namely bit vector, prefix and interval based schemes. We then show that two labeling schemes are good candidates for an efficient implementation of label querying using standard relational DBMS, namely the Dewey Prefix scheme and an Interval scheme by Agrawal, Borgida and Jagadish. We compare their storage and query evaluation performance for the 16 ODP hierarchies using the PostgreSQL engine.</p>

	]]>
</description>

<author>Vassilis Christophides et al.</author>


</item>






<item>
<title>L-Tree: a Dynamic Labeling Structure for Ordered XML Data</title>
<link>http://repository.upenn.edu/db_research/33</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/33</guid>
<pubDate>Mon, 25 Jun 2007 13:47:08 PDT</pubDate>
<description>
	<![CDATA[
	<p>With the ever growing use of XML as a data representation format, we see an increasing need for robust, high performance XML database systems. While most of the recent work focuses on efficient XML query processing, XML databases also need to support efficient updates. To speed up query processing, various labeling schemes have been proposed. However, the vast majority of these schemes have poor update performance. In this paper, we introduce a dynamic labeling structure for XML data: L-Tree and its order-preserving labeling scheme with O(log n) amortized update cost and O(log n) bits per label. L-Tree has good performance on updates without compromising the performance of query processing. We present the update algorithm for L-Tree and analyze its complexity.</p>

	]]>
</description>

<author>Yi Chen et al.</author>


</item>






<item>
<title>Piazza: Data Management Infrastructure for Semantic Web Applications</title>
<link>http://repository.upenn.edu/db_research/32</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/32</guid>
<pubDate>Mon, 25 Jun 2007 13:35:16 PDT</pubDate>
<description>
	<![CDATA[
	<p>The Semantic Web envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. To this point, researchers have defined new languages for specifying <em>meanings</em> for concepts and developed techniques for <em>reasoning</em> about them, using RDF as the data model. To flourish, the Semantic Web needs to be able to accommodate the huge amounts of existing data and the applications operating on them. To achieve this, we are faced with two problems. First, most of the world's data is available not in RDF but in XML; XML and the applications consuming it rely not only on the domain structure of the data, but also on its document structure. Hence, to provide interoperability between such sources, we must map between both their domain structures and their document structures. Second, data management practitioners often prefer to exchange data through local point-to-point data translations, rather than mapping to common mediated schemas or ontologies. This paper describes the Piazza system, which addresses these challenges. Piazza offers a language for mediating between data sources on the Semantic Web, which maps both the domain structure and document structure. Piazza also enables interoperation of XML data with RDF data that is accompanied by rich OWL ontologies. Mappings in Piazza are provided at a local scale between small sets of nodes, and our query answering algorithm is able to chain sets of mappings together to obtain relevant data from across the Piazza network. We also describe an implemented scenario in Piazza and the lessons we learned from it.</p>

	]]>
</description>

<author>Alon Y. Halevy et al.</author>


</item>






<item>
<title>Crossing the Structure Chasm</title>
<link>http://repository.upenn.edu/db_research/31</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/31</guid>
<pubDate>Tue, 12 Jun 2007 12:43:32 PDT</pubDate>
<description>
	<![CDATA[
	<p>It has frequently been observed that most of the world’s data lies <em>outside</em> database systems. The reason is that database systems focus on <em>structured</em> data, leaving the unstructured realm to others. The world of unstructured data has several very appealing properties, such as ease of authoring, querying and data sharing. In contrast, authoring, querying and sharing structured data require significant effort, albeit with the benefit of rich query languages and exact answers. We argue that in order to broaden the use of data management tools, we need a concerted effort to cross this <em>structure chasm</em>, by importing the attractive properties of the unstructured world into the structured one. As an initial effort in this direction, we introduce the REVERE System, which offers several mechanisms for crossing the structure chasm, and considers as its first application the chasm on the WWW. REVERE includes three innovations: (1) a data creation environment that entices people to structure data and enables them to do it rapidly; (2) a data sharing environment, based on a <em>peer data management system</em>, in which a web of data is created by establishing local mappings between schemas, and query answering is done over the transitive closure of these mappings; (3) a novel set of tools that are based on computing statistics over corpora of schemata and structured data. In a sense, we are trying to adapt the key techniques of the unstructured world, namely computing statistics over text corpora, into the world of structured data. We sketch how statistics computed over such corpora, which capture common term usage patterns, can be used to create tools for assisting in schema and mapping development. The initial application of REVERE focuses on creating a web of structured data from data that is usually stored in HTML web pages (e.g., personal information, course information, etc.).</p>

	]]>
</description>

<author>Oren Etzioni et al.</author>


</item>






<item>
<title>MARS: A System for Publishing XML from Mixed and Redundant Storage</title>
<link>http://repository.upenn.edu/db_research/30</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/30</guid>
<pubDate>Tue, 12 Jun 2007 12:38:08 PDT</pubDate>
<description>
	<![CDATA[
	<p>We present a system for publishing as XML data from mixed (relational+XML) proprietary storage, while supporting redundancy in storage for tuning purposes. The correspondence between public and proprietary schemas is given by a combination of LAV- and GAV-style views expressed in XQuery. XML and relational integrity constraints are also taken into consideration. Starting with client XQueries formulated against the public schema, the system achieves the combined effect of rewriting-with-views, composition-with-views and query minimization under integrity constraints to obtain optimal reformulations against the proprietary schema. The paper focuses on the engineering and the experimental evaluation of the MARS system.</p>

	]]>
</description>

<author>Alin Deutsch et al.</author>


</item>






<item>
<title>Interviewing During a Tight Job Market</title>
<link>http://repository.upenn.edu/db_research/29</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/29</guid>
<pubDate>Tue, 12 Jun 2007 12:27:45 PDT</pubDate>
<description>
	<![CDATA[
	<p>Various tips are discussed for PhD graduates interviewing for an academic position at a research university in Asia or North America. It is suggested that having the dissertation done before interviews brings a large degree of relief. It is found that being practical about the job research package and keeping a close eye on applications increases the confidence level. It is also observed that questions during the talk provide an opportunity to clarify and strengthen the talk and to show this ability during the interview.</p>

	]]>
</description>

<author>Zachary G. Ives et al.</author>


</item>






<item>
<title>Beyond Discrete E-Services: Composing Session-oriented Services in Telecommunications</title>
<link>http://repository.upenn.edu/db_research/28</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/28</guid>
<pubDate>Tue, 12 Jun 2007 12:15:45 PDT</pubDate>
<description>
	<![CDATA[
	<p>We distinguish between two broad categories of e-services: <em>discrete</em> services (e.g., add item to shopping cart, charge a credit card), and <em>session-oriented</em> ones (teleconference, collaborative text chat, streaming video, c-commerce interactions). Discrete services typically have short duration, and cannot respond to external asynchronous events. Session-oriented services have longer duration (perhaps hours), and typically can respond to asynchronous events (e.g., the ability to add a new participant to a teleconference). When composing discrete e-services it usually suffices to use a process model and engine that composes the e-services as relatively independent tasks. But when composing session-oriented e-services, the engine must be able to receive asynchronous events and determine how and whether to impact the active sessions. For example, if a teleconference participant loses his wireless connection then it might be appropriate to trigger an announcement to some or all of the other participants. In this paper we propose a process model and architecture for flexible composition and execution of discrete and session-oriented services. Unlike previous approaches, our model permits the specification of scripted "active flowcharts" that can be triggered by asynchronous events, and can appropriately impact active sessions. We introduce here a model and language for specifying process schemas (essentially a collection of active flowcharts) that combine multiple e-services, and describe a prototype engine for executing these process schemas.</p>

	]]>
</description>

<author>Vassilis Christophides et al.</author>


</item>






<item>
<title>Towards A Query Language for Annotation Graphs</title>
<link>http://repository.upenn.edu/db_research/27</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/27</guid>
<pubDate>Mon, 11 Jun 2007 14:16:13 PDT</pubDate>
<description>
	<![CDATA[
	<p>The multidimensional, heterogeneous, and temporal nature of speech databases raises interesting challenges for representation and query. Recently, annotation graphs have been proposed as a general-purpose representational framework for speech databases. Typical queries on annotation graphs require path expressions similar to those used in semistructured query languages. However, the underlying model is rather different from the customary graph models for semistructured data: the graph is acyclic and unrooted, and both temporal and inclusion relationships are important. We develop a query language and describe optimization techniques for an underlying relational representation.</p>

	]]>
</description>

<author>Steven Bird et al.</author>


</item>






<item>
<title>Physical Data Independence, Constraints and Optimization with Universal Plans</title>
<link>http://repository.upenn.edu/db_research/26</link>
<guid isPermaLink="true">http://repository.upenn.edu/db_research/26</guid>
<pubDate>Mon, 11 Jun 2007 14:13:42 PDT</pubDate>
<description>
	<![CDATA[
	<p>We present an optimization method and algorithm designed for three objectives: physical data independence, semantic optimization, and generalized tableau minimization. The method relies on generalized forms of chase and "backchase" with constraints (dependencies). By using dictionaries (finite functions) in physical schemas we can capture with constraints useful access structures such as indexes, materialized views, source capabilities, access support relations, gmaps, etc. The search space for query plans is defined and enumerated in a novel manner: the chase phase rewrites the original query into a "universal" plan that integrates all the access structures and alternative pathways that are allowed by applicable constraints. Then, the backchase phase produces optimal plans by eliminating various combinations of redundancies, again according to constraints. This method is applicable (sound) to a large class of queries, physical access structures, and semantic constraints. We prove that it is in fact complete for "path-conjunctive" queries and views with complex objects, classes and dictionaries, going beyond previous theoretical work on processing queries using materialized views.</p>

	]]>
</description>

<author>Alin Deutsch et al.</author>


</item>





</channel>
</rss>
