Search results

Now showing 1 - 10 of 47
  • Publication
    A Substrate for In-Network Sensor Data Integration
    (2008-08-24) Mihaylov, Svilen; Jacob, Marie; Ives, Zachary G; Guha, Sudipto
    With the ultimate goal of extending the data integration paradigm and query processing capabilities to ad hoc wireless networks, sensors, and stream systems, we consider how to support communication between sets of nodes performing distributed joins in sensor networks. We develop a communication model that enables in-network join at a variety of locations, and which facilitates coordination among nodes in order to make optimization decisions. While we defer a discussion of the optimizer to future work, we experimentally compare a variety of strategies, including at-base and in-network joins. Results show significant performance gains versus prior work, as well as opportunities for optimization.
  • Publication
    WysiWyg Web Wrapper Factory (W4F)
    (1999) Sahuguet, Arnaud; Azavant, Fabien
    In this paper, we present the W4F toolkit for the generation of wrappers for Web sources. W4F consists of a retrieval language to identify Web sources, a declarative extraction language (the HTML Extraction Language) to express robust extraction rules and a mapping interface to export the extracted information into some user-defined data-structures. To assist the user and make the creation of wrappers rapid and easy, the toolkit offers some wysiwyg support via some wizards. Together, they permit the fast and semi-automatic generation of ready-to-go wrappers provided as Java classes. W4F has been successfully used to generate wrappers for database systems and software agents, making the content of Web sources easily accessible to any kind of application.
  • Publication
    Taming Web Sources with "Minute-Made" Wrappers
    (1999) Azavant, Fabien; Sahuguet, Arnaud
The Web has become a major conduit to information repositories of all kinds. Today, more than 80% of information published on the Web is generated by underlying databases and this proportion keeps increasing. In some cases, database access is only granted through a Web gateway using forms as a query language and HTML as a display vehicle. In order to permit inter-operation (between Web sources and legacy databases or among Web sources themselves) there is a strong need for Web wrappers. Web wrappers share some of the characteristics of standard database wrappers but usually the underlying data sources offer very limited query capabilities and the structure of the result (due to HTML shortcomings) might be loose and unstable. To overcome these problems, we divide the architecture of our Web wrappers into three components: (1) fetching the document, (2) extracting the information from its HTML formatting, and (3) mapping the information into a structure that can be used by applications (such as mediators).
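    The three-component wrapper architecture described above can be sketched in minimal Python. This is an illustrative outline only, not the W4F toolkit's actual API or extraction language; all function names, the regular-expression rule, and the target record shape are assumptions made for the example:

    ```python
    import re
    import urllib.request

    def fetch(url):
        # Component (1): retrieve the raw HTML document from the Web source.
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode("utf-8", errors="replace")

    def extract(html):
        # Component (2): pull the values of interest out of the HTML
        # formatting. A real wrapper uses robust, declarative extraction
        # rules; this single regular expression is purely illustrative.
        return re.findall(r"<td>(.*?)</td>", html)

    def map_to_structure(fields):
        # Component (3): export the extracted values into a user-defined
        # data structure that applications (e.g. mediators) can consume.
        if len(fields) >= 2:
            return {"title": fields[0], "authors": fields[1]}
        return {}

    # Hypothetical usage (URL is illustrative):
    # record = map_to_structure(extract(fetch("http://example.org/catalog")))
    ```

    Keeping the three stages separate is what lets a wrapper survive changes in any one layer (a moved page, a reformatted table, a new target schema) without rewriting the others.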
  • Publication
    Modeling and Merging Database Schemas
    (1991-09-25) Kosky, Anthony S
We define a general model for database schemas which is basically functional and supports specialisation relationships. Despite its simplicity, our model is very general and expressive, so that database schemas and instances arising from a number of other data models can be translated into the model. We define and investigate a representation for the observations that can be made by querying a database system, and, in particular, look at which observations are valid for a particular database schema, and when one observation implies the observability of another. We also look at the correspondence between the instances of a database schema and the observations that can be made for the database. We then go on to look at the problem of schema merging: we define an ordering on schemas representing their informational content and define the merge of a collection of schemas to be the least schema with the informational content of all the schemas being merged. However, we establish that one cannot, in general, find a meaningful binary merging operator which is associative, though we would clearly require this of any such operator. We rectify this situation by relaxing our definition of schemas, defining a class of weak schemas over which we can construct a satisfactory concept of merges. Further, we define a method of constructing a canonical proper schema with the same informational content as a weak schema whenever possible, thus giving us an adequate definition of the merge of a collection of proper schemas whenever such a merge can exist. In addition we show that, if the schemas we are merging are translations from some other data model, our merging process "respects" the original data model.
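    The informational-content ordering described above can be illustrated with a toy model. This is a drastic simplification made up for the example: it reduces a schema to a flat set of (entity, attribute) facts and ignores the specialisation relationships and weak-schema machinery that make merging subtle in the actual model:

    ```python
    def less_informative(s1, s2):
        # s1 is below s2 in the informational-content ordering iff every
        # fact recorded by s1 is also recorded by s2 (subset on fact sets).
        return s1 <= s2

    def merge(schemas):
        # The merge is the least schema carrying the informational content
        # of every input; with schemas-as-fact-sets that is just the union.
        merged = set()
        for s in schemas:
            merged |= s
        return merged

    # Hypothetical schemas:
    person = {("Person", "name")}
    employee = {("Person", "name"), ("Person", "salary")}
    combined = merge([person, employee])
    ```

    In this toy setting merge is trivially associative; the abstract's point is that once specialisation and type constraints are present, no meaningful associative binary merge exists over proper schemas, which motivates the detour through weak schemas.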
  • Publication
    Search and Result Presentation in Scientific Workflow Repositories
    (2013-05-17) Davidson, Susan; Huang, Xiaocheng; Stoyanovich, Julia; Yuan, Xiaojie
    We study the problem of searching a repository of complex hierarchical workflows whose component modules, both composite and atomic, have been annotated with keywords. Since keyword search does not use the graph structure of a workflow, we develop a model of workflows using context-free bag grammars. We then give efficient polynomial-time algorithms that, given a workflow and a keyword query, determine whether some execution of the workflow matches the query. Based on these algorithms we develop a search and ranking solution that efficiently retrieves the top-k grammars from a repository. Finally, we propose a novel result presentation method for grammars matching a keyword query, based on representative parse-trees. The effectiveness of our
  • Publication
    Effecting Database Transformations Using Morphase
    (1996-02-23) Davidson, Susan B; Kosky, Anthony S
    Database transformations are a frequent problem for data managers supporting scientific databases, particularly those connected with the Human Genome Project. The databases involved frequently contain complex data-structures not typically found in conventional databases, such as arbitrarily nested records, sets, variants and optional fields, as well as object identities and recursive data-structures. Furthermore, programs implementing the transformations must be frequently modified since the databases involved evolve rapidly, as often as 3 to 4 times a year. We present in this paper a language (WOL) for specifying transformations between such databases and describe its implementation in a system called Morphase. Optimizations are performed at all stages, with significant impact on the compilation and execution time of sample transformations.
  • Publication
    Interviewing During a Tight Job Market
    (2002-09-01) Ives, Zachary G; Luo, Qiong
    Various tips are discussed for PhD graduates interviewing for an academic position at a research university in Asia or North America. It is suggested that having the dissertation done before interviewing provides considerable peace of mind. It is found that being practical about the job search package and keeping a close eye on applications increases confidence. It is also observed that questions during the talk provide an opportunity to clarify and strengthen the presentation, and to demonstrate this ability during the interview.
  • Publication
    Semantics of Database Transformations
    (1998) Davidson, Susan; Buneman, Peter; Kosky, Anthony S
    Database transformations arise in many different settings including database integration, evolution of database systems, and implementing user views and data entry tools. This paper surveys approaches that have been taken to problems in these settings, assesses their strengths and weaknesses, and develops requirements on a formal model for specifying and implementing database transformations. We also consider the problem of ensuring the correctness of database transformations. In particular, we demonstrate that the usefulness of correctness conditions such as information preservation is hindered by the interactions of transformations and database constraints, and the limited expressive power of established database constraint languages. We conclude that more general notions of correctness are required, and that there is a need for a uniform formalism for expressing both database transformations and constraints, and reasoning about their interactions. Finally, we introduce WOL, a declarative language for specifying and implementing database transformations and constraints. We briefly describe the WOL language and its semantics, and argue that it addresses many of the requirements on a formalism for dealing with general database transformations.
  • Publication
    Physical Data Independence, Constraints and Optimization with Universal Plans
    (1999-09-07) Deutsch, Alin; Popa, Lucian; Tannen, Val
    We present an optimization method and algorithm designed for three objectives: physical data independence, semantic optimization, and generalized tableau minimization. The method relies on generalized forms of chase and "backchase" with constraints (dependencies). By using dictionaries (finite functions) in physical schemas we can capture with constraints useful access structures such as indexes, materialized views, source capabilities, access support relations, gmaps, etc. The search space for query plans is defined and enumerated in a novel manner: the chase phase rewrites the original query into a "universal" plan that integrates all the access structures and alternative pathways that are allowed by applicable constraints. Then, the backchase phase produces optimal plans by eliminating various combinations of redundancies, again according to constraints. This method is applicable (sound) to a large class of queries, physical access structures, and semantic constraints. We prove that it is in fact complete for "path-conjunctive" queries and views with complex objects, classes and dictionaries, going beyond previous theoretical work on processing queries using materialized views.
  • Publication
    Provenance in Collaborative Data Sharing
    (2009-07-01) Karvounarakis, Grigoris
    This dissertation focuses on recording, maintaining and exploiting provenance information in Collaborative Data Sharing Systems (CDSS). These are systems that support data sharing across loosely-coupled, heterogeneous collections of relational databases related by declarative schema mappings. A fundamental challenge in a CDSS is to support the capability of update exchange --- which publishes a participant's updates and then translates others' updates to the participant's local schema and imports them --- while tolerating disagreement between them and recording the provenance of exchanged data, i.e., information about the sources and mappings involved in their propagation. This provenance information can be useful during update exchange, e.g., to evaluate provenance-based trust policies. It can also be exploited after update exchange, to answer a variety of user queries, about the quality, uncertainty or authority of the data, for applications such as trust assessment, ranking for keyword search over databases, or query answering in probabilistic databases. To address these challenges, in this dissertation we develop a novel model of provenance graphs that is informative enough to satisfy the needs of CDSS users and captures the semantics of query answering on various forms of annotated relations. We extend techniques from data integration, data exchange, incremental view maintenance and view update to define the formal semantics of unidirectional and bidirectional update exchange. We develop algorithms to perform update exchange incrementally while maintaining provenance information. We present strategies for implementing our techniques over an RDBMS and experimentally demonstrate their viability in the Orchestra prototype system. 
We define ProQL, a query language for provenance graphs that can be used by CDSS users to combine data querying with provenance testing, as well as to compute annotations for their data, based on their provenance, that are useful for a variety of applications. Finally, we develop a prototype implementation of ProQL over an RDBMS, together with indexing techniques to speed up provenance querying, and experimentally evaluate the performance of provenance querying and the benefits of our indexing techniques.
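    The idea of computing annotations from provenance can be illustrated by interpreting a provenance expression in a semiring, where joint use of sources combines with one operation and alternative derivations with another. This is a minimal sketch, not ProQL's actual syntax or the Orchestra implementation; the expression encoding, source names, and trust assignment are all hypothetical:

    ```python
    def eval_provenance(expr, assignment, plus, times):
        # Interpret a provenance expression in a commutative semiring.
        # expr is either a source token (str) or a nested tuple:
        # ("+", l, r) for alternative derivations, ("*", l, r) for joint ones.
        if isinstance(expr, str):
            return assignment[expr]
        op, left, right = expr
        l = eval_provenance(left, assignment, plus, times)
        r = eval_provenance(right, assignment, plus, times)
        return plus(l, r) if op == "+" else times(l, r)

    # Trust annotation in the boolean semiring: a tuple is trusted iff some
    # complete derivation of it uses only trusted sources.
    prov = ("+", ("*", "s1", "s2"), "s3")   # jointly from s1 and s2, OR from s3
    trust = {"s1": True, "s2": False, "s3": True}
    trusted = eval_provenance(prov, trust,
                              lambda a, b: a or b,    # alternatives
                              lambda a, b: a and b)   # joint use
    # trusted is True: the alternative derivation via s3 uses only trusted sources.
    ```

    Swapping in a different semiring (counts, probabilities, rankings) reuses the same provenance graph to answer a different class of annotation queries, which is the kind of flexibility the dissertation attributes to its provenance model.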