Database Research Group (CIS)

Document Type

Working Paper

Date of this Version

January 1999

Comments

Database Research Group.

Abstract

The Web has become a major conduit to information repositories of all kinds. Today, more than 80% of the information published on the Web is generated by underlying databases, and this proportion keeps increasing. In some cases, database access is granted only through a Web gateway, using forms as a query language and HTML as a display vehicle. To permit interoperation (between Web sources and legacy databases, or among Web sources themselves), there is a strong need for Web wrappers.

Web wrappers share some characteristics of standard database wrappers, but the underlying data sources usually offer very limited query capabilities, and the structure of the result (due to HTML shortcomings) might be loose and unstable. To overcome these problems, we divide the architecture of our Web wrappers into three components: (1) fetching the document, (2) extracting the information from its HTML formatting, and (3) mapping the information into a structure that can be used by applications (such as mediators).
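
The three-component split described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (fetch, extract, map_rows, wrap), the sample page, and the choice of an HTML table as the result layout are all assumptions made for illustration. Retrieval is passed in as a callable so the sketch runs without network access.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List

# Hypothetical result page, standing in for what a Web gateway might return.
SAMPLE_PAGE = """
<html><body><table>
<tr><th>Title</th><th>Year</th></tr>
<tr><td>Web Wrappers</td><td>1999</td></tr>
<tr><td>Mediators</td><td>1998</td></tr>
</table></body></html>
"""


def fetch(url: str, retrieve: Callable[[str], str]) -> str:
    """Component 1: obtain the raw HTML document for a URL."""
    return retrieve(url)


class _CellCollector(HTMLParser):
    """Collects the text of table cells, grouped by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears outside any row
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def extract(html: str) -> List[List[str]]:
    """Component 2: strip the HTML formatting and keep the cell text."""
    parser = _CellCollector()
    parser.feed(html)
    parser.close()
    rows = [[cell.strip() for cell in row] for row in parser.rows]
    return [row for row in rows if row]


def map_rows(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Component 3: map rows to records, using the first row as field names."""
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def wrap(url: str, retrieve: Callable[[str], str]) -> List[Dict[str, str]]:
    """Run the three components in sequence."""
    return map_rows(extract(fetch(url, retrieve)))


if __name__ == "__main__":
    # The URL is a placeholder; the lambda returns the in-memory sample page.
    for record in wrap("http://example.invalid/results", lambda _url: SAMPLE_PAGE):
        print(record)
```

Keeping the components separate means a change in the page layout would, in this sketch, affect only extract, while fetch and map_rows stay unchanged.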

Date Posted: 28 June 2007