Taming Web Sources with "Minute-Made" Wrappers

Azavant, Fabien; Sahuguet, Arnaud

Files

taming.pdf (208.27 KB)

Penn collection

Database Research Group (CIS)

Permalink

https://repository.upenn.edu/handle/20.500.14332/8785

Author

Azavant, Fabien

Sahuguet, Arnaud

Abstract

The Web has become a major conduit to information repositories of all kinds. Today, more than 80% of the information published on the Web is generated by underlying databases, and this proportion keeps increasing. In some cases, database access is granted only through a Web gateway that uses forms as a query language and HTML as a display vehicle. To permit inter-operation (between Web sources and legacy databases, or among Web sources themselves), there is a strong need for Web wrappers. Web wrappers share some of the characteristics of standard database wrappers, but the underlying data sources usually offer very limited query capabilities, and the structure of the result (due to HTML shortcomings) might be loose and unstable. To overcome these problems, we divide the architecture of our Web wrappers into three components: (1) fetching the document, (2) extracting the information from its HTML formatting, and (3) mapping the information into a structure that can be used by applications (such as mediators).
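
The three-component split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (fetch, extract, map_rows, wrap), the Quote record, and the assumption that results arrive as rows of a two-column HTML table are all hypothetical choices made for illustration, using only the Python standard library.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, List
from urllib.request import urlopen


def fetch(url: str) -> str:
    """Stage 1 (hypothetical helper): retrieve the raw HTML document."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class _CellCollector(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def extract(html: str) -> List[List[str]]:
    """Stage 2: pull raw cell text out of the HTML formatting."""
    parser = _CellCollector()
    parser.feed(html)
    parser.close()
    # Rows without <td> cells (for example, header rows) are dropped.
    return [[cell.strip() for cell in row] for row in parser.rows if row]


@dataclass
class Quote:
    """Assumed target structure; a real wrapper would define its own."""
    symbol: str
    price: float


def map_rows(rows: List[List[str]]) -> List[Quote]:
    """Stage 3: map extracted strings into typed records for applications."""
    return [Quote(symbol=r[0], price=float(r[1])) for r in rows if len(r) == 2]


def wrap(source: str, fetcher: Callable[[str], str] = fetch) -> List[Quote]:
    """Chain the three stages; the fetcher is injectable for offline use."""
    return map_rows(extract(fetcher(source)))
```

Keeping the stages separate means that a change in the page layout should only require touching the extraction stage, while a change in the consuming application should only affect the mapping stage. Passing a custom fetcher (for example, one that returns a saved HTML string) lets the last two stages be exercised without network access.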

Publication date

1999

Comments

Database Research Group.

Collection

Working Papers