On the Feasibility of Peer-to-Peer Web Indexing and Search

Loading...
Thumbnail Image
Penn collection
Departmental Papers (CIS)
Degree type
Discipline
Subject
Funder
Grant number
License
Copyright date
Distributor
Related resources
Author
Li, Jiyang
Hellerstein, Joseph M
Kaashoek, M Frans
Karger, David
Morris, Robert
Contributor
Abstract

This paper discusses the feasibility of peer-to-peer full-text keyword search of the Web. Two classes of keyword search techniques are in use or have been proposed: flooding of queries over an overlay network (as in Gnutella), and intersection of index lists stored in a distributed hash table. We present a simple feasibility analysis based on the resource constraints and search workload. Our study suggests that the peer-to-peer network does not have enough capacity to make naive use of either of search techniques attractive for Web search. The paper presents a number of existing and novel optimizations for P2P search based on distributed hash tables, estimates their effects on performance, and concludes that in combination these optimizations would bring the problem to within an order of magnitude of feasibility. The paper suggests a number of compromises that might achieve the last order of magnitude.

Advisor
Date Range for Data Collection (Start Date)
Date Range for Data Collection (End Date)
Digital Object Identifier
Book title
Series name and number
Publication date
2003-10-07
Volume number
Issue number
Publisher
Publisher DOI
Journal Issue
Comments
Postprint version. Published in Lecture Notes in Computer Science, Volume 2735, Peer-to-Peer Systems II, 2003, pages 207-215. Publisher URL: http://dx.doi.org/10.1007/b11823 NOTE: At the time of publication, author Boon Thau Loo was affiliated with the University of California at Berkeley. Currently (April 2007), he is a faculty member in the Department of Computer and Information Science at the University of Pennsylvania.
Recommended citation
Collection