Tight results for clustering and summarizing data streams

dc.contributor.authorGuha, Sudipto
dc.date2023-05-17T02:43:36.000
dc.date.accessioned2023-05-22T12:48:08Z
dc.date.available2023-05-22T12:48:08Z
dc.date.issued2009-01-09
dc.date.submitted2009-01-09T19:23:27-08:00
dc.description.abstractIn this paper we investigate algorithms and lower bounds for summarization problems over a single-pass data stream. In particular, we focus on histogram construction and K-center clustering. We provide a simple framework that improves upon all previous algorithms for these problems in the space bound, the approximation factor, or the running time. The framework uses a notion of "streamstrapping", where summaries created for the initial prefixes of the data are used to develop better approximation algorithms. We also prove the first non-trivial lower bounds for these problems. We show that under the stricter requirement that an algorithm accurately approximate the error of every bucket or every cluster it produces, these upper bounds are almost the best possible. This property of accurate estimation holds for all known upper bounds on these problems.
dc.identifier.urihttps://repository.upenn.edu/handle/20.500.14332/6440
dc.legacy.articleid1420
dc.legacy.fulltexturlhttps://repository.upenn.edu/cgi/viewcontent.cgi?article=1420&context=cis_papers&unstamped=1
dc.source.issue394
dc.source.journalDepartmental Papers (CIS)
dc.source.peerreviewedtrue
dc.source.statuspublished
dc.subject.otherdata streams
dc.subject.otherclustering
dc.titleTight results for clustering and summarizing data streams
dc.typePresentation
digcom.contributor.authorisAuthorOfPublication|email:sudipto@cis.upenn.edu|institution:University of Pennsylvania|Guha, Sudipto
digcom.identifiercis_papers/394
digcom.identifier.contextkey691810
digcom.identifier.submissionpathcis_papers/394
digcom.typeconference
dspace.entity.typePublication
relation.isAuthorOfPublication1182f70d-f2a0-4788-99ff-8185081c19f3
relation.isAuthorOfPublication.latestForDiscovery1182f70d-f2a0-4788-99ff-8185081c19f3
upenn.schoolDepartmentCenterDepartmental Papers (CIS)
Files
Original bundle
Name:
kcen.pdf
Size:
201.45 KB
Format:
Adobe Portable Document Format
Collection