Text-based content search and retrieval in ad hoc P2P communities
Technical documentation   Open access

Francisco Matias Cuenca-Acuna and Thu Nguyen
Rutgers University
2002
DOI: https://doi.org/10.7282/t3-5d3p-5g38

Abstract

We consider the problem of content search and retrieval in peer-to-peer (P2P) communities. P2P computing is a potentially powerful model for information sharing between ad hoc groups of users because of its low cost of entry and natural model for resource scaling with community size. As P2P communities grow in size, however, locating information distributed across the large number of peers becomes problematic. We present a distributed text-based content search and retrieval algorithm to address this problem. Our algorithm is based on a state-of-the-art text-based document ranking algorithm: the vector-space model, instantiated with the TFxIDF ranking rule. A naive application of TFxIDF would require each peer in a community to collect an inverted index of the entire community. This is costly both in terms of bandwidth and storage. Instead, we show how TFxIDF can be approximated given compact summaries of peers’ local inverted indexes. We make three contributions: (a) we show how the TFxIDF rule can be adapted to use the index summaries, (b) we provide a heuristic for adaptively determining the set of peers that should be contacted for a query, and (c) we show that our algorithm tracks TFxIDF’s performance very closely, regardless of how documents are distributed throughout the community. Furthermore, our algorithm preserves the main flavor of TFxIDF by retrieving close to the same set of documents for any given query.
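
To make the baseline concrete, the following is a minimal sketch of centralized vector-space ranking with a TFxIDF rule, the scheme the abstract says the distributed algorithm approximates. It is not the report's algorithm: the function name `tfidf_rank`, the input layout, and the particular weighting (logarithmic term frequency, `log(1 + N/df)` for IDF, normalization by the square root of document length) are assumptions chosen for illustration, and the abstract does not specify which TFxIDF variant the authors use.

```python
import math
from collections import Counter


def tfidf_rank(query_terms, docs):
    """Rank document ids for a query using a basic TFxIDF score.

    docs maps a document id to its list of terms (hypothetical layout).
    The weighting below is one common textbook variant, assumed here.
    """
    n_docs = len(docs)

    # Document frequency: how many documents contain each term.
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))

    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # term absent from this document
            idf = math.log(1 + n_docs / df[term])
            score += (1 + math.log(tf[term])) * idf
        # Normalize by document length so long documents are not favored.
        scores[doc_id] = score / math.sqrt(len(terms)) if terms else 0.0

    # Highest score first; ties keep the input order.
    return sorted(scores, key=scores.get, reverse=True)
```

Computing `df` here requires a view of every document in the collection, which is the community-wide inverted index that the abstract describes as too costly to replicate at each peer. The report's contribution is to estimate this kind of ranking from compact per-peer index summaries instead.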
PDF: dcs-tr-483 (146.94 kB)
Version of Record (VoR) Technical Documentation Open Access