Text-based content search and retrieval in ad hoc P2P communities

Francisco Matias Cuenca-Acuna; Thu Nguyen

doi:10.7282/t3-5d3p-5g38

Technical documentation

Open access

Rutgers University

2002

Abstract

We consider the problem of content search and retrieval in peer-to-peer (P2P) communities. P2P computing is a potentially powerful model for information sharing between ad hoc groups of users because of its low cost of entry and natural model for resource scaling with community size. As P2P communities grow in size, however, locating information distributed across the large number of peers becomes problematic. We present a distributed text-based content search and retrieval algorithm to address this problem. Our algorithm is based on a state-of-the-art text-based document ranking algorithm: the vector-space model, instantiated with the TFxIDF ranking rule. A naive application of TFxIDF would require each peer in a community to collect an inverted index of the entire community. This is costly both in terms of bandwidth and storage. Instead, we show how TFxIDF can be approximated given compact summaries of peers’ local inverted indexes. We make three contributions: (a) we show how the TFxIDF rule can be adapted to use the index summaries, (b) we provide a heuristic for adaptively determining the set of peers that should be contacted for a query, and (c) we show that our algorithm tracks TFxIDF’s performance very closely, regardless of how documents are distributed throughout the community. Furthermore, our algorithm preserves the main flavor of TFxIDF by retrieving close to the same set of documents for any given query.
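For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of textbook, centralized TFxIDF ranking over a full inverted index. It is not the report's summary-based approximation or its peer-selection heuristic, neither of which is specified on this page. The function names (`build_index`, `rank`), the sample documents, and the particular weighting (`1 + log(tf)` for term frequency, `log(1 + N/df)` for inverse document frequency, no length normalization) are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def rank(query, index, num_docs):
    """Score documents by summing TF x IDF weights over the query terms.

    Assumed weighting (one common textbook variant):
      TF  = 1 + log(tf)
      IDF = log(1 + N / df)
    """
    scores = defaultdict(float)
    for term in set(query.lower().split()):
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += (1 + math.log(tf)) * idf
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical three-document collection.
docs = {
    "d1": "peer to peer search",
    "d2": "vector space ranking of text documents",
    "d3": "peer ranking",
}
index = build_index(docs)
results = rank("peer ranking", index, len(docs))
```

In this sketch every ranking call needs the complete index, which is the cost the abstract describes as prohibitive when each peer would have to hold an index of the entire community.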

Files and links (1)

pdf: dcs-tr-483 (146.94 kB)

Version of Record (VoR); Technical Documentation; Open Access

Details

Title: Text-based content search and retrieval in ad hoc P2P communities
Creators: Francisco Matias Cuenca-Acuna (Author) - Computer Science (New Brunswick); Thu Nguyen (Author) - Computer Science (New Brunswick)
Date published: 2002
Publisher: Rutgers University
Number of pages: 1 online resource (13 pages) : illustrations
Academic Unit: Computer Science (SAS); School of Arts and Sciences; Wireless Information Network Laboratory
Language: English
Resource Type: Technical documentation
Comment: Technical report DCS-TR-483
Identifiers: 991031549896704646