Web Community Dataset


Hyun Chul Lee
Allan Borodin
Leslie Goldsmith



We present the experimental data used in our paper "Extracting and Ranking Viral Communities Using Seeds and Content Similarity," presented at Hypertext 2008 in June 2008.

The dataset presented on this page is intended for use by researchers. Permission is given to researchers to download and use this data under the following provisions: the dataset is for the free and fair use of all and not for resale; the dataset must be cited, giving the names of the compiler and editor of the dataset.

As indicated in our paper, we extracted five communities, namely Camry, iPod, Xbox, PlayStation, and Mustang, from a set of about 2.84 million blog and forum entries. The provided dataset contains parsed raw data for about 330,000 blog/forum entries, constructed by taking the union of the entries that compose the Camry, iPod, Xbox, PlayStation, and Mustang communities. Because of the size of the full set used in our experiments, we present only this small subset of the original data.

The data is provided in tar.gz format and consists of:

Downloads:
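
Since the data is distributed as tar.gz archives, a downloaded file can be inspected with standard tools before it is unpacked. The following is a minimal Python sketch using only the standard library. The archive name demo.tar.gz and the member name demo/entry_0001.txt are hypothetical placeholders, not the names of the actual files, and the small stand-in archive is built only so that the sketch runs without the real download.

```python
import io
import os
import tarfile
import tempfile


def list_members(archive_path):
    """Return the names of the regular files stored in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]


def build_demo_archive(archive_path):
    """Write a tiny stand-in archive (hypothetical contents, not the real dataset)."""
    payload = b"example entry text\n"
    with tarfile.open(archive_path, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="demo/entry_0001.txt")  # hypothetical member name
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.tar.gz")  # hypothetical archive name
        build_demo_archive(path)
        for name in list_members(path):
            print(name)
```

To inspect a real download, call list_members with the path of the downloaded archive in place of the stand-in.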
