Web recommendation

Description

The task is to classify a webpage as interesting or not interesting. Nine users each rated the webpages, so there are 9 separate datasets, one per user. A webpage is a bag, and the links on the webpage are the instances. The features are based on word frequencies and are therefore very high-dimensional.
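
To make the bag/instance terminology concrete, here is a minimal MATLAB sketch of one possible in-memory layout. Everything in it is made up for illustration: the three-word vocabulary, the counts, the labels, and the variable names (vocab, bags, labels) are assumptions and do not describe how the distributed files are actually stored.

```matlab
% Toy illustration only: the vocabulary, counts, and labels are made up.
vocab = {'learning', 'football', 'recipe'};   % hypothetical 3-word vocabulary

% Each bag (webpage) is a matrix: one row per instance (link),
% one column per vocabulary word (word counts).
page1 = [2 0 0; 1 0 1];            % page 1: two links
page2 = [0 3 0; 0 1 0; 0 0 2];     % page 2: three links
bags  = {page1, page2};

% One label per bag, not per instance: 1 = interesting, 0 = not interesting.
labels = [1; 0];

for b = 1:numel(bags)
    fprintf('page %d: %d links, %d features, label %d\n', ...
            b, size(bags{b}, 1), size(bags{b}, 2), labels(b));
end
```

The point of the layout is that bags may contain different numbers of instances while sharing the same feature columns, and that the label is attached to the whole page rather than to any single link. In the real data the number of columns would be far larger than three.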

Original source

Thanks to Professor Zhi-Hua Zhou for allowing us to provide this data.

@article{zhou2005multi,
  title={Multi-instance learning based web mining},
  author={Zhou, Zhi-Hua and Jiang, Kai and Li, Ming},
  journal={Applied Intelligence},
  volume={22},
  number={2},
  pages={135--147},
  year={2005},
  publisher={Springer}
}

Files

web.zip – This file contains 9 .MAT files, one per user. Each .MAT file contains a training set x and a test set z. The two sets can be concatenated as follows:

data = [x;z];
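
The concatenation above stacks the rows of x on top of the rows of z, which requires both to have the same number of columns. The sketch below shows this on toy stand-ins, assuming x and z behave like ordinary numeric matrices with one row per instance; the objects stored in the actual .MAT files may be of a different type. The isTrain mask is a hypothetical helper, added only so the original split can be recovered after merging.

```matlab
% Toy stand-ins for x and z (assumption: plain numeric matrices,
% one row per instance, same number of feature columns).
x = [1 0 2; 0 1 0];        % 2 training rows, 3 features
z = [3 1 0];               % 1 test row, 3 features

data = [x; z];             % vertical concatenation: rows of x, then rows of z

% Hypothetical helper: remember which rows came from the training set.
isTrain = [true(size(x, 1), 1); false(size(z, 1), 1)];

xBack = data(isTrain, :);  % recovers the training rows
zBack = data(~isTrain, :); % recovers the test rows
```

Keeping such a mask is useful when the merged data is later re-split, for example for cross-validation, and the original train/test partition still needs to be reproducible.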