Newsgroups

Description

Newsgroups is a text categorization dataset. Each bag is a collection of posts drawn from different newsgroups, and there are 20 categories in total. A positive bag for a target category contains 3% posts from that category; the remaining 97% of its posts are randomly sampled from the other categories.
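
The composition of a positive bag described above can be illustrated with a short Python sketch. This is not the authors' generation code: the function name `make_positive_bag`, the bag size of 100, the fixed seed, and sampling without replacement are all assumptions made for illustration, since the description only states the 3% / 97% split.

```python
import random


def make_positive_bag(posts_by_category, target, bag_size=100,
                      target_fraction=0.03, seed=0):
    """Build one positive bag for `target` (hypothetical helper).

    posts_by_category maps a category name to a list of posts.
    bag_size and seed are illustrative assumptions, not dataset facts.
    """
    rng = random.Random(seed)

    # Number of posts taken from the target category (3% of the bag).
    n_target = max(1, round(bag_size * target_fraction))
    n_other = bag_size - n_target

    # Pool every post that does not belong to the target category.
    other_posts = [
        post
        for category, posts in posts_by_category.items()
        if category != target
        for post in posts
    ]

    # Assumption: posts are drawn without replacement.
    bag = rng.sample(posts_by_category[target], n_target)
    bag += rng.sample(other_posts, n_other)
    rng.shuffle(bag)
    return bag
```

With a bag size of 100, this yields 3 posts from the target category and 97 from the remaining categories.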

Source

Thanks to Professor Zhi-Hua Zhou for allowing us to provide this data.

BibTeX entry:

@inproceedings{zhou2009multi,
  title={Multi-instance learning by treating instances as non-IID samples},
  author={Zhou, Zhi-Hua and Sun, Yu-Yin and Li, Yu-Feng},
  booktitle={Proceedings of the 26th Annual International Conference on Machine Learning},
  pages={1249--1256},
  year={2009},
  organization={ACM}
}

Files

newsgroups.zip – This file contains 20 .mat files, one for each target category.
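
As a quick check after downloading, the archive contents can be listed with Python's standard `zipfile` module. This is a minimal sketch: the helper name `list_mat_files` is hypothetical, and the member names inside the archive are not documented here, so the code only filters on the `.mat` extension.

```python
import zipfile


def list_mat_files(zip_path):
    """Return the sorted names of all .mat members in the archive.

    zip_path may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(".mat")
        )
```

Calling `list_mat_files("newsgroups.zip")` should return 20 names if the archive matches the description above. Each extracted file could then be opened with `scipy.io.loadmat`, assuming it is stored in a MAT format that function supports (up to v7.2); the variable names inside each file are not specified here, so inspecting the keys of the returned dictionary is a reasonable first step.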