UCSB Breast



UCSB Breast is an image classification problem. The original dataset consists of 58 tissue microarray (TMA) image excerpts (896 × 768 pixels) taken from 32 benign and 26 malignant breast cancer patients. The learning task is to classify each image as benign (negative) or malignant (positive).

Patches of 7×7 size are extracted. The image is thresholded to segment the content from the white background, and patches in which background covers more than 75% of the area are discarded. Each remaining patch is described by 657 features that are global to the patch (histogram, LBP, SIFT), together with averaged features extracted from the cells detected in that patch.
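The background filtering step can be illustrated with a minimal sketch. This is not the code used to build the dataset: the function name `filter_patches`, the non-overlapping tiling, the patch size arguments, and the `white_level` threshold of 200 are all assumptions made for illustration. Only the 75% background rule comes from the description above.

```python
import numpy as np

def filter_patches(gray, patch_h, patch_w, white_level=200, max_background=0.75):
    """Tile a grayscale image and drop patches that are mostly background.

    gray: 2-D uint8 array in which bright pixels are background.
    white_level and the non-overlapping tiling are illustrative assumptions.
    Returns a list of (top, left, patch) tuples for the patches that are kept.
    """
    # Threshold: True where a pixel is treated as white background.
    background = gray >= white_level
    rows, cols = gray.shape
    kept = []
    for top in range(0, rows - patch_h + 1, patch_h):
        for left in range(0, cols - patch_w + 1, patch_w):
            window = background[top:top + patch_h, left:left + patch_w]
            # Fraction of the patch area covered by background.
            frac = window.mean()
            # Discard only when background covers MORE than the limit.
            if frac <= max_background:
                patch = gray[top:top + patch_h, left:left + patch_w]
                kept.append((top, left, patch))
    return kept

# Tiny synthetic example: a 4x4 "image" split into four 2x2 patches.
img = np.full((4, 4), 255, dtype=np.uint8)  # start fully white
img[0:2, 0:2] = 0   # top-left patch: all content
img[0, 2] = 0       # top-right patch: exactly 75% background
img[2:4, 2] = 0     # bottom-right patch: 50% background
kept = filter_patches(img, 2, 2)
positions = [(top, left) for top, left, _ in kept]
```

In this toy example the all-white bottom-left patch is dropped, while the patch with exactly 75% background is kept, because the rule discards only patches with more than 75% background.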

Original source

The original data on which this dataset is based can be found here: http://www.bioimage.ucsb.edu/research/biosegmentation. The dataset was represented as a multiple instance learning (MIL) problem by Dr. Melih Kandemir. When using the dataset, please cite the following paper:

title={Empowering Multiple Instance Histopathology Cancer Diagnosis by Cell Graphs},
author={Kandemir, Melih and Zhang, Chong and Hamprecht, Fred A},

See also the PDF and the code used in the paper.



