Data Description
The dataset consists of whole slide images (WSIs) from seven organs: breast, bladder, cervix, colon, lung, prostate, and stomach. These WSIs were collected from five institutions across Korea, Turkey, India, Japan, and Germany. Each WSI is paired with a corresponding pathology report, which has been standardized according to the guidelines of the College of American Pathologists (CAP).
Each report contains information on the organ, procedure, histologic type, and histologic grade. Histologic subtypes were classified according to the 5th edition of the World Health Organization (WHO) Classification of Tumours.
In total, the dataset comprises 10,494 WSI-report pairs. It encompasses a broad spectrum of diagnostic categories across the seven organs, including representative malignant entities (e.g., adenocarcinoma, squamous cell carcinoma), premalignant lesions (e.g., tubular adenoma with low-grade dysplasia), benign conditions, and non-neoplastic tissues (see Table 1 for pan-Asian data).
Organ | Representative Diagnostic Categories | Number of Cases |
---|---|---|
Stomach | Adenocarcinoma, Tubular adenoma with low/high grade dysplasia, Extranodal marginal zone B cell lymphoma of MALT type, Chronic (active) gastritis, Others | 1651 |
Prostate | Acinar adenocarcinoma, Normal, Others | 1978 |
Lung | Adenocarcinoma, Squamous cell carcinoma, Small cell carcinoma, Chronic granulomatous inflammation, No evidence of malignancy or granuloma, Others | 998 |
Colon | Adenocarcinoma, Tubular adenoma with low/high grade dysplasia, Hyperplastic polyp, Serrated lesion, Chronic nonspecific inflammation, Others | 1051 |
Cervix | Low-grade squamous intraepithelial lesion, High-grade squamous intraepithelial lesion, Invasive squamous cell carcinoma, Endocervical adenocarcinoma in situ (AIS), HPV-associated, Chronic nonspecific cervicitis, Others | 695 |
Bladder | Invasive urothelial carcinoma, Non-invasive papillary urothelial carcinoma, Urothelial carcinoma in situ, No tumor present | 978 |
Breast | Invasive ductal carcinoma, Invasive lobular carcinoma, Ductal carcinoma in situ, Fibroepithelial tumor, Papillary neoplasm, Others | 2143 |
Table 1. Diagnostic Categories and Number of Cases by Organ in the REG2025 Dataset
Figure 1. Distribution of Cases by Organ in REG2025 Dataset
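The per-organ distribution can be recomputed directly from Table 1. The short sketch below copies the case counts from the table and prints each organ's share. Note that the table rows sum to 9,494, which is 1,000 fewer than the 10,494 pairs stated above. The source does not say where the remaining pairs fall; the "pan-Asian data" note may mean that the table covers only a subset, but that reading is an assumption.

```python
# Case counts copied from Table 1.
cases_by_organ = {
    "Stomach": 1651,
    "Prostate": 1978,
    "Lung": 998,
    "Colon": 1051,
    "Cervix": 695,
    "Bladder": 978,
    "Breast": 2143,
}

table_total = sum(cases_by_organ.values())

# Share of each organ within the table, largest first.
shares = {
    organ: count / table_total
    for organ, count in sorted(
        cases_by_organ.items(), key=lambda item: item[1], reverse=True
    )
}

for organ, share in shares.items():
    print(f"{organ:<9}{cases_by_organ[organ]:>6}{share:>8.1%}")
print(f"{'Total':<9}{table_total:>6}")
```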
The dataset is divided into 8,494 samples for training, 1,000 for test 1, and 1,000 for test 2. The class distribution is kept the same across all subsets, ensuring balanced representation for model training and evaluation.
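If the equal class distribution is read as each subset having the same class proportions, a stratified split is one common way to obtain it. The sketch below is an illustration only, not the organizers' procedure: it uses the subset sizes stated above, while the function name `stratified_split` and the example labels are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Subset sizes stated in the text: 8,494 + 1,000 + 1,000 = 10,494.
SUBSET_SIZES = {"train": 8494, "test1": 1000, "test2": 1000}
TOTAL = sum(SUBSET_SIZES.values())


def stratified_split(labels, sizes=SUBSET_SIZES, seed=0):
    """Assign sample indices to subsets so each class keeps the same proportions."""
    total = sum(sizes.values())
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    subsets = {name: [] for name in sizes}
    for indices in by_class.values():
        rng.shuffle(indices)
        start, cumulative = 0, 0
        for name in sizes:
            cumulative += sizes[name]
            # Cumulative rounding places every sample in exactly one subset.
            end = round(len(indices) * cumulative / total)
            subsets[name].extend(indices[start:end])
            start = end
    return subsets


# Hypothetical labels, for illustration only.
labels = ["adenocarcinoma"] * 600 + ["normal"] * 400
split = stratified_split(labels)
for name, indices in split.items():
    print(name, Counter(labels[i] for i in indices))
```

Splitting each class separately keeps rare diagnoses represented in both test sets, which a purely random split does not guarantee.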
The dataset is provided in two formats:
- Image (WSI): in .tiff format
- Text (Label): in .json format
Figure 2. Dataset example
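Since each WSI comes with a report, a loader typically needs to match .tiff files to entries in the .json labels. The sketch below shows one way to do that with the standard library. The JSON layout is an assumption: the source does not specify the schema, so the keys `"id"` (slide file name) and `"report"` (report text), the function name `load_pairs`, and the paths are hypothetical and should be adapted to the released files.

```python
import json
from pathlib import Path


def load_pairs(report_json, slide_dir):
    """Pair each report record with its slide file, skipping missing slides."""
    records = json.loads(Path(report_json).read_text(encoding="utf-8"))
    pairs = []
    for record in records:
        # "id" and "report" are hypothetical keys, not confirmed by the source.
        slide_path = Path(slide_dir) / record["id"]
        if slide_path.suffix.lower() == ".tiff" and slide_path.is_file():
            pairs.append((slide_path, record["report"]))
    return pairs


# Example call with hypothetical paths:
# pairs = load_pairs("train.json", "train_slides/")
```

Skipping records whose slide file is absent keeps the loader usable on a partially downloaded dataset; raising an error instead would be the stricter choice.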
The WSI data from each hospital was scanned using the following scanners:
- Korea University Medical Center – Aperio & Generic TIFF
- Kameda – Generic TIFF
- Memorial Health Group – Aperio
- All India Institute of Medical Sciences (AIIMS) – Hamamatsu NanoZoomer
- University Hospital Cologne – Aperio
Each dataset was anonymized and saved as TIFF files containing only information at 20x magnification.
Dataset Download
You can download the data from the Reg2025-Traindataset section.
How to cite the dataset
Citation details will be released after the challenge ends.