📊 Data Description

The dataset consists of whole slide images (WSIs) from seven organs: breast, bladder, cervix, colon, lung, prostate, and stomach. These WSIs were collected from five institutions across Korea, Turkey, India, Japan, and Germany. Each WSI is paired with a corresponding pathology report, which has been standardized according to the guidelines of the College of American Pathologists (CAP).

Each report contains information on the organ, procedure, histologic type, and histologic grade. Histologic subtypes were classified according to the 5th edition of the World Health Organization (WHO) Classification of Tumours.

In total, the dataset comprises 10,494 WSI-report pairs. It encompasses a broad spectrum of diagnostic categories across the seven organs, including representative malignant entities (e.g., adenocarcinoma, squamous cell carcinoma), premalignant lesions (e.g., tubular adenoma with low-grade dysplasia), benign conditions, and non-neoplastic tissues (see Table 1 for pan-Asian data).

| Organ | Representative Diagnostic Categories | Number of Cases |
|---|---|---|
| Stomach | Adenocarcinoma, Tubular adenoma with low/high grade dysplasia, Extranodal marginal zone B cell lymphoma of MALT type, Chronic (active) gastritis, Others | 1651 |
| Prostate | Acinar adenocarcinoma, Normal, Others | 1978 |
| Lung | Adenocarcinoma, Squamous cell carcinoma, Small cell carcinoma, Chronic granulomatous inflammation, No evidence of malignancy or granuloma, Others | 998 |
| Colon | Adenocarcinoma, Tubular adenoma with low/high grade dysplasia, Hyperplastic polyp, Serrated lesion, Chronic nonspecific inflammation, Others | 1051 |
| Cervix | Low-grade squamous intraepithelial lesion, High-grade squamous intraepithelial lesion, Invasive squamous cell carcinoma, Endocervical adenocarcinoma in situ (AIS), HPV-associated, Chronic nonspecific cervicitis, Others | 695 |
| Bladder | Invasive urothelial carcinoma, Non-invasive papillary urothelial carcinoma, Urothelial carcinoma in situ, No tumor present | 978 |
| Breast | Invasive ductal carcinoma, Invasive lobular carcinoma, Ductal carcinoma in situ, Fibroepithelial tumor, Papillary neoplasm, Others | 2143 |

Table 1. Diagnostic Categories and Number of Cases by Organ in the REG2025 Dataset
Figure 1. Distribution of Cases by Organ in REG2025 Dataset
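As a rough stand-in for the distribution shown in Figure 1, the per-organ shares can be recomputed from the case counts in Table 1. The sketch below copies those counts verbatim; the names `CASES_BY_ORGAN` and `organ_shares` are illustrative and not part of any official tooling. Note that the counts listed in Table 1 sum to 9,494.

```python
# Per-organ case counts copied from Table 1.
CASES_BY_ORGAN = {
    "Stomach": 1651,
    "Prostate": 1978,
    "Lung": 998,
    "Colon": 1051,
    "Cervix": 695,
    "Bladder": 978,
    "Breast": 2143,
}


def organ_shares(counts):
    """Return each organ's fraction of the total number of cases."""
    total = sum(counts.values())
    return {organ: n / total for organ, n in counts.items()}


if __name__ == "__main__":
    shares = organ_shares(CASES_BY_ORGAN)
    # Print organs from most to least represented.
    for organ in sorted(shares, key=shares.get, reverse=True):
        print(f"{organ:<10}{CASES_BY_ORGAN[organ]:>6}{shares[organ]:>8.1%}")
```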

The dataset is divided into 8,494 samples for training, 1,000 for test 1, and 1,000 for test 2. The class distribution is kept the same across all subsets, ensuring balanced representation for model training and evaluation.
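Participants who want to carve a validation set out of the training data while preserving the class mix can use a stratified split. The following is a minimal sketch of one common approach, not the procedure the organizers used; the function name `stratified_split` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions, seed=0):
    """Split sample indices so each subset keeps roughly the same class mix.

    labels    -- list of class labels, one per sample
    fractions -- dict mapping subset name to its share (should sum to 1.0)
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    names = list(fractions)
    subsets = {name: [] for name in names}
    for indices in by_class.values():
        rng.shuffle(indices)
        start = 0
        for i, name in enumerate(names):
            if i == len(names) - 1:
                end = len(indices)  # last subset takes the remainder
            else:
                end = start + round(len(indices) * fractions[name])
            subsets[name].extend(indices[start:end])
            start = end
    return subsets
```

For example, `stratified_split(labels, {"train": 0.9, "val": 0.1})` would hold out about a tenth of each class for validation. Very small classes may end up missing from a subset because of rounding.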

The dataset is provided in two formats:
- Image (WSI): in .tiff format
- Text (Label): in .json format

Figure 2. Dataset example
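Since images and labels arrive as separate files, a first step is usually to pair each WSI with its report. The sketch below uses only the Python standard library and matches on the file stem. It assumes the label file is a JSON list of objects, and the field names `id` and `report` are hypothetical placeholders; check them against the released files (see Figure 2) before relying on this.

```python
import json
from pathlib import Path


def load_reports(json_path):
    """Load the label file; assumes it holds a JSON list of objects."""
    with open(json_path, encoding="utf-8") as fh:
        return json.load(fh)


def pair_wsi_with_reports(wsi_names, reports, id_key="id", text_key="report"):
    """Match WSI file names to report entries by file stem.

    id_key and text_key are hypothetical field names; adjust them to
    whatever the released .json files actually contain.
    """
    # Index reports by stem so "case.tiff" and "case" both match.
    by_stem = {Path(str(r[id_key])).stem: r[text_key] for r in reports}

    pairs, missing = {}, []
    for name in wsi_names:
        stem = Path(name).stem
        if stem in by_stem:
            pairs[name] = by_stem[stem]
        else:
            missing.append(name)  # WSI without a report entry
    return pairs, missing
```

The list of WSI names can be gathered with something like `sorted(p.name for p in Path(image_dir).glob("*.tiff"))`, where `image_dir` is wherever the images were downloaded.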

The WSI data from each hospital was scanned using the following scanners:

  • Korea University Medical Center – Aperio & Generic TIFF
  • Kameda – Generic TIFF
  • Memorial Health Group – Aperio
  • All India Institute of Medical Sciences (AIIMS) – Hamamatsu NanoZoomer
  • University Hospital Cologne – Aperio

The data from each institution were anonymized and saved as TIFF files that contain only the 20x magnification level.


🔗 Dataset Download

You can download the data from the Reg2025-Traindataset section.


🔗 How to cite the dataset

Citation information will be released after the challenge ends.