Task 1: Lesion Boundary Segmentation

Goal

Submit automated predictions of lesion segmentation boundaries within dermoscopic images.

Data

Input Data

The input data are dermoscopic lesion images in JPEG format.

All lesion images are named using the scheme ISIC_<image_id>.jpg, where <image_id> is a 7-digit unique identifier. EXIF tags in the images have been removed; any remaining EXIF tags should not be relied upon to provide accurate metadata.

The lesion images were acquired with a variety of dermatoscope types, from all anatomic sites (excluding mucosa and nails), from a historical sample of patients presented for skin cancer screening, from several different institutions. Every lesion image contains exactly one primary lesion; other fiducial markers, smaller secondary lesions, or other pigmented regions may be neglected.

The distribution of disease states represents a modified “real world” setting in which benign lesions outnumber malignant lesions, but malignancies are nonetheless over-represented.

Response Data

The response data are binary mask images in PNG format, indicating the location of the primary skin lesion within each input lesion image.

Mask images are named using the scheme ISIC_<image_id>_segmentation.png, where <image_id> matches the corresponding lesion image for the mask.
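The two naming schemes can be tied together with a small helper. The following is a minimal sketch; the function name mask_name_for and the regular expression are illustrative assumptions derived from the schemes described above, not part of any official tooling.

```python
import re

# Assumed from the naming scheme above: "ISIC_" + exactly 7 digits + ".jpg".
_IMAGE_NAME = re.compile(r"ISIC_(\d{7})\.jpg")


def mask_name_for(image_name: str) -> str:
    """Return the mask filename that corresponds to a lesion image filename."""
    match = _IMAGE_NAME.fullmatch(image_name)
    if match is None:
        raise ValueError(f"not a valid lesion image name: {image_name!r}")
    # Reuse the 7-digit identifier so the mask matches its lesion image.
    return f"ISIC_{match.group(1)}_segmentation.png"
```

For example, mask_name_for("ISIC_0000001.jpg") returns "ISIC_0000001_segmentation.png", while a name that does not follow the scheme raises ValueError.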

Mask images must have the exact same dimensions as their corresponding lesion image. Mask images are encoded as single-channel (grayscale) 8-bit PNGs (to provide lossless compression), where each pixel is either:

  • 0: representing the background of the image, or areas outside the primary lesion
  • 255: representing the foreground of the image, or areas inside the primary lesion
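The format requirements above can be checked before submission. The sketch below assumes the mask has already been decoded into a list of rows of integer pixel values (for instance by an image library of your choice); the function name check_mask and its parameters are hypothetical.

```python
def check_mask(mask, image_width, image_height):
    """Return a list of problems found in a candidate mask (empty if none).

    `mask` is assumed to be a list of rows, each a list of integer pixel
    values, as decoded from a single-channel 8-bit PNG.
    """
    problems = []
    height = len(mask)
    width = len(mask[0]) if height else 0
    if any(len(row) != width for row in mask):
        problems.append("rows have inconsistent lengths")
    # The mask must have exactly the same dimensions as its lesion image.
    if (width, height) != (image_width, image_height):
        problems.append(
            f"mask is {width}x{height}, image is {image_width}x{image_height}"
        )
    # Only 0 (background) and 255 (foreground) are allowed pixel values.
    bad_values = sorted({v for row in mask for v in row} - {0, 255})
    if bad_values:
        problems.append(f"pixel values other than 0 and 255: {bad_values}")
    return problems
```

An empty result means the mask passed these particular checks; it does not confirm that the file itself is a valid PNG.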

As the primary skin lesion is a single contiguous region, mask images should also contain only a single contiguous foreground region, without any disconnected components or holes. The foreground region may be of any size (including the entire image) and may abut the borders of the image.
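One way to test a mask for a single contiguous, hole-free foreground region is a flood fill over connected components. The sketch below is illustrative only: it assumes a non-empty mask given as a list of rows, treats foreground pixels as 8-connected and background pixels as 4-connected (a common convention, but an assumption here), and uses hypothetical names throughout.

```python
from collections import deque

# Neighbor offsets for 4- and 8-connectivity (assumed conventions).
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def components(mask, foreground, offsets):
    """Yield each connected component as a list of (row, col) pixels."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    for r0 in range(height):
        for c0 in range(width):
            if seen[r0][c0] or (mask[r0][c0] != 0) != foreground:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in offsets:
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < height and 0 <= nc < width
                            and not seen[nr][nc]
                            and (mask[nr][nc] != 0) == foreground):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            yield pixels


def is_single_solid_region(mask):
    """True if the mask has exactly one foreground component and no holes."""
    height, width = len(mask), len(mask[0])
    if len(list(components(mask, True, N8))) != 1:
        return False
    # A hole is a background component that never reaches the image border.
    for pixels in components(mask, False, N4):
        if not any(r in (0, height - 1) or c in (0, width - 1)
                   for r, c in pixels):
            return False
    return True
```

Under these assumptions a mask that covers the entire image passes, while a ring-shaped foreground (which encloses a hole) or two separate blobs fail.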

Ground Truth Provenance

Ground truth mask images (provided for training and used internally to score the validation and test phases) were generated using several techniques, but all were reviewed and curated by practicing dermatologists with expertise in dermoscopy.

Ground truth segmentations were generated by one of the following:

  • a fully-automated algorithm, reviewed and accepted by a human expert
  • a semi-automated flood-fill algorithm, with parameters chosen by a human expert
  • manual polygon tracing by a human expert

Evaluation

Goal Metric

Predicted responses are scored using a threshold Jaccard index metric.

To compute this metric:

  • For each image, a pixel-wise comparison of each predicted segmentation with the corresponding ground truth segmentation is made using the Jaccard index.
  • The final score for each image is computed as a threshold of the Jaccard according to the following:
    • score = 0, if the Jaccard index is less than 0.65
    • score = the Jaccard index value, otherwise
  • The mean of all per-image scores is taken as the final metric value for the entire dataset.
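The steps above can be sketched in a few lines. This is a minimal illustration, not the official scoring code: masks are assumed to be equally sized lists of rows, any nonzero pixel is treated as foreground, and the handling of two completely empty masks is an assumption, since the description does not cover that case. All function names are hypothetical.

```python
def jaccard_index(predicted, truth):
    """Pixel-wise Jaccard index (intersection over union) of two masks."""
    intersection = union = 0
    for pred_row, truth_row in zip(predicted, truth):
        for p, t in zip(pred_row, truth_row):
            p_fg, t_fg = p != 0, t != 0
            intersection += p_fg and t_fg
            union += p_fg or t_fg
    # Assumption: two empty masks count as perfect agreement.
    return intersection / union if union else 1.0


def threshold_jaccard(predicted, truth, threshold=0.65):
    """Per-image score: 0 below the threshold, else the Jaccard index."""
    score = jaccard_index(predicted, truth)
    return score if score >= threshold else 0.0


def dataset_score(pairs):
    """Mean of the per-image scores over (predicted, truth) mask pairs."""
    scores = [threshold_jaccard(p, t) for p, t in pairs]
    return sum(scores) / len(scores)
```

With a four-pixel ground truth that is entirely foreground, a prediction covering three of those pixels has a Jaccard index of 0.75 and keeps that score, whereas a prediction covering two pixels has a Jaccard index of 0.5 and scores 0. The mean over those two images is 0.375.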

Rationale

The choice of the threshold Jaccard index metric is based on a previously published analysis, which demonstrated that using the Jaccard index directly as a measure of performance does not accurately reflect the number of images in which automated segmentation fails or falls outside expert interobserver variability (i.e., the raw Jaccard index is overly optimistic). The number of images in which automated segmentation fails is a direct measure of the amount of labor required to correct an algorithm’s output.

To determine the threshold, the lowest Jaccard agreement between 3 independent expert annotators was measured on a subset of 100 images. This empirically measured value (~0.74) is the basis for the 0.65 threshold, which includes additional error tolerance; a Jaccard index below the threshold is treated as a segmentation failure on that image.

Other Metrics

Participants will be ranked and awards granted based only on the threshold Jaccard index metric. However, for scientific completeness, predicted responses will also have the following metrics computed on a pixel-wise basis (comparing prediction vs. ground truth) for each image:

Submission Instructions

To participate in this task:

  1. Train
    1. Download the training input data and training ground truth response data.
    2. Develop an algorithm for generating lesion segmentations in general.
  2. Validate (optional)
    1. Download the validation input data.
    2. Run your algorithm on the validation input data to produce validation predicted responses.
    3. Submit these validation predicted responses to receive an immediate score. This provides feedback on whether your predicted responses have the correct data format and reasonable performance. You may make unlimited submissions.
  3. Test
    1. Download the test input data.
    2. Run your algorithm on the test input data to produce test predicted responses.
    3. Submit these test predicted responses. You may submit a maximum of 3 separate approaches/algorithms to be evaluated independently. You may make unlimited submissions, but only the most recent submission for each approach will be used for official judging. Use the “brief description of your algorithm’s approach” field on the submission form to distinguish different approaches. Previously submitted approaches are available in the dropdown menu.
    4. Submit a manuscript describing your algorithm’s approach.