Preliminary Evaluation Criteria - To be Finalised...

Because the segmentation task frequently involves many sites of metastatic spread in each training case, we adopt evaluation criteria adapted from the AutoPET challenges. We will compute the Dice Similarity Coefficient (DSC) of total tumour burden (TTB), as well as connected-component false-positive and false-negative volumes (https://github.com/lab-midas/autoPET). As the task focuses specifically on matching the standardised threshold workflow used to measure the imaging biomarkers PSMA SUVmean and FDG metabolic tumour volume, both of which are sensitive to the inclusion or omission of voxels near the boundary surface, we will also record the Surface Dice Coefficient (agreement of voxels constrained to the label perimeter). Lastly, we will score the percentage agreement with the label-derived image biomarkers for SUVmean and tumour volume.
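A minimal Python sketch of these voxel-level metrics follows. It assumes binary 3D NumPy masks of identical shape, SciPy for connected-component labelling, 18-connectivity, and a known per-voxel volume in ml. All function names are hypothetical, and the `percent_agreement` formula (100 × (1 − relative absolute error), floored at 0) is an assumed definition because the exact formula is not yet finalised. This is not the official evaluation code; the linked autoPET repository remains the reference for the DSC and false-positive/false-negative volume definitions.

```python
import numpy as np
from scipy import ndimage

# Sketch only: hypothetical helper names, not the official challenge code.
# Assumed 18-connectivity (face- and edge-adjacent voxels) for 3D components.
STRUCT_18 = ndimage.generate_binary_structure(3, 2)


def dice(gt: np.ndarray, pred: np.ndarray) -> float:
    """DSC of total tumour burden: all lesions pooled into one mask."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    denom = gt.sum() + pred.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return float(2.0 * np.logical_and(gt, pred).sum() / denom)


def _unmatched_volume(ref: np.ndarray, other: np.ndarray, voxel_ml: float) -> float:
    """Volume (ml) of connected components of `ref` sharing no voxel with `other`."""
    labels, n = ndimage.label(ref.astype(bool), structure=STRUCT_18)
    other = other.astype(bool)
    voxels = 0
    for idx in range(1, n + 1):
        component = labels == idx
        if not np.any(component & other):
            voxels += int(component.sum())
    return voxels * voxel_ml


def false_positive_volume(gt, pred, voxel_ml):
    """Predicted components that do not overlap any ground-truth lesion."""
    return _unmatched_volume(pred, gt, voxel_ml)


def false_negative_volume(gt, pred, voxel_ml):
    """Ground-truth lesions that the prediction misses entirely."""
    return _unmatched_volume(gt, pred, voxel_ml)


def suv_mean(suv: np.ndarray, mask: np.ndarray) -> float:
    """Mean SUV inside the mask (NaN, with a warning, if the mask is empty)."""
    return float(suv[mask.astype(bool)].mean())


def tumour_volume(mask: np.ndarray, voxel_ml: float) -> float:
    """Label-derived tumour volume in ml."""
    return float(mask.astype(bool).sum() * voxel_ml)


def percent_agreement(pred_value: float, ref_value: float) -> float:
    """Assumed definition: 100 * (1 - relative absolute error), floored at 0.
    `ref_value` must be non-zero."""
    return max(0.0, 100.0 * (1.0 - abs(pred_value - ref_value) / abs(ref_value)))


# Toy example: one lesion half-detected, one lesion missed, one spurious voxel.
gt = np.zeros((6, 6, 6), dtype=bool)
gt[1:3, 1:3, 1:3] = True    # lesion A (8 voxels)
gt[4, 4, 4] = True          # lesion B (1 voxel), missed below
pred = np.zeros_like(gt)
pred[1:3, 1:3, 1:2] = True  # half of lesion A (4 voxels)
pred[4, 0, 0] = True        # false-positive voxel
print(dice(gt, pred))                        # 8 / 14
print(false_positive_volume(gt, pred, 1.0))  # 1.0
print(false_negative_volume(gt, pred, 1.0))  # 1.0
```

Note that a partially detected lesion contributes to neither false-positive nor false-negative volume under this per-component rule; its under-segmentation is penalised only through the Dice and surface measures.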

The leaderboard is determined by ranking submissions in each of 6 categories (DSC, False-positive volume, False-negative volume, Surface Dice, PSMA SUVmean Agreement, FDG Metabolic Tumour Volume Agreement). As each case has one PSMA and one FDG scan, the final leaderboard will comprise 12 categories, with overall rank determined as a composite of all categories and ties broken by the average of the PSMA and FDG DSC.
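The ranking scheme can be sketched as follows. This assumes each team has a single summary score per tracer and metric (for example, the mean over test cases), that the composite is the mean of the 12 per-category ranks, and that tied scores within a category share the better rank. These aggregation details, and the metric keys used below, are assumptions for illustration; the finalised challenge rules may differ.

```python
from statistics import mean

# Hypothetical metric keys; value = True if a higher score is better.
METRICS = {
    "dsc": True,
    "fp_volume": False,   # lower false-positive volume is better
    "fn_volume": False,   # lower false-negative volume is better
    "surface_dice": True,
    "suvmean_agreement": True,
    "mtv_agreement": True,
}
TRACERS = ("psma", "fdg")  # 6 metrics x 2 tracers = 12 categories


def rank_category(scores: dict[str, float], higher_is_better: bool) -> dict[str, int]:
    """Competition ranking: 1 = best, tied scores share the better rank."""
    ordered = sorted(scores.values(), reverse=higher_is_better)
    return {team: ordered.index(value) + 1 for team, value in scores.items()}


def leaderboard(results: dict[str, dict[tuple[str, str], float]]) -> list[str]:
    """results[team][(tracer, metric)] -> summary score; returns teams best-first."""
    teams = list(results)
    ranks: dict[str, list[int]] = {team: [] for team in teams}
    for tracer in TRACERS:
        for metric, higher in METRICS.items():
            category = {team: results[team][(tracer, metric)] for team in teams}
            for team, r in rank_category(category, higher).items():
                ranks[team].append(r)

    def sort_key(team: str) -> tuple[float, float]:
        avg_dsc = mean(results[team][(t, "dsc")] for t in TRACERS)
        # Lower mean rank first; exact ties broken by higher average PSMA+FDG DSC.
        return (mean(ranks[team]), -avg_dsc)

    return sorted(teams, key=sort_key)
```

Sorting on the tuple (mean rank, negative average DSC) means the DSC tie-break only takes effect when two composite ranks are exactly equal.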

We anticipate that many of these statistics will be correlated: accurate overall semantic segmentation should yield strong performance on both Dice measures, small false-positive and false-negative penalties, and accurate biomarker scores. Beyond overall accuracy, particular emphasis should be given to agreement with label surfaces, as this affects both the time required to apply manual corrections and the biomarker measures (mean and volume) of detected lesions.
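Because surface agreement is singled out, the following is a minimal sketch of a voxel-based surface Dice at a distance tolerance. It assumes binary 3D masks, SciPy, and voxel spacing in mm. Surface voxels are taken as foreground voxels removed by one binary erosion, and the score is the fraction of both surfaces lying within `tolerance_mm` of the other surface. The tolerance value, the function names, and this voxel-count approximation (rather than area-weighted surface elements) are assumptions, not the challenge's finalised definition.

```python
import numpy as np
from scipy import ndimage


def surface_voxels(mask: np.ndarray) -> np.ndarray:
    """Boundary voxels: foreground voxels removed by a single binary erosion."""
    mask = mask.astype(bool)
    return mask & ~ndimage.binary_erosion(mask)


def surface_dice(gt, pred, tolerance_mm, spacing_mm=(1.0, 1.0, 1.0)) -> float:
    """Fraction of both surfaces within `tolerance_mm` of the other surface (hypothetical helper)."""
    s_gt, s_pred = surface_voxels(gt), surface_voxels(pred)
    n = s_gt.sum() + s_pred.sum()
    if n == 0:
        return 1.0  # assumption: both masks empty counts as perfect agreement
    if s_gt.sum() == 0 or s_pred.sum() == 0:
        return 0.0  # one surface is missing entirely
    # Distance (mm) from every voxel to the nearest surface voxel of each mask.
    dist_to_gt = ndimage.distance_transform_edt(~s_gt, sampling=spacing_mm)
    dist_to_pred = ndimage.distance_transform_edt(~s_pred, sampling=spacing_mm)
    close = (dist_to_pred[s_gt] <= tolerance_mm).sum() + (dist_to_gt[s_pred] <= tolerance_mm).sum()
    return float(close / n)


# Toy example: a cube and the same cube shifted by one voxel along the first axis.
cube = np.zeros((8, 8, 8), dtype=bool)
cube[2:6, 2:6, 2:6] = True
shifted = np.roll(cube, 1, axis=0)
print(surface_dice(cube, shifted, tolerance_mm=1.0))  # 1.0
```

A one-voxel boundary shift is fully tolerated at 1 mm but penalised at a tighter tolerance, which is the behaviour that makes this measure sensitive to the boundary voxels driving SUVmean and volume.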