PrecisionFDA
NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge – Subchallenge 2
Sample mislabeling (accidental swapping of patient samples) or data mislabeling (accidental swapping of patient omics data) is a known obstacle in basic and translational research, because this accidental swapping contributes to irreproducible results and invalid conclusions. The objective of this challenge is to encourage the development and evaluation of computational algorithms that can accurately detect and correct mislabeled samples using rich multi-omics datasets.

Starts
2018-11-06 00:00:00 UTC

Ends
2018-12-19 07:59:59 UTC
The precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge – Subchallenge 2 ran from November 5, 2018 to December 18, 2018. This subchallenge asked participants to develop computational models for identifying and correcting samples with mismatched clinical, protein profiling, and mRNA profiling data. The challenge received 82 valid entries from 30 participants.
This NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge – Subchallenge 2 results page displays the summarized results in the tables below. As with previous challenges, due to novelties related to the truth data and the comparison methodology, these results offer a first glance at our understanding. We welcome the community to further explore these results and provide insight for the future.
Introductory Remarks
At the start of this challenge, participants were provided with paired clinical, proteomics, and mRNA profiling data for each of 160 tumor samples. The 160 tumor samples contained labeling errors and were divided into training and test sets. Participants were asked to develop computational algorithms that model the relationship between clinical attributes, protein profiles, and mRNA profiles using the training data set, then apply the model to identify and correct mislabeled samples in the test data set in which one of the three data types was mislabeled. Sample mislabeling patterns and rates were introduced based on observations in the TCGA data sets.
Overview of Results
For each sample in the test data set, participants submitted a prediction indicating whether there is a mismatch between the clinical, proteomics profiling, and mRNA profiling data. Predictions were compared to the known mislabeled samples. For this subchallenge, precision, recall, and F-score were computed by comparing the binary mismatch predictions to the known mismatched samples. These three evaluation metrics are defined in the table below.
Metric    | Definition
Precision | True Positives / (True Positives + False Positives)
Recall    | True Positives / (True Positives + False Negatives)
F-score   | Harmonic mean of Precision and Recall
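As a sketch, the three metrics can be computed from binary mismatch predictions as follows; the function name and the 0/1 encoding (1 = mislabeled) are illustrative conventions, not part of the challenge code:

```python
def precision_recall_f(pred, truth):
    """Compute precision, recall, and F-score for binary mismatch flags.

    pred, truth: parallel lists of 0/1 flags, one per test sample
    (1 = mislabeled). Illustrative sketch, not the official scorer.
    """
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, truth))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, truth))
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score: harmonic mean of precision and recall
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

The zero guards handle the degenerate cases where a submission flags no samples (no positives predicted) or the truth set contains no mislabeled samples.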
For Subchallenge 2, we evaluated model performance at three different levels:
- Sample level – measures model performance at the sample level. If any of the predicted labels of the three data types does not match the original sample label, the sample is considered incorrectly labeled at this level.
- Sample-data level – measures model performance at the level of each individual data type of each sample. A prediction that correctly identifies a mislabeled data type of a sample is considered a true positive at this level, even if the mislabeling is not corrected.
- Correction level – measures model performance at correcting sample mislabeling. At this level, a prediction is considered a true positive only when the corrected label matches the true sample label.
The final ranking was computed by averaging the F-scores at the three levels.
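As a minimal illustration of the ranking metric, using made-up per-level F-scores (not actual challenge results):

```python
# Hypothetical F-scores for one submission at the three evaluation levels.
level_f = {"sample": 0.95, "sample-data": 0.90, "correction": 0.85}

# Final ranking metric: arithmetic mean of the three level F-scores.
final = sum(level_f.values()) / len(level_f)
```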
Finally, to determine significant performance differences between submissions, a bootstrapping approach was used to compute a confidence interval for the F-score of each submission. Rankings were generated based on: (1) method performance, treating each submission as unique, and (2) submitter performance, taking the median F-score of each participant's submissions.
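A percentile bootstrap along these lines could look like the sketch below. The 100 iterations follow the figure caption later on this page; the percentile method, fixed seed, and function names are assumptions for illustration, not the challenge's actual scoring code.

```python
import random
import statistics

def bootstrap_f_ci(pred, truth, n_boot=100, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the F-score of one submission.

    pred, truth: parallel 0/1 lists (1 = mislabeled sample).
    Returns (mean, sd, ci_lower, ci_upper) over n_boot resamples.
    Illustrative sketch only.
    """
    rng = random.Random(seed)
    n = len(pred)
    scores = []
    for _ in range(n_boot):
        # Resample the test samples with replacement, then rescore.
        idx = [rng.randrange(n) for _ in range(n)]
        tp = sum(pred[i] == 1 and truth[i] == 1 for i in idx)
        fp = sum(pred[i] == 1 and truth[i] == 0 for i in idx)
        fn = sum(pred[i] == 0 and truth[i] == 1 for i in idx)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * p * r / (p + r) if p + r else 0.0)
    scores.sort()
    lo = scores[int(alpha / 2 * n_boot)]
    hi = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return statistics.mean(scores), statistics.stdev(scores), lo, hi
```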
Method Performance Results
The table below shows the top 3 highest-performing submissions/methods based on F-scores.
Name            | Submission              | Precision | Recall | F-score | Mean(F-score) | SD(F-score) | 95% CI lower | 95% CI upper
Anders Carlsson | thep_bionamic_sub2_B    | 1         | 1      | 1       | 1             | 0           | 1            | 1
Renke Pan       | subchallenge_2_modelA_1 | 1         | 1      | 1       | 1             | 0           | 1            | 1
Soon Jye Kho    | subchallenge_2_sub1     | 1         | 1      | 1       | 1             | 0           | 1            | 1
The anonymized complete results table can be downloaded here. To protect the identity of participants, each participant’s performance will be emailed.
Submitter Performance Results
The table below shows the top 3 highest-performing participants based on the median F-scores of their submissions.
Name            | F-score | Mean(F-score) | SD(F-score) | 95% CI lower | 95% CI upper
Anders Carlsson | 0.967   | 0.965         | 0.009       | 0.963        | 0.967
Soon Jye Kho    | 0.967   | 0.965         | 0.009       | 0.963        | 0.967
Renke Pan       | 0.933   | 0.924         | 0.015       | 0.921        | 0.927
The performance of all submitters is shown in the figure below:
Figure: Median F-scores and the corresponding 95% confidence intervals. The confidence intervals were derived through a bootstrap procedure with 100 iterations.
The anonymized complete results table can be downloaded here. To protect the identity of participants, each participant’s performance will be emailed.
Scientific Manuscript
The NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge team plans to prepare a scientific manuscript describing the challenge and its results. All challenge participants who submit a one-page description of their methods will be included as part of a challenge participant consortium author group. In addition, the challenge team will select some participants, based on performance and/or unique methodology, to participate in the manuscript development.
Challenge Key
The subchallenge 2 key, including mislabeling information for the test samples, is available here.