Brain Cancer Predictive Modeling and Biomarker Discovery Challenge
An estimated 86,970 new cases of primary brain and other central nervous system tumors are expected to be diagnosed in the US in 2019. Brain tumors comprise a particularly deadly subset of all cancers due to limited treatment options and the high cost of care. Clinical investigators at Georgetown University are seeking to advance precision medicine techniques for the prognosis and treatment of brain tumors through the identification of novel multi-omics biomarkers. In support of this goal, precisionFDA and the Georgetown Lombardi Comprehensive Cancer Center and The Innovation Center for Biomedical Informatics at Georgetown University Medical Center (Georgetown-ICBI) are launching the Brain Cancer Predictive Modeling and Biomarker Discovery Challenge! This challenge will ask participants to develop machine learning and/or artificial intelligence models to identify biomarkers and predict patient outcomes using gene expression, DNA copy number, and clinical data.
2019-11-01 16:00:00 UTC
2020-02-15 04:59:59 UTC
PrecisionFDA partnered with Georgetown University Lombardi Comprehensive Cancer Center and The Innovation Center for Biomedical Informatics (ICBI) to launch the Brain Cancer Predictive Modeling and Biomarker Discovery Challenge, which ran from November 1, 2019 to February 14, 2020. This challenge asked participating teams to develop machine learning and/or artificial intelligence models to identify biomarkers and predict patient outcomes using DNA copy number, gene expression, and clinical data. In total, there were 30 submissions from 28 teams for Phase 1, and 22 submissions from 22 teams for Phase 2, encompassing a wide range of models.
Results are displayed and discussed below. As with previous challenges, these results offer a first look rather than a final analysis. We welcome the community to further explore these results and provide additional insights for the future.
In this challenge, participating teams were asked to take part in three sub-challenges, which would be scored individually and combined for an overall challenge score. They were provided DNA copy number data, gene expression profiles, clinical phenotypes, and outcomes for a cohort of patients.
The data for the challenge was released in two phases: Phase 1 and Phase 2. The Phase 1 data was the training data used to develop the models. The Phase 2 data was the test data used to score model performance. The truth data are the actual outcomes (survival status) for the patients in the Phase 2 data, which were withheld from participants. These outcomes were compared with the values predicted in Phase 2 for each sub-challenge submission. The models were evaluated using sensitivity, specificity, and accuracy.
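The three evaluation metrics can be computed from the withheld outcomes and a submission's predictions. The following is a minimal sketch, not the challenge's actual evaluation code; it assumes binary labels with `1` as the positive class (for example, deceased), and the function names are hypothetical.

```python
def confusion_counts(truth, predicted, positive=1):
    """Count true/false positives and negatives for binary labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(truth, predicted, positive=1):
    """Return sensitivity, specificity, and accuracy as a dict."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp, fp, tn, fn = confusion_counts(truth, predicted, positive)
    total = tp + fp + tn + fn
    nan = float("nan")
    return {
        # Fraction of actual positives that were predicted positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Fraction of actual negatives that were predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        # Fraction of all predictions that were correct.
        "accuracy": (tp + tn) / total if total else nan,
    }


# Illustrative labels only (assumption: 1 = positive class).
example_truth = [1, 1, 1, 0]
example_predicted = [1, 1, 0, 0]
example_metrics = classification_metrics(example_truth, example_predicted)
```

In this toy example, two of the three positives are recovered and the single negative is classified correctly.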
In summary, Phase 1 and 2 included the following sub-challenges:
- Sub-challenge 1 – Gene expression
- Sub-challenge 2 – DNA copy number
- Sub-challenge 3 – Gene expression and DNA copy number combined
The goal of this challenge was to develop models that integrate two types of molecular data (gene expression and DNA copy number) with clinical phenotype data to better predict clinical outcome.
Overview of Results
The challenge went live on November 1, 2019, and Phase 1 ran until February 5, 2020. The Phase 2 data was then released on February 7, 2020, and the challenge submission period closed on February 14, 2020.
We received a total of 30 submissions from 28 participating teams for Phase 1, since some teams submitted more than once. We received a total of 22 submissions for Phase 2, covering a wide range of machine learning models. Table 1 shows a summary of the different types of models used in Sub-challenge 3 (SC3) by the participating teams.
Table 1: Summary of the different types of models submitted for Sub-challenge 3
| Name of Model | Number of Submissions |
|---|---|
| Gradient Boosting Framework | 6 |
| Support Vector Machine | 3 |
| Generalized Linear Model (GLM) | 2 |
| K-Nearest Neighbors Algorithm | 1 |
Our goal was to rank participating teams whose models provided a short list of the most informative features for brain cancer. We developed an evaluation algorithm that automatically ranked the Phase 2 submissions on three metrics: accuracy, sensitivity, and specificity. In each sub-challenge, each metric was ranked individually. For instance, the participating team with the highest accuracy in Sub-challenge 1 was given a rank of 1 for the accuracy metric, and so on. The sum of the three ranks was taken as the final score for Sub-challenge 1, and the scores for Sub-challenges 2 and 3 were calculated in the same way. The best possible score for a sub-challenge is 3, which means the submission ranked first for accuracy, sensitivity, and specificity. The worst possible score for a sub-challenge is 66, which means the submission ranked last (i.e., 22nd) for accuracy, sensitivity, and specificity.
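The rank-sum scoring described above can be sketched as follows. This is an illustration under stated assumptions, not the evaluation algorithm itself: the source does not say how ties were handled, so this sketch lets tied values share the best rank, and the team names and metric values are made up.

```python
METRICS = ("accuracy", "sensitivity", "specificity")


def rank_descending(values):
    """Rank 1 goes to the highest value; tied values share the best rank.

    Tie handling is an assumption, since the source does not specify it.
    """
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def sub_challenge_scores(metrics_by_team):
    """Sum each team's per-metric ranks; a lower score is better."""
    teams = list(metrics_by_team)
    scores = {team: 0 for team in teams}
    for metric in METRICS:
        values = [metrics_by_team[team][metric] for team in teams]
        for team, rank in zip(teams, rank_descending(values)):
            scores[team] += rank
    return scores


# Hypothetical teams and metric values, for illustration only.
example_metrics_by_team = {
    "team_a": {"accuracy": 0.80, "sensitivity": 0.90, "specificity": 0.50},
    "team_b": {"accuracy": 0.75, "sensitivity": 0.95, "specificity": 0.40},
    "team_c": {"accuracy": 0.70, "sensitivity": 0.85, "specificity": 0.60},
}
example_scores = sub_challenge_scores(example_metrics_by_team)
```

With three teams the best possible score is still 3, but the worst becomes 9 (last on all three metrics); with 22 submissions the worst is 66, as stated above.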
The overall score was obtained using this formula: Overall score = SC1 score + SC2 score + (2 × SC3 score). Sub-challenge 3 (SC3) was given twice the weight of Sub-challenge 1 (SC1) and Sub-challenge 2 (SC2) since it combined multiple data types (gene expression, copy number, and clinical phenotype data), which made model building and prediction more complex. The best possible overall score is 12, and the worst possible overall score is 264.
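As a quick arithmetic check of the formula, the sketch below reproduces the best and worst possible overall scores, and the overall score of 91 reported in Table 2 for sub-challenge scores of 49, 18, and 12. The function and constant names are illustrative.

```python
N_SUBMISSIONS = 22  # Phase 2 submissions that were ranked
N_METRICS = 3       # accuracy, sensitivity, specificity


def overall_score(sc1, sc2, sc3):
    """Overall score = SC1 + SC2 + 2 * SC3; a lower score is better."""
    return sc1 + sc2 + 2 * sc3


# Ranked first on every metric gives the best sub-challenge score.
best_sub_score = N_METRICS * 1
# Ranked last on every metric gives the worst sub-challenge score.
worst_sub_score = N_METRICS * N_SUBMISSIONS

best_overall = overall_score(best_sub_score, best_sub_score, best_sub_score)
worst_overall = overall_score(worst_sub_score, worst_sub_score, worst_sub_score)
table2_example = overall_score(49, 18, 12)
```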
The top performers selected by the automated algorithm are summarized in Table 2.
Table 2. Summary of the top performers
| Rank | Team | Username | Sub-challenge 1 Rank | Sub-challenge 2 Rank | Sub-challenge 3 Rank | Overall Score |
|---|---|---|---|---|---|---|
| 2 | Seven Bridges Genomics** | nan.xiao | 49 | 18 | 12 | 91 |
| Best Possible Score | | | 3 | 3 | 3 | 12 |
| Worst Possible Score | | | 66 | 66 | 66 | 264 |
*Hanying Feng, Hong Chen, Luoqi Chen, and Jun Ye
**Nan Xiao, Soner Koc, and Kaushik Ghose
***Konstantinos Parachakis, and team lead Ioannis Tsamardinos
****Canqiang Xu, Wenxian Yang, Frank Zheng, and Rongshan Yu
We also explored trends for each sub-challenge. Figure 1 shows a stacked line graph of the scores in each sub-challenge, and the overall score. The submissions are ordered by overall score, with the highest rank (top performer) on the left side.
Figure 2 shows the distribution of overall scores. Since the overall score is derived from ranks, the best score is the one with the lowest value. To make this score easier to interpret, we calculated a normalized overall score using the following formula:
Figure 3 shows the normalized overall score across all the submissions. The submissions are ordered based on the rank from left to right.
The top performer of the challenge was the participating team with the highest rank, i.e., the lowest overall score: Sentieon*. This team's SC3 model selected a total of 46 features, including 40 genes, 4 cytobands, and 2 clinical attributes.
Take-Aways and Lessons Learned
- In SC1 and SC3, the participating teams were challenged with a "large p, small n" problem, i.e., a large number of features (approximately 20,000) and a small number of data points. The teams had to select the most important features while also guarding against the overfitting that this imbalance invites. Another challenge the teams faced in all three sub-challenges was that the survival outcome status was imbalanced. For instance, in SC3, 76% of the patients were deceased. Many participating teams tried multi-layered approaches to deal with these challenges, wherein they would use one set of models for feature selection, another set of models for model building, and choose the best performing model as the final model.
- SC2 was more challenging than SC1 because DNA copy number data was more complex to model than gene expression data.
- SC3 was more challenging than SC2 and SC1 since there were multiple data types, including gene expression, DNA copy number, and clinical phenotype data, to model in conjunction with clinical outcome.
- 22 models were developed and applied to the REMBRANDT Brain Cancer collection.
- Several models reached high accuracy in predicting clinical outcome based on the integration of molecular profiling data and clinical attributes.
- Machine learning algorithms generated short lists of the most informative features (genes, cytobands, and clinical attributes) that contain promising candidate biomarkers.
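To make the two data challenges in the list above concrete, the following is a minimal sketch of one way to address each: a simple filter that keeps the top-k features by class separation, and "balanced" class weights for an imbalanced outcome. It is an illustration only and is not drawn from any team's submission; the scoring rule, function names, and toy data are all assumptions.

```python
from statistics import mean, pstdev


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    The rarer class gets the larger weight, which offsets imbalance.
    """
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {y: n / (k * c) for y, c in counts.items()}


def top_k_features(rows, labels, k):
    """Return indices of the k features that best separate the classes.

    Each feature is scored by |mean(class 1) - mean(class 0)| divided by
    the feature's overall standard deviation (an assumed, simple filter).
    Both classes must be present in `labels`.
    """
    scored = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        pos = [v for v, y in zip(column, labels) if y == 1]
        neg = [v for v, y in zip(column, labels) if y == 0]
        spread = pstdev(column)
        # Constant features carry no information and score zero.
        score = abs(mean(pos) - mean(neg)) / spread if spread > 0 else 0.0
        scored.append((score, j))
    # Highest score first; break ties by feature index for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scored[:k]]


# Toy data: 4 patients, 3 features; label 1 = deceased (assumption).
example_rows = [
    [5.0, 1.0, 0.2],
    [5.1, 1.0, 0.9],
    [4.9, 1.0, 0.4],
    [1.0, 1.0, 0.5],
]
example_labels = [1, 1, 1, 0]
example_weights = balanced_class_weights(example_labels)
example_selected = top_k_features(example_rows, example_labels, k=1)
```

In practice, the selection step should be fitted inside each cross-validation fold rather than on the full data set. With roughly 20,000 features and few patients, selecting features before splitting leaks information and inflates accuracy estimates.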
The challenge team is planning to author a manuscript, along with the top performers of the challenge. It would provide an overview of the challenge data and design, and a summary of the submissions from various participating teams. Publication of the manuscript will be contingent on a standard evaluation process including editorial assessment and peer review. The data along with the submitted algorithms, including the winning algorithm, would be shared with the research community.
The top 3 performers will be invited to give a podium presentation and present a poster at the 9th Annual Health Informatics and Data Science Symposium to be held in October at Georgetown University. This collaborative effort with the FDA could enable further discoveries and new hypotheses in translational research through science and open data and algorithm approaches.
Extra Credit Badges
In addition to the evaluation algorithm, badges were awarded to the top 5 performing teams (by overall score) based on the criteria outlined below.
- Model robustness. This badge was awarded to participating teams that had a specificity greater than 0.5 and/or used advanced modeling techniques in SC3.
- Extra credit based on short listed features, for potential use in biomarker research. This extra credit was awarded for using a solution that selected a short list of features out of more than 20,000 features. This short list of features has the potential to be used in biomarker discovery-based applications.
- Extra credit for utilizing domain knowledge, i.e., clinical and/or biological/phenotypic information. This badge was awarded to those teams that did specialized curation or feature selection based on domain knowledge. Using machine learning models in conjunction with domain knowledge tends to produce superior models in bioinformatics.
- Overall documentation, usability and overall presentation of results.
Table 3. Badges awarded to the top 5 performing teams