PrecisionFDA
CDRH Biothreat Challenge


Provide challenge data sets and reference standards for performance comparison of bioinformatics tools used in the biothreat and infectious disease NGS diagnostics community. The focus of this challenge is to enable tool developers to test their algorithms on blinded mock-clinical and in silico metagenomics samples using provided regulatory-grade reference genomes from the FDA-ARGOS database. This will enable the community to look at bioinformatics pipeline performance using a fixed reference genome data standard. The challenge will help familiarize precisionFDA users with the agency’s innovative FDA-ARGOS database resource (www.fda.gov/argos).


  • Starts
    2018-08-04 00:00:00 UTC
  • Ends
    2018-10-19 03:00:00 UTC

The precisionFDA CDRH Biothreat Challenge ran from August 3, 2018 to October 18, 2018. This challenge asked participants to benchmark their detection algorithms on a task to identify and quantify biothreat microorganisms in clinically relevant metagenomics next generation sequencing (NGS) samples. There were 29 valid entries from 11 participants to the challenge.

This results page summarizes the precisionFDA CDRH Biothreat Challenge results in the tables below. As with previous challenges, the truth data and the comparison methodology are novel, so these results offer only a first look. We welcome the community to explore these results further and to share insights for future challenges.

Introductory Remarks

At the start of this challenge, participants were provided with 12 mock clinical and 9 in silico metagenomics sequencing samples. Each sample was a mixture of background (mock matrix) short reads and target (microbe) short reads in a defined proportion. Participants were asked to develop pipelines for detecting (subchallenge 1) and quantifying (subchallenge 2) the target short reads, that is, the microbial composition of each sample. FDA-ARGOS, a regulatory-grade microbial pathogen reference genome database, served as the fixed reference database for all target microbes. For more information, please visit the challenge introduction page.

Overview of Results

Challenge participants ran the mock clinical and in silico metagenomics sequencing samples through their pipeline(s) and returned two TSV files containing the identities and quantities of FDA-ARGOS genomes. Predictions were compared to known species identities and quantities.

Species Identification Subchallenge

For the species identification subchallenge, the area under the precision-recall curve (AUPRC) was computed by comparing the predicted normalized confidence scores for identified species to the known species. Nine AUPRC scores were computed for each submission: an overall AUPRC and 8 sub-group AUPRC scores. The table below describes the 9 AUPRC scores:

Samples             Description
All                 Overall AUPRC Score
In Silico Samples   AUPRC score for in silico samples (C1-9)
Biological Samples  AUPRC score for biological samples (C10-21)
C1-3                AUPRC score for samples containing: Burkholderia thailandensis, Burkholderia mallei, Escherichia coli, Propionibacterium acnes
C4-6                AUPRC score for samples containing: Zika virus, Chikungunya virus, Ross River Valley Virus
C7-9                AUPRC score for samples containing: Ebola Virus
C10-12              AUPRC score for samples containing: Yersinia pestis, Yersinia pseudotuberculosis, Escherichia coli
C13-15              AUPRC score for samples containing: Burkholderia thailandensis
C16-21              AUPRC score for samples containing: Staphylococcus aureus and negative controls; NIST RM 8375 (MG-002 Staphylococcus aureus)
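
As an illustration of how an AUPRC score of this kind can be computed, the sketch below ranks (sample, species) predictions by confidence and accumulates the precision at each true positive, which is the step-wise "average precision" estimate of the area. The function name, the dictionary layout, and the example scores are hypothetical, and the step-wise estimate is an assumption: the exact scoring code and interpolation method used by the challenge are not described on this page.

```python
from typing import Dict, Set, Tuple

Key = Tuple[str, str]  # (sample ID, species name); a hypothetical layout


def average_precision(scores: Dict[Key, float], truth: Set[Key]) -> float:
    """Step-wise estimate of the area under the precision-recall curve."""
    if not truth:
        raise ValueError("truth set must not be empty")
    # Rank predictions from most to least confident.
    # Ties keep their input order because sorted() is stable.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    hits = 0
    area = 0.0
    for rank, (key, _score) in enumerate(ranked, start=1):
        if key in truth:
            hits += 1
            # Each true positive raises recall by 1/len(truth);
            # weight that step by the precision at this rank.
            area += (hits / rank) / len(truth)
    # Known species that were never predicted add nothing to the area.
    return area


# Illustrative values only; these are not taken from the challenge key.
example_scores = {
    ("C1", "Escherichia coli"): 0.9,
    ("C1", "Zika virus"): 0.8,          # false positive in this toy example
    ("C1", "Burkholderia mallei"): 0.7,
}
example_truth = {("C1", "Escherichia coli"), ("C1", "Burkholderia mallei")}

print(round(average_precision(example_scores, example_truth), 4))  # 0.8333
```

Under this sketch, a sub-group score would be obtained by restricting both dictionaries to the samples in that group (for example C1 to C3) before calling the function.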

The table below shows the top 5 submissions for overall species identification, and the 8 sub-group analyses.

Samples     Rank  Participant      Organization  Submission
Overall     1     CosmosID Team    CosmosID      MB_FL_Genome_Identification
Overall     2     CosmosID Team    CosmosID      MB_S1_UF_Sp_2_Genome_Identification
Overall     3     CosmosID Team    CosmosID      MB_S1_UF_Genome_Identification
Overall     4     Nick Greenfield  One Codex     OCX_2_ARGOS_Reference_Genome_Identification
Overall     5     CosmosID Team    CosmosID      MB_S1_UF_Sp_3_Genome_Identification
In Silico   1     CosmosID Team    CosmosID      MB_FL_Genome_Identification
In Silico   2     Chung-Tsai Su    Atgenomix     m4_Genome_Identification
In Silico   3     CosmosID Team    CosmosID      MB_S1_UF_Genome_Identification
In Silico   4     Chung-Tsai Su    Atgenomix     m3_Genome_Identification
In Silico   5     Chung-Tsai Su    Atgenomix     m7_Genome_Identification
Biological  1     Richa Agarwala   NCBI          submit.identification
Biological  2     CosmosID Team    CosmosID      MB_FL_Genome_Identification
Biological  3     CosmosID Team    CosmosID      MB_S1_UF_Sp_2_Genome_Identification
Biological  4     CosmosID Team    CosmosID      MB_S1_UF_Genome_Identification
Biological  5     Nick Greenfield  One Codex     OCX_2_ARGOS_Reference_Genome_Identification
C1-3        1     Chung-Tsai Su    Atgenomix     m5_Genome_Identification
C1-3        2     Chung-Tsai Su    Atgenomix     m7_Genome_Identification
C1-3        3     Chung-Tsai Su    Atgenomix     m8_Genome_Identification
C1-3        4     Chung-Tsai Su    Atgenomix     m10_Genome_Identification
C1-3        5     Chung-Tsai Su    Atgenomix     m6_Genome_Identification
C4-6        1     Chung-Tsai Su    Atgenomix     m4_Genome_Identification
C4-6        2     CosmosID Team    CosmosID      MB_FL_Genome_Identification
C4-6        3     CosmosID Team    CosmosID      MB_S1_UF_Genome_Identification
C4-6        4     CosmosID Team    CosmosID      MB_S1_UF_Sp_2_Genome_Identification
C4-6        5     CosmosID Team    CosmosID      MB_S1_UF_Sp_3_Genome_Identification
C7-9        1     Chung-Tsai Su    Atgenomix     m4_Genome_Identification
C7-9        2     CosmosID Team    CosmosID      MB_FL_Genome_Identification
C7-9        3     CosmosID Team    CosmosID      MB_S1_UF_Genome_Identification
C7-9        4     CosmosID Team    CosmosID      MB_S1_UF_Sp_2_Genome_Identification
C7-9        5     CosmosID Team    CosmosID      MB_S1_UF_Sp_3_Genome_Identification
C10-12      1     Chung-Tsai Su    Atgenomix     m7_Genome_Identification
C10-12      2     Chung-Tsai Su    Atgenomix     m5_Genome_Identification
C10-12      3     Richa Agarwala   NCBI          submit.identification
C10-12      4     CosmosID Team    CosmosID      MB_S1_UF_Genome_Identification
C10-12      5     CosmosID Team    CosmosID      MB_S1_UF_Sp_2_Genome_Identification
C13-15      1     Chung-Tsai Su    Atgenomix     m7_Genome_Identification
C13-15      2     Chung-Tsai Su    Atgenomix     m5_Genome_Identification
C13-15      3     Richa Agarwala   NCBI          submit.identification
C13-15      4     CosmosID Team    CosmosID      MB_S1_UF_Genome_Identification
C13-15      5     CosmosID Team    CosmosID      MB_S1_UF_Sp_2_Genome_Identification
C16-21      1     Richa Agarwala   NCBI          submit.identification
C16-21      2     CosmosID Team    CosmosID      MB_S1_UF_Sp_2_Genome_Identification
C16-21      3     CosmosID Team    CosmosID      MB_FL_Genome_Identification
C16-21      4     Nick Greenfield  One Codex     OCX_2_ARGOS_Reference_Genome_Identification
C16-21      5     CosmosID Team    CosmosID      MB_S1_UF_Sp_3_Genome_Identification

Species Quantification Subchallenge

For the quantification subchallenge, the species quantifications were evaluated based on their agreement with the species composition of samples C1 to C9. Only the in silico samples were used to assess the species quantification results because the exact composition of each in silico sample is known. The Bray-Curtis Dissimilarity Index (BCDI) was used to evaluate the agreement between the predicted and known species quantifications.

  • The Bray-Curtis Dissimilarity Index quantifies the difference in species quantities between two samples. For two samples, i and j, the index BCDI_{ij} is defined as:
    BCDI_{ij} = 1 - 2C_{ij} / (S_i + S_j)
    where C_{ij} is the sum, over the species present in both samples, of the lesser of the two abundances, and S_i and S_j are the total abundances (sample sizes) of samples i and j. When species abundances are expressed as proportions, S_i = S_j = 1, so
    BCDI_{ij} = 1 - 2C_{ij} / 2 = 1 - C_{ij}
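
The general definition above can be sketched in a few lines of Python. This is a minimal illustration, not the challenge's scoring code: the function name, the use of dictionaries keyed by species name, and the example abundances are all assumptions made for the example.

```python
from typing import Dict


def bray_curtis(pred: Dict[str, float], known: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two species-abundance profiles."""
    s_pred = sum(pred.values())    # S_i: total abundance of the prediction
    s_known = sum(known.values())  # S_j: total abundance of the known sample
    if s_pred + s_known == 0:
        raise ValueError("both profiles are empty")
    # C_ij: for every species present in both profiles, take the lesser value.
    shared = pred.keys() & known.keys()
    c = sum(min(pred[sp], known[sp]) for sp in shared)
    return 1.0 - 2.0 * c / (s_pred + s_known)


# Illustrative proportions only; these are not taken from the challenge key.
known_profile = {"Zika virus": 0.5, "Chikungunya virus": 0.5}
predicted_profile = {"Zika virus": 0.75, "Chikungunya virus": 0.25}

print(bray_curtis(predicted_profile, known_profile))  # 0.25
```

A score of 0 means the predicted and known profiles are identical, and a score of 1 means they share no species, so lower values indicate better quantification.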

Four BCDI scores were computed for each submission: an overall BCDI for all in silico samples and 3 sub-group BCDI scores. The table below describes the 4 BCDI scores:

Samples  Description
All      Overall BCDI Score
C1-3     BCDI score for samples containing: Burkholderia thailandensis, Burkholderia mallei, Escherichia coli, Propionibacterium acnes
C4-6     BCDI score for samples containing: Zika virus, Chikungunya virus, Ross River Valley Virus
C7-9     BCDI score for samples containing: Ebola Virus

The table below shows the top 5 submissions for overall species quantification, and the 3 sub-group analyses.

Samples     Rank  Participant      Organization  Submission
Overall     1     Chung-Tsai Su    Atgenomix     m3_Genome_Quantification
Overall     2     Nick Greenfield  One Codex     OCX_ARGOS_Reference_Genome_Quantification
Overall     3     Jonathan Jacobs  QIAGEN        ARGOS_Reference_Genome_Quantification_vSCJJ1
Overall     4     Chung-Tsai Su    Atgenomix     m1_Genome_Quantification
Overall     5     Chung-Tsai Su    Atgenomix     m4_Genome_Quantification
C1-3        1     Chung-Tsai Su    Atgenomix     m3_Genome_Quantification
C1-3        2     Nick Greenfield  One Codex     OCX_ARGOS_Reference_Genome_Quantification
C1-3        3     Jonathan Jacobs  QIAGEN        ARGOS_Reference_Genome_Quantification_vSCJJ1
C1-3        4     Chung-Tsai Su    Atgenomix     m1_Genome_Quantification
C1-3        5     Chung-Tsai Su    Atgenomix     m4_Genome_Quantification
C4-6        1     Chung-Tsai Su    Atgenomix     m4_Genome_Quantification
C4-6        2     Chung-Tsai Su    Atgenomix     m2_Genome_Quantification
C4-6        3     CosmosID Team    CosmosID      MB_UC_UF_Genome_Quantification
C4-6        4     CosmosID Team    CosmosID      MB_FL_Genome_Quantification
C4-6        5     CosmosID Team    CosmosID      MB_S1_UF_Sp_2_Genome_Quantification
C7-9        1     Nick Greenfield  One Codex     OCX_ARGOS_Reference_Genome_Quantification
C7-9        2     CosmosID Team    CosmosID      MB_FL_Genome_Quantification
C7-9        3     CosmosID Team    CosmosID      MB_S1_UF_Sp_2_Genome_Quantification
C7-9        4     CosmosID Team    CosmosID      MB_S1_UF_Sp_3_Genome_Quantification
C7-9        5     CosmosID Team    CosmosID      MB_S1_UF_Genome_Quantification

Scientific Manuscript

The precisionFDA CDRH Biothreat Challenge team is preparing a scientific manuscript that describes the challenge and its results. Challenge participants will be credited collectively as a consortium author.

Challenge Key

The full challenge key including sample descriptions and known species and quantities for each sample is available here.