BioCompute Object (BCO) App-a-thon
PrecisionFDA is partnering with George Washington University and FDA/CBER HIVE to launch a BioCompute Object (BCO) App-a-thon. Participants will be given the opportunity to enhance standards around reproducibility and documentation of biomedical high-throughput sequencing through BCO creation and conformance. Beginner and advanced tracks will be available for all participant levels.
2019-05-14 14:55:14 UTC
2019-10-18 14:55:26 UTC
PrecisionFDA partnered with George Washington University and FDA/CBER HIVE to launch the BioCompute Object (BCO) App-a-thon, which ran from May 14, 2019 to October 18, 2019. The beginner track asked participants to create BCOs, and the advanced track asked them to create applications that support creating and displaying BCOs and testing their conformance to the specification; participants could enter either or both tracks. In total, there were 28 submissions to the beginner track and 3 submissions to the advanced track.
Results are displayed and discussed below. As with previous challenges, these results offer a first look at our understanding. We invite the community to explore these results further and provide additional insights for the future.
This app-a-thon consisted of two submission tracks, beginner and advanced. In the beginner track, participants submitted BCOs that were judged on basic qualifications, including conformance to the current BCO specification, and on adequate documentation. Applications submitted to the advanced track were judged on basic functional requirements, including the ability to check BCO conformance and reproducibility. The app-a-thon was successful in gauging beginners' aptitude for creating BCOs, and the results indicate that the specification is not too complex for bioinformatics novices to use. Some of the insights gleaned from the app-a-thon were incorporated into version 1.4 of the specification, such as the explicit demarcation of required fields.
Overview of Results
Each submission was blindly assigned to one of four reviewers, who evaluated the BCO in three sections: basic qualifications, documentation, and bonus. The scoring criteria for these sections were described on the challenge page:
Basic qualifications. The BCO must:
- Include provenance, prerequisites, and scripts
- Conform to specification version 1.3.0
- Be clear to the evaluator
Documentation:
- The pipeline input should be clear to the evaluator
- The pipeline steps should be clear to the evaluator
Bonus:
- Points will be awarded for the creative use of keywords, the Usability Domain, the Extension Domain, and the Error Domain
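The basic qualification and documentation criteria above can be turned into a quick presence check. The sketch below is a minimal illustration, not the official conformance test: it assumes BCO 1.3.0 field names (`provenance_domain`, `execution_domain.software_prerequisites`, `execution_domain.script`, `usability_domain`, `io_domain.input_subdomain`, `description_domain.pipeline_steps`), which should be confirmed against the specification, and it does not validate against the JSON Schema, so it cannot establish full conformance with version 1.3.0. The function name `find_problems` and the example BCO are hypothetical.

```python
from typing import Any

# Field names below are ASSUMED to follow BCO 1.3.0; verify against the specification.


def find_problems(bco: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means no gaps were found."""
    problems: list[str] = []

    # Basic qualifications: provenance, prerequisites, and scripts.
    if not bco.get("provenance_domain"):
        problems.append("missing provenance_domain")
    execution = bco.get("execution_domain") or {}
    if not execution.get("software_prerequisites"):
        problems.append("missing execution_domain.software_prerequisites")
    if not execution.get("script"):
        problems.append("missing execution_domain.script")

    # Documentation: stated purpose, clear pipeline input, named and described steps.
    if not bco.get("usability_domain"):
        problems.append("empty usability_domain")
    if not (bco.get("io_domain") or {}).get("input_subdomain"):
        problems.append("missing io_domain.input_subdomain")
    steps = (bco.get("description_domain") or {}).get("pipeline_steps") or []
    if not steps:
        problems.append("no description_domain.pipeline_steps")
    for i, step in enumerate(steps, start=1):
        for field in ("name", "description"):
            if not step.get(field):
                problems.append(f"pipeline step {i} lacks '{field}'")
    return problems


# Hypothetical example BCO with placeholder values.
example = {
    "provenance_domain": {"name": "Example pipeline", "version": "1.0"},
    "usability_domain": ["Hypothetical purpose statement."],
    "io_domain": {"input_subdomain": [{"uri": {"uri": "file:///data/reads.fastq"}}]},
    "execution_domain": {
        "script": [{"uri": {"uri": "file:///pipeline/run.sh"}}],
        "software_prerequisites": [{"name": "aligner", "version": "1.0"}],
    },
    "description_domain": {
        "pipeline_steps": [{"step_number": 1, "name": "align", "description": "Align reads."}]
    },
}
print(find_problems(example))  # []
print(find_problems({}))       # six problems, one per missing item
```

Reporting every gap at once, rather than stopping at the first, mirrors how a reviewer would list all the issues in a submission.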
Three top performers were selected based on total score. Table 1 names the top three performers and shows their overall scores as well as their scores within each section. Clicking a top performer’s name links directly to their BCO.
Table 1: Top Performers
| Name | Score Total (out of 40) | *Basic Qualifications Score (out of 25) | **Documentation Score (out of 5) | ***Bonus Score (out of 10) |
|---|---|---|---|---|
*Basic qualifications scores were based on conformance to the specification, inclusion of provenance, prerequisites, and scripts, and documentation of a valid pipeline.
**Documentation scores were based on the inclusion of a Usability Domain describing the purpose of the pipeline, links to relevant references and materials, and a Description Domain with clearly named and documented pipeline steps.
***Bonus scores were awarded for creative use of keywords; creative use of the Usability, Extension, and Error Domains; and reproducibility of the BCO.
Table 2 provides descriptive statistics for all 28 beginner-track submissions, overall and within each section. The sections differed in the number of possible points, for a total of 40 possible points overall.
Table 2: Summary of Results
| | Basic Qualifications | Documentation | Bonus | Total |
|---|---|---|---|---|
| Number of Submissions | 28 | 28 | 28 | 28 |
| Total Possible Points | 25 | 5 | 10 | 40 |
Figure 1 displays score trends for each section. Scores were normalized by dividing each submission's score in a section (and its total score) by the points possible for that section (or by the 40 possible points overall). The basic qualifications and documentation sections were combined into one section worth the majority of the possible points (30 points, or 75% of all possible points). In Figure 1(A), each submission was assigned a random number to anonymize it for reporting. The total (green line) is positively correlated with both the combined basic qualifications and documentation section (blue line; r=0.933, p=5.18e-13) and the bonus section (orange line; r=0.701, p=3.27e-5). The combined basic qualifications and documentation section does not have a strong positive correlation with the bonus section (r=0.4, p=0.04). Overall, the top-performing submissions scored high in all three sections, but the bonus section set the top performers apart from the rest of the submissions. Panels B, C, and D show the distribution of scores for each section as well as for the total. In general, submissions performed well in the combined basic qualifications and documentation section (B). Most submissions did not perform well in the bonus section (C). Overall, the submissions with lower total scores may indicate participants who are newest to the BCO concept, or to standards conformance in general (D).
Figure 1: Score Trends Across Each Section
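The normalization behind Figure 1 amounts to dividing each raw score by the points available for that section, with basic qualifications and documentation pooled into one 30-point section. The sketch below illustrates that arithmetic and a plain Pearson correlation, assuming the reported r values are Pearson coefficients; the function names and the three scored submissions are hypothetical and are not the app-a-thon data. In practice, `scipy.stats.pearsonr` returns both r and a p-value.

```python
from math import sqrt

# Section maxima from Table 2.
MAX_POINTS = {"basic": 25, "documentation": 5, "bonus": 10, "total": 40}


def normalize(raw: dict[str, float]) -> dict[str, float]:
    """Convert raw section scores to fractions of the possible points."""
    combined = raw["basic"] + raw["documentation"]
    total = combined + raw["bonus"]
    return {
        # Basic qualifications + documentation: 30 of the 40 points (75%).
        "basic_plus_documentation": combined / (MAX_POINTS["basic"] + MAX_POINTS["documentation"]),
        "bonus": raw["bonus"] / MAX_POINTS["bonus"],
        "total": total / MAX_POINTS["total"],
    }


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical raw scores for three submissions (NOT the app-a-thon results).
submissions = [
    {"basic": 25, "documentation": 5, "bonus": 8},
    {"basic": 20, "documentation": 4, "bonus": 2},
    {"basic": 15, "documentation": 3, "bonus": 1},
]
normalized = [normalize(s) for s in submissions]
r = pearson_r(
    [n["basic_plus_documentation"] for n in normalized],
    [n["total"] for n in normalized],
)
print(f"r = {r:.3f}")
```

Because the total contains the combined section, a high correlation between the two is expected by construction; the weaker correlation with the bonus section is the more informative signal.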
Take-Aways and Lessons Learned
- Top performers went above and beyond the specification. For example, an average submission linked to the paper(s) from which the pipeline was taken, whereas top performers also linked to other resources intended to help readers develop a deeper understanding of concepts relevant to the paper or pipeline.
- Basic qualifications scores were generally high, indicating that most users did not struggle with the fundamental concepts of the BCO specification.
- Few users pursued bonus points, suggesting that users prefer to focus on the core content. We suspect that expanded use of BCOs will come with more complex and nuanced pipelines that require more in-depth documentation, and therefore expanded use of additional elements, including the Error Domain (e.g., for explicitly demarcating the detection abilities or false positive rate of a pipeline) and the Extension Domain (e.g., for incorporating additional or user-defined schemas not already inherent to the base BCO).
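As an illustration of the optional domains mentioned above, the sketch below attaches an Error Domain and an Extension Domain to an existing BCO represented as a Python dictionary. It assumes the BCO 1.3.0 layout, in which `error_domain` holds `empirical_error` and `algorithmic_error` objects and each `extension_domain` entry names its schema in `extension_schema`; the helper name, the schema URL, and all field values are hypothetical placeholders.

```python
import copy
from typing import Any


def add_optional_domains(
    bco: dict[str, Any],
    error_domain: dict[str, Any],
    extensions: list[dict[str, Any]],
) -> dict[str, Any]:
    """Return a deep copy of `bco` with an Error Domain and Extension Domain attached."""
    for i, ext in enumerate(extensions):
        # ASSUMPTION: each extension points at the schema defining its extra fields.
        if "extension_schema" not in ext:
            raise ValueError(f"extension {i} has no 'extension_schema'")
    updated = copy.deepcopy(bco)  # leave the caller's BCO untouched
    updated["error_domain"] = error_domain
    updated["extension_domain"] = extensions
    return updated


# Placeholder content only; replace with values measured for your pipeline.
error = {
    "empirical_error": {"false_positive_rate": "<measured value>"},
    "algorithmic_error": {"detection_limit": "<description of detection abilities>"},
}
ext = [{
    "extension_schema": "https://example.org/schemas/my-extension.json",  # hypothetical URL
    "lab_protocol_id": "<hypothetical user-defined field>",
}]
bco = add_optional_domains({"provenance_domain": {"name": "Example pipeline"}}, error, ext)
```

Returning a copy instead of mutating the input keeps the original BCO intact, which matters if it has already been checksummed or shared.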
The advanced track submissions were evaluated in five sections: basic qualifications, functionality, documentation, usability and aesthetics, and bonus. As described on the challenge page, to meet the basic qualifications an application should be able to:
- Create a BCO
- Check a BCO for conformance
- Display a BCO
The criteria for each of the other sections are an extension of these basic requirements.
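The three basic capabilities listed above (create, check, display) can be sketched in a few functions. This is a minimal, hypothetical illustration, not how any submitted application works: the domain names are assumed from BCO 1.3.0, the "check" is only a stand-in that reports missing domains (a real conformance check would validate against the official JSON Schema), and `create_bco`, `missing_domains`, and `display_bco` are invented names.

```python
import json

# ASSUMED BCO 1.3.0 domain names; confirm against the specification before use.
DOMAINS = ("provenance_domain", "usability_domain", "description_domain",
           "execution_domain", "io_domain")


def create_bco(name: str, version: str, purpose: str) -> dict:
    """Create a skeleton BCO with empty domains for the user to fill in."""
    return {
        "provenance_domain": {"name": name, "version": version},
        "usability_domain": [purpose],
        "description_domain": {"keywords": [], "pipeline_steps": []},
        "execution_domain": {"script": [], "software_prerequisites": []},
        "io_domain": {"input_subdomain": [], "output_subdomain": []},
    }


def missing_domains(bco: dict) -> list[str]:
    """Crude stand-in for conformance checking: list assumed domains that are absent."""
    return [d for d in DOMAINS if d not in bco]


def display_bco(bco: dict) -> str:
    """Render a short plain-text summary of a BCO."""
    prov = bco.get("provenance_domain", {})
    lines = [f"{prov.get('name', '(unnamed)')} v{prov.get('version', '?')}"]
    for purpose in bco.get("usability_domain", []):
        lines.append(f"  Purpose: {purpose}")
    for step in bco.get("description_domain", {}).get("pipeline_steps", []):
        lines.append(f"  Step {step.get('step_number', '?')}: {step.get('name', '')}")
    return "\n".join(lines)


bco = create_bco("Example pipeline", "1.0", "Hypothetical purpose statement.")
bco["description_domain"]["pipeline_steps"].append(
    {"step_number": 1, "name": "align", "description": "Align reads."})
text = json.dumps(bco, indent=2)          # serialize for saving or sharing
print(missing_domains(json.loads(text)))  # []
print(display_bco(bco))
```

Keeping creation, checking, and display as separate functions lets a web front end or a command-line tool reuse the same logic, which reflects the range of interfaces seen in the submissions.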
A single top performer was selected for the advanced track based on the evaluation criteria:
- Seven Bridges Genomics team: Soner Koc, Nan Xiao, and Dennis Dean
- Seven Bridges Genomics BCO App (GitHub): https://github.com/sbg/bco-app
- Seven Bridges Genomics BCO App User Manual: https://sbg.github.io/bco-app
There was a great deal of variability in the function and creativity of the three advanced track submissions, which ranged from a web application to command-line tools that produced static results.
The goal was to develop applications that lower the barrier to entry for those unfamiliar with standards compliance, JSON, or other aspects of BCOs. The instructions and basic requirements given to the applicants described the desired final outcome but did not specify how to achieve it.
The tools were differentiated by their ability to combine correct function, a simple user experience, and the aesthetics of the tool interface (hinted at in the instructions, but not explicitly required). The submitted tools differed markedly in how they used documentation, which execution environment they relied on, and their overall aesthetics. This variability in how teams interpret a BCO tool is one of the main benefits of app-a-thons. Instructions for future app-a-thons may include a more refined set of “best practices” (or outright guidelines) for building tools for submission, while still leaving room for creative interpretation.
Take-Aways and Lessons Learned
- Reviewers felt that documentation and reproducibility – or the ability to install and run the application – were the most important components of each of the submissions.
- All of the submissions provided great detail on important analysis pipelines, but we did not find specific Parametric Domain data for any of them. Description of parameters is among the most important criteria for an actual regulatory submission. However, the Parametric Domain is required only for parameters changed from their defaults, so it can legitimately be empty (for example, when every tool in a pipeline is used in a “canned” way without changing any parameters). As a result, it is currently not possible to know whether these domains were omitted accidentally. The best practices guidelines have therefore been revised to suggest that a BCO generator indicate in the Usability Domain when the Parametric Domain is intentionally left blank.
- Each submitter is encouraged to continue to develop these applications and keep them available to the community.
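The revised best practice above can be automated when a BCO is generated. The sketch below, which assumes BCO 1.3.0 field names and uses hypothetical note wording and a hypothetical helper name, records in the Usability Domain that an empty Parametric Domain is intentional. The guidelines do not prescribe exact text for this note.

```python
from typing import Any

# HYPOTHETICAL wording; the best practices do not prescribe exact text.
BLANK_PARAMETRIC_NOTE = (
    "The Parametric Domain is intentionally empty: all tools were run "
    "with their default parameters."
)


def annotate_blank_parametric(bco: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `bco` whose Usability Domain states that an empty
    Parametric Domain is intentional. BCOs with parameters are returned unchanged."""
    updated = dict(bco)  # shallow copy; lists we modify are replaced, not mutated
    if not updated.get("parametric_domain"):
        updated["parametric_domain"] = []
        usability = list(updated.get("usability_domain", []))
        if BLANK_PARAMETRIC_NOTE not in usability:  # avoid duplicate notes on reruns
            usability.append(BLANK_PARAMETRIC_NOTE)
        updated["usability_domain"] = usability
    return updated


annotated = annotate_blank_parametric({"usability_domain": ["Hypothetical purpose statement."]})
print(annotated["usability_domain"])
```

The helper cannot tell an accidental omission from an intentional one, so it should only be called after the author has confirmed that no parameters were changed from their defaults.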
BMC Genomics staff support the submission of a paper describing the results of the BCO App-a-thon and broadly applicable insights that emerge from it. Publication of the manuscript will be contingent on a standard evaluation process including editorial assessment and peer review.
Participants of this app-a-thon were also awarded badges in additional categories. For the Functional, Documentation, and Usability categories, participants could receive a bronze, silver, or gold badge based on their score. For the all-or-nothing categories "All the Basics" and "Extra Credit," participants could receive a platinum badge. A list of participant badges is shown below.