BioCompute Object (BCO) App-a-thon
PrecisionFDA is partnering with George Washington University and FDA/CBER HIVE to launch a BioCompute Object (BCO) App-a-thon. Participants will be given the opportunity to enhance standards around reproducibility and documentation of biomedical high-throughput sequencing through BCO creation and conformance. Beginner and advanced tracks will be available for all participant levels.
2019-05-14 14:55:14 UTC
2019-10-18 14:55:26 UTC
Like scientific laboratory experiments, bioinformatics analysis results and interpretation are faced with reproducibility challenges due to the variability in multiple computational parameters, including input format, prerequisites, platform dependencies, and more. Even small changes in these computational parameters may have a large impact on the results and carry big implications for their scientific validity. Because there are currently no standardized schemas for reporting computational scientific workflows and parameters together with their results, the ways in which these workflows are communicated are highly variable and often incomplete, making the workflows difficult or impossible to reproduce.
The US Food and Drug Administration (FDA) High Performance Virtual Environment (HIVE) group and George Washington University (GW) have partnered to establish a framework for community-based standards development and harmonization of high-throughput sequencing (HTS) computations and data formats based around a schema termed a BioCompute. A workflow description that adheres to the BioCompute specification, called a BioCompute Object (BCO), is a record of a bioinformatics pipeline containing all the necessary information to understand or repeat an entire pipeline, and includes additional metadata to identify provenance and usage. BCOs are of interest to a wide group of users including 1) researchers to reproduce computational results more accurately, both within and between labs, 2) clinical laboratories offering ‘omics tests for precision medicine, and 3) pharma/biotech companies submitting data for regulatory review. BCOs have unique identifiers, can be made congruent with other standards (e.g. CWL or GA4GH), and can be created in open source databases for publication or secured for private communication (e.g. HIPAA or FDA submission). BCOs can help build transparency into the process of HTS analysis and substantially improve reproducibility. They are built on open data and open standards, so anyone, anywhere is free to contribute. Greater usage will help strengthen standards, so a major goal of this App-a-thon is to encourage greater use of BCOs and to lower the barrier to entry.
A valid BCO includes the following information in a JSON file format:
- Information about parameters, version, steps, dependencies, and prerequisites of the executable programs in a pipeline
- Reference to input and output test data for verification of the pipeline
- A usability domain in plain language to describe the purpose
- A list of agents involved and other important metadata
- An error domain for setting the bounds of detection (such as false positive rate)
- A user-defined extension
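As a rough illustration of the list above, the sketch below assembles a skeletal BCO as a Python dictionary and serializes it to JSON. The domain key names follow BioCompute naming as best understood here, but the identifier fields, the inner field names, and every value are placeholders and assumptions, not a copy of the specification. Check the exact required fields against the specification before relying on any of it.

```python
import json


def build_skeleton_bco():
    """Return a skeletal BCO-like dictionary; every value is a placeholder."""
    return {
        # Identifier fields: names assumed here, confirm against the specification.
        "bco_id": "https://example.org/bco/EXAMPLE_000001",
        "bco_spec_version": "1.3.0",
        # Agents involved and other metadata.
        "provenance_domain": {
            "name": "Example two-step pipeline",
            "version": "1.0",
            "contributors": [{"name": "Jane Doe", "contribution": ["createdBy"]}],
        },
        # Plain-language purpose.
        "usability_domain": ["Demonstrates the overall shape of a BCO."],
        # Steps, versions, and prerequisites of the programs in the pipeline.
        "description_domain": {
            "pipeline_steps": [
                {"step_number": 1, "name": "align_reads", "version": "0.1"},
                {"step_number": 2, "name": "call_variants", "version": "0.1"},
            ],
        },
        "execution_domain": {"software_prerequisites": []},
        "parametric_domain": [{"step": "1", "param": "seed", "value": "14"}],
        # References to input and output test data.
        "io_domain": {"input_subdomain": [], "output_subdomain": []},
        # Bounds of detection, such as a false positive rate.
        "error_domain": {"empirical_error": {"false_positive_rate": 0.01}},
        # User-defined extension, left empty in this sketch.
        "extension_domain": {},
    }


bco = build_skeleton_bco()
bco_json = json.dumps(bco, indent=2)
print(bco_json)
```

Building the object as a plain dictionary keeps the sketch independent of any particular BCO tooling; the printed JSON is what would be saved as the submission file.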
This App-a-thon will consist of two submission tracks, beginner and advanced. Each submission must consist of at least one complete BCO, which will be judged, in part, on adherence to the specification listed below.
For those who wish to experiment with the user-defined structured domains (Error Domain and Extension Domain) please see instructions on the PrecisionFDA BCO App-a-thon GitHub: https://github.com/biocompute-objects/PrecisionFDA_App-a-thon.
Entries for the advanced track must also develop an application that supports the creation, display, and conformance testing of a BCO to the BioCompute standard. A README explaining the application’s function and implementation must also be included.
Beginner track details
App-a-thon participants will create a BCO that documents a bioinformatics pipeline which must contain more than one step. Participants may submit up to three entries. Submitted BCOs must conform to the BCO specifications (error domain and extension domain are optional), and participants must be able to show it publicly. Evaluation of each submission will be done on the BCO and not the pipeline, but all pipeline components must be clear to the reviewer. Additional evaluation criteria will be the ability of the reviewer to understand the BCO, and inclusion of provenance and prerequisites.
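A participant might sanity-check the "more than one step" requirement before submitting. The following is a minimal sketch, assuming the pipeline steps are stored as a list under `description_domain` → `pipeline_steps`; the helper names `count_pipeline_steps` and `is_multi_step` are hypothetical and not part of any BioCompute tool.

```python
import json


def count_pipeline_steps(bco):
    """Count the steps listed under description_domain.pipeline_steps (assumed layout)."""
    description = bco.get("description_domain")
    if not isinstance(description, dict):
        return 0
    steps = description.get("pipeline_steps", [])
    return len(steps) if isinstance(steps, list) else 0


def is_multi_step(bco):
    """True when the BCO documents more than one pipeline step."""
    return count_pipeline_steps(bco) > 1


# Illustrative input; a real check would load the submission file instead.
example_text = """
{"description_domain": {"pipeline_steps": [
    {"step_number": 1, "name": "trim"},
    {"step_number": 2, "name": "align"}
]}}
"""
example_bco = json.loads(example_text)
print(is_multi_step(example_bco))
```

This only counts entries; it does not judge whether the steps are clear to a reviewer, which remains a matter for the evaluation panel.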
For those less familiar with bioinformatics analyses, Galaxy (https://usegalaxy.org/) is a publicly available and free site that can be used to build a pipeline without downloading software. If participants require data to run through their pipeline, one possible source for data is the 1000 Genomes Project, which provides genomic data for human samples (http://www.internationalgenome.org/home). The App-a-thon is not limited to genomics-related pipelines. BCOs for any computational and ‘omics pipelines are welcomed.
Beginner track submission details
A valid submission to the beginner track requires:
- BCO that documents a multi-step bioinformatics pipeline in JSON file format
- BCO conforming to the current BioCompute specifications
- BCO that can be shown publicly (does not contain proprietary or sensitive information)
- OPTIONAL: If the submission uses a novel framework for the user-defined structured domains (Error Domain and Extension Domain), please see instructions on the PrecisionFDA BCO App-a-thon GitHub: https://github.com/biocompute-objects/PrecisionFDA_App-a-thon
Before creating a new schema extension, participants are advised to check the existing schema in the repository. The entrants are encouraged to use or modify existing schema that have been submitted. To do this, they can link to the existing extension or fork it to include any further modifications required for their pipeline.
Beginner track evaluation details
Submissions to the beginner track will be evaluated by a panel of BCO experts. Experts will utilize the following criteria to evaluate BCO submissions:
Ability of an FDA reviewer to understand the BCO:
- Pipeline input should be clear (it does not have to be downloadable for evaluation)
- BCO should be clear
- Pipeline steps should be clear
The BCO must:
- Include provenance, prerequisites, scripts
- Conform with specifications (1.3.0)
Points will be awarded for the creative use of keywords, the Usability domain, the Extension domain, and the Error domain.
The spirit of the BCO Challenge is to communicate workflows in a way that is easy to understand. In keeping with this spirit, the more complex a workflow, and the easier it is to understand, the more value it will be assigned by judges. “Creative use” of domains is also in keeping with this spirit: it refers to the way in which all these domains work together coherently and concisely to tell the story of the experiment. An example of one way in which the Usability Domain might be used – in this case to tie together external references – can be found in the User Guide: https://github.com/biocompute-objects/BCO_Specification/blob/master/usability-domain.md. Well-annotated BCOs that allow a user to understand and tweak various parameters for similar but not identical purposes will be given additional points.
Advanced track details
Submissions for this track will be an application uploaded to the precisionFDA site (or code for an application) supporting the creation and display of a BCO, and the ability to check conformance against the BioCompute specification, as well as a user manual. It is acceptable to build from existing tools if proper attribution is given. If a submission is fully integrated into an existing platform, submitters should provide a way for the evaluation team to access the application (e.g. a URL to a page where the application can be tested). Advanced track participants will receive a pre-curated BCO and a randomly sourced BCO submission from the beginner track as input. Reviewers will evaluate the tool based on ease of use, aesthetic appeal, ability of the tool to correctly spot errors in specification compliance, and the quality of the user manual (README).
Advanced track submission details
A valid submission to the advanced track requires:
- A BCO that meets all the criteria of the beginner track
- A tool with the ability to do the following:
  - Create a BCO
  - Check a BCO for conformance
  - Display a BCO
- A detailed README accompanying the tool explaining its intended use, installation, and any other pertinent info (such as error or help messages, dependencies, etc.)
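The three tool capabilities listed above could be sketched, in very reduced form, as three functions. This is an illustration under stated assumptions, not a conforming implementation: the function names are hypothetical, the domain key names are assumed, and the set of required domains simply treats everything except the error and extension domains as mandatory, following the track rules. A real tool would more likely validate against the published BioCompute JSON schema.

```python
# Domains treated as required in this sketch (assumption based on the track rules,
# under which only the error and extension domains are optional).
REQUIRED_DOMAINS = (
    "provenance_domain",
    "usability_domain",
    "description_domain",
    "execution_domain",
    "parametric_domain",
    "io_domain",
)


def create_bco(name, purpose, step_names):
    """Create a bare BCO-like dictionary from minimal user input."""
    return {
        "provenance_domain": {"name": name},
        "usability_domain": [purpose],
        "description_domain": {
            "pipeline_steps": [
                {"step_number": number, "name": step}
                for number, step in enumerate(step_names, start=1)
            ]
        },
        "execution_domain": {},
        "parametric_domain": [],
        "io_domain": {},
    }


def check_conformance(bco):
    """Return a list of problems; an empty list means none were found."""
    if not isinstance(bco, dict):
        return ["top level is not a JSON object"]
    return [f"missing {domain}" for domain in REQUIRED_DOMAINS if domain not in bco]


def display_bco(bco):
    """Render a short plain-text summary of the BCO."""
    name = bco.get("provenance_domain", {}).get("name", "(unnamed)")
    lines = [f"BCO: {name}"]
    for text in bco.get("usability_domain", []):
        lines.append(f"  Purpose: {text}")
    for step in bco.get("description_domain", {}).get("pipeline_steps", []):
        lines.append(f"  Step {step.get('step_number')}: {step.get('name')}")
    return "\n".join(lines)


bco = create_bco("Demo pipeline", "Illustrate the three functions.", ["trim", "align"])
print(display_bco(bco))
print(check_conformance(bco))

# Remove one domain to show how a deviation is reported.
broken = {key: value for key, value in bco.items() if key != "io_domain"}
print(check_conformance(broken))
```

Returning a list of problems rather than a single pass/fail value lets the tool report every deviation it finds, which matches the evaluation emphasis on accurately identifying deviations from the specification.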
The application can be directly uploaded on the precisionFDA website using the following resources:
- https://youtu.be/f-DBLB2v1sM (how to create an application)
- https://youtu.be/4-voYR-q-cw (Application import)
- https://precision.fda.gov/docs/creating_apps (general rules and tips)
Advanced track evaluation details
Experts will utilize the following criteria to evaluate coding application submissions:
- Ease of use
- Aesthetic appeal
- Can handle user input and make a BCO
- Can display a BCO (points awarded for creativity of the display)
- Can check conformance of a BCO for both a pre-curated BCO and a randomly sourced BCO submission from the beginner track (points awarded for accurately identifying deviations from the specification with no errors)
- If integrated in a platform, is able to reproduce native pipelines
- A User manual/README sufficient to execute the program
For submissions that are integrated into a platform (e.g., Galaxy), a large amount of information can be automatically captured from a workflow created on that platform without any manual input from the user. It is up to the participant to determine how much information is automated on their own platform. However, some fields will require manual input (e.g., the Usability domain containing the purpose of the experiment). Points will be awarded for additional functionality (e.g., comparing a specific output file to the Error Domain).
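One way the "comparing a specific output file to the Error Domain" idea might look is sketched below. It assumes, purely for illustration, that the error domain stores numeric upper bounds under `error_domain` → `empirical_error` and that the pipeline writes its measured metrics to a JSON output file with matching keys. Both the layout and the helper name `find_error_violations` are hypothetical.

```python
import json


def find_error_violations(bco, measured):
    """Return metrics whose measured value exceeds the bound in the error domain.

    The result maps each violating metric to a (measured, bound) tuple.
    """
    bounds = bco.get("error_domain", {}).get("empirical_error", {})
    violations = {}
    for metric, limit in bounds.items():
        value = measured.get(metric)
        # Metrics absent from the output file are skipped rather than flagged.
        if value is not None and value > limit:
            violations[metric] = (value, limit)
    return violations


# Illustrative BCO fragment and output-file contents; all numbers are made up.
bco = {"error_domain": {"empirical_error": {"false_positive_rate": 0.05}}}
output_file_text = '{"false_positive_rate": 0.08, "reads_processed": 1000}'
measured = json.loads(output_file_text)

print(find_error_violations(bco, measured))
```

Treating the stored values as upper bounds is a simplification; an actual error domain may describe its limits differently, so the comparison rule would need to follow whatever structure the BCO author chose.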
The George Washington University HIVE Lab: https://hive.biochemistry.gwu.edu/home
The FDA HIVE GitHub: https://github.com/FDA/fda-hive
The BioCompute Objects website: http://www.biocomputeobject.org
The BioCompute Objects portal: https://hive.biochemistry.gwu.edu/biocompute
How to build and run a pipeline (presentation slides): https://hive.biochemistry.gwu.edu/prd/htscsrs/content/slideDecks/13_Addepalli_Durga_Session2A.pdf
Frequently Asked Questions
My lab has a lot of people. Do we all have to submit separate BCOs, or may we submit as a team?
You may submit as a team; this is commonly done in challenges and app-a-thons. The BCO can be submitted under one precisionFDA username.
Do I need to submit anything other than the BCO for the beginner track (e.g., data and code for you to run the pipeline)?
Beginner track submissions require only the BCO. Data and code for your pipeline are not necessary; however, your pipeline steps must be clear to the reviewer.
Does my pipeline need to be a Next Generation Sequencing Pipeline?
No. Any computational or 'omics pipeline is welcome!
Can I just send you a GitHub link or do you want zipped code?
Code just needs to be packaged in an asset (along with any dependencies), with a precisionFDA application created that calls your code. If a submission is fully integrated into an existing platform, submitters should provide a way for the evaluation team to access the application (e.g. a URL to a page where the application can be tested).
What kind of documentation needs to be included for apps?
A user manual (README) that explains the tool's intended use, installation, and any other pertinent information.
Can I submit to both tracks?
Yes, but your application will be evaluated on a pre-curated BCO and a randomly sourced BCO submission from the beginner track, not on your own BCO.
What software licenses are allowed?
Any open source license that allows us to publicly display the tool is acceptable.
- PrecisionFDA: Elaine Johanson, Ruth Bandler
- FDA CBER: Elaine Thompson
- FDA CDRH: Zivana Tezak
- George Washington University: Jonathon Keeney, Hadley King
- DNAnexus: John Didion
- Booz Allen: Zeke Maier, Holly Stephens, Sarah Prezek, Marissa Wiener, Sean Watford