PrecisionFDA
COVID-19 Precision Immunology App-a-thon


Participants will develop applications and pipelines to illuminate the relationship between personalized adaptive immunity molecular data and COVID-19 disease variables.


  • Starts
    2020-11-30 16:15:27 UTC
  • Ends
    2021-01-29 20:23:45 UTC



The Food and Drug Administration (FDA) calls on the scientific and analytics community to develop innovative applications to explore the relationship between personalized immune repertoires and COVID-19 disease variables and associated factors.

Challenge Time Period

November 30, 2020 – January 29, 2021

AT A GLANCE

Motivation: The novel coronavirus disease 2019 (COVID-19), a respiratory disease caused by a new type of coronavirus known as “severe acute respiratory syndrome coronavirus 2” (SARS-CoV-2), was declared a global pandemic by the World Health Organization on March 11, 2020. To date, the Johns Hopkins University COVID-19 dashboard reports over 62 million confirmed cases worldwide, with disease severity ranging from asymptomatic infection to death (over 1.46 million deaths). To combat the widespread transmission of COVID-19 and save lives, especially those of vulnerable individuals, it is imperative to better understand its pathophysiology and to use rapidly shared data to enable effective diagnosis, prognosis, and treatment strategies.

Immunology: There are two types of immune responses: innate and adaptive. This app-a-thon focuses on adaptive immunity.

  • The innate response is the body’s first, non-specific reaction to a foreign invader (e.g., a pathogen entering through a laceration). It consists of physical barriers such as skin and mucous membranes, and of immune cells such as neutrophils, macrophages, and monocytes that migrate to the injury site to kill the invader.
  • The adaptive response is highly specific: it is triggered by recognition of an antigen, a molecule presented on the surface of a foreign (non-self) invader during an infection. Major components of the adaptive response include T cells, B cells, immunoglobulins (antibodies), and the Human Leukocyte Antigen (HLA) system. HLA is the human form of the major histocompatibility complex (MHC), a group of genes that code for cell-surface proteins that help the immune system recognize foreign invaders.
  • T cell receptors (TCRs) are protein complexes on the surface of T cells responsible for recognizing antigens bound to MHC molecules. Similarly, B cell receptors (BCRs) are protein complexes on the surface of B cells. When BCRs bind their cognate antigens, B cells are activated; they then proliferate and differentiate into a population of antibody-secreting plasma B cells and memory B cells. Thus, understanding an individual’s diverse TCR and BCR repertoires, as well as HLA type (antigen presenting) and immunoglobulin H (IgH antibody) states, is important in accelerating COVID-19 research.
  • In this regard, current literature reports specific TCR/BCR signatures and HLA types that correlate with differential COVID-19 outcomes (1-4). A list of immunology term definitions is provided here for reference. In addition, a brief description of the adaptive immune response is included in the YouTube video (see Data).

Current State of Science: The recent advent of high-throughput sequencing of lymphocyte antigen receptor genes (responsible for antigen recognition by T and B cells) has enabled the exploration of adaptive immune responses to infection. With these opportunities come significant challenges in leveraging the analysis techniques that accurately reflect underlying biology and identify correlations between disease characteristics and personalized adaptive immunity. In particular, BCR analysis harbors additional complexities, such as antibody gene somatic hypermutation (high frequency of DNA alterations) and class switch recombination (e.g., antibody production switching from type M to A) during B cell maturation. The current COVID-19 worldwide pandemic prompted rapid data sharing in the scientific community (e.g., iReceptor with growing publications and datasets), providing unprecedented opportunities for developing innovative research tools and methodologies urgently needed to accelerate our understanding of the disease and aid in our fight against this virus.

App-a-thon Structure: There are two phases. Phase I focuses on extracting knowledge from TCR/BCR molecular sequencing data and associated clinical variables. Specifically, the research tools and workflows developed are anticipated to facilitate our understanding of the relationship between personalized immune repertoires and COVID-19 disease characteristics, such as severity and associated factors. Phase I comprises two tracks designed to attract participants from the general data science community (DS) and the computational immunology community (CI). Participants are encouraged to participate in either or both tracks.

For DS participants (participants not required to possess immunology expertise):

  • Participants are to generate tools and pipelines capable of elucidating meaningful associations between sequence annotation data (e.g., TCR and BCR repertoire annotations such as v_call or j_call) and COVID-19 disease variables, features, and associated factors (e.g., disease severity, prior medical history, current therapies).
  • Bonus points will be awarded (optional) for algorithms/apps that analyze additional datasets (e.g., HLA Class information from Microsoft/Adaptive Biotechnologies when available, or additional immune repertoire data) to gain more biological insights.
  • Participants are expected to build data exploration tools that enable further statistical exploration of features. A typical exploration tool would allow immunologists with limited coding expertise to perform independent analyses.
  • Participants are expected to provide more than basic summary statistics, i.e., to go beyond what is already provided by the iReceptor Gateway (primarily counts). The tools should support comparison of study information and other general cross-study and intra-study statistical analyses for feature engineering.
  • A tool can be a command line tool, Jupyter notebook, or full stack application. An ideal tool will be modular and lower the barrier for coding novices.
  • The tools can be submitted as a Dockerfile or image or as a compressed source code repository (see the Submitting and Running Your Apps on precisionFDA section for details) and must be reproducible according to the provided documentation. If the source code is available in a public repository, we ask participants to provide the link.
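
As one illustration of going beyond counts, the sketch below computes a Pearson chi-square statistic for the association between V gene usage and disease severity. It is a minimal standard-library Python sketch, not a recommended method: the records are made-up toy data, and the `severity` field name and its labels are assumptions. Only `v_call` is a field name that appears in the challenge data.

```python
from collections import Counter


def contingency(records, row_key, col_key):
    """Count co-occurrences of two categorical fields across records."""
    return Counter((r[row_key], r[col_key]) for r in records)


def chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    total = sum(table.values())
    row_tot = {r: sum(table[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(table[(r, c)] for r in rows) for c in cols}
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / total
            stat += (table[(r, c)] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof


# Toy records (hypothetical): one row per rearrangement with its V gene call
# and the severity label of the donor.
records = (
    [{"v_call": "TRBV20-1", "severity": "severe"}] * 3
    + [{"v_call": "TRBV20-1", "severity": "mild"}] * 1
    + [{"v_call": "TRBV5-1", "severity": "severe"}] * 1
    + [{"v_call": "TRBV5-1", "severity": "mild"}] * 3
)

table = contingency(records, "v_call", "severity")
stat, dof = chi_square(table)
print(f"chi-square = {stat:.2f}, dof = {dof}")
```

Rearrangements from the same repertoire are not independent observations, so a real analysis would likely aggregate per repertoire or per subject before testing.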

For CI participants:

  • In addition to our expectations of the DS track, you are asked to perform additional tasks that include but are not limited to:
  1. Creating pipelines to harmonize data generated from different TCR and BCR alignment and assembly pipelines (e.g., by considering library generation techniques, PCR primers);
  2. Creating pipelines to harmonize data generated from a variety of existing HLA prediction tools such as commercial software;
  3. Enabling analysis of a patient’s longitudinal data to investigate the relationship between TCR/BCR expression persistence with sustained immunity (optional).
  • Similarly, expected outputs are:
    • A tool can be a command line tool, Jupyter notebook, or full stack application. An ideal tool will be modular and lower the barrier for coding novices.
    • The tools can be submitted as a Dockerfile or image or as a compressed source code repository (see the Submitting and Running Your Apps on precisionFDA section for details) and must be reproducible according to the provided documentation. If the source code is available in a public repository, we ask participants to provide the link.
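
One small piece of the harmonization task can be sketched as follows: reducing allele-level gene calls from different annotation pipelines to a common gene-level form so they can be compared. This is a minimal Python sketch under stated assumptions: calls are assumed to follow the gene*allele convention (as in `IGHV1-1*01`), ambiguous calls are assumed to be comma-separated, and the two pipeline outputs are hypothetical. Real harmonization would also need to account for primers, library preparation, and reference database versions.

```python
def to_gene_level(call):
    """Reduce an allele-level call such as 'IGHV1-1*01' to its gene name."""
    return call.split("*", 1)[0].strip()


def harmonize_calls(raw):
    """Split a possibly ambiguous, comma-separated call into sorted unique gene names."""
    genes = {to_gene_level(part) for part in raw.split(",") if part.strip()}
    return sorted(genes)


# Hypothetical outputs from two annotation pipelines for the same read.
pipeline_a = "IGHV1-1*01"
pipeline_b = "IGHV1-1*02, IGHV1-1*01"

print(harmonize_calls(pipeline_a))
print(harmonize_calls(pipeline_b))
print(harmonize_calls(pipeline_a) == harmonize_calls(pipeline_b))
```

Comparing at the gene level trades allele resolution for agreement between tools; whether that trade is acceptable depends on the analysis.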

Time Periods: Phase I will launch on Nov. 30, 2020 and run through Jan. 29, 2021. Submissions for both tracks will be evaluated according to consensus criteria and a scoring rubric established by expert judges with diverse scientific backgrounds.

In Phase II of this app-a-thon, we will ask participants (prior participants or those new to the app-a-thon) to create research tools and apps that present the outputs from Phase I in an easily interpretable format for healthcare professionals, lowering the barrier that complex computational analyses pose for physicians. Please stay tuned for more information.

A schematic for the app-a-thon is illustrated below:

CHALLENGE DETAILS

Getting on precisionFDA

If you do not yet have a contributor account on precisionFDA, please file an access request with your complete information and indicate your intent to participate in the app-a-thon. The FDA acts as a steward by providing the precisionFDA service to the community and ensuring proper use of the resources, so your request will be initially pending. Once a request is approved (typically takes 1-2 business days), you will receive another email with your contributor account information.

With your contributor account, you can use the features required to participate in the app-a-thon (such as transferring files or running comparisons). All work performed on precisionFDA is private (not accessible to the FDA or the rest of the community) until you choose to publicize it. Once published, your work will be available for review by the FDA and the precisionFDA community.

Locating and Understanding the Data Files

The datasets provided for both tracks were obtained and aggregated from the iRECEPTOR platform (5) and are linked in the data table as flat files (see Data Table). CI participants may identify additional datasets to help refine their apps/models, explore datasets in the European Nucleotide Archive/Sequence Read Archive to improve and harmonize HLA typing, and, for bonus points, identify longitudinal datasets showing evidence of TCR/BCR expression persistence in patients. You are also encouraged to share your unique datasets with other participants, which would earn you additional bonus points. Furthermore, a video tutorial is provided to ensure a clear understanding of our expectations of the participants, the datasets, and their key characteristics.

Submitting to the App-A-Thon

The submission process is the same for both the DS and CI tracks. The correct naming convention for submissions is “track_team name_submission_submission number”. For example, the submission name “DS_precisionFDA_team_submission_1” indicates that this is the 1st submission from a team named “precisionFDA_team” participating in the DS track. Submissions that do not comply with this convention will not be evaluated. When submitting to the app-a-thon, each submission must include two inputs:

1.  A README containing a description of the application. The README must include:

  • Team or participant name for the submission. A team or participant name is used in reporting the results of the app-a-thon. The precisionFDA team only has access to the username of the account associated with the submission. Since many submissions are a team effort, this section of the README can be used to give credit to all individual team members.
  • A description of the submission, which includes detailed information on how to install and run any part of the tool. If not explicitly provided in the README, the location of the instructions on how to install and run the tool must be referenced within the documentation. Note that part of evaluation of the tool is ease of use. If the tool cannot be installed and/or run from provided documentation, the tool will not be evaluated. More information on what to provide in this section of the README is provided in the “Evaluation Criteria” below.

2.  A single input for the tool and all associated files. The data files should not be part of a submission; however, the Extract, Transform, Load (ETL) process applied to the raw files provided for the app-a-thon must be reproducible according to the documentation. If you identify and use additional datasets, you must share links to these datasets publicly by posting them on the Precision Immunology App-a-thon Discussion forum.
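
Because non-compliant submission names are not evaluated, it may be worth checking a name before submitting. The following is a minimal Python sketch of such a check, assuming the format described above (track, team name, the literal word "submission", and a positive submission number, joined by underscores). The function name and the regular expression are illustrative and not part of any precisionFDA tooling.

```python
import re

# Assumed pattern: <track>_<team name>_submission_<number>, with track DS or CI.
NAME_PATTERN = re.compile(r"^(DS|CI)_(.+)_submission_([1-9]\d*)$")


def parse_submission_name(name):
    """Return (track, team, number) or raise ValueError if the name is malformed."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"submission name does not follow the convention: {name!r}")
    track, team, number = match.groups()
    return track, team, int(number)


print(parse_submission_name("DS_precisionFDA_team_submission_1"))
```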

(Optional) Creating your App on precisionFDA

A precisionFDA app is defined at: https://precision.fda.gov/docs/apps. If you intend to create an app on precisionFDA, please review the instructional video here. There are other acceptable ways to submit your tools (see Submitting and Running your App on precisionFDA below).

(Optional) Submitting and Running your App on precisionFDA

See Submitting to the app-a-thon above: the submission page should have at least a README and generic inputs for compressed source code/docker image/Jupyter notebook/compressed project directory, etc. The submission apps should prepend the submission names to all input files. Participants should NOT submit apps to precisionFDA using pre-compiled binaries and array inputs. To run an app on precisionFDA, see this video tutorial.

Evaluation Criteria

DS track

Submissions to this track will be scored on two overall categories, basic qualifications and pipelines (impact and innovation), with optional bonus points available.

Basic qualifications must include:

  • A README.md is provided that:
    • References the data used by the tool. It’s not a requirement to have the data available in the submission; and
    • Provides a basic description of the tool; and
    • Properly explains the intended use, installation, and any other pertinent information that would enable the tool to be reproduced by others.
  • The tool provides capabilities beyond counts and summary statistics similar to that already provided by the iReceptor Gateway.

The tools will also be scored based on their impact and innovation.

For a tool to be impactful, it can be any or a combination of the following:

  • Be capable of analyzing a variety of large and future datasets that contribute to a better understanding of COVID-19 pathophysiology;
  • Enable a non-coding immunologist to understand and analyze the datasets;
  • Be modular with generic modules that would fit any dataset, as well as specific modules relevant to the immunology dataset;
  • Demonstrate efficient data management, i.e., the documentation must address how the tool would handle new datasets coming from iReceptor Gateway or another resource.

For a tool to be innovative, it can be any or a combination of the following:

  • Be a user-friendly analysis method that lowers the entry barrier for immunologists;
  • Be interactive and automate some typical approaches of data exploration pipelines;
  • Improve on similar approaches (e.g., clustering or correlation steps).

In addition to the basic evaluation criteria, we will be awarding bonus points (optional) for tools that 1) are particularly fast and easy to utilize, and/or 2) harmonize and improve various HLA typing algorithms.

CI track

Similar to the DS track, submissions to this track will be scored based on two overall categories including basic qualifications and pipelines.

Basic qualifications must include:

  • A README.md is provided that:
    • References the data used by the tool. It’s not a requirement to have the data available in the submission; and
    • Provides a basic description of the tool; and
    • Properly explains the intended use, installation, and any other pertinent information that would enable the tool to be reproduced by others.
  • The tool provides capabilities beyond counts and summary statistics similar to that already provided by the iReceptor Gateway.

Tools submitted to this track will also be scored on impact and innovation.

For a tool to be impactful, it can be any or a combination of the following:

  • Harmonize data generated using different TCR/BCR alignment and assembly pipelines;
  • Identify known germline or expressed TCR/BCR repertoires from short reads;
  • Be capable of analyzing a variety of large and future datasets to contribute to a better understanding of COVID-19 pathophysiology.

For a tool to be innovative, it can be any or a combination of the following:

  • Be interactive and reduce human intervention efforts (e.g., automating some typical approaches of data exploration pipelines);
  • Leverage complex data;
  • Improve or expand on similar approaches.

In addition to the basic evaluation criteria, we will be awarding bonus points (optional) for tools that 1) are particularly fast and easy to use, 2) enable analysis of longitudinal data to demonstrate persistent expression of TCR/BCR repertoires, and/or 3) harmonize data generated by various HLA typing prediction software.
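
For the longitudinal bonus item, one simple way to quantify persistence is the fraction of clonotypes observed at an earlier timepoint that reappear at a later one. The Python sketch below assumes clonotypes are identified by their `junction_aa` string alone; the sequences and timepoint names are made-up toy values, and a real analysis would likely also consider V/J gene usage, sequencing depth, and clone abundance.

```python
def persistence(earlier, later):
    """Fraction of distinct clonotypes at the earlier timepoint also seen later."""
    earlier_set, later_set = set(earlier), set(later)
    if not earlier_set:
        return 0.0
    return len(earlier_set & later_set) / len(earlier_set)


# Toy junction_aa values (hypothetical) for one patient at two timepoints.
day_0 = ["CASSLGQGAYEQYF", "CASSPTGGYEQYF", "CASSQDRGNTEAFF", "CASSLAGGNEQFF"]
day_60 = ["CASSLGQGAYEQYF", "CASSPTGGYEQYF", "CASSQDRGNTEAFF", "CASSFSGTDTQYF"]

print(f"persistence = {persistence(day_0, day_60):.2f}")
```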

Opportunities for Top Performers

Selected participants will be publicly recognized and invited to contribute to a scientific manuscript describing the app-a-thon and methodologies/results. Selected participants may also have opportunities to present at a conference (TBD; e.g., DREAM satellite conference, ISCB conference), and continue solution development with the Precision Immunology App-a-thon team.

ADDITIONAL INFORMATION

Please use ONLY the Precision Immunology App-a-thon discussion on the precisionFDA Discussions Forum, and post links there to any additional datasets used to create your app.

Frequently Asked Questions

1. Question: How long does it take to convert my guest account to a contributor account?

Answer: Account approval typically takes 1-2 business days. PrecisionFDA administrators will provision your contributor account automatically upon review. Therefore, it is unnecessary to email precisionFDA after you receive an initial email about your guest account.

2. Question: Am I allowed to submit multiple entries to this app-a-thon?

Answer: If you participate in a track using a single approach/methodology, please flag only your final submission for evaluation (one submission per methodology per track). However, you may submit more than one entry to the same track, up to 5, if each uses a different methodology. The submission number should be included in the submission name to differentiate the submissions. For example, the first submission to the DS track can be named “DS_precisionFDA_team_submission_1” and a second submission to the same track would be “DS_precisionFDA_team_submission_2”. You are permitted to submit entries to both tracks provided that you follow the rules above for each track.

3. Question: Am I required to use all the sequence-relevant data provided by precisionFDA?

Answer: If you participate in the DS track, you must use at least one sequence-relevant annotation (such as v_call or junction_aa) and its relationship with clinical variables. You are highly encouraged to use all the sequence annotation information, as it provides a complete picture of a person’s immune repertoires. If you are a CI participant, you are expected to use all sequence data relevant to clinical variables.

Challenge Team

  • PrecisionFDA: Emily Boja, Elaine Johanson
  • Booz Allen Hamilton: Doug Deer, Zeke Maier, Anish Prasanna, Holly Stephens, Sean Watford
  • DNAnexus: Ben Busby, Omar Serang, Sam Westreich

Data Table

Datasets by track:

DS track
  • TCR: Preloaded in precisionFDA (see the video tutorial for a description). Study_ids ImmuneCODE-COVID-Release-002: COVID-19-Adaptive and ImmuneCODE-COVID-Release-002: COVID-19-ISB are here (metadata) and here (annotated sequence).
  • HLA: ----
  • Bonus Points (optional): Microsoft/Adaptive Biotechnologies datasets. Obtain an account at ImmunoSeq ANALYSER prior to accessing data.

CI track
  • TCR: iReceptor API. Example datasets containing full sequences for study_id ImmuneCODE-COVID-Release-002: COVID-19-Adaptive are linked here (metadata) and here (annotated sequence). iRECEPTOR account activation is required. Please strictly follow the specific instructions on API query limitations (below).
  • HLA: A variety of data from the European Nucleotide Archive/Sequence Read Archive for benchmarking of algorithms. Example: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA274775/
  • Bonus Points (optional): Longitudinal patient datasets to demonstrate persistent expression of TCR/BCR repertoires associated with immunity (e.g., study_id IR-BINDER dataset in iRECEPTOR).

Specific iRECEPTOR API Query Instructions

  1. One query at a time per user on all API endpoints. iRECEPTOR is not set up for massively parallel queries, which would have a performance impact on the production Gateway.
  2. Limits on the fields that can be searched on the /rearrangement endpoint. Exact-match searches on the rearrangement fields listed below are acceptable, but might still take minutes to complete. Users should restrict themselves to exact-match queries on these fields; for any other kind of search, it is better to work from downloaded data:
    • repertoire_id, v_call
    • repertoire_id, d_call
    • repertoire_id, j_call
    • repertoire_id, v_gene
    • repertoire_id, d_gene
    • repertoire_id, j_gene
    • repertoire_id, v_subgroup
    • repertoire_id, d_subgroup
    • repertoire_id, j_subgroup
    • repertoire_id, junction_aa
    • repertoire_id, junction_aa_length

An example of a v_call filter on the /rearrangement endpoint would look like this:

{
  "filters": {
    "op": "and",
    "content": [
      { "op": "=", "content": { "field": "repertoire_id", "value": "4" } },
      { "op": "=", "content": { "field": "v_call", "value": "IGHV1-1*01" } }
    ]
  }
}
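
A filter of this shape can also be built programmatically, which makes it easy to reject fields outside the permitted list before any request is sent. The Python sketch below only constructs and prints the JSON payload; it does not contact iRECEPTOR, and the function name and the `ALLOWED_FIELDS` constant are illustrative assumptions rather than part of the iRECEPTOR API. The field names are those listed in the query instructions above.

```python
import json

# Fields that may be combined with repertoire_id in an exact-match query,
# taken from the list in the query instructions.
ALLOWED_FIELDS = {
    "v_call", "d_call", "j_call",
    "v_gene", "d_gene", "j_gene",
    "v_subgroup", "d_subgroup", "j_subgroup",
    "junction_aa", "junction_aa_length",
}


def build_rearrangement_filter(repertoire_id, field, value):
    """Build an exact-match filter on repertoire_id plus one permitted field."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"field {field!r} is not in the permitted list")
    return {
        "filters": {
            "op": "and",
            "content": [
                {"op": "=", "content": {"field": "repertoire_id", "value": repertoire_id}},
                {"op": "=", "content": {"field": field, "value": value}},
            ],
        }
    }


payload = build_rearrangement_filter("4", "v_call", "IGHV1-1*01")
print(json.dumps(payload, indent=2))
```

When issuing such queries, send them one at a time and wait for each response before sending the next, in line with the first instruction above.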

ADDITIONAL RESOURCES

Additional datasets CI track participants may consider:

YouTube videos relevant to Immunology for background:

References

1. Levantovsky, Rachel, and Verena van der Heide. "Shared CD8+ T cell receptors for SARS-CoV-2." Nature Reviews Immunology (2020): 1-1.

2. Lorente, L., et al. "HLA genetic polymorphisms and prognosis of patients with COVID-19." Medicina intensiva (2020).

3. Tomita, Yusuke, et al. "Association between HLA gene polymorphisms and mortality of COVID‐19: An in silico analysis." Immunity, Inflammation and Disease.

4. Schultheiß, Christoph, et al. "Next-generation sequencing of T and B cell receptor repertoires from COVID-19 patients showed signatures associated with severity of disease." Immunity 53.2 (2020): 442-455.

5. Corrie, Brian D., et al. "iReceptor: A platform for querying and analyzing antibody/B‐cell and T‐cell receptor repertoire data across federated repositories." Immunological reviews 284.1 (2018): 24-41 (https://b-t.cr/t/publicly-available-covid-19-airr-seq-data-sets/849).