To create a new app, click "Create App" in the Apps page. The following section walks you through important concepts of app development.
Apps have a machine-readable name that cannot contain spaces (such as "bwa-freebayes") and a human-readable title (such as "BWA-MEM and FreeBayes"). Among apps that you create, names need to be unique (you cannot author two distinct apps with the same name). This restriction is only per-user, meaning that you can still create an app with the same name as someone else's app. In fact, the system encourages you to use someone else's app as a starting point, make further tweaks, and save the result as your own app (a process called "forking" an app). This model was inspired by GitHub repositories.
Apps require an input/output specification, which mandates what inputs they need from the user, and what outputs they are expected to generate. Note that an "input" is anything that needs to be received from the user and which can potentially vary between executions. These can be not only input files but also numerical or boolean values, and strings. In that sense, the "inputs" can be used both for receiving data to operate on as well as receiving configuration parameters. Each input field has the following properties:
|Class||The kind of input. There are exactly five classes supported: file, string, integer, float and boolean.|
|Name||A machine-readable name for this input (no spaces allowed). The system will create a shell variable named after this, for your script to use.|
|Label||A human-readable label for this input. The system uses this to render the form that users see when launching the app.|
|Help text||Additional help text describing what this input field is about. The system shows this help text in the app details page ("spec" tab), and upon hovering on an input during app launch.|
|Default value||A default value that this field will be pre-filled with when users launch the app. (You are not required to provide defaults; do so only if you need to guide users in choosing the right values.)|
|Choices||A set of comma-separated values denoting the only permitted values for this field. If such choices are provided, the user must choose one of them using a drop-down menu and can't write in their own value.|
|Optional?||Whether this field is optional or required. When launching an app, users must fill all required fields before they can continue.|
Input spec example
Let's consider an app which takes a BED file with genomic intervals, and extends each interval's coordinates by adding a fixed amount of padding on both sides. Here's an example of input spec:
|Property||Value for 1st input||Value for 2nd input|
|Label||BED file with intervals||Padding amount to add|
|Help text||The BED file whose genomic intervals will be extended.||The number of base pairs to extend each interval along both directions.|
The output specification is similar to the input specification (but with no default values). When creating an app, you specify what kind of outputs your app is expected to create, and define names and labels for them. When your script runs, it is responsible for generating the respective outputs. If an output is marked as optional, your script is not required to produce it. See the app shell script section for more information.
Output spec example
Continuing the example above, here is a possible output specification for the app:
|Property||Value for 1st output|
|Label||Padded BED result|
|Help text||The generated BED file with the padded genomic intervals.|
Apps run inside a virtual machine (VM), a computer on the cloud with a specific environment. When authoring an app, you have the opportunity to configure the environment according to your needs, using the "VM Environment" tab.
By default, apps do not have access to the Internet. Removing Internet access ensures that apps cannot communicate with the outside world over the Internet -- this increases user comfort and lowers the barriers for users to try out apps. If your app requires Internet access (for example, to communicate with a third-party database over the Internet, to fetch files from URLs, or to fetch and install external software at runtime), you can enable it in this tab.
The default instance type denotes the particular hardware configuration that the app will run on. Each instance type comes with a specific amount of memory, number of CPU cores, and hard disk storage. See the section on available instance types below for more information. Although you can choose a default one in the "VM Environment" tab, users can still override the default choice when launching the app. This is useful if you have a single app that can work for both small inputs (such as an exome) and large inputs (such as a whole genome).
The operating system of the virtual machine is Ubuntu 14.04, with several preinstalled packages. If your app requires additional Ubuntu packages, you can specify so in the "VM Environment" tab. For example, if your app needs Java, we recommend adding the "openjdk-7-jre-headless" package. If you are unsure as to what a certain package is called, you can use the packages.ubuntu.com website to locate packages (make sure to select the "trusty" distribution in the search form, as that is the codename for Ubuntu 14.04). Note that, specifically for Java 8, we support additional packages (such as "openjdk-8-jre-headless") which are not listed on the Ubuntu packages website.
If you need to load additional files onto the virtual machine and have them available to your app's shell script, such as executables, libraries, reference genome files or pretty much any other static files required for your execution, you can use App assets. Assets are tarballs that are uncompressed in the root folder of the virtual machine right before running your app script. The App assets section discusses in detail how to create, manage, and select assets for your app.
The shell script of an app contains the shell code that will run inside the virtual machine. The script runs as root. During the script execution, the default working directory (home directory) is /work. For more information about the shell variables available to your script, and the handling of app inputs and outputs from your script, consult the App script section.
To summarize, here is what happens when your app is launched:
|1||A new virtual machine with Ubuntu 14.04 and these preinstalled packages is initialized.|
|2||Additional Ubuntu packages are installed per your app's spec.|
|3||Your app's assets are fetched and uncompressed in the root folder.|
|4||The job's input files are downloaded in subfolders under the
|5||Shell variables are populated according to your job's inputs.|
|6||Your app's shell script is executed.|
The precisionFDA system supports the following hardware configurations (instance types) for apps to run on:
|Instance type||# of CPU cores||Memory||Hard Disk Storage|
|Baseline 2||2||3.8 GB||32 GB|
|Baseline 4||4||7.5 GB||80 GB|
|Baseline 8||8||15 GB||160 GB|
|Baseline 16||16||30 GB||320 GB|
|Baseline 32||32||60 GB||640 GB|
|High Mem 2||2||15 GB||32 GB|
|High Mem 4||4||30.5 GB||80 GB|
|High Mem 8||8||61 GB||160 GB|
|High Mem 16||16||122 GB||320 GB|
|High Mem 32||32||244 GB||640 GB|
|High Disk 2||2||3.8 GB||160 GB|
|High Disk 4||4||7.5 GB||320 GB|
|High Disk 8||8||15 GB||640 GB|
|High Disk 16||16||30 GB||1280 GB|
|High Disk 36||36||60 GB||2880 GB|
App assets are the building blocks of apps. They are tarballs (file archives), which get uncompressed in the root folder of the virtual machine before the app script starts to run. They can contain executables (such as bioinformatics tools), static data (such as reference genomes and index files) or pretty much anything else that is required for an app to run.
Just like regular files, app assets can be either private or publicly contributed to the precisionFDA community. Your app can choose among any accessible assets (whether private or public).
To help get you started, the precisionFDA team has contributed a few popular app assets that you can include in your app's environment. The table below lists some examples of such public app assets:
|grch37-fasta||The GRCh37 reference genome FASTA file (
|bwa-grch37||The GRCh37 reference genome, indexed for BWA.|
When editing an app, in the "VM Environment" tab, you will see a list of assets that have been selected for inclusion in the app's virtual machine. You can remove assets by hovering over them and clicking the "X" button on the right hand side. You can select additional assets by clicking the "Select assets" button, which will pop up the asset selector.
The selector lists all available assets on the left hand side. Clicking on the name of an asset, or on the checkbox next to it, will select that asset for inclusion. Clicking on the whitespace surrounding the asset name, or on the right-pointing arrow next to the asset name, will display information about the asset (but not toggle the selection). Each asset comes with documentation, which is meant to describe what the asset is and how it can be used. In addition, the system displays a list of all files that are found inside the tarball.
We understand that asset names may not always be indicative of their contents; for example, many people would recognize tabix as the executable that indexes VCF files, but fewer people would recognize htslib as the asset containing that executable. For this reason, the precisionFDA system includes a feature that allows you to search filenames across all assets. In the asset selector, type a search keyword (such as tabix) in the upper left corner. The asset list will be filtered to show you assets which include that file (such as htslib), as well as assets whose name starts with that prefix.
To upload your own assets, or to perform more detailed asset management (such as downloading an asset to take a look at it yourself, or deleting an asset you've previously uploaded), click "Manage your assets" from either the asset selector or the "VM Environment" tab (or "Manage Assets" from the Apps listing page). You will be taken to a page listing all the precisionFDA assets (your private ones, and all public ones). Click on an asset's name to see asset details, and to perform actions such as download, delete, or edit its readme. Click "Create Assets" at the top to be presented with instructions on how to upload your own assets. The next section discusses the process in detail.
To upload an asset, you must first prepare the files that will be included in the tarball archive. On your computer, start by creating a "fake root" folder and by assembling your files underneath it.
Since the asset will be uncompressed in the root folder on the cloud, it is important for the tarball to contain the proper subfolders inside of it. If an asset tarball does not have any subfolders, then its files will be placed directly inside the root folder (i.e. in /), which is not typically desired.
Therefore, create the usr/bin subfolder under the "fake root" and place any binaries there, and create the work subfolder for any working directory files. Since your app's script starts its execution inside /work, any files you place under that folder will be readily accessible. For example, if your asset includes a file /work/GenomeAnalysisTK.jar, you can use it inside your script without any other folder designation, i.e. like this: java -jar GenomeAnalysisTK.jar
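The layout described above can be sketched in a scratch directory before any upload takes place. This is only an illustration: the my-tool script and the empty GenomeAnalysisTK.jar file are placeholders standing in for your real binaries and data.

```shell
# Assemble a fake root in a scratch directory (all file contents are placeholders).
fake_root=$(mktemp -d)
mkdir -p "$fake_root/usr/bin" "$fake_root/work"

# A stand-in executable under usr/bin and a stand-in file under work.
printf '#!/bin/sh\necho "hello from my-tool"\n' > "$fake_root/usr/bin/my-tool"
chmod +x "$fake_root/usr/bin/my-tool"
: > "$fake_root/work/GenomeAnalysisTK.jar"

# List the files relative to the fake root. Dropping the leading "." gives
# the path each file would have on the virtual machine.
layout=$(cd "$fake_root" && find . -type f | sort)
echo "$layout"
```

Listing the files relative to the fake root is a quick way to check that nothing would land directly in / once the asset is uncompressed.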
If you need to compile binaries for Ubuntu 14.04, or otherwise experiment with a Linux environment similar to the one that apps run on, download and install the freely available VirtualBox virtualizer. Then, from the "Create Assets" page, download the precisionFDA virtual machine image and double-click it to open it in VirtualBox. Power on the machine and log in as the ubuntu user. This environment contains the same Ubuntu packages as the cloud environment where apps run.
You can also connect to the machine over SSH by running ssh -p 2222 ubuntu@localhost. This will allow you to use your host operating system's copy/paste capabilities, or to transfer files in and out of the VM.
The following table summarizes ways in which you can use the VirtualBox machine to prepare content for inclusion in your fake root:
|To include...||Do this...|
|Complex compilable packages||
Answer Y to the question "create a personal library"
After assembling your fake_root, prepare a Readme file for your asset. This file needs to contain Markdown syntax. Below is an example of the Readme file included with the htslib-1.2.1 public asset: (note the extra two spaces after tabix-1.2.1.html -- this is how you introduce line breaks in markdown)
This asset provides the `bgzip` and `tabix` executables. Include this asset if your app needs to compress and index a VCF file.

### Example usage

The following produces `file.vcf.gz` and `file.vcf.tbi`:

```
bgzip file.vcf
tabix -p vcf file.vcf.gz
```

### Links

http://www.htslib.org/doc/tabix-1.2.1.html  
https://github.com/samtools/htslib/releases/tag/1.2.1
Download the precisionFDA uploader by clicking the respective button in the "Create Assets" page. The downloaded archive contains a single Python script, pfda, which you can run to upload the asset. (NOTE: It requires Python 2.7, as well as the Python 'requests' and 'futures' packages, so ensure you have those available in your environment.)
The tool requires an "authorization key" in order to authenticate the client against the precisionFDA system. You can get a key by clicking the respective link in the "Add Assets" page. Copy the key from that page and paste it in the command below where it says KEY. For your security, the key is valid for 24 hours.
./pfda upload-asset --auth KEY --name my-asset.tar.gz --root /path/to/fake_root --readme my-asset.txt
This command will archive the contents of the fake root into the named tarball, and upload it to precisionFDA along with the contents of the readme file. The tarball name must end in either .tar.gz (as in the command above) or .tar (in which latter case it will not be compressed).
The uploader stores the key in $HOME/.config/precision-fda/config.json, so after you have run it once, you don't need to specify the key in subsequent invocations.
When creating an app, the "Script" tab provides you with an editor where you can write the shell script that will be executed. The script will run as root, inside the /work folder (which is also set as the home directory during execution). The script is sourced from inside bash, so you don't need to include a #!/bin/bash header; any such header will be ignored. On precisionFDA, bash runs the script with the set -e -x -o pipefail options.
App inputs are handled in the following way:
- For string, integer, float and boolean inputs, the system defines a shell variable with the same name. Its value is set to whatever value the user provided for that input (or empty, if that input is optional and no value was provided)
- For files, the system downloads each file input under /work/in/field/filename. For instance, in the example we gave earlier, if a user provides a file called trusight.bed for the input field intervals, the system will download the file into /work/in/intervals/trusight.bed. In addition, the following variables are defined:
The unique system id (i.e. file-Bk0kjkQ0ZP01x1KJqQyqJ7yq) of whatever file was assigned for that field.
The full file path, i.e.
The filename without its suffix (and if its suffix is ".gz", without its second suffix, i.e. without ".tar.gz", ".vcf.gz", or ".fastq.gz").
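The prefix rule described above can be mimicked outside the platform to see which names it yields. The derive_prefix function below is a hypothetical helper written for illustration. On precisionFDA the system sets these variables for you, and its own logic may differ in edge cases.

```shell
# Hypothetical re-implementation of the prefix rule described above.
# The body runs in a subshell so that "name" does not leak into the caller.
derive_prefix() (
    name=$(basename "$1")
    case "$name" in
        *.gz) name=${name%.gz} ;;    # drop a trailing ".gz" first
    esac
    echo "${name%.*}"                # then drop the remaining suffix
)

# Values matching the running example (input field "intervals").
intervals_path="/work/in/intervals/trusight.bed"
intervals_prefix=$(derive_prefix "$intervals_path")
echo "$intervals_prefix"

# A compressed file loses both suffixes.
vcf_prefix=$(derive_prefix sample.vcf.gz)
echo "$vcf_prefix"
```

With these inputs the sketch prints trusight and sample, which shows why the prefix is convenient for naming result files.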
Example of system-defined variables
For our example, the system would define the following variables:
The system defines the prefix variable because it can be often used to name results. In our example app, we can name the padded intervals
Your script needs to communicate its outputs back to the system. This is handled via a helper utility called emit. Use it as follows:
- For string, integer, float and boolean outputs, type emit field value. For example, if you've defined an output field called qc_pass of boolean type, use emit qc_pass true to set it to true.
- For file outputs, type emit field filename. This command will upload the particular file from the local hard disk of the virtual machine onto the cloud storage, and assign it to that field.
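Because emit is only available inside the precisionFDA virtual machine, one way to dry-run the output-handling part of a script on your own computer is to define a stand-in. The emit function below is a hypothetical stub that merely logs its arguments. It is not the real utility and uploads nothing. The file name and contents are made up.

```shell
# Hypothetical stand-in for the platform's emit helper (local dry runs only).
emit_log=$(mktemp)
emit() {
    echo "emit: $1=$2" >> "$emit_log"
}

# Produce a made-up result file in a scratch directory.
workdir=$(mktemp -d)
printf 'chr1\t90\t210\n' > "$workdir/trusight.padded.bed"

# Report a file output, then a boolean output that depends on a simple check.
emit padded_intervals "$workdir/trusight.padded.bed"
if [ -s "$workdir/trusight.padded.bed" ]; then
    emit qc_pass true
else
    emit qc_pass false
fi

cat "$emit_log"
```

The stub is defined in the local shell only. Inside the app's script editor you would call emit directly and omit the function definition.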
Example of app script
To put it all together, here is what the script would look like for our example app:
bedtools slop -i "$intervals_path" -g grch37.chrsizes -b "$padding" >"$intervals_prefix".padded.bed
emit padded_intervals "$intervals_prefix".padded.bed
Bash is the shell interpreter that runs your app's shell script. It is the most popular shell interpreter in Linux distributions, and also used to power the OS X Terminal app. In most systems you can reach the bash manual by typing
On precisionFDA, your app's script runs with the set -e -x -o pipefail options. These options have the following effects:
|-e||The script will halt as soon as any command fails.|
|-x||The script will echo every command as it is executed into the output logs.|
|-o pipefail||The script will halt as soon as any command fails in a pipeline of commands.|
The system sets pipefail to ensure that code such as zcat file.vcf.gz | head >vcf-header.txt would fail if the input file was corrupted and could not be uncompressed. Without pipefail, a failure in the first part (zcat) of the pipeline would not cause this command to fail, so your script would have continued running. However, this means that you must be careful not to include any commands which may return non-zero exit status in your script. For example, grep chr1 some_file | wc -l >chr1-counts.txt would fail if there are no chr1 entries in some_file, instead of outputting the number 0 to chr1-counts.txt (because when grep does not find something, it fails). If you are worried about this behavior, you can undo the option via set +o pipefail.
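The grep case above can be reproduced in a scratch file to see the difference pipefail makes. The file contents are made up, and the `|| true` guard shown at the end is one possible workaround rather than a precisionFDA recommendation.

```shell
# Scratch file with no chr1 entries (contents are made up for illustration).
some_file=$(mktemp)
printf 'chr2\t100\t200\n' > "$some_file"

# Without pipefail, the pipeline's status is that of wc, so grep's failure is hidden.
status_default=$(bash -c 'grep chr1 "$1" | wc -l >/dev/null; echo $?' _ "$some_file")

# With pipefail, grep's exit status 1 becomes the status of the whole pipeline.
status_pipefail=$(bash -c 'set -o pipefail; grep chr1 "$1" | wc -l >/dev/null; echo $?' _ "$some_file")

# Possible workaround: tolerate "no match" from grep so that a count of 0 is produced.
count=$(bash -c 'set -e -o pipefail; { grep chr1 "$1" || true; } | wc -l' _ "$some_file" | tr -d ' ')

echo "default=$status_default pipefail=$status_pipefail count=$count"
```

Note that `|| true` also hides genuine grep errors, such as a missing input file, so it trades some safety for convenience. Wrapping only the affected pipeline keeps pipefail in force for the rest of the script.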
When using bash variables that refer to a single unit (such as a filename, or a value that should not be further tokenized or otherwise interpreted on the command line), it is strongly recommended that you enclose such variables within double quotes, i.e. "$file_path" instead of $file_path. This will allow you to handle corner cases such as spaces included in the filename.
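The effect of quoting can be seen by counting how many arguments a command receives. The count_args helper and the file name containing a space are hypothetical, chosen only to trigger the corner case.

```shell
# Helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Hypothetical input path containing a space.
file_path="/work/in/intervals/my intervals.bed"

unquoted=$(count_args $file_path)    # the shell splits the value at the space
quoted=$(count_args "$file_path")    # the value stays a single argument

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

Unquoted, the path reaches the command as two separate words, so a tool would look for two files that do not exist. Quoted, it arrives intact.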
When viewing any app, clicking the "Fork" button will bring up the app editor and initialize it with the specification of the original app. You can make any changes and then save them into a new private app owned by you. (Unlike GitHub, precisionFDA does not keep track of forks, and the operation is always private).
In addition, this feature can be used to take a peek at the insides of an app: just fork it to bring up the editor, and then simply cancel the operation. This allows you to see the app's script, assets, etc.