Initialize new application

In this section, we will go through how to organize new applications.

Before you start

Make sure you have the following:

nextflow

Note

You can install nextflow with

curl -s https://get.nextflow.io | bash

Java 8 or later is required.

An MD5 key file for one UK Biobank application. This is required so that we can download the sample IDs.
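
Before continuing, it can help to confirm that the prerequisites are actually on your PATH. The following is a minimal sketch, not part of the pipeline: the helper names check_cmd and check_prerequisites are hypothetical, and it only tests that the java and nextflow commands can be found, not which versions they are.

```shell
# Report whether a required command is available on PATH.
# Returns 0 if the command is found, 1 otherwise.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Hypothetical helper: check every prerequisite named in this section.
# Returns 1 if at least one command is missing.
check_prerequisites() {
    rc=0
    for cmd in java nextflow; do
        check_cmd "$cmd" || rc=1
    done
    return "$rc"
}
```

Running check_prerequisites prints one line per command. If both are found, java -version and nextflow -version will show the installed versions so you can confirm that Java is 8 or later.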

Generating the application folder

Assuming you are in the UK Biobank directory initialized in the previous section, run the following:

sh scripts/administration/new_application.sh \
    --key <Your key file> \
    --id <UK Biobank application ID>

Data structure

After running the script, a new folder named ukb<application ID> should appear under the application folder. This folder should have the following structure:

ukb<application ID>
  |
  |-- genotyped 
  |
  |-- imputed
  |
  |-- exome
  |    |   
  |    |-- PLINK
  |
  |-- phenotype
  |    |
  |    |-- raw
  |    |    |
  |    |    |-- encrypted
  |    |    |
  |    |    |-- keys
  |    |
  |    |-- withdrawn
  |    |
  |    |-- ukb<application ID>_rel_s488251.dat # This is the relatedness file
  |
  |-- ukb<application ID>_init.log

The genotyped, imputed, and exome folders are only created if your application has permission to access the corresponding data.
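
To see which parts of the layout were actually created for your application, a short check such as the one below may be useful. It is a sketch under stated assumptions: check_layout is a hypothetical helper, the sub-folder names are copied from the tree above, and an "absent" line is not necessarily an error because some folders depend on your access permissions.

```shell
# List which of the expected sub-folders exist under an application folder.
# Usage: check_layout <path to the ukb application folder>
check_layout() {
    app_dir=$1
    for sub in genotyped imputed exome/PLINK \
               phenotype/raw/encrypted phenotype/raw/keys phenotype/withdrawn; do
        if [ -d "$app_dir/$sub" ]; then
            echo "present $sub"
        else
            echo "absent $sub"
        fi
    done
}
```

Call it with the path to the newly created folder, replacing the application ID placeholder with your own ID.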

Getting the phenotype files

UK Biobank does not provide a command line option to download the phenotype files. As such, the delegate of each application will need to download and upload the phenotype files manually. Here we will go through the steps involved.

Downloading phenotype files

The encrypted dataset must be accessed via the UK Biobank Access Management System (AMS).

Note

Only the project's Principal Investigator (PI) or collaborators with delegate access can perform the following steps.

To download a dataset, follow these steps:

Log in to the AMS.
Navigate to the projects tab and click the view button next to the relevant application ID.
Select data tab at the top right, and then click “Go to Showcase to refresh or download data”.
Your dataset will be shown in the dataset tab.
Click on the ID for the dataset you wish to download, which will take you to the authentication screen.
Enter the 32-character MD5 checksum (included in the main body of the notification email for the dataset). Then click Generate.
This will open a new page with a link to your dataset. Click the Fetch button to download the encrypted dataset.
Upload the encrypted file to ukb<application ID>/phenotype/raw/encrypted. Also place the MD5 key in ukb<application ID>/phenotype/raw/keys.
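
Since the download is manual, it is worth checking that the encrypted file arrived intact before uploading it. The sketch below assumes that the 32-character MD5 from the notification email is the checksum of the encrypted file, and that GNU md5sum is installed (on macOS, md5 -q is the usual equivalent). The function name verify_md5 is hypothetical and not part of the pipeline.

```shell
# Compare a file's MD5 digest with an expected 32-character hex string.
# Returns 0 on match, 1 on mismatch, 2 if the expected value is malformed.
verify_md5() {
    file=$1
    # Normalise the expected digest to lower case.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # Reject anything containing non-hexadecimal characters.
    case $expected in
        *[!0-9a-f]*) echo "not a hex MD5: $2" >&2; return 2 ;;
    esac
    # Reject anything that is not exactly 32 characters long.
    if [ "${#expected}" -ne 32 ]; then
        echo "MD5 must be 32 characters" >&2
        return 2
    fi
    # md5sum prints "<digest>  <file name>"; keep only the digest.
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK $file"
    else
        echo "MISMATCH $file (got $actual)" >&2
        return 1
    fi
}
```

Pass the downloaded file as the first argument and the MD5 from the email as the second. Only upload the file once the function reports OK.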

Downloading Primary Care Records

The primary care records are stored in the data portal. To download these data, follow these steps:

Log in to the AMS.
Navigate to the projects tab and click the view button next to the relevant application ID.
Select data tab at the top right, and then click “Go to Showcase to refresh or download data”.
Select data portal from the top right, then click Connect.
Select table download.
Type in the table you want to download and click "Fetch Table".
Note

These tables might be of interest:
- gp_clinical: Clinical event records
- gp_scripts: Prescription records
Click the link to download the table.
Upload the downloaded file(s) to ukb<application ID>/phenotype/raw/.
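
If the downloaded tables are already on the machine that holds the application folder, the final step can be sketched as a small copy helper. This is an illustration only: stage_files is a hypothetical name, and it assumes a plain local copy is sufficient. If the application folder lives on a remote server, transfer the files with whatever tool your site uses instead.

```shell
# Copy downloaded files into a destination folder, skipping missing or empty ones.
# Usage: stage_files <destination folder> <file> [<file> ...]
# Returns 1 if any file was skipped or could not be copied.
stage_files() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    rc=0
    for f in "$@"; do
        # -s is true only for files that exist and are not empty.
        if [ -s "$f" ]; then
            if cp "$f" "$dest/"; then
                echo "staged $(basename "$f")"
            else
                rc=1
            fi
        else
            echo "skipped $f (missing or empty)" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

The destination would be the phenotype/raw folder of your application, followed by the paths of the tables you downloaded. A non-zero return value means at least one file was not copied and the download should be repeated.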

Tips

You can find details of the primary care records here