Initialize new application
In this section, we will go through how to organize new applications.
Before you start
Make sure you have the following:
- nextflow
Note
You can install nextflow with
curl -s https://get.nextflow.io | bash
Java 8 or later is required
- An MD5 key file for one UK Biobank application
  - This is required so that we can download the sample IDs
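Before generating the application folder, it can help to confirm that these prerequisites are in place. The following is a minimal sketch, not part of the repository: the function name check_prerequisites and its messages are hypothetical. It only checks that the tools are on the PATH and that the key file is readable; it does not check the Java version.

```shell
# Hypothetical helper, not part of the repository.
# Usage: check_prerequisites <key file> [tool ...]
# Tools default to nextflow and java. The exit status is non-zero if anything is missing.
check_prerequisites() (
    # The body runs in a subshell, so variables and "exit" stay local to the call.
    [ "$#" -ge 1 ] || { echo "usage: check_prerequisites <key file> [tool ...]" >&2; exit 2; }
    key_file="$1"
    shift
    [ "$#" -gt 0 ] || set -- nextflow java
    status=0
    for tool in "$@"; do
        # command -v succeeds only if the tool can be found on the PATH.
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool"
            status=1
        fi
    done
    if [ ! -f "$key_file" ] || [ ! -r "$key_file" ]; then
        echo "missing or unreadable key file: $key_file"
        status=1
    fi
    exit "$status"
)
```

Calling check_prerequisites with the path to your key file prints one line per missing item and nothing when everything is found.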
Generating the application folder
- Assuming you are in the UK Biobank directory initialized in the previous section, run the following:
sh scripts/administration/new_application.sh \
    --key <Your key file> \
    --id <UK Biobank application ID>
Data structure
After running the script, a new folder named ukb<application ID> should appear under the application folder.
This folder should have the following structure:
ukb<application ID>
|
|-- genotyped
|
|-- imputed
|
|-- exome
| |
| |-- PLINK
|
|-- phenotype
| |
| |-- raw
| | |
| | |-- encrypted
| | |
| | |-- keys
| |
| |-- withdrawn
| |
| |-- ukb<application ID>_rel_s488251.dat # This is the relatedness file
|
|-- ukb<application ID>_init.log
You will only have the genotyped, imputed or exome folder if your application has permission to access them.
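The layout above can be checked with a short script. This is a sketch under assumptions: check_layout is a hypothetical name, and it treats phenotype/withdrawn as a directory, which the tree suggests but does not state. The genotyped, imputed and exome folders are only reported, never treated as errors, because they depend on the application's permissions.

```shell
# Hypothetical helper, not part of the repository.
# Usage: check_layout <path to the ukb<application ID> folder>
check_layout() (
    # The body runs in a subshell, so variables and "exit" stay local to the call.
    app_dir="$1"
    status=0
    # Folders taken from the tree above (withdrawn is assumed to be a directory).
    for sub in phenotype/raw/encrypted phenotype/raw/keys phenotype/withdrawn; do
        if [ ! -d "$app_dir/$sub" ]; then
            echo "missing: $sub"
            status=1
        fi
    done
    # Optional folders: report them without failing.
    for sub in genotyped imputed exome; do
        [ -d "$app_dir/$sub" ] || echo "not present (may be expected): $sub"
    done
    exit "$status"
)
```

The exit status is 0 when all of the phenotype folders exist, regardless of which optional folders are present.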
Getting the phenotype files
UK Biobank does not provide a command line option to download the phenotype files. As such, the delegate of each application will need to download and upload the phenotype files manually. Here we will go through the steps involved.
Downloading phenotype files
The encrypted dataset must be accessed via the UK Biobank Access Management System (AMS).
Note
Only the project Principal Investigator (PI) or collaborators with delegate access can perform the following steps.
To download a dataset, follow these steps:
- Log in to the AMS
- Navigate to the projects tab and click the view button next to the relevant application ID.
- Select the data tab at the top right, and then click “Go to Showcase to refresh or download data”.
- Your dataset will be shown in the dataset tab.
- Click on the ID for the dataset you wish to download, which will take you to the authentication screen.
- Enter the 32-character MD5 checksum (included in the main body of the notification email for the dataset). Then click Generate.
- This will open a new page with a link to your dataset. Click the Fetch button to download the encrypted dataset.
- Upload the encrypted file to ukb<application ID>/phenotype/raw/encrypted. Also put the MD5 key in ukb<application ID>/phenotype/raw/keys.
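The final upload step can be combined with an integrity check. The sketch below assumes that the 32-character MD5 checksum from the notification email is the checksum of the encrypted file, that the key file is a separate file, and that GNU md5sum is available (on macOS the equivalent tool is md5). The function name stage_dataset is hypothetical.

```shell
# Hypothetical helper, not part of the repository.
# Usage: stage_dataset <encrypted file> <key file> <expected MD5> <ukb<application ID> folder>
stage_dataset() (
    # The body runs in a subshell, so variables and "exit" stay local to the call.
    enc_file="$1"
    key_file="$2"
    # Lower-case the expected checksum so that the comparison ignores case.
    expected=$(printf '%s' "$3" | tr 'A-F' 'a-f')
    app_dir="$4"
    # md5sum prints "<checksum>  -" when reading stdin; keep only the checksum.
    actual=$(md5sum < "$enc_file" | cut -d ' ' -f 1)
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch: expected $expected, got $actual" >&2
        exit 1
    fi
    # Copy the files only after the checksum has been confirmed.
    cp "$enc_file" "$app_dir/phenotype/raw/encrypted/" &&
        cp "$key_file" "$app_dir/phenotype/raw/keys/"
)
```

If the checksum does not match, nothing is copied, which avoids placing a truncated or corrupted download in the application folder.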
Downloading Primary Care Records
The primary care records are stored in the data portal. To download these data, follow these steps:
- Log in to the AMS
- Navigate to the projects tab and click the view button next to the relevant application ID.
- Select the data tab at the top right, and then click “Go to Showcase to refresh or download data”.
- Select data portal from the top right, then click Connect.
- Select table download.
- Type in the table you want to download and click "Fetch Table".
Note
These tables might be of interest:
gp_clinical: Clinical event records
gp_scripts: Prescription records
- Click the link to download the table.
- Upload the downloaded file(s) to ukb<application ID>/phenotype/raw/.
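The upload step for the downloaded tables can be sketched as follows. The function name stage_tables is hypothetical, and the script makes no assumption about how the downloaded files are named: it copies whichever files are passed to it and skips any that are empty or missing, which usually indicates a failed download.

```shell
# Hypothetical helper, not part of the repository.
# Usage: stage_tables <ukb<application ID> folder> <table file> [table file ...]
stage_tables() (
    # The body runs in a subshell, so variables and "exit" stay local to the call.
    app_dir="$1"
    shift
    dest="$app_dir/phenotype/raw"
    [ -d "$dest" ] || { echo "no such folder: $dest" >&2; exit 1; }
    for table in "$@"; do
        # -s is true only for files that exist and are not empty.
        if [ ! -s "$table" ]; then
            echo "skipping empty or missing file: $table" >&2
            continue
        fi
        cp "$table" "$dest/" || exit 1
        # Report the line count as a quick sanity check on the download.
        echo "$(basename "$table"): $(wc -l < "$table" | tr -d ' ') lines"
    done
    exit 0
)
```

The line counts are printed to standard output and the warnings to standard error, so the two can be separated when the script is run as part of a larger job.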
Tips
You can find details of the primary care records here