Introduction to UK Biobank
Introduction
The UK Biobank is a large population-based prospective study with detailed health-related and genetic data on about 500,000 individuals, and it is available to the research community. Men and women aged 40-69 years were recruited during 2006-2010 across 22 recruitment centres in the United Kingdom (Figure 1). Participants provided general consent for all types of health research by both academic and commercial researchers, and for follow-up through health-related records.
Figure 1. The UK Biobank cohort recruitment centres (Source: UK Biobank)
The aims of the UK Biobank cohort
- To allow detailed investigations of genetic and non-genetic determinants of disease of middle and old age. UK Biobank provides extensive and precise assessments of exposures with comprehensive follow-up and characterization of many different health-related outcomes.
- To promote innovative science by maximizing access to the resource. UK Biobank is open to bona fide researchers anywhere in the world, including those funded by academia and industry.
Note
The participants' age range in the UK Biobank cohort was a compromise between recruiting people who were still free of disease and observing enough health outcomes during the early years of follow-up. This prospective approach enables risk factors to be measured before disease develops, and therefore (1) avoids reverse causality and recall bias, (2) improves measurement detail, and (3) reduces measurement error.
Participant demographics
- 46% male
- 57% aged 40-59 years; 43% aged 60-69 years
- Less socioeconomically deprived than UK average, but all strata represented
- 85% urban
- 94.5% white; 5.5% other
- 58% in paid employment or self-employed
- 89% recruited in England; 7% in Scotland; 4% in Wales
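The percentages above can be turned into rough head counts. The following is a minimal sketch, assuming a cohort of exactly 500,000 (the introduction says "about 500,000", so the results are approximations only). The names `COHORT_SIZE`, `DEMOGRAPHICS`, `approximate_count` and `summarize` are illustrative and not part of any UK Biobank tool.

```python
# Assumption: the cohort is taken as exactly 500,000 participants,
# although the text only states "about 500,000".
COHORT_SIZE = 500_000

# Percentages copied from the demographics list above.
DEMOGRAPHICS = {
    "male": 46.0,
    "aged 40-59 years": 57.0,
    "aged 60-69 years": 43.0,
    "urban": 85.0,
    "white": 94.5,
    "other": 5.5,
    "recruited in England": 89.0,
    "recruited in Scotland": 7.0,
    "recruited in Wales": 4.0,
}


def approximate_count(percent, cohort_size=COHORT_SIZE):
    """Convert a percentage of the cohort into a rounded head count."""
    return round(cohort_size * percent / 100)


def summarize(demographics, cohort_size=COHORT_SIZE):
    """Return a mapping from each label to its approximate head count."""
    return {
        label: approximate_count(percent, cohort_size)
        for label, percent in demographics.items()
    }


if __name__ == "__main__":
    for label, count in summarize(DEMOGRAPHICS).items():
        print(f"{label}: ~{count:,}")
```

Because both the cohort size and the percentages are rounded, these counts are useful only as orders of magnitude, for example when judging whether a subgroup is likely to be large enough for a planned analysis.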
Current and planned data for UK Biobank
We encourage you to visit the official UK Biobank website for the most up-to-date information and timelines on data availability. Please note that the numbers of participants in the tables below are approximate and may change as further updates on the cohort data become available (Last update: November 2020).
UK BIOBANK ASSESSMENT CENTRE | NUMBER OF PARTICIPANTS | DETAILS | DATE OF DATA ACQUISITION | DATA FIRST AVAILABLE FOR RESEARCH |
---|---|---|---|---|
UKB Baseline assessment | Whole cohort | | 2006-2010 | Q2 2012 |
Repeat of baseline assessment | 20,000 - 25,000 | Same as above. Link to UKB. | 2012-2013 | Q3 2013 |
ONLINE FOLLOW-UP | NUMBER OF PARTICIPANTS | DETAILS | DATE OF DATA ACQUISITION | DATA FIRST AVAILABLE FOR RESEARCH |
---|---|---|---|---|
Online 24-h dietary recall web questionnaire | 210,000 | Detailed questions on the intake of foods and beverages consumed during the previous 24-hour period. Link to UKB. | 2011-2012 | Q2 2013 |
Digestive health | ~180,000 | Questionnaire with self-reported information on abdominal and associated symptoms, for the study of irritable bowel syndrome and related disorders. Link to UKB. | 2017 | 2018 |
Food (and other) preferences | ~180,000 | Questionnaire with items reflecting both sensory preferences (bitter, sweet etc.) and foodstuff preferences (fruit, vegetables, meat, etc.) Link to UKB. | Q4 2019 | Q1 2020 |
Physical activity data with accelerometry | 100,000 | Wrist-worn tri-axial accelerometer measuring type, intensity, and duration of physical activity (PA) over a one-week test. Link to UKB. | 2013-2015 | 2015 |
Online ‘Healthy Work questionnaire’ | 100,000 – 120,000 | Occupational history since finishing full time education; respiratory health outcomes and medication for these conditions; and smoking habits. Link to UKB. | Q3 2015 | Q2 2017 |
Questionnaire on cognitive function | 100,000 – 120,000 | Tests for mood, fluid intelligence, trail making, symbol digit substitution, pairs matching, numeric memory. Link to UKB. | | |
Questionnaire on mental health | ~160,000 | Questionnaire on life-time experiences of mental disorders. Link to UKB. | 2016 | Q3 2017 |
IMAGING | NUMBER OF PARTICIPANTS | DETAILS | DATE OF DATA ACQUISITION | DATA FIRST AVAILABLE FOR RESEARCH |
---|---|---|---|---|
Multimodal imaging | Goal: imaging available for 100,000 participants; ~40,000 ready as of early 2020 | MRI of the brain, heart and abdomen, plus bone densitometry (DXA). Link to UKB. | 2014- | 2015 |
HEALTH RECORD LINKAGE | NUMBER OF PARTICIPANTS | DETAILS | DATE OF DATA ACQUISITION | DATA FIRST AVAILABLE FOR RESEARCH |
---|---|---|---|---|
Death registrations | Whole cohort | Primary and Secondary ICD-10 coded cause of death. Link to UKB. | 2006- | Q2 2013 |
Cancer registrations | Whole cohort | ICD coded cancer diagnoses. | 1981- | Q2 2013 |
Hospital inpatient episodes | Whole cohort | ICD coded diagnoses. Link to UKB. | 1997- | Q2 2013 |
Algorithmically-defined outcomes | Whole cohort | Health-related events, obtained through algorithmic combinations of coded information from UK Biobank's baseline assessment data collection, linked data from hospital admissions and death registries. Link to UKB. | 2003- | 2015 |
Primary care | ~250,000 participants | Primary care data recorded by health professionals working at general practices. Includes diagnoses, measurements, referrals etc. Link to UKB. | variable | Q3 2019 |
First occurrences | Whole cohort | Data showing the first occurrence of ~1,200 broad health outcomes identified from primary care data, hospital inpatient data, death register records and self-reported medical condition ICD codes. Link to UKB. | variable | Q3 2019? |
COVID-19 | Whole cohort | COVID-19 data, including COVID-19 test results, GP clinical events, and prescription records. Link to UKB. | Q2 2020 | Q3 2020 |
GENETIC DATA | NUMBER OF PARTICIPANTS | DETAILS | DATE OF DATA ACQUISITION | DATA FIRST AVAILABLE FOR RESEARCH |
---|---|---|---|---|
Genotyping (baseline samples) | Whole cohort | Approximately 50,000 participants genotyped using the UK BiLEVE array and approximately 450,000 participants genotyped on the UK Biobank array. Link to QC and imputation details. | 2013-2015 | Q3 - 2017 |
Exome sequencing | 50,000 exomes available; whole cohort planned Q4 2020 | VCF and CRAM files for 49,960 exomes available. Link to Exome-seq FAQs. Joint-call exome data in pVCF format, sample-level variants (VCFs) and sequence data (CRAMs) for the first 200k exomes planned for November 2020. | Q4 - 2019 | |
Whole genome sequencing | Whole cohort planned Q4 2022 | See the UK Biobank website for more information on the release of whole genome sequencing data. | TBA | TBA |
BIOCHEMICAL DATA | NUMBER OF PARTICIPANTS | DETAILS | DATE OF DATA ACQUISITION | DATA FIRST AVAILABLE FOR RESEARCH |
---|---|---|---|---|
Serum biomarker data | Whole cohort | Urine, packed red blood cells (PRBC) and serum assay data for all participants. Link to UKB. | 2006-2010 and 2013 | Q1 2019 |
Source: Table adapted and updated from Sudlow et al. 2015