Introduction to UK Biobank

Introduction

The UK Biobank is a very large, population-based prospective study that holds detailed health-related and genetic data on about 500,000 individuals and is available to the research community. Men and women aged 40-69 years were recruited during 2006-2010 at 22 recruitment centers across the United Kingdom (Figure 1). Participants gave broad consent for their data to be used in all types of health research, by both academic and commercial researchers. They also consented to be followed up through linkage to their health-related records.


Figure 1. The UK Biobank cohort recruitment centres (Source: UK Biobank)

The aims of the UK Biobank cohort

  • To allow detailed investigations of the genetic and non-genetic determinants of diseases of middle and old age. UK Biobank provides extensive and precise assessments of exposures, with comprehensive follow-up and characterization of many different health-related outcomes.

  • To promote innovative science by maximizing access to the resource. UK Biobank is open to bona fide researchers anywhere in the world, including those funded by academia and industry.

Note

The participants' age range (40-69 years) was chosen as a compromise. Participants were young enough that most were free of disease at recruitment, yet old enough that a substantial number of health outcomes would occur during the early years of follow-up. This prospective approach allows risk factors to be measured before disease develops. It therefore 1) avoids reverse causality and recall bias, 2) allows more detailed measurement, and 3) reduces measurement error.

Participant demographics

  • 46% male
  • 57% aged 40-59 years; 43% aged 60-69 years
  • Less socioeconomically deprived than UK average, but all strata represented
  • 85% urban
  • 94.5% white; 5.5% other
  • 58% in paid employment or self-employed
  • 89% recruited in England; 7% in Scotland; 4% in Wales

Current and planned data for UK Biobank

We encourage readers to visit the official UK Biobank website for the most up-to-date information and timelines on data availability. Please note that the numbers of participants in the tables below are approximate and may change as further cohort data are released (last update: November 2020).

UK BIOBANK ASSESSMENT CENTRE

DATA TYPE | NUMBER OF PARTICIPANTS | DETAILS | DATE OF DATA ACQUISITION | DATA FIRST AVAILABLE FOR RESEARCH
Baseline assessment | Whole cohort | Socio-demographics and lifestyle factors collected with a touchscreen questionnaire and a brief verbal interview; physical measurements (blood pressure, arterial stiffness, eye measures, body composition, hand-grip strength, ECG, etc.); collection of blood, urine and saliva samples. | 2006-2010 | Q2 2012
Repeat of baseline assessment | 20,000-25,000 | Same as the baseline assessment. | 2012-2013 | Q3 2013


ONLINE FOLLOW-UP

DATA TYPE | NUMBER OF PARTICIPANTS | DETAILS | DATE OF DATA ACQUISITION | DATA FIRST AVAILABLE FOR RESEARCH
Online 24-h dietary recall web questionnaire | 210,000 | Detailed questions on the foods and beverages consumed during the previous 24 hours. | 2011-2012 | Q2 2013
Digestive health | ~180,000 | Questionnaire on self-reported abdominal and associated symptoms, for the study of irritable bowel syndrome and related disorders. | 2017 | 2018
Food (and other) preferences | ~180,000 | Questionnaire with items on both sensory preferences (bitter, sweet, etc.) and foodstuff preferences (fruit, vegetables, meat, etc.). | Q4 2019 | Q1 2020
Physical activity data with accelerometry | 100,000 | Wrist-worn triaxial accelerometer worn for one week, capturing the type, intensity and duration of physical activity. | 2013-2015 | 2015
Online 'Healthy Work questionnaire' | 100,000-120,000 | Occupational history since finishing full-time education; respiratory health outcomes and medication for these conditions; smoking habits. | Q3 2015 | Q2 2017
Questionnaire on cognitive function | 100,000-120,000 | Tests of mood, fluid intelligence, trail making, symbol digit substitution, pairs matching and numeric memory. | |
Questionnaire on mental health | ~160,000 | Questionnaire on lifetime experiences of mental disorders. | 2016 | Q3 2017


IMAGING

DATA TYPE | NUMBER OF PARTICIPANTS | DETAILS | DATE OF DATA ACQUISITION | DATA FIRST AVAILABLE FOR RESEARCH
Multimodal imaging | Goal: 100,000 participants; ~40,000 available as of early 2020 | MRI of the brain, heart and abdomen, and bone densitometry by dual-energy X-ray absorptiometry (DXA). | 2014- | 2015


HEALTH RECORD LINKAGE

DATA TYPE | NUMBER OF PARTICIPANTS | DETAILS | DATE OF DATA ACQUISITION | DATA FIRST AVAILABLE FOR RESEARCH
Death registrations | Whole cohort | Primary and secondary causes of death, coded in ICD-10. | 2006- | Q2 2013
Cancer registrations | Whole cohort | ICD-coded cancer diagnoses. | 1981- | Q2 2013
Hospital inpatient episodes | Whole cohort | ICD-coded diagnoses. | 1997- | Q2 2013
Algorithmically-defined outcomes | Whole cohort | Health-related events obtained by algorithmically combining coded information from UK Biobank's baseline assessment with linked hospital admission and death registry data. | 2003- | 2015
Primary care | ~250,000 | Primary care data recorded by health professionals working in general practices, including diagnoses, measurements and referrals. | Variable | Q3 2019
First occurrences | Whole cohort | The first occurrence of ~1,200 broad health outcomes, identified from primary care data, hospital inpatient data, the death register and self-reported medical conditions (ICD codes). | Variable | Q3 2019 (to be confirmed)
COVID-19 | Whole cohort | COVID-19 test results, GP clinical events and prescription records. | Q2 2020 | Q3 2020


GENETIC DATA

DATA TYPE | NUMBER OF PARTICIPANTS | DETAILS | DATE OF DATA ACQUISITION | DATA FIRST AVAILABLE FOR RESEARCH
Genotyping (baseline samples) | Whole cohort | ~50,000 participants genotyped on the UK BiLEVE Axiom array and ~438,000 on the UK Biobank Axiom array. Quality control (QC) and imputation details are documented by UK Biobank. | 2013-2015 | Q3 2017
Exome sequencing | ~50,000 exomes available; whole cohort planned Q4 2020 | VCF and CRAM files available for 49,960 exomes (see the UK Biobank exome sequencing FAQs). Joint-called exome data in pVCF format, sample-level variant calls (VCFs) and sequence data (CRAMs) for the first 200,000 exomes planned for November 2020. | | Q4 2019
Whole genome sequencing | Whole cohort planned Q4 2022 | Release details to be announced by UK Biobank. | TBA | TBA


BIOCHEMICAL DATA

DATA TYPE | NUMBER OF PARTICIPANTS | DETAILS | DATE OF DATA ACQUISITION | DATA FIRST AVAILABLE FOR RESEARCH
Serum biomarker data | Whole cohort | Serum, urine and packed red blood cell (PRBC) assay data for all participants. | 2006-2010 and 2013 | Q1 2019


Source: Table adapted and updated from Sudlow et al. 2015