Introduction
EraSOR is a Python program for removing the bias introduced by sample overlap between the base GWAS data and the target genotype data.
To run EraSOR, you will need LD scores and the GWAS summary statistics obtained from the base and target samples. EraSOR calculates the bivariate LD score intercepts and adjusts the base summary statistics accordingly.
Download EraSOR
You can either download the source script directly from the GitLab release page or clone this repository from the command line (requires git to be installed):
git clone https://gitlab.com/choishingwan/EraSOR.git
Warning
This document is written for Unix-like operating systems (i.e. not Windows). EraSOR should, in theory, also work on Windows, but I am not familiar with running Python software on Windows.
Dependency
EraSOR is a Python 3 program and requires Python 3 to be installed. You can find instructions here.
You will also need to install the following Python packages for EraSOR to work:
- numpy
- pandas
- scipy
You can install them with the following command (assuming python3 is in your environment):
python3 -m pip install numpy pandas scipy
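To confirm that the installation worked, a small check along the following lines can be run with the same python3 interpreter. This is only a sketch: the helper name `missing_packages` is made up for illustration, and it only tests whether each package can be located, not whether its version is suitable for EraSOR.

```python
import importlib.util

# Packages listed as EraSOR dependencies in this document.
REQUIRED_PACKAGES = ("numpy", "pandas", "scipy")


def missing_packages(packages=REQUIRED_PACKAGES):
    """Return the names in `packages` that the import system cannot locate."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

If any package is reported as missing, rerun the pip command above with the interpreter you intend to use for EraSOR.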
Tips
You can check whether python3 is in your path by typing which python3. This command should return a path if python3 is found.
You may use pyenv to manage multiple versions of Python.
Before you start
Before you start, please make sure you have the following:
- GWAS summary statistics from your base data
    - Sample size information of your GWAS (or a column containing sample size information)
    - Must contain the following columns:
        - SNP ID
        - Effective allele
        - Effect size (either in \(\beta\), Odds Ratio, or logOR)
        - P-value
- GWAS summary statistics from your target data
    - Your target data should have the same phenotype as the base
    - Must contain the following columns:
        - SNP ID
        - Effective allele
        - Effect size (either in \(\beta\), Odds Ratio, or logOR)
        - P-value
- LD scores calculated using samples representative of your target population
    - You can also generate a separate copy of the LD scores with the MHC region removed (or containing only SNPs that were genotyped in your target data)
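Before running EraSOR, it can help to confirm that each summary statistics file really has the columns listed above. The following is a minimal sketch, not part of EraSOR: the helper names are hypothetical, and it assumes the file has a single header line and is whitespace-delimited unless another delimiter is passed.

```python
import gzip


def read_header(path, delimiter=None):
    """Return the column names on the first line of a plain or gzipped text file.

    delimiter=None splits on any run of whitespace (an assumption about the
    file layout); pass e.g. "," for comma-separated files.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readline().rstrip("\r\n").split(delimiter)


def missing_columns(path, required, delimiter=None):
    """Return the required column names that are absent from the header."""
    header = set(read_header(path, delimiter))
    return [name for name in required if name not in header]
```

For example, `missing_columns("phenotype.sumstat.txt", ["SNP", "A1", "BETA", "p"])` returns an empty list when all four columns are present; the file and column names here are only illustrative and should be replaced with your own.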
Note
In theory, EraSOR might also work for cross-trait analyses. However, we have not performed any simulations and are therefore uncertain about the potential bias of EraSOR in such scenarios.
Quick start
Tips
You can see all available parameters by typing
EraSOR.py --help
Assuming you have the following files:
- Base summary statistics
    - Name: phenotype.sumstat.txt
    - SNP ID column: SNP
    - Effect size column: BETA
    - Is beta?: true
    - Effective allele column: A1
    - Non-effective allele column: A2
    - P-value column: p
    - Sample size column: N
- Target summary statistics
    - Name: data.sumstat
    - SNP ID column: ID
    - Effect size column: OR
    - Is beta?: false
    - Effective allele column: Effective
    - Non-effective allele column: NonEffective
    - P-value column: P-value
    - Sample size column: OBS_CT
- LD scores
    - Assuming they are split by chromosome, with the following format (# represents the chromosome number):
        - baseline-#.l2.ldscore.gz
        - baseline-#.l2.M
        - baseline-#.l2.M_5_50
    - And the weight scores in the following format:
        - weight-#.l2.ldscore.gz
        - weight-#.l2.M
        - weight-#.l2.M_5_50
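Since the LD score files are referred to by prefix, a missing chromosome file is easy to overlook. The sketch below lists the file names implied by a prefix and reports those that do not exist. The helper names are hypothetical, and the default of chromosomes 1 to 22 is an assumption; adjust it to match the chromosomes you actually have.

```python
from pathlib import Path

# Suffixes taken from the file listing above.
LD_SUFFIXES = (".l2.ldscore.gz", ".l2.M", ".l2.M_5_50")


def expected_ld_files(prefix, chromosomes=range(1, 23)):
    """List the per-chromosome file names implied by a prefix such as 'baseline-'.

    The default chromosome range (1-22) is an assumption, not an EraSOR rule.
    """
    return [
        f"{prefix}{chrom}{suffix}"
        for chrom in chromosomes
        for suffix in LD_SUFFIXES
    ]


def missing_ld_files(prefix, chromosomes=range(1, 23)):
    """Return the expected file names that are not present on disk."""
    return [
        name
        for name in expected_ld_files(prefix, chromosomes)
        if not Path(name).exists()
    ]
```

Calling `missing_ld_files("baseline-")` and `missing_ld_files("weight-")` from the directory holding the files should both return an empty list if every expected file is in place.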
You can run EraSOR with the following command:
python EraSOR.py \
--base phenotype.sumstat.txt \
--base-snp SNP \
--base-signed-sumstats BETA,0 \
--base-a1 A1 \
--base-a2 A2 \
--base-p p \
--base-N-col N \
--target data.sumstat \
--target-snp ID \
--target-signed-sumstats OR,1 \
--target-a1 Effective \
--target-a2 NonEffective \
--target-p P-value \
--target-N-col OBS_CT \
--ref-ld-chr baseline- \
--w-ld-chr weight- \
--out EraSOR.adjusted
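A long multi-line shell command is easy to break with a single missing trailing backslash. If you prefer to launch EraSOR from a Python script, the same command can be assembled as an argument list, as sketched below. The function name `build_erasor_command` is hypothetical; the options and values are exactly those of the command above.

```python
import shlex

# Option/value pairs copied from the quick-start command above.
QUICK_START_OPTIONS = [
    ("--base", "phenotype.sumstat.txt"),
    ("--base-snp", "SNP"),
    ("--base-signed-sumstats", "BETA,0"),
    ("--base-a1", "A1"),
    ("--base-a2", "A2"),
    ("--base-p", "p"),
    ("--base-N-col", "N"),
    ("--target", "data.sumstat"),
    ("--target-snp", "ID"),
    ("--target-signed-sumstats", "OR,1"),
    ("--target-a1", "Effective"),
    ("--target-a2", "NonEffective"),
    ("--target-p", "P-value"),
    ("--target-N-col", "OBS_CT"),
    ("--ref-ld-chr", "baseline-"),
    ("--w-ld-chr", "weight-"),
    ("--out", "EraSOR.adjusted"),
]


def build_erasor_command(python="python", script="EraSOR.py"):
    """Return the quick-start command as a list suitable for subprocess.run."""
    command = [python, script]
    for option, value in QUICK_START_OPTIONS:
        command.extend([option, value])
    return command


if __name__ == "__main__":
    # Print the command on one line so it can be inspected before running it.
    print(shlex.join(build_erasor_command()))
```

Passing the returned list to `subprocess.run(command, check=True)` runs the same command as above without relying on shell line continuations.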
This will generate two files:
- EraSOR.adjusted.assoc.gz
    - This file contains the adjusted summary statistics, which can be used for downstream polygenic risk score analyses
    - The Z column contains the adjusted effect size and the P column contains the adjusted p-value
- EraSOR.adjusted.meta
    - Contains the information used for the adjustment, including the heritability estimates of the base and target GWAS, the intercepts, and the value of the adjustment
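To inspect the adjusted summary statistics before passing them to a polygenic risk score tool, the gzipped file can be read with the standard library alone, as in the sketch below. Only the Z and P column names come from the description above. The layout (one header line, whitespace-delimited rows) is an assumption, and the helper names are hypothetical.

```python
import gzip


def read_adjusted_sumstats(path, delimiter=None):
    """Read a gzipped, delimited text file into a list of dicts (one per row).

    Assumes a single header line; delimiter=None splits on whitespace.
    """
    rows = []
    with gzip.open(path, "rt") as handle:
        header = handle.readline().rstrip("\r\n").split(delimiter)
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\r\n").split(delimiter)
            rows.append(dict(zip(header, fields)))
    return rows


def z_and_p(rows):
    """Extract the adjusted Z and P values as pairs of floats."""
    return [(float(row["Z"]), float(row["P"])) for row in rows]
```

For example, `z_and_p(read_adjusted_sumstats("EraSOR.adjusted.assoc.gz"))` gives one `(Z, P)` pair per SNP, provided the file matches the assumed layout.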
Note
Most parameters of EraSOR are identical to those of LDSC, with an added prefix of --base- or --target- to indicate whether the parameter applies to the base or target GWAS.
If the column names of the base and target GWAS are identical, you can provide the column name once with the --base-* parameter and then use --same