Generalized confounder adjustment for testing and estimation (GCATE)

This repository contains the code for reproducing the simulation and real-data analysis results of the paper "Simultaneous inference for generalized linear models with unmeasured confounders".

Files

Python module

gcate: The main module for GCATE.

Scripts

ex1: Simulation with Poisson DGP with sample splitting
- ex1_generate_data.py: Generate simulated data.
- ex1_run_gcate.py: Run GCATE.
ex2: Simulation with Poisson DGP without sample splitting
- ex2_generate_data.py: Generate simulated data.
- ex2_run_gcate.py: Run GCATE.
- ex2_run_glm.py: Run GLM oracle and GLM naive.
- ex2_run_cate.R: Run CATE.
ex3: Simulation with Splatter simulator
- ex3_generate_data.R: Generate simulated data.
- ex3_run_gcate.py: Run GCATE.
- ex3_run_glm.py: Run GLM naive.
- ex3_run_cate.R: Run CATE.
ex4: Lupus data
- ex4_preprocess_lupus.py: Preprocess the lupus data.
- ex4_run_glm.py: Run GLM on the subset and the full set of covariates.
- ex4_run_gcate.py: Run GCATE on the subset of covariates.
- ex4_run_gcate_full.py: Run GCATE on the full set of covariates.
- ex4_run_cate.R: Run CATE on the subset of covariates.
- ex4_run_cate_full.R: Run CATE on the full set of covariates.
- ex4_GO.R: Run the gene ontology analysis.
ex5: Simulation with varying dimensions
- ex5_blessing_dim.py: Run GCATE on varying dimensions.

Jupyter notebooks:

Plot_simu.ipynb: Reproduce the figures and tables for simulation studies.
Plot_lupus.ipynb: Reproduce the figures and tables for the lupus data analysis.

Requirements

The following packages are required for the reproducibility workflow.

Python packages

Package	Version
anndata	0.9.2
cvxpy	1.1.18
h5py	3.1.0
joblib	1.1.0
jupyter	1.0.0
matplotlib	3.4.3
numba	0.54.1
numpy	1.22.0
pandas	1.3.3
python	3.8.12
scanpy	1.9.3
scikit-learn	1.1.2
scipy	1.10.1
seaborn	0.13.0
statsmodels	0.13.5
tqdm	4.62.3
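
Before running the scripts, it may help to confirm that the active environment matches the pinned versions. The following is a minimal sketch rather than part of the repository: `PINNED`, `installed_version`, and `check_versions` are hypothetical names, and the version numbers are copied from the table above (`python` is omitted because it is the interpreter, not an installable distribution).

```python
from importlib import metadata

# Versions copied from the table above. Hypothetical helper, not shipped
# with the repository.
PINNED = {
    "anndata": "0.9.2",
    "cvxpy": "1.1.18",
    "h5py": "3.1.0",
    "joblib": "1.1.0",
    "jupyter": "1.0.0",
    "matplotlib": "3.4.3",
    "numba": "0.54.1",
    "numpy": "1.22.0",
    "pandas": "1.3.3",
    "scanpy": "1.9.3",
    "scikit-learn": "1.1.2",
    "scipy": "1.10.1",
    "seaborn": "0.13.0",
    "statsmodels": "0.13.5",
    "tqdm": "4.62.3",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(pinned, lookup=installed_version):
    """Return {package: (expected, found)} for every package that differs."""
    problems = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            problems[name] = (expected, found)
    return problems
```

Calling `check_versions(PINNED)` returns an empty dictionary when every listed package is installed at exactly the pinned version. Whether nearby versions also reproduce the results is not something this check can tell.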

R packages

Package	Version
AnnotationDbi	1.56.2
cate	1.1.1
clusterProfiler	4.2.2
org.Hs.eg.db	3.14.0
qvalue	2.26
R	3.8.2
reticulate	1.31
rrvgo	1.6.0
tidyverse	1.3.1

Reproducibility workflow

For simulation studies, the workflow is as follows:

1. Run ex1_generate_data.py to generate the simulated data, which will be stored in the folder /data/ex1/. The data for the second and third experiments can be generated in the same way by running ex2_generate_data.py and ex3_generate_data.R, respectively.
2. Run the scripts of the individual methods for each experiment, as listed below. The results will be stored in the folder result/:
   - Ex1: ex1_run_gcate.py
   - Ex2: ex2_run_glm.py, ex2_run_gcate.py, ex2_run_cate.R
   - Ex3: ex3_run_glm.py, ex3_run_gcate.py, ex3_run_cate.R
3. For the experiments on varying dimensions, run ex5_blessing_dim.py.
4. Use Plot_simu.ipynb to reproduce the figures (Figures 2-6, F1-F2, and G3) and table (Table G2) from the results of the previous steps.
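
The steps for a single experiment can be chained with a small driver. The sketch below does this for Ex2 only and rests on several assumptions that are not stated in this README: the scripts take no command-line arguments, they are launched from the repository root, and an `Rscript` executable is available on the PATH. The names `INTERPRETERS`, `EX2_STEPS`, `command_for`, and `run_steps` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: .py scripts run with the current Python interpreter and
# .R scripts with an `Rscript` executable found on the PATH.
INTERPRETERS = {".py": [sys.executable], ".R": ["Rscript"]}

# Experiment 2 in the order described above: data first, then each method.
EX2_STEPS = [
    "ex2_generate_data.py",
    "ex2_run_glm.py",
    "ex2_run_gcate.py",
    "ex2_run_cate.R",
]


def command_for(script):
    """Build the command line for one script from its file extension."""
    suffix = Path(script).suffix
    if suffix not in INTERPRETERS:
        raise ValueError(f"no interpreter configured for {script!r}")
    return INTERPRETERS[suffix] + [script]


def run_steps(steps, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    commands = [command_for(step) for step in steps]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the chain at the first failing script.
            subprocess.run(cmd, check=True)
    return commands
```

With the default `dry_run=True`, `run_steps(EX2_STEPS)` only prints the commands, which makes it easy to inspect the order before starting a long run with `run_steps(EX2_STEPS, dry_run=False)`. The other experiments could be handled the same way by passing their own script lists.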

For real data analysis, the workflow is as follows:

1. Obtain the h5ad file of the lupus data from the authors of the original paper and save it as data/lupus/GSE174188_CLUES1_adjusted.h5ad.
2. Run ex4_preprocess_lupus.py to preprocess the lupus data.
3. Run the scripts of the individual methods (ex4_run_glm.py, ex4_run_gcate.py, ex4_run_gcate_full.py, ex4_run_cate.R, ex4_run_cate_full.R). The results will be stored in the folder result/lupus/.
4. Use Plot_lupus.ipynb to reproduce the figures (Figures 6, G4-G9, and G11) and tables (Tables G3-G4) from the results of the previous steps.
5. Run ex4_GO.R to perform the gene ontology analysis (Figure G10).
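
Because the lupus data must be obtained separately, a quick pre-flight check can catch a misplaced file before the preprocessing script is started. This is a minimal sketch under assumptions: the paths are taken from the workflow above and interpreted relative to the repository root, and `missing_inputs` and `prepare` are hypothetical helpers. Creating result/lupus/ in advance is a precaution only, since this README does not say whether the scripts create it themselves.

```python
from pathlib import Path

# Paths taken from the workflow above, relative to the repository root.
LUPUS_H5AD = Path("data/lupus/GSE174188_CLUES1_adjusted.h5ad")
RESULT_DIR = Path("result/lupus")


def missing_inputs(root=".", required=(LUPUS_H5AD,)):
    """List the required input files that are not present under root."""
    root = Path(root)
    return [str(p) for p in required if not (root / p).is_file()]


def prepare(root="."):
    """Fail early if inputs are missing, then ensure the output folder exists."""
    root = Path(root)
    missing = missing_inputs(root)
    if missing:
        raise FileNotFoundError("missing input file(s): " + ", ".join(missing))
    out = root / RESULT_DIR
    out.mkdir(parents=True, exist_ok=True)
    return out
```

Calling `prepare()` from the repository root raises `FileNotFoundError` if the h5ad file is not in place, and otherwise returns the path of the output folder.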
