This repository contains code used for the bioinformatic analysis of the scRNA-seq and scATAC-seq data in Gagler et al., 2025, Blood.
The raw data for this study can be found in NCBI's Gene Expression Omnibus (GEO) database under accession GSE296167. The code included in this repository covers the major analytical workflow of the study, outlined as follows:
- scRNA_Analysis.Rmd - scRNA preprocessing, QC, integration, and creation of input files for CellTypist
- CellTypist_Analysis.ipynb - Running CellTypist in Python
- PostCellTypist_h5_to_Seurat_Conversion.Rmd - Converting CellTypist outputs back into Seurat objects
- scATAC_Analysis_PreprocessingQC_Clustering_scRNA_Integration.Rmd - scATAC processing and integration with scRNA
- PostIntegration_AddingPeaksDeviations.Rmd - Calling peaks and chromVAR deviations on integrated data
- run_TRUST4.sh - Running TRUST4 on each patient
- TRUST4_Annotation.Rmd - Adding TRUST4 annotations to integrated object
- PrimaryAnalysis.Rmd - Code for executing analyses involved in main figure generation
- SupplementaryAnalysis.Rmd - Code for executing analyses involved in supplementary figure generation
More details on each analysis can be found in the comments of the associated R Markdown scripts. In addition to many R packages, including Seurat and ArchR, this analysis requires MACS2 and TRUST4 to be installed and executable.
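
Before running the workflow, it may help to confirm that the external tools are reachable from your shell. The following is a minimal sketch, not part of the repository: it assumes MACS2 is installed as `macs2` and TRUST4 as `run-trust4`, which are the usual executable names but may differ on your system. The helper name `find_missing_tools` is hypothetical.

```python
import shutil

# Assumed executable names; adjust them if your installation differs.
REQUIRED_TOOLS = {
    "MACS2": "macs2",
    "TRUST4": "run-trust4",
}


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools whose executable is not found on PATH."""
    return [name for name, exe in tools.items() if shutil.which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All external tools were found on PATH.")
```

This only checks that the executables can be located; it does not verify their versions or the R package installation.
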
For questions, please reach out to dylangagler@gmail.com or gareth.morgan@nyulangone.org.