This repository contains analysis code and the NeuroCLIP foundation model training pipeline associated with the ssKIND paper.
Wu et al., "An AI-powered single-cell and spatial omics ecosystem for neurodegenerative disease"
    ssKIND/
    ├── NeuroCLIP/        NeuroCLIP fine-tuning (vision-omics foundation model)
    ├── GWAS/             GWAS enrichment and scDRS per-cell disease scoring
    └── pseudobulk_deg/   Pseudobulk differential expression (cell-type and class level)
All processed single-cell and spatial transcriptomic datasets, harmonized metadata, and cell-type annotations are publicly available at the ssKIND portal:
https://bmblx.bmi.osumc.edu/sskind/datasets
Download the relevant .h5ad files from the portal before running the analysis scripts. After downloading, set <BASE_DIR> in each script to your local data directory.
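
As a quick check that the downloads landed where the scripts expect them, the following minimal sketch lists every .h5ad file under the data directory. The helper name `find_h5ad_files` and the example path are hypothetical and not part of this repository; only the `<BASE_DIR>` convention comes from the scripts.

```python
from pathlib import Path

# Hypothetical location; replace with the directory that holds the portal downloads.
BASE_DIR = Path("/path/to/sskind_data")


def find_h5ad_files(base_dir):
    """Return all .h5ad files under base_dir (searched recursively), sorted by path."""
    base_dir = Path(base_dir)
    if not base_dir.is_dir():
        raise FileNotFoundError(f"BASE_DIR does not exist: {base_dir}")
    return sorted(base_dir.rglob("*.h5ad"))
```

Calling `find_h5ad_files(BASE_DIR)` before launching a job gives an early, readable error if the directory is missing, instead of a failure deep inside an analysis script.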
NeuroCLIP is a brain-specialized multimodal foundation model that aligns H&E histology with spatial transcriptomics. It is fine-tuned from the OmiCLIP backbone using ~0.96 million paired image patches and gene expression sequences from spatial transcriptomic data across nine neurodegenerative diseases.
Training data preprocessing follows the Loki pipeline for spatial transcriptomics image–expression pairing.
See NeuroCLIP/README.md for training instructions and pretrained model weights.
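
This README does not spell out the training objective. Assuming NeuroCLIP keeps a CLIP-style symmetric contrastive loss (an assumption based on the backbone's name, not something confirmed here), the idea of aligning image and expression embeddings can be sketched in plain Python. The function names and the temperature value of 0.07 are illustrative choices, not taken from the NeuroCLIP code.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(image_emb, omics_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Row i of image_emb is assumed to be paired with row i of omics_emb.
    The temperature is an illustrative assumption.
    """
    img = [l2_normalize(v) for v in image_emb]
    omi = [l2_normalize(v) for v in omics_emb]
    # Cosine similarity between every image and every expression profile.
    logits = [
        [sum(a * b for a, b in zip(i, o)) / temperature for o in omi]
        for i in img
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the image-to-omics and omics-to-image directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))
```

The loss is small when each image patch is most similar to its own expression profile and grows when the pairs are mismatched, which is what pulls the two modalities into a shared embedding space.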
Scripts for GWAS enrichment analysis and per-cell disease relevance scoring using scDRS. GWAS summary statistics were obtained for Alzheimer's disease (AD), Parkinson's disease (PD), amyotrophic lateral sclerosis (ALS), and Huntington's disease (HD).
| Script | Description |
|---|---|
| 0_setup.sh | Environment setup |
| 1_download_gwas.sh | Download GWAS summary statistics |
| 2_prep_magma_input.py | Prepare MAGMA input |
| 3_run_magma.sh | Run MAGMA gene-level statistics |
| 4a_make_cov.py | Build covariate matrix |
| 4b_prep_adata_new.py | Prepare AnnData for scDRS |
| 4c_postprocess.py | Postprocess scDRS results |
| 4_run_scdrs.py / .sh | Run scDRS scoring |
| 4_submit_scdrs_all.sh | Submit scDRS jobs (SLURM array) |
| 5_plot_summary_heatmap.R | Summary heatmap |
| 5_plot_violin_region.py | Violin plots by brain region |
| 5b_plot_umap_diseaseonly.py | UMAP colored by scDRS score |
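
To illustrate how gene-level MAGMA statistics can feed a per-cell score, here is a heavily simplified sketch: rank genes by z-score, keep the top ones as a weighted disease gene set, and take a weighted mean of their expression in each cell. The function names, the `n_top` cutoff, and the dictionary inputs are hypothetical. The scripts in this directory may select genes differently, and scDRS itself additionally calibrates each score against matched control gene sets, which is omitted here.

```python
def top_genes_by_zscore(gene_z, n_top=1000):
    """Keep the n_top genes with the largest z-scores as a weighted gene set.

    gene_z maps gene name -> gene-level z-score; n_top is an illustrative cutoff.
    """
    ranked = sorted(gene_z.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n_top])


def raw_disease_score(cell_expr, gene_weights):
    """Weighted mean expression of the disease genes measured in one cell.

    cell_expr maps gene name -> normalized expression for a single cell.
    Returns 0.0 when no disease gene is measured or the weights do not sum
    to a positive value.
    """
    shared = [g for g in gene_weights if g in cell_expr]
    total_weight = sum(gene_weights[g] for g in shared)
    if not shared or total_weight <= 0:
        return 0.0
    return sum(gene_weights[g] * cell_expr[g] for g in shared) / total_weight
```

A raw score like this is not comparable across cells on its own; the control-gene normalization that scDRS performs is what turns it into a calibrated disease relevance score.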
Pseudobulk differential expression pipeline using limma-voom. Two granularities are provided: cell-type level and class level (broader cell class, e.g., Excitatory Neurons, Glia).
| Script | Description |
|---|---|
| generate_pseudobulk_v5.py | Build pseudobulk counts (cell-type level) |
| generate_pseudobulk_v5_class.py | Build pseudobulk counts (class level) |
| run_limma_deg_v5.R | Limma-voom DEG (cell-type level) |
| run_limma_deg_v5_class.R | Limma-voom DEG (class level) |
| submit_step1_v5.sh | SLURM submission: pseudobulk generation |
| submit_step1_v5_class.sh | SLURM submission: class-level generation |
| submit_step2_v5.sh | SLURM submission: DEG |
| submit_step2_v5_class.sh | SLURM submission: class-level DEG |
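
The core of pseudobulk generation is summing raw counts over all cells that share a sample and a cell-type label, so that limma-voom sees one count profile per group. The sketch below shows that aggregation on plain dictionaries. The function name and the keys `donor`, `cell_type`, `cell_class`, and `counts` are hypothetical, and the actual generate_pseudobulk_v5.py scripts may group and filter differently.

```python
from collections import defaultdict


def pseudobulk_sum(cells, group_keys=("donor", "cell_type")):
    """Sum raw counts per gene across all cells sharing the same group key values.

    cells is a list of dicts, each holding the grouping fields plus a
    "counts" dict mapping gene name -> raw integer count.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = tuple(cell[k] for k in group_keys)
        for gene, count in cell["counts"].items():
            totals[key][gene] += count
    # Convert nested defaultdicts to plain dicts for a predictable return type.
    return {key: dict(genes) for key, genes in totals.items()}
```

Switching between the two granularities then only changes the grouping, for example `group_keys=("donor", "cell_class")` for the class-level analysis.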
Python (see requirements.txt):

    pip install -r requirements.txt

R: Requires R 4.4.0 with limma, edgeR, and clusterProfiler. On OSC:

    module load gcc/12.3.0 R/4.4.0

| Resource | Link |
|---|---|
| ssKIND web portal | https://bmblx.bmi.osumc.edu/ssKIND/ |
| ssKIND datasets | https://bmblx.bmi.osumc.edu/sskind/datasets |
| Data collection & processing agents (AI-integrated) | https://github.com/OSU-BMBL/ssKIND-collection-agents |
| NeuroCLIP pretrained weights | Google Drive |
Wu W, Kim TY, Xu J, Cheng H, Wang C, et al. An AI-powered single-cell and spatial omics ecosystem for neurodegenerative disease. [journal], 2026.
© BMBL and Matrix Lab. This model and associated code are released under the BSD 3-Clause License and may only be used for non-commercial, academic research purposes with proper attribution.