This pipeline merges multiple datasets to define an atlas of 3' UTRs. At its core is a tabulation of how many distinct cell types use a particular tandem (non-intronic) isoform at a frequency of at least 5% or at least 10%. Genes that have more than two isoforms of this type are classified as multi-UTR genes.
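As a rough illustration of the tabulation described above, the following Python sketch counts, for each isoform, the number of cell types in which its usage meets a cutoff, and then flags genes by their number of qualifying isoforms. It is not the pipeline's code: the input layout, the function names, the `min_cell_types` and `min_isoforms` parameters, and all values are assumptions made for this toy example.

```python
from collections import defaultdict

# Hypothetical input: fraction of a gene's expression attributed to each
# tandem isoform, per cell type. All identifiers and values are made up.
usage = {
    ("GeneA", "GeneA-UTR1", "T cell"): 0.80,
    ("GeneA", "GeneA-UTR2", "T cell"): 0.20,
    ("GeneA", "GeneA-UTR1", "B cell"): 0.93,
    ("GeneA", "GeneA-UTR2", "B cell"): 0.07,
    ("GeneB", "GeneB-UTR1", "T cell"): 0.98,
    ("GeneB", "GeneB-UTR2", "T cell"): 0.02,
    ("GeneB", "GeneB-UTR1", "B cell"): 1.00,
}


def count_cell_types(usage, cutoff):
    """Count, per (gene, isoform), the cell types with usage >= cutoff."""
    counts = defaultdict(int)
    for (gene, isoform, _cell_type), fraction in usage.items():
        # Adding a bool keeps isoforms that never pass the cutoff at 0.
        counts[(gene, isoform)] += fraction >= cutoff
    return dict(counts)


def genes_by_isoform_count(counts, min_cell_types, min_isoforms):
    """Return genes with at least `min_isoforms` isoforms that pass the
    cutoff in at least `min_cell_types` cell types (assumed criteria)."""
    per_gene = defaultdict(int)
    for (gene, _isoform), n_cell_types in counts.items():
        if n_cell_types >= min_cell_types:
            per_gene[gene] += 1
    return sorted(g for g, k in per_gene.items() if k >= min_isoforms)


n_celltypes_5pct = count_cell_types(usage, 0.05)
n_celltypes_10pct = count_cell_types(usage, 0.10)

# Threshold values here are illustrative only, chosen to suit the toy data.
flagged = genes_by_isoform_count(n_celltypes_5pct, min_cell_types=1, min_isoforms=2)
```

Keeping both thresholds as explicit arguments makes it easy to compare the 5% and 10% tabulations side by side; the values should be set to match the definitions used by the pipeline itself.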
The accompanying manuscript is openly available at:
Fansler, M.M., Mitschka, S. & Mayr, C. Quantifying 3′UTR length from scRNA-seq data reveals changes independent of gene expression. Nat Commun 15, 4050 (2024). https://doi.org/10.1038/s41467-024-48254-9
The pipeline relies on Snakemake and Conda/Mamba. If Conda is not installed, we recommend a Miniforge variant, specifically Mambaforge.
To run with the same Snakemake version, please recreate the environment using:
# replace 'mamba' with 'conda'
mamba env create -f envs/snakemake_5_31.min.yaml
After activating the above environment (conda activate snakemake_5_31), the pipeline can be run with:
snakemake
where the Snakefile is in the working directory.
Before running, update the config.yaml file to provide the locations of the input files.
The datasets are assumed to be outputs of the scUTRquant pipeline. Each dataset must be added to config.yaml under the sce object and to the merge_sces rule in the Snakefile. The colData columns retained in the merged dataset are:
- cell_id
- tissue
- cell_type
- cluster
- sample
- age
The datasets are conformed to these columns in scripts/merge_sces.R.
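The idea of conforming datasets to a shared set of columns before merging can be sketched in Python as follows. This only illustrates the logic and does not reproduce scripts/merge_sces.R: the record layout, the `conform` and `merge` helpers, the per-dataset defaults, and the choice to fill missing columns with `None` are all assumptions.

```python
# Columns retained in the merged dataset, as listed above.
RETAINED_COLUMNS = ("cell_id", "tissue", "cell_type", "cluster", "sample", "age")


def conform(records, defaults=None):
    """Keep only the retained columns for each cell record.

    Columns absent from a record are taken from `defaults` when given,
    otherwise set to None (an assumed convention for this sketch).
    """
    defaults = defaults or {}
    return [
        {col: rec.get(col, defaults.get(col)) for col in RETAINED_COLUMNS}
        for rec in records
    ]


def merge(datasets):
    """Conform each (records, defaults) pair and concatenate the results."""
    merged = []
    for records, defaults in datasets:
        merged.extend(conform(records, defaults))
    return merged


# Hypothetical per-cell metadata from two datasets with differing columns.
dataset_a = [
    {"cell_id": "a1", "tissue": "lung", "cell_type": "T cell",
     "cluster": "3", "sample": "s1", "n_counts": 1200},
]
dataset_b = [
    {"cell_id": "b1", "cell_type": "B cell", "cluster": "7",
     "sample": "s9", "age": "3m"},
]

merged = merge([(dataset_a, None), (dataset_b, {"tissue": "spleen"})])
```

After merging, every record carries exactly the six retained columns, so extra columns such as the made-up `n_counts` are dropped and gaps are made explicit rather than silently omitted.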