A pipeline for Detecting Allele Specific Expression

Data: RNA-seq data

Source: Allim and Fear et al. 2016.

Goal: This pipeline detects allele-specific expression in population RNA-seq data.

Installation

Download the files.

Requirements

bamtools

samtools

bedtools

hisat2

biopython, numpy (Python packages; collections is also used and is part of the Python standard library)

Pipeline workflow

1 Extract all the homologous single nucleotide mismatches between the two parents

2 Replace all of these sites with 'N' to generate a masked reference

3 Estimate mapping bias by simulation

3.1 Generate a reference sequence for each parent by substituting its alleles at the mismatch sites

3.2 Simulate error-free short reads from the two parental references

3.3 Align the simulated reads from both parents to the masked reference

3.4 Count the number of reads for each single nucleotide mismatch

4 Estimate the expression for each exon mismatch

4.1 Align all the raw data to the masked reference

4.2 Count the number of reads for each single nucleotide mismatch
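Steps 1, 2 and 3.1 of the workflow can be illustrated with a minimal Python sketch. This is not the code in the bin directory; the function names (`mismatch_sites`, `substitute`), the in-memory dictionaries standing in for the VCF and FASTA files, and the toy sequence are all assumptions made for illustration.

```python
# Hypothetical sketch of steps 1, 2 and 3.1; not the pipeline's own code.
# Sequences are plain dicts (chromosome -> string) and each parent's
# variants are dicts ((chromosome, 1-based position) -> allele).

def mismatch_sites(reference, parent1, parent2):
    """Return the sorted sites where the two parents carry different bases."""
    sites = []
    for chrom, pos in sorted(set(parent1) | set(parent2)):
        ref_base = reference[chrom][pos - 1]
        # A parent without a variant at this site carries the reference base.
        base1 = parent1.get((chrom, pos), ref_base)
        base2 = parent2.get((chrom, pos), ref_base)
        if base1 != base2:
            sites.append((chrom, pos))
    return sites


def substitute(sequences, replacements):
    """Return a copy of the sequences with the given bases written in."""
    edited = {chrom: list(seq) for chrom, seq in sequences.items()}
    for (chrom, pos), base in replacements.items():
        edited[chrom][pos - 1] = base  # positions are 1-based, as in VCF
    return {chrom: "".join(bases) for chrom, bases in edited.items()}


# Toy data (made up for this example).
reference = {"scf1": "ACGTACGT"}
parent1 = {("scf1", 2): "T", ("scf1", 5): "G"}
parent2 = {("scf1", 5): "G", ("scf1", 7): "A"}

sites = mismatch_sites(reference, parent1, parent2)

# Step 2: masked reference with 'N' at every mismatch site.
masked = substitute(reference, {site: "N" for site in sites})

# Step 3.1: one reference per parent, carrying that parent's alleles.
ref_parent1 = substitute(reference, {s: parent1[s] for s in sites if s in parent1})
ref_parent2 = substitute(reference, {s: parent2[s] for s in sites if s in parent2})

print(sites)
print(masked["scf1"], ref_parent1["scf1"], ref_parent2["scf1"])
```

Position 5 is not masked in this toy example because both parents carry the same allele there, so reads from either parent would align equally well at that site.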

Usage

Usage: ./pipeline.pl --vcf1 --vcf2 --ref --fastq --outDir --bin --exon
Options:
--vcf1 STR          Parent1 genome vcf file
--vcf2 STR          Parent2 genome vcf file
--ref STR           Genome reference file
--fastq STR         A tab-separated file listing all the FASTQ files.
                    Header format: fastq1	fastq2	Info1	Info2	Info3...
                    Body format:   fq1	fq2	Info1Value	Info2Value	Info3Value...
                    Note1: Info[1-n] and their values will appear in the final table header and body, respectively.
                    Note2: The combination of 'Info1-Info2-...-Info[n]' should be unique.
--bin STR           Bin directory. 'InstallationPath/bin'
--exon STR          A file of non-overlapping exons containing all the exon and gene information.
                    Format: chromosome	start	end	exonName	geneName
--mode (run|script) run: run the pipeline directly [run]
                    script: generate script instead of running (step1.sh step2[id1,id2,...].sh step3.sh)
--vcf1Name STR      Name for vcf1. [190]
--vcf2Name STR      Name for vcf2. [226]
--outDir STR        Output directory
--cpu INT           Number of cpu. [1]
--readlength INT    Maximum read length for simulation. [151]
--insertsize INT    Read insert size for simulation.    [350]
--hisat2 STR        Path for hisat2 [hisat2]
--hisat2-build STR  Path for hisat2-build [hisat2-build]
--samtools STR      Path for samtools [samtools]
--bedtools STR      Path for bedtools [bedtools]
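
Before launching the pipeline it can help to check the --fastq table against the two notes above. The following Python sketch is an assumption-laden illustration, not part of the pipeline: the function name `read_fastq_table`, the FASTQ file names, and the choice to join the Info values with '-' as the sample key are hypothetical, based only on the format described in the option text.

```python
import csv
import io


def read_fastq_table(handle):
    """Parse a --fastq style table and check that the Info combination is unique.

    Assumes tab-separated columns: two FASTQ paths followed by Info columns.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if len(header) < 2:
        raise ValueError("expected at least two FASTQ columns in the header")
    info_names = header[2:]
    seen = set()
    samples = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(f"row has {len(row)} fields, header has {len(header)}")
        key = "-".join(row[2:])  # 'Info1-Info2-...-Info[n]'
        if key in seen:
            raise ValueError(f"duplicate Info combination: {key}")
        seen.add(key)
        samples.append({
            "fastq1": row[0],
            "fastq2": row[1],
            "key": key,
            "info": dict(zip(info_names, row[2:])),
        })
    return samples


# Hypothetical table; the file names are placeholders.
table = (
    "fastq1\tfastq2\tRun\tline\tid\n"
    "sampleA_1.fq.gz\tsampleA_2.fq.gz\tRun_1\t33\t1\n"
    "sampleB_1.fq.gz\tsampleB_2.fq.gz\tRun_1\t33\t2\n"
)
samples = read_fastq_table(io.StringIO(table))
for sample in samples:
    print(sample["key"], sample["fastq1"], sample["fastq2"])
```

With a real file, the same function can be called on `open(path, newline="")` instead of the in-memory string.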

Example

Example data and a script are provided in the "example" directory. To run the example: cd example; sh example.sh

Result

simulation.csv

chromosome	site	Parent1Allele1	Parent1Allele2	Parent2Allele1	Parent2Allele2	exonID	geneID
scf4459	10579	290	0	0	299	maker-scf4459-snap-gene-0.23-mRNA-1_exon_2	maker-scf4459-snap-gene-0.23
scf4459	10766	294	0	0	291	maker-scf4459-snap-gene-0.23-mRNA-1_exon_3	maker-scf4459-snap-gene-0.23
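
One way to summarise simulation.csv is a per-site mapping-bias ratio. The sketch below is only an illustration under stated assumptions: it assumes the file is tab-separated as shown, that Parent1Allele1 counts simulated Parent1 reads carrying the Parent1 allele, and that Parent2Allele2 counts simulated Parent2 reads carrying the Parent2 allele. The function name `mapping_bias` and the ratio itself are not taken from the pipeline.

```python
import csv
import io

# The two example rows shown above, rebuilt as tab-separated text.
header = ["chromosome", "site", "Parent1Allele1", "Parent1Allele2",
          "Parent2Allele1", "Parent2Allele2", "exonID", "geneID"]
rows = [
    ["scf4459", "10579", "290", "0", "0", "299",
     "maker-scf4459-snap-gene-0.23-mRNA-1_exon_2", "maker-scf4459-snap-gene-0.23"],
    ["scf4459", "10766", "294", "0", "0", "291",
     "maker-scf4459-snap-gene-0.23-mRNA-1_exon_3", "maker-scf4459-snap-gene-0.23"],
]
simulation_text = "\n".join("\t".join(fields) for fields in [header] + rows) + "\n"


def mapping_bias(handle):
    """Return {(chromosome, site): fraction of mapped reads from Parent1}.

    A value near 0.5 suggests both parental alleles map equally well.
    Sites with no mapped reads get None.
    """
    bias = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        p1 = int(row["Parent1Allele1"])
        p2 = int(row["Parent2Allele2"])
        total = p1 + p2
        bias[(row["chromosome"], int(row["site"]))] = p1 / total if total else None
    return bias


bias = mapping_bias(io.StringIO(simulation_text))
for site, value in bias.items():
    print(site, round(value, 3))
```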

ASE.csv

chromosome	site	Parent1_hybrid	Parent2_hybrid	OtherAllele	exon_id	gene_id	Exon_expression	Gene_expression	Run	line	id
scf4459	10579	16	0	0	maker-scf4459-snap-gene-0.23-mRNA-1_exon_2	maker-scf4459-snap-gene-0.23	20	264	Run_1	33	1
scf4459	13114	1	0	0	maker-scf4459-snap-gene-0.23-mRNA-10_exon_4	maker-scf4459-snap-gene-0.23	26	264	Run_1	33	1
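
The ASE.csv counts can be turned into a per-site allelic ratio for downstream filtering. This Python sketch is a hypothetical post-processing step, not something the pipeline performs: the function name `allelic_ratios`, the `min_reads` threshold of 10, and the ratio Parent1_hybrid / (Parent1_hybrid + Parent2_hybrid) are assumptions chosen for illustration, and the file is assumed to be tab-separated as shown.

```python
import csv
import io

# The two example rows shown above, rebuilt as tab-separated text.
header = ["chromosome", "site", "Parent1_hybrid", "Parent2_hybrid", "OtherAllele",
          "exon_id", "gene_id", "Exon_expression", "Gene_expression",
          "Run", "line", "id"]
rows = [
    ["scf4459", "10579", "16", "0", "0",
     "maker-scf4459-snap-gene-0.23-mRNA-1_exon_2", "maker-scf4459-snap-gene-0.23",
     "20", "264", "Run_1", "33", "1"],
    ["scf4459", "13114", "1", "0", "0",
     "maker-scf4459-snap-gene-0.23-mRNA-10_exon_4", "maker-scf4459-snap-gene-0.23",
     "26", "264", "Run_1", "33", "1"],
]
ase_text = "\n".join("\t".join(fields) for fields in [header] + rows) + "\n"


def allelic_ratios(handle, min_reads=10):
    """Return a list of (chromosome, site, sample key, ratio or None).

    The ratio is the Parent1 share of allele-informative reads. Sites with
    fewer than `min_reads` informative reads get None because the ratio
    would be too noisy to interpret.
    """
    results = []
    for row in csv.DictReader(handle, delimiter="\t"):
        p1 = int(row["Parent1_hybrid"])
        p2 = int(row["Parent2_hybrid"])
        informative = p1 + p2
        ratio = p1 / informative if informative >= min_reads else None
        sample = "-".join([row["Run"], row["line"], row["id"]])
        results.append((row["chromosome"], int(row["site"]), sample, ratio))
    return results


ratios = allelic_ratios(io.StringIO(ase_text))
for record in ratios:
    print(record)
```

A ratio should be read alongside the simulated mapping bias for the same site, since a site that maps unevenly in simulation will also look imbalanced in real data.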

Contact

Email: uqhshao at uq.edu.au or haojingshao at gmail.com

Citation

Manuscript in preparation.
