xloci

Get sequences from 2bit/fa using bed/gtf/gff.

Overview

This tool provides an easy way to extract any sequence (exon, intron, CDS, UTR, etc.) regardless of the underlying format, for both reference sequences (2bit, fa, fa.gz) and regions (bed, gtf, gff, gz, bz2, zstd).

Quick Start

Installation

To install xloci on your system, follow these steps:

get Rust: `curl https://sh.rustup.rs -sSf | sh` on Unix, or see the Rust installation page for other options
run `cargo install xloci` (make sure `~/.cargo/bin` is in your `$PATH` before running xloci)
use `xloci` with the required arguments
enjoy!
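
The `$PATH` requirement from the second step can be checked up front. This is a minimal sketch, assuming Cargo uses its default bin directory `$HOME/.cargo/bin` (a custom `CARGO_HOME` would change that path); the variable names are only illustrative.

```shell
# Check whether Cargo's bin directory is on PATH (assumes the default location).
cargo_bin="$HOME/.cargo/bin"
case ":$PATH:" in
  *":$cargo_bin:"*) path_status="present" ;;
  *)                path_status="missing" ;;
esac
echo "cargo bin dir on PATH: $path_status"

# If it is missing, add it for the current shell session only.
if [ "$path_status" = "missing" ]; then
  export PATH="$cargo_bin:$PATH"
fi
```

To make the change permanent, the `export` line would go into your shell's startup file.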

Build

To build xloci from this repo:

get Rust (as described above)
run `git clone https://github.com/alejandrogzi/xloci.git && cd xloci`
run `cargo run --release -- -s <SEQUENCE> -r <REGIONS> -o <OUTDIR>`

Container image

To build the development container image:

run `git clone https://github.com/alejandrogzi/xloci.git && cd xloci/assets`
start the Docker daemon with `start docker` or `systemctl start docker`
build the image with `docker image build --tag xloci .`
run `docker run --rm -v "[dir_where_your_gtf_is]:/dir" xloci -s /dir/<SEQUENCE> -r /dir/<REGIONS> -o /dir/<OUTDIR>`

Conda

To use xloci through Conda:

`conda install xloci -c bioconda` or `conda create -n xloci -c bioconda xloci`

Nextflow

To use xloci through Nextflow:

`nextflow run alejandrogzi/xloci -r <REGIONS> -s <SEQUENCE>`, or borrow the `xloci.nf` file from this repo

Usage

Usage: xloci [OPTIONS] --regions <REGIONS> --outdir <OUTDIR>

Options:
 -s, --sequence <SEQUENCE>
         Path to genome sequence file (.fa, .fa.gz, or .2bit); reads from stdin when omitted
 -r, --regions <REGIONS>
         Path to genomic regions file (BED, GTF, or GFF format)
 -o, --outdir <OUTDIR>
         Output directory for extracted sequences
 -c, --chunks <CHUNKS>
         Number of records per parallel processing chunk [default: 1000]
 -u, --upstream-flank <UPSTREAM_FLANK>
         Bases to extend upstream of features [default: 0]
 -d, --downstream-flank <DOWNSTREAM_FLANK>
         Bases to extend downstream of features [default: 0]
 -f, --feature <FEATURE>
         Type of genomic feature to extract [default: exon] [possible values: transcript, exon, intron, cds, utr]
 -I, --ignore-errors
         Continue processing on errors instead of panicking
 -L, --level <LEVEL>
         Logging verbosity level [default: info]
 -p, --prefix <PREFIX>
         Stem for output FASTA files (writes <prefix>.fa or <prefix>.fa.gz) [default: output]
 -X, --translate
         Translate sequences to protein
 -U, --unmask
         Convert soft-masked bases to uppercase in output
 -S, --split-extraction
         Emit one output record per extracted feature piece
     --as-tsv
         Write tab-separated output instead of FASTA
     --add-tab
         Separate flank columns in TSV output (requires --as-tsv and at least one flank)
 -G, --generic-id
         Use genomic coordinates as identifiers instead of record names
 -A, --as-chunk
         Keep chunk outputs and skip merging into a single file
 -B, --include-bed
         Also emit chunked BED outputs (requires --as-chunk)
 -Z, --compress
         Gzip-compress output files
 -t, --threads <THREADS>
         Number of threads [default: 16]
 -h, --help
         Print help
 -V, --version
         Print version
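
To illustrate how the options combine, the sketch below builds a toy genome and a one-transcript BED12 file, then asks for exon sequences with 5 bp flanks on each side, written as gzip-compressed FASTA. The `demo/` paths, the `toy` prefix and the records themselves are made up for this example; only the flags come from the help text above. The extraction step is skipped when `xloci` is not installed.

```shell
# Toy inputs (hypothetical file names and records, for illustration only).
mkdir -p demo
printf '>chr1\nACGTACGTACGTACGTACGTACGTACGTACGT\n' > demo/genome.fa
# BED12: one plus-strand transcript (chr1:6-26) with two exon blocks of 6 and 8 bp.
printf 'chr1\t6\t26\ttx1\t0\t+\t6\t26\t0\t2\t6,8\t0,12\n' > demo/regions.bed

if command -v xloci >/dev/null 2>&1; then
  # -f exon: feature type, -u/-d: flanks, -p: output stem, -Z: gzip, -t: threads
  xloci -s demo/genome.fa -r demo/regions.bed -o demo/out \
    -f exon -u 5 -d 5 -p toy -Z -t 2 \
    || echo "xloci exited with an error" >&2
else
  echo "xloci not found on PATH; skipping extraction" >&2
fi
```

Swapping `-Z` for `--as-tsv` would request tab-separated output instead of FASTA, per the option list.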

Benchmarks

Feature comparison

Tool	BED12 support	GTF/GFF support	FASTA support	.2bit support	spliced/exon extraction	strand-aware	notes/limitations	install source	link
xloci	✅	✅	✅	✅	yes (-f exon)	yes (default RC on minus)	genepred overhead	cargo/git/docker/bioconda(*)	this repo
bedtools getfasta	✅	❌	✅	❌	yes (BED12 blocks via -split)	yes (-s)	.2bit not supported; splicing semantics are BED12-block based	Bioconda; upstream repo	1
gffread	✅	✅	✅	❌	yes (-w spliced exons; -x spliced CDS)	unspecified	Documented speedup with FASTA .fai; .2bit not documented	binaries/source/GitHub (official page)	21
agat_sp_extract_sequences.pl	❌	✅	✅	❌	yes (-t exon --merge, --mrna, etc.)	yes (default RC on minus; controls available)	no BED input; .2bit not documented	unspecified	13
UCSC twoBitToFa	✅	❌	❌	✅	yes (“exclude introns” from BED blocks)	yes (RC on - strand)	Designed for .2bit; for GTF/GFF you must convert to BED first	Bioconda recipe; UCSC docs	22
GenomeTools gt extractfeat	❌	✅	✅	❌	join support yes (-join)	unspecified	Uses GFF3 graphs; strand behavior not stated in the short manual excerpt	unspecified in cited excerpts	16
gff3_to_fasta (GFF3 Toolkit)	❌	✅	✅	❌	yes (-st trans spliced transcripts)	unspecified	GFF3-only per docs; swiss-army script with multiple sequence types	Python script; packaging unspecified	19
TopHat gtf_to_fasta	❌	✅	✅	❌	yes (exon concatenation)	no (inferred from shown source excerpt)	Legacy; strand/orientation behavior is not presented as a documented feature here	unspecified	20

Runtime comparison

Note

Benchmarks were run with hyperfine on an AMD Ryzen 7 5700X with 128 GB of RAM and 16 cores. AGAT was excluded from FASTA + GTF because of extremely long runtimes (over 10 minutes). GFF3 Toolkit and GenomeTools were excluded because of installation problems.
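
A comparison of this kind could be reproduced roughly as follows. This is a sketch, not the exact invocation behind the tables: the two commands are copied from the 2bit + BED table, while the warmup count and the export file name are assumptions. The run is skipped when `hyperfine` or the input file is missing.

```shell
# Commands copied from the 2bit + BED table.
cmd_xloci='xloci -s tmp/hg38.2bit -o output -r tmp/gencode.v44.annotation.bed'
cmd_ucsc='twoBitToFa -bed=tmp/gencode.v44.annotation.bed tmp/hg38.2bit output.fa'

if command -v hyperfine >/dev/null 2>&1 && [ -f tmp/hg38.2bit ]; then
  # The warmup count and output file name are assumptions, not the original settings.
  hyperfine --warmup 3 --export-markdown bench_2bit_bed.md "$cmd_xloci" "$cmd_ucsc"
else
  echo "hyperfine or input files missing; nothing to benchmark" >&2
fi
```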

2bit + BED

Command	Mean [s]	Min [s]	Max [s]	Relative
`xloci -s tmp/hg38.2bit -o output -r tmp/gencode.v44.annotation.bed`	5.567 ± 0.028	5.538	5.593	1.00
`twoBitToFa -bed=tmp/gencode.v44.annotation.bed tmp/hg38.2bit output.fa`	58.787 ± 1.001	58.075	59.932	10.56 ± 0.19

FASTA + BED

Command	Mean [s]	Min [s]	Max [s]	Relative
`xloci -s tmp/hg38.fa -o output -r tmp/gencode.v44.annotation.bed`	4.164 ± 0.012	4.152	4.176	1.00
`bedtools getfasta -fi tmp/hg38.fa -bed tmp/gencode.v44.annotation.bed -split -name -fo output.fa`	12.167 ± 0.116	12.047	12.277	2.92 ± 0.03
`gffread -w output.fa -g tmp/hg38.fa --in-bed tmp/gencode.v44.annotation.bed`	6.550 ± 0.101	6.485	6.666	1.57 ± 0.02
`bed2gtf -b tmp/gencode.v44.annotation.bed -o transcripts.gtf -n && agat_sp_extract_sequences.pl -g transcripts.gtf -f tmp/hg38.fa -t exon --merge --cpu 16`	918 ± 0.151	917.123	923.312	220.67 ± 0.02

2bit + GTF

Command	Mean [s]	Min [s]	Max [s]	Relative
`xloci -s tmp/hg38.2bit -o output -r tmp/gencode.v44.annotation.gtf`	10.835 ± 0.011	10.827	10.847	2.08 ± 0.00
`gxf2bed -i tmp/gencode.v44.annotation.gtf -o transcripts.bed && xloci -s tmp/hg38.2bit -o output -r transcripts.bed`	10.889 ± 0.037	10.863	10.931	2.09 ± 0.01
`gffread tmp/gencode.v44.annotation.gtf --bed -o transcripts.bed && cat -p --color never transcripts.bed \| choose :11 -o '\t' > tmp.bed && xloci -s tmp/hg38.2bit -o output -r tmp.bed`	9.080 ± 0.031	9.045	9.103	1.74 ± 0.01
`gffread tmp/gencode.v44.annotation.gtf --bed -o transcripts.bed && cat -p --color never transcripts.bed \| choose :11 -o '\t' > tmp.bed && twoBitToFa -bed=tmp.bed tmp/hg38.2bit output.fa`	5.215 ± 0.008	5.209	5.224	1.00
`gxf2bed -i tmp/gencode.v44.annotation.gtf -o transcripts.bed && sort -k1,1 -k2,2n -k3,3n transcripts.bed > tmp.bed && twoBitToFa -bed=tmp.bed tmp/hg38.2bit output.fa`	11.061 ± 0.022	11.048	11.087	2.12 ± 0.01

FASTA + GTF

Command	Mean [s]	Min [s]	Max [s]	Relative
`xloci -s tmp/hg38.fa -o output -r tmp/gencode.v44.annotation.gtf`	9.417 ± 0.024	9.403	9.445	1.80 ± 0.01
`gffread tmp/gencode.v44.annotation.gtf --bed -o transcripts.bed && cat -p --color never transcripts.bed \| choose :11 -o '\t' > tmp.bed && bedtools getfasta -fi tmp/hg38.fa -bed tmp.bed -split -name -fo output.fa`	5.230 ± 0.008	5.221	5.237	1.00
`gxf2bed -i tmp/gencode.v44.annotation.gtf -o transcripts.bed && bedtools getfasta -fi tmp/hg38.fa -bed transcripts.bed -split -name -fo output.fa`	17.477 ± 0.045	17.436	17.525	3.34 ± 0.01
`gffread -w output.fa -g tmp/hg38.fa tmp/gencode.v44.annotation.gtf`	10.368 ± 0.057	10.322	10.432	1.98 ± 0.01
`gtf_to_fasta tmp/gencode.v44.annotation.gtf tmp/hg38.fa output.fa`	40.49 ± 0.024	40.01	40.76	7.74 ± 0.01
