NvFaidx + GenomeIntervalDataset

A PyTorch FASTA and BED dataset that is safe to use with multiprocessing, adapted from NVIDIA/bionemo-framework and lucidrains/enformer-pytorch.

Installation

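Pre-built binaries are available on the GitHub releases page. Select the wheel that matches your Python version. For example, for CPython 3.9 on Linux x86_64:

```shell
pip install https://github.com/johahi/nvfaidx/releases/download/v0.0.1/nvfaidx-0.0.1-cp39-cp39-linux_x86_64.whl
```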
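The wheel filename encodes the Python version (`cp39` means CPython 3.9) and the platform. The sketch below builds the expected filename and URL for the running interpreter. It assumes that other releases follow the same naming and URL pattern as the v0.0.1 example and that a `linux_x86_64` wheel exists for your Python version, so check the releases page before installing. `cpython_tag`, `expected_wheel_name`, and `expected_wheel_url` are hypothetical helpers, not part of nvfaidx.

```python
import sys


def cpython_tag(version_info=None) -> str:
    """Return the CPython wheel tag for a (major, minor) version, e.g. 'cp39' for 3.9."""
    major, minor = (version_info or sys.version_info)[:2]
    return f"cp{major}{minor}"


def expected_wheel_name(version: str, version_info=None,
                        platform_tag: str = "linux_x86_64") -> str:
    # Assumption: every release uses the same naming as the v0.0.1 example wheel.
    tag = cpython_tag(version_info)
    return f"nvfaidx-{version}-{tag}-{tag}-{platform_tag}.whl"


def expected_wheel_url(version: str, version_info=None,
                       platform_tag: str = "linux_x86_64") -> str:
    # Assumption: release assets live under /releases/download/v<version>/.
    name = expected_wheel_name(version, version_info, platform_tag)
    return f"https://github.com/johahi/nvfaidx/releases/download/v{version}/{name}"


if __name__ == "__main__":
    # Prints the URL to pass to `pip install` for the current interpreter.
    print(expected_wheel_url("0.0.1"))
```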

Usage

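```python
# import polars as pl  # needed only if you pass schema_overrides
from nvfaidx import GenomeIntervalDataset

genome_ds = GenomeIntervalDataset(
    bed_file='some_bed.bed',
    fasta_file='some_fasta.fa',
    # Required if the BED file's chromosome names are unusual (e.g. numeric names such as `1`
    # instead of `chr1`, which would otherwise be parsed as integers):
    # schema_overrides=[pl.String, pl.Int64, pl.Int64],
    context_length=384,  # automatically pads or crops sequences to this length
    return_seq_indices=False,  # False returns one-hot encodings; True returns integer indices
)
```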
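To illustrate what `context_length` does, here is a minimal sketch of one way to pad or crop an interval to a fixed length. It assumes an approach like enformer-pytorch's: the BED interval is grown or shrunk symmetrically around its center, and positions that fall off either end of the chromosome are filled with `.` (which matches the `.` entry in the index table below). `resize_interval` and `fetch_padded` are hypothetical helpers; nvfaidx's actual implementation may differ.

```python
def resize_interval(start: int, end: int, context_length: int) -> tuple[int, int]:
    """Grow or shrink the half-open interval [start, end) around its center to context_length."""
    diff = context_length - (end - start)  # > 0 means grow, < 0 means crop
    left = diff // 2
    right = diff - left
    return start - left, end + right


def fetch_padded(chrom: str, start: int, end: int, pad_char: str = ".") -> str:
    """Return chrom[start:end], using pad_char for positions outside the chromosome."""
    return "".join(
        chrom[i] if 0 <= i < len(chrom) else pad_char
        for i in range(start, end)
    )


if __name__ == "__main__":
    # A 2 bp interval at the chromosome start, expanded to 6 bp: prints "..ACGT"
    print(fetch_padded("ACGTACGTAC", *resize_interval(0, 2, 6)))
```

Centering the window keeps the original interval in the middle of the returned sequence, which is usually what sequence models trained on fixed-length inputs expect.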

When `return_seq_indices=True`, each character is mapped to an integer index:

A : 1
C : 2
G : 3
T : 4
N : 5
. : 6
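A minimal sketch of the index encoding in the table above, useful for checking dataset outputs or writing a matching decoder. `BASE_TO_INDEX` and `encode_indices` are hypothetical names, not part of nvfaidx, and treating lowercase (soft-masked) bases like uppercase ones is an assumption.

```python
# Index table as listed above for return_seq_indices=True.
BASE_TO_INDEX = {"A": 1, "C": 2, "G": 3, "T": 4, "N": 5, ".": 6}


def encode_indices(seq: str) -> list[int]:
    # Assumption: soft-masked (lowercase) bases map to the same index as uppercase.
    # Characters outside the table (e.g. IUPAC codes like "R") raise KeyError.
    return [BASE_TO_INDEX[base] for base in seq.upper()]


if __name__ == "__main__":
    print(encode_indices("ACGTN."))  # prints [1, 2, 3, 4, 5, 6]
```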
