fasta2embeddings

Convert a FASTA/FASTQ file into embeddings via Evo 2.

Usage

You can convert any FASTA/FASTQ sequence file into embeddings. The file may also be gzipped. Only one input file per run is supported for now.

# With Nextflow
nextflow run artorias111/fasta2embeddings --sequence /path/to/sequence.fasta -c /custom/config 

# Run with the embeddings alias on the Illinois Campus Cluster
embeddings --sequence /path/to/sequence.fa
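
Since the pipeline takes a single input file, one possible workaround for several inputs is to concatenate them into one file first. The sketch below is not part of the pipeline; merge_sequence_files and open_maybe_gzip are hypothetical helper names, and it assumes all inputs share one format (all FASTA or all FASTQ) and that gzipped inputs end in .gz. Sequence IDs are not deduplicated.

```python
import gzip


def open_maybe_gzip(path, mode="rb"):
    """Open a plain or gzipped file based on its extension (assumption: gzipped files end in .gz)."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def merge_sequence_files(input_paths, out_path):
    """Concatenate several FASTA (or several FASTQ) files into one uncompressed file."""
    with open(out_path, "wb") as out:
        for path in input_paths:
            with open_maybe_gzip(path) as src:
                last = b""
                # Copy in 1 MiB chunks so large genomes are never held in memory at once.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
                # Make sure the next file's first record starts on its own line.
                if last and last != b"\n":
                    out.write(b"\n")
    return out_path
```

The merged file can then be passed to --sequence as usual.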

Config structure

nextflow.config contains sensible defaults. You need to provide a custom config for your specific HPC or local machine; see taiga_evo7b.config for an example. Pass the custom config with -c in your nextflow run command. Nextflow merges it with the default nextflow.config, and settings in the custom config take precedence where the two overlap.

The pipeline expects the following in your environment (can be conda or venv):

Evo 2 (CUDA 12.8): https://github.com/ArcInstitute/evo2
EasyEvo2: https://github.com/ylab-hi/EasyEvo2

Output

Embeddings are written in the safetensors format (https://github.com/huggingface/safetensors).
A word of caution: each embedding has the same length as its input sequence (the exact dimensions depend on the Evo 2 model you're using); see EasyEvo2's documentation. If your sequences differ in length, your embeddings will differ in length too, which is not ideal for most downstream analyses without filtering. On the flip side, k-mers are an ideal input that needs no filtering, because every sequence has the same length.
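
To illustrate the k-mer case, here is a minimal sketch that cuts one sequence into fixed-length k-mers and writes them as FASTA records, so every record has the same length. The function names and the kmer_<index> record naming are hypothetical choices for this example, not something the pipeline provides.

```python
from typing import Iterator


def iter_kmers(sequence: str, k: int, step: int = 1) -> Iterator[str]:
    """Yield every k-mer of `sequence`, moving `step` bases at a time."""
    if k <= 0 or step <= 0:
        raise ValueError("k and step must be positive")
    # Stop early enough that the last window is still exactly k bases long.
    for start in range(0, len(sequence) - k + 1, step):
        yield sequence[start:start + k]


def write_kmer_fasta(sequence: str, k: int, out_path: str, step: int = 1) -> int:
    """Write the k-mers of `sequence` as FASTA records; return how many were written."""
    count = 0
    with open(out_path, "w") as handle:
        for index, kmer in enumerate(iter_kmers(sequence, k, step)):
            # Hypothetical naming scheme: kmer_<0-based index>.
            handle.write(f">kmer_{index}\n{kmer}\n")
            count += 1
    return count
```

For example, write_kmer_fasta("ACGTAC", 4, "kmers.fasta") writes three records (ACGT, CGTA, GTAC), and the resulting file can be passed to --sequence.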

The pipeline creates two directories: work and *.safetensors. The *.safetensors directory contains a cleaned-up collection of output files. The work directory contains all the intermediate files and also a copy of the final output; you can safely remove work once you have all your safetensors files. See https://www.nextflow.io/docs/latest/workflow.html#outputs for more information.
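
Before removing work, it can be useful to confirm that the output files are readable and to see what shapes the embeddings have. The sketch below reads only the safetensors header (an 8-byte little-endian length followed by a JSON description of each tensor), so it needs no third-party packages and loads no tensor data. The function names are hypothetical, and it assumes the output files themselves end in .safetensors; adjust the pattern argument if yours are named differently.

```python
import json
import struct
from pathlib import Path


def read_safetensors_shapes(path):
    """Return {tensor_name: shape} from a .safetensors header without loading tensor data."""
    with open(path, "rb") as handle:
        # The file starts with an unsigned 64-bit little-endian header size...
        (header_size,) = struct.unpack("<Q", handle.read(8))
        # ...followed by that many bytes of JSON describing each tensor.
        header = json.loads(handle.read(header_size).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {name: info["shape"] for name, info in header.items() if name != "__metadata__"}


def summarize_outputs(output_dir, pattern="*.safetensors"):
    """Map each matching file under `output_dir` to its tensor shapes."""
    return {
        str(file): read_safetensors_shapes(file)
        for file in sorted(Path(output_dir).rglob(pattern))
        if file.is_file()  # skip the output directory, whose own name matches the pattern
    }
```

Calling summarize_outputs(".") from the run directory returns one entry per output file, which makes differing sequence lengths easy to spot.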
