seqlib

seqlib is a Rust library for working with DNA and RNA sequences.

It provides a robust core representation for biological sequences with:

explicit DNA / RNA alphabets
full IUPAC ambiguity support
compiler-enforced correctness for operations like complementation

seqlib is designed as a library dependency, not a command-line tool.
It is developed to support the scarscape project, but is usable on its own for any Rust code that needs reliable nucleotide sequence handling.

Design goals

Correctness first: invalid sequences are rejected at construction
Type safety: DNA and RNA sequences are distinct types, not runtime flags
Strict alphabets: DNA and RNA reject invalid bases at construction (e.g. U in DNA, T in RNA)
Explicit ambiguity handling: ambiguous IUPAC bases (S/W/N) are modeled, not ignored
Ergonomic by default: core sequence operations use copy-on-modify semantics, making them easy to compose and safe for downstream use
Explicit performance opt-ins: in-place modifying methods are provided to maximise performance or minimise memory footprint
Small surface area: no I/O, no parsing frameworks, no CLI
Composable: intended to be embedded in larger bioinformatics pipelines

Adding to your project

Add seqlib to your project as a Git dependency:

cargo add seqlib --git https://github.com/selkamand/seqlib

Core types

Most users should work with the concrete sequence types:

DnaSeq — validated DNA sequences (A, C, G, T + IUPAC ambiguity codes)
RnaSeq — validated RNA sequences (A, C, G, U + IUPAC ambiguity codes)

These are type aliases over a generic Seq<B> implementation and are the recommended entry points for sequence creation and manipulation.

The generic Seq<B> type exists to support reusable algorithms and future extensions, but most downstream code should not need to interact with it directly.
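The pattern of type aliases over a generic Seq<B> can be sketched as follows. This is not seqlib's source: the Alphabet trait, the Dna/Rna marker types, the String error, and the gc_count helper are all hypothetical, chosen only to show how a type parameter lets the compiler keep DNA and RNA apart while both share one implementation.

```rust
use std::fmt;
use std::marker::PhantomData;

// Hypothetical trait: each alphabet lists the symbols it accepts.
trait Alphabet {
    const NAME: &'static str;
    const SYMBOLS: &'static [u8];
}

// Uninhabited marker types: they exist only at the type level.
enum Dna {}
enum Rna {}

impl Alphabet for Dna {
    const NAME: &'static str = "DNA";
    // A, C, G, T plus the IUPAC ambiguity codes (no U)
    const SYMBOLS: &'static [u8] = b"ACGTRYSWKMBDHVN";
}

impl Alphabet for Rna {
    const NAME: &'static str = "RNA";
    // A, C, G, U plus the IUPAC ambiguity codes (no T)
    const SYMBOLS: &'static [u8] = b"ACGURYSWKMBDHVN";
}

// Generic sequence: the alphabet is a type parameter, not a runtime flag.
struct Seq<B: Alphabet> {
    bases: Vec<u8>,
    _alphabet: PhantomData<B>,
}

impl<B: Alphabet> Seq<B> {
    // Validate every symbol up front; invalid input never becomes a Seq.
    fn new(s: &str) -> Result<Self, String> {
        for (i, &b) in s.as_bytes().iter().enumerate() {
            if !B::SYMBOLS.contains(&b) {
                return Err(format!("invalid {} base {:?} at position {}", B::NAME, b as char, i));
            }
        }
        Ok(Seq { bases: s.as_bytes().to_vec(), _alphabet: PhantomData })
    }

    fn len(&self) -> usize {
        self.bases.len()
    }
}

impl<B: Alphabet> fmt::Display for Seq<B> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Bases were validated as ASCII, so this conversion cannot fail.
        f.write_str(std::str::from_utf8(&self.bases).unwrap())
    }
}

// Concrete aliases, mirroring the DnaSeq / RnaSeq entry points described above.
type DnaSeq = Seq<Dna>;
type RnaSeq = Seq<Rna>;

// Hypothetical DNA-only helper: passing an RnaSeq is a compile-time error.
fn gc_count(seq: &DnaSeq) -> usize {
    seq.bases.iter().filter(|&&b| b == b'G' || b == b'C').count()
}

fn main() {
    let dna = DnaSeq::new("ACGTN").unwrap();
    let rna = RnaSeq::new("ACGUN").unwrap();
    println!("{} ({} bases), {} ({} bases)", dna, dna.len(), rna, rna.len());
    println!("GC count: {}", gc_count(&dna));
    // gc_count(&rna); // would not compile: expected Seq<Dna>, found Seq<Rna>

    assert!(DnaSeq::new("ACGU").is_err()); // U is not a DNA base
    assert!(RnaSeq::new("ACGT").is_err()); // T is not an RNA base
}
```

Because the alphabet lives in the type, a function that only makes sense for DNA simply asks for a DnaSeq, and no runtime check is needed to reject RNA.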

Basic usage

use seqlib::sequences::DnaSeq;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let seq = DnaSeq::new("ACGTN")?;

    println!("Sequence: {}", seq);
    println!("Length: {}", seq.len());
    println!("Reverse complement: {}", seq.reverse_complement());

    Ok(())
}

Invalid input is rejected immediately:

let bad = DnaSeq::new("ACGTX"); // returns Err(...)
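The basic usage example relies on `?` converting the construction error into Box<dyn std::error::Error>, which works for any error type that implements std::error::Error. A minimal sketch of such an error, assuming a hypothetical InvalidBase type and validate_dna function (seqlib's actual error type and its fields may differ):

```rust
use std::error::Error;
use std::fmt;

// Hypothetical error type recording which symbol failed and where.
#[derive(Debug, Clone, PartialEq, Eq)]
struct InvalidBase {
    base: char,
    position: usize,
}

impl fmt::Display for InvalidBase {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid DNA base '{}' at position {}", self.base, self.position)
    }
}

// Implementing Error lets `?` convert this into Box<dyn Error>.
impl Error for InvalidBase {}

// Hypothetical validator for the uppercase DNA IUPAC alphabet.
fn validate_dna(s: &str) -> Result<String, InvalidBase> {
    const DNA_IUPAC: &str = "ACGTRYSWKMBDHVN";
    for (position, base) in s.chars().enumerate() {
        if !DNA_IUPAC.contains(base) {
            return Err(InvalidBase { base, position });
        }
    }
    Ok(s.to_string())
}

fn main() -> Result<(), Box<dyn Error>> {
    let ok = validate_dna("ACGTN")?;
    println!("valid: {}", ok);

    if let Err(e) = validate_dna("ACGTX") {
        println!("rejected: {}", e); // rejected: invalid DNA base 'X' at position 4
    }
    Ok(())
}
```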

Copy-on-modify vs in-place mutation

seqlib is designed to be ergonomic and safe by default.

All standard sequence operations—such as reverse complementation, subsequence extraction, and reversal—use copy-on-modify semantics. These methods return new sequences rather than mutating existing ones, making them easy to compose, store, and pass through downstream code without surprising side effects.

For performance-critical or memory-sensitive workflows, seqlib also exposes explicit in-place mutation methods (e.g. reverse_complement_in_place). These methods are clearly named and opt-in, allowing callers to trade ergonomics for efficiency when appropriate.
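The difference between the two styles comes down to the receiver: a copy-on-modify method borrows &self and returns a new value, while an in-place method takes &mut self and overwrites the existing buffer. A minimal sketch, assuming a hypothetical DnaBuf type restricted to A/C/G/T/N (only the name reverse_complement_in_place comes from the text above; everything else is illustrative):

```rust
// Complement for the bases this sketch accepts; input is assumed pre-validated.
fn complement(b: u8) -> u8 {
    match b {
        b'A' => b'T',
        b'T' => b'A',
        b'C' => b'G',
        b'G' => b'C',
        b'N' => b'N',
        _ => panic!("sketch only handles A/C/G/T/N, got {:?}", b as char),
    }
}

// Hypothetical sequence type used only for this illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
struct DnaBuf {
    bases: Vec<u8>,
}

impl DnaBuf {
    // Copy-on-modify: borrows `self` and allocates a new sequence.
    fn reverse_complement(&self) -> DnaBuf {
        DnaBuf {
            bases: self.bases.iter().rev().map(|&b| complement(b)).collect(),
        }
    }

    // In-place: reverses and complements the existing buffer without allocating.
    fn reverse_complement_in_place(&mut self) {
        self.bases.reverse();
        for b in self.bases.iter_mut() {
            *b = complement(*b);
        }
    }
}

fn main() {
    let original = DnaBuf { bases: b"AACGTN".to_vec() };

    // Copy-on-modify: `original` is untouched and can still be used.
    let rc = original.reverse_complement();
    assert_eq!(original.bases, b"AACGTN");
    assert_eq!(rc.bases, b"NACGTT");

    // In-place: needs a mutable binding and overwrites its contents.
    let mut buffer = original.clone();
    buffer.reverse_complement_in_place();
    assert_eq!(buffer, rc);

    println!("{}", String::from_utf8_lossy(&rc.bases)); // prints NACGTT
}
```

The signatures document the trade-off on their own: only the in-place variant requires a mutable binding, so a caller cannot mutate a shared sequence by accident.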

Ambiguity handling

seqlib supports IUPAC ambiguity codes (e.g. N, R, Y, S), but does not treat ambiguous symbols as wildcards in higher-level analyses.

Ambiguous bases represent unknown concrete bases, not flexible matches. As a result, operations that require certainty (such as palindromic sequence detection) will conservatively return false if ambiguity is present.
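Treating an ambiguity code as the set of concrete bases it might stand for also explains why complementation stays well defined: the complement of a code is the code for the complemented set (R = A/G becomes Y = C/T). The sketch below uses the standard IUPAC nucleotide table; expand, complement, and is_ambiguous are hypothetical helpers, not seqlib's API, and the loop in main checks the set property for every DNA code.

```rust
// Concrete bases each DNA IUPAC symbol may stand for (standard IUPAC table).
fn expand(b: char) -> Option<&'static str> {
    Some(match b {
        'A' => "A", 'C' => "C", 'G' => "G", 'T' => "T",
        'R' => "AG", 'Y' => "CT", 'S' => "CG", 'W' => "AT",
        'K' => "GT", 'M' => "AC",
        'B' => "CGT", 'D' => "AGT", 'H' => "ACT", 'V' => "ACG",
        'N' => "ACGT",
        _ => return None,
    })
}

// Complement of every DNA IUPAC symbol; None for anything else.
fn complement(b: char) -> Option<char> {
    Some(match b {
        'A' => 'T', 'T' => 'A', 'C' => 'G', 'G' => 'C',
        'R' => 'Y', 'Y' => 'R', 'K' => 'M', 'M' => 'K',
        'S' => 'S', 'W' => 'W',
        'B' => 'V', 'V' => 'B', 'D' => 'H', 'H' => 'D',
        'N' => 'N',
        _ => return None,
    })
}

// A symbol is ambiguous if it could stand for more than one concrete base.
fn is_ambiguous(b: char) -> bool {
    matches!(expand(b), Some(set) if set.len() > 1)
}

fn main() {
    // Complementing a code must equal complementing each base it may stand for.
    for code in "ACGTRYSWKMBDHVN".chars() {
        let mut via_members: Vec<char> = expand(code)
            .unwrap()
            .chars()
            .map(|c| complement(c).unwrap())
            .collect();
        via_members.sort();
        let direct: Vec<char> = expand(complement(code).unwrap()).unwrap().chars().collect();
        assert_eq!(via_members, direct, "complement mismatch for {}", code);
    }

    let seq = "GAWTC";
    let ambiguous = seq.chars().filter(|&c| is_ambiguous(c)).count();
    println!("{} has {} ambiguous base(s)", seq, ambiguous); // GAWTC has 1 ambiguous base(s)
}
```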

Palindromic sequences

seqlib defines palindromic sequences using a strict, certainty-based definition.

A sequence is considered palindromic only if:

it contains no ambiguous bases,
its length is non-zero and even, and
it is identical to its reverse complement at the level of concrete bases.

This guarantees that palindromic sequences can be counted and filtered without overcounting symbolically palindromic but ambiguous inputs such as NNNNNN.

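use seqlib::sequences::DnaSeq;

fn main() {
    // EcoRI site: GAATTC is identical to its own reverse complement
    assert!(DnaSeq::new("GAATTC").unwrap().is_palindromic());
    // Symbolically palindromic, but the N bases make it uncertain
    assert!(!DnaSeq::new("NNNNNN").unwrap().is_palindromic());
    // Odd length: never palindromic
    assert!(!DnaSeq::new("AAA").unwrap().is_palindromic());
}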

This conservative definition is intentional and designed for statistical and motif-based analyses where false positives must be avoided.
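The three rules translate directly into code. A minimal sketch over plain strings, assuming only A/C/G/T count as concrete bases (this standalone is_palindromic is an illustration, not seqlib's method). For concrete bases the even-length rule is partly redundant, since the middle base of an odd-length sequence would have to be its own complement, which no concrete base is; the explicit check also rejects the empty sequence.

```rust
// Complement of the four concrete DNA bases; None for anything ambiguous or invalid.
fn complement(b: char) -> Option<char> {
    match b {
        'A' => Some('T'),
        'T' => Some('A'),
        'C' => Some('G'),
        'G' => Some('C'),
        _ => None,
    }
}

// Strict, certainty-based palindrome check following the three rules above.
fn is_palindromic(seq: &str) -> bool {
    let bases: Vec<char> = seq.chars().collect();

    // Rule 1: any non-concrete base (e.g. N) means we cannot be certain.
    if !bases.iter().all(|b| matches!(b, 'A' | 'C' | 'G' | 'T')) {
        return false;
    }
    // Rule 2: length must be non-zero and even.
    if bases.is_empty() || bases.len() % 2 != 0 {
        return false;
    }
    // Rule 3: base i must be the complement of base (n - 1 - i).
    let n = bases.len();
    (0..n / 2).all(|i| complement(bases[i]) == Some(bases[n - 1 - i]))
}

fn main() {
    for s in ["GAATTC", "NNNNNN", "AAA", "", "GANTC"] {
        println!("{:?} -> {}", s, is_palindromic(s));
    }
}
```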

Features

DNA and RNA alphabets with full IUPAC ambiguity support
Infallible complement and reverse-complement operations
Explicit ambiguity detection
Allocation-aware APIs with both copy-on-modify and in-place variants

Examples

See the examples/ directory for typical usage patterns and small self-contained demonstrations.

Out of Scope

To keep seqlib focused, the following features will not be implemented:

No FASTA / FASTQ parsing
No file I/O
No CLI
No alignment, variant calling, or translation logic
No soft-masking or case preservation

seqlib is intended to be a foundation, not a full bioinformatics toolkit.
