#genomics #bioinformatics #api-bindings #sex-determination #rad-seq

radsex-core

Core library for RADSex: sex-determination analysis from RAD-Sequencing data

1 unstable release

Uses new Rust 2024

0.2.3 Jun 4, 2026

#226 in Biology


Used in rsx-cli

GPL-3.0-or-later

250KB
6K SLoC

rsx

High-performance streaming toolkit for RAD-seq sex determination.

A Rust rewrite of RADSex: a drop-in CLI replacement with Python bindings and a C FFI.

Install

Pre-built binaries are available for Linux (x86_64/aarch64), macOS (x86_64/arm64), and Windows (without the map command):

# See https://github.com/HaoZeke/rsx-rs/releases for the latest
curl -sSfL https://github.com/HaoZeke/rsx-rs/releases/download/v0.2.1/rsx-installer.sh | sh

From source

git clone https://github.com/HaoZeke/rsx-rs.git
cd rsx-rs
cargo build --release
# binary at target/release/rsx

Via pixi (dev / reproducible)

pixi run build
# or pixi run -e dev build-portable

Python bindings

pip install pyrsx

See the Python README for the high-level MarkerTable, TriageResult, narwhals, and plotting APIs.

30-second quickstart

# CLI
rsx process -i reads/ -o markers.tsv -T 8 -d 5
rsx distrib -t markers.tsv -p popmap.tsv -o distrib.tsv -G M,F
rsx signif -t markers.tsv -p popmap.tsv -o signif.tsv -G M,F --bayes
rsx map -t markers.tsv -p popmap.tsv -g genome.fa -o aligned.tsv -G M,F

# Python (high-level)
import pyrsx
pyrsx.process("reads/", "markers.tsv", threads=8, min_depth=5)
pyrsx.signif("markers.tsv", "popmap.tsv", "signif.tsv", test="fisher", correction="fdr", bayes=True)
tbl = pyrsx.MarkerTable.from_path("markers.tsv")
...

The full pipeline, memory guarantees, and all 10 commands (including the new merge, pca, and triage) are documented at https://rsx.rgoswami.me.
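The quickstart passes correction="fdr" to signif. As a minimal sketch, assuming "fdr" refers to the usual Benjamini-Hochberg procedure (the source does not confirm this), the adjustment can be illustrated in plain Python. The function name benjamini_hochberg and the toy p-values are hypothetical and are not part of pyrsx.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Illustrative only: this is not the pyrsx implementation.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy per-marker p-values (made up for the example).
p = [0.001, 0.04, 0.03, 0.2]
q = benjamini_hochberg(p)
```

Markers whose adjusted value falls below the chosen threshold would then be reported as significantly associated with a group.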

Features

  • All original RADSex commands + merge (external sort for 75M+ markers, ~500 MB RAM), pca (streaming Tucker), triage (Bayes + strict candidate ranking).
  • Bounded-memory streaming for every command, with no O(n_markers) accumulation.
  • 2-6x or more faster than C++ RADSex on literature panels (byte-identical output when groups are specified).
  • Python bindings (low-level + ergonomic MarkerTable / Arrow / narwhals), C API via cbindgen.
  • Optional: parquet I/O, MPI, minimap2 mapping (feature-gated for Windows).
  • Reproducible: pixi environments, ASV + literature benchmark harness, SymPy/Sollya proofs for the math.
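The merge command is described above as an external sort. The general technique can be sketched as follows: sort fixed-size chunks in memory, spill each sorted run to disk, then merge the runs lazily. This is a generic illustration under stated assumptions, not the crate's implementation; external_sort, chunk_size, and the toy records are hypothetical.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(lines, chunk_size, workdir):
    """Yield newline-terminated strings in sorted order using bounded memory."""
    run_paths = []
    it = iter(lines)
    while True:
        # At most chunk_size lines are held in memory at once.
        chunk = sorted(islice(it, chunk_size))
        if not chunk:
            break
        fd, path = tempfile.mkstemp(dir=workdir, suffix=".run")
        with os.fdopen(fd, "w") as fh:
            fh.writelines(chunk)
        run_paths.append(path)

    handles = [open(p) for p in run_paths]
    try:
        # heapq.merge is lazy: it keeps one pending line per sorted run.
        yield from heapq.merge(*handles)
    finally:
        for fh in handles:
            fh.close()
        for p in run_paths:
            os.remove(p)


with tempfile.TemporaryDirectory() as tmp:
    # Toy "sequence<TAB>depth" records, made up for the example.
    records = ["TTGA\t3\n", "ACGT\t7\n", "GGCA\t1\n", "CATG\t4\n", "AAAC\t2\n"]
    merged = list(external_sort(records, chunk_size=2, workdir=tmp))
```

Peak memory depends on chunk_size and the number of runs rather than on the total number of records, which is the property the feature list refers to.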

Documentation

  • Full site: https://rsx.rgoswami.me (tutorials, command reference, architecture, HPC design, R + Python integration).
  • Paper (BMC Bioinformatics, in submission): see the companion manuscript repository or the forthcoming published version.
  • Reproducibility materials: the companion rsx_bmc_repro package (snakemake-orchestrated, laid out as an MCA/Zenodo archive like the rest of the collection), plus the org files under repro/ in this repo.

Citation

Please cite the software article (when published) and the RADSex reference it extends.

See CITATION.cff (root) for the machine-readable entry.

For the benchmark data / figures used in the paper, also cite the deposited reproducibility archive (Zenodo / Materials Cloud Archive entry, to be minted from the rsx_bmc_repro package after the heavy builder runs).

RADSex reference: Feron et al., Mol Ecol Resour 2021. https://doi.org/10.1111/1755-0998.13360

Contributing

See CONTRIBUTING.md. We use pixi for dev environments, conventional commits, and the usual Rust checks: cargo fmt && cargo clippy -- -D warnings && cargo test.

License

GPL-3.0-or-later. See LICENSE.

The upstream C++/Python RADSex code that this project builds on is similarly licensed; see the original sources.


lib.rs:

Core library for RADSex: sex-determination analysis from RAD-Sequencing data.

This crate provides the computational core for analyzing RAD-seq data to identify sex-linked markers. It exposes both a Rust API and a C-compatible FFI layer (via cbindgen) for integration with C++, R, and other languages.

Key types

  • markers_table::MarkersTableStream: streaming / bounded-memory reader over marker depth tables (the central data structure for all commands).
  • commands: implementations of process, distrib, signif, pca, merge, etc.
  • stats: Bayesian and frequentist tests (Bayes factor, posterior, chi2, Fisher, G-test).
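Of the tests listed under stats, Fisher's exact test is the simplest to illustrate. The sketch below computes a two-sided p-value for a 2x2 table of group (rows) against marker presence or absence (columns), using only the standard library. It is a textbook formulation, not the crate's code, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# A marker present in 10 of 10 males and 0 of 10 females (toy numbers).
p_sex_linked = fisher_exact_two_sided(10, 0, 0, 10)
# A marker present in half of each group shows no association.
p_neutral = fisher_exact_two_sided(5, 5, 5, 5)
```

A marker confined to one group yields a very small p-value, while one spread evenly across groups yields a p-value of 1.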

All commands are designed to run with O(n_individuals) or bounded temporary memory even on 50 GB+ tables.
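The memory claim can be illustrated with a minimal streaming pass that keeps one counter per individual and never stores the rows themselves. The table layout used here (an id column, a sequence column, then one depth column per individual) is an assumption for the sketch, as are the function name and the toy data. This is not the MarkersTableStream API.

```python
import io


def count_markers_per_individual(handle, min_depth):
    """Count, per individual, the markers whose depth is at least min_depth.

    Reads one line at a time; memory use scales with the number of
    individuals, not with the number of markers.
    """
    individuals = None
    counts = None
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if individuals is None:
            # First non-comment line is the header: id, sequence, individuals...
            individuals = fields[2:]
            counts = [0] * len(individuals)
            continue
        for k, depth in enumerate(fields[2:]):
            if int(depth) >= min_depth:
                counts[k] += 1
    return dict(zip(individuals or [], counts or []))


# Toy table with two males and one female (made-up values).
table = io.StringIO(
    "# toy marker table\n"
    "id\tsequence\tm1\tm2\tf1\n"
    "0\tACGT\t12\t9\t0\n"
    "1\tTTGA\t0\t6\t7\n"
    "2\tGGCA\t5\t4\t3\n"
)
counts = count_markers_per_individual(table, min_depth=5)
```

The same pattern works on a real file handle of any size, since only the current line and the per-individual counters are held in memory.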

Dependencies

~6–18MB
~375K SLoC