bed-reader

Read and write the PLINK BED format, simply and efficiently

Apache-2.0

Highlights

  • Fast and multi-threaded
  • Supports many indexing methods. Slice data by individuals (samples) and/or SNPs (variants).
  • The Python-facing API for this library is used by PySnpTools, FaST-LMM, and PyStatGen.
  • Supports PLINK 1.9.
  • Read data locally or from the cloud, efficiently and directly.
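For orientation, here is a minimal sketch of how a PLINK 1 .bed data byte packs four genotypes at two bits each, with the first sample in the lowest two bits. It is not this crate's implementation. The function name `decode_bed_byte` is hypothetical, and reporting each genotype as a count of the first allele is an illustrative assumption.

```rust
/// Decode one data byte of a PLINK 1 .bed block into four genotype values.
/// Each value is the count of the first allele, or None when missing.
/// (Hypothetical helper for illustration, not part of bed-reader.)
fn decode_bed_byte(byte: u8) -> [Option<u8>; 4] {
    let mut out = [None; 4];
    for (i, slot) in out.iter_mut().enumerate() {
        // The lowest two bits belong to the first sample stored in the byte.
        let code = (byte >> (2 * i)) & 0b11;
        *slot = match code {
            0b00 => Some(2), // homozygous for the first allele
            0b01 => None,    // missing genotype
            0b10 => Some(1), // heterozygous
            _ => Some(0),    // 0b11: homozygous for the second allele
        };
    }
    out
}

fn main() {
    // Bit pairs from low to high: 00, 01, 10, 11.
    let decoded = decode_bed_byte(0b1110_0100);
    println!("{:?}", decoded); // prints [Some(2), None, Some(1), Some(0)]
}
```

A real file also starts with a three-byte header and pads each variant's block to a whole number of bytes, which this sketch leaves out.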

Install

Full version: Can read local and cloud files

cargo add bed-reader

Minimal version: Can read local files only

cargo add bed-reader --no-default-features

Examples

Read all genotype data from a .bed file.

use ndarray as nd;
use bed_reader::{Bed, ReadOptions, assert_eq_nan, sample_bed_file};

let file_name = sample_bed_file("small.bed")?;
let mut bed = Bed::new(file_name)?;
let val = ReadOptions::builder().f64().read(&mut bed)?;

assert_eq_nan(
    &val,
    &nd::array![
        [1.0, 0.0, f64::NAN, 0.0],
        [2.0, 0.0, f64::NAN, 2.0],
        [0.0, 1.0, 2.0, 0.0]
    ],
);
# use bed_reader::BedErrorPlus; // '#' needed for doctest
# Ok::<(), Box<BedErrorPlus>>(())
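The example above checks its result with assert_eq_nan rather than plain equality because missing genotypes are read as NaN, and NaN never compares equal to itself. The following standard-library sketch illustrates the idea on flat slices. The helper `eq_nan` is hypothetical and is not the crate's implementation.

```rust
/// Hypothetical helper: true when both slices have the same length and every
/// pair of elements is either equal or both NaN.
fn eq_nan(a: &[f64], b: &[f64]) -> bool {
    a.len() == b.len()
        && a.iter()
            .zip(b)
            .all(|(x, y)| (x.is_nan() && y.is_nan()) || x == y)
}

fn main() {
    let expected = [1.0, 0.0, f64::NAN, 0.0];
    let actual = [1.0, 0.0, f64::NAN, 0.0];

    // Plain equality fails: the NaN in position 2 is not equal to itself.
    assert!(expected != actual);

    // NaN-aware comparison treats the two missing values as matching.
    assert!(eq_nan(&expected, &actual));
}
```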

Read every second individual (sample) and the SNPs (variants) at index positions 20 through 29.

use ndarray::s;

let file_name = sample_bed_file("some_missing.bed")?;
let mut bed = Bed::new(file_name)?;
let val = ReadOptions::builder()
    .iid_index(s![..;2])
    .sid_index(20..30)
    .f64()
    .read(&mut bed)?;

assert!(val.dim() == (50, 10));
# use bed_reader::{Bed, ReadOptions, BedErrorPlus, assert_eq_nan, sample_bed_file}; // '#' needed for doctest
# Ok::<(), Box<BedErrorPlus>>(())
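To see why the result above has shape (50, 10), the sketch below reproduces the index arithmetic with standard-library ranges only. It assumes the sample file holds 100 individuals, which matches the (100, 6) shape asserted in the next example. The function name `selected_shape` is hypothetical.

```rust
/// Hypothetical helper: shape of the selection made by taking every second
/// individual and the SNPs at positions 20..30.
fn selected_shape(iid_count: usize) -> (usize, usize) {
    // Like `s![..;2]`: positions 0, 2, 4, ... below iid_count.
    let iid_positions: Vec<usize> = (0..iid_count).step_by(2).collect();
    // `20..30` is half-open: positions 20 through 29.
    let sid_positions: Vec<usize> = (20..30).collect();
    (iid_positions.len(), sid_positions.len())
}

fn main() {
    // Assumption: 100 individuals in the file.
    let shape = selected_shape(100);
    println!("{:?}", shape); // prints (50, 10)
}
```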

List the first 5 individual (sample) ids, the first 5 SNP (variant) ids, and every unique chromosome. Then, read every genomic value in chromosome 5.

# use ndarray::s; // '#' needed for doctest
# use bed_reader::{Bed, ReadOptions, assert_eq_nan, sample_bed_file};
# let file_name = sample_bed_file("some_missing.bed")?;
use std::collections::HashSet;

let mut bed = Bed::new(file_name)?;
println!("{:?}", bed.iid()?.slice(s![..5])); // Outputs ndarray: ["iid_0", "iid_1", "iid_2", "iid_3", "iid_4"]
println!("{:?}", bed.sid()?.slice(s![..5])); // Outputs ndarray: ["sid_0", "sid_1", "sid_2", "sid_3", "sid_4"]
println!("{:?}", bed.chromosome()?.iter().collect::<HashSet<_>>());
// Outputs: {"12", "10", "4", "8", "19", "21", "9", "15", "6", "16", "13", "7", "17", "18", "1", "22", "11", "2", "20", "3", "5", "14"}
let val = ReadOptions::builder()
    .sid_index(bed.chromosome()?.map(|elem| elem == "5"))
    .f64()
    .read(&mut bed)?;

assert!(val.dim() == (100, 6));
# use bed_reader::BedErrorPlus; // '#' needed for doctest
# Ok::<(), Box<BedErrorPlus>>(())
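The chromosome filter above passes a Boolean array as the SNP index: one true/false entry per SNP, with true meaning "keep". A minimal standard-library sketch of that selection follows. The helper names and the chromosome labels are hypothetical and only illustrate the masking idea, not the crate's internals.

```rust
/// Hypothetical helper: build a mask that is true where the label matches.
fn mask_for(chromosome: &[&str], wanted: &str) -> Vec<bool> {
    chromosome.iter().map(|&c| c == wanted).collect()
}

/// Hypothetical helper: return the positions where the mask is true.
fn positions_where(mask: &[bool]) -> Vec<usize> {
    mask.iter()
        .enumerate()
        .filter_map(|(i, &keep)| if keep { Some(i) } else { None })
        .collect()
}

fn main() {
    // Made-up chromosome labels, one per SNP.
    let chromosome = ["1", "5", "5", "7", "5"];
    let mask = mask_for(&chromosome, "5");
    let picked = positions_where(&mask);
    println!("{:?}", picked); // prints [1, 2, 4]
}
```

The number of true entries in the mask becomes the SNP dimension of the result, which is 6 in the example above.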

From the cloud: open a file and read data for one SNP (variant) at index position 2. (See "Cloud URLs and CloudFile Examples" for details on specifying a file in the cloud.)

use ndarray as nd;
use bed_reader::{assert_eq_nan, BedCloud, ReadOptions};
# #[cfg(feature = "tokio")] Runtime::new().unwrap().block_on(async { // '#' needed for doc test
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/small.bed";
let mut bed_cloud = BedCloud::new(url).await?;
let val = ReadOptions::builder().sid_index(2).f64().read_cloud(&mut bed_cloud).await?;
assert_eq_nan(&val, &nd::array![[f64::NAN], [f64::NAN], [2.0]]);
# Ok::<(), Box<dyn std::error::Error>>(()) }).unwrap();
# #[cfg(feature = "tokio")] use {tokio::runtime::Runtime, bed_reader::BedErrorPlus};
