masikol/cigar_collapser

cigar_collapser

  1. Description.

  2. Installation.

  3. Usage and examples.

Description

A program that collapses CIGAR strings from SAM/BAM files into a short form, so it is easy to see which part of the read is actually aligned and which is clipped (see the simple examples below).

It might be useful when inspecting long reads (e.g. Oxford Nanopore, PacBio), whose CIGAR strings are often very long too.

Installation

Way 1. Download a pre-built binary for Linux from GitHub

Release binaries are here: https://github.com/masikol/cigar_collapser/releases.

Way 2. Install using Cargo

cargo install cigar_collapser

Here you can find out how to install Cargo: https://doc.rust-lang.org/cargo/getting-started/installation.html.

Way 3. Build from source

  1. Download the source code of a release from https://github.com/masikol/cigar_collapser/releases.

  2. Unpack the downloaded archive, say repeat_collapser_v1.0.0.tar.gz.

  3. Then compile it with Cargo:

cd repeat_collapser_v1.0.0/
cargo build --release

The built binary will be at target/release/cigar_collapser.

To test the installation:

cd repeat_collapser_v1.0.0/
cargo test

Usage and examples

There are no options: just pass CIGAR strings as command-line arguments, or send them through stdin, one per line.
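The two input modes can be sketched in Rust as follows. This is only an illustration of the idea, not the program's actual source: the function name `gather_cigars` is hypothetical, and skipping blank lines is an assumption of this sketch.

```rust
use std::io::BufRead;

/// Collect CIGAR strings: use the command-line arguments if any were given,
/// otherwise read one string per line from `input`.
/// Blank lines are skipped (an assumption of this sketch).
fn gather_cigars<R: BufRead>(args: &[String], input: R) -> Vec<String> {
    if !args.is_empty() {
        return args.to_vec();
    }
    input
        .lines()
        .map_while(Result::ok) // stop at the first I/O error
        .map(|line| line.trim().to_string())
        .filter(|line| !line.is_empty())
        .collect()
}
```

In a real binary, `args` would typically come from `std::env::args().skip(1)` and `input` from `std::io::stdin().lock()`.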

Pass CIGAR strings as command line arguments

Command:

./cigar_collapser 123S10M10I5D333H 123S10I5D20X30=333H

Output:

          123-S|             20|          333-H
          123-S|             60|          333-H

For example, the first output line effectively means that:

  1. First 123 bases of the read are soft-clipped.

  2. Then, 20 read bases (10M + 10I) form the aligned part; the 5D deletion consumes no read bases, so it is not counted.

  3. Last 333 bases are hard-clipped.
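The collapsing logic described above can be sketched roughly like this. It is a minimal illustration under assumptions, not the crate's implementation: the names `Collapsed`, `parse_cigar` and `collapse` are hypothetical, the sketch handles at most one clip operation per end, and the rule that the middle number sums the read-consuming operations (M, I, =, X) is inferred from the example outputs in this README.

```rust
/// Summary of one CIGAR string: optional clips on each end and the
/// number of read bases between them.
#[derive(Debug, PartialEq)]
struct Collapsed {
    left_clip: Option<(u64, char)>,
    aligned_read_bases: u64,
    right_clip: Option<(u64, char)>,
}

/// Split a CIGAR string into (length, operation) pairs.
fn parse_cigar(cigar: &str) -> Result<Vec<(u64, char)>, String> {
    let mut ops = Vec::new();
    let mut digits = String::new();
    for ch in cigar.chars() {
        if ch.is_ascii_digit() {
            digits.push(ch);
        } else if "MIDNSHP=X".contains(ch) {
            let len: u64 = digits
                .parse()
                .map_err(|_| format!("missing or invalid length before '{}'", ch))?;
            ops.push((len, ch));
            digits.clear();
        } else {
            return Err(format!("unknown CIGAR operation '{}'", ch));
        }
    }
    if !digits.is_empty() {
        return Err("trailing length without an operation".to_string());
    }
    Ok(ops)
}

fn is_clip(op: char) -> bool {
    op == 'S' || op == 'H'
}

/// Collapse a CIGAR string into clips and an aligned read length.
fn collapse(cigar: &str) -> Result<Collapsed, String> {
    let mut ops = parse_cigar(cigar)?;
    // Peel one clip off each end, if present.
    let right_clip = ops.last().copied().filter(|&(_, op)| is_clip(op));
    if right_clip.is_some() {
        ops.pop();
    }
    let left_clip = ops.first().copied().filter(|&(_, op)| is_clip(op));
    if left_clip.is_some() {
        ops.remove(0);
    }
    // Sum the operations that consume read bases (assumed rule).
    let mut aligned_read_bases = 0;
    for (len, op) in ops {
        match op {
            'M' | 'I' | '=' | 'X' => aligned_read_bases += len,
            'D' | 'N' | 'P' => {} // consume no read bases
            _ => return Err(format!("unexpected interior clip '{}'", op)),
        }
    }
    Ok(Collapsed { left_clip, aligned_read_bases, right_clip })
}
```

Under these assumptions, both example CIGAR strings above yield a left clip of 123-S and a right clip of 333-H, with 20 and 60 read bases in between, respectively.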

Pass CIGAR strings via stdin

Command:

echo '
123S10M10I5D333H
123S10I5D20X30=333H
' | ./cigar_collapser

Output:

          123-S|             20|          333-H
          123-S|             60|          333-H

Inspect a specific read in a BAM file

Command:

samtools view nanopore_mapped.bam \
    | cut -f1,6 \
    | grep '3298da8e-7f61-4c46-af7b-4e5ac01ccc99' \
    | cut -f2 \
    | cigar_collapser

Output:

        3,106-S|          5,035|          355-S
              .|          3,078|        5,418-H
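The output layout shown above (thousands separators, right-aligned columns separated by `|`, and `.` where there is no clip) could be produced along these lines. This is a hedged sketch: the helper names are hypothetical, and the column width of 15 characters is inferred from the example outputs, not taken from the program's source.

```rust
/// Insert a comma between every group of three digits, e.g. 5418 -> "5,418".
fn group_thousands(n: u64) -> String {
    let digits = n.to_string();
    let mut out = String::new();
    for (i, ch) in digits.chars().enumerate() {
        // A comma goes before each digit that starts a full group of three.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(ch);
    }
    out
}

/// Render a clip as "length-operation", or "." when there is no clip.
fn clip_field(clip: Option<(u64, char)>) -> String {
    match clip {
        Some((len, op)) => format!("{}-{}", group_thousands(len), op),
        None => ".".to_string(),
    }
}

/// Render one output row with three right-aligned, 15-character columns
/// (the width is an assumption based on the examples).
fn format_row(left: Option<(u64, char)>, aligned: u64, right: Option<(u64, char)>) -> String {
    format!(
        "{:>15}|{:>15}|{:>15}",
        clip_field(left),
        group_thousands(aligned),
        clip_field(right)
    )
}
```

For instance, `format_row(None, 3078, Some((5418, 'H')))` produces a row whose three fields read `.`, `3,078` and `5,418-H`.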

Rust API

There is a single public function in the crate: collapse_cigar.

Signature:

pub fn collapse_cigar(str_arg: &String) -> Result<String, String>

Example code:

use cigar_collapser::collapse_cigar;

fn main() {
    let cigar_str = String::from(
        "123S10I5D20X30=333H"
    );
    let collapsed_str = collapse_cigar(&cigar_str).unwrap();
    println!("{}", collapsed_str);
}

Output:

          123-S|             60|          333-H
