A program that collapses CIGAR strings from SAM/BAM files, making them short enough to see at a glance which part of the read is actually aligned and which is clipped (see the simple examples below).
It might be useful when inspecting long reads (e.g. Oxford Nanopore, PacBio), whose CIGAR strings are often very long too.
Release binaries are available here: https://github.com/masikol/cigar_collapser/releases.
cargo install cigar_collapser
Here you can find out how to install Cargo: https://doc.rust-lang.org/cargo/getting-started/installation.html.
- Download the source code of a release from https://github.com/masikol/cigar_collapser/releases.
- Unpack the downloaded archive, say repeat_collapser_v1.0.0.tar.gz.
- Then compile it with Cargo:
cd repeat_collapser_v1.0.0/
cargo build --release
And find the built binary: target/release/cigar_collapser.
To test the installation:
cd repeat_collapser_v1.0.0/
cargo test
No options: just pass CIGAR strings as CLI arguments, or send them via stdin, one per line.
Command:
./cigar_collapser 123S10M10I5D333H 123S10I5D20X30=333H
Output:
123-S| 20| 333-H
123-S| 60| 333-H
For example, the first output line effectively means that:
- First 123 bases of the read are soft-clipped.
- Then, 20 bases are aligned to the reference.
- Last 333 bases are hard-clipped.
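The interpretation above can be sketched in a few lines of Rust. This is only an illustration, not the crate's actual implementation: it assumes that the middle number counts operations that consume read bases (`M`, `I`, `=`, `X`), which is consistent with both examples above, and that there is at most one clipping operation on each end. The names `CigarSummary` and `summarize_cigar` are hypothetical.

```rust
/// Hypothetical summary of one CIGAR string (not a type from the crate).
#[derive(Debug, PartialEq)]
struct CigarSummary {
    left_clip: Option<(u64, char)>,  // e.g. Some((123, 'S'))
    aligned: u64,                    // read bases between the clipped ends
    right_clip: Option<(u64, char)>, // e.g. Some((333, 'H'))
}

fn summarize_cigar(cigar: &str) -> Result<CigarSummary, String> {
    // Split the string into (length, operation) pairs.
    let mut ops: Vec<(u64, char)> = Vec::new();
    let mut digits = String::new();
    for c in cigar.chars() {
        if c.is_ascii_digit() {
            digits.push(c);
        } else {
            let len: u64 = digits
                .parse()
                .map_err(|_| format!("missing length before '{}' in '{}'", c, cigar))?;
            digits.clear();
            ops.push((len, c));
        }
    }
    if !digits.is_empty() {
        return Err(format!("trailing digits without an operation in '{}'", cigar));
    }

    // Assumption: at most one clipping operation (S or H) on each end.
    let is_clip = |op: char| op == 'S' || op == 'H';
    let mut start = 0;
    let mut end = ops.len();
    let mut left_clip = None;
    let mut right_clip = None;
    if let Some(&(len, op)) = ops.first() {
        if is_clip(op) {
            left_clip = Some((len, op));
            start = 1;
        }
    }
    if end > start {
        let (len, op) = ops[end - 1];
        if is_clip(op) {
            right_clip = Some((len, op));
            end -= 1;
        }
    }

    // Assumption: only operations that consume read bases are summed.
    let mut aligned = 0u64;
    for &(len, op) in &ops[start..end] {
        match op {
            'M' | 'I' | '=' | 'X' => aligned += len,
            'D' | 'N' | 'P' => {} // these consume no read bases
            other => return Err(format!("unexpected operation '{}' in '{}'", other, cigar)),
        }
    }
    Ok(CigarSummary { left_clip, aligned, right_clip })
}

fn main() {
    for cigar in ["123S10M10I5D333H", "123S10I5D20X30=333H"] {
        println!("{:?}", summarize_cigar(cigar));
    }
}
```

Under these assumptions the first string yields 123 soft-clipped, 20 aligned and 333 hard-clipped bases, and the second yields 60 aligned bases, matching the output shown above.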
Command:
echo '
123S10M10I5D333H
123S10I5D20X30=333H
' | ./cigar_collapser
Output:
123-S| 20| 333-H
123-S| 60| 333-H
Command:
samtools view nanopore_mapped.bam \
| cut -f1,6 \
| grep '3298da8e-7f61-4c46-af7b-4e5ac01ccc99' \
| cut -f2 \
| cigar_collapser
Output:
3,106-S| 5,035| 355-S
.| 3,078| 5,418-H
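If you would rather do the field extraction in Rust than with `cut` and `grep`, the same selection could be sketched roughly as follows. The function name `cigars_for_read` and the toy SAM records are made up for illustration. Unlike `grep`, this sketch matches the read name exactly rather than as a substring.

```rust
/// Return the CIGAR field (column 6) of every SAM record
/// whose read name (column 1) equals `read_name`.
fn cigars_for_read<'a>(sam_text: &'a str, read_name: &str) -> Vec<&'a str> {
    sam_text
        .lines()
        .filter(|line| !line.starts_with('@')) // skip SAM header lines
        .filter_map(|line| {
            let mut fields = line.split('\t');
            let qname = fields.next()?; // column 1: read name
            let cigar = fields.nth(4)?; // skips columns 2-5, yields column 6
            if qname == read_name {
                Some(cigar)
            } else {
                None
            }
        })
        .collect()
}

fn main() {
    // Toy records for illustration only; not real alignments.
    let sam = "@HD\tVN:1.6\n\
               read_a\t0\tchr1\t100\t60\t5S20M\t*\t0\t0\t*\t*\n\
               read_b\t0\tchr1\t200\t60\t30M\t*\t0\t0\t*\t*\n\
               read_a\t2048\tchr2\t300\t60\t3H10M\t*\t0\t0\t*\t*\n";
    for cigar in cigars_for_read(sam, "read_a") {
        println!("{}", cigar);
    }
}
```

Each returned string can then be passed to the collapser one at a time.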
There is a single public function in the crate: collapse_cigar.
Signature:
pub fn collapse_cigar(str_arg: &String) -> Result<String, String>
Example code:
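use cigar_collapser::collapse_cigar;

fn main() {
    // The function takes a reference to an owned String.
    let cigar_str = String::from("123S10I5D20X30=333H");
    // unwrap() panics if the CIGAR string cannot be collapsed.
    let collapsed_str = collapse_cigar(&cigar_str).unwrap();
    println!("{}", collapsed_str);
}
Output: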
123-S| 60| 333-H
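Since the function returns a `Result<String, String>`, you may prefer to handle the error case explicitly instead of calling `unwrap()`. Below is a minimal sketch of one way to do that for a batch of CIGAR strings. To keep it runnable without the crate, it takes the collapsing function as a parameter and uses a hypothetical stand-in (`stand_in_collapse`) that does not collapse anything. In real code you would pass the crate's function instead. The helper name `collapse_or_report` is also made up.

```rust
/// Apply `collapse` to every CIGAR string, separating successes from errors.
fn collapse_or_report<F>(cigars: &[String], collapse: F) -> (Vec<String>, Vec<String>)
where
    F: Fn(&String) -> Result<String, String>,
{
    let mut collapsed = Vec::new();
    let mut errors = Vec::new();
    for cigar in cigars {
        match collapse(cigar) {
            Ok(text) => collapsed.push(text),
            // Keep the offending input next to the error message.
            Err(message) => errors.push(format!("'{}': {}", cigar, message)),
        }
    }
    (collapsed, errors)
}

/// Hypothetical stand-in with the same signature shape as the crate function.
/// It only wraps the input in angle brackets; it does NOT collapse anything.
fn stand_in_collapse(cigar: &String) -> Result<String, String> {
    if cigar.is_empty() {
        Err(String::from("empty CIGAR string"))
    } else {
        Ok(format!("<{}>", cigar))
    }
}

fn main() {
    let input = vec![String::from("123S10I5D20X30=333H"), String::new()];
    let (collapsed, errors) = collapse_or_report(&input, stand_in_collapse);
    for line in &collapsed {
        println!("{}", line);
    }
    for line in &errors {
        eprintln!("error: {}", line);
    }
}
```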