lh3/panmask

Introduction

Panmask provides a list of easy and hard regions for short-read variant calling against the human reference genome GRCh38. Small variants in the easy regions are straightforward to call: most variant callers achieve 98-99.5% accuracy there. The easy regions cover 87.9% of GRCh38, 92.6% of coding regions and 95.8% of pathogenic variants in ClinVar. The panmask regions may help reduce variant calling artifacts and simplify variant filtering. They can be downloaded from Zenodo.
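
As an illustration of how a region list can simplify variant filtering, the sketch below keeps only the VCF records whose position falls inside a set of regions. It is a minimal example, not part of panmask itself: it assumes the regions are available as a BED file (0-based, half-open intervals), and the helper names `load_bed`, `in_regions` and `filter_vcf` are hypothetical. It checks only the `POS` column, so an indel that starts inside a region but extends past its end is still kept.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: (starts, ends)} with sorted, merged intervals."""
    raw = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and BED header lines
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    index = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                # overlapping or adjacent: extend the previous interval
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_regions(index, chrom, pos):
    """Return True if the 1-based VCF position lies in a 0-based, half-open BED interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    pos0 = pos - 1  # convert the VCF coordinate to 0-based
    i = bisect.bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]


def filter_vcf(vcf_lines, index):
    """Yield header lines unchanged and only those records located in the regions."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if in_regions(index, chrom, int(pos)):
            yield line
```

In practice the same filtering is usually done with an existing tool that accepts a BED file of target regions; the sketch only makes the coordinate handling explicit (BED is 0-based and half-open, VCF is 1-based).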

Other Datasets

GRCh38 easy regions (where variant calls tend to be accurate in most samples):

HG002 confident regions (where small variant calls can be trusted):

Other datasets used for evaluation:

Short-read small variant calls published in Baid et al. (2020). Only VCFs called from HG002 PCR-free NovaSeq data at 30X are used.

Data files in this repo are released under CC0 and will be available at GigaDB.
