Describing Variants: Recommendations For The Description of DNA Changes

Describing variants
"mutation nomenclature"

recommendations for the description of DNA changes

Johan den Dunnen


chair SVD-WG

VarNomen @ HGVS.org
http://www.HGVS.org/varnomen/

© JT den Dunnen
HGVS / HVP / HUGO
Sequence Variant Description
working group
Working Group Members:
• Anne-Francoise Roux (EGT)
• Donna Maglott (NCBI/EBI)
• Jean McGowan-Jordan (ISCN)
• Peter Taschner (LSDBs)
• Raymond Dalgleish (LSDBs)
• Reece Hart (industry)
• Johan den Dunnen (chair)
• HGVS - Marc Greenblatt
• HUGO - Stylianos Antonarakis

Nomenclature
( describing DNA variants )

Stable, Meaningful, Memorable, Unequivocal

Definitions
• prevent confusion
do not use "mutation"
use variant, disease-associated variant
do not use "polymorphism"
use variant, not disease-associated variant
do not use "pathogenic"
use disease-associated, a disease-associated variant

• better use neutral terms


sequence variant
alteration
CNV (Copy Number Variant)
SNV (Single Nucleotide Variant, not SNP)

Variant description
the basis: http://www.HGVS.org/varnomen

Hum Mutat (2016) 37:564-569

www.HGVS.org/varnomen
Follow the recommendations

when you disagree, start a debate; do not use
private rules, they only cause confusion

facebook & twitter

Versioning

version presented is 15.11 (Nov. 2015)

Variant types
• change in sequence
reference      ACATCAGGAGAAGATGTTC GAGACTTTGCCA
substitution   ACATCAGGAGAAGATGTTT GAGACTTTGCCA
deletion       ACATCAGGAGAAGATGTT  GAGACTTTGCCA
duplication    ACATCAGGAGAAGATGTTCCGAGACTTTGCCA

• change in amount
(Copy Number Variation)

• change in position

change in amount and change in position: Structural Variation (SV)

DNA, RNA, protein
• unique descriptions
prevent confusion

• DNA
A, G, C, T
g.957A>T, c.63-3T>C

• RNA
a, g, c, u
r.957a>u, r.(?), r.spl?

• protein ( mostly deduced )


three / one letter amino acid code
* = stop codon
p.(His78Gln)

Reference sequence
• use official HGNC gene symbols
• provide reference sequence
covering complete sequence
largest transcript
preferably an LRG
e.g. LRG_123
give accession.version number
e.g. NM_012654.3

• indicate type of Reference Sequence
DNA: coding DNA c., genomic g., mitochondrial m., non-coding RNA n.
RNA: r.
protein: p.
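
The "accession.version" plus type prefix convention above can be checked mechanically. The sketch below is only an illustration: the function name `split_description` and the simplified accession pattern are assumptions, not part of the recommendations, and the pattern does not cover every accession format.

```python
import re

# Hypothetical helper: split "accession.version:type.change" into its parts.
# The accession pattern is a simplification (letters, optional "_", digits).
DESCRIPTION = re.compile(
    r"^(?P<accession>[A-Za-z]+_?\d+)"   # e.g. NM_012654 or LRG_123
    r"(?:\.(?P<version>\d+))?"          # optional version, e.g. .3
    r":(?P<type>[cgmnrp])\."            # reference sequence type prefix
    r"(?P<change>.+)$"                  # the change itself, left unparsed
)

# Prefixes as listed on the slide.
TYPES = {"c": "coding DNA", "g": "genomic", "m": "mitochondrial",
         "n": "non-coding RNA", "r": "RNA", "p": "protein"}


def split_description(text):
    """Return accession, version, type prefix, type name and change."""
    match = DESCRIPTION.match(text)
    if match is None:
        raise ValueError(f"not a recognisable description: {text!r}")
    parts = match.groupdict()
    parts["type_name"] = TYPES[parts["type"]]
    return parts


print(split_description("NM_012654.3:c.545A>T"))
```

An LRG such as LRG_123 carries no version number, so `version` is returned as `None` in that case.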

The LRG

EBI, NCBI, Gen2Phen

Numbering residues
• start with 1
genomic 1 is first nucleotide of file
no +, - or other signs
coding DNA 1 is A of ATG
for introns refer to genomic Reference Sequence

• repeated segments
assume most 3' as changed
( ...CGTGTG TG A… )

• coding DNA only


5' of ATG …, -3, -2, -1, A, T, G, …
no nucleotide 0
3' of stop *1, *2, *3, …
no nucleotide 0
intron
position between nt's 654 and 655
c.654+1, +2, +3, ………, -3, -2, c.655-1
change + to - in middle

Numbering
• RNA ( deduced mostly )

like coding DNA

• protein ( deduced only )

from first to last amino acid


rule of thumb: c. nucleotide position divided
by 3 roughly gives amino acid residue
deduced description is given between parentheses, e.g. p.(His78Gln)
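
The rule of thumb can be made exact for positions inside the protein coding region. The helper name `codon_number` is an assumption for this sketch; it only handles plain positive c. positions (no introns, no UTR positions).

```python
def codon_number(c_position):
    """Amino acid residue number for a coding DNA position (1 = A of ATG)."""
    if c_position < 1:
        raise ValueError("only positions inside the coding region are handled")
    # Positions 1-3 give residue 1, positions 4-6 give residue 2, and so on.
    return (c_position + 2) // 3


def position_in_codon(c_position):
    """1, 2 or 3: which nucleotide of the codon is affected."""
    if c_position < 1:
        raise ValueError("only positions inside the coding region are handled")
    return (c_position - 1) % 3 + 1


print(codon_number(545), position_in_codon(545))
```

For c.545 this gives residue 182, second codon position.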

Reference Sequence
coding DNA reference sequence (c.)

non-coding DNA reference sequence (n.)

coding DNA or genomic ?
• human genome sequence
complete
covers all transcripts
different promoters, splice variants, diff.
polyA-addition, etc.
but
hg19 chr2:g.121895321_121895325del
is long & complicated
huge reference sequence files
new builds follow each other regularly
carries no understandable information

• coding DNA
does not cover all variants
but gives a clue towards position

Numbering - genomic 3

• g.12158663A>G
• g.23669859>C
• g.89112396G>A
• g.112775623C>G
• g.56569443A>T
• g.12741333T>G
• g.188153979G>C
genomic positions show no relation to RNA & protein

Numbering - coding DNA
coding DNA positions show the relation to RNA & protein

• c.1637A>G
protein coding region

• c.859+12T>C
in intron (5' half)

• c.2396-6G>A
in intron (3' half)

• c.-23C>G
5' of protein coding region (5' of ATG)

• c.*143A>T
3' of protein coding region (3' of stop)

• c.-89-12T>G
intron in 5' UTR (5' of ATG)

• c.*649+79G>C
intron in 3' UTR (3' of stop)

Types of variation
• simple
substitution c.123A>G
deletion c.123delA
duplication c.123dupA
insertion c.123_124insC
other
conversion, inversion, translocation, transposition

• complex
indel c.123delinsGTAT

• combination of variants
two alleles c.[123A>G];[456C>T]
>1 per allele c.[123A>G;456C>T]
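
The simple and complex types listed above can be told apart from the suffix of a description. The following is a rough sketch under stated assumptions: the name `variant_type` and the patterns are illustrative, cover DNA-level descriptions only, and ignore allele brackets.

```python
import re

# Order matters: "delins" must be tested before "del" and "ins".
PATTERNS = [
    ("indel", re.compile(r"delins[ACGT]+$")),
    ("substitution", re.compile(r"\d[ACGT]>[ACGT]$")),
    ("deletion", re.compile(r"del[ACGT]*$")),
    ("duplication", re.compile(r"dup[ACGT]*$")),
    ("insertion", re.compile(r"ins[ACGT]+$")),
    ("inversion", re.compile(r"inv$")),
]


def variant_type(description):
    """Return the type name for a single DNA-level description."""
    for name, pattern in PATTERNS:
        if pattern.search(description):
            return name
    return "other"  # conversion, translocation, transposition, ...


for example in ["c.123A>G", "c.123delA", "c.123dupA",
                "c.123_124insC", "c.123delinsGTAT"]:
    print(example, variant_type(example))
```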

Substitution
• substitution designated by ">"
> not used on protein level

• examples
genomic g.54786A>T
cDNA c.545A>T
( NM_012654.3:c.545A>T )

RNA r.545a>u

protein p.(Gln182Leu)

Deletion
• deletion
designated by "del"
range indicated by "_"

• examples
c.546del
c.546delT

c.586_591del
c.586_591delTGGTCA, NOT c.586_591del6

c.(780+1_781-1)_(1392+1_1393-1)del
exon 3 to 6 deletion, breakpoint not sequenced

Duplication
• duplication
designated by "dup"
range indicated by "_"

• examples
c.546dup
c.546dupT

c.586_591dup
c.586_591dupTGGTCA, NOT c.586_591dup6
do not describe as insertion

c.(780+1_781-1)_(1392+1_1393-1)dup
exon 3 to 6 duplication, breakpoint not sequenced
NOTE: dup should be in tandem

Insertion
• insertion
designated by "ins"
range indicated by "_"
! give inserted sequence

• examples
c.546_547insT
NOT c.546insT or c.547insT

c.1086_1087insGCGTGA
NOT c.1086_1087ins6

c.1086_1087insAB567429.2:g.34_12567
for a large insert, submit the sequence to a database and
give the database accession.version number
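
The insertion rules above (two flanking positions, inserted sequence spelled out) lend themselves to a quick check. This is a minimal sketch: `check_insertion` is a hypothetical name, and the pattern only handles plain exonic c. positions with a literal inserted sequence, not intronic positions or inserts given by accession number.

```python
import re

# Two positions, "ins", then the inserted nucleotides written out.
INSERTION = re.compile(r"^c\.(\d+)_(\d+)ins([ACGT]+)$")


def check_insertion(description):
    """True when the insertion names two adjacent flanking positions
    and gives the inserted sequence rather than its length."""
    match = INSERTION.match(description)
    if match is None:
        return False
    start, end = int(match.group(1)), int(match.group(2))
    return end == start + 1


for example in ["c.546_547insT", "c.546insT", "c.1086_1087ins6"]:
    print(example, check_insertion(example))
```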

Inversion
• inversion
affecting at least 2 nucleotides
designated by "inv"
range indicated by "_"

• example
c.546_2031inv
NOT c.2031_546inv
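
An inversion replaces the stated range by its reverse complement. The sketch below applies that to a plain sequence string; the function name `apply_inversion` and the toy sequence are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def apply_inversion(sequence, start, end):
    """Invert nucleotides start..end (1-based, inclusive) of sequence."""
    if end - start < 1:
        # Covers both rules: at least 2 nucleotides, and start before end.
        raise ValueError("an inversion needs start < end")
    if start < 1 or end > len(sequence):
        raise ValueError("range falls outside the sequence")
    segment = sequence[start - 1:end]
    inverted = segment.translate(COMPLEMENT)[::-1]  # reverse complement
    return sequence[:start - 1] + inverted + sequence[end:]


print(apply_inversion("AACCGGTT", 3, 5))
```

Here positions 3 to 5 (CCG) become CGG, so the toy sequence AACCGGTT turns into AACGGGTT.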

Conversion
• conversion
affecting at least 2 nucleotides
designated by "con"
range indicated by "_"

• examples
c.546_657con917_1028
c.546_2031conNM_023541.2:c.549_2034

Sequence repeats
• mono-nucleotide stretches
g.8932A(18_23)
c.345+28T(18_23)
() = uncertain
alleles c.345+28T[18];[21]

• tri-nucleotide stretches
c.1849+363CAG(13_19)
c.1849+363_1849+365(13_19)

• larger
g.532_3886(20_45)
3.3 Kb repeat

SNVs (SNPs)

• SNVs
at least once give description based
on genome reference sequence

hg19 chr9:g.3901666T>C

dbSNP entry: rs12345678:T>C

Characters & codes
• codes used
+, -, * positions in coding DNA numbering (intron offsets, 5' of ATG, 3' of stop)
> substitution (nucleotide)
_ range
; separate changes (in/between alleles)
, more transcripts
() uncertain
[] allele
= equals reference sequence
? unknown
del deletion
dup duplication
ins insertion
inv inversion
con conversion
ext extension
fs frame shift

Uncertainty breakpoints
• Copy Number Variants
(last-normal_first-changed)_(last-changed_first-normal)del

BAC / PAC probe
hg19 chrX:g.(32218983_32238146)_(32984039_33252615)del

SNP-array
GRCh36.p2 chrX:g.(32218983_32238146)_(32984039_33252615)del
(rs2342234_rs3929856)_(rs10507342_rs947283)del

Uncertainty breakpoints 2

• whole exon changes


c.(423+1_424-1)_(631+1_632-1)del
intragenic deletion

c.(?_-79)_(631+1_632-1)del
deletion incl. 5' end

c.(423+1_424-1)_(*763_?)del
deletion incl. 3' end

c.(?_-79)_(*763_?)del
whole gene deletion, start/end undefined

describe what was actually tested
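
For an intragenic whole-exon deletion, the uncertain breakpoints follow directly from the exon boundaries. The sketch below builds such a description; the name `exon_deletion` is hypothetical, and it assumes the deleted exons lie inside the protein coding region, where consecutive exons have consecutive c. numbers.

```python
def exon_deletion(first_deleted, last_deleted):
    """Describe a deletion of whole exons with unsequenced breakpoints.

    first_deleted: first c. nucleotide of the first deleted exon
    last_deleted:  last c. nucleotide of the last deleted exon
    """
    if first_deleted < 2 or last_deleted < first_deleted:
        raise ValueError("only internal coding exons are handled")
    # Breakpoints lie somewhere in the flanking introns:
    # (last-normal_first-changed)_(last-changed_first-normal)
    left = f"({first_deleted - 1}+1_{first_deleted}-1)"
    right = f"({last_deleted}+1_{last_deleted + 1}-1)"
    return f"c.{left}_{right}del"


print(exon_deletion(424, 631))
```

Deletions that include the 5' or 3' end of the gene need the `?` forms shown on the slide and are not produced by this helper.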

Alleles
• allele
indicated by "[ ]", separated by ";"

• 2 changes, 2 alleles
c.[428A>G];[83dupG]

• 1 allele, several changes
c.[12C>G;428A>G;983dupG]

• 2 changes, allele unknown
c.428A>G(;)83dupG

• special cases
mosaicism c. 428A= / A>G
chimerism c. 428A= // A>G
spaces in description used for clarity only
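
The three allele notations above differ only in where the brackets and separators go, which a few formatting helpers can show. The function names are assumptions made for this sketch, and the changes are passed in without the "c." prefix.

```python
def same_allele(changes):
    """Several changes on one allele: c.[a;b;c]"""
    return "c.[" + ";".join(changes) + "]"


def two_alleles(first, second):
    """Changes on two alleles: c.[a];[b]"""
    return "c.[" + ";".join(first) + "];[" + ";".join(second) + "]"


def allele_unknown(changes):
    """Changes whose allele assignment is unknown: c.a(;)b"""
    return "c." + "(;)".join(changes)


print(two_alleles(["428A>G"], ["83dupG"]))
print(same_allele(["12C>G", "428A>G", "983dupG"]))
print(allele_unknown(["428A>G", "83dupG"]))
```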

Complex
• deletion / insertions
"indel"

c.1166_1177delinsAGT

• descriptions may become complex


when only an expert understands the
"code" consider database submission
description: c.875_941delinsAC111747.1

Changes in RNA
• description like DNA
r. / a, g, c, u

• examples
r.283c>u
r.0 no RNA from allele
r.? effect unknown
r.spl affects RNA splicing
r.(spl?) may affect splicing
r.283= no change
(equals reference sequence)

r.[=, 436_456del]
two transcripts from 1 allele

Changes in RNA 2

• one allele, 2 transcripts


effect on splicing not 100%

c.456+3G>C

on RNA r.[=, 436_456del]

> p.[=, Arg146_Lys152del]
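
The step from r.436_456del to residues 146 to 152 is plain arithmetic on codon boundaries. A minimal sketch follows; the name `codon_range` is hypothetical, and it only handles deletions that remove whole codons, since other in-frame deletions need the sequence to work out the protein change.

```python
def codon_range(first_nt, last_nt):
    """First and last residue removed by a whole-codon deletion.

    first_nt, last_nt: deleted range in coding numbering (1 = A of ATG).
    """
    length = last_nt - first_nt + 1
    if first_nt < 1 or length < 3:
        raise ValueError("range must lie in the coding region")
    if (first_nt - 1) % 3 != 0 or length % 3 != 0:
        raise ValueError("only whole-codon deletions are handled")
    return (first_nt + 2) // 3, last_nt // 3


print(codon_range(436, 456))
```

The residue names (Arg, Lys) come from the reference sequence and are not derived here.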

Changes in protein
• description like DNA
p. / Ala, Cys, Gly, His, …, Ter
p. / A, C, D, E, F, G, H, …, *

• examples
nonsense p.Trp65* (p.W65* / p.Trp65Ter)
no RNA data, r.(?): p.(Trp56*)
no stop p.*1054Glnext*31
no protein p.0
likely, but unknown effect p.Met1?, NOT p.Met1Val
frame shift fs

Frame shifts
• short form (sufficient)
p.Arg83fs

• long form (more detail)
p.(Arg83Serfs*15) (no RNA analysis)

indicate
first amino acid changed
position
first changed amino acid
length shifted frame (from first changed to * incl.)

do not try to include changes at DNA level
do not describe del, dup, ins, etc.
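
The short and long frame shift forms can be pulled apart into the elements listed above. This is a sketch only: `parse_frame_shift` and the dictionary keys are hypothetical names, three-letter amino acid codes are assumed, and the pattern does not check that parentheses are balanced.

```python
import re

FRAME_SHIFT = re.compile(
    r"^p\.\(?"
    r"(?P<ref>[A-Z][a-z]{2})"        # first amino acid changed
    r"(?P<position>\d+)"             # its position
    r"(?P<new>[A-Z][a-z]{2})?"       # new amino acid (long form only)
    r"fs(?:\*(?P<length>\d+))?"      # length of shifted frame, incl. stop
    r"\)?$"
)


def parse_frame_shift(description):
    """Split a frame shift description into its elements."""
    match = FRAME_SHIFT.match(description)
    if match is None:
        raise ValueError(f"not a frame shift description: {description!r}")
    length = match.group("length")
    return {
        "first_changed": match.group("ref"),
        "position": int(match.group("position")),
        "new_residue": match.group("new"),
        "frame_length": int(length) if length else None,
        "deduced": description.startswith("p.("),  # no RNA analysis
    }


print(parse_frame_shift("p.(Arg83Serfs*15)"))
print(parse_frame_shift("p.Arg83fs"))
```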

Recent additions
• added versioning
to support users
easier to find latest changes
allows statement "following HGVS version 2.0"

• stricter definitions
separate different classes
added hierarchy
computer-generated description
automated error-checking ( Mutalyzer )

• simplified use of special characters


"_", ";", "+", "*", …
improved consistency

Acknowledgement
Presentation prepared by:
Johan den Dunnen
Human Genetics & Clinical Genetics
Leiden University Medical Center
Leiden, the Netherlands

chair SVD-WG

date: April 2017
