Cancerbio 8

cancer bio week 8 slides


Cancer Bio I

SEMESTER 1

Mireia Blanco Gómez


Life Sciences Engineering

Cancer Genomics
Today we will talk about alterations of DNA sequence.

Cancer cells are associated with genetic abnormalities

One of the most common truths about cancer is that it is driven by genetic alterations (well, this is not
completely true but genetic alterations are the major drivers).

The evidence for this goes back more than a century → studies by Boveri in sea urchins.

He caused abnormal segregation of chromosomes during cell division

A malignant tumour cell is [...] a cell with a specific abnormal chromosome constitution

By forcing abnormal mitosis → he generated:

Non-viable cells

A few viable cells that exhibited a cancer phenotype: longer survival and faster, uncontrolled proliferation

So! He deduced that a malignant tumour cell is a cell that has an abnormal chromosome constitution!

He linked genetic abnormalities to cancer phenotype.

But!! As so often in science → these (correct) ideas were dismissed by the scientific community.
Why?! Because at that time (~first half of the 20th century) cancer was thought to be an infectious disease (=
caused by viruses).

This idea was supported by the work of many researchers, one of them being Rous.

He injected chickens with a specific virus that caused sarcoma (Rous sarcoma virus, RSV), which is a retrovirus.

Upon injection → the chickens would develop tumours.

So…if we inject a virus and it causes cancer = cancer has to be a viral disease!! (that was the reasoning at the time)

When the retrovirus was injected, the viral RNA was reverse-transcribed and integrated into the DNA of the
chicken, so that the cells would start expressing viral proteins.

1910-1970: Viruses drive cancer

From tumor virology to oncogenes

Once it was found that injecting the virus induced cancer, the next question was:
which viral protein is responsible for the emergence of cancer?

To answer the question → the viral sequence was progressively cut down to determine at which point the virus was NO longer
able to generate a tumour.

If you cut the tail of the viral sequence → no more cancer

So the oncogenic driver of RSV had to be in that region!!!

Cancer was thought to be caused by the virus, but which gene? Which
protein? → in this case, a gene located in the tail of the viral genome.

Two scientists, Bishop and Varmus, designed probes that specifically recognize the
transformation-associated (src) sequences of the RSV genome, in order to understand their origin and function. The probes
were used to follow the fate of the src gene after the cells were infected. The expectation was that uninfected
chicken cells would carry no src-related DNA sequences, and that these sequences would become readily detectable only after
infection.

This would give evidence of the insertion of the viral gene into the chicken genome.

But…what they found was very surprising → the src sequences were already present in the genome of normal
chickens that had NOT been exposed to the virus!!!! What changes after infection is that the gene becomes
abnormally activated!

THIS was a strong demonstration that cancer can be driven by genes that are already present in
healthy cells → cancer is induced when their expression becomes abnormal.

The oncogenic component of RSV (the cancer driver!) is a cellular gene already present in healthy cells.

THIS is the first time that we really talk about cancer genomics → the first time that we talk about
oncogenes and tumour suppressor genes.

Proof that cancer is a genetic disease, NOT a viral one

Careful! Viruses can contribute to causing it by overactivating proteins that are normally present in healthy
cells, but cancer is NOT driven by viruses.

Oncogenes & Tumor Suppressors

Since the discovery of Src → more oncogenes + tumour suppressors were discovered using the
same technique (= through the identification of viral sequences that drive tumorigenesis)

We distinguish

Oncogenes → they act as “accelerators” and are found overactivated

Tumour suppressor genes → they normally act as “brakes” and are found inactivated.

This is the start of cancer genomics.

Discovery of cancer cell alterations

Another boost was the sequencing of the human genome + the development of next-generation sequencing (NGS) → making it possible to sequence
human genomes much faster and more cheaply.

Thanks to the new sequencing techniques, hundreds of thousands of tumours could be sequenced, which
led to the discovery of many oncogenes + tumour suppressor genes

Remember! These genes are normally expressed in our cells BUT they become overexpressed/inactivated to drive tumorigenesis.

So, a key feature of cancer is that it has genetic alterations → over-activation or inactivation of genes.
Viruses are one way for this to happen, BUT only a very low percentage of cancers are caused by viral
infection → in most cases the cause is alterations in the DNA sequence.

Cancer Genomic Alterations

We have different types of alterations.

Somatic mutations → the most common type of alteration in cancer; the majority are
single base substitutions (one base is changed for another)
Copy number alterations → deletion (= loss of an entire piece of DNA) or amplification
(= generation of many copies of the same piece of DNA). They are usually caused by mistakes during
replication
Chromosomal translocation → swapping pieces between chromosomes
mRNA deregulation → altered mRNA levels, affecting how much protein is produced
Epigenetic silencing → changing the epigenetic status of the DNA

We will focus today on somatic mutations, copy number alterations and chromosomal translocations.

Somatic mutations emerge for many reasons and they are the most common in cancer
In cancer we have an accumulation of mutations.

Let’s talk about each of them in a bit more detail!

Somatic Mutations

There are different types of somatic mutations:


Single nucleotide changes = a change in one nucleotide (for example: from T to A), which can lead to a change in 1 amino acid

When the substitution changes the amino acid in the protein, it is known as a missense mutation.

Missense = change of 1 aa

An example → BRAF → a single T→A change turns valine into glutamate at amino acid position 600 (V600E)

When we look at somatic mutations we have to understand what happens at the protein level → the
changes have to affect the protein to matter (to have an impact).

Other types of single nucleotide changes are:

Silent mutation → also known as synonymous, it does NOT change the amino acid (=
usually functionally neutral)

Nonsense → a substitution that introduces a premature stop codon (=
truncated protein)

Frame-shift mutations: insertions or deletions that change the reading frame

In this case, the mutation affects one or more nucleotides, either by inserting new ones
(1, 2 or more) or by deleting some (1, 2 or more) → changing the reading frame for translation.

As a result, ALL the amino acids after the mutation are changed! These are highly
deleterious mutations, since the sequence now codes for a different protein!!

(We can also have in-frame mutations if the insertion/deletion is a multiple of 3 nucleotides, so that the only
consequence is the addition/deletion of some aa without changing the rest, but this is very rare.)
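To make the categories above concrete, here is a minimal Python sketch that classifies a codon substitution as silent, missense or nonsense, and an insertion/deletion as in-frame or frameshift. The function names (`classify_substitution`, `classify_indel`) are made up for illustration and are not part of the lecture; the codon table is the standard genetic code. The example codons GTG → GAG correspond to the BRAF valine → glutamate change mentioned above.

```python
# Standard genetic code, built from the usual TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_substitution(ref_codon, alt_codon):
    """Classify a single-codon substitution by its effect on the protein."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"      # same amino acid (synonymous)
    if alt_aa == "*":
        return "nonsense"    # new stop codon, truncated protein
    return "missense"        # one amino acid replaced by another

def classify_indel(n_bases):
    """An indel keeps the reading frame only if its length is a multiple of 3."""
    return "in-frame" if n_bases % 3 == 0 else "frameshift"

print(classify_substitution("GTG", "GAG"))  # valine -> glutamate
print(classify_substitution("GTG", "GTA"))  # valine -> valine
print(classify_substitution("TGG", "TGA"))  # tryptophan -> stop
print(classify_indel(1), classify_indel(3))
```

This only looks at one codon at a time; real annotation tools also need the transcript, the strand and the position of the mutation within the gene.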

HOTSPOT mutations (activating an oncogene)

Some mutations are very particular. Let’s understand this using the example of BRAF (one of the first
oncogene mutations studied).

In the case of BRAF → the mutations always happen in the same position + same change

If we sequence the tumours of different patients → 235 out of 399 share the same mutation
at position 600, and it is always a change from valine to glutamate.

It is ALWAYS this! Always the same base that is changed, in more than 50% of the patients (235/399 ≈ 59%).

Why is this so important?! Why so selected?

To be able to answer this we need to understand how BRAF works → it’s a kinase that is involved in a
signalling cascade that drives cell growth. It’s a normal protein that, when stimulated via an RTK, activates
multiple other kinases and promotes cell growth.
The mutation causes this kinase to be constitutively active = always on! And all the feedback mechanisms
that would shut it down no longer work when we have the mutation. So…this mutation gives the cell a
constitutive signal to keep on proliferating!!

We can see how dramatic the effect of a change of ONE nucleotide can be.

When a mutation is always the same + at same place → it’s what is known as HOTSPOT.

Just to finish with BRAF→ this oncogene is the driver of many cancers (melanoma, colorectal…) and
there has been a specific drug developed to target this mutation.

Truncating Mutations (inactivating a tumor suppressor)

We have talked about oncogenes, but what can we see in tumour suppressors?

If we have a look at the most common types of mutations in tumour suppressor genes → they are
nonsense + frameshift → introducing a stop codon or messing up the reading frame.

What’s the expected outcome? Inactivating mutations!

In the following image, each dot = 1 colorectal cancer patient, and we can see the mutational
landscape of TP53 → there are missense + nonsense + frameshift mutations, but the common characteristic is
a loss of p53 function!

For cancer cells, the most important thing is to inactivate the tumour suppressors (eliminating all the
“brakes”), so the mutated genes produce proteins that are not functional.

Note: p53 is the major regulator of self-preservation in the cell (participating in DNA repair, apoptosis,
senescence…) and it’s the MOST frequently mutated gene across cancers.

So → Oncogene = over-activation

Tumour suppressor = inactivation

Non-coding Mutations

We can map the nucleotide changes…so we are looking at the coding sequences. But…then what about
the non-coding sequences?

How can we study their impact? It’s “easy” to look at the changes that occur in the coding sequence→
we go directly to the protein, but in non-coding?

Interest in mutations in non-coding sequences has increased in the past 10 years, and it all
started with the discovery of a highly recurrent mutation in TERT (= oncogene) which is in the promoter
sequence!! = non-coding sequence.

In TERT → the mutation always happens at position 228 or 250 → increased expression of TERT.

It’s a non-coding mutation that overactivates the production of TERT

It does so by creating a new binding site for transcription factors, and all of this is caused by just 1
change in the promoter! Now the gene is expressed when normally it should NOT be.

This mutation is found with high frequency in melanoma.

After this discovery → the search for non-coding mutations started, using a lot of resources and tools.

But…the official conclusion afterwards was that there is a paucity of non-coding drivers in cancer.

In other words, they didn’t find anything. 2 possibilities = there is nothing, or we don’t know how
to find them.

TERT is the ONLY strong non-coding driver; the rest are very difficult to find.

Copy Number Alterations

Deletion: loss of chromosomal regions (heterozygous or homozygous) → disappearance of the tumour
suppressor, so that it is NOT produced anymore.

Amplification: gain of one or more copies of chromosomal regions (duplication or amplification) →
multiple copies of the oncogene

Focal Deletions (inactivating a tumor suppressor)

In the following image, each line is a patient (genome), and the x-axis shows the locus on chromosome 9
from 20 to 24 megabases. The colour indicates the strength of the deletion: dark blue = loss of 2 copies, light
blue = loss of 1 copy, white = no loss.

Note: remember that cancer cells are abnormal and therefore have an abnormal number of
chromosomes, but let’s forget that for now.

We can see that there is a region where the deletions always overlap → the CDKN2A locus. Most of the deletions
involve the loss of this locus.

If a part of the DNA is lost → there is a dramatic change in RNA expression! This makes
sense, since you cannot transcribe what you don’t have = strong downregulation through copy number deletion
(no mutation in this case).

Focal Amplifications (activating an oncogene)

In oncogenes we can see amplification of the gene → the cell makes more copies (there can be up to 20-25 copies!).
In the following image, the stronger the red = the more copies there
are.

The more copies there are, the more the gene is expressed → there is an overproduction of the protein.

But it doesn’t end there! Analysing the amplifications, we can see that some of the amplified genes
also carry a lot of mutations!!

So we have both! (why have 1 when you can have 2? That’s the logic that some cancer cells, such as in
lung cancer, apply). The oncogene is activated twice!!

Multiple copies of the mutated gene → to really boost the signal.

Note: EGFR is a growth factor receptor that is upstream of BRAF; when its ligand is bound, it sends
a proliferation signal → when mutated, it no longer depends on external signals and will always signal
on its own.

Tumor subtypes defined by copy number alterations

In the following image, each column is a patient and in the y-axis we have the genome. In blue =
deletions, in red = amplification. It’s an analysis of patients with endometrial cancer.

We can see that there are 2 groups:

One group that is all white = NO copy number alterations (best prognosis)

One group that has a lot of red and blue = copy number changes across the whole genome (worse
prognosis) → this is known as chromosomal instability.

So! We can see that it’s all very heterogeneous.

Chromosomal Structural Variants

There are other structural changes:

Inversion → a piece of DNA is cut and the direction is switched + put back in the same place

we have a break + the piece is reattached in the opposite direction

Translocation → breaks of chromosomes and the parts are reattached to a different chromosome.

How can these mistakes happen? → they occur when there is DNA damage (like double-strand breaks, DSBs). The repair
machinery tries to repair the damage and normally succeeds BUT, if the repair machinery is NOT
functional (or does not work properly), the damage is NOT repaired well and these mistakes happen.

Besides, cancer cells → are highly proliferative and lack many of the checks that healthy cells
have to try to avoid this sort of error.

Translocations became apparent from “chromosome painting” (long before NGS techniques)

These paintings allow us to see the karyotype of a tumour cell → see the translocations + amplifications +
deletions.

How can this help cancer cells?

• Changing the promoter, so that it leads to the activation of an oncogene or the opposite (it
depends on the promoter); basically, it changes the expression of the protein (for better or for
worse)
• Changing the protein → chimeric protein formed by joining 2 different proteins
• Cutting the protein → no longer active.

A very common change is the transfer of a coding sequence to a different promoter → it’s found in
almost everyone (but that doesn’t mean that it always leads to cancer). In some cases, this translocation causes
the overexpression of a protein → the gene ends up next to an active promoter, so that it
becomes highly expressed as well.

An example of this → the t(14;18) translocation, which causes Bcl2 (anti-apoptotic protein) to become
overexpressed by placing it under the control of the IgH promoter. It is the hallmark of
follicular lymphoma, found in the large majority of patients, and is the driver of lymphomagenesis (with other
interactions).

Another possible change, as mentioned, is the fusion of 2 proteins → forming a new one, known as a
chimeric protein.

An example of this is the fusion of BCR + ABL, generating a new protein that has hyperactive
kinase activity. It’s very typical in leukemia (a driver).

So we have seen the most common genetic alterations that occur in cancer. Now we understand (I hope)
how oncogenes get overactivated and tumour suppressors get inactivated.

But we have NOT discussed something very important! How do these alterations appear?

We are NOT born with somatic mutations (to inherit them = have to be in the germline). So…if we are
not born with them, that means that we have to acquire them!

How do mutations emerge?

There are many different sources of alterations:

• Viral infections → cancer is NOT a viral disease, BUT viruses that can cause genetic
alterations (like HPV or EBV) are associated with specific cancer types.
  o Again, the virus is NOT causing cancer just by being there, but because it damages the
DNA, inducing genetic alterations that lead to cancer development.
• Exogenous mutagens → there are MANY factors that can induce DNA damage, like UV,
radioactivity or tobacco (in these cases the mechanism through which they cause cancer is quite
well understood). There are many other exogenous factors that can cause cancer, although
in many cases we don’t know the mechanisms and all we have are correlations (= incidence of
cancer ↑ with the factor)
• Unrepaired replication errors → this is one of the major sources, and these errors accumulate with ageing.
When the cell replicates, errors are introduced, and most of them are repaired. Of
those that are left unrepaired, the cells that harbour them will usually commit suicide (apoptosis), BUT! in some
rare cases the mistake can lead to the overactivation of a proto-oncogene (which now becomes an oncogene)
OR the inactivation of a tumour suppressor gene, thus conferring properties that lead to
oncogenesis.

Can we understand where the different mutations come from? → we can look for clues by sequencing
the DNA and look for the accumulation of mutations.

What do you get after sequencing 1000 tumors?

The following image shows one of the first pan-cancer analyses: each column is a tumour type and each
dot is a patient. In this case, all coding sequences were sequenced. The y-axis shows the number of mutations
per megabase (Mb). Since ~2% of the ~3,000 Mb genome is coding sequence (~60 Mb), a value of 1 mutation/Mb
corresponds to ~60 mutations across the whole genome (again, in coding sequences).
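The conversion from mutations/Mb to a total count can be written out as a small Python sketch. The genome size (~3,000 Mb) is an approximation added here as an assumption; the 2% coding fraction comes from the notes, and the function name `coding_mutations` is made up for illustration.

```python
GENOME_SIZE_MB = 3000    # approximate size of the human genome in Mb (assumption)
CODING_FRACTION = 0.02   # ~2% of the genome is coding sequence (from the notes)

def coding_mutations(mutations_per_mb):
    """Convert a mutational burden in mutations/Mb into a total count
    of mutations in the coding sequences."""
    coding_size_mb = GENOME_SIZE_MB * CODING_FRACTION  # ~60 Mb
    return mutations_per_mb * coding_size_mb

print(coding_mutations(1))    # burden of 1 mutation/Mb
print(coding_mutations(10))   # burden of 10 mutations/Mb
```

So a tumour at 1 mutation/Mb carries about 60 coding mutations, and one at 10 mutations/Mb about 600.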

We can observe that some tumours have very few mutations (~10 mutations; these are mostly
paediatric tumours → no time to accumulate as many mutations + they normally have more epigenetic modifications)
whereas others have a lot.

Examples of tumours that have a high mutational burden are melanoma + lung squamous cell carcinoma
→ and these have strong mutagens associated with them!! Lung = smoking + melanoma = UV light.

Strong mutagens promote tumorigenesis → many mutations accumulate well
before the formation of the tumour.

Scary thought → if we take our skin + sequence it, we will find many mutations!! That’s because we are
constantly exposed to UV light (and we don’t really protect ourselves from it), so we accumulate many
mutations before tumours arise (in most cases, the tumour will never arise).

The different mutational rate across cancer types → associated with mutagens.

What’s more! Within the same tumour type → we have different mutational rates, some can have a lot,
some can have few.

Heterogeneous mutation rates in the same tumor type

If we look at 1 tumour type → we can see that some patients don’t have many mutations and that there
is a jump in the mutational burden (best not to say “rate”, because a rate implies a timeline and we don’t
have any idea of the time) → for example, in lung carcinoma some patients have a lot of mutations and also have
MLH1 silencing, whereas other patients with a lot of mutations have POLE mutated instead → these 2
subsets of patients have alterations in DNA repair proteins!! → which accounts for the high mutational burden.

These patients (those with MLH1/POLE altered) → have 1 order of magnitude more mutations
than the patients that don’t have these genes altered.

It’s like the 2 groups that we saw before: one had a lot of chromosomal instability and the other
one none → this other one usually has the genetic instability (= high mutational burden).

2 pathways that characterise different tumour subtypes

Genetic instability normally correlates with a better prognosis…how can this be? Because the cells
are very unstable, it takes very little to push them over the edge and kill them!!

Also → having a lot of mutations = tumour more immunogenic → can be treated with immunotherapies.

Just a few more words on MLH1 and POLE.

• MLH1 → normally inactivated epigenetically (silencing rather than mutation)
• POLE → 2 hotspot mutations (= same mutation at the same position), also causing its inactivation
(or a different behaviour)

Both genes → important for DNA repair.

DNA repair defects lead to high mutation rates

MLH1 participates in mismatch repair (MMR).

POLE encodes the catalytic subunit of DNA polymerase ε, which proofreads newly replicated DNA and also participates in repair pathways such as BER.

In cells that have these alterations → maaany more mutations accumulate, because the errors that
are (naturally) made CANNOT be repaired!

So, we have seen that mutations can have many different origins → can we trace them? Can we tell
what is happening in a more systematic manner?

Mutational Signatures

In some tumours → we always find a very specific type of substitution.

There are 6 possible substitution types (plus the 6 complementary ones on the opposite strand, which are equivalent):

C→T, C→A, C→G, T→C, T→A, T→G

We can see that the spikes (see next image) are very different between the different types of
tumours.

Note: C→T is associated with ageing.

For example, in melanoma → we see many C→T; in lung carcinoma we have more C→A.

This tells us something!! UV causes C→T and smoking causes C→A!!!

So… just by looking at the type of mutation we can link the mutation to a cause!!

We can trace the mutation back to its mutagen.

We can go even further if we don’t just look at the mutated site (the
nucleotide that changes) but also at its neighbours!! (the base before and the base after = the triplet).

So! Now we have many more possibilities!!! (6 substitution types × 4 possible bases before × 4 possible bases after = 96 categories)

In this way we can really build the different signatures! (many mutagens can cause C→T, but a specific mutagen
will cause this change only if it is followed by a G, for example) → we can look in much more detail
Measure the type + source

In each patient we can have more than 1 source of mutations.

Let’s give an example to understand this.

In melanoma → most mutations are caused by UV light, but not only that! We can also have
other mutations that are due to ageing, which affects DNA repair, so that we accumulate more mutations
because of it (just like we explained a bit ago).

So! Each tumour can be caused by multiple processes that cause mutations → some mechanisms will
be more important (and prevalent) at the beginning, whereas others will come into play at later stages.

Example → many people develop lung cancer years after quitting smoking → tobacco caused
the initial mutations but then other mutations accumulated because of other reasons.

We can look at thousands of patients and build the spectrum of mutations → they are caused by different sources
and their sum will give the pattern → we can do a deconvolution and see the different origins.

So the question that we are trying to answer is:

Can I predict the mutational processes and how much each contributed to the observed mutations?
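As a rough illustration of this question, the Python sketch below takes two known signatures and estimates how many mutations each one contributed to a patient's mutation counts. Everything numeric here is a toy assumption: the two signatures ("UV-like" and "smoking-like") are invented profiles over the 6 substitution types, not real reference signatures, and `fit_exposures` is a hypothetical name. The fit is a non-negative least-squares problem solved with simple multiplicative updates, which is one possible approach among several.

```python
import numpy as np

# Rows = substitution types, in this order:
CHANNELS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# Toy signatures (each column sums to 1). Invented for illustration only.
SIGNATURES = np.array([
    # UV-like, smoking-like
    [0.05, 0.60],  # C>A
    [0.05, 0.10],  # C>G
    [0.80, 0.10],  # C>T
    [0.03, 0.10],  # T>A
    [0.04, 0.05],  # T>C
    [0.03, 0.05],  # T>G
])

def fit_exposures(counts, signatures, n_iter=500, eps=1e-12):
    """Estimate non-negative exposures so that signatures @ exposures
    approximates the observed counts (least squares, multiplicative updates)."""
    counts = np.asarray(counts, dtype=float)
    k = signatures.shape[1]
    exposures = np.full(k, counts.sum() / k)  # start from an equal split
    for _ in range(n_iter):
        numerator = signatures.T @ counts
        denominator = signatures.T @ (signatures @ exposures) + eps
        exposures *= numerator / denominator  # update keeps values >= 0
    return exposures

# Simulated patient: 300 mutations from the UV-like process, 100 from the other.
true_exposures = np.array([300.0, 100.0])
observed = SIGNATURES @ true_exposures
estimated = fit_exposures(observed, SIGNATURES)
print(dict(zip(["UV-like", "smoking-like"], estimated.round(1))))
```

With real data the counts are noisy and there are many more signatures and 96 categories instead of 6, so the estimates are less clean than in this toy case.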

There are different techniques to obtain the “orthogonal” (= distinct, non-redundant) signatures

One existing method → Non-Negative Matrix Factorization (NMF), but for that we need a large
number of samples → that way we can discover the underlying axes.

We obtain the signatures that explain the mutations in the patients.

It’s very important to look at the nucleotides that are flanking the mutation! → we can identify many
signatures corresponding to different origins of the mutations.
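A minimal NMF sketch in Python (NumPy only) is shown below, using the classic multiplicative update rules. It is a toy under stated assumptions, not the pipeline used in the studies: the data are simulated from two invented signatures over 6 substitution types, the function name `nmf` and its parameters are made up for illustration, and real analyses use 96 categories, many more samples and a procedure for choosing the number of signatures.

```python
import numpy as np

def nmf(V, rank, n_iter=2000, seed=0, eps=1e-12):
    """Factorise V (categories x samples) into W (categories x rank) and
    H (rank x samples), all non-negative, with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_categories, n_samples = V.shape
    W = rng.random((n_categories, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each signature sums to 1; the scale moves into the exposures.
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]

# Simulated cohort: 30 samples generated from 2 invented signatures.
rng = np.random.default_rng(1)
true_W = np.array([
    [0.05, 0.60], [0.05, 0.10], [0.80, 0.10],
    [0.03, 0.10], [0.04, 0.05], [0.03, 0.05],
])
true_H = rng.uniform(10, 500, size=(2, 30))
V = true_W @ true_H

W, H = nmf(V, rank=2)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(W.round(2))
print(rel_error)
```

Columns of `W` are the extracted signatures and columns of `H` give, for each sample, how many mutations each signature explains. The order of the extracted signatures is arbitrary, and NMF components are non-negative but not orthogonal in the strict mathematical sense.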

Note: deamination can occur spontaneously and can lead to mutations if left unrepaired → it causes
C→T when the C is followed by a G!

Mutational Signatures in 2022

There was a study to update the different mutational signatures → and also to provide some clinical
information. For example, using this we can now tell whether or not a patient with lung cancer was a
smoker.

We can use it to understand the origin of the tumour.

We have talked a lot about mutations, but…do all mutations provide a selective advantage?

The concept of mutational signatures was introduced in 2012 following the demonstration that analysis
of all substitution mutations in a set of 21 whole-genome-sequenced (WGS) breast cancers could reveal
consistent patterns of mutagenesis across tumours. These patterns were the physiological imprints of DNA
damage and repair processes that had occurred during tumorigenesis and could distinguish BRCA1-null and
BRCA2-null tumours from sporadic breast cancers. Subsequently, a landmark study applied this principle on ~500
WGS and ~6,500 whole-exome-sequenced tumours across 30 cancer types and revealed 21 distinct single-base
substitution mutational signatures (SBSs). Recently, an updated analysis of ~4,600 WGS and ~19,000
whole-exome-sequenced samples raised the number of known SBSs to 49.

In trying to identify mutational signatures in a data set, a ‘global’ approach can be adopted, where
signatures from all cancers, irrespective of the tissue type, are aggregated and averaged to derive a set of
consensus signatures [2,3]. This approach presupposes that more samples provide more power for discerning new
signatures. However, an aggregated analysis also assumes that signatures are identical across all tissues, ignoring
possible tissue-specific signature properties that reflect organ-specific biology, highlighted as probable
recently [4]. Indeed, the number of samples per tumour type has been imbalanced in past analyses, resulting in
signatures of certain tissue types being more influential and thereby introducing potential bias [2,3]. By contrast,
a ‘local’ approach restricts signature extractions within individual tissue types and subsequently compares locally
extracted signatures between different organs. This permits natural variation among different tissues to emerge.

What does a signature reveal biologically? A mutational signature is the outcome of a mutagenic process
comprising some form of DNA damage, subsequently acted upon by DNA repair and/or replicative machinery.
This definition, however, faces biological complexities that also limit mathematical analyses.

First, a single type of primary DNA damage could be acted on by more than one DNA repair or replicative
pathway, resulting in disparate outcomes. For example, APOBEC deamination of cytosine to uracil (C>U) may be
the initial insult. Uracil may enter the replication process uncorrected and pairs with A during normal DNA
replication, resulting in C>T mutations that are characteristic of SBS2 (ref. 34). Alternatively, uracil may be
processed by uracil-DNA glycosylase (UNG) as part of the base excision repair pathway, resulting in a so-called
apurinic/apyrimidinic (AP) site that does not contain a DNA base [35,36]. Abasic sites may follow Strauss’s A
rule [37] to produce SBS2 C>T mutations, or the predilection of DNA repair protein REV1 for insertion of C opposite
uninformative AP sites would result in C>G transversions; this would present as SBS13, of which C>G
transversions are a key feature.

Second, any given repair protein may have multiple functions and may act on different types of DNA
damage. When a repair protein is defunct, multiple compensatory pathways may be activated to deal with
various forms of DNA damage. Thus, a defect in a single gene such as BRCA1 could cause multiple signatures
because of the multifaceted role of BRCA1 and the multitude of compensatory repair pathways that are called
upon in its absence. Arguably, each signature should be considered as unique because different types of initial
DNA damage are required to generate substitution or rearrangement patterns. Thus, attempts to perform
mutational signature analyses by combining different mutation classes may seem mathematically novel but may
not be biologically correct. Furthermore, having signatures comprising mixed sources of DNA damage and repair
would handicap efforts to understand individual signatures mechanistically. It may be advisable to regard
signatures of different mutation classes as independent readouts and seek collinearity going forward.

To gain biological insights, we frequently associate mutational signatures with factors such as driver
mutations, germ line variation, epigenetic modifications, and environmental or therapeutic exposures. However,
these remain associations until causation is proven. For example, SBS1 (ref. 2), characterized by C>T transitions
at methylated CpG dinucleotides, may be caused by spontaneous deamination of 5-methylcytosine, enzymatic
deamination of cytosine or polymerase errors [44]. The burden of SBS1 is associated with the age at diagnosis in
virtually all tumour tissues examined [2,7]. This translates neither to ageing being the cause of the signature nor
to the signature being the cause of ageing. It is merely an early association, detected because the deamination
of methyl-CpG dinucleotides happens spontaneously and continuously in all cells and is thus easily
detectable [2,45,46]. When millions of cancers have been sequenced, there will probably be enough data points
to show that many signatures show a correlation with age.

SBS1 is widely referred to as a clock-like signature [7], although it remains unclear whether this ‘clock’
refers to mutation accumulation in terms of cell division or time. For instance, in 1 year, a cell could divide ten
times or 1,000 times. It is unclear whether the ‘clock’ refers to time regardless of the number of cell divisions or
to the number of cell divisions regardless of time. Furthermore, the term ‘clock’ communicates a uniform rate,
yet deamination is likely to vary over time or cell divisions. Analyses that use SBS1 to time cancer evolution
sometimes assume that SBS1 occurs at a homogeneous rate [7]. However, mutation acquisition per cell division
may change as tissues evolve. In precancerous lesions, C>T substitutions at CpG dinucleotides can be
approximately tenfold higher than in normal cells due to replication stress [47]. Thus, care must be taken when
one is reporting an association as it could result in erroneous propagation of concepts and inappropriate use
of signatures.

Driver vs Passenger mutations

We talk about cancer evolution to discuss cancer progression (initiation→ progression → metastasis)
because cancer follows the Darwinian principles!

We have some mutations that are acquired randomly → some mutations will NOT change the
phenotype of the cells, just like in species!! (thankfully a new species is not generated every time that there is a
mutation!). Sometimes, some mutations will provide an advantage → then we have a new species (here, a new cell clone) that is fitter
than the others!

Note: When we talk about fitness = fitness of the cell.

So! Some mutations will change the cell population by providing a new advantage. NOT all mutations
provide an advantage, only some! (those make the cells that carry the mutation fitter than the
others → they will be selected).

The mechanism that Darwin proposed for evolution is natural selection. Because resources are limited
in nature, organisms with heritable traits that favor survival and reproduction will tend to leave more offspring
than their peers, causing the traits to increase in frequency over generations.

Once we know this we can distinguish between driver + passenger mutations:

• Driver mutations → functionally activate/inactivate an oncogene/tumour suppressor and
therefore give an advantage
• Passenger → does nothing (or too little to be relevant) → no phenotype is generated.

Careful!! We have a problem → it’s not so easy to distinguish them because we have 1 driver among hundreds
of passengers!!!

When we talked about the mutational burden of the tumours we took all the data → we were NOT
making any distinctions!

Based on our current knowledge of driver mutations we can split the plot into drivers + passengers.

If we do that we can see that the number of drivers is much lower than the total number of mutations
(so we have a lot of passengers!); the difference can even reach 100-fold.

So… finding the driver mutations is just like finding a needle in a haystack.

We have a big problem! Because we sequence everything (we are NOT checking 1 by 1 whether each mutation can
cause the transformation of the cells), how can we distinguish between the 2 types of mutations?

Again, the most reliable way to distinguish between them is to design an experiment to prove that a
given alteration can induce cancer:

Cell lines / Mouse models / tumor organoids / etc → see if we induce tumours

But if we have to do this for over 10’000 mutations it’s IMPOSSIBLE! We need to find ways to prioritize,
and in order to do so we have to understand which features a mutation must have to be considered a
driver.

One of the keys → look at the recurrence of the mutation.

Main idea: A cancer driver gene is a gene that is mutated more frequently than expected in a
large tumor cohort.

But… this implies something very important! How can we know whether a gene is mutated
often enough to count as mutated more than normal? What should we expect?

So the classical approach is as follows:

1. Count the number of mutations observed in each gene in the cohort
2. Determine the expected background mutation rate (BMR)
3. Estimate the significance of the observed vs. expected number of mutations
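
The three steps above can be sketched as a simple binomial test. This is a minimal illustration under strong assumptions, not the method of any specific published tool: the BMR is taken as one fixed per-base probability per tumour, the function names are hypothetical, and the cohort numbers are invented.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def gene_p_value(mutated_tumours: int, gene_length_bp: int,
                 n_tumours: int, bmr_per_bp: float) -> float:
    """P-value for seeing at least `mutated_tumours` tumours with a hit in the gene.

    Step 1 is the observed count (mutated_tumours), step 2 is the expected
    rate (bmr_per_bp), step 3 is the binomial tail probability.
    """
    # Chance that a single tumour carries at least one background mutation in the gene.
    p_hit = 1.0 - (1.0 - bmr_per_bp) ** gene_length_bp
    return binomial_tail(mutated_tumours, n_tumours, p_hit)


if __name__ == "__main__":
    # Invented cohort: 200 tumours, assumed BMR of 1e-6 mutations per bp per tumour.
    short_gene = gene_p_value(mutated_tumours=10, gene_length_bp=1_500,
                              n_tumours=200, bmr_per_bp=1e-6)
    long_gene = gene_p_value(mutated_tumours=10, gene_length_bp=100_000,
                             n_tumours=200, bmr_per_bp=1e-6)
    print("short gene p-value:", short_gene)
    print("long gene p-value:", long_gene)
```

The same observed count is very surprising in a short gene and unremarkable in a very long one, which is why the expected rate matters as much as the raw count.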

The second step (calculating the BMR) is the problem! And it’s a biological problem → how frequently is
the genome mutated? It’s a very important problem!!

A study → sequenced 2 tumour types to look for recurrent mutations:

But! They did NOT correct for the background mutation rate → they didn’t consider that the
mutational rate differs across genomes + genes → if you do “good statistics” the number
is much lower than what they found.

It’s very important to understand what is expected.

We can now ask ourselves: what influences the background? Is it the same across all patients? No!!
Remember!! The mutational burden differs between tumours + within the same tumour! That’s why
we need to customise it for the patient.

But there is more! The probability of having a gene mutated depends on its level of expression + time of
replication:

↑ expression +/or early replication = ↓ accumulation of mutations

↓expression +/or late replication = ↑ accumulation of mutations

Note: genes that are lowly expressed normally replicate late → this also affects the efficiency
of DNA repair (better repair mechanisms in highly expressed + early-replicating genes)

So, basically → we have to customise it for the patient + gene.

What influences the BMR?

Sample-specific features:
o Tissue type
o Impact of specific alterations
Underlying mutational processes:
o (e.g. UV light or tobacco consumption)
Regional genome properties:
o Gene expression
o Replication time
o Heterochromatin vs. euchromatin

The BMR is very variable → across the genome, the patient and the tumour type.

It’s very difficult to estimate! We have to take into account all the factors in order to know
what to expect.

The BMR is often inferred from silent and non-coding mutations in regions categorized based on covariates

So → to estimate the BMR → we use the silent mutations.

Why?

Because these mutations are NOT under any selection pressure, and remember that tumours are under
evolutionary pressure!

If a mutation is oncogenic → it will be observed more frequently because it is able to induce
tumours! So there will be a selective pressure.

As we said → the BMR is influenced by many things, but they are INDEPENDENT of the selective pressure
→ that’s why we look at silent mutations (which are 99% functionally neutral) → they can be used to estimate
the probability of mutating the genome.
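
As a rough illustration of "silent mutations in regions categorized based on covariates", the sketch below pools silent mutations over genes that share a covariate bin and turns them into a per-site, per-tumour rate. The gene names, bin labels and counts are all hypothetical, and real methods use far more detailed models.

```python
from collections import defaultdict


def bmr_by_bin(genes: list, n_tumours: int) -> dict:
    """Estimate a background mutation rate for each covariate bin.

    Each gene is a dict with the (assumed) keys 'bin', 'silent_mutations'
    and 'silent_sites'. The rate is silent mutations per silent site per tumour.
    """
    mutations = defaultdict(int)
    sites = defaultdict(int)
    for gene in genes:
        mutations[gene["bin"]] += gene["silent_mutations"]
        sites[gene["bin"]] += gene["silent_sites"]
    # Skip bins without any silent sites to avoid dividing by zero.
    return {b: mutations[b] / (sites[b] * n_tumours) for b in sites if sites[b] > 0}


if __name__ == "__main__":
    # Invented toy cohort of 100 tumours.
    toy_genes = [
        {"name": "geneA", "bin": "high_expr_early_repl", "silent_mutations": 2, "silent_sites": 1000},
        {"name": "geneB", "bin": "high_expr_early_repl", "silent_mutations": 4, "silent_sites": 2000},
        {"name": "geneC", "bin": "low_expr_late_repl", "silent_mutations": 30, "silent_sites": 1500},
        {"name": "geneD", "bin": "low_expr_late_repl", "silent_mutations": 15, "silent_sites": 1500},
    ]
    for covariate_bin, rate in bmr_by_bin(toy_genes, n_tumours=100).items():
        print(covariate_bin, rate)
```

Pooling genes within a bin is a design choice: a single gene usually carries too few silent mutations to give a stable estimate on its own.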

dN/dS → compares non-silent mutations with the silent ones (used for the BMR) to see if it’s a driver or not

There are different approaches to discriminate drivers from passengers.

This was not given

Evidence of selection:

Recurrence (it requires an estimate of the BMR)
Distribution of mutations (1D and 3D clusters)
Mutations in functional domains
Functional impact bias
o (also evolutionary conservation)

Cancer driver mutations (more information)

Cancer exploits an imbalance between cellular processes that lead to changes in DNA and those that
repair them. As a result, cancer cells may accumulate many (epi)genetic alterations throughout their lifetime,
and these are one or two orders of magnitude more numerous than in germline and normal somatic cells [1]. It
has been long recognized that cancerogenesis is a multistage process. One of the first lines of evidence came
from the observation that the death rate for some cancer types increased as the sixth power of the patient age.
Consequently a mathematical model was proposed suggesting several successive driving mutations and stages
of cancer [2]. Further studies confirmed a small number of mutations that drive cancerogenesis (driver
mutations) [3,4], with about one driver mutation per patient in sarcomas, thyroid, and testicular cancers, and
about four driver mutations per patient in bladder, endometrial, and colorectal cancers (Figure 1C) [5]. However,
most mutations in cancer are assumed to be largely neutral (passenger mutations) and do not contribute to
cancerogenesis. The vast majority of driver mutations represent single-nucleotide substitutions or point
mutations.
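
The link between the "sixth power" of age and the number of stages can be made explicit. In the classical multistage model (commonly attributed to Armitage and Doll), a cancer that needs k successive rate-limiting events has an incidence roughly proportional to age^(k-1), so an exponent of 6 points to about 7 stages. The sketch below recovers the exponent as the slope of a log-log fit. The rates are synthetic, generated from an exact power law purely for illustration.

```python
from math import log


def loglog_slope(ages: list, rates: list) -> float:
    """Least-squares slope of log(rate) against log(age)."""
    xs = [log(a) for a in ages]
    ys = [log(r) for r in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic death rates following rate = c * age^6 (not real data).
ages = [30, 40, 50, 60, 70, 80]
rates = [1e-12 * age**6 for age in ages]

exponent = loglog_slope(ages, rates)
stages = round(exponent) + 1  # multistage model: exponent = k - 1
print("fitted exponent:", round(exponent, 3), "-> implied stages:", stages)
```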

Detecting driver events in cancer is necessary for understanding the molecular mechanisms of cancer
and consequently for developing diagnostic, prognostic, and treatment strategies. Indeed, there is a causal
relationship between the presence of driver mutations in a tumor genome and the clinical phenotype of a given
patient. There are many computational methods to detect driver genes [6], but relatively few rank mutations
with respect to their driver status. This is explained by the greater complexity of the latter problem because the
presence of a driver mutation in a gene is sufficient to call it a driver, but not vice versa.

Cancer driver mutations may affect cell-cycle control, leading to insensitivity to growth inhibitory signals
and escape from immune surveillance. Therefore, driver mutations confer a selective advantage and are usually
under positive selection in cancer. Driver mutations can differ between cancer types and patients, and the same
mutation may drive cancer progression under some circumstances but be neutral in another environment.
Moreover, the classification into driver and passenger mutations is not binary; to make the story more complex,
some mutations (so-called 'latent drivers' [7]) might become drivers at a particular stage of cancer evolution or
when combined with other mutations in the same or different genes. Indeed, individually infrequent and
functionally weak mutations may collectively account for clonal selection of cancerogenic traits [8], whereas
multiple mutations of the same allele can lead to increased activity, cell proliferation, and tumor growth [9]. In
addition to mutations in coding genes, there are also many alterations affecting non-coding regions in cancer
[10].

The distribution of driver mutations over human genes is not uniform, and recent studies attempting to
reconstruct the evolutionary history of individual tumors showed that 50% of all early clonal driver mutations
are located in only nine driver genes, whereas subclonal mutations occur in 35 different genes, pointing to a
diverse set of drivers in later evolution [11]. Driver genes are usually classified into oncogenes and tumor-
suppressor genes. Oncogenes usually harbor gain-of-function mutations, which activate the protein and lead to
uncontrollable cell growth or proliferation. Tumor-suppressor genes, on the other hand, are responsible for
homeostasis during cell division and DNA replication, and there is strong positive selection in cancer for
deactivating mutations. However, some genes can have both tumor-suppressor and oncogene characteristics
under different circumstances.

Estimating the background somatic mutation rate in cancer

Somatic mutations observed in cancer patients occur as a result of two major mechanisms: background
mutability and natural selection. Background mutability refers to the somatic background mutation rate: this
provides the necessary variability and is shaped by multiple mutational and repair processes which are devoid
of a selection component. If mutations accumulate only as a result of neutral background mutational processes,
one should expect a positive linear correlation between the background mutation rate and the number of
mutations observed in cancer patients, which has been indeed observed in many genes, and the strongest trend
is found for synonymous mutations [12]. Mutations that occur at higher or lower frequencies than expected
from the background mutation rate model might be under selection in cancer [5]. The ratio of non-synonymous
to synonymous mutations (dN/dS) is routinely used in cancer evolutionary genomics to estimate the effect of
selection. Those genomic regions or nucleotide sites that are under positive selection would have a ratio greater
than 1.0, and the link between dN/dS values and fitness selection coefficients in somatic cancer evolution has
been established [13]. There are also under-represented mutations which are under negative selection in cancer
because they result in cell death or senescence. Estimates of the effects of negative selection in cancer have
been controversial, mostly due to the low mutation counts, the presence of multiple gene copies, and the
recessiveness of deleterious mutations [14].
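
A minimal sketch of the dN/dS ratio described above. It normalises the mutation counts by the number of available non-synonymous and synonymous sites and ignores everything else (sequence context, covariates), so it is a simplification of what published methods do. The function names, the 0.2 tolerance band and the counts are hypothetical.

```python
def dn_ds(nonsyn_mutations: int, syn_mutations: int,
          nonsyn_sites: float, syn_sites: float) -> float:
    """Ratio of non-synonymous to synonymous mutation rates per available site."""
    if syn_mutations == 0:
        raise ValueError("dN/dS is undefined without synonymous mutations")
    dn = nonsyn_mutations / nonsyn_sites
    ds = syn_mutations / syn_sites
    return dn / ds


def interpret(ratio: float, tolerance: float = 0.2) -> str:
    """Crude reading of the ratio; the tolerance band is an arbitrary assumption."""
    if ratio > 1.0 + tolerance:
        return "excess of non-synonymous mutations (compatible with positive selection)"
    if ratio < 1.0 - tolerance:
        return "deficit of non-synonymous mutations (compatible with negative selection)"
    return "close to neutral expectation"


if __name__ == "__main__":
    # Invented gene with 3000 non-synonymous and 1000 synonymous sites.
    for nonsyn, syn in [(30, 10), (90, 10), (6, 10)]:
        ratio = dn_ds(nonsyn, syn, nonsyn_sites=3000, syn_sites=1000)
        print(nonsyn, syn, round(ratio, 2), interpret(ratio))
```

With few mutations per gene the ratio is very noisy, which is one reason the text calls estimates of negative selection controversial.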

Accurate estimates of the neutral background mutation rate have proved to be difficult but are crucial
for identifying driver mutations. The background mutation rate depends on many endogenous and exogenous
factors. It has been shown that cell type-specific (epi)genomic features, such as replication timing, histone
modifications, and chromatin accessibility, may explain up to 86% of the variance in mutation rates in cancer
genomes at a megabase scale [15]. Taking into account these large-scale covariates, various computational
methods have been designed to estimate the regional variations of the background mutation rate and predict
significantly mutated genes [16]. Nevertheless, driver mutation prediction requires an estimate of the
background mutation rate at the single-nucleotide scale, which is more laborious. Previous studies pointed to
the local DNA sequence context as a major factor underlying the largest proportion of mutation rate variation
[17], and heptanucleotide context explains up to 80% of the variability in the per-nucleotide substitution rate
[18]. In addition, local DNA structures (non-B DNA such as DNA stem-loops and quadruplexes) also contribute to
mutation rate variation at the scale of a single base pair or 10–50 bp [19–21]. Computational modeling of these mutational
processes will be described next, but it is often challenging to separate neutral background and natural selection
components from each other.

Driver mutations and mutational signatures

DNA molecules are constantly exposed to different mutagenic processes such as UV light, smoking,
reactive oxygen species (ROS), and many others. These mutagenic processes leave characteristic mutational
patterns which can be analyzed using mutational signatures or mutational motif frameworks [22,23]. Mutational
signatures are typically modeled as a multinomial distribution over a set of mutation categories. Most
commonly, mutation categories are defined as triplets of nucleotides where the nucleotide in the central
position of the triplet is mutated while the flanking nucleotides provide local context for the mutation (Figure
2). Mutational signatures are identified computationally based on mutation catalogs of large cohorts of cancer
genomes using methods such as non-negative matrix factorization [24–26], latent Dirichlet allocation [27], topic
models [28,29], and other approaches [30]. Currently, the most popular set of mutational signatures is provided
in the COSMIC (Catalogue of Somatic Mutations in Cancer) database [31]. Each mutational signature is intended to
correspond to a different mutational process. For example, specific signatures have been linked to smoking,
homologous recombination deficiency, the mutagenic activities of the APOBEC enzyme family, and to
spontaneous or enzymatic deamination of 5-methylcytosine to thymine, among many others. In addition to
mutational signatures, computational methods infer the so-called signature exposures which measure the
number of mutations attributed to each signature in a given cancer genome. This is done jointly with de novo
inference of the mutational signatures using the methods listed in the preceding text or by dedicated approaches
that utilize previously inferred mutational signatures [32–34]. With these concepts at hand, researchers are
investigating whether some cancer driver mutations are caused by specific mutagenic processes and, conversely,
whether cancer driver mutations can drive such mutagenic processes [35–48].
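
The triplet-based mutation categories can be illustrated with a short sketch. It follows the widely used convention of reporting every substitution from the pyrimidine (C or T) strand, which gives 6 substitution types × 16 flanking contexts = 96 categories. The function names and the label format `A[C>T]G` are choices made for this example, and the input mutations are invented.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def category(context: str, alt: str) -> str:
    """Label a substitution by its trinucleotide context, e.g. 'A[C>T]G'.

    `context` is the reference triplet with the mutated base in the middle.
    Purine references (A/G) are flipped to the opposite strand.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or alt not in COMPLEMENT or alt == context[1]:
        raise ValueError("need a 3-base context and a different alternative base")
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def all_categories() -> list:
    """The 96 categories in a fixed order."""
    cats = []
    for ref in "CT":
        for alt in "ACGT":
            if alt == ref:
                continue
            for left, right in product("ACGT", repeat=2):
                cats.append(f"{left}[{ref}>{alt}]{right}")
    return cats


def catalogue(mutations: list) -> list:
    """Count vector over the 96 categories for a list of (context, alt) pairs."""
    counts = Counter(category(context, alt) for context, alt in mutations)
    return [counts[cat] for cat in all_categories()]


if __name__ == "__main__":
    # Invented mutations; the first two describe the same event seen from both strands.
    toy = [("ACG", "T"), ("CGT", "A"), ("TCA", "G")]
    print(Counter(category(c, a) for c, a in toy))
    print(sum(catalogue(toy)), "mutations in", len(all_categories()), "categories")
```

A count vector like this, one per tumour, is the kind of input that the factorization methods mentioned above decompose into signatures and exposures.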

Another powerful and straightforward approach to analyze characteristic mutational patterns is to use
mutational motifs. Mutational motifs represent a mutated nucleotide and its local DNA sequence where several
flanking nucleotides and the location of the mutated site can vary (Figure 2). Historically, motifs have been
derived from experimental studies and can be viewed as footprints of interactions between DNA sequence and
mutagens, thus providing information about the underlying molecular mechanisms of mutations. Several
computational methods have been developed to extract, analyze, and annotate mutational motifs from
mutation data.

The connection between the genetic background of an individual and the response to treatment was
established a long time ago. Nowadays this approach relies heavily on genetic testing, but only a small
fraction of patients harboring potential driver mutations (biomarkers) are enrolled in genotype-matched trials
[88]. Prognostic biomarkers are used to derive the correct diagnosis, whereas pharmacogenomic biomarkers
may predict the response to the drug based on a specific genomic makeup. Many mutation biomarkers have
been identified so far. In lung adenocarcinoma patients multiplexed assays showed that 25%, 17%, and 8% of
patients had KRAS, EGFR, and ALK alterations, respectively, and patients with driver mutations had 2.4 years
median survival compared to 2.1 years for patients without biomarkers [89]. Similarly, patients harboring specific
driver mutations in the POLE gene encoding the DNA polymerase epsilon had an excellent prognosis and
responded well to immunotherapy, allowing this biomarker to be used in pretreatment triage [90]. In addition
to individual mutations, clustered mutations in the TP53, EGFR, and BRAF genes have recently been found to be
associated with overall survival.

In addition, mutational signatures can be used as biomarkers and predictors of drug response [96].
Because many mutational signatures are caused by a malfunctioning DNA repair mechanism, they can be used
as biomarkers for drugs which target a complementary DNA repair process via synthetic lethality [97]. APOBEC
signatures, on the other hand, are predictive of a response to immunotherapy in several cancers [98,99]. This is
attributed, in part, to high tumor mutation burden which is known to be predictive of the response to
immunotherapy [100] and, in part, to immune-related APOBEC3B upregulation.
