Lecture 2

The document outlines homework assignments focused on the differences between DNA and protein sequencing, including their definitions, building blocks, and sequencing challenges. It also includes tasks related to gene analysis, such as extracting gene information, predicting gene structures, and exploring genomic databases like NCBI and Ensembl. The homework emphasizes understanding gene functions, their implications in cancer and immunity, and the methodologies used in sequencing and gene prediction.

Uploaded by

trieupg.22bi13431

HOMEWORK DAY 1

Problems and Solutions?

Note for HOMEWORK 1
Homework 1: DNA sequencing vs Protein sequencing
a. What is the difference between DNA sequencing and protein sequencing?
Answer 1?
| Feature | DNA sequence | Protein sequence |
| --- | --- | --- |
| Definition | A series of deoxyribonucleotides | A series of amino acids |
| Building block | Deoxyribonucleotides | Amino acids |
| Different types of monomers | Four types of deoxyribonucleotides | Twenty different amino acids |
| Bonds between monomers | Phosphodiester bonds | Peptide bonds |
| Function | DNA mainly stores the genetic information used to make proteins in a cell | Important in the structure, function, and regulation of the body's tissues and organs |
| Variety | One DNA sequence (read in a given frame) can be translated into only one protein sequence | One protein sequence can be encoded by more than one DNA sequence |
| Deduce | The protein sequence can be deduced from it | The DNA sequence cannot be deduced from it unambiguously |
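
The "Variety" and "Deduce" rows can be checked with a few lines of Python. This is a minimal sketch, assuming the standard genetic code; the function names `translate` and `count_encodings` and the toy sequences are illustrative and not part of the lecture.

```python
from collections import Counter
from itertools import product

# Standard genetic code, written in the usual T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA string codon by codon; a trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_encodings(protein):
    """Number of distinct DNA sequences that translate to the given protein string."""
    codons_per_residue = Counter(CODON_TABLE.values())
    total = 1
    for residue in protein:
        total *= codons_per_residue[residue]
    return total


# Two different toy DNA sequences give the same protein ...
print(translate("ATGGCCTAA"), translate("ATGGCGTAA"))
# ... so the protein alone does not determine the DNA sequence.
print(count_encodings("MAL"))
```

Each DNA sequence has exactly one translation in a given frame, while the count of possible encodings grows multiplicatively with protein length, which is why the reverse deduction is ambiguous.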

Answer 2?

| DNA sequencing | Protein sequencing |
| --- | --- |
| DNA sequencing relies heavily upon PCR primers, which works well for model species. => DNA sequencing proves difficult for non-annotated genomes | Protein sequencing is de novo, meaning it doesn't rely on a database. => It can sequence any protein of any isotype |
| DNA sequencing requires access to the intact original cell line. => So when the hybridoma is lost, DNA sequencing is no longer feasible | Protein sequencing uses the protein itself, => providing the ability to sequence without access to the original cell line or hybridoma |
| DNA sequencing is blind to post-translational modifications, which may have implications for protein functionality | Protein sequencing can objectively uncover post-translational modifications |

Missing information: Principle and techniques?

- DNA sequencing: Traditional Sanger sequencing and next-generation sequencing
- Protein sequencing: two major direct methods (mass spectrometry & Edman
degradation using a protein sequenator (sequencer))

b. Why don't we sequence protein like we sequence DNA?

- Because genomic DNA includes both introns and exons, a protein sequence inferred from DNA sequencing alone may be inaccurate.

- Because the two molecules have different structural components and the sequencing processes differ in nature. DNA sequencing relies on a DNA polymerase and a primer, taking advantage of DNA replication. Protein sequencing works on the protein itself, so the identity and position of each amino acid must be determined directly.

- DNA sequencing is blind to post-translational modifications, which may have implications for protein functionality. Protein sequencing can objectively uncover post-translational modifications such as N-terminal pyroglutamate formation, glycosylation sites and deamidation.


Missing points:
- The technique lacks high-throughput capabilities
- Cost:
> Protein sequencing cost: $600 for the first 5 amino acids; $50 for each additional amino acid
> DNA sequencing cost: about $1000 for a whole-exome sequence of the human genome (30 × 10^6 bp)
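
To make the cost gap concrete, the prices quoted above can be turned into a per-unit comparison. This is a rough sketch that uses only the figures on this slide; the 300-residue protein is a hypothetical example, and the function names are illustrative.

```python
def protein_sequencing_cost(n_residues, first_block=5, first_block_cost=600, extra_cost=50):
    """Cost in dollars: a flat fee for the first block, then a fee per extra residue."""
    if n_residues <= 0:
        return 0
    extra_residues = max(0, n_residues - first_block)
    return first_block_cost + extra_residues * extra_cost


def exome_cost_per_bp(total_cost=1000, exome_bp=30e6):
    """Cost in dollars per base pair for a whole-exome sequence."""
    return total_cost / exome_bp


n = 300  # hypothetical protein length, chosen only for illustration
cost = protein_sequencing_cost(n)
print(f"Protein: ${cost} in total, about ${cost / n:.2f} per residue")
print(f"Exome:   about ${exome_cost_per_bp():.7f} per base pair")
```

Under these prices a single residue costs about as much as a million and a half sequenced base pairs, which is the practical point behind "lacks high-throughput capabilities".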

Note for HOMEWORK 2
Figure out how the genes assigned to each of you are implicated in cancers and/or immunity
(File: Gene List.xlsx)

Requirements: get the following information about each of the 3 genes assigned to you
• Gene symbol, full name, reviewed by RefSeq
• Summary of its function
• Location on the human genome (based on GRCh38)
– e.g. chromosome, start, end, strand
• How this gene is related to cancer
– Get one open-access reference that is most relevant to cancers and/or immunity in your
opinion. Please list the article title, the authors, their institutions, publication year, journal
name.
• Any situations (mutations, over-expression, etc.) of this gene associated with other (non-cancer
and non-immune) diseases
• Extract DNA sequence of these genes and translate the DNA sequences in 3 frames, and
determine the reading frame which contains an open reading frame (ORF).
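
The last requirement (translate in 3 frames, then pick the frame with an ORF) can be sketched as follows. This is a minimal illustration, assuming the standard genetic code, the forward strand only, and ATG as the only start codon; the toy sequence and all function names are made up for the example and are not the course's prescribed tool.

```python
from itertools import product

# Standard genetic code, written in the usual T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate codon by codon; unknown codons (e.g. with N) become 'X'."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def three_frame_translation(dna):
    """Return {1: ..., 2: ..., 3: ...} with the translation of each forward frame."""
    dna = dna.upper()
    return {frame + 1: translate(dna[frame:]) for frame in range(3)}


def longest_orf(protein):
    """Longest stretch that starts with M and is closed by a stop codon ('*')."""
    best = ""
    for segment in protein.split("*")[:-1]:  # the last piece has no closing stop
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


def best_reading_frame(dna):
    """Frame (1-3) with the longest ORF; falls back to frame 1 if none has an ORF."""
    frames = three_frame_translation(dna)
    frame = max(frames, key=lambda f: len(longest_orf(frames[f])))
    return frame, longest_orf(frames[frame])


toy = "GGATGAAACCCTAGTT"  # toy sequence, not a real gene
for frame, protein in three_frame_translation(toy).items():
    print(frame, protein)
print(best_reading_frame(toy))
```

For a real transcript the input would be the sequence extracted from the GenBank record, and the reported ORF should be compared with the annotated CDS.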

Using NCBI RefSeqGene

https://www.ncbi.nlm.nih.gov/gene/?term=akt1

RefSeqGene - AKT1

• Gene symbol, full name, reviewed by RefSeq

• Summary of its function

• Location on the human genome (based on GRCh38)

– e.g. chromosome, start, end, strand
• How this gene is related to cancer
– Get one open-access reference that is most relevant to cancers and/or
immunity in your opinion. Please list the article title, the authors, their
institutions, publication year, journal name.

• Any situations (mutations, over-expression, etc.) of this gene associated
with other (non-cancer and non-immune) diseases

From NCBI RefSeqGene to ClinVar

Extract DNA sequence of these genes and translate the DNA sequences in 3
frames, and determine the reading frame which contains an open reading
frame (ORF).

GenBank Record Fields

RefSeqGene - AKT1 transcript

Extract the DNA sequence of a transcript of the AKT1 gene

Searching for ORFs

a. Missing protocol
- Which program? Website?
- Parameters: strand? Initiation codons? Genetic code? Minimum ORF size? ...
b. Conclusion: which ORF should be chosen for further study?
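
The protocol questions above map directly onto the parameters of an ORF search. The sketch below is one possible way to make them explicit, assuming the standard genetic code (stop codons TAA, TAG, TGA); `find_orfs`, its parameter names and the toy sequence are illustrative and do not reproduce any particular website's implementation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna, start_codons=("ATG",), min_len_nt=30, both_strands=True):
    """Scan every frame for start...stop stretches of at least min_len_nt nucleotides.

    Coordinates are 0-based, end-exclusive, and relative to the scanned strand
    (for '-' they refer to the reverse complement, not to the input sequence).
    """
    dna = dna.upper()
    strands = [("+", dna)]
    if both_strands:
        strands.append(("-", reverse_complement(dna)))
    orfs = []
    for strand, seq in strands:
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon in start_codons:
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start  # stop codon included
                    if length >= min_len_nt:
                        orfs.append({"strand": strand, "frame": frame + 1,
                                     "start": start, "end": i + 3, "length": length})
                    start = None
    return sorted(orfs, key=lambda orf: orf["length"], reverse=True)


toy = "GGATGAAACCCTAGTT"  # toy sequence, not a real gene
for orf in find_orfs(toy, min_len_nt=12):
    print(orf)
```

Whatever tool is used, reporting these settings (strand, start codons, genetic code, minimum size) is what makes the ORF result reproducible; the longest ORF is a common first candidate, but it should still be checked against the annotated CDS.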
Structure of a eukaryotic gene

How is gene structure determined?

• Experiments
– Reverse transcription PCR (RT-PCR) -> sequencing
– 5’ Rapid Amplification of cDNA Ends (5’ RACE) -> finding the 5’-most exon -> sequencing
– Transcriptome library -> single-pass sequencing
• Expressed sequence tags (EST)
• RNA-seq

• Computational prediction

How can a computer predict the gene structure?

• Signal sites for transcription and translation elements.

• Sequence homology to known genes/proteins.

Strategy: Splice site recognition

GT-AG rule

DONOR-SPLICE: splicing site at the beginning of an intron, intron 5' left end.
ACCEPTOR-SPLICE: splicing site at the end of an intron, intron 3' right end.
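
The GT-AG rule on its own can be sketched as a plain dinucleotide scan. This is only an illustration of the rule, under the assumption that every GT is a possible donor and every AG a possible acceptor; real gene finders score the surrounding sequence context, which is not modelled here. Function names and the toy sequence are illustrative.

```python
def candidate_splice_sites(dna):
    """Return (donors, acceptors): 0-based positions of every GT and every AG."""
    dna = dna.upper()
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    return donors, acceptors


def candidate_introns(dna, min_len=4):
    """Pair each donor with each downstream acceptor: intervals that start with GT
    and end with AG (0-based, end-exclusive) and span at least min_len bases."""
    donors, acceptors = candidate_splice_sites(dna)
    return [(d, a + 2) for d in donors for a in acceptors if a + 2 - d >= min_len]


toy = "ATGGTAAGTCCCTTTCAGGCT"  # toy sequence, not a real gene
print(candidate_splice_sites(toy))
for start, end in candidate_introns(toy, min_len=10):
    print(start, end, toy[start:end])
```

Even this short toy sequence yields several candidate introns, which is why the rule is only a first filter and has to be combined with other signals.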
Programs for gene prediction

• geneid: https://genome.crg.es/software/geneid/geneid.html
- Available organisms: Homo sapiens (human), Drosophila melanogaster (fruit fly), Tetraodon nigroviridis (puffer fish), Oryza sativa (rice), ….

• GenScan: http://hollywood.mit.edu/GENSCAN.html
- Available organisms: Vertebrate, Arabidopsis, maize

• Augustus: http://bioinf.uni-greifswald.de/augustus/submission.php
- Available organisms: animals, alveolata, plants and algae, fungi, bacteria, archaea

• Other genefinders: FGENESH, GRAIL, GLIMMERM, GENEID, GENEFINDER, GENEMARK, ….

EXERCISE BREAK
Exploring ab initio gene prediction
1. Extract the FASTA sequence of the genomic region of the AKT1 gene (NCBI Reference
Sequence: NG_012188.1)
2. Predict the gene structure of this DNA sequence
- Search for signals of the first exon with geneid: select acceptors, donors, start and stop codons, then look for them in the real annotation of the sequence
- Search for exons using both geneid and GenScan or Augustus (that is, with at least two gene prediction programs)
> Select all exons and try to find the real ones
> Find the gene
> Compare the predicted gene with the GenBank record of the gene from NCBI
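
The final comparison step can be made quantitative once both exon lists are written as coordinate pairs. The sketch below is one simple way to do it, assuming 1-based inclusive (start, end) coordinates on the same sequence; the coordinates shown are hypothetical placeholders, not AKT1 values, and the function names are illustrative.

```python
def exact_exon_matches(predicted, annotated):
    """Exons whose start and end both agree with the annotation."""
    return sorted(set(predicted) & set(annotated))


def nucleotide_overlap(predicted, annotated):
    """Return (sensitivity, precision) at the nucleotide level.

    sensitivity = annotated exonic bases that were predicted
    precision   = predicted exonic bases that are annotated
    """
    pred = {pos for start, end in predicted for pos in range(start, end + 1)}
    ann = {pos for start, end in annotated for pos in range(start, end + 1)}
    shared = len(pred & ann)
    sensitivity = shared / len(ann) if ann else 0.0
    precision = shared / len(pred) if pred else 0.0
    return sensitivity, precision


# Hypothetical coordinates for illustration only.
annotated_exons = [(100, 200), (300, 400), (500, 600)]
predicted_exons = [(100, 200), (310, 400), (700, 750)]

print(exact_exon_matches(predicted_exons, annotated_exons))
sens, prec = nucleotide_overlap(predicted_exons, annotated_exons)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

Reporting both exact exon matches and nucleotide-level overlap separates "wrong boundary" errors from "missed or invented exon" errors when comparing two prediction programs.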

One gene
=> multiple (alternatively spliced) transcripts
=> multiple proteins (with distinct functions)

http://commons.wikimedia.org/wiki/File:Transformer_splicing.gif
Browsing genes and genomes
with Ensembl

Contents

• Introduction to Ensembl database and browser

• EXERCISE: A light exploration of the Ensembl genome browser with the AKT1 gene

NCBI databases are not the ultimate
solution to the knowledge of genomes

Introduction
Why do we need/have genome browsers? So many!

The Human Genome Project (HGP)

• Draft
– Announced on June 26, 2000
– Coverage: 90 %
– Error rate: 1 %

• Finish
– Published in 2003
– Coverage: > 99 %
– Error rate: 0.01 %

Anything new for the human genome?
The truth is that what we do not know is much more than what we already know…

Once nearly everyone believed that only 3% of the human genome is functional:
– 1.5% are protein-coding regions
– 1.5% are regulatory elements
– 97% is junk DNA
Nature (2001), 409(6822): 860-921

This is no longer true since the Encyclopedia of DNA Elements (ENCODE) Consortium found new evidence.

Non-coding RNA: It’s Not Junk

• ~70% (about 3/4) of the human genome can be transcribed …, functionally unknown!

• >20,000 non-coding RNAs, functionally unknown!
Djebali, S., et al. (2012). "Landscape of transcription in human cells." Nature 489(7414): 101-108.

Genomic sequences must be
annotated with functions
Human Genome Project

GRCh38.p4 (June 29, 2015)

Annotation of gene structures
Reference genome

Advanced annotation

Population variations
Gene regulation Pathways
Variation and diseases

The Ensembl project

• The goal of Ensembl was to automatically annotate the genome, integrate this annotation with other available biological data and make all this publicly available via the web (since 1999).

www.ensembl.org
Ensembl Features

EXERCISE BREAK

Exercise 2: A light exploration of the Ensembl genome browser with the AKT1 gene
- Extracting genomic information from Ensembl:
• Gene ID, gene name, Ensembl gene ID (gene stable ID), NCBI gene ID, UniProt/Swiss-Prot ID
• What is the description of this gene? Where is it located in the genome?
• How many contigs cover the gene region? Is the AKT1 gene on the forward strand or on the reverse strand? How many transcripts are annotated for AKT1? How many of them code for protein?
• SNPs or variants within the gene of interest: what SNPs are found in the gene, and are they located in introns, promoters or exons?
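
Several of these questions (location, strand, number of transcripts, how many are protein-coding) can also be answered programmatically. The sketch below assumes the Ensembl REST service at rest.ensembl.org and its `lookup/symbol` endpoint with `expand=1`; the field names used in `summarize_gene` ("Transcript", "biotype", "seq_region_name", "display_name") are assumptions about the JSON response and should be checked against the current REST documentation. No request is sent unless `fetch_json` is called.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_symbol_url(symbol, species="homo_sapiens", expand=True):
    """Build the lookup URL for a gene symbol (expand=1 asks for transcripts too)."""
    url = (f"{ENSEMBL_REST}/lookup/symbol/{quote(species)}/{quote(symbol)}"
           "?content-type=application/json")
    if expand:
        url += ";expand=1"
    return url


def fetch_json(url):
    """Download and decode a JSON response (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


def summarize_gene(record):
    """Pull out the answers to the exercise questions from a lookup record."""
    transcripts = record.get("Transcript", [])
    coding = [t for t in transcripts if t.get("biotype") == "protein_coding"]
    return {
        "gene_id": record.get("id"),
        "name": record.get("display_name"),
        "location": f'{record.get("seq_region_name")}:{record.get("start")}-{record.get("end")}',
        "strand": "forward" if record.get("strand") == 1 else "reverse",
        "n_transcripts": len(transcripts),
        "n_protein_coding": len(coding),
    }


print(lookup_symbol_url("AKT1"))
# With network access: summarize_gene(fetch_json(lookup_symbol_url("AKT1")))
```

The values returned this way should agree with what the Ensembl browser shows on the gene page; any difference usually comes from a different release or assembly.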

HOMEWORK Day 2
- Revise your Homework 2 from Day 1.

- Extract the FASTA sequence of the genomic region of your genes (from Homework Day 1) and predict the gene structure of these DNA sequences using one gene prediction program. Summarize the exons and introns from your prediction, and write your observations and conclusion.

- Find transcript information about a specific gene using NCBI & Ensembl and compare it with the prediction from the bioinformatics program.

- Explore the genomic information of your genes (from Homework Day 1) using Ensembl (see Exercise 2 for details).

- Between Ensembl and NCBI, which one would you prefer when searching for information on human genes? Why?

DEADLINE: 10am Thursday 15th 2021
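
For the exon and intron summary requested in this homework, the introns follow directly from the predicted exon coordinates. This is a minimal sketch, assuming 1-based inclusive (start, end) exon coordinates on one strand; the coordinates are hypothetical placeholders and the function names are illustrative.

```python
def introns_from_exons(exons):
    """Introns are the gaps between consecutive exons (1-based, inclusive)."""
    exons = sorted(exons)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
            if next_start - prev_end > 1]


def summarize_structure(exons):
    """Counts and total lengths of exons and introns for a predicted gene."""
    introns = introns_from_exons(exons)
    return {
        "n_exons": len(exons),
        "n_introns": len(introns),
        "exon_bp": sum(end - start + 1 for start, end in exons),
        "intron_bp": sum(end - start + 1 for start, end in introns),
    }


predicted_exons = [(100, 200), (300, 400), (500, 600)]  # hypothetical values
print(introns_from_exons(predicted_exons))
print(summarize_structure(predicted_exons))
```

The same summary computed for the NCBI or Ensembl transcript gives a like-for-like table to compare against the prediction.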

Take-home message?
Sequencing and primary data; ORF finder and gene prediction; NCBI and Ensembl
END
