
Biological Databases

The document discusses different types of biological networks and databases. It describes protein-protein interaction networks, metabolic networks, gene regulatory networks, and RNA networks. It then explains different database structures like flat files and relational databases. Finally, it discusses different types of biological databases categorized by content, like primary databases containing original data and secondary databases containing processed information.

Uploaded by

Kasun Bandara


Biological Databases

Dr. Upeksha Ganegoda

Department of Computational Mathematics

Outline
• Different types of biological networks
• Database structures
• Biological database types based on content

Biological Networks
• Protein-protein interaction network
• Metabolic network
• Gene regulatory network
• RNA network

Protein-protein interaction network
• A protein can interact with another protein, for example to build a protein complex or to activate it. A protein-protein interaction (PPI) network shows which proteins interact with each other and how.
• Each node represents a protein, and each arc (edge) represents an interaction between two proteins.
• Different types of graph algorithms can be used to identify:
  • Protein complexes
  • Protein functions
  • Protein hubs
  • etc.
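A minimal sketch of the hub idea, assuming "hub" is taken to mean a protein whose degree (number of interaction partners) is at least some threshold. The protein names and the threshold of 3 are hypothetical, and real analyses often use other centrality measures as well.

```python
from collections import defaultdict

# Hypothetical undirected PPI edges: each pair is one observed interaction
interactions = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P1", "P5"), ("P2", "P3"), ("P5", "P6"),
]

def build_ppi_network(edges):
    """Build an adjacency map; interactions are undirected, so store both directions."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def find_hubs(adj, min_degree=3):
    """Return proteins whose number of interaction partners is >= min_degree."""
    return sorted(p for p, partners in adj.items() if len(partners) >= min_degree)

network = build_ppi_network(interactions)
print(find_hubs(network))  # ['P1']
```

Here P1 interacts with four partners, so it is the only protein above the (assumed) threshold.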

Protein Hub?

Protein complexes?

Metabolic network
• Metabolic networks give in-depth insight into the molecular mechanisms of a particular organism. They correlate the genome with molecular physiology and are the most comprehensive of all biological networks.

Ex: Databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Biochemical, Genetic and Genomic knowledgebase (BiGG) contain the metabolic networks of a wide range of species.
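One way to query a metabolic network is to ask which metabolites could be produced from a given set of starting compounds. The sketch below does this by repeatedly "firing" any reaction whose substrates are all available (a simple network-expansion approach). The reactions and metabolite names are a hypothetical toy example, not records taken from KEGG or BiGG, and every reaction is assumed to run in one direction only.

```python
# Hypothetical toy reactions: (reaction id, substrates, products)
reactions = [
    ("R1", {"A", "ATP"}, {"B", "ADP"}),
    ("R2", {"B"}, {"C"}),
    ("R3", {"C", "D"}, {"E"}),
]

def reachable_metabolites(reactions, seeds):
    """Expand the seed set until no reaction adds a new metabolite."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for _rid, substrates, products in reactions:
            # A reaction can fire only if every substrate is already available
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available

print(sorted(reachable_metabolites(reactions, {"A", "ATP"})))
# ['A', 'ADP', 'ATP', 'B', 'C']
```

E is not reachable here because R3 also needs D; adding D to the seeds would make it reachable.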

Gene-regulatory network
• It is a common type of regulatory network.
• A gene regulatory network consists of DNA segments in a cell that interact with each other indirectly (through their RNA and protein expression products) and with other substances in the cell to control the expression levels of mRNA and proteins.
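As a first approximation, gene regulatory networks are often modelled as Boolean networks, in which each gene is either ON (1) or OFF (0) and its next state is a logical function of its regulators. This modelling choice is not taken from the slides; the three-gene loop below, where each gene is repressed by another gene, is hypothetical.

```python
# Hypothetical 3-gene network: each gene is repressed by one other gene.
# The next state of each gene depends on the current state of its regulator.
rules = {
    "geneA": lambda s: not s["geneC"],  # geneC represses geneA
    "geneB": lambda s: not s["geneA"],  # geneA represses geneB
    "geneC": lambda s: not s["geneB"],  # geneB represses geneC
}

def step(state):
    """Synchronously update every gene from the current state."""
    return {gene: int(rule(state)) for gene, rule in rules.items()}

def simulate(state, n_steps):
    """Return the list of states visited, including the initial state."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory

for s in simulate({"geneA": 1, "geneB": 0, "geneC": 0}, 6):
    print(s["geneA"], s["geneB"], s["geneC"])
```

Starting from (1, 0, 0), the network passes through six distinct states and returns to its starting state, i.e. this repression loop oscillates.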

RNA networks
• RNA networks show RNA-RNA or RNA-DNA interactions. Building on an understanding of the role of microRNAs in disease, researchers have constructed microRNA-gene networks using predicted microRNA targets available in public databases such as TargetScan, PicTar, microRNA, miRBase and miRDB.
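A microRNA-gene network can be assembled as a bipartite graph from (microRNA, target gene) pairs, such as those exported from target-prediction databases. The sketch assumes a simple list of pairs; the identifiers are hypothetical placeholders rather than real predictions, and no particular database's file format is assumed.

```python
from collections import defaultdict

# Hypothetical predicted (microRNA, target gene) pairs
predicted_targets = [
    ("miR-X", "GENE1"), ("miR-X", "GENE2"),
    ("miR-Y", "GENE2"), ("miR-Y", "GENE3"),
    ("miR-Z", "GENE2"),
]

def build_bipartite(pairs):
    """Map each microRNA to its targets and each gene to its regulating microRNAs."""
    mirna_to_genes = defaultdict(set)
    gene_to_mirnas = defaultdict(set)
    for mirna, gene in pairs:
        mirna_to_genes[mirna].add(gene)
        gene_to_mirnas[gene].add(mirna)
    return mirna_to_genes, gene_to_mirnas

mirna_to_genes, gene_to_mirnas = build_bipartite(predicted_targets)

# Genes predicted to be targeted by more than one microRNA
shared = sorted(g for g, ms in gene_to_mirnas.items() if len(ms) > 1)
print(shared)  # ['GENE2']
```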

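Representation as a network
A network is written G = (V, E, w), where V is the set of nodes (e.g. proteins), E is the set of interactions (edges) and w gives the weight of each interaction.
A network can be constructed as a:
• Directed graph. Ex: gene-regulatory network
• Undirected (bidirectional) graph. Ex: PPI network
[Figure: example directed gene-regulatory graph and undirected PPI graph, with nodes labelled r1, g2, r2 and p1]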
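The definition G = (V, E, w) maps directly onto a weighted adjacency structure. A minimal sketch, assuming weights are real numbers (e.g. confidence scores); for an undirected graph each edge is stored in both directions, while a directed graph stores only source to target. The class and node names are hypothetical.

```python
from collections import defaultdict

class Network:
    """Weighted graph G = (V, E, w); directed=True for e.g. gene regulation."""

    def __init__(self, directed):
        self.directed = directed
        self.nodes = set()             # V
        self.adj = defaultdict(dict)   # E and w: adj[u][v] = w(u, v)

    def add_edge(self, u, v, weight=1.0):
        self.nodes.update((u, v))
        self.adj[u][v] = weight
        if not self.directed:
            # Undirected (e.g. PPI): the interaction holds in both directions
            self.adj[v][u] = weight

    def weight(self, u, v):
        """Return w(u, v), or None if (u, v) is not in E."""
        return self.adj.get(u, {}).get(v)

# Hypothetical examples
grn = Network(directed=True)
grn.add_edge("regulator1", "gene2", weight=0.8)   # regulator1 -> gene2

ppi = Network(directed=False)
ppi.add_edge("proteinA", "proteinB", weight=0.9)

print(grn.weight("gene2", "regulator1"))   # None: no reverse edge
print(ppi.weight("proteinB", "proteinA"))  # 0.9
```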
Main functions of biological databases
• Make biological data available to scientists.
As much as possible of a particular type of information should be available in one single place (book, site, database). Published data may be difficult to find or access, and collecting it from the literature is very time-consuming. Moreover, not all data is actually published explicitly in an article (e.g. genome sequences).
• Make biological data available in computer-readable form.
Since analysis of biological data almost always involves computers, having the data in computer-readable form (rather than printed on paper) is a necessary first step.
What is a database?
• How can data be stored?

Flat-file format, with fields separated by some delimiter

These data could also be stored in a spreadsheet

What are the problems with this sort of database?

Relational databases offer a solution...
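To make the question concrete, here is a small tab-delimited flat file built from the professor and contact example used in this lecture, read with Python's csv module. Because every row repeats the full contact details, the same institution and department are stored more than once, and correcting them means editing every copy; this is the redundancy that relational normalization removes. The delimiter and the reduced column layout are assumptions for illustration.

```python
import csv
import io
from collections import Counter

# A tab-delimited flat file: contact details are repeated on every row
flat_file = io.StringIO(
    "First_name\tLast_name\tInstitution\tDepartment\n"
    "Nancy\tDengler\tUniversity of Toronto\tDept. of Botany\n"
    "Peter\tLewis\tUni. Toronto\tDept. of Biochemistry\n"
    "John\tColeman\tUniversity of Toronto\tDept. of Botany\n"
    "John\tColeman\tYork University\tDept. of Biology\n"
)

rows = list(csv.DictReader(flat_file, delimiter="\t"))

# Count how often each (institution, department) pair is stored
copies = Counter((r["Institution"], r["Department"]) for r in rows)
for contact, n in copies.items():
    if n > 1:
        print(f"{contact} stored {n} times")
```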

Database structures
• Flat files
• Relational
• Object oriented

A relational database consists of relations (tables) containing attributes (fields or columns). Each row in a table is known as a tuple or a record. Information should be 'normalized' so that it is non-redundant; this means that every row should be unique, although this ideal is not always observed.

Table 'Professors'
Professor_id | First_name | Last_name | Contact_id
1 | Nancy | Dengler | 1
2 | Peter | Lewis | 2
3 | John | Coleman | 1
4 | John | Coleman | 3

Table 'Contacts'
Contact_id | Institution | Department | Address
1 | University of Toronto | Dept. of Botany | 25 Willocks St, Toronto, ON. M5S 3B2
2 | Uni. Toronto | Dept. of Biochemistry | 1 King’s College Circle, Toronto, ON. M5S 1A8
3 | York University | Dept. of Biology | 4700 Keele St, Toronto, ON. M3J 1P
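The 'Professors' and 'Contacts' tables can be recreated and joined with Python's built-in sqlite3 module. Contact_id acts as the foreign key linking each professor to one row in 'Contacts', so each contact is stored only once. A minimal sketch using an in-memory database; the column types are assumptions, and the Address column is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("""CREATE TABLE Contacts (
    Contact_id INTEGER PRIMARY KEY,
    Institution TEXT, Department TEXT)""")
cur.execute("""CREATE TABLE Professors (
    Professor_id INTEGER PRIMARY KEY,
    First_name TEXT, Last_name TEXT,
    Contact_id INTEGER REFERENCES Contacts(Contact_id))""")

cur.executemany("INSERT INTO Contacts VALUES (?, ?, ?)", [
    (1, "University of Toronto", "Dept. of Botany"),
    (2, "Uni. Toronto", "Dept. of Biochemistry"),
    (3, "York University", "Dept. of Biology"),
])
cur.executemany("INSERT INTO Professors VALUES (?, ?, ?, ?)", [
    (1, "Nancy", "Dengler", 1),
    (2, "Peter", "Lewis", 2),
    (3, "John", "Coleman", 1),
    (4, "John", "Coleman", 3),
])

# Join each professor to their contact record via the shared Contact_id
results = cur.execute("""
    SELECT p.First_name, p.Last_name, c.Institution
    FROM Professors AS p
    JOIN Contacts AS c ON p.Contact_id = c.Contact_id
    ORDER BY p.Professor_id""").fetchall()

for row in results:
    print(row)
```

Changing the Dept. of Botany details now means updating a single row in 'Contacts'; both professor rows that reference Contact_id 1 see the change.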
Different Database Types
• Primary databases
Contain original biological data. Ex. Raw nucleic acid sequence data from GenBank, the EMBL database and the DNA Data Bank of Japan (DDBJ).
• Secondary databases
Contain computationally processed or manually curated information based on original information from primary databases. Ex. SWISS-PROT, TrEMBL (contains translated nucleic acid sequences), PIR (contains annotated protein sequences).
• Specialized databases
These cater to a particular research interest. Ex. FlyBase, WormBase, AceDB and TAIR
Pitfalls of biological databases
• Overreliance on sequence information without understanding how reliable that information is.
• A high level of redundancy.
• Gene annotations can occasionally be false or incomplete.
Accession codes, identifiers
• Many biological databases (GenBank, UNIPROT, etc.) have two (or more!) different ways of identifying a given entry:
  • Identifier
  • Accession code (or number)
• Identifier
An identifier ("locus" in GenBank, "entry name" in UNIPROT) is a string of letters and digits that is understandable in some meaningful way by a human.

Identifiers are not as stable as accession numbers, mainly because curators modify them if the presumed function of the protein turns out to be something else.

UNIPROT: B5YME7_THAPS
GenBank: XM_002295694

An identifier can change. For example, the database curators may decide that the identifier for an entry is no longer appropriate, although this happens only rarely.
• Accession code (number)

An accession code (or number) is a number (with a few characters in front) that uniquely identifies an entry. It is often assigned arbitrarily. For example, the accession code for B5YME7_THAPS in UNIPROT is B5YME7.

In the case of GenBank, an example accession code is XM_002295694.
Versions and GenInfo Identifiers (GI)
In 1992, NCBI began assigning a unique number to each sequence submitted: the GenInfo Identifier (GI) number. The same accession number may be associated with a different GI if a newer or corrected sequence is submitted.

Records typically contain the Accession.Version identifier, such as XM_002295694.1, in the VERSION line of the record. This identifier is mapped to its unique corresponding GI number, which is the "primary key" of GenBank.
To specify a sequence exactly in GenBank, use either its GI or its Accession.Version. To retrieve the most up-to-date sequence, use the accession number without a version.
Since 2016, NCBI has been phasing out GI numbers for sequence records, so Accession.Version is now the preferred way to identify an exact sequence.
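A small sketch of the distinction above: it splits an Accession.Version string and builds an NCBI E-utilities efetch URL that either pins an exact version or, when the version is omitted, asks for the current version of the accession. The URL is only constructed, not requested; db=nuccore, rettype=gb and retmode=text are standard efetch parameters, but the helper names are hypothetical and the current E-utilities documentation should be checked before relying on this.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def split_accession_version(identifier):
    """Split 'XM_002295694.1' into ('XM_002295694', 1); version is None if absent."""
    accession, sep, version = identifier.partition(".")
    return accession, (int(version) if sep else None)

def efetch_url(accession, version=None):
    """Build a GenBank-format efetch URL.

    With a version, the exact sequence is requested; without one,
    NCBI returns the latest version of that accession.
    """
    seq_id = f"{accession}.{version}" if version is not None else accession
    params = {"db": "nuccore", "id": seq_id, "rettype": "gb", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"

acc, ver = split_accession_version("XM_002295694.1")
print(acc, ver)              # XM_002295694 1
print(efetch_url(acc, ver))  # pinned to version 1
print(efetch_url(acc))       # latest version
```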

GenBank Flatfile Format (GBFF)

• The GenBank flatfile format (GBFF) describes a nucleotide sequence record, for example of a specific gene. It contains all of the information associated with the sequence, as well as the sequence itself.
The GBFF has 3 parts: the header, the features, and the sequence itself.

[Figure: annotated LOCUS line labelling the identifier, length, source type, taxonomic group and NCBI entry date]
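The fields labelled on the LOCUS line can be pulled apart with a whitespace split. A minimal sketch, assuming a modern nucleotide LOCUS line with all fields present (name, length, "bp", molecule type, topology, division code for the taxonomic group, and date); the example line is hypothetical, and production code would normally use an established parser such as Biopython's SeqIO.

```python
def parse_locus(line):
    """Parse a GenBank LOCUS line into a dict of named fields."""
    parts = line.split()
    # Expected: LOCUS name length bp mol_type topology division date
    if len(parts) != 8 or parts[0] != "LOCUS" or parts[3] != "bp":
        raise ValueError(f"unexpected LOCUS line: {line!r}")
    return {
        "name": parts[1],
        "length": int(parts[2]),
        "molecule_type": parts[4],
        "topology": parts[5],
        "division": parts[6],   # division code, e.g. PLN for plants
        "date": parts[7],
    }

# Hypothetical record header line
locus = parse_locus(
    "LOCUS       MYSEQ001                1200 bp    mRNA    linear   PLN 08-JAN-2009"
)
print(locus["name"], locus["length"], locus["division"])  # MYSEQ001 1200 PLN
```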
GenBank flatfile format - Header

DEFINITION: The biology of the molecule in a sentence.
ACCESSION: Accession code(s)
VERSION: Accession.Version number; GI number
KEYWORDS: Keywords as defined by the submitters
SOURCE: Contains the organism name.
ORGANISM: Contains complete taxonomic information from the NCBI taxonomy server.
REFERENCE: Details of a publication about the sequence.
COMMENT: Contains miscellaneous information and revision details.
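In the flatfile, header keywords occupy the first 12 columns (sub-keywords such as ORGANISM are indented by two spaces) and values start in column 13, with long values continuing on following lines. A minimal sketch that collects these into a dict, assuming that column layout and stopping at the FEATURES line; repeated keywords (such as several REFERENCE blocks) would overwrite each other here, and the sample record is hypothetical.

```python
def parse_header(text):
    """Collect header keywords (columns 1-12) and their values (column 13 on)."""
    header = {}
    current = None
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            break                      # the header ends where the feature table begins
        key = line[:12].strip()
        value = line[12:].strip()
        if key:
            header[key] = value        # new keyword or sub-keyword (e.g. ORGANISM)
            current = key
        elif current is not None and value:
            header[current] += " " + value   # continuation line
    return header

# Hypothetical record, padded to the 12-column keyword layout
sample = "\n".join([
    f"{'DEFINITION':<12}Hypothetical protein mRNA, complete",
    f"{'':<12}cds.",
    f"{'ACCESSION':<12}MYSEQ001",
    f"{'VERSION':<12}MYSEQ001.1",
    f"{'SOURCE':<12}Example organism",
    f"{'  ORGANISM':<12}Example organism",
    f"{'':<12}Eukaryota.",
    f"{'FEATURES':<21}Location/Qualifiers",
])
hdr = parse_header(sample)
print(hdr["DEFINITION"])  # Hypothetical protein mRNA, complete cds.
```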
GenBank Flatfile Format – Features

A direct representation of the biological information in the record.
The source feature must be present in all GenBank records. It describes where the molecule comes from (e.g. /organism="Homo sapiens") and may also include map, chromosome and tissue type information.
• In some records the CDS (coding sequence) feature is also present.
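In the feature table, each feature key (source, CDS, ...) starts in column 6 with its location in column 22, and qualifiers such as /organism="Homo sapiens" follow on their own lines starting at column 22. A simplified sketch under that layout assumption: it does not handle locations that wrap onto several lines, and it joins continued qualifier values without spaces, which suits /translation but not free-text /note values. The sample features are hypothetical.

```python
def parse_features(text):
    """Parse feature-table lines into a list of {key, location, qualifiers} dicts."""
    features = []
    last_qualifier = None
    for line in text.splitlines():
        key = line[:21].strip()        # feature key lives in columns 6-21
        body = line[21:].strip()       # location or qualifier from column 22
        if key:
            features.append({"key": key, "location": body, "qualifiers": {}})
            last_qualifier = None
        elif body.startswith("/"):
            name, _, value = body[1:].partition("=")
            features[-1]["qualifiers"][name] = value.strip('"')
            last_qualifier = name
        elif last_qualifier is not None:
            # Continuation of a long value such as /translation (joined without spaces)
            features[-1]["qualifiers"][last_qualifier] += body.strip('"')
    return features

pad = " " * 21
sample = "\n".join([
    f"{'':5}{'source':<16}1..1200",
    pad + '/organism="Homo sapiens"',
    pad + '/mol_type="mRNA"',
    f"{'':5}{'CDS':<16}101..163",
    pad + '/product="hypothetical protein"',
    pad + '/translation="MKTAYIAKQR',
    pad + 'QISFVKSHFS"',
])
for feature in parse_features(sample):
    print(feature["key"], feature["location"], feature["qualifiers"])
```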
GenBank Flatfile Format – Sequence
• The last part of the GenBank flatfile record is the sequence itself.
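The sequence section starts after an ORIGIN line, lists the bases in blocks of 10 with the position of the first base at the start of each line, and ends with "//". A minimal sketch that recovers the plain sequence by dropping the position numbers and spaces; the sample record is hypothetical and shorter than the usual 60 bases per line.

```python
def parse_origin(text):
    """Return the sequence found between the ORIGIN line and the '//' terminator."""
    seq_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("ORIGIN"):
            in_sequence = True
        elif line.startswith("//"):
            break
        elif in_sequence:
            # Drop the leading position number, keep the 10-base blocks
            seq_parts.extend(line.split()[1:])
    return "".join(seq_parts).upper()

sample = """ORIGIN
        1 atgaaaacag cctatatcgc gaaacagcgc
       31 cagattagct aa
//"""
print(parse_origin(sample))  # ATGAAAACAGCCTATATCGCGAAACAGCGCCAGATTAGCTAA
```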
Nucleotide Databases – Growth of GenBank
• From http://www.ncbi.nlm.nih.gov/genbank/statistics
Other facilities in NCBI database

Disease details

Gene Details

Gene expression details
