
Lecture 1 and 2 Introduction

The document provides an introduction to bioinformatics, highlighting its integration of biology, computer science, and information technology to solve biological problems. It covers various related fields such as genomics, proteomics, and medical informatics, as well as the history and development of bioinformatics and biological databases. Additionally, it details the types of biological databases, their architecture, and examples of important databases for genomic and protein data.

Uploaded by

Its Zainu

Introduction to Bioinformatics & Biological Databases
Ishtiaq Ahmad
IIB GCU Lahore
What Is Bioinformatics?
• Bioinformatics is the unified discipline formed
from the combination of biology, computer
science, and information technology.
• "The mathematical, statistical and computing
methods that aim to solve biological problems
using DNA and amino acid sequences and
related information." (Fredj Tekaia)
A Molecular Alphabet
• Macromolecules are polymers of monomers
• All monomers belong to the same general class,
but there are several types with distinct and well-
defined characteristics
• Many monomers can be joined to form a single,
large macromolecule; the ordering of monomers
in the macromolecule encodes information, just
like the letters of an alphabet
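The alphabet analogy can be made concrete in a few lines of Python. The sketch below treats each class of macromolecule as a string over a fixed alphabet of one-letter monomer codes and reports which alphabets can spell a given sequence. The names `ALPHABETS` and `classify` are assumptions made for this illustration and do not come from any library.

```python
# Illustrative sketch: macromolecules as strings over a monomer alphabet.
# ALPHABETS and classify are hypothetical names chosen for this example.
ALPHABETS = {
    "DNA": set("ACGT"),                      # four deoxyribonucleotides
    "RNA": set("ACGU"),                      # four ribonucleotides
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),  # twenty standard amino acids
}


def classify(sequence):
    """Return the names of all alphabets that can spell the sequence."""
    letters = set(sequence.upper())
    return [name for name, alphabet in ALPHABETS.items() if letters <= alphabet]
```

A sequence such as "GATTACA" is reported as both DNA and protein, because A, C, G and T are also valid amino acid codes. The letters alone do not always identify the type of molecule, so real sequence records usually state the molecule type explicitly.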
Related Fields:
Computational Biology
• The study and application of computing
methods for biological data
• Primarily concerned with the computation
of data related to evolutionary, population
and theoretical biology aspects.
Related Fields:
Medical Informatics
• The study and application of computing
methods to improve communication,
understanding, and management of
medical data
• Generally concerned with how the data is
manipulated rather than the data itself.
Related Fields:
Cheminformatics
• The study and application of computing
methods, along with chemical and
biological technology, for drug design and
development
Related Fields:
Genomics
• Analysis and comparison of the entire
genome of a single species or of multiple
species
• A genome is the complete set of genetic
material of an organism, including all of its genes
• Genomics existed before any genomes
were completely sequenced, but in a very
primitive state
Related Fields:
Proteomics
• Study of how the genome is expressed in
proteins, and of how these proteins
function and interact
• Concerned with the actual states of
specific cells, rather than the potential
states described by the genome
Related Fields:
Pharmacogenomics
• The application of genomic methods to
identify drug targets
• For example, searching entire genomes
for potential drug receptors, or studying
gene expression patterns in tumors
Related Fields:
Pharmacogenetics
• The use of genomic methods to determine
what causes variations in individual
response to drug treatments
• The goal is to identify drugs that may
only be effective for subsets of patients, or
to tailor drugs for specific individuals or
groups
History of Bioinformatics
• Genetics
• Computers and Computer Science
• Bioinformatics
History of Genetics
• Gregor Mendel
• Chromosomes
• DNA
History of Chromosomes
• Walther Flemming
• August Weismann
• Theodor Boveri
• Walter S. Sutton
• Thomas Hunt Morgan
History of Computers
Computer Timeline
• ~1000BC The abacus
• 1621 The slide rule invented
• 1623 Wilhelm Schickard's mechanical calculator
• 1822 Charles Babbage's Difference Engine
• 1926 First patent for a semiconductor transistor
• 1937 Alan Turing invents the Turing Machine
• 1939 Atanasoff-Berry Computer created at Iowa State
– the world's first electronic digital computer
• 1939 to 1944 Howard Aiken's Harvard Mark I (the IBM ASCC)
• 1940 Konrad Zuse's Z2 uses telephone relays instead of mechanical logical
circuits
• 1943 Colossus - British vacuum tube computer
• 1944 Grace Hopper, Mark I Programmer (Harvard Mark I)
• 1945 Vannevar Bush publishes "As We May Think"
• 1947 First computer "bug" logged (Harvard Mark II)
Computer Timeline (cont.)
• 1948 to 1951 The first commercial computer – UNIVAC
• 1952 G.W.A. Dummer conceives integrated circuits
• 1954 FORTRAN language developed by John Backus (IBM)
• 1955 First disk storage (IBM)
• 1958 First integrated circuit
• 1963 Mouse invented by Douglas Engelbart
• 1964 BASIC (Beginner's All-purpose Symbolic Instruction Code) written at Dartmouth
College by mathematicians John George Kemeny and Thomas Kurtz as a teaching tool for undergraduates
• 1969 UNIX OS developed by Kenneth Thompson
• 1970 First static and dynamic RAMs
• 1971 First microprocessor: the 4004
• 1972 C language created by Dennis Ritchie
• 1975 Microsoft founded by Bill Gates and Paul Allen
• 1976 Apple I microcomputer released, followed by the Apple II in 1977
• 1981 First IBM PC with DOS
• 1985 Microsoft Windows introduced
• 1985 C++ language introduced
• 1993 Pentium processor
• 1993 First PDA
• 1995 Java introduced by James Gosling
• 2000 C# language introduced
Putting it all Together
• Bioinformatics is where findings in genetics and
advances in computing technology meet: computers
help advance genetics.
• Depending on the definition of bioinformatics used, or
the source, the field is between 30 and 55 years
old
– Bioinformatics-like studies were being performed in
the ’60s, long before the field was given a name
• Sometimes called “molecular evolution”
– The term Bioinformatics was first published in 1991
Genomics
• Classic Genomics
• Post Genomic era
– Comparative Genomics
– Functional Genomics
– Structural Genomics
What is Genomics?
• Genome
– complete set of genetic instructions for
making an organism
• Genomics
– any attempt to analyze or compare the entire
genetic complement of a species
– Early genomics was mostly recording genome
sequences
History of Genomics
• 1995
– Haemophilus influenzae genome sequenced (bacterium, 1.8 Mb)
• 1996
– Saccharomyces cerevisiae (baker's yeast, 12.1 Mb)
• 1997
– E. coli (4.7 Mb)
• 2000
– Pseudomonas aeruginosa (6.3 Mb)
– A. thaliana genome (100 Mb)
– D. melanogaster genome (180 Mb)
2001 The Big One
• The Human Genome sequence is
published
– 3 Gb
What next?
• Post Genomic era
– Comparative Genomics
– Functional Genomics
– Structural Genomics
Comparative Genomics
• the management and analysis of the
millions of data points that result from
Genomics
– Sorting out the mess

Comparative genomics involves the management
and analysis of the vast amounts of data that result from
genomics studies, typically by comparing genome
sequences across species.
Functional Genomics
• Other, more direct, large-scale ways of
identifying gene functions and
associations
– for example, yeast two-hybrid methods
Functional genomics aims to directly identify the
functions and associations of genes within a
genome. It involves large-scale methods for
studying gene functions, interactions, and
regulatory mechanisms.
Structural Genomics
• Emphasizes high-throughput, whole-genome
analysis.
– Current state of structural genomics efforts
around the world
– Future plans of these efforts and the possible
benefits of this research
Structural genomics emphasizes high-throughput
analysis of the 3D structures of biomolecules, such
as proteins and nucleic acids, at a genome-wide
scale. It seeks to determine the structures of all the
proteins encoded by an organism's genome.
What Is Proteomics?
• Proteomics is the study of the proteome—
the “PROTEin complement of the
genOME”
• More specifically, "the qualitative and
quantitative comparison of proteomes
under different conditions to further
unravel biological processes"
What Makes Proteomics
Important?
• A cell’s DNA—its genome—describes a
blueprint for the cell’s potential, all the
possible forms that it could conceivably
take. It does not describe the cell’s actual,
current form, in the same way that the
source code of a computer program does
not tell us what input a particular user is
currently giving his copy of that program.
What Makes Proteomics
Important?
• All cells in an organism contain the same DNA.
• This DNA encodes every possible cell type in
that organism—muscle, bone, nerve, skin, etc.
• If we want to know about the type and state of a
particular cell, the DNA does not help us, in the
same way that knowing what language a
computer program was written in tells us nothing
about what the program does.
Biological Databases
• Biological databases are collections of
biological data, organized and annotated in a
form that can be reused for research purposes.
• The data in biological databases can come from
highly sophisticated experimental results,
published literature, or computational analyses
related to taxonomy, phylogeny, genomics,
proteomics, microarray gene expression, etc.
Basic Components of Biological
Database Architecture
Biological database design, development
and management are core areas of
bioinformatics. They require the following:
• Relational database management system
(RDBMS) programs from computer science.
• Information retrieval systems from digital
libraries.
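The two components can be sketched together using Python's built-in `sqlite3` module as a stand-in RDBMS. This is a minimal illustration and not the architecture of any real biological database: the table names, column names, gene symbols and sequences are all made up, and `genes_for` is a hypothetical helper that plays the role of the retrieval layer.

```python
import sqlite3

# Hypothetical two-table relational schema: organisms and their genes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organism (
        organism_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE gene (
        gene_id     INTEGER PRIMARY KEY,
        organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
        symbol      TEXT NOT NULL,
        sequence    TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO organism VALUES (1, 'Saccharomyces cerevisiae')")
conn.executemany(
    "INSERT INTO gene (organism_id, symbol, sequence) VALUES (?, ?, ?)",
    [(1, "geneA", "ATGGCC"), (1, "geneB", "ATGTTT")],  # made-up records
)


def genes_for(organism_name):
    """Retrieval step: look up gene symbols by organism name."""
    rows = conn.execute(
        "SELECT g.symbol FROM gene AS g "
        "JOIN organism AS o ON o.organism_id = g.organism_id "
        "WHERE o.name = ? ORDER BY g.symbol",
        (organism_name,),
    )
    return [symbol for (symbol,) in rows]
```

Storing the organism once and referring to it by key from each gene row is the relational part. The join in `genes_for` is the retrieval part, answering a question that spans both tables.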
Information in Biological Databases

The information contained in different
biological databases may be
• A gene or protein sequence (for example in
SwissProt or GenBank)
• Descriptions in text form.
• Ontological classification
• Citation record
• Tables
Data Formats of Biological
Databases

The majority of them contain semi-structured
data in the form of text descriptions, along with:
• Tabular data.
• Tab- or space-delimited data records.
• XML (Extensible Markup Language) data.
• Cross-references to other databases.
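To show how the same record can appear in two of these formats, here is a small Python sketch using only the standard library. The record, its field names and the cross-reference `DB2:P1` are invented for illustration, and `parse_tab` and `parse_xml` are hypothetical helpers rather than part of any database's toolkit.

```python
import csv
import io
import xml.etree.ElementTree as ET

# One made-up record in two formats; all field names are hypothetical.
TAB_TEXT = "id\tsymbol\tlength\txref\nG1\tgeneA\t6\tDB2:P1\n"
XML_TEXT = (
    "<gene id='G1'><symbol>geneA</symbol><length>6</length>"
    "<xref db='DB2' acc='P1'/></gene>"
)


def parse_tab(text):
    """Parse tab-delimited records with a header row into dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def parse_xml(text):
    """Parse one XML gene element into the same dictionary shape."""
    gene = ET.fromstring(text)
    xref = gene.find("xref")
    return {
        "id": gene.get("id"),
        "symbol": gene.findtext("symbol"),
        "length": gene.findtext("length"),
        "xref": f"{xref.get('db')}:{xref.get('acc')}",  # cross-reference
    }
```

Both parsers produce the same dictionary for this record, so code further along does not need to know which format the data arrived in. The `xref` field is the cross-reference: it names another database and the accession of the matching entry there.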
Primary Sequence Databases
• Genome sequence
- Nucleotide sequence of gene(s)
- DNA and RNA

• Proteome sequence
- Amino acid sequence of proteins
expressed or derived from the gene
sequences
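Deriving an amino acid sequence from a gene sequence can be sketched in Python with the standard genetic code. This is a minimal illustration that assumes the input is a coding sequence already in the correct reading frame and containing only unambiguous bases. `translate` and `CODON_TABLE` are names chosen for this example.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(dna):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    # Ignore any trailing bases that do not form a complete codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":  # stop codon
            break
        protein.append(residue)
    return "".join(protein)
```

For example, `translate("ATGGCCTAA")` reads ATG (M), GCC (A) and the stop codon TAA, giving "MA". Ambiguous bases such as N raise a `KeyError` in this sketch.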
Genome Databases
• Collect, organize, annotate, analyze and
manage the whole genome sequences of
one or more organisms.
Examples: MaizeGDB, a database of the maize (corn) genome;
Ensembl, a database of human, mouse, other
vertebrate and eukaryote genomes
• These databases are publicly accessible
Important Genome Databases
• MaizeGDB: Maize (corn) genome www.maizegdb.org
• ERIC (Enteropathogen Resource Integration Center): Enteropathogen genomes www.ericbrc.org
• NMPDR: National Microbial Pathogen Data Resource www.nmpdr.org
• JGI Genomes (Joint Genome Institute): Eukaryote and microbial genomes
http://genome.jgi.doe.gov/
• MGI (Mouse Genome Informatics): Mouse genome www.informatics.jax.org
• WormBase: C. elegans genome
• FlyBase: Genome of fruit fly
• Saccharomyces Genome Database: Genome of yeast model organism
• Ensembl: Human, mouse, other vertebrates and eukaryotic genome
database www.ensembl.org
• TAIR (The Arabidopsis Information Resource): Arabidopsis genome http://arabidopsis.org
Nucleotide (Gene) Sequence Databases

• DDBJ: DNA Data Bank of Japan
http://www.ddbj.nig.ac.jp/Welcome-e.html
• EMBL Nucleotide DB: European Molecular Biology
Laboratory http://www.ebi.ac.uk/embl/index.html
• GenBank: National Center for Biotechnology Information (NCBI)
www.ncbi.nlm.nih.gov/genbank/
Protein Sequence Databases
Protein sequence databases store each
protein's sequence together with annotations
that give general and specific details
about the protein's properties and
features.
List of Protein Sequence Databases
• UniProt: http://www.ebi.ac.uk/, http://expasy.org
• PIR: http://www-nbrf.georgetown.edu/pir/searchdb.html
• SwissProt: http://expasy.org
• PROSITE: Database of Protein Families and Domains
www.expasy.org/prosite
• DIP: Database of Interacting Proteins sequences and
structures http://dip.doe-mbi.ucla.edu/
• Pfam: Protein families database of alignments and
HMMs http://www.sanger.ac.uk/Software/Pfam
• ProDom: Comprehensive set of Protein Domain Families
http://protein.toulouse.inra.fr/prodom/current/html/home.php
Protein Structure Databases
• Protein Data Bank (PDB) www.rcsb.org
• CATH (Class, Architecture, Topology,
Homologous super-family): Protein structure
classification www.cathdb.info
• SCOP: Structural Classification of Proteins
http://scop.mrc-lmb.cam.ac.uk/scop/
• PDBe: www.ebi.ac.uk/pdbe/
• SWISS-MODEL: A server and collection of
protein structures from PDB acting as templates
http://swissmodel.expasy.org//SWISS-MODEL.html
• ModBase: A database of comparative structure
Models of proteins http://salilab.org/modbase
Protein-Protein Interaction Databases

• STRING: A database of experimental &
predicted protein-protein interactions
http://string.embl.de/
• DIP: Database of Interacting Proteins
sequences and structures http://dip.doe-mbi.ucla.edu/
• BIND: Biomolecular Interaction Network
Database www.bind.ca
Metabolic Pathway Databases
• BioCyc: A collection of 3563
Pathway/Genome Databases
with tools for understanding their
data http://biocyc.org/
• KEGG: Kyoto Encyclopedia of Genes and Genomes
• MANET Molecular Ancestry Networks
• Reactome
Microarray-Gene Expression Databases

• ArrayExpress (EBI)
• Gene Expression Omnibus (NCBI)
• maxd (Univ. of Manchester)
• SMD (Stanford University)
• GPX (Scottish Centre for Genomic
Technology and Informatics)
Mathematical Model Databases

• CellML: http://www.cellml.org/models

• Biomodels: http://www.ebi.ac.uk/biomodels/
PCR Primer Databases

• PathoOligoDB: A free QPCR oligo
database for pathogens
Meta-Databases
A meta-database is a source or platform
that hosts several other databases and
presents their data in a new, simpler and
unified form, or that gathers the information
about a particular gene or protein together
with, for example, its implication in a
specific disease.
Entrez is one example of a meta-database.
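The idea of a unified view over several sources can be sketched in Python as a toy example. It does not model Entrez or any real service: the three source dictionaries, their field names and their contents are invented, and `unified_record` is a hypothetical helper.

```python
# Hypothetical source databases, each keyed by gene symbol (made-up records).
SEQUENCE_DB = {"geneA": {"sequence": "ATGGCC"}}
STRUCTURE_DB = {"geneA": {"structure_id": "S001"}}
DISEASE_DB = {
    "geneA": {"disease": "example disease"},
    "geneB": {"disease": "other example"},
}

SOURCES = {
    "sequence_db": SEQUENCE_DB,
    "structure_db": STRUCTURE_DB,
    "disease_db": DISEASE_DB,
}


def unified_record(symbol):
    """Merge every source's entry for one gene into a single flat record."""
    record = {"symbol": symbol, "sources": []}
    for source_name, database in SOURCES.items():
        entry = database.get(symbol)
        if entry is not None:
            record.update(entry)                   # copy the source's fields
            record["sources"].append(source_name)  # remember where they came from
    return record
```

A single lookup by gene symbol returns the sequence, structure and disease fields together, with a `sources` list recording which databases contributed. A real meta-database must also reconcile identifiers that differ between sources, which this sketch sidesteps by using one shared key.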
Major Meta Databases
• Entrez
• euGenes
• GeneCards
• SOURCE
• mGen
• Bioinformatic Harvester
• MetaBase
Questions and Answers
