Introduction to Bioinformatics
Objectives
– define the term bioinformatics;
– explain the scope of bioinformatics;
– To organize vast reams of molecular biology data
in an efficient manner;
– To develop tools that help in the analysis of such
data;
– To interpret the results accurately and
meaningfully.
– describe web-based versus command-line
approaches to bioinformatics.
• Left: The connectivity of the internet (from the Wikipedia entry for
“internet”)
• Right: A map of human protein interactions (from the Wikipedia entry for
“Protein–protein interaction”).
• We seek to understand biological principles on a genome-wide scale using
the tools of bioinformatics.
What is Bioinformatics?
• Bioinformatics
– the study of how information is represented and
transmitted in biological systems
– “research, development, or application of
computational tools and approaches for expanding
the use of biological, medical, behavioral, or health
data, including those to acquire, store, organize,
analyze, or visualize such data.”
– Bioinformatics is used extensively in human genome research, notably by
the Human Genome Project, which determined the sequence of the entire
human genome (about 3 billion base pairs).
– It is essential in using genomic information to understand diseases.
– It is also widely used to identify new molecular targets for drug
discovery.
– “Bioinformatics is the branch of biology that is
concerned with the acquisition, storage, display, and
analysis of the information found in nucleic acid and
protein sequence data.”
Altman, R.B. 1998. Bioinformatics in support of molecular medicine. Proceedings of AMIA Symposium 1998, 53–61. PMID: 9929182.
Altman, R.B., Dugan, J.M. 2003. Defining bioinformatics and structural bioinformatics. Methods of Biochemical Analysis 44, 3–14. PMID: 12647379.
• One perspective of bioinformatics is the organism.
• Broadening our view from the level of the cell to the organism, we can
consider the individual’s genome (collection of genes), including the
genes that are expressed as RNA transcripts and the protein products.
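The path from a gene to its RNA transcript and protein product can be sketched in a few lines of Python. This is only an illustration: the sequence `toy_gene` is made up, the function names are hypothetical, and the codon table is deliberately partial, covering just the codons used here.

```python
# Toy illustration of gene -> RNA transcript -> protein product.
# Assumption: CODON_TABLE is deliberately partial and only covers the
# codons that appear in the made-up sequence below.
CODON_TABLE = {
    "AUG": "M",  # methionine (start codon)
    "GUG": "V",  # valine
    "CAC": "H",  # histidine
    "CUG": "L",  # leucine
    "UAA": "*",  # stop codon
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA that matches a coding-strand DNA sequence (T -> U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


toy_gene = "ATGGTGCACCTGTAA"  # made-up example sequence
mrna = transcribe(toy_gene)
protein = translate(mrna)
print(mrna)     # AUGGUGCACCUGUAA
print(protein)  # MVHL
```

A full analysis would use the complete 64-codon genetic code, for example through a library such as Biopython.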
• For an individual organism, bioinformatics
tools can be applied to describe changes
through developmental time, changes across
body regions, and changes in a variety of
physiological or pathological states.
DNA: deoxyribonucleic acid
• Bioinformatics refers to computational
bioinformatics.
• Bioinformatics is a science that involves
collecting, manipulating, analyzing, and
transmitting huge quantities of data.
• It is an interdisciplinary field that develops
methods and software tools for understanding
biological data.
• It combines computer science, statistics,
mathematics, and engineering to analyze and
interpret biological data.
• Techniques used include
– pattern recognition, data mining, machine learning
algorithms, and visualization
• Analyzing biological data to produce
meaningful information
• involves writing and running software
programs that use algorithms from
• Bioinformatics derives knowledge from computer
analysis of biological data.
• These data can consist of the information stored in the genetic code,
experimental results from various sources, patient statistics, and
scientific literature.
• Research in bioinformatics includes method
development for storage, retrieval, and analysis of the data.
• Bioinformatics draws on:
– informatics,
– statistics,
– mathematics,
– chemistry,
– biochemistry,
– physics
Why is Bioinformatics Important?
• Application areas include
– Medicine
– Pharmaceutical drug design
• Toxicology
• Does it kill the patient?
• Does it have side effects?
• Does it get to the problem spots?
– Molecular evolution
– Biosensors
– Biomaterials
– DNA computing
• Bioinformatics is applied in areas such as gene sequencing, gene
expression studies, and drug discovery.
• For example, in medicine, bioinformatics can
be used to identify links between specific
diseases and the gene sequences that cause
them.
• The field of pharmacogenomics uses
bioinformatics data to tailor medical
treatments to individual patients, based on their DNA.
• It can also be used to develop more effective
vaccines through the development of new,
• Structural bioinformatics/genomics
– is the branch of bioinformatics concerned with the analysis and
prediction of the three-dimensional structure of biological
macromolecules such as proteins, RNA, and DNA. It deals with
generalizations about macromolecular 3D structure.
• Functional genomics
– attempts to answer questions about the function of DNA at the levels
of genes, RNA transcripts, and protein products.
Bioinformatics Software: Two Cultures
• Web-based or graphical user interface (GUI)
– Central resources (NCBI, EBI)
– Genome browsers (UCSC, Ensembl)
– GUI software (Partek, MEGA, RStudio, BioMart, IGV)
– Galaxy (web access to NGS tools, browser data)
• Command line (often Linux)
– Biopython, Python, BioPerl, R: manipulate data files
– Data analysis software: sequences, proteins, genomes
– Next generation sequencing tools
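To make the contrast concrete, the sketch below shows how a command-line user might address a central resource such as NCBI from Python instead of a web browser. It assumes NCBI's public E-utilities `efetch` endpoint; the helper name `build_efetch_url` is hypothetical, and the accession `NM_000518` is used only as an example identifier. The download itself is left commented out because it needs network access.

```python
from urllib.parse import urlencode

# Assumption: NCBI's public E-utilities efetch endpoint.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, db: str = "nucleotide") -> str:
    """Build a URL requesting one sequence record in FASTA text format."""
    params = {
        "db": db,            # which NCBI database to query
        "id": accession,     # record identifier (accession number)
        "rettype": "fasta",  # ask for FASTA-formatted output
        "retmode": "text",   # plain text rather than XML
    }
    return EUTILS_EFETCH + "?" + urlencode(params)


url = build_efetch_url("NM_000518")  # example accession only
print(url)

# To actually download the record (requires network access):
# from urllib.request import urlopen
# with urlopen(url) as response:
#     fasta_text = response.read().decode()
```

The same record can be reached by typing the accession into the NCBI website; the script form is easier to repeat for hundreds of records.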
Tool makers and tool users across informatics disciplines
• Many informatics disciplines have emerged in
recent years.
• Bioinformatics is distinguished by its particular
focus on DNA and proteins (impacting its
databases, its tools, and its entire culture).
• Bioinformatics Databases
• Meta databases are databases of databases: they collect data about data to generate new data.
• They are capable of merging information from different sources and making it available in a new and more
convenient form (or with an emphasis on a particular disease or organism).
• A biological database is a large, organized body of persistent data, usually associated with computerized
software designed to update, query, and retrieve components of the data stored within the system.
• A simple database might be a single file containing many records, each of which includes the same set of
information.
• A few popular databases are GenBank,
SwissProt, and EMBL.
• GenBank: GenBank (Genetic Sequence
Databank) is one of the fastest growing
repositories of known genetic sequences.
• EMBL: The EMBL Nucleotide Sequence
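The idea of a simple database as a single file of uniform records can be sketched with a small flat-file reader. The example assumes the FASTA text format (a `>` header line followed by sequence lines), which is a choice made here for illustration; the function name `read_fasta` and the two toy records are hypothetical.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted file object."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Toy flat file with two records; every record has the same fields
# (a header and a sequence). StringIO stands in for a file on disk.
flat_file = StringIO(
    ">seq1 toy record\nATGGTG\nCACCTG\n"
    ">seq2 toy record\nGGGCCC\n"
)
records = dict(read_fasta(flat_file))

# "Query" the database by header.
print(records["seq2 toy record"])  # GGGCCC
```

Production databases such as GenBank add indexing, richer annotation, and software for updating and querying, but the record-per-entry idea is the same.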
• ISCB: http://www.iscb.org/
• NCBI: http://ncbi.nlm.nih.gov/
• http://www.bioinformatics.org/