
Bioinformatics 1

The document outlines an introductory bioinformatics course led by Amir Mitchell, detailing the course layout, grading system, and key topics covered over eleven lessons. It emphasizes the integration of computer science and biology in analyzing genetic data, utilizing databases and tools like GCG for sequence analysis. Additionally, it highlights significant milestones in bioinformatics and resources such as the NCBI for further research and data access.

Uploaded by HuongPham
Copyright © All Rights Reserved

Introduction to Bioinformatics

Questions & Help


• Amir Mitchell – lecturer.

• Itay Mayros, Einat Hazkani-Covo, and Shira Mintz – teaching assistants.

• Emails:
mitchel@post.tau.ac.il, itaymay@post.tau.ac.il,
einat@kimura.tau.ac.il, mintzshi@post.tau.ac.il

• Course site

Course Layout
• Eleven lessons – eleven weeks.
• Lecture, exercise, discussion.
• Presentations and exercises.
• Books and additional material.
• Missing lessons or exercises.
• Consultation hour.
• Personal gene/protein.
Final grade
• Final exam (80%):
– Multiple-choice questions
– Open questions
– No online part

• Home assignment (20%)

Bioinformatics
• A buzzword, like nanotechnology, biotechnology …

Bioinformatics: Bioinformatics is the branch of computer science that focuses on sub-domains of biology: research on genes and proteins. Researchers in this field must use powerful computers and special calculation methods to process the large body of complex data generated by genetics. Using these tools, it was possible to sequence the human genome.
Lexicon-encyclobio

Two separate approaches
• Computer science – inventing tools, developing algorithms.

• Biology – utilizing tools for biological research.
1. Purely bioinformatics (comparing exon/intron structure in human and mouse).
2. “Fairly” bioinformatics (locating the active site of an enzyme by identifying conserved residues in the protein sequence).

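The second example rests on a simple idea: positions that stay identical across related sequences are candidates for functional residues such as an active site. A minimal Python sketch, assuming the sequences are already aligned to equal length with '-' marking gaps; the function name and the toy fragments are hypothetical and are not real enzyme sequences:

```python
def conserved_columns(aligned: list[str]) -> list[int]:
    """Return the 0-based columns where every sequence carries the same residue (no gaps)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must be aligned to the same length")
    conserved = []
    for col in range(length):
        residues = {seq[col] for seq in aligned}
        # Fully conserved: exactly one distinct symbol, and that symbol is not a gap.
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved


# Hypothetical toy alignment of three short protein fragments.
toy_alignment = ["MKHSDE", "MRHSNE", "MKHT-E"]
print(conserved_columns(toy_alignment))  # [0, 2, 5]
```

Requiring strict identity is the crudest criterion; tools built for this purpose also credit columns where residues are merely similar (for example, different hydrophobic amino acids).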
Research outline
Databases (public, local) → Retrieve data → Analysis → Results → Lab (wet biology), Literature

Databases & Tools
• Free shared databases (online, bioinfo unit)

• Internet-based tools (PC)

• GCG package tools (Unix)

GCG
• Commercial DNA and protein sequence analysis package.

• Written by the Genetics Computer Group (GCG), University of Wisconsin.

• Includes more than 130 separate tools.

GCG
• GCG runs in a Unix environment (operating system).

• The same principles apply to all GCG programs.

• Online help.

Divided work
PC (1)          Unix (2)                      Web
-               Databases (main ones only)    Databases (all)
Data storage    Data storage                  -
Tools           Tools                         Tools

(1) Access (Unix and web).
(2) Advanced analysis, user databases, web site.

Lesson 1 – Introduction, Unix environment
1. Administration.
2. Introduction to bioinformatics.
3. NCBI.
4. Working in a Unix environment.

Lesson 2 – Databases and text-based searching
1. Databases: organization and entries.
2. Database problems.
3. Principles of database searching.
4. Unix and GCG.

Lesson 3 – Pairwise alignment
1. Comparing two sequences.
2. Scoring: good and bad alignments.
3. Comparison methods.
4. Comparison programs.
5. Unix.

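Scoring and comparison methods can be illustrated with the global alignment algorithm of Needleman and Wunsch (the 1970 milestone in the timeline). A minimal Python sketch; the scoring scheme (+1 match, -1 mismatch, -2 per gap position) is an arbitrary assumption chosen for readability, whereas real programs use substitution matrices such as BLOSUM or PAM and usually affine gap penalties:

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning the prefixes a[:i] and b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("GATTACA", "GATACA"))
```

Aligning GATTACA with GATACA needs at least one gap, so the best score under this scheme is 6 matches minus one gap penalty, i.e. 4. Several alignments can reach that score; the traceback returns just one of them. A local alignment (Smith-Waterman, the 1981 milestone) differs mainly in clamping cell scores at zero and tracing back from the highest-scoring cell.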
Lesson 4 – Sequence-based searching
1. DNA or protein sequences as search queries.
2. Problems with sequence searching.
3. Methods for searching (FASTA, BLAST).

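FASTA and BLAST both gain their speed from a heuristic first step: instead of fully aligning the query against every database sequence, they look for short shared words and only examine the regions around those hits. A minimal sketch of the exact-word-match step only, with no extension or significance statistics; the word length k = 3, the helper names, and the toy sequences are illustrative assumptions (BLAST on proteins, for instance, also accepts similar rather than only identical words):

```python
from collections import defaultdict


def build_word_index(sequence: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word in `sequence` to the 0-based positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        index[sequence[pos:pos + k]].append(pos)
    return dict(index)


def word_hits(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where both sequences share an identical k-word."""
    index = build_word_index(subject, k)
    hits = []
    for qpos in range(len(query) - k + 1):
        for spos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, spos))
    return hits


def diagonal_counts(hits: list[tuple[int, int]]) -> dict[int, int]:
    """Count hits per diagonal (subject_pos - query_pos); many hits on one diagonal suggest similarity."""
    counts = defaultdict(int)
    for qpos, spos in hits:
        counts[spos - qpos] += 1
    return dict(counts)


hits = word_hits("ACGT", "TTACGT")
print(hits)                   # [(0, 2), (1, 3)]
print(diagonal_counts(hits))  # {2: 2}
```

Indexing the subject once and then scanning the query keeps the lookup linear in the query length; a real search program indexes the query or the database in advance for the same reason.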
Lesson 5 – Multiple sequence alignment
1. Comparing multiple sequences.
2. Uses of multiple alignment.
3. Methods for multiple alignment: efficiency and limitations.
4. Profiles and consensus sequences.

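A profile records, for each column of a multiple alignment, how often each residue occurs there; a consensus sequence keeps only the most frequent residue per column. A minimal sketch, assuming an existing alignment with '-' for gaps and a simple majority rule in which ties go to the alphabetically first residue (an arbitrary choice made here); real profile methods also weight sequences and add pseudocounts:

```python
from collections import Counter


def profile(aligned: list[str]) -> list[dict[str, float]]:
    """Per-column residue frequencies for equal-length aligned sequences; gaps ('-') are not counted."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("need at least one sequence, all aligned to the same length")
    columns = []
    for column in zip(*aligned):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        columns.append({res: n / total for res, n in counts.items()} if total else {})
    return columns


def consensus(aligned: list[str]) -> str:
    """Most frequent residue per column; ties go to the alphabetically first residue, all-gap columns give '-'."""
    out = []
    for freqs in profile(aligned):
        if not freqs:
            out.append("-")
            continue
        best = max(freqs.values())
        out.append(min(res for res, f in freqs.items() if f == best))
    return "".join(out)


alignment = ["ACGT", "ACGA", "TCG-"]
print(consensus(alignment))  # ACGA
```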
Lesson 6 – Phylogenies
1. Introduction to phylogeny.
2. Methods for constructing evolutionary trees.
3. Statistical analysis of constructed trees.

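Distance-based tree-building methods (UPGMA and neighbor joining, for example) start from a matrix of pairwise distances between aligned sequences. A minimal sketch of that first step for DNA: the p-distance (fraction of differing sites) and the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3), which compensates for multiple substitutions at the same site. Skipping gapped columns is one of several possible conventions and is an assumption here, as are the helper names:

```python
import math


def p_distance(s1: str, s2: str) -> float:
    """Fraction of differing sites between two aligned DNA sequences, ignoring gapped columns."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined under the Jukes-Cantor model")
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Jukes-Cantor distances for every ordered pair of named, aligned sequences."""
    names = sorted(seqs)
    return {(x, y): jukes_cantor(p_distance(seqs[x], seqs[y])) for x in names for y in names}


p = p_distance("ACGTACGTAC", "ACGTACGTAA")
print(p, jukes_cantor(p))
```

For one difference in ten sites, p = 0.1 while the corrected distance is about 0.107: the correction grows as sequences diverge, because more of the differences are likely to hide repeated changes at the same site.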
Lesson 7 – Protein families, secondary databases
1. Dividing proteins into families.
2. Patterns.
3. Different approaches: motifs, fingerprints.
4. Different databases.
5. ConSurf.

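Sequence patterns are often written in the PROSITE notation, for example N-{P}-[ST]-{P} for a potential N-glycosylation site. A minimal sketch that translates the basic notation (x for any residue, [..] for allowed residues, {..} for forbidden ones, repetition counts such as x(2,4), and the < and > terminal anchors) into a Python regular expression. It deliberately ignores rarer features such as '>' inside brackets, so treat it as an illustration rather than a complete parser; the function name is hypothetical:

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate basic PROSITE pattern syntax into a Python regular-expression string."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        at_start = element.startswith("<")  # pattern anchored at the N-terminus
        at_end = element.endswith(">")      # pattern anchored at the C-terminus
        element = element.strip("<>")
        # Separate the residue specification from an optional repetition such as (3) or (2,4).
        parts = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if parts is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        core, low, high = parts.groups()
        if core == "x":
            piece = "."
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"
        else:
            piece = core  # a single residue or a [..] class means the same thing in a regex
        if low:
            piece += "{" + low + ("," + high if high else "") + "}"
        pieces.append(("^" if at_start else "") + piece + ("$" if at_end else ""))
    return "".join(pieces)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)                                    # N[^P][ST][^P]
print(re.search(regex, "MKNGSAW").group())      # NGSA
```

Note that `re.finditer` reports non-overlapping matches only, whereas pattern-scanning tools may report overlapping sites.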
Lesson 8 – DNA sequence analysis
1. Gene structure.
2. Gene finding.
3. Predicting gene features.
4. ConSurf.

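The simplest signal a gene-finding program can use is an open reading frame (ORF): a run of codons from a start codon (ATG) to the next in-frame stop codon (TAA, TAG or TGA in the standard genetic code). A minimal sketch that scans the three forward reading frames only (the reverse strand is not searched); the 30-codon minimum is an arbitrary threshold, and real gene finders, especially for eukaryotic genes with introns, rely on statistical models rather than ORF scanning alone:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(dna: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    `start` is the 0-based index of the A in ATG and `end` is the index just past the stop codon.
    Only the first ATG before each stop is reported; later in-frame ATGs are nested and skipped.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:  # length in codons, stop included
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


seq = "CCATGAAATTTTAAGG"
print(find_orfs(seq, min_codons=4))  # [(2, 2, 14)] -> seq[2:14] == "ATGAAATTTTAA"
```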
Lesson 9 – Genomes
• Genome features.
• Prokaryotic and eukaryotic genomes.
• Genome viewers.
• Model organisms.

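One of the simplest genome-wide features is base composition. A minimal sketch computing GC content for a whole sequence and for consecutive non-overlapping windows, the kind of track a genome viewer can display along a chromosome. The window size is an arbitrary choice, and excluding ambiguous bases such as N from the denominator is one possible convention, assumed here:

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C among unambiguous bases (A, C, G, T); other characters are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases in sequence")
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq: str, window: int) -> list[float]:
    """GC content of consecutive non-overlapping windows; a shorter final window is included."""
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq), window)]


print(gc_content("ATGCNN"))            # 0.5
print(windowed_gc("GGGGAAAAGC", 4))    # [1.0, 0.0, 1.0]
```

A window made up entirely of N characters raises an error in this sketch; a real track would typically report such a window as missing data instead.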
Lesson 10 – Various tools
• Making things easy: useful tools for lab work.

Lesson 11 – Summary
• Overview, Q&A before the exam.

Last comments
• Introduction only.

• Finding sites: links and Google.

• Biology background.

• Unix accounts.

• Terminology.

Milestones in bioinformatics
1965 Theory of molecular evolution (Zuckerkandl & Pauling)
1967 Atlas of protein sequences (Dayhoff)
1970 Global alignment algorithm (Needleman & Wunsch)
1981 Local alignment algorithm (Smith & Waterman)
1981 Sequence motif concept (Doolittle)
1982 GenBank made public
1982 Phage lambda genome fully sequenced
1983 Database search algorithm (Wilbur & Lipman)
1985 Fast sequence similarity searching (FASTP; Lipman & Pearson)
1990 BLAST (Altschul et al.)
1991 ESTs
* 1953 Watson and Crick (double-helix structure of DNA)

Milestones in bioinformatics
1995 First bacterial genome fully sequenced (H. influenzae)
1996 Yeast genome fully sequenced
1998 C. elegans genome fully sequenced
2000 Fruit fly genome fully sequenced
2000 Human genome sequenced (working draft)

Today …
• Over 1500 fully sequenced genomes from all domains of life.

• Numerous databases.

• Numerous tools.

Today …

Fully sequenced genomes: Archaea (16), Eukarya (20), Bacteria (139), Viruses (1500).

Examples
• Human, mouse, rat, zebrafish, Drosophila, yeast, Anopheles, tomato, rice, wheat.

• E. coli (4 strains), M. tuberculosis, M. leprae.

• Mitochondria, chloroplasts, plasmids.

Public interest:
Human Genome Project
• 2000 – Working draft of the genome, the work of 20 groups worldwide
(http://www.ncbi.nlm.nih.gov).
• 2003 – Obtain a complete, high-quality genomic sequence.
• Determine the sequence of the 3 billion bases.
• Identify all the estimated 30,000 genes in human DNA.

Human Genome Project

Chromosome 21: 9 May 2000

Chromosome 22: 2 Dec 1999

Initial analysis: 15 Feb 2001

NCBI – at a glance
• The largest and most comprehensive bioinformatics site.
• Includes numerous tools and databases.

NCBI - overview
NCBI at the center, linked to: PubMed, OMIM, Books, Expression profiles, Structure, Nucleotides, Domains, Proteins, Taxonomy, Genomes.
* Cross references between the databases
NCBI
PubMed
• Citations, abstracts, full articles.

Books
• Online books, full text from books (Cell, Introduction to Genetic Analysis).

NCBI
OMIM
• Online Mendelian Inheritance in Man: a comprehensive database of human genes and genetic disorders.

• Entries include textual information and, most importantly, references to literature and sequences.

NCBI
GEO
• Gene Expression Omnibus: results of high-throughput experiments (mRNA, DNA, and protein arrays).

NCBI
Genomes, Nucleotides, Proteins
• Sequence databases, divided into sections and sub-sections.

Domains
• Protein domains, both conserved sequence domains and 3D domains.

NCBI
Structure
• 3D structures of proteins (~20,000 entries).

Taxonomy
• Taxonomy of all organisms found in NCBI.

NCBI - Interconnectivity
PubMed, OMIM, Books, Expression profiles, Structure, Nucleotides, Domains, Proteins, Taxonomy and Genomes, all cross-linked through NCBI.
* Cross references between the databases
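Programmatically, these cross references can be followed through NCBI's Entrez E-utilities web service (not covered on the slides): ESearch returns the record IDs matching a text query, and ELink returns the IDs of records in another database that are linked to given IDs. A minimal sketch that only builds the request URLs with the standard library; the search term and the record ID are placeholders, the helper names are hypothetical, and NCBI's usage guidelines (request-rate limits, optional API key) apply if the requests are actually sent:

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL for an ESearch query that returns matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def elink_url(dbfrom: str, db: str, ids: list[str]) -> str:
    """URL for an ELink request: records in `db` linked to the given IDs from `dbfrom`."""
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(ids)}
    return EUTILS_BASE + "elink.fcgi?" + urlencode(params)


print(esearch_url("pubmed", "hemoglobin[Title]"))
print(elink_url("pubmed", "protein", ["12345"]))  # placeholder PubMed ID
```

The URLs can be fetched with `urllib.request.urlopen` or opened in a browser; keeping URL construction separate from fetching makes the sketch easy to check offline.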
