Lec 2 PDF

This document provides an overview of Biopython, a widely used Python package for bioinformatics. It discusses why Python is well-suited for bioinformatics applications due to its cross-platform use, built-in features, dynamic and modular nature. The document then describes several popular Python tools for bioinformatics including Biopython, PyMOL, Scikit-learn, and NumPy. Biopython is highlighted as an open-source collection of Python modules for biological computations that can work with DNA, RNA, protein sequences and structures. The document also discusses common data types used as inputs for Biopython.

Uploaded by

ziadmohamad3412

BioPython

Edited by
Python Programming in Bioinformatics
Why Python?
• Python can be installed and used on different platforms, including Windows, macOS, and Linux.
• Python has several built-in features (for example, string handling and file input/output) that make it well-suited for bioinformatics applications.
• Python's dynamic and modular nature allows researchers to reuse and share code, reducing development time and increasing productivity.
• Python has a relatively simple syntax, making it easy to learn and use.
• Python is a high-level language that offers advanced data structures and functions that make it easy to work with complex biological data.
Tools for Python Programming in Bioinformatics

1. Biopython
• Biopython is one of the most widely used bioinformatics packages for Python. It is an open-source collection of Python modules that provides powerful, easy-to-use tools for biological computation.
• Biopython lets you carry out common tasks with very little code. Some of the tasks it supports are:
  - Working with DNA, RNA, and protein sequences, including sequence alignment, motif and pattern matching, and translation of nucleotide sequences into protein sequences.
  - Working with protein structures, such as parsing and manipulating PDB files and performing structure comparisons.
  - Reading file formats commonly used in bioinformatics, such as FASTA and GenBank, and parsing BLAST output.
  - Visualizing biological data, such as sequence alignment plots and phylogenetic trees.
  - BioSQL: a standard set of SQL tables for storing sequences plus features and annotations.
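As a minimal sketch of the sequence tools listed above, the snippet below translates a short DNA sequence and computes its reverse complement with Biopython's Seq class. It assumes Biopython is installed; the helper name demo_seq and the example sequence are made up for illustration, and the helper returns None when the package is missing.

```python
def demo_seq(dna="ATGGCC"):
    """Return (protein, reverse_complement), or None if Biopython is absent."""
    try:
        from Bio.Seq import Seq   # provided by the biopython package
    except ImportError:
        return None
    seq = Seq(dna)
    protein = str(seq.translate())            # ATG -> M, GCC -> A
    revcomp = str(seq.reverse_complement())   # complement, then reverse
    return protein, revcomp


print(demo_seq())
```

The import is kept inside the function only so that the sketch still runs on a machine without Biopython; in a real script the import would normally sit at the top of the file.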
2. PyMOL
• PyMOL is a molecular visualization program used in bioinformatics, with an open-source version available. It creates high-quality images and animations of molecular structures, which are useful in applications such as drug discovery, protein engineering, and molecular biology research.
• PyMOL can be scripted in Python and integrates easily with other Python-based tools and libraries.
3. Scikit-learn
• Scikit-learn is a Python library for machine learning. It provides a wide range of algorithms and tools that can be used to analyze complex biological datasets and make predictions about biological systems.
• Some uses of Scikit-learn in bioinformatics are:
  - Classifying biological samples based on gene expression or proteomics data.
  - Clustering biological samples or reducing the dimensionality of large datasets.
  - Developing machine learning models that predict protein structure and protein-protein interactions from amino acid sequences.
4. NumPy (Numerical Python)
• NumPy is a Python library for working with numerical data. It is used extensively by Pandas, SciPy, Matplotlib, Scikit-learn, and many other scientific Python packages. NumPy provides a multidimensional array object called 'ndarray' and supports a wide range of mathematical operations on arrays.
To install and import Biopython:
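The original slide content for this step is not available, so the following is only a sketch of the usual route. Biopython is normally installed from a shell with pip (pip install biopython) and imported under the package name Bio. The helper name biopython_available is an assumption made for this example.

```python
# Run this in a shell first, not inside Python:
#     pip install biopython
import importlib.util


def biopython_available():
    """Return True if the Bio package (Biopython) can be imported."""
    return importlib.util.find_spec("Bio") is not None


if biopython_available():
    import Bio
    print("Biopython version:", Bio.__version__)
else:
    print("Biopython is not installed; run: pip install biopython")
```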
What are the input data types for Biopython?
• Text file:
  1. Sequence file (sequence.txt)
  2. Cell microarray
• CSV file
• FASTA file
• Other file formats, such as:
  - BLAST output
  - GenBank
  - PubMed and Medline
  - SCOP, including 'dom' and 'lin' files
  - UniGene
  - SwissProt
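Overview of some key notes in Python

Data Types: Number Types

int, float, complex

1. Integer numbers:
>>> type(4)
<class 'int'>

2. Real numbers:
>>> type(4.5)
<class 'float'>

3. Complex numbers:
>>> type(3+2j)
<class 'complex'>
>>> (2+1j)**2
(3+4j)

In Python 3, / always returns a float; use // for integer (floor) division:
>>> 17/5
3.4
>>> 17//5
3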
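Data Types: Strings

• Single quotes:
>>> 'atg'
'atg'

• Double quotes:
>>> "atg"
'atg'

• A single quote inside a single-quoted string ends the string early:
>>> 'This is a codon, isn't it?'
SyntaxError

• Use double quotes instead, or escape the inner quote with a backslash:
>>> "This is a codon, isn't it?"
"This is a codon, isn't it?"
>>> 'This is a codon, isn\'t it?'
"This is a codon, isn't it?"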

String Operators
• Escape character: the backslash '\' gives a special meaning to the character that follows it.
• To produce more readable output, use print().

Operator   Meaning
+          Concatenate
*          Copy or replicate
in         Checks if the first string IS in the second string
not in     Checks if the first string IS NOT in the second string

Construct  Meaning
\n         Newline
\t         Tab
\\         Backslash
\"         Double quote

>>> 'atg' + 'gcc'
'atggcc'
>>> 'atg' * 3
'atgatgatg'
>>> 'tg' in 'atgatgatg'
True
>>> 'tc' in 'atgatgatg'
False
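The escape constructs and operators above can be tried together in a short script. This is only an illustrative sketch; the variable names and the example path are invented for the demonstration.

```python
# Escape sequences inside string literals (all names here are illustrative).
codons = "atg\tgcc\ttaa"                # \t puts a tab between the codons
two_lines = "atg\ngcc"                  # \n starts a new line
quoted = "The start codon is \"atg\""   # \" keeps a double quote inside the string
windows_path = "C:\\data\\seq.txt"      # \\ stands for a single backslash

# print() shows the interpreted text rather than the raw escape sequences
print(codons)
print(two_lines)
print(quoted)
print(windows_path)

# String operators
print("atg" + "gcc")             # concatenation
print("atg" * 3)                 # replication
print("tc" not in "atgatgatg")   # membership test, negated
```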
Variables
• Variables are containers that store numbers, strings, and other data types and structures.
• Variables are names given to values; the value a name refers to can be changed.
• Variables are assigned values using the equal sign (=).
>>> codon = 'tag'
>>> dna_sequence = "gtcgcctaaccgtatatttttcccgt"
• Using a variable that has not been assigned a value raises an error.
>>> dna

NameError: name 'dna' is not defined
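A small sketch of assignment, reassignment, and the error raised by an unassigned name. The name undefined_sequence is deliberately never assigned and is used here only for illustration.

```python
codon = "tag"          # assign a value to the name codon
codon = "atg"          # the same name can later refer to a different value
dna_sequence = "gtcgcctaaccgtatatttttcccgt"
print(codon, len(dna_sequence))

# Reading a name that was never assigned raises NameError.
try:
    print(undefined_sequence)
except NameError as err:
    message = str(err)
    print("NameError:", message)
```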

Variables
• Naming
  - Select meaningful names: dnaSequence is better than s.
  - Follow the naming rules:
    - Names are case-sensitive, so these are three different variables:
      DnaSequence = 1
      DNASEQUENCE = 2
      Dnasequence = 3
    - A name consists of letters, digits, and underscores: Dna1, dna_1, dnaSeq.
    - A name must not start with a digit. Invalid: 1dna
    - No special characters. Invalid: dna#, dna@1
String Operators
• [i] : returns the character at index i of a string. (index)
• [i:j] : returns the substring from index i up to, but not including, index j. (slice)
>>> dna = "gatcccccgatattatttgc"
>>> dna[0]
'g'                      - The first position in a string is position 0
>>> dna[-1]
'c'                      - Counting from the right uses negative indices, beginning with -1
>>> dna[-2]
'g'
>>> dna[0:3]
'gat'                    - In slices the start index is included, the end index is excluded
>>> dna[:3]
'gat'                    - Omitting the start index means use the default, 0
>>> dna[2:]
'tcccccgatattatttgc'     - Omitting the end index means use the default, the end of the string
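One way slicing could be applied to sequence data is to cut a DNA string into codons. This is a sketch rather than a prescribed method; the variable names are chosen for the example.

```python
dna = "gatcccccgatattatttgc"

# Step through the string three characters at a time, keeping complete codons only.
usable = len(dna) - len(dna) % 3          # 18 of the 20 bases form whole codons
codons = [dna[i:i + 3] for i in range(0, usable, 3)]
leftover = dna[usable:]                   # bases that do not fill a codon

print(codons)
print(leftover)
print(dna[-3:])                           # last three bases, via a negative index
```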
Strings as Objects
• String variables are objects that can perform specific actions using built-in functions and methods:
>>> dna = "gatcccccgatattatttgc"
>>> len(dna)
20
>>> dna.count('t')      - Count characters
7
>>> dna.count('ga')     - Count substrings
2
String Functions
>>> dna = "gatcccccgatattatttgc"
>>> dna.upper()             - Converts all letters to upper case; lower() converts to lower case
'GATCCCCCGATATTATTTGC'
>>> dna.find('ga')          - Returns the index of the first occurrence of 'ga', -1 if not found
0
>>> dna.find('at', 5)       - Returns the first occurrence of 'at' starting from index 5
9
>>> dna.rfind('ga')         - Returns the index of the last occurrence of 'ga', -1 if not found
8
>>> dna.islower()           - True if all letters are lower case
True
>>> dna.isupper()           - True if all letters are upper case
False
>>> dna.replace('a', 'A')   - Replaces all 'a' with 'A'
'gAtcccccgAtAttAtttgc'
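The string methods above combine naturally for simple sequence statistics. The sketch below computes GC content, lists every position of a motif, and converts DNA to RNA notation; the variable names are illustrative.

```python
dna = "gatcccccgatattatttgc"

# GC content: count the two bases and divide by the sequence length.
gc_count = dna.count("g") + dna.count("c")
gc_percent = 100 * gc_count / len(dna)

# All start positions of a motif, using find() with a moving start index.
motif = "at"
positions = []
start = dna.find(motif)
while start != -1:
    positions.append(start)
    start = dna.find(motif, start + 1)

# DNA to RNA notation: upper case, then replace T with U.
rna = dna.upper().replace("T", "U")

print(gc_percent, positions, rna)
```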
Inputs

>>> dna = input("Enter a DNA sequence, please: ")

Enter a DNA sequence, please: agtagcatgaggagggacttc
>>> dna
'agtagcatgaggagggacttc'

input() always returns what the user typed as a string.
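Text typed by a user often carries stray spaces or mixed case, so it can help to clean it before use. The function below is a hypothetical helper (clean_dna is not part of the lecture); it would be called on the value returned by input().

```python
def clean_dna(raw):
    """Strip whitespace, lower-case the text, and check that only a, c, g, t remain."""
    sequence = raw.strip().lower()
    invalid = sorted({base for base in sequence if base not in "acgt"})
    if invalid:
        raise ValueError("invalid characters in sequence: " + ", ".join(invalid))
    return sequence


# In an interactive session: dna = clean_dna(input("Enter a DNA sequence, please: "))
print(clean_dna("  AGTAGCATGAGG \n"))
```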
Examples:
• Create a random DNA sequence of length 10
import random

alphabet = "AGCT"
sequence = ""
for i in range(10):
    # randint(0, 3) returns 0, 1, 2, or 3 (both ends included)
    index = random.randint(0, 3)
    sequence = sequence + alphabet[index]
print(sequence)
Read from a text file
• read(x): reads up to x characters from the file and returns them as a single string. If you don't supply the size, it reads the entire file.
• readlines(x): reads the file line by line and returns a list of strings, one per line, each ending at its newline (\n). If you don't supply a size, it reads all lines; if you do, it stops reading further lines once roughly x characters have been read.
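The difference between the reading calls is easiest to see on a small file. The sketch below writes a two-line temporary file first so that it is self-contained; the file name and contents are assumptions, and readline() (read one line) is included for comparison.

```python
import os
import tempfile

# Create a small file to read from (temporary directory, illustrative content).
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as f:
    f.write("welcome to python 1\nwelcome to python 2\n")

with open(path) as f:
    first_chars = f.read(7)       # at most 7 characters
    rest = f.read()               # everything that is left, as one string

with open(path) as f:
    first_line = f.readline()     # one line, including its newline
    other_lines = f.readlines()   # the remaining lines, as a list of strings

print(repr(first_chars), repr(rest))
print(repr(first_line), other_lines)
```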
Write a text file
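The slide content for this step is not available; as a minimal sketch, a text file can be written with write() for a single string and writelines() for several strings. The file name and the codon list are made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "codons.txt")
records = ["atg", "gcc", "taa"]

with open(path, "w") as f:                            # "w" creates or overwrites the file
    f.write("codons\n")                               # write() takes one string
    f.writelines(codon + "\n" for codon in records)   # writelines() adds no newlines itself

with open(path) as f:
    written = f.read()
print(written)
```

The with statement closes the file automatically, which also flushes the written text to disk.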
Notes about file modes

What does + mean in open()?

• The + adds reading or writing to an existing open mode (update mode).
• r opens the file for reading; r+ opens it for reading and writing. The file must already exist, and its content is kept.
• w opens the file for writing; w+ opens it for reading and writing. Both create the file if needed and erase any existing content.
• a opens the file for writing in append mode; a+ opens it for reading and writing in append mode. Writes always go to the end of the file.
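Examples:
• Difference between r and r+ in open()

Mode r (read only):
with open('file.txt', 'r') as f:
    print(f.read())
Output (on terminal):
welcome to python 1
welcome to python 2
welcome to python 3
welcome to python 4

Mode r+ (read and write):
with open('file.txt', 'r+') as f:
    f.write("new line \n")
Output (file.txt): the write starts at the beginning of the file and overwrites the characters already there (here the 10 characters "welcome to"); it does not insert a new line.
new line 
 python 1
welcome to python 2
welcome to python 3
welcome to python 4

Writing to a file opened in mode r raises an error:
with open('file.txt', 'r') as f:
    f.write("test \n")
io.UnsupportedOperation: not writable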
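The behaviour of r and r+ can be checked with a short self-contained script. This is a sketch on a temporary file with assumed content; newline="" is passed only to keep line endings identical on every platform.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w", newline="") as f:
    f.write("welcome to python 1\nwelcome to python 2\n")

# r+ starts writing at position 0 and overwrites existing characters.
with open(path, "r+", newline="") as f:
    f.write("new line \n")

with open(path, "r", newline="") as f:
    content = f.read()
print(content)

# A file opened with r cannot be written to.
with open(path, "r") as f:
    try:
        f.write("test \n")
    except io.UnsupportedOperation as err:
        write_error = str(err)
print(write_error)
```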
Examples:
• Difference between w and w+ in open()

Mode w (write only):
with open('file.txt', 'w') as f:
    f.write("test 1\n")
    f.write("test 2\n")
    f.write("test 3\n")
Output (file.txt):
test 1
test 2
test 3

Mode w+ (write and read):
with open('file.txt', 'w+') as f:
    f.write("test 1\n")
    f.write("test 2\n")
    f.write("test 3\n")
    f.seek(0)
    lines = f.read()
    print(lines)
Output (terminal):
test 1
test 2
test 3

Note: f.seek(0) moves the file pointer to the beginning of the file.
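A sketch contrasting the two modes on a temporary file: w refuses to read, while w+ allows reading back after seek(0). Both modes erase the previous content when the file is opened. The file name and contents are assumptions.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as f:
    f.write("old content\n")

# Mode w: the old content is erased on open, and reading is not allowed.
with open(path, "w") as f:
    f.write("test 1\n")
    try:
        f.read()
    except io.UnsupportedOperation as err:
        read_error = str(err)

# Mode w+: erased again on open, but the file can be read back after seek(0).
with open(path, "w+") as f:
    f.write("test 1\ntest 2\n")
    f.seek(0)                 # move the file pointer to the beginning
    lines = f.read()

print(read_error)
print(lines)
```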

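Examples:
• Difference between a and a+ in open()

Mode a (append only):
with open('file.txt', 'a') as f:
    f.write("3")
Output (file.txt):
welcome to python 1
welcome to python 2
welcome to python 3
welcome to python 4
3

Mode a+ (append and read):
with open('file.txt', 'a+') as f:
    f.seek(0)
    lines = f.readlines()
    f.write("\n" + str(len(lines)))
Output (file.txt):
welcome to python 1
welcome to python 2
welcome to python 3
welcome to python 4
4

Note: in append mode, writes always go to the end of the file, even after f.seek(0). Whether the appended text starts on its own line depends on whether file.txt already ends with a newline.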
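The a+ case can be reproduced on a temporary file. In this sketch the starting file has two lines and no trailing newline (an assumption made so that the appended count lands on its own line).

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as f:
    f.write("welcome to python 1\nwelcome to python 2")   # no newline at the end

with open(path, "a+") as f:
    f.seek(0)                         # move to the start so the file can be read
    lines = f.readlines()
    f.write("\n" + str(len(lines)))   # still written at the end of the file

with open(path) as f:
    result = f.read()
print(result)
```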
Assignment
• Apply all the functions discussed to a text file that you create yourself.
