BIO Code Report

This document discusses using Biopython to analyze a COVID-19 DNA sequence. It shows how to import modules, parse the FASTA format DNA sequence, transcribe it to mRNA, translate the mRNA to an amino acid sequence, split the sequence at stop codons to identify proteins, and use ProtParam to analyze properties of the identified proteins such as molecular weight and flexibility.

Uploaded by

Sai Sangavi


COVID-19 DNA sequence data using Python.

Major Modules Used:

Biopython
Squiggle
Pandas

Importing Modules:

from __future__ import division

from Bio.SeqUtils import ProtParam
import warnings
import pandas as pd
from Bio import SeqIO
from Bio.Data import CodonTable

We will use Bio.SeqIO from Biopython to parse the DNA sequence data (FASTA). It provides a simple, uniform interface for reading and writing assorted sequence file formats.

for sequence in SeqIO.parse(r'Covid.fna', "fasta"):
    print(sequence.seq)
    print(len(sequence), 'nucleotides')

DNAsequence = SeqIO.read(r'Covid.fna', "fasta")

print(DNAsequence)
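
To make concrete what a FASTA parser does, here is a minimal pure-Python sketch. It is not Biopython's implementation; the function name parse_fasta and the toy record are hypothetical and exist only for illustration.

```python
from io import StringIO


def parse_fasta(handle):
    """Yield (header, sequence) tuples from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical toy record; real .fna files wrap the sequence over many lines
sample = StringIO(">demo_record toy example\nATTAAAGG\nTT\n")
records = list(parse_fasta(sample))
for header, seq in records:
    print(header, len(seq), "nucleotides")
```

The sequence lines of a record are simply concatenated, which is why a wrapped genome file still yields one continuous sequence.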

Since the input sequence is FASTA (DNA) and the coronavirus is an RNA virus, we need to:
Transcribe DNA to RNA (ATTAAAGGTT… => AUUAAAGGUU…)
Translate RNA to an amino acid sequence (AUUAAAGGUU… => IKGLYLPR*Q…)

In the current scenario, the .fna file starts with ATTAAAGGTT. Calling transcribe() replaces each T (thymine) with U (uracil), so we get an RNA sequence that starts with AUUAAAGGUU. The transcribe() method thus converts the DNA to mRNA.

DNA = DNAsequence.seq
mRNA = DNA.transcribe()
print(mRNA)
print('Size : ', len(mRNA))
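
As an illustration of what transcription amounts to on a coding-strand sequence, the following sketch does the same substitution on a plain string. The helper name transcribe_string is hypothetical; this is a simplified stand-in, not the Biopython method.

```python
def transcribe_string(dna):
    """Return the mRNA for a coding-strand DNA string by replacing T with U."""
    return dna.upper().replace("T", "U")


rna = transcribe_string("ATTAAAGGTT")
print(rna)
print("Size :", len(rna))
```

The length is unchanged, because transcription here only swaps one letter for another.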

The difference between the DNA and the mRNA is just that the bases T (for thymine) are replaced with U (for uracil).

Next, we translate the mRNA sequence to an amino acid sequence using the translate() method. We get something like IKGLYLPR*Q (* marks a so-called STOP codon, which effectively acts as a separator between proteins).

Amino_Acid = mRNA.translate(table=1, cds=False)
print('Amino Acid', Amino_Acid)
print("Length of Protein:", len(Amino_Acid))
print("Length of Original mRNA:", len(mRNA))

The standard genetic code is traditionally represented as an RNA codon table because, when proteins are made in a cell by ribosomes, it is mRNA that directs protein synthesis. The mRNA sequence is determined by the sequence of genomic DNA. Here are some features of codons:
Most codons specify an amino acid
Three “stop” codons mark the end of a protein
One “start” codon, AUG, marks the beginning of a protein and also encodes the amino acid methionine.

Figure: A series of codons in part of a messenger RNA (mRNA) molecule. Each codon consists of three nucleotides, usually corresponding to a single amino acid. The nucleotides are abbreviated with the letters A, U, G, and C. This is mRNA, which uses U (uracil). DNA uses T (thymine) instead. This mRNA molecule will instruct a ribosome to synthesize a protein according to this code.


print(CodonTable.unambiguous_rna_by_name['Standard'])

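
The codon features listed above can be checked with a short stand-alone sketch. It rebuilds the standard table from its compact UCAG-ordered form rather than using Biopython, and the variable names are illustrative assumptions.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codon_to_aa = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# The three stop codons
stop_codons = sorted(c for c, aa in codon_to_aa.items() if aa == "*")
# How many codons encode each amino acid
codons_per_aa = Counter(aa for aa in codon_to_aa.values() if aa != "*")

print("Stop codons:", stop_codons)
print("AUG encodes:", codon_to_aa["AUG"])
print("Codons that specify an amino acid:", sum(codons_per_aa.values()))
```

Of the 64 possible codons, 61 specify an amino acid, and several amino acids are encoded by more than one codon.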
Next, we extract the proteins (chains of amino acids) by splitting the amino acid sequence at each stop codon, marked by * (asterisk). We then remove any sequence shorter than 20 amino acids, which we take as the length of the smallest known functional protein.

# Split at stop codons; convert each Seq to a plain string before building the DataFrame
Proteins = [str(p) for p in Amino_Acid.split('*')]
df = pd.DataFrame(Proteins)
print(df.describe())
print('Total proteins:', len(df))

def conv(item):
    return len(item)

def to_str(item):
    return str(item)

df['sequence_str'] = df[0].apply(to_str)
df['length'] = df[0].apply(conv)
df.rename(columns={0: "sequence"}, inplace=True)
print(df.head())

# Keep only fragments that are at least 20 amino acids long
functional_proteins = df.loc[df['length'] >= 20]
print('Total functional proteins:', len(functional_proteins))
print(functional_proteins.describe())
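
The split-and-filter step can also be shown without pandas. The sketch below is a simplified illustration on a made-up string; the function name split_proteins and the toy sequence are hypothetical, and the 20-residue cutoff is the one used in this report.

```python
def split_proteins(amino_acids, min_length=20):
    """Split a translated sequence at '*' and keep fragments of at least min_length."""
    fragments = amino_acids.split("*")
    return [frag for frag in fragments if len(frag) >= min_length]


# Toy translated sequence: two short fragments, one 25-residue fragment, a trailing stop
toy = "MKT*AA*" + "G" * 25 + "*"
all_fragments = toy.split("*")
kept = split_proteins(toy)

print("Total fragments:", len(all_fragments))
print("Fragments kept:", len(kept))
```

Note that a stop codon at the very end of the sequence, or two adjacent stop codons, produces an empty fragment, which the length filter also removes.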

Protein Analysis With the ProtParam Module in Biopython

poi_list = []
MW_list = []

for record in Proteins[:]:
    print("\n")
    X = ProtParam.ProteinAnalysis(str(record))
    POI = X.count_amino_acids()
    poi_list.append(POI)
    MW = X.molecular_weight()
    MW_list.append(MW)
    print("Protein of Interest = ", POI)
    try:
        print("Amino acids percent = ", str(X.get_amino_acids_percent()))
    except ZeroDivisionError:
        pass
    print("Molecular weight = ", MW)
    try:
        print("Aromaticity = ", X.aromaticity())
    except ZeroDivisionError:
        pass
    print("Flexibility = ", X.flexibility())
    try:
        print("Secondary structure fraction = ", X.secondary_structure_fraction())
    except ZeroDivisionError:
        pass
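
To show the kind of per-residue statistics involved, and why an empty fragment can trigger a ZeroDivisionError, here is a plain-Python sketch of two of them. It assumes aromaticity is taken as the relative frequency of Phe, Trp and Tyr; the function names are hypothetical and this is not the ProtParam implementation.

```python
from collections import Counter


def composition_fraction(protein):
    """Return the fraction of each amino acid present in the protein string."""
    counts = Counter(protein)
    return {aa: n / len(protein) for aa, n in counts.items()}


def aromaticity(protein):
    """Return the relative frequency of the aromatic residues F, W and Y."""
    aromatic = sum(protein.count(aa) for aa in "FWY")
    return aromatic / len(protein)  # divides by zero for an empty string


print(composition_fraction("AAGG"))
print(aromaticity("FWYA"))

# An empty fragment has length 0, so the division fails
try:
    aromaticity("")
except ZeroDivisionError:
    print("empty fragment skipped")
```

Filtering out empty fragments before the analysis loop would be an alternative to catching the exception.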

Because the code above produces output for all 775 proteins, we have attached only one of the output screens.
