Perl Tutorial

This document contains a Perl tutorial covering various tasks for working with DNA/RNA sequences including: storing sequences in variables; concatenating sequences; transcribing DNA to RNA; calculating the reverse complement of a sequence; reading protein sequences from files; determining nucleotide frequencies using regular expressions and loops; and writing results to files. The tutorial provides code examples for each task and discusses concepts like using variables, file I/O, pattern matching, and conditional logic.

Uploaded by

Jessica Mitchell


Working with DNA Sequences

#!/usr/bin/perl -w
# Storing DNA in a variable, and printing it out
# First we store the DNA in a variable called $DNA

$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Next, we print the DNA onto the screen

print $DNA;

# Finally, we'll specifically tell the program to exit.

exit;

Concatenating the DNA sequences

#!/usr/bin/perl -w
# Concatenating DNA
# Store two DNA fragments into variables called $DNA1
# and $DNA2

$DNA1 = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
$DNA2 = 'ATAGTGCCGTGAGAGTGATGTAGTA';

# Print the DNA onto the screen

print "Here are the original two DNA fragments:\n\n";

print $DNA1, "\n";
print $DNA2, "\n\n";

# Concatenate the DNA fragments into a third variable and
# print them, using "string interpolation"

$DNA3 = "$DNA1$DNA2";

print "Here is the concatenation of the first two fragments (version 1):\n\n";
print "$DNA3\n\n";

# An alternative way using the "dot operator":

# Concatenate the DNA fragments into a third variable and
# print them

$DNA3 = $DNA1 . $DNA2;

print "Here is the concatenation of the first two fragments (version 2):\n\n";
print "$DNA3\n\n";

# Print the same thing without using the variable $DNA3

print "Here is the concatenation of the first two fragments (version 3):\n\n";
print $DNA1, $DNA2, "\n";
exit;

TRANSCRIPTION: DNA -> RNA

#!/usr/bin/perl -w

# Transcribing DNA into RNA

# The DNA

$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Print the DNA onto the screen

print "Here is the starting DNA:\n\n";
print "$DNA\n\n";

# Transcribe the DNA to RNA by substituting all T's with U's.

$RNA = $DNA;
$RNA =~ s/T/U/g;
# Print the RNA onto the screen
print "Here is the result of transcribing the DNA to RNA:\n\n";
print "$RNA\n";

# Exit the program.

exit;

Reverse Complement

#!/usr/bin/perl -w
# Calculating the reverse complement of a strand of DNA

# The DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Print the DNA onto the screen

print "Here is the starting DNA:\n\n";
print "$DNA\n\n";

# Calculate the reverse complement

# First, copy the DNA into new variable $revcom

# (short for REVerse COMplement)
#
# It doesn't matter if we first reverse the string and then
# do the complementation, or if we first do the complementation
# and then reverse the string. Same result each time.
# So when we make the copy we'll do the reverse in the same
# statement.

$revcom = reverse $DNA;

# The DNA is now reversed. We need to complement the bases
# in $revcom: substitute each base with its complement.
# A->T, T->A, G->C, C->G

#### Attempt 1:

$revcom =~ s/A/T/g;
$revcom =~ s/T/A/g;
$revcom =~ s/G/C/g;
$revcom =~ s/C/G/g;

# Print the reverse complement DNA onto the screen
print "Here is the reverse complement DNA:\n\n";
print "$revcom\n";

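#################

# Does this work? Why?
# It does not. Each substitution also rewrites the output of the
# one before it: the T's made from A's are turned straight back
# into A's, and the C's made from G's are turned back into G's.

# Make a fresh reversed copy of the DNA, since the failed attempt
# has already changed $revcom.

$revcom = reverse $DNA;

# See the text for a discussion of tr///
# tr/// translates every character exactly once, in a single pass.

$revcom =~ tr/ACGTacgt/TGCAtgca/;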

# Print the reverse complement DNA onto the screen

print "Here is the reverse complement DNA:\n\n";
print "$revcom\n";
print "\nThis time it worked!\n\n";
exit;
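
The difference between the chained substitutions and tr/// is easiest to see on a very short sequence. The following is a minimal sketch, not part of the original tutorial: the subroutine names complement_by_substitution and complement_by_tr are made up for this illustration, and it uses "use strict" and lexical (my) variables, which the scripts above do not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Attempt 1 from above: four substitutions applied one after another.
sub complement_by_substitution {
    my ($seq) = @_;
    $seq =~ s/A/T/g;    # every A becomes T
    $seq =~ s/T/A/g;    # every T, including the new ones, becomes A
    $seq =~ s/G/C/g;    # every G becomes C
    $seq =~ s/C/G/g;    # every C, including the new ones, becomes G
    return $seq;
}

# tr/// maps each character exactly once, in a single pass.
sub complement_by_tr {
    my ($seq) = @_;
    $seq =~ tr/ACGTacgt/TGCAtgca/;
    return $seq;
}

my $test = 'ACGT';
print "substitution: ", complement_by_substitution($test), "\n";   # prints AGGA
print "tr:           ", complement_by_tr($test), "\n";             # prints TGCA
```

With the chained substitutions the result contains only A and G, because the second and fourth substitutions undo the first and third.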

Reading Proteins in files

#!/usr/bin/perl -w
# Reading protein sequence data from a file
# The filename of the file containing the protein sequence data

$proteinfilename = 'Name_Of_your_sequence_file.txt';

# First we have to "open" the file, and associate

# a "filehandle" with it. We choose the filehandle
# PROTEINFILE for readability.
open(PROTEINFILE, '<', $proteinfilename) || die("Cannot open file \"$proteinfilename\": $!\n");

# Now we do the actual reading of the protein sequence data
# from the file, by using the angle brackets < and > to get
# the input from the filehandle. We store the data, one line
# per element, into our array @protein.

@protein = <PROTEINFILE>;

# Now that we've got our data, we can close the file.

close PROTEINFILE;

# Print the protein onto the screen

print "Here is the protein:\n\n";
print @protein;
exit;
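
Reading into an array keeps one line per element, newlines included. When the sequence is needed as a single string, the lines can be joined and the whitespace removed. The sketch below is one way to do that under some assumptions: read_sequence is a hypothetical helper name, the peptide lines are made-up sample data, and a temporary file is created only so that the example runs without an existing sequence file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a sequence file and return its contents as one string
# with all whitespace (including newlines) removed.
sub read_sequence {
    my ($filename) = @_;
    open(my $fh, '<', $filename)
        or die "Cannot open file \"$filename\": $!\n";
    my @lines = <$fh>;          # one element per line
    close $fh;
    my $sequence = join('', @lines);
    $sequence =~ s/\s//g;
    return $sequence;
}

# Create a small throwaway file holding made-up sample data.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print $tmp_fh "MNIDDKLEGL\nFLKCGGIDEM\n";
close $tmp_fh;

my $protein = read_sequence($tmp_name);
print "$protein\n";             # prints MNIDDKLEGLFLKCGGIDEM
```

The lexical filehandle ($fh) is closed automatically when it goes out of scope, but closing it explicitly matches the style of the script above.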

Pattern matching: Motifs and Loops

Proceed ONLY if condition is true...

Code layout:

if (condition) {
    do something
}

Finding Motifs
#!/usr/bin/perl -w
# if-elsif-else

$word = 'MNIDDKL';

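# if-elsif-else conditionals

if ($word eq 'QSTVSGE') {
    print "QSTVSGE\n";
} elsif ($word eq 'MRQQDMISHDEL') {
    print "MRQQDMISHDEL\n";
} else {
    # Reached when none of the conditions above is true
    print "\"$word\" did not match either peptide.\n";
}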
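
The example above tests for exact equality with eq. A motif is usually searched for inside a longer sequence, which is what the pattern-matching operator =~ is for. The sketch below is an illustration only: find_motif is a hypothetical helper, and the peptide and the motif D+K are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the 0-based start position of every non-overlapping
# match of $motif (a regular expression) in $sequence.
sub find_motif {
    my ($sequence, $motif) = @_;
    my @positions;
    while ($sequence =~ /$motif/g) {
        push @positions, $-[0];    # $-[0] holds the offset where the match starts
    }
    return @positions;
}

my $protein = 'MNIDDKLQSTVSGEMNIDDKL';    # made-up peptide
my @hits = find_motif($protein, 'D+K');   # one or more D followed by K

if (@hits) {
    print "Motif found at positions: @hits\n";   # prints 3 17
} else {
    print "Motif not found.\n";
}
```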

GC CONTENT

In PCR experiments, the GC content of a primer is used to predict its annealing temperature
to the template DNA. A higher GC content indicates a higher melting temperature.

GC % = (G + C) / (A + G + C + T) x 100
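
The formula can be sketched in Perl as follows. This is an illustration rather than part of the original tutorial: gc_percent is a made-up name, and the choice to ignore any character other than A, C, G and T (in either case) is an assumption. It relies on the fact that tr/// with an empty replacement list returns the number of matching characters without changing the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the GC content of a DNA string as a percentage.
# Characters other than A, C, G and T are ignored (an assumption).
sub gc_percent {
    my ($dna) = @_;
    my $gc    = ($dna =~ tr/GCgc//);        # number of G and C
    my $total = ($dna =~ tr/ACGTacgt//);    # number of A, C, G and T
    return 0 if $total == 0;                # avoid dividing by zero
    return 100 * $gc / $total;
}

my $primer = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
printf "GC content: %.1f%%\n", gc_percent($primer);
```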

Logic:

for each base in the DNA

    if base is A
        count_of_A = count_of_A + 1

    if base is C
        count_of_C = count_of_C + 1

    if base is G
        count_of_G = count_of_G + 1

    if base is T
        count_of_T = count_of_T + 1

done

print count_of_A, count_of_C, count_of_G, count_of_T

the script

#!/usr/bin/perl -w
# Determining frequency of nucleotides
# Get the name of the file with the DNA sequence data

$dna_filename = 'File_name.txt';

# Remove the newline from the DNA filename
# (this only matters when the name is typed in at the keyboard)

chomp $dna_filename;

# open the file, or exit

open(DNAFILE, '<', $dna_filename) || die("Cannot open file \"$dna_filename\": $!\n");

# Read the DNA sequence data from the file, and store it
# into the array variable @DNA
@DNA = <DNAFILE>;
# Close the file
close DNAFILE;

# From the lines of the DNA file,

# put the DNA sequence data into a single string.
$DNA = join( '', @DNA);
# Remove whitespace
$DNA =~ s/\s//g;

# Now explode the DNA into an array where each letter of

# the original string is now an element in the array.
# This will make it easy to look at each position.
# Notice that we're reusing the variable @DNA for this purpose.
@DNA = split( '', $DNA );

# Initialize the counts.

# Notice that we can use scalar variables to hold numbers.
$count_of_A = 0;
$count_of_C = 0;
$count_of_G = 0;
$count_of_T = 0;
$errors = 0;

# In a loop, look at each base in turn, determine which of

# the four types of nucleotides it is, and increment the
# appropriate count.

foreach $base (@DNA) {
    if ( $base eq 'A' ) {
        ++$count_of_A;
    }
    elsif ( $base eq 'C' ) {
        ++$count_of_C;
    }
    elsif ( $base eq 'G' ) {
        ++$count_of_G;
    }
    elsif ( $base eq 'T' ) {
        ++$count_of_T;
    }
    else {
        print "!!!!!!!! Error - I don't recognize this base: $base\n";
        ++$errors;
    }
}

# print the results

print "A = $count_of_A\n";
print "C = $count_of_C\n";
print "G = $count_of_G\n";
print "T = $count_of_T\n";
print "errors = $errors\n";
# exit the program
exit;

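--- using regex ---

# Alternative to the foreach loop: count the matches of each pattern
# in the string $DNA. The /i modifier ignores case and /g finds
# every match, so each while loop runs once per matching base.
# The counters start at 0 so that a base that never occurs prints as 0.

$a = $c = $g = $t = $e = 0;

while($DNA =~ /a/ig){$a++}
while($DNA =~ /c/ig){$c++}
while($DNA =~ /g/ig){$g++}
while($DNA =~ /t/ig){$t++}
while($DNA =~ /[^acgt]/ig){$e++}
print "A=$a C=$c G=$g T=$t errors=$e\n";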

----

The script above uses a new kind of loop, the foreach loop. This loop works over the elements
of an array. The line:
foreach $base (@DNA)
sets $base to each element of @DNA in turn and runs the loop body once per element.
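
As a compact illustration of foreach, the sketch below counts bases with a hash instead of five separate scalar variables. It is an alternative to, not a copy of, the script above: count_bases is a hypothetical name, the input string is made up, and converting the input to upper case first is an assumption that makes the count case-insensitive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count each base in a DNA string; anything that is not
# A, C, G or T is counted under the key "errors".
sub count_bases {
    my ($dna) = @_;
    my %count = (A => 0, C => 0, G => 0, T => 0, errors => 0);
    my @bases = split('', uc $dna);    # one element per character
    foreach my $base (@bases) {        # $base is each element in turn
        if ($base =~ /^[ACGT]$/) {
            $count{$base}++;
        }
        else {
            $count{errors}++;
        }
    }
    return %count;
}

my %count = count_bases('ACGTTNAC');
print "A=$count{A} C=$count{C} G=$count{G} T=$count{T} errors=$count{errors}\n";
# prints A=2 C=2 G=1 T=2 errors=1
```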

Writing to files

# Also write the results to a file called "countbase"

$outputfile = "countbase";

# Open the file for writing (">"), or exit with an error message

open(COUNTBASE, '>', $outputfile) || die("Cannot open file \"$outputfile\" to write to!!\n\n");

print COUNTBASE "A=$a C=$c G=$g T=$t errors=$e\n";

close(COUNTBASE);
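
The fragment above depends on counters set earlier in the script. A self-contained version might look like the sketch below. Several things in it are assumptions made for illustration: write_counts is a hypothetical helper, the counts are made-up numbers, and the file is written to a temporary directory (via the core File::Temp module) so that running the example leaves nothing behind. The file is read back at the end to confirm what was written.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write the base counts to $outputfile as a single line.
sub write_counts {
    my ($outputfile, %count) = @_;
    open(my $out, '>', $outputfile)
        or die "Cannot open file \"$outputfile\" to write to: $!\n";
    print {$out} "A=$count{A} C=$count{C} G=$count{G} T=$count{T} errors=$count{errors}\n";
    close($out) or die "Cannot close file \"$outputfile\": $!\n";
    return;
}

# Use a temporary directory that is removed when the program ends.
my $dir        = tempdir(CLEANUP => 1);
my $outputfile = File::Spec->catfile($dir, 'countbase');

# Made-up counts, for illustration only.
write_counts($outputfile, A => 10, C => 5, G => 12, T => 6, errors => 0);

# Read the file back to check its contents.
open(my $in, '<', $outputfile) or die "Cannot open file \"$outputfile\": $!\n";
my $line = <$in>;
close $in;
print $line;    # prints A=10 C=5 G=12 T=6 errors=0
```

Opening with '>' replaces any existing file of the same name; '>>' would append to it instead.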

