Perl tutorial

Working with DNA Sequences
 
#!/usr/bin/perl -w
# Storing DNA in a variable, and printing it out
# First we store the DNA in a variable called $DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
# Next, we print the DNA onto the screen
print $DNA;
# Finally, we'll specifically tell the program to exit.
exit;

Concatenating the DNA sequences
 
#!/usr/bin/perl -w
# Concatenating DNA
# Store two DNA fragments into variables called $DNA1
# and $DNA2
$DNA1 = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
$DNA2 = 'ATAGTGCCGTGAGAGTGATGTAGTA';
# Print the DNA onto the screen
print "Here are the original two DNA fragments:\n\n";
print $DNA1, "\n";
print $DNA2, "\n\n";
# Concatenate the DNA fragments into a third variable and
# print them using "string interpolation"
$DNA3 = "$DNA1$DNA2";
print "Here is the concatenation of the first two fragments (version 1):\n\n";
print "$DNA3\n\n";
# An alternative way using the "dot operator":
# Concatenate the DNA fragments into a third variable and
# print them
$DNA3 = $DNA1 . $DNA2;
print "Here is the concatenation of the first two fragments (version 2):\n\n";
print "$DNA3\n\n";
# Print the same thing without using the variable $DNA3
print "Here is the concatenation of the first two fragments (version 3):\n\n";
print $DNA1, $DNA2, "\n";
exit;	
 
TRANSCRIPTION: DNA -> RNA
 
#!/usr/bin/perl -w
# Transcribing DNA into RNA
# The DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
# Print the DNA onto the screen
print "Here is the starting DNA:\n\n";
print "$DNA\n\n";
# Transcribe the DNA to RNA by substituting all T's with U's.
$RNA = $DNA;
$RNA =~ s/T/U/g;
# Print the RNA onto the screen
print "Here is the result of transcribing the DNA to RNA:\n\n";
print "$RNA\n";
# Exit the program.
exit;	
 
Reverse Complement
 
#!/usr/bin/perl -w
# Calculating the reverse complement of a strand of DNA
# The DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
# Print the DNA onto the screen
print "Here is the starting DNA:\n\n";
print "$DNA\n\n";
# Calculate the reverse complement
# First, copy the DNA into new variable $revcom
# (short for REVerse COMplement)
#
# It doesn't matter if we first reverse the string and then
# do the complementation; or if we first do the complementation
# and then reverse the string. Same result each time.
# So when we make the copy we'll do the reverse in the same
# statement.
$revcom = reverse $DNA;
# The DNA is now reversed. We need to complement the bases
# in $revcom: substitute each base by its complement.
# A->T, T->A, G->C, C->G
####Attempt 1:
$revcom =~ s/A/T/g;
$revcom =~ s/T/A/g;
$revcom =~ s/G/C/g;
$revcom =~ s/C/G/g;
# Print the reverse complement DNA onto the screen
print "Here is the reverse complement DNA:\n\n";
print "$revcom\n";
#################
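Does this work? Why?

# Attempt 1 changed $revcom, so start again from a fresh
# reversed copy of the DNA.
$revcom = reverse $DNA;
# See the text for a discussion of tr///
$revcom =~ tr/ACGTacgt/TGCAtgca/;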
# Print the reverse complement DNA onto the screen
print "Here is the reverse complement DNA:\n\n";
print "$revcom\n";
print "\nThis time it worked!\n\n";
exit;	
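
To see what Attempt 1 does, it helps to trace a very short strand by hand. The sketch below is only an illustration: the six-base strand and the variable names $reversed, $by_subst and $by_tr are made up for this example. It runs the four substitutions one after another and compares the result with a single tr/// pass.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A short made-up strand, treated as already reversed, to keep the trace readable
my $reversed = 'AATTGC';

# Attempt 1: four substitutions in a row, each working on the previous result
my $by_subst = $reversed;
$by_subst =~ s/A/T/g;    # AATTGC -> TTTTGC (the A's are now T's)
$by_subst =~ s/T/A/g;    # TTTTGC -> AAAAGC (old and new T's all become A's)
$by_subst =~ s/G/C/g;    # AAAAGC -> AAAACC
$by_subst =~ s/C/G/g;    # AAAACC -> AAAAGG

# tr/// translates every character exactly once, in a single pass
(my $by_tr = $reversed) =~ tr/ACGT/TGCA/;

print "substitutions: $by_subst\n";    # prints AAAAGG
print "tr///:         $by_tr\n";       # prints TTAACG
```

Each s///g sees the output of the one before it, so a base that has already been complemented can be changed a second time. tr/// never revisits a character, which is why it gives the expected complement.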
 	
 	
 	
 
Reading Proteins in files
 	
 
#!/usr/bin/perl -w
# Reading protein sequence data from a file
# The filename of the file containing the protein sequence data
$proteinfilename = 'Name_Of_your_sequence_file.txt';
# First we have to "open" the file, and associate
# a "filehandle" with it. We choose the filehandle
# PROTEINFILE for readability.
open(PROTEINFILE, $proteinfilename) || die("Cannot open file \"$proteinfilename\"\n");
# Now we do the actual reading of the protein sequence data
# from the file, by using the angle brackets < and > to get
# the input from the filehandle. We store the data in our
# array variable @protein, one line of the file per element.
@protein = <PROTEINFILE>;
# Now that we've got our data, we can close the file.
close PROTEINFILE;
# Print the protein onto the screen
print "Here is the protein:\n\n";
print @protein;
exit;
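
The listing above reads the whole file at once, because the angle brackets are used in list context. As a rough sketch of the other common pattern, the angle brackets in scalar context return one line per call, so the sequence can be built up line by line. The sequence data below is made up, and to keep the example self-contained it opens a filehandle on a string rather than on a real file; the names $file_contents, $fh and $protein are assumptions for this illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the contents of a sequence file (made-up data)
my $file_contents = "MNIDDKLEGL\nFLKCGGIDEM\nQSSRTMVVMG\n";

# Open a read-only filehandle on the string instead of on a disk file
open(my $fh, '<', \$file_contents) or die "cannot open in-memory file: $!";

my $protein = '';
while (my $line = <$fh>) {    # scalar context: one line per iteration
    chomp $line;              # strip the trailing newline
    $protein .= $line;        # append the line to the growing sequence
}
close $fh;

print "$protein\n";              # prints MNIDDKLEGLFLKCGGIDEMQSSRTMVVMG
print length($protein), "\n";    # prints 30
```

With a real file, only the open line changes: pass the filename as the third argument instead of the reference to the string.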

Pattern matching: Motifs and Loops
 
Proceed ONLY if condition is true...

Code layout:
if (condition) {
    do something
}
Finding Motifs
#!/usr/bin/perl -w
# if-elsif-else
$word = 'MNIDDKL';
# if-elsif-else conditionals
if ($word eq 'QSTVSGE') {
    print "QSTVSGE\n";
} elsif ($word eq 'MRQQDMISHDEL') {
    print "MRQQDMISHDEL\n";
} else {
    print "\"$word\" matches neither peptide\n";
}
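
The eq operator only tests whether two strings are identical. To look for a motif somewhere inside a longer sequence, the same if/else layout can be combined with the pattern-matching operator =~. The following is a minimal sketch; the protein sequence and the motif are made up for illustration and are not taken from the tutorial.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up protein sequence and motif, for illustration only
my $protein = 'MNIDDKLEGLFLKCGGIDEM';
my $motif   = 'KL[EQ]G';    # K, then L, then E or Q, then G

if ($protein =~ /($motif)/) {
    # $1 holds the matched text; $-[0] is the offset where the match starts
    my $position = $-[0] + 1;
    print "Found motif $1 at position $position\n";    # prints: Found motif KLEG at position 6
} else {
    print "Motif not found\n";
}
```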
 
GC CONTENT

In PCR experiments, the GC-content of a primer is used to predict its annealing temperature to the template DNA. A higher GC-content indicates a higher melting temperature.
 
GC % = (G + C) / (A + G + C + T) x 100
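
As a quick illustration of the formula above, here is one possible way to compute it in Perl. This is a sketch, not the script developed below: it counts bases with tr///, which returns the number of characters it matched, and the variable names $gc, $total and $gc_percent are assumptions made for this example. The DNA is the same fragment used earlier in the tutorial.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# tr/// with an empty replacement list leaves the string unchanged
# and returns how many characters matched the search list
my $gc    = ($DNA =~ tr/GCgc//);        # G + C
my $total = ($DNA =~ tr/ACGTacgt//);    # A + G + C + T

# Guard against an empty sequence to avoid dividing by zero
my $gc_percent = $total ? 100 * $gc / $total : 0;

printf "G+C = %d, total = %d\n", $gc, $total;    # prints G+C = 17, total = 33
printf "GC content: %.1f%%\n", $gc_percent;      # prints GC content: 51.5%
```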
 
Logic:

for each base in the DNA
    if base is A
        count_of_A = count_of_A + 1
    if base is C
        count_of_C = count_of_C + 1
    if base is G
        count_of_G = count_of_G + 1
    if base is T
        count_of_T = count_of_T + 1
done
print count_of_A, count_of_C, count_of_G, count_of_T

The script
 
#!/usr/bin/perl -w
# Determining frequency of nucleotides
# The name of the file with the DNA sequence data
$dna_filename = 'File_name.txt';
# open the file, or exit
open(DNAFILE, $dna_filename) || die("Cannot open file \"$dna_filename\"\n");
# Read the DNA sequence data from the file, and store it
# into the array variable @DNA
@DNA = <DNAFILE>;
# Close the file
close DNAFILE;
# From the lines of the DNA file,
# put the DNA sequence data into a single string.
$DNA = join( '', @DNA);
# Remove whitespace
$DNA =~ s/\s//g;
# Now explode the DNA into an array where each letter of
# the original string is now an element in the array.
# This will make it easy to look at each position.
# Notice that we're reusing the variable @DNA for this
# purpose.
@DNA = split( '', $DNA );
# Initialize the counts.
# Notice that we can use scalar variables to hold numbers.
$count_of_A = 0;
$count_of_C = 0;
$count_of_G = 0;
$count_of_T = 0;
$errors = 0;
# In a loop, look at each base in turn, determine which of
# the four types of nucleotides it is, and increment the
# appropriate count.
foreach $base (@DNA) {
    if ( $base eq 'A' ) {
        ++$count_of_A;
    }
    elsif ( $base eq 'C' ) {
        ++$count_of_C;
    }
    elsif ( $base eq 'G' ) {
        ++$count_of_G;
    }
    elsif ( $base eq 'T' ) {
        ++$count_of_T;
    }
    else {
        print "!!!!!!!! Error - I don't recognize this base: $base\n";
        ++$errors;
    }
}
# print the results
print "A = $count_of_A\n";
print "C = $count_of_C\n";
print "G = $count_of_G\n";
print "T = $count_of_T\n";
print "errors = $errors\n";
# exit the program
exit;	
 	
 
--- Using regex ---
 
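# Start every counter at 0 so that a base that never occurs
# still prints as 0 and does not trigger a warning under -w.
$a = $c = $g = $t = $e = 0;
# Each loop counts the matches of one pattern: /i ignores case
# and /g continues from where the previous match ended.
while($DNA =~ /a/ig){$a++}
while($DNA =~ /c/ig){$c++}
while($DNA =~ /g/ig){$g++}
while($DNA =~ /t/ig){$t++}
while($DNA =~ /[^acgt]/ig){$e++}
print "A=$a C=$c G=$g T=$t errors=$e\n";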
 
	
 
----	
 
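The script uses a new kind of loop, the foreach loop. This loop works over the elements of an array. The line:

    foreach $base (@DNA)

sets $base to each element of the array @DNA in turn and runs the loop body once for each element.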
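
As a small illustration of how foreach walks an array, the sketch below splits a made-up four-base string into an array and prints each element with its position. The names @bases and $position are assumptions for this example only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a short made-up string into one array element per letter
my @bases = split('', 'ACGT');

my $position = 0;
foreach my $base (@bases) {
    # $base holds the current element; the body runs once per element
    $position++;
    print "$position: $base\n";
}
# prints:
# 1: A
# 2: C
# 3: G
# 4: T
```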
 
	
 
Writing to files
 
# Also write the results to a file called "countbase"
$outputfile = "countbase";
open(COUNTBASE, ">$outputfile") || die("Cannot open file \"$outputfile\" to write to!!\n\n");
print COUNTBASE "A=$a C=$c G=$g T=$t errors=$e\n";
close(COUNTBASE);