Bioinformatics

This document discusses file management in the GCG bioinformatics package. It covers GCG sequence formats, using database sequences, editing sequences, list files, and converting between formats. The key points are how to retrieve, work with, and organize multiple sequences using GCG utilities and commands.


Introduction to Bioinformatics
A Theoretical and Practical Approach
Edited by
Stephen A. Krawetz
David D. Womble

Includes
CD-ROM

18 GCG File Management

Sittichoke Saisanit

Introduction
Users are most likely to encounter the Wisconsin Package (GCG) through SeqWeb, a web
interface that provides access to many programs in the GCG package. However, using
GCG from the UNIX command line still has a number of advantages. First, the
command-line interface is more amenable to batch processing of large datasets.
Second, it gives access to all GCG programs, not just the subset available through
the web interface. This chapter presents the use of GCG from the UNIX command line.

UNIX Commands and Overview

Familiarity with UNIX, as presented in Chapter 13 and Appendix 3, is recommended
prior to studying this chapter. Here are a few UNIX commands and rules that can
help you get started. Unlike DOS and VMS, UNIX is case-sensitive. For example, the
file name mygene.seq is different from Mygene.seq or any other mixed-case combination.
The man command is short for manual; it is equivalent to help in other operating
systems and programs. For example, to find out how a certain UNIX command can be
used, type man and the command of interest at the command prompt (%), then press
Enter. For example: % man cd
The manual pages for the cd command will be displayed. The cd command is used to
change directory from one location to another. For example, if the current working
directory is /usr/home: % cd /usr/common/myproject
This command changes the current working directory from /usr/home to
/usr/common/myproject.
What if the command itself is not known? One powerful feature of the man pages
is the -k modifier, which searches for commands by keyword. For example, to find
a command to delete a file, enter % man -k remove
The command will list the titles of man pages that contain the word remove.
An introduction to UNIX, including descriptions of many UNIX commands, is given in
Chapter 13.
In order to use GCG effectively from the command-line interface, it is important
to learn how to manipulate files and directories. This is fundamental to any
operating system.

Using Database Sequences and Sequence Formats

The GCG package needs to be initialized by sourcing two scripts. This can be
automated at user log-in by adding the commands to the .cshrc file.
There are several formats for sequence data. Users may be familiar with the GenBank
or EMBL formats. GCG has its own format, and it provides several utilities to
convert sequences from one format to another. The GCG format has one notable
signature: two dots (..) separate the annotation from the sequence itself. The
annotation precedes the two dots, and the sequence follows them. In most cases, users
need not worry about the GCG sequence format. All one needs to do is learn how to
retrieve or specify a sequence from the databases when executing a GCG program. The
GCG convention for specifying a sequence is database:accession or
database:locus_name. Here, database is the GCG logical name for a database. These
names are set by the GCG administrator. For example, it is customary to set gb as
the logical name of the GenBank database. To find out whether a local installation
of GCG has gb as one of its logical names, issue the following command: % name gb
To list the logical names, issue the command name without any database name.
Example: % name
To retrieve a sequence, the GCG Fetch command can be used.
Example: % fetch gb:m97796
Assuming that the GenBank database is installed locally and gb is set as its logical
name, the above command will retrieve a GenBank sequence which has an accession
number of M97796. The result will be written to a file with a default name unless a
name is specifically given to the program.
The Fetch program can also work without database name specification.
Example: % fetch m97796
However, if the accession number appears in more than one database, Fetch will
retrieve all of the sequence records. To ensure uniqueness and speed of retrieval, it is
best to use Fetch with full specification of the database name and sequence accession
number.
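The database:accession convention can be illustrated with a short Python sketch. This is not part of GCG; the names SequenceSpec and parse_sequence_spec are hypothetical and exist only to show how a specification such as gb:m97796 breaks down into a logical database name and an identifier.

```python
from typing import NamedTuple, Optional


class SequenceSpec(NamedTuple):
    """A GCG-style sequence specification split into its parts (illustrative)."""
    database: Optional[str]  # logical database name, e.g. "gb"; None if omitted
    identifier: str          # accession number or locus name


def parse_sequence_spec(spec: str) -> SequenceSpec:
    """Split 'database:identifier' at the first colon.

    A bare identifier (no colon) is returned with database=None, which
    corresponds to calling Fetch without a database name.
    """
    spec = spec.strip()
    if ":" in spec:
        database, identifier = spec.split(":", 1)
        return SequenceSpec(database, identifier)
    return SequenceSpec(None, spec)


if __name__ == "__main__":
    print(parse_sequence_spec("gb:m97796"))
    print(parse_sequence_spec("m97796"))
```

A specification with database set to None is the case in which Fetch may return records from more than one database, which is why the fully qualified form is preferred.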
If a GCG sequence file is created or modified with a text editor, its checksum may no
longer match the sequence, and GCG programs will not recognize the file. In that
case, run the reformat utility to correct the checksum.
Example: % reformat myseq.seq
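To see why hand editing invalidates a checksum, it helps to look at how such a value depends on every residue and its position. The chapter does not give the formula; the Python sketch below follows the algorithm commonly described for GCG-format checksums in open-source sequence toolkits (position weights cycling from 1 to 57, result taken modulo 10000), so treat it as an assumption rather than the package's definitive implementation. The function name gcg_checksum is illustrative.

```python
def gcg_checksum(sequence: str) -> int:
    """Return a GCG-style checksum in the range 0..9999.

    Assumed algorithm: each residue's upper-case character code is
    multiplied by a weight that cycles 1, 2, ..., 57, 1, 2, ... along the
    sequence; the weighted sum is reduced modulo 10000.
    """
    total = 0
    for position, residue in enumerate(sequence):
        weight = (position % 57) + 1
        total += weight * ord(residue.upper())
    return total % 10000


if __name__ == "__main__":
    original = "ACGTACGT"
    edited = "ACGTTCGT"  # a single residue changed by hand
    print(gcg_checksum(original), gcg_checksum(edited))
```

Because the weights depend on position, changing, inserting, or deleting even one residue changes the value, which is why an edited file needs to pass through reformat before other programs accept it.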
Another useful file format in GCG is the Rich Sequence Format (RSF). In SeqLab,
which is a graphical user interface for GCG run under X Windows, RSF is particularly
useful because sequence annotations such as domains and phosphorylation sites can
be displayed for visualization. SeqLab can only be run from a UNIX workstation or an
X Windows emulation program. There are two modes of working when inside SeqLab:
main list and editor. The graphical sequence viewer is available in SeqLab editor
mode. The reformat command can be used to convert a GCG sequence into an RSF
format sequence by including the -RSF parameter in the command line: % reformat
-rsf myseq.seq.
It is recommended that users name sequence files consistently. GCG does not
require consistent naming, and UNIX does not insist on file types. In contrast,
DOS and Windows usually require a 3-letter file type extension for files to be
recognized correctly by programs. Without a naming convention, however, older files
become difficult to identify as files accumulate. Users should make a habit of
giving sequence files meaningful names and consistent extensions. Appending a .dna or
.seq extension to nucleic acid sequence files, and a .pep or .pro extension to protein
sequence files, will help in recognizing these files. In addition, storing sequence files
in a separate directory for each project is generally a good idea. Once a project
has been completed, the entire directory can be archived or removed.
Another file format users may encounter is the FASTA format. Most public
sequence utilities on the web can accept or produce sequences in FASTA format. GCG
has a utility to convert GCG sequences into FASTA format sequences and vice versa.
This is useful because it allows one to use other available tools and external sequences.
Tofasta converts a GCG sequence into a FASTA format sequence.
Example: % tofasta gb:m97796
FromFasta converts a FASTA sequence into a GCG format sequence.
Example: % fromfasta pubseq.fasta
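For readers who have not looked inside a FASTA file, the following Python sketch shows the layout that such conversions target: a header line starting with > followed by the sequence on one or more lines. It is an independent illustration, not a reimplementation of ToFasta or FromFasta; the function names and the 60-character line width are assumptions.

```python
def to_fasta(name, sequence, description="", width=60):
    """Format one sequence as a FASTA record (header line plus wrapped sequence)."""
    header = f">{name} {description}".rstrip()
    residues = "".join(sequence.split())  # drop any whitespace or line breaks
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join([header] + lines) + "\n"


def from_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    record = to_fasta("myseq", "ACGTACGTACGT", "example sequence", width=8)
    print(record)
    print(from_fasta(record))
```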

Editing GCG Formatted Sequences

The need to edit sequences may arise from users’ own sequencing efforts. Additionally,
users may want to track recombinant sequences such as products of mutagenesis. SeqEd
is a utility for editing sequences much more efficiently than with a text editor.
SeqEd has another advantage: the edited sequence will be recognized by other GCG
programs without the need to run the reformat command. Annotations to specific
residues can also be placed within an edited sequence. SeqEd can be started by
entering: % seqed myseq.seq
Once inside SeqEd, press Control-D to switch to the editor's command mode. Entering
help there brings up a list of commands that can be used inside the editor.

List File
When working with a family of gene or protein sequences, there is often a need to
manage a number of sequences simultaneously. GCG provides a powerful feature for
this called a list file. A list file is simply a text file in which two dots (..) are
followed by a list of individual sequences, one per line. The GCG programs ignore
any text before the two dots and any text after an exclamation mark (!). Therefore,
comments or descriptions of sequences can be added. An example of a list file:
Sequences of mammalian EGF receptors and related family members.
..
sw:EGFR_HUMAN
sw:EGFR_MOUSE
/usr/home/newdata/myseq.pep ! unpublished EGFR-related sequence
As shown in this example, a list file can contain either database sequences or
local user sequences or both. A list file is accessed by preceding the file name with
the @ symbol. For example, to retrieve all sequences in the list file named egfr.list
to the current working directory, use the GCG Fetch command:
% fetch @egfr.list.
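The parsing rules for a list file (ignore everything before the two dots, ignore everything after an exclamation mark, one sequence per line) can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this chapter, not GCG's own parser; the function name parse_list_file is hypothetical, and the sketch assumes the first occurrence of .. in the file is the divider.

```python
def parse_list_file(text):
    """Return the sequence specifications named in a GCG-style list file.

    Text before the first '..' is treated as free-form description, and
    anything after '!' on a line is treated as a comment.
    """
    marker = text.find("..")
    if marker == -1:
        raise ValueError("not a list file: no '..' divider found")
    entries = []
    for line in text[marker + 2:].splitlines():
        entry = line.split("!", 1)[0].strip()
        if entry:
            entries.append(entry)
    return entries


EXAMPLE = """Sequences of mammalian EGF receptors and related family members.
..
sw:EGFR_HUMAN
sw:EGFR_MOUSE
/usr/home/newdata/myseq.pep ! unpublished EGFR-related sequence
"""

if __name__ == "__main__":
    for spec in parse_list_file(EXAMPLE):
        print(spec)
```

Run on the example list file from this section, the sketch yields the two SWISS-PROT entries and the local file path, with the description and the trailing comment dropped.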

In addition to making a list file, multiple sequences can be aligned and written to a
single Multiple Sequence Format (MSF) file. Several GCG programs can output files
in an MSF format. For example, the reformat program with the -MSF parameter can
be used to convert a group of sequences from a list file into an MSF formatted file.
Example: % reformat -msf @egfr.list
However, reformat does not align the input sequences. The file resulting from
reformat can be named egfr.msf, for example. This MSF file can then be used as input
for other GCG programs. One, a subset, or all of the sequences in an MSF file can be
used. To specify a single sequence from an MSF file, type the MSF file name followed
by the sequence name in curly brackets, for example egfr.msf{egfr1}. To specify
multiple sequences, use the asterisk wildcard character. For example,
egfr.msf{egfr*} specifies the sequences in the egfr.msf file whose names begin with
egfr. Similarly, egfr.msf{*} indicates that all sequences in the MSF file will be
used. Note that a plain file name is not sufficient to specify sequences from an MSF
file: either a sequence name or a wildcard in curly brackets must follow the file name.
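The file{name} notation can be made concrete with a small Python sketch that splits a specification into its file and name pattern and then selects matching sequence names. The helper names are hypothetical, only the * wildcard described above is supported, and case-insensitive matching is an assumption made for the illustration rather than documented GCG behavior.

```python
import re

# file name, then a sequence name or wildcard pattern in curly brackets
_MSF_SPEC = re.compile(r"^(?P<file>[^{}]+)\{(?P<pattern>[^{}]+)\}$")


def parse_msf_spec(spec):
    """Split 'egfr.msf{egfr*}' into ('egfr.msf', 'egfr*')."""
    match = _MSF_SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"not a file{{name}} specification: {spec!r}")
    return match.group("file"), match.group("pattern")


def select_sequences(spec, names_in_file):
    """Return the names from names_in_file that the specification selects."""
    _, pattern = parse_msf_spec(spec)
    # Translate the pattern: '*' matches any run of characters, all else is literal.
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    matcher = re.compile("^" + ".*".join(literal_parts) + "$", re.IGNORECASE)
    return [name for name in names_in_file if matcher.match(name)]


if __name__ == "__main__":
    names = ["egfr1", "egfr2", "erbb2"]
    print(select_sequences("egfr.msf{egfr1}", names))
    print(select_sequences("egfr.msf{egfr*}", names))
    print(select_sequences("egfr.msf{*}", names))
```

A plain file name such as egfr.msf raises ValueError in this sketch, mirroring the rule that the curly-bracket part is required.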
SeqLab, the X Windows interface for GCG, can also output MSF files from a list of
sequences. GCG command-line programs that can output MSF files are listed below.
Programs that require the -MSF parameter are marked accordingly.
• LineUp -MSF
• PileUp
• PrettyBox
• ProfileGap -MSF
• ProfileSegments -MSF
• Reformat -MSF
Below are two examples of using MSF files, one with a program that does not require
the -MSF option (PileUp) and one with a program that does (LineUp).
Example: % pileup egfr.msf{*}
% lineup -msf egfr.msf

Graphic Files
Several GCG programs have an option to generate output in a graphic format. In
order to use the graphic feature, a graphical language and a graphic device must be
defined. The command ShowPlot displays the current graphic device, while the
command SetPlot changes it. After setting the graphic device, the command PlotTest
can generate a test graphic output. It is a quick and easy way to determine whether
the device is properly configured.
Graphic files require specific applications in order to be displayed correctly. They
cannot be displayed from the command-line interface like plain text files. Many GCG
programs write their graphic output to .figure files.
Graphics can be displayed directly on the screen if an appropriate device is selected.
For example, on an X Windows terminal, ColorX can be used. ColorX is a graphic
language and device for the X Windows environment.
File management in GCG requires knowledge of the operating system on which
GCG runs, most likely one of the many flavors of UNIX. Common sense should be
applied to maintain naming consistency and to facilitate file organization. This is
helped by the various file utilities for creating sequence files and converting them
into the proper formats. Learning how to manage and use graphic files will help in
visualizing the output from many GCG programs.

Glossary and Abbreviations

EMBL Nucleotide Database: Europe’s primary collection of all publicly available
nucleotide sequences. It is maintained in collaboration with GenBank and DDBJ (Japan).
GCG: Genetics Computer Group, started in 1982 within the Department of Genetics
at the University of Wisconsin. It went private in 1990 and was acquired by Oxford
Molecular Group in 1997. In 2000, Oxford Molecular was acquired by Pharmacopeia,
resulting in a new company called Accelrys (see Website: http://www.accelrys.com),
which is currently the commercial distributor of the GCG® Wisconsin Package™.
GenBank: An annotated collection of all publicly available nucleotide sequences.
The protein sequence collection is referred to specifically as GenPept. GenBank is
maintained by the National Center for Biotechnology Information (NCBI), a unit of the
US National Institutes of Health (NIH).
SWISS-PROT: An annotated protein sequence database maintained and curated
by the Swiss Institute for Bioinformatics (SIB). The database designation is often
abbreviated as SW in GCG.
Wisconsin Package: A suite of tools and programs for bioinformatics sequence
analysis developed by GCG. It runs on various UNIX operating systems including
SUN Solaris, SGI IRIX, Compaq Tru64 UNIX, IBM AIX, and Red Hat Linux.

Appendices

1. CD Contents
2. A Collection of Useful Bioinformatic Tools and Molecular Tables
3. Simple UNIX Commands

1 Appendix
CD Contents

What is Included on the CD?

The CD that comes with this book includes:
1. All Figures and Tables with legends from the various chapters, many of which
are in color. This is an excellent source of illustrative material for presentations.
2. Several bioinformatics software packages that the readers can install on their own
computer workstations or servers.
3. Several useful basic tables and charts for understanding genome properties.
The CD is organized into folders and subfolders. Readers should be able to load
the CD into the CD drive of any IBM PC or Apple Macintosh and browse through the
folders.
The color figures can be found in the Color Figures folder, organized into
subfolders by chapter.
The software packages can be found in the Programs folder, organized into
subfolders by the name of each package. In each program subfolder, there is a ReadmeCD
file that provides further information about the software, including how to install it,
how to use it, and where up-to-date versions can be downloaded from the Web. There is
also information on licensing and registration, and on restrictions that may apply.

BioDiscovery
This folder contains software packages for microarray analysis that may be installed
on IBM-PC computers. Installation instructions are included in the file named
Readme.pdf. You will need to use the Acrobat Reader utility to read the file (see
Section “Adobe Acrobat Reader”). The BioDiscovery software was kindly provided
by Sorin Draghici, author of Chapter 35.

ClustalX
This folder contains the graphical interface versions of the Clustal multiple
sequence alignment program. Versions for both IBM-PC (clustalx1.81.msw.zip) and
Macintosh (clustalx1.81.PPC.sea.Hqx) are included. The files in the packages will
need to be unpacked with common unzipping utilities. ClustalX versions for various
flavors of UNIX are also available from the original source FTP site (see
Website: ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX/), as described in the readme file.
Permission to include ClustalX on this CD was kindly provided by Julie Thompson
and is described by Steven Thompson in Chapter 31.


Ensembl
This folder contains the files needed to install the Ensembl package on a UNIX
server. Installation instructions are located in the additional docs subfolder in the
file named EnsemblInstall100.pdf. You will need to use the Acrobat Reader utility
to read the file (see Section “Adobe Acrobat Reader”). The source code subfolder
contains the required source code for both Ensembl and Bioperl. Note that the files
in the source code folder are in UNIX format. Please use BINARY FTP mode to
transfer those files to your UNIX server. Up-to-date versions of Ensembl and Bioperl
are available at their respective Websites (see Websites: http://www.ensembl.org/
and http://www.bioperl.org). The Ensembl software was kindly provided by James
Stalker, author of Chapter 25.

MicroAnalyser
This folder contains a software package for microarray analysis that may be installed
on Macintosh computers. Up-to-date versions of the software are available (see Website:
http://imru.bham.ac.uk/MicroAnalyser/). Permission to include the MicroAnalyser
software on this CD was kindly provided by Adrian Platts.

Oligo
This folder contains demo versions of the Oligo primer design and analysis
software for both IBM-PC and Macintosh computers. This software was kindly provided
by Wojciech Rychlik, author of Chapter 21.

Sequencealign
This folder contains a PowerPoint demonstration of sequence alignment. It was
kindly contributed by David S. Wishart, author of Chapter 27.

Singh_perl_scripts
This folder contains Perl scripts for statistical analysis that were generously
contributed by Gautam Singh, author of Chapters 22 and 23. They can be used for solving
the problems described in Chapter 23.

Staden
This folder contains the Staden Sequence Analysis Package and the Gap4 Viewer
software that can be installed on an IBM-PC computer. For up-to-date versions, see
Website: http://www.mrc-lmb.cam.ac.uk/pubseq/. This software was kindly provided
by Roger Staden, author of Chapters 20 and 24.

TreeView
This folder contains the TreeView tree drawing software for both IBM-PC
and Macintosh computers. TreeView is a free program for displaying phylogenies.
Up-to-date versions, including UNIX versions, can be found (see Website:
http://taxonomy.zoology.gla.ac.uk/rod/treeview.html). Please visit the Website to register
TreeView if you wish to use it. Permission to include TreeView on this CD was
kindly provided by Roderic D. M. Page.

Adobe Acrobat Reader

In several of the folders on the CD, there are information files that may be in PDF
format. To read PDF format files, you will need to have the free Acrobat Reader utility
installed on your computer. If you do not already have Acrobat Reader installed, you can
download it (see Website: http://www.adobe.com/products/acrobat/readstep.html).

Other Sources for Bioinformatics Software

There are many sources available for downloading software that may be useful.
Here are two of our favorites: EBI FTP Server (see Website: http://www.ebi.ac.uk/FTP/)
and IUBio Archive for Biology data and software (see Website:
http://iubio.bio.indiana.edu/).

2 Appendix
A Collection of Useful Bioinformatic Tools and Molecular Tables

The Genetic Code

                               2nd Position
1st Position   U          C          A           G           3rd Position

U              UUU Phe    UCU Ser    UAU Tyr     UGU Cys     U
               UUC Phe    UCC Ser    UAC Tyr     UGC Cys     C
               UUA Leu    UCA Ser    UAA Stop    UGA Stop    A
               UUG Leu    UCG Ser    UAG Stop    UGG Trp     G

C              CUU Leu    CCU Pro    CAU His     CGU Arg     U
               CUC Leu    CCC Pro    CAC His     CGC Arg     C
               CUA Leu    CCA Pro    CAA Gln     CGA Arg     A
               CUG Leu    CCG Pro    CAG Gln     CGG Arg     G

A              AUU Ile    ACU Thr    AAU Asn     AGU Ser     U
               AUC Ile    ACC Thr    AAC Asn     AGC Ser     C
               AUA Ile    ACA Thr    AAA Lys     AGA Arg     A
               AUG Met    ACG Thr    AAG Lys     AGG Arg     G

G              GUU Val    GCU Ala    GAU Asp     GGU Gly     U
               GUC Val    GCC Ala    GAC Asp     GGC Gly     C
               GUA Val    GCA Ala    GAA Glu     GGA Gly     A
               GUG Val    GCG Ala    GAG Glu     GGG Gly     G

The codons are read as triplets in the 5' → 3' direction, i.e., left to right.
Termination codons are labeled Stop.
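The table above can be put to work in a few lines of Python. The sketch below builds the same 64-codon lookup from a compact string in U, C, A, G order and translates an RNA (or DNA) sequence in the first reading frame. The function name translate, the use of * for a stop codon, and X for an unrecognized codon are conventions chosen for this illustration.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acids for UUU, UUC, UUA, UUG, UCU, ... GGG ('*' marks Stop),
# i.e. the table read with the 1st position varying slowest and the 3rd fastest.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Translate a nucleotide sequence 5' -> 3' in reading frame 1.

    DNA input is accepted (T is read as U). Incomplete trailing codons are
    ignored, and codons containing unknown symbols are translated as 'X'.
    """
    rna = sequence.upper().replace("T", "U")
    protein = []
    for start in range(0, len(rna) - 2, 3):
        protein.append(CODON_TABLE.get(rna[start:start + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUGUUUUAA"))  # Met, Phe, Stop
```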


IUPAC Nucleotide Codes

Code Members Nucleotide

A A Adenine
C C Cytosine
G G Guanine
T T Thymine (DNA)
U U Uracil (RNA)
Y C or T(U) pYrimidine
R A or G puRine
M A or C aMino
K G or T(U) Keto
S G or C Strong interaction (3 H bonds)
W A or T(U) Weak interaction (2 H bonds)
H A or C or T(U) not-G
B G or T(U) or C not-A
V G or C or A not-T
D G or A or T(U) not-C
N G,A,C or T(U) aNy base
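One practical use of these codes is expanding an ambiguous sequence, such as a degenerate primer, into every concrete sequence it can stand for. The Python sketch below encodes the table above as a dictionary (using T throughout and treating U as T, which is a simplification for DNA work); the names IUPAC_DNA and expand_ambiguous are illustrative.

```python
from itertools import product

# IUPAC nucleotide codes from the table above, written with T for T(U).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "Y": "CT", "R": "AG", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT",
    "N": "ACGT",
}


def expand_ambiguous(sequence):
    """Return every unambiguous sequence matched by an IUPAC-coded sequence."""
    pools = [IUPAC_DNA[code] for code in sequence.upper()]
    return ["".join(bases) for bases in product(*pools)]


if __name__ == "__main__":
    print(expand_ambiguous("GARY"))
```

The number of expansions is the product of the pool sizes, so it grows quickly with each N, H, B, V, or D in the sequence.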

IUPAC Amino Acid Codes

3-Letter Code 1-Letter Code Amino Acid

Ala A Alanine
Arg R Arginine
Asn N Asparagine
Asp D Aspartic acid
Cys C Cysteine
Gln Q Glutamine
Glu E Glutamic acid
Gly G Glycine
His H Histidine
Ile I Isoleucine
Leu L Leucine
Lys K Lysine
Met M Methionine
Phe F Phenylalanine
Pro P Proline
Ser S Serine
Thr T Threonine
Trp W Tryptophan
Tyr Y Tyrosine
Val V Valine
Asx B Aspartic acid or Asparagine
Glx Z Glutamic acid or Glutamine
Xaa X Any amino acid

Converting Base Size of a Nucleic Acid → Mass of Nucleic Acid

Number of Bases                        Mass of Nucleic Acid

1 kb ds DNA (Na+)                      6.6 × 10^5 Da
1 kb ss DNA (Na+)                      3.3 × 10^5 Da
1 kb ss RNA (Na+)                      3.4 × 10^5 Da
1.52 kb ds DNA                         1 MDa ds DNA (Na+)
Average MW of a ds DNA base pair       660 Da
Average MW of a ss DNA nucleotide      330 Da
Average MW of an RNA nucleotide        340 Da
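The conversions in this table all follow from the three average molecular weights, so they are easy to reproduce. The following Python sketch uses only the averages listed above; the function and dictionary names are illustrative, and the results are approximations in the same sense as the table.

```python
# Average molecular weight per base pair (dsDNA) or per nucleotide (ssDNA, RNA), in Da.
AVERAGE_MW_DA = {"dsDNA": 660.0, "ssDNA": 330.0, "RNA": 340.0}


def nucleic_acid_mass_da(length, kind="dsDNA"):
    """Approximate mass in daltons of a nucleic acid `length` bases (or bp) long."""
    return length * AVERAGE_MW_DA[kind]


def length_for_mass(mass_da, kind="dsDNA"):
    """Approximate length in bases (or bp) of a nucleic acid of the given mass."""
    return mass_da / AVERAGE_MW_DA[kind]


if __name__ == "__main__":
    print(nucleic_acid_mass_da(1000, "dsDNA"))       # 1 kb of ds DNA, in Da
    print(length_for_mass(1_000_000, "dsDNA") / 1000)  # kb of ds DNA per MDa
```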

Converting Base Size of a Nucleic Acid → Maximum Moles of Protein

           Molecular     Amino
DNA        Weight (Da)   Acids    1 µg                                 1 nmol

270 bp     10,000        90       100 pmol or 6 × 10^13 molecules      10 µg
1.35 Kbp   50,000        450      20 pmol or 1.2 × 10^13 molecules     50 µg
2.7 Kbp    100,000       900      10 pmol or 6 × 10^12 molecules       100 µg
4.05 Kbp   150,000       1350     6.7 pmol or 4 × 10^12 molecules      150 µg

Average MW of an amino acid = 110 Da.
3 bp are required to encode 1 amino acid.
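The rows of this table can be recomputed from the two rules of thumb just stated (3 bp per amino acid, 110 Da per amino acid). The Python sketch below is a worked version of that arithmetic; the function names are illustrative, and Avogadro's number is rounded to 6.022 × 10^23. Note that the table rounds the molecular weights (for example, 90 amino acids × 110 Da = 9,900 Da is listed as 10,000 Da).

```python
AVOGADRO = 6.022e23          # molecules per mole (rounded)
AVERAGE_AA_MW_DA = 110.0     # average molecular weight of an amino acid
BP_PER_AMINO_ACID = 3


def max_protein_mw_da(dna_length_bp):
    """Largest protein (in Da) that a coding region of this length can encode."""
    amino_acids = dna_length_bp // BP_PER_AMINO_ACID
    return amino_acids * AVERAGE_AA_MW_DA


def pmol_per_microgram(mw_da):
    """Picomoles of protein in 1 µg: (1e-6 g / MW) mol, expressed in pmol."""
    return 1e6 / mw_da


def molecules_per_microgram(mw_da):
    """Number of protein molecules in 1 µg."""
    return pmol_per_microgram(mw_da) * 1e-12 * AVOGADRO


def micrograms_per_nanomole(mw_da):
    """Mass in µg of 1 nmol of protein."""
    return mw_da / 1000.0


if __name__ == "__main__":
    print(max_protein_mw_da(270))
    print(pmol_per_microgram(10_000), molecules_per_microgram(10_000))
    print(micrograms_per_nanomole(10_000))
```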

Sizes of Common Nucleic Acids

Nucleic Acid           Number of Nucleotides    Molecular Weight

lambda DNA             48,502 (dsDNA)           3.2 × 10^7
pBR322 DNA             4361 (dsDNA)             2.8 × 10^6
28S rRNA               4800                     1.6 × 10^6
23S rRNA (E. coli)     2900                     1.0 × 10^6
18S rRNA               1900                     6.5 × 10^5
16S rRNA (E. coli)     1500                     5.1 × 10^5
5S rRNA (E. coli)      120                      4.1 × 10^4
tRNA (E. coli)         75                       2.5 × 10^4

Mass of Nucleic Acid ↔ Moles of Nucleic Acid

Mass Moles

1 µg/ml of nucleic acid 3.0 µM phosphate

1 µg of a 1 kb DNA fragment 1.5 pmol; 3.0 pmol ends
0.66 µg of a 1 kb DNA fragment 1 pmol
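These equivalences follow from the average weights of 660 Da per base pair and 330 Da per nucleotide given earlier in this appendix. A minimal Python sketch of the arithmetic is shown below; the function names are illustrative, and the phosphate calculation assumes one phosphate per nucleotide.

```python
DSDNA_MW_PER_BP_DA = 660.0        # average MW of one base pair of ds DNA
MW_PER_NUCLEOTIDE_DA = 330.0      # average MW of one nucleotide


def pmol_of_dsdna(mass_ug, length_bp):
    """Picomoles of a linear ds DNA fragment: mass / (length x 660 Da)."""
    return mass_ug * 1e6 / (length_bp * DSDNA_MW_PER_BP_DA)


def pmol_of_ends(mass_ug, length_bp):
    """Picomoles of ends; each linear fragment has two."""
    return 2 * pmol_of_dsdna(mass_ug, length_bp)


def phosphate_micromolar(concentration_ug_per_ml):
    """Phosphate concentration (µM) of a nucleic acid solution.

    1 µg/ml equals 1e-3 g/l; dividing by 330 g/mol per nucleotide gives
    mol/l, which is multiplied by 1e6 to express the result in µM.
    """
    return concentration_ug_per_ml * 1000.0 / MW_PER_NUCLEOTIDE_DA


if __name__ == "__main__":
    print(pmol_of_dsdna(1.0, 1000), pmol_of_ends(1.0, 1000))
    print(pmol_of_dsdna(0.66, 1000))
    print(phosphate_micromolar(1.0))
```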

Sizes of Various Genomes

Organism Approximate Size (million bases)

Human 3000.0
M. musculus (mouse) 3000.0
Drosophila (fruit fly) 135.6
Arabidopsis (plant) 100.0
C. elegans (round worm) 97.0
S. cerevisiae (yeast) 12.1
E. coli (bacteria) 4.7
H. influenzae (bacteria) 1.8

Genomic Equivalents of Species

             Source     pg/haploid            µg quantity for       Number of
Organism     of DNA     Genome a     Avg.b    Genome Equivalence    Genomes × 10^6

Human        diploid    3.50         3.16     10.0                  2.86
Mouse        diploid    3.00         3.21     8.57                  2.86
Rat          diploid    3.00         3.68     8.57                  2.86
Bovine       haploid    3.24         3.24     9.26                  2.86
Annelid      haploid    1.45         1.45     4.14                  2.86
Drosophila   diploid    0.17         0.18     0.486                 2.86
Yeast        haploid    0.016        0.0245   0.0457                2.86

a pg/haploid genome was calculated as a function of the tissue source. Genomic equivalence
was calculated given that 10 µg of human genomic DNA contains 2.86 × 10^6 genome copies.
b Average of all values given in each tissue for that species.
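The genome-equivalence column can be reproduced from the pg/haploid genome values with two one-line conversions. The Python sketch below works through that arithmetic for a few rows of the table; the function names are illustrative.

```python
PG_PER_UG = 1e6  # picograms in one microgram


def genome_copies(mass_ug, pg_per_haploid_genome):
    """Number of haploid genome copies contained in a mass of genomic DNA."""
    return mass_ug * PG_PER_UG / pg_per_haploid_genome


def mass_for_copies_ug(copies, pg_per_haploid_genome):
    """Mass of genomic DNA (µg) that contains the given number of genome copies."""
    return copies * pg_per_haploid_genome / PG_PER_UG


if __name__ == "__main__":
    # Reference point used by the table: 10 µg of human DNA at 3.50 pg per genome.
    reference_copies = genome_copies(10.0, 3.50)
    print(reference_copies)
    # Mass of DNA from other species containing the same number of genomes.
    for organism, pg in [("Mouse", 3.00), ("Bovine", 3.24), ("Yeast", 0.016)]:
        print(organism, mass_for_copies_ug(reference_copies, pg))
```

The reference point comes out at about 2.86 × 10^6 copies, and the per-species masses agree with the genome-equivalence column after rounding.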

3 Appendix
Simple UNIX Commands

The following tables contain a brief list of simple but useful UNIX commands.¹
These commands can be used to move around the file system, examine files, and copy,
delete, or rename files. They can also be used to do housekeeping on a user’s account,
and to communicate with other users on the local system or on remote systems.

Directory Operations
Command Action

pwd present working directory (show directory name)

cd change directory: cd /path/name
cd change to your home directory: cd
mkdir make (create) new directory: mkdir Name
rmdir remove directory (if empty): rmdir Name
quota check disk space quota: quota -v

File Operations
Command Action
ls list files
cp copy files: cp /path/name newname
rm remove (i.e. delete) files: rm name
mv move or rename files: mv name newname
more page file contents (spacebar to continue): more name
cat scroll file contents: cat name
less better pager than more? (q to quit): less name
vi visual text editor (:wq to save and quit): vi name
pico pico text editor (Ctrl-X to quit): pico name
chmod change mode of file permissions: chmod xxx name

¹ Most commands have options. To see what options are available, use the man command to
open the manual pages for that command, e.g., type man ls to open the manual for the ls command.
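In the chmod entry of the File Operations table, xxx stands for three octal digits giving the permissions for the file's owner, its group, and all other users, in that order. The Python sketch below, which uses only the standard stat module, shows how such digits map onto the rwx strings displayed by ls -l; the function name describe_mode is illustrative.

```python
import stat


def describe_mode(octal_digits):
    """Convert chmod-style octal digits (e.g. '644') to an rwx string.

    Each digit is the sum of 4 (read), 2 (write), and 1 (execute) for the
    owner, the group, and others, in that order.
    """
    mode = int(octal_digits, 8)
    # stat.filemode() returns e.g. '-rw-r--r--'; drop the leading file-type character.
    return stat.filemode(stat.S_IFREG | mode)[1:]


if __name__ == "__main__":
    for digits in ("644", "755", "600"):
        print(digits, describe_mode(digits))
```

For example, chmod 644 name leaves a file readable by everyone but writable only by its owner.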


Manual Pages
Command Action

man open the man pages for a command: man command

Communications
Command Action

write write messages to another user’s screen

talk talk split-screen with another user: talk username
mail UNIX email command
pine send or read E-mail with pine mail system
telnet connect to another computer via the network
ftp file transfer over the network
lynx text-based Web browser

System Operations
Command Action
df show free disk space
du show disk usage
ps list your processes
kill kill a process: kill ###
passwd change your password
date show date and time
w who is doing what on the system
who who is connected to the system
ping ping another computer (is it alive?)
finger get information on users
exit exit, or logout, from the system

X Windows
Command Action

clock & display a clock (&: run in background)

cmdtool & command tool window
filemgr & file manager
mailtool & email program
perfmeter & system performance meter
seqlab & SeqLab interface for GCG
setenv DISPLAY for setting the DISPLAY environment variable
shelltool & shell tool window
textedit & text editor
xterm & X terminal window
