COMSATS University
Abbottabad Campus
Assignment 1
Course Title:
Bioinformatics
Submitted By:
Faisal Khan
Registration No:
SP20-BCS-061
Submitted To:
Muhammad Rizwan
Q1: How is bioinformatics related to computers?
Bioinformatics:
Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding
biological data. As an interdisciplinary field of science, bioinformatics combines computer science, statistics,
mathematics, and engineering to analyze and interpret biological data.
How bioinformatics is related to computers:
Bioinformatics is an interdisciplinary field bringing together biology, computer science, mathematics, statistics,
and information theory to analyze biological data for interpretation and prediction. One popular approach is to
develop a predictive computer model from a database of known gene sequences and use the resulting model to
predict where genes are likely to be in newly generated sequence information. Currently, bioinformatics
approaches to this problem range from statistical modeling to machine learning techniques such as artificial
neural networks, hidden Markov models, and support vector machines. The development of predictive computer
models can be accomplished in many ways. A technique that has generated significant attention for its
flexibility, ease of parallelization, and useful performance is evolutionary computation (EC). For pattern
recognition, EC can be used to optimize the parameters or structure (or both) of any type of classifier or
predictive model. EC can also be applied to problems in bioinformatics that do not necessarily involve pattern
recognition. Computer scientists who work on these problems need some additional background
on the underlying biology, beginning with an introduction to the basics of biology and bioinformatics.
Q2: Give some practical examples from the computer domain.
Domains in Computer Science
1. Computer Theory
Computer Theory, according to our lecture notes, is the branch of computer science concerned with finding out
how efficiently a given problem can be solved on a model computer using a given algorithm. Within Computer
Theory are three main focuses: Computability Theory, Complexity Theory, and Formal Languages.
Computability theory is concerned with finding out whether or not a problem is solvable with a finite number of
computations. This is an important field in computer science because it saves time by determining if a problem
is solvable. Complexity Theory deals with finding out how much time an algorithm will take to solve a problem
based on how many elements are involved with the computation. The answer is usually written in “Big O”
notation. Algorithms are broadly grouped by running time into two classes: polynomial time and exponential time.
Polynomial time is acceptable for most algorithms; exponential time is impractical for all but very small inputs.
Polynomial and exponential algorithms are discussed in more detail in the section “Tractable vs.
Intractable Problem.”
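As a rough illustration of the gap between the two classes, the sketch below compares step counts for a hypothetical quadratic algorithm (n^2 steps) and a hypothetical exponential one (2^n steps). The function names and the chosen input sizes are assumptions made for this example only.

```python
def polynomial_steps(n: int) -> int:
    """Step count of a hypothetical O(n^2) algorithm."""
    return n ** 2


def exponential_steps(n: int) -> int:
    """Step count of a hypothetical O(2^n) algorithm."""
    return 2 ** n


if __name__ == "__main__":
    # Print both step counts side by side for a few input sizes.
    for n in (10, 20, 40, 80):
        print(f"n={n:>3}  n^2={polynomial_steps(n):>6}  2^n={exponential_steps(n)}")
```

Doubling n multiplies the quadratic count by four, while it squares the exponential count, which is why exponential algorithms stop being usable so quickly.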
Formal languages are a way of talking about languages in the abstract. They are called “formal” because “...all
the rules for the language are explicitly stated in terms of what strings of symbols can occur” (1 p.7). Formal
languages are defined using set theory. There is a fundamental set containing the acceptable characters in the
language. Up from the basic set is another set containing all allowable combinations of characters, and beyond
this is a set of all allowable sentences. This continues to the largest constructs in a given language.
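The layered sets described above can be sketched with a tiny language over the DNA alphabet. The alphabet, the "word" rule (exactly three characters, like a codon), and the "sentence" rule (one or more words written back to back) are assumptions chosen for illustration; they are not taken from the cited definition.

```python
ALPHABET = frozenset("ACGT")  # fundamental set of acceptable characters


def is_word(s: str) -> bool:
    """A word is exactly three characters, all drawn from ALPHABET."""
    return len(s) == 3 and set(s) <= ALPHABET


def is_sentence(s: str) -> bool:
    """A sentence is one or more words written back to back."""
    if not s or len(s) % 3 != 0:
        return False
    return all(is_word(s[i:i + 3]) for i in range(0, len(s), 3))
```

Every rule is stated explicitly in terms of which strings of symbols may occur, which is what makes the language "formal".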
2. Algorithms
An algorithm is an explicitly described method for solving a problem. A great deal of work is done by computer
scientists to find the most efficient algorithm to solve a problem. For example, the problem of sorting a list of
integers can be solved in many ways. The most obvious one would be to compare each element to its neighbor
and swap them if they are out of the desired order. This routine, called Bubble Sort, is terribly inefficient for
all but the smallest of lists. A more appropriate algorithm for practical use is Merge Sort, which uses recursion
and the Divide and Conquer method to efficiently sort long lists. Implementations of algorithms are the most
common applications of computers. Computers excel at executing algorithms. Almost any simple task can be
defined by an algorithm. When put together, a collection of algorithms creates a program.
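The two sorting routines mentioned above can be sketched as follows. This is a minimal illustration; the function names are chosen for this example, and both functions return a new list rather than sorting in place.

```python
def bubble_sort(items: list) -> list:
    """Repeatedly swap adjacent out-of-order elements: O(n^2) comparisons."""
    result = list(items)
    for end in range(len(result) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if result[i] > result[i + 1]:
                result[i], result[i + 1] = result[i + 1], result[i]
                swapped = True
        if not swapped:  # no swaps in a full pass means the list is sorted
            break
    return result


def merge_sort(items: list) -> list:
    """Divide and conquer: sort each half recursively, then merge: O(n log n)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two tails is non-empty
    merged.extend(right[j:])
    return merged
```

Both produce the same sorted list; the difference is in how the number of comparisons grows as the list gets longer.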
There are limitations to algorithms though. Many algorithms are very rigid and only work correctly on a narrow
range of inputs. A good example of this is Dijkstra's Algorithm (a shortest path graph algorithm). According to
Weiss, this algorithm performs correctly only if there are no negative-cost edges.
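A minimal heap-based sketch of Dijkstra's Algorithm shows where the restriction on negative costs comes from. The graph representation (a dictionary mapping each node to a list of (neighbor, cost) pairs) and all names are assumptions made for this example.

```python
import heapq


def dijkstra(graph: dict, source: str) -> dict:
    """Return the distance from source to every reachable node.

    A node is treated as final the first time it is popped from the heap,
    which is only safe when no edge has a negative cost.
    """
    dist = {source: 0}
    done = set()
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        for neighbor, cost in graph.get(node, []):
            if neighbor in done:
                continue  # already finalized, never revisited
            new_d = d + cost
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(heap, (new_d, neighbor))
    return dist


if __name__ == "__main__":
    negative = {"A": [("B", 2), ("C", 5)], "B": [], "C": [("B", -4)]}
    print(dijkstra(negative, "A"))
```

On the graph in the usage example, the function reports a distance of 2 for B, although the path A to C to B costs 5 + (-4) = 1. B is finalized before the cheaper route through the negative edge is discovered.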
3. Cryptography
Cryptography is an ancient science concerned with secret writing: hiding a message in what appears to
be an unintelligible sequence of symbols that can, through some transposition or substitution algorithm, be
converted back into a meaningful message. Cryptography is still a very important discipline, perhaps more than
it ever was as a result of people and businesses storing sensitive information on remotely accessible computers.
In the modern context of computers, cryptography is the science of encrypting data. Data encryption is
necessary to keep information such as bank account numbers hidden from those with malicious intent. The
importance of cryptography has increased greatly with the ubiquity of the Internet. In fact, one of the most
common applications of cryptography is to protect data transmitted across networks. Older, more naïve
protocols such as telnet send unencrypted plain text across networks, allowing user credentials to be easily
intercepted. SSH is a secure substitute for telnet; it encrypts data before transmitting it. Encrypting data at rest
on a disk is just as important as encrypting transmitted data. If a malicious user obtains access to a disk either
physically or electronically, file system and database encryption are the last line of defense. A simple method
for encrypting data is to use the XOR bitwise operator on a key string and the message to be encrypted. The
result is an encrypted message. The beauty of the XOR operator is that if the key string is then XORed against
the encrypted data it will return the original message.
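The XOR method described above can be sketched in a few lines. The function name, the sample message, and the key are assumptions for this example; repeating a short key in this way only illustrates the XOR property and is not a secure cipher.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


if __name__ == "__main__":
    message = b"account 12345"  # made-up sample data
    key = b"secret"             # made-up key
    encrypted = xor_bytes(message, key)
    decrypted = xor_bytes(encrypted, key)  # the same call undoes the encryption
    print(encrypted)
    print(decrypted)
```

Because (m XOR k) XOR k = m for every byte, one function serves for both encryption and decryption.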
Example:
A domain name takes the form of two main elements. For example, the domain name Facebook.com consists of
the website’s name (Facebook) and the domain name extension (.com). When a company (or a person)
purchases a domain name, they’re able to specify which server the domain name points to.
Examples of domains:
bigstuff.cornell.edu
Must be registered because it is a three-part domain name ending with "cornell.edu".
www.bigstuff.cornell.edu
Can be created by the college or department that has registered bigstuff.cornell.edu. It does not need a
separate entry in the registry because it is a four-part domain name.
server3.dept.cornell.edu
Does not need its own registry entry as long as dept.cornell.edu is registered. This is the standard style of
name for an individual computer or host.
birdsource.org, sharedresearch.info, marysmith.us, etc.
Any domain name not ending with "cornell.edu" must be registered if its domain name service is provided by
Cornell's domain name servers; otherwise it must be recorded if it was purchased with university funds or
runs on a university-owned computer.
Q3: What is a phylogenetic tree? List the tools, servers, and databases used for the construction of
phylogenetic trees.
Constructing phylogenetic trees:
Many different types of data can be used to construct phylogenetic trees, including morphological data, such as
structural features, types of organs, and specific skeletal arrangements; and genetic data, such as mitochondrial
DNA sequences, ribosomal RNA genes, and any genes of interest.
These types of data are used to identify homology, which means similarity due to common ancestry. This is
simply the idea that you inherit traits from your parents, only applied on a species level: all humans have large
brains and opposable thumbs because our ancestors did; all mammals produce milk from mammary glands
because their ancestors did.
Trees are constructed on the principle of parsimony, which is the idea that the most likely pattern is the one
requiring the fewest changes. For example, it is much more likely that all mammals produce milk because they
all inherited mammary glands from a common ancestor that produced milk from mammary glands than that
multiple groups of organisms each independently evolved mammary glands.
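The parsimony principle can be made concrete by counting the minimum number of trait changes a candidate tree requires. The sketch below uses Fitch's small-parsimony counting for a single trait on a binary tree; the nested-tuple tree encoding and the trait labels "milk" and "none" are assumptions made for this example.

```python
def fitch(tree):
    """Return (possible ancestral states, minimum number of changes).

    A leaf is a string giving its observed state; an internal node is a
    pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_states, left_cost = fitch(tree[0])
    right_states, right_cost = fitch(tree[1])
    common = left_states & right_states
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: one change must occur on a branch below this node.
    return left_states | right_states, left_cost + right_cost + 1


if __name__ == "__main__":
    grouped = ((("milk", "milk"), "milk"), ("none", "none"))
    scattered = (("milk", "none"), (("milk", "none"), "milk"))
    print(fitch(grouped)[1], fitch(scattered)[1])
```

The tree that groups the milk-producing species together needs one change, while the tree that scatters them needs two, so parsimony prefers the first.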
Servers:
1. MEGA
MEGA is useful software for constructing and visualizing phylogenies, and also for data conversion. It
can easily convert alignment files to other formats such as nexus, paup, phylip, and fasta. The MEGA
tree explorer makes it easy to edit trees; subtrees can also be selected and edited separately. Some
tree image export options are also available. The input formats are newick, phylip, mega, and nexus. A
phylogenetic tree can be exported in newick format, but MEGA cannot convert it into other formats
such as phylip, which are required in other analyses such as selection analysis.
2. Dendroscope
It is helpful for visualizing large trees and provides several options for exporting their graphics via a command line.
Several different views are also available, trees can be easily re-rooted, and node labels and branches can be
easily formatted. It can export trees in newick and nexus format, although users have to register
first to use this feature.
3. FigTree
It is designed to visualize trees that are produced by the BEAST [4] program. Tip labels and node labels
can be easily edited. It can easily export trees in nexus, newick, and JSON format, with some graphics export
options such as emf, pdf, svg, png, etc.
4. Phylotree.js
It is a JavaScript-based library for visualizing and annotating trees, and it offers some other customizations. It is widely
used in Datamonkey [6] comparative analyses. A user can upload trees using Phylotree.js, select test and
reference branches, and map any changes to their position on the
corresponding structure. It is also good for comparing trees with links between leaves, known as a
tanglegram, where crossings can represent evolutionary events. It also offers several export options and other built-in
features [5].
5. ggtree
ggtree is an R package for phylogenetic tree visualization and annotation. In addition to visualizing the tree, it displays
annotation data on it. Users can annotate trees with their own data and can easily convert trees into a
data frame.
Tools and Databases:
There are several bioinformatics tools and databases that can be used for phylogenetic analysis. These include
PANTHER, P-Pod, Pfam, TreeFam, and the PhyloFacts structural phylogenomic encyclopedia.
Each of these databases uses different algorithms and draws on different sources for sequence information, and
therefore the trees estimated by PANTHER, for example, may differ significantly from those generated by
P-Pod or Pfam.
As with all bioinformatics tools of this type, it is important to test different methods, compare the results, then
determine which database works best (according to consensus results) for studies involving different types of
datasets.
Q4: What are the main parts of computer science in bioinformatics?
Computer science in bioinformatics:
Bioinformatics is a new and rapidly evolving discipline that has emerged from the fields of experimental
molecular biology and biochemistry, and from the artificial intelligence, database, and algorithms disciplines of
computer science (http://www.cs.wright.edu/cse/research/facilities-room.phtml?room=307). There are two
ways to increase research in the bioinformatics field: one is to teach computer science to biologists
and biotechnologists, and the other is to teach biology to computer scientists. I think both departments are
doing this within their capacities. It is easy to teach new computer technologies to biologists.
Bioinformatics includes biological studies that use computer programming as part of their methodology, as well
as specific analysis "pipelines" that are repeatedly used, particularly in the field of genomics. Common uses of
bioinformatics include the identification of candidate genes and single nucleotide polymorphisms (SNPs).
Often, such identification is made with the aim of better understanding the genetic basis of disease, unique
adaptations, desirable properties (esp. in agricultural species), or differences between populations. In a less
formal way, bioinformatics also tries to understand the organizational principles within nucleic acid and protein
sequences; the large-scale study of proteins is called proteomics.
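As a toy illustration of SNP identification, the sketch below compares a sample sequence with a reference of the same length and reports each position where a single base differs. The function name and both sequences are made up for this example; real pipelines work from many sequencing reads and quality scores rather than two clean strings.

```python
def find_snps(reference: str, sample: str) -> list:
    """Return (position, reference_base, sample_base) for every mismatch.

    Positions are 1-based. Both sequences must already be aligned and of
    equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


if __name__ == "__main__":
    reference = "ATGCCGTA"  # made-up reference sequence
    sample = "ATGACGTT"     # made-up sample sequence
    print(find_snps(reference, sample))
```

For the two made-up sequences, the differences are at position 4 (C in the reference, A in the sample) and position 8 (A in the reference, T in the sample).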
Q5: List the latest applications of bioinformatics.
Applications of Bioinformatics:
1. Varietal Information System
2. Plant Genetic Resources Database
3. Biometrical Analyses
4. Storage and Retrieval of Data
5. Studies on Plant Modeling
6. Pedigree Analyses
7. Preparation of Reports
8. Updating of Information
9. Diagrammatic Representations
10. Planning of Breeding Programs