Showing posts with label Anniversary. Show all posts

Saturday, February 25, 2017

Blog anniversary: 5 years


The first post was put up on this blog on Saturday, February 25 2012, which makes today the fifth anniversary.

First blog header

By my reckoning, this is the 469th blog post (not all of them written by me, of course), which makes an average of one post for every 3.9 of the 1,827 days. I have never counted the number of actual words, but if I had ever contemplated that number then I probably would never have started.
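For anyone who wants to check my reckoning, the arithmetic can be sketched in a few lines of Python. The two dates and the post count come from the text above; the variable names are simply my own choices for this illustration.

```python
from datetime import date

# Dates and post count as stated in the post.
first_post = date(2012, 2, 25)
anniversary = date(2017, 2, 25)
posts = 469

# Subtracting two dates gives a timedelta; two leap days fall in this span.
days = (anniversary - first_post).days
days_per_post = days / posts

print(days, round(days_per_post, 1))
```

The day count includes 29 February in both 2012 and 2016, which is why it is two more than 5 x 365.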

Second blog header

It is rather tricky to estimate the readership, because of the number of blog hits that clearly come from robots. However, even trying to take that into account, I get an estimate just short of 500,000 pageviews over the 5 years.

Third blog header

So, thanks to everyone for dropping by. If you ever feel inclined to re-read any of the old posts, then they are grouped roughly by topic in the "Pages" at the top of the right-hand column.

Monday, September 14, 2015

Multiple sequence alignment


Following a previous post on Multiple sequence alignment, celebration of the 20th anniversary of my first publication in the alignment field continues, with a new publication:

  • Morrison DA, Morgan MJ, Kelchner SA (2015) Molecular homology and multiple sequence alignment: an analysis of concepts and practice. Australian Systematic Botany 28: 46-62.

This paper places sequence alignment within the larger picture of detecting homologies in molecular data, emphasizing the hierarchical nature of homologies. Surprisingly, this relationship has not been emphasized before. It also points out why nucleotide alignments are a unique form of homology assessment, even within this framework. Indeed, the only genotypic data are nucleotides, since everything else is an expression of the nucleotide sequences, rather than being inherited.

The article is Open Access.


Wednesday, March 4, 2015

Multiple sequence alignment


I started actively working on phylogenetic networks more than 10 years ago, when I gave a talk at the Phylogenetic Combinatorics and Applications meeting in Uppsala in July 2004.

However, before I started working on networks I had for several years been working on multiple sequence alignment methodology, and I still do. This work is also of direct relevance to network construction, of course, since faulty alignments will generate conflicting signals that can confound the biological signals that alone should appear in the network.

This year marks the 20th anniversary of my first publication in the alignment field (see the list appended below). To celebrate this I have some review / commentary articles planned. The first of these has now appeared online, and I would like to draw it to your attention:

  • Morrison DA (2015) Is multiple sequence alignment an art or a science? Systematic Botany 40: 14-26.

This paper relates current sequence alignment procedures to homology assessments as they are practiced for other data. Most algorithms can be seen as implementing only one of the several criteria that are used to identify homologies, which is inadequate. Suggestions are made for improving this situation.

Note: the second of these papers has now also appeared.


There will also be a couple of upcoming blog posts canvassing a few issues that I see as important for the future development of alignment methods.

Previous Publications

Theory

Ellis J, Morrison DA (1995) Effects of sequence alignment on the phylogeny of Sarcocystis deduced from 18S rDNA sequences. Parasitology Research 81: 696-699.

Morrison DA, Ellis JT (1997) Effects of nucleotide sequence alignment on phylogeny estimation: a case study of 18S rDNAs of Apicomplexa. Molecular Biology and Evolution 14: 428-441. [This has been the most cited of these publications, surprising me by still getting cited about once per month]

Morrison DA (2006) Multiple sequence alignment for phylogenetic purposes. Australian Systematic Botany 19: 479-539.

Morrison DA (2009) A framework for phylogenetic sequence alignment. Plant Systematics and Evolution 282: 127-149. [This was actually accepted for publication in 2007]

Morrison DA (2009) Why would phylogeneticists ignore computerized sequence alignment? Systematic Biology 58: 150-158.

Morrison DA (2010) [Book review of] ‘Sequence Alignment: Methods, Models, Concepts, and Strategies’. Systematic Biology 59: 363-365.

Empirical examples

Mugridge NB, Morrison DA, Johnson AM, Luton K, Dubey JP, Votypka J, Tenter AM (1999) Phylogenetic relationships of the genus Frenkelia: a review of its history and new knowledge gained from comparison of large subunit ribosomal RNA gene sequences. International Journal for Parasitology 29: 957-972.

Mugridge NB, Morrison DA, Heckeroth AR, Johnson AM, Tenter AM (1999) Phylogenetic analysis based on full-length large subunit ribosomal RNA gene sequence comparison reveals that Neospora caninum is more closely related to Hammondia heydorni than to Toxoplasma gondii. International Journal for Parasitology 29: 1545-1556.

Mugridge NB, Morrison DA, Jäkel T, Heckeroth AR, Tenter AM, Johnson AM (2000) Effects of sequence alignment and structural domains of ribosomal DNA on phylogeny reconstruction for the protozoan family Sarcocystidae. Molecular Biology and Evolution 17: 1842-1853.

Beebe NW, Cooper RD, Morrison DA, Ellis JT (2000) Subset partitioning of the ribosomal DNA small subunit and its effects on the phylogeny of the Anopheles punctulatus group. Insect Molecular Biology 9: 515-520.

Beebe NW, Cooper RD, Morrison DA, Ellis JT (2000) A phylogenetic study of the Anopheles punctulatus group of malaria vectors comparing rDNA sequence alignments derived from the mitochondrial and nuclear small ribosomal subunits. Molecular Phylogenetics and Evolution 17: 430-436.

Wednesday, February 25, 2015

Three years of network blogging


Today is the third anniversary of starting this blog, and this is post number 325. Thanks to all of our visitors over the past three years — we hope that the next year will be as productive as this past one has been.

I have summarized here some of the accumulated data, in order to document at least some of the productivity.

As of this morning, there have been 238,613 pageviews, with a median of 192 per day. The blog has continued to grow in popularity, with a median of 70 pageviews per day in the first year, 189 per day in the second year, and 353 per day in the third year. The range of pageviews was 172-1148 per day during this past year. The daily pattern for the three years is shown in the first graph.

Line graph of the number of pageviews through time, up to today.
The largest values are off the graph. The green line is the half-way mark.
The inset shows the mean (blue) and standard deviation of the daily number of pageviews.
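The growth can also be put into numbers with a short Python sketch. It uses the cumulative totals reported at each of the three anniversaries (29,316 and 104,211 at the first two, and 238,613 today) together with the yearly medians quoted above; the variable names are only illustrative.

```python
# Cumulative pageviews reported at each anniversary.
cumulative = {2013: 29_316, 2014: 104_211, 2015: 238_613}
# Median daily pageviews reported for each year of the blog.
median_per_day = [70, 189, 353]

years = sorted(cumulative)
# Pageviews gained within each blog year (difference of successive totals).
per_year = [cumulative[years[0]]] + [
    cumulative[b] - cumulative[a] for a, b in zip(years, years[1:])
]
# Year-on-year growth factor of the median daily pageviews.
growth = [round(b / a, 2) for a, b in zip(median_per_day, median_per_day[1:])]

print(per_year)
print(growth)
```

On these figures, more than half of all the pageviews so far arrived during the third year.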

There are a few general patterns in the data, the most obvious one being the day of the week, as shown in the inset of the above graph. The posts have usually been on Mondays and Wednesdays, and these two days have had the greatest mean number of pageviews.

Some of the more obvious dips include times such as Christmas - New Year; and the biggest peaks are associated with mentions of particular blog posts on popular sites.

Unfortunately, the data are also seriously skewed by visits from troll sites. These have come particularly from Ukraine, which is solely responsible for the peak between days 900 and 1000. The smaller following peak represents visits from Taiwan.

The posts themselves have varied greatly in popularity, as shown in the next graph. It is actually a bit tricky to assign pageviews to particular posts, because visits to the blog's homepage are not attributed by the counter to any specific post. Since the current two posts are the ones that appear on the homepage, these posts are under-counted until they move off the homepage (after which they can be accessed only by a direct visit to their own pages, and thus always get counted). On average, 30% of the blog's pageviews are to the homepage, rather than to a specific post page, and so there is considerable under-counting.

Scatterplot of post pageviews through time, up to last week; the line is the median.
Note the log scale, and that the values are under-counted (see the text).

It is good to note that the most popular posts were scattered throughout the years. Keeping in mind the initial under-counting, the top collection of posts (with counted pageviews) have been:
Post  Title                                             Pageviews
129   The Music Genome Project is no such thing             8,347
42    Charles Darwin's unpublished tree sketches            5,271
172   The acoustics of the Sydney Opera House               5,052
10    Why do we still use trees for the dog genealogy?      3,954
181   How do we interpret a rooted haplotype network?       3,644
73    Carnival of Evolution, Number 52                      2,398
58    Who published the first phylogenetic tree?            2,077
188   Phylogenetics with SpongeBob                          2,037
146   Charles Darwin's family pedigree network              2,011
98    Faux phylogenies                                      1,951
49    Evolutionary trees: old wine in new bottles?          1,870
29    Network analysis of scotch whiskies                   1,756
8     Tattoo Monday                                         1,747

This list is not very different to the same time last year. Posts 129 (which is linked in Wikipedia) and 172 continue to receive visitors almost every day.

The audience for the blog continues to be firmly in the USA. Based on the number of pageviews, the visitor data are:
Country                Share
United States          40.3%
France                  6.8%
Ukraine [spurious]      5.1%
Germany                 5.0%
United Kingdom          4.7%
Russia                  3.1%
Canada                  1.8%
Australia               1.6%
China                   1.0%
Turkey                  0.7%

Finally, if anyone wants to contribute, then we welcome guest bloggers. This is a good forum to try out all of your half-baked ideas, in order to get some feedback, as well as to raise issues that have not yet received any discussion in the literature. If nothing else, it is a good place to be dogmatic without interference from a referee!

Tuesday, February 25, 2014

Two years of network blogging


Today is the second anniversary of starting this blog, and this is post number 222. Thanks to all of our visitors over the past two years — we hope that the next year will be as productive as this past one has been.

I have summarized here some of the accumulated data, in order to document at least some of the productivity.

As of this morning, there have been 104,211 pageviews, with a median of 129 per day. The blog has continued to grow in popularity, with a median of 70 pageviews per day in the first year and 189 per day in the second year. The range of pageviews was 69-812 per day during this past year, and 3-667 the previous year. The daily pattern for the two years is shown in the first graph.

Line graph of the number of pageviews through time, up to today.
The largest values are off the graph. The green line is the half-way mark.
The inset shows the mean (blue) and standard deviation of the daily number of pageviews.

The erratic nature of the daily variation is apparently all too typical of blogs, and there appears to be no good explanation for it. So, we might take this as a good example of the stochastic nature of the web.

There are a few general patterns in the data, the most obvious one being the day of the week, as shown in the inset of the above graph. The posts have usually been on Mondays and Wednesdays, and these two days have had the greatest mean number of pageviews.

Some of the more obvious dips include times such as Christmas - New Year; and the biggest peaks are associated with mentions of particular blog posts on popular sites. There also continue to be a few instances of "rogue" visits. These tend to be visits from sites such as Referer and Vampirestat.

The posts themselves have varied greatly in popularity, as shown in the next graph. It is actually a bit tricky to assign pageviews to particular posts, because visits to the blog's homepage are not attributed by the counter to any specific post. Since the current two posts are the ones that appear on the homepage, these posts are under-counted until they move off the homepage (after which they can be accessed only by a direct visit to their own pages, and thus always get counted). On average, 30% of the blog's pageviews are to the homepage, rather than to a specific post page, and so there is considerable under-counting.

Scatterplot of post pageviews through time, up to last week; the line is the median.
Note the log scale, and that the values are under-counted (see the text).

It is good to note that the most popular posts were scattered throughout the two years. Keeping in mind the initial under-counting, the top collection of posts (with counted pageviews) have been:
Post  Title                                             Pageviews
129   The Music Genome Project is no such thing             4,552
42    Charles Darwin's unpublished tree sketches            3,100
73    Carnival of Evolution, Number 52                      1,964
172   The acoustics of the Sydney Opera House               1,891
10    Why do we still use trees for the dog genealogy?      1,641
98    Faux phylogenies                                      1,451
58    Who published the first phylogenetic tree?            1,359
49    Evolutionary trees: old wine in new bottles?          1,352
29    Network analysis of scotch whiskies                   1,298
19    Tattoo Monday IV                                      1,247
67    Metaphors for evolutionary relationships              1,178
188   Phylogenetics with SpongeBob                          1,088
8     Tattoo Monday                                         1,051

This is quite a different list to the same time last year. Posts 129, 42 and 172 continue to receive visitors almost every day.

The audience for the blog continues to be firmly in the USA. Based on the number of pageviews, the visitor data are:
Country                Share
United States          41.1%
United Kingdom          5.6%
Germany                 4.9%
France                  3.8%
Russia                  3.3%
Canada                  2.7%
Australia               2.1%
China                   1.4%
Brazil                  1.0%
Poland                  0.8%

You will note that this list is dominated by English-speaking countries. The blog does have a link to Google Translate to help other people, but it is clear that the audience is made up almost entirely of those people who are comfortable with English (or Australian, at any rate).

Finally, if anyone wants to contribute, then we welcome guest bloggers. This is a good forum to try out all of your half-baked ideas, in order to get some feedback, as well as to raise issues that have not yet received any discussion in the literature. If nothing else, it is a good place to be dogmatic without interference from a referee!

Monday, February 25, 2013

One year of network blogging


Today is the first anniversary of starting this blog, and this is post number 120. So, a big thank you to all of our visitors over the past year. We hope that the next year will be as productive as this past one has been.

We have summarized here some of the accumulated data, in order to document at least some of the productivity.

As of this morning, there have been 29,316 pageviews, for a median of 70 per day, but with a range of 3-667 pageviews. The daily pattern for the year is shown in the first graph.

Line graph of pageviews through time, up to today.
The largest value (Day 224) is off the graph.

The erratic nature of the daily variation is apparently all too typical of blogs, and there appears to be no good explanation for it. So, we might take this as a good example of the stochastic nature of the web. Nevertheless, there are general patterns detectable. For example, the steady rise from one third of the way through the year is very gratifying, although the slight dip right at the end is less so. The recent mean pageview data are:

Period                    Mean pageviews
October – November          90
December                   130
Christmas – New Year        90
January – mid February     130
late February               90

Some of the sharp peaks in the graph were due to various identifiable events, including the email announcing the existence of the blog, the addition of the blog to the Systematic Biology homepage, the mention of the blog in some posts at the Scientopia blog, and the mention of some of the posts in the monthly Carnival of Evolution blog roundup.

The biggest peak (which goes off the graph) was due to hosting an edition of the Carnival of Evolution, which generated an extra 2,000 pageviews. There were also unexpected Twitter announcements for particular posts, including the fourth Tattoo post (which got picked up when it happened to go out on April Fool's Day) and the one on Scotch Whiskies, which is apparently a topic of widespread interest.

There are also other general patterns in the data, the most obvious one being the day of the week, as shown in the second graph. The posts have usually been on Mondays and Wednesdays, and these two days have had the greatest mean number of pageviews (84 and 90, respectively). The other weekdays have had somewhat fewer (Tuesday 82, Thursday 75, Friday 65), and the weekend even fewer (Saturday 50, Sunday 63).

Boxplot of the daily pageviews, up to last Friday.
The largest value has been excluded.
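The day-of-the-week pattern can be summarized a little further with a small Python sketch, using only the seven daily means quoted above. Averaging those means assumes that each day of the week occurred about equally often during the year, which is an approximation, and the variable names are just for illustration.

```python
# Mean daily pageviews by day of the week, as quoted in the text.
mean_by_day = {
    "Monday": 84, "Tuesday": 82, "Wednesday": 90, "Thursday": 75,
    "Friday": 65, "Saturday": 50, "Sunday": 63,
}
weekend = {"Saturday", "Sunday"}

weekday_vals = [v for d, v in mean_by_day.items() if d not in weekend]
weekend_vals = [v for d, v in mean_by_day.items() if d in weekend]

# Averaging the per-day means treats each day of the week as equally
# frequent, which is close to true over a full year.
weekday_avg = sum(weekday_vals) / len(weekday_vals)
weekend_avg = sum(weekend_vals) / len(weekend_vals)
busiest = max(mean_by_day, key=mean_by_day.get)

print(weekday_avg, weekend_avg, busiest)
```

On these numbers a typical weekday gets about 79 pageviews and a typical weekend day about 57, with Wednesday the busiest day.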

There were also a few instances of what appear to be "rogue" visits during late December and early January. These involved an almost instantaneous addition of c.100 pageviews, without obvious explanation, which presumably came from bots examining the blog. They occurred once the blog reached 100 posts, which may not be coincidental.

The posts themselves have varied greatly in popularity, as shown in the next graph. It is actually a bit tricky to assign pageviews to particular posts, because visits to the blog's homepage are not attributed by the counter to any specific post. Since the current two posts are the ones that appear on the homepage, these posts are under-counted until they move off the homepage (after which they can be accessed only by a direct visit to their own pages, and thus always get counted). On average, 33% of the blog's pageviews are to the homepage, rather than to a specific post page, and so there is considerable under-counting.

Scatterplot of post pageviews through time, up to today; the line is the median.
Note the log scale, and that the values are under-counted (see the text).

The fact that 33% of the blog's pageviews are to the homepage means that one-third of the visitors are reading the blog as the posts are posted, while two-thirds are visiting via web searches and external links. So, we do have a regular readership, as well as having itinerant visitors.
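To get a feel for the size of the under-counting, here is a rough Python sketch. It simply applies the average homepage share of 33% to the first-year total of 29,316 pageviews; treating the average share as if it held for the whole year is an assumption of the sketch, not something the counter reports directly.

```python
total_pageviews = 29_316   # first-year total quoted earlier in the post
homepage_share = 0.33      # average share of pageviews landing on the homepage

# Pageviews that the counter cannot attribute to any specific post.
homepage_views = round(total_pageviews * homepage_share)
# Pageviews that were attributed to individual post pages.
post_page_views = total_pageviews - homepage_views

print(homepage_views, post_page_views)
```

So something like 9,700 of the year's pageviews are missing from the per-post counts.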

It is good to note that the most popular posts were scattered throughout the year. Keeping in mind the under-counting, the top collection of posts (with counted pageviews) have been:
Post  Title                                             Pageviews
73    Carnival of Evolution Number 52                       1,559
42    Charles Darwin's unpublished tree sketches            1,302
19    Tattoo Monday IV                                        737
49    Evolutionary trees: old wine in new bottles?            687
10    Why do we still use trees for the dog genealogy?        666
58    Who published the first phylogenetic tree?              606
98    Faux phylogenies                                        600
26    Steven Jay Gould was wrong                              429
67    Metaphors for evolutionary relationships                420
17    Tattoo Monday III                                       415
29    Network analysis of scotch whiskies                     414
2     The first phylogenetic network (1755)                   403
35    Tattoo Monday V                                         394

This blog has two possible uses: (i) providing an outlet for commentaries and ideas by professionals; and (ii) advertising phylogenetic networks to a wider audience. It has turned out that the latter posts have appeared mostly on Mondays and the former mostly on Wednesdays. Furthermore, it seems reasonable for the former posts to have fewer pageviews, since the expected audience is much smaller (or "more select", as we prefer to see it).

There have been five main types of posts:

(i) Discussions of methodology
These are the mainstay of the blog for those who are professionally interested in phylogenetic networks. A wide range of topics have been discussed, and there is plenty more that can be said.

If anyone wants to contribute to this part of the blog, then we welcome guest bloggers. This is a good forum to try out all of your half-baked ideas, in order to get some feedback, as well as to raise issues that have not yet received any discussion in the literature. If nothing else, it is a good place to be dogmatic without interference from a referee!

As a blogger, you are very likely to get feedback from people, even if they do not leave comments on the blog itself. Professionals do not yet seem to be very used to writing blog comments, but they will send you an email.

(ii) Explanations
There are all sorts of things that seem obvious to professionals but which are obscure to non-experts. These posts are designed to redress this situation, so that there is somewhere on the web for people to go when they want to find out. They seem to have been rather popular posts.

(iii) Data analyses
The exploratory data analyses (EDA) are intended to illustrate the usefulness of networks as data summaries (as opposed to their use for strictly evolutionary analyses). In particular, choosing datasets from outside science advertises the potential uses of scientific data analysis to a wider public. Networks provide a valuable way of visualizing a table of numbers -- so, any time you see such a table you should be tempted to find out whether a network will help people to picture what it says. Most of the analyses have proved quite popular in terms of pageviews, but there has been little feedback about whether the public understands any of it.

(iv) Historical commentaries
These have usually been among the most popular posts with visitors. They simply involve bits of information that have accumulated through time, and the blog seems to be a good place to put them. They often involve phylogenetic trees, rather than networks, but that is only because trees have been used more often and thus have more history. Mind you, you have to have a good title in order to attract the public's attention!

(v) Miscellaneous
These are uncategorizable posts, which just consist of things that relate in some way to phylogenetic analysis, however peripherally. There are almost no other phylogenetics blogs on the web, and so there is no other obvious outlet for this information. The most popular of these posts have been the ones compiling the various pictures of phylogenetic tattoos that are lying around the web -- these are the most common Google search hits to the blog, along with the first compilation of Darwin's unpublished tree sketches.

Along with these posts, we have also started compiling a list of datasets that will be useful for evaluating network algorithms. Such datasets, where biologists seem to have an independently validated idea about the phylogenetic pattern, are hard to come by, and so it is worthwhile to make them available at a centralized location. A blog page is as good as anywhere else for this purpose, and the number of visits to this page is quite steady. Contributions of datasets are always welcome.

Finally, the audience for the blog has been, not unexpectedly, firmly in the USA. Based on the number of pageviews, the data are:

Country                Share
United States          37.4%
United Kingdom          6.6%
Germany                 5.3%
Russia                  4.7%
Canada                  4.0%
France                  2.7%
Australia               2.3%
New Zealand             1.7%
Netherlands             1.6%
Sweden                  1.5%

You will note that this list is dominated by English-speaking countries. The blog does have a link to Google Translate to help other people, but it is clear that the audience is made up almost entirely of those people who are comfortable with English (or Australian, at any rate).