Showing posts with label Visualization. Show all posts

Monday, December 9, 2019

The Science of Spice by S. Farrimond — in networks


It's feasting time; and any good feast tickles the tongues with flavours unknown and exotic. But not all spices go well with each other. One suggested solution is to "understand flavour connections" in order to "revolutionize your cooking", which is the subtitle of a book by Stuart Farrimond: The Science of Spice (Dorling Kindersley 2018, ISBN: 978-0-2413-0214-9).

In his book, Farrimond categorizes spices into flavour groups characterized by their major and secondary chemical compounds, such as "sweet warming phenols", "fragrant terpenes" and "pungent compounds". He presents a "periodic table of spices" covering 54 spices, and gives a four-step protocol for how to combine spices:
  • Step 1: Choose the main flavour group(s);
  • Step 2: Check the blending science (which is quite elaborate — you have to buy the book);
  • Step 3: Pick your primary spices; and
  • Step 4: Add complexity (something we strongly encourage in general here at the Genealogical World of Phylogenetic Networks).
Farrimond provides five sets of principal data across the chapters of his book, which are entitled "Spice science" (an introduction), "World of Spice" (which spices are used in which countries, including a recipe for a local spice blend), and "Spice Profiles" (a bit of history, food to spice, blending science). For the 54 spices of the periodic table, they are:
  • chemical composition;
  • geography (uses as "signature", "supporting" and "supplementary spice" in various countries);
  • general characterization, such as "sweet", "pungent", "earthy", "complex";
  • food partners;
  • flavor category.
All of this information can be visualized using our beloved Neighbor-nets. Here, we will show only two: the flavour compounds network (based on information tabulated on pp. 214–217), and a network grouping countries by similarity in spice use. For those interested in the primary data used here: tabulated data, character matrices and raw networks can be found @ figshare.

Spice compounds

Humans are, and have always been, very diverse, and so is their food; and the spices are no exception. They contain numerous flavor-active substances, and Farrimond has picked for his periodic table of spices those that cover a huge range of flavor compounds. Accordingly, the Neighbor-net is star-like, as shown here.

Neighbor-net based on absence/presence of 117 chemical compounds that put spice in spices.

For estimating (Hamming) chemical 'inter-spice' distances, I used ternary ordered characters: "0" – absence; "1" – presence; "2" – flagged as major compound. Most flavor groups are chemically diverse; Mother Nature has many means to tickle our taste buds in a certain fashion. One exception is the "citrous terpenes" flavor group, whose spices are characterized by citral as the main flavor compound (otherwise only found as an accessory compound in wattle, ginger and turmeric) accompanied by linalool (a compound found in many other spices and the main compound of coriander).
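As a minimal sketch of how such ordered ternary characters turn into a distance, the Python snippet below compares an ordered treatment (0 to 2 counts as two steps) with a plain unordered Hamming count. The three vectors are toy values, not data from the book, and dividing by the number of characters is an assumption about the normalisation.

```python
# Toy illustration only: the state vectors are invented, not taken from the book.
# States: 0 = compound absent, 1 = present, 2 = flagged as major compound.

def ordered_distance(a, b):
    """Mean absolute state difference (ordered characters: 0 -> 2 costs two steps)."""
    if len(a) != len(b):
        raise ValueError("state vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def unordered_distance(a, b):
    """Plain Hamming distance: every mismatch costs one step."""
    if len(a) != len(b):
        raise ValueError("state vectors must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

spice_a = [2, 1, 0, 0]
spice_b = [0, 1, 1, 0]
spice_c = [1, 1, 0, 0]

print(ordered_distance(spice_a, spice_b), unordered_distance(spice_a, spice_b))
print(ordered_distance(spice_a, spice_c), unordered_distance(spice_a, spice_c))
```

Under the ordered treatment, a spice that has a compound as its major component sits further from a spice lacking it than from one that merely contains it, which is the point of flagging major compounds with a third state.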

Geographic patterns

To visualize the geographic differentiation of the spices, I treated the absence/presence of each spice in the local cuisines as an ordered character:
  • "3" – a signature spice, ie. a main spice in the local cuisines;
  • "2" – supporting spice accompanying many dishes;
  • "1" – supplementary spice, ie. a spice to round off a dish or add more particularity;
  • "0" – absent, ie. not mentioned by Farrimond.
In total, the matrix covers 93 spices for 44 countries/regions. Some spices are relatively ubiquitous, and hence are not informative about geographical variation, such as chili (37 out of 44 cuisines, with 26 using it as a signature spice), garlic (25 uses as a signature spice) or ginger (19), while others are rare or geographically quite restricted. For instance, turmeric is a signature spice of Indian cuisines and also of South Africa. During the British Empire, many Indians migrated to South Africa, and Indian traditions blended in with African and European ones, which makes South Africa an interesting place to visit and feast (as I can affirm first hand).

A global network based on the spices used. Colorization refers to the continental regions used by Farrimond (chapter World of Spice, p. 20ff).

Not unexpectedly, the network shown here reflects geographic vicinity as well as rather ancient historical connections. For example, most aspects of European civilization have their origin in the Middle East, and spices reached medieval Europe via Arab sea-traders and the Silk Route; but there was also influence from elsewhere during the various colonial epochs.

The Latin American cuisines are spice-wise most similar to those of Spain and Portugal within their regional groups, while Canada and the U.S.A. mix this tradition with that of other European countries such as Italy and France. Great Britain is distinct because His/Her majesties ruled many lands with a great variety of food and spices. In contrast to many other aspects of colonialism, the influence hence goes both ways.

The most unique spice cuisines are Indonesia, the home of many spices (and the reason why both the Portuguese and the Spanish set sail), and (tropical) western and central Africa. That the Horn of Africa graphs within the South Asian group is not surprising as it was for a very long time the sea-trade spice hub between Asia and Europe.

There is also higher diversity among the Southeast Asian countries and regions than among the East Asian and South Asian ones.

A bit of an oddball is the placement of the Caribbean cuisine, and especially the Creole kitchen, which is known for its spice mixing — in Farrimond's three-concepts characterization: "Adventurous | Bold | Spicy".

Conclusion

So, in case you want to spice up the coming holiday and festive season, Farrimond's book is an invaluable source for applied science, which has a simple primary use: filling the mouth with taste while filling the belly with ballast.

Monday, April 1, 2019

The Tree of Life (April 1)


The so-called Tree of Life is actually an anastomosing plexus rather than a divaricating tree, due to extensive interconnections between the cell and genome lineages during early single-cell evolution. These connections may have been caused by the process known as horizontal gene transfer.

Furthermore, the alleged Last Universal Common Ancestor may not have been a single coherent group, but may have been a mixture of quite different genotypes. After all, this supposed ancestor does not represent the origin of life, but was itself the end-product of an extensive prior evolutionary history.

These two basic points are illustrated in the following figure.


Happy April 1. For previous posts, see:

Monday, October 15, 2018

Jumping political parties in Germany's state elections


In one of last year's posts, I showed a neighbour-net for the parties competing in the national election based on political distances inferred from the Wahl-O-Mat questionnaire (A network of political parties competing for the 2017 Bundestag). But Germany is a federal state, and since then, there has been a state election in Lower Saxony, and soon there will be two more, in Bavaria and Hesse. This is a good opportunity to make some network-based comparisons.

It is important to note that there are many political parties in Germany, not just two or three major parties, as in most English-speaking countries. State parliaments can therefore be composed of quite different mixtures of these groups.

The questionnaire

The Wahl-O-Mat is a political information service provided by the BPB, the "Bundeszentrale für politische Bildung". A group of youngsters assisted by scientists puts together a questionnaire of political theses (bullet points), which is sent out to the political parties competing in an election. When participating, as most parties do, they can either choose "agree", "disagree" or "neutral" to each statement.

As a voter, you can fill in the same questionnaire, mark some of the questions as "high importance" (these will be weighted more strongly), and then choose (up to) eight parties for your personal comparison. The result will be a bar chart, showing you the percentage of your personal overlap with each of the parties. The BPB usually provides this service for all federal and state elections.

The problem I always have with this approach is that you don't get any graphical summary information about how the parties agree or disagree with each other to start from. In the worst case scenario, you could have 75% overlap with each of two parties who disagree with each other for 50% of the bullet points!

A straightforward solution to this shortcoming is to: code the questionnaire as a ternary matrix (0 = "disagree", 1 = "neutral", 2 = "agree"), treat them as ordered characters and determine the mean pairwise (Hamming) distances, and then infer a Neighbor-net based on the resulting distance matrix.

This is shown in the first figure, where each labeled point is one of the political parties. The two political extremes are also labeled.

The neighbour-net for the 2017 federal election Wahl-O-Mat questionnaire (original GWoN post from last year, for those interested in further comments, extrapolations, and infographics, see related posts on my Res.I.P blog). The red split denotes the outgoing and new coalition parties (Merkel's "centre-right" CDU/CSU + "centre-left" SPD, the social-democrats), the blue split the most natural minor coalition partner for the CDU/CSU since the Kohl era, the "centrist" liberals (FDP). For the yellow split, see here (in German, but there is a Google translate button).
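The coding and distance steps described above can be sketched in a few lines of Python. The example also reproduces the worst case mentioned earlier: a voter who overlaps 75% with each of two parties that disagree with each other on half of the theses. The party names and answers are hypothetical, and `overlap` is a simplified stand-in (fraction of identical answers), not the Wahl-O-Mat's actual scoring.

```python
from itertools import combinations

# Ternary coding used in the post: 0 = "disagree", 1 = "neutral", 2 = "agree".
CODES = {"disagree": 0, "neutral": 1, "agree": 2}

def encode(answers):
    """Turn a list of answer words into ordered character states."""
    return [CODES[a] for a in answers]

def overlap(a, b):
    """Simplified overlap: fraction of theses answered identically."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def mean_ordered_distance(a, b):
    """Mean pairwise distance, treating the three states as ordered."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def distance_matrix(rows):
    """Distances for every pair of rows, keyed by (name, name)."""
    return {(p, q): mean_ordered_distance(rows[p], rows[q])
            for p, q in combinations(rows, 2)}

# Hypothetical answers to four theses.
rows = {
    "Voter": encode(["agree", "agree", "agree", "agree"]),
    "Party X": encode(["agree", "agree", "agree", "disagree"]),
    "Party Y": encode(["agree", "agree", "disagree", "agree"]),
}

dist = distance_matrix(rows)
for pair, d in dist.items():
    print(pair, d)
```

The bar chart would show the same 75% for both parties, while the distance matrix makes it visible that the two parties are twice as far from each other as either is from the voter. The resulting matrix is what a Neighbor-net program would take as input.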

Political Compasses (for orientation)

Another graphical approach is to use a "political compass" instead. The original can be found at The Political Compass. Parties or persons are placed along two absolute (in the case of the original) axes: an economic left-right x-axis and a social authoritarian-libertarian (in the classic, not US sense, i.e. socially liberal) y-axis. (I encourage everyone to do the test for themselves. I was not surprised to see where I stand in the compass, but others have been. But first do the test, before browsing The Political Compass' highly interesting pages.)

Here's what this looks like for the main German parties (currently six) that also got seats in the newly formed Bundestag, with some orientation points: (in)famous historical figures and the presidential run-offs in the U.S. (most of this blog's readers sit in the U.S.) and France (because I live there, but can't vote).


Overlay of several Political Compass assessments regarding the last major elections in Germany, France and the U.S. Grey dots, (in)famous figures that shaped the modern world; the main German parties are in full colours (all on the economic right, except for the Left Party, Die Linke, which is where social-democrats were in the 70s, when the European model of welfare states was fully implemented). The position of U.S. (both right-authoritarian) and French (relaxed choice between Hitler, fascism, and Friedman, neo-liberalism) presidential run-offs is provided for comparison.


In Lower Saxony, the "Niedersächsische Landeszentrale für politische Bildung", the state's analog of the BPB, hired a Dutch company to provide a compass ("Wahlkompass") linked to the Wahl-O-Mat questionnaire for the 2017 state parliament election.

After filling in the questionnaire, you would be placed in the relative compass, too. Note that (possibly to avoid giving due credit to The Political Compass) the y-axis has been flipped and modified to "progressiv" (progressive) and "konservativ" (conservative). Another reason may be that classifying parties as authoritarian is a bit tricky for a state-funded German institution for historical reasons.

The red marker indicates an all-neutral voter. The placement is a relative one, hence no grid.

The relative positions of the liberals (FDP), the right-wing populists of the "Alternative for Germany" AfD (blue symbol at the bottom), the CDU, SPD and Left Party (Linke) all agree with The Political Compass' assessment of their federal-level counterparts. However, the Green Party is placed much closer to the Left Party on the social y-axis. This has two possible reasons:
  1. The Political Compass bases its assessment on party programs and actual government politics, and the Greens are part of quite a few state governments, and are the major ruling party in Baden-Württemberg, Germany's economically strongest state.
  2. There can be a difference between progressive and libertarian. The Greens are progressive by supporting e.g. equal rights for women or LGBT and other aspects of modern society, but aim to achieve these goals by imposing legislation, which is authoritarian. On the other hand, conservatism – keeping the status-quo – is mutually linked to authoritarian politics. Any social movement will change society, or challenge the status-quo, and hence needs to be constrained or suppressed.
Another difference from the Wahl-O-Mat is that – similar to the questionnaire of The Political Compass – the Lower Saxony Wahlkompass allows six possible answers to each bullet point: "totally disagree" (which I scored as "0"), "disagree" (1), "neutral" (2), "agree" (3), "totally agree" (4), and "No opinion" (?). The latter is quite useful, and would be a useful add-on also to the Wahl-O-Mat, because there is a difference between being neutral on a matter (could live with it) and having no opinion on it (don't care). The more refined scale also allows us to treat the answers as ordered multistate characters when inferring the distance matrix, resulting in a more resolved network.

This is shown in the next figure.

Neighbour-net based on the Niedersachsen Wahlkompass questionnaire (full post, in German).
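The five-state coding with a separate "No opinion" state described above could be handled as in the following sketch. It assumes that a thesis is skipped for a pair of parties whenever either of them has missing data ("?"), and that the distance is averaged over the remaining theses; the answer strings are invented for illustration.

```python
# States "0".."4" run from "totally disagree" to "totally agree"; "?" = "No opinion".

def parse(answer_string):
    """Convert a string such as '04?24' into a list of ints, with None for '?'."""
    return [None if ch == "?" else int(ch) for ch in answer_string]

def mean_distance(a, b):
    """Mean ordered distance over theses where neither party has missing data."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no thesis answered by both parties")
    return sum(abs(x - y) for x, y in pairs) / len(pairs)

# Two hypothetical parties opposing each other on every thesis: distance 4.
print(mean_distance(parse("0404"), parse("4040")))

# With missing data, only three of the five theses are comparable here.
print(mean_distance(parse("04?24"), parse("40?2?")))
```

Treating "?" as missing rather than as a sixth state keeps the scale ordered, so "neutral" stays halfway between the extremes while "No opinion" simply contributes nothing to the distance.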

As you can see, the political-distance-based Neighbor-net splits graph captures the similarity of the political parties to each other quite well. Now the only thing left to do is to add yourself (as a voter or interested third party) to the matrix and then re-infer the Neighbor-net. The basic files to do so (NEXUS-formatted matrices) for this, upcoming (Bavaria, Hesse), and future elections can be found on figshare.
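Adding yourself to the matrix amounts to appending one row of answers and writing the matrix out again. The sketch below builds a generic NEXUS DATA block with standard-type characters; the taxon names and answer strings are hypothetical, and the exact layout of the figshare files is not known here, so the output may need adjusting to match them (for example, declaring the characters as ordered in whatever way your analysis program expects).

```python
def to_nexus(matrix, symbols="01234", missing="?"):
    """Format a {taxon: answer_string} dict as a NEXUS DATA block.

    Taxon names must not contain spaces (use underscores instead).
    """
    names = list(matrix)
    nchar = len(matrix[names[0]])
    if any(len(matrix[n]) != nchar for n in names):
        raise ValueError("all rows must have the same number of characters")
    width = max(len(n) for n in names)
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(names)} NCHAR={nchar};",
        f'  FORMAT DATATYPE=STANDARD SYMBOLS="{symbols}" MISSING={missing};',
        "  MATRIX",
    ]
    for name in names:
        lines.append(f"  {name.ljust(width)}  {matrix[name]}")
    lines += ["  ;", "END;"]
    return "\n".join(lines)

# Hypothetical party answers, plus the voter added as an extra taxon.
matrix = {"Party_A": "04312", "Party_B": "40?21"}
matrix["Voter"] = "14302"

nexus_text = to_nexus(matrix)
print(nexus_text)
```

The printed text can be saved to a file and opened in a program that reads NEXUS character matrices, after which the distances and the Neighbor-net are re-inferred with the voter included.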

Comparing different elections

As a federal state, Germany has a long tradition of within-party diversity. Best known is that the "Schwesterparteien" (sister parties) CSU and CDU disagree on quite a few points. The CSU is endemic to Bavaria, while the CDU covers the rest of Germany, including the former East Germany (see also my post [in English] on German and French party genealogies after World War II). Hence, they are treated separately by The Political Compass for the 2017 election. The CSU is in general (much) less neo-liberal than the CDU (placed left of it), but (often) more authoritarian, cultivating conservative views. But neither is the CDU a homogeneous formation when compared from state to state, nor are any of the other parties. The following splits graphs, based on the various Wahl-O-Mat questionnaires, illustrate this quite well.

Let's start with the upcoming state elections in Bavaria and Hesse. Here are the two Neighbor-nets.

Reduced Neighbour-nets for Bavaria and Hesse. Parties competing only in one of the states not included.

We note that some parties keep their position relative to each other. For example, the most severe political antagonists in both states are the Left P. (left-libertarian) and the LKR (distinctly right-authoritarian; political distance PD > 1.5).

The latter is a small party collecting the original founder(s) of the AfD. The AfD is usually described as a (far-)right populist party, but started as a Euro-sceptic conservative and distinctly neo-liberal party. This is well captured in the splits graphs, with the LKR placed either as sister to the Bavarian (less neo-liberal) CSU or at a box connecting the (less authoritarian) CDU with the (more left) AfD. Other small parties (Humanist Party, the animal-rights party P!MUT, and the ÖDP, a conservative-green party) are equally stable.

The "right" is more tree-like in Bavaria than in Hesse because the so far all-ruling CSU tries (tried) to follow an old maxim of Franz-Josef Strauß, who said that there should never be a political party right (i.e. more conservative and nationalist) of the CSU in the Bavarian parliament — hence, it is much more similar to the right-wing populist AfD than the Hesse CDU.

In Hesse, the CDU ruled the state for the last four years with the Greens, which explains the position of the Green Party in both graphs. Being the opposition, and strongly opposing CSU policies (both economically and socially), they are much closer to the Left Party in Bavaria, while occupying a position between their coalition partner CDU and the "left" parties (Left P., SPD) in Hesse.

In Hesse, the Green Party effectively takes the position that in Bavaria is filled by the Pirate Party; the latter had a surge a couple of years ago, entering several state parliaments, but is now back to 2% or less. With the Greens moving right, the Pirate Party Hesse remains more similar to the classical "left" of the political spectrum.

Another jumper is "Die PARTEI". This is hardly surprising, because they answer some questions in the Wahl-O-Mat by flipping a coin, or select the one allowing them to come up with most satiric arguments for their choice (sometimes not so different from those of certain party policies!).

Compared to the last federal election, the federal-state discrepancy in official party policies is striking, and this is well represented in their answers to the Wahl-O-Mat questionnaires.

Same-scaled, taxon-pruned Neighbour-nets. The "Big-6" (7 in Bavaria), parties either already sitting in the parliaments or with a chance to crack the 5%-hurdle in upcoming elections, in bold. Arrows indicate current ruling coalitions/government parties.

Being a frequent junior partner of the CDU/CSU, but the opposition in Bavaria (for decades) and Hesse (once the dominant party), the federal SPD is drawn much more to the "right" than its state counterparts. But this holds also for the federal CDU in the opposite way, and hence the FDP becomes the closest (still distant) "relative" of the AfD, which campaigned in 2017 with a more neo-liberal program than it does now in Bavaria and Hesse (a necessity for populist parties, as everyone likes free stuff).

The "blue-green" ÖDP comes closer to the Greens, because ecology-related bullet points took a more prominent place in the federal election Wahl-O-Mat. The "net-gap" in between them, and the edges shared by the ÖDP with the AfD or other parties of the "right" (FW, CDU/CSU, FDP), highlight their differences in social policies.

In Lower Saxony fewer parties competed, so let's prune the taxon set further. The Lower Saxony Neighbor-net has a different scale, because a more differentiated answer was possible. If two parties opposed each other on all points, the distance between them would reach the theoretical maximum for the Lower Saxony matrix, which is 4: they would strongly disagree on all bullet points that have no missing data for either one.

Again, parties in (last year's elections) or with chances to enter parliament (upcoming) in bold, and arrows indicating current or leaving government parties/coalitions.

Note how the Green Party and the SPD are placed with respect to the third main party from the traditional "left", the Left Party, and the FDP in comparison to CDU/CSU and AfD, forming the parliamentary "right". In Lower Saxony, the largest (SPD) and second-largest (CDU) party followed the example of the Bund. The outgoing SPD-Greens coalition lost its tight majority; and although a CDU-FDP-AfD coalition would have had a majority and quite an overlap, involving the AfD in governments has been considered a no-go in Germany to this point (for all involved parties for different reasons).

Also in the Bundestag, the "right" would have a majority, but the SPD is close enough, and obviously Merkel's preferred partner. The polls for the Sunday elections (yesterday, when you read this) predict the CSU will lose its absolute majority. Also, here the natural partner (AfD) will be a no-go, so Bavaria will head towards interesting coalition talks with the Greens, being second in the polls. This would be the first time since 1958. The black-green Hesse government is also likely to lose its majority. However, adding the FDP (called "Jamaica coalition", because of the traditional colors of the three parties) should be no great deal, given its position between the current coalition partners.

Links

The post introducing Neighbor-nets to explore Wahl-O-Mat questionnaires can be found here.

More infographics (including plots of each bullet point on the splits graphs) revolving around political distances expressed in election questionnaires, or politics in general, can be found in my Res.I.P. posts — flagged as "Bundestagswahl" (federal elections, in German or English), "Landtagswahlen" (usually in German), "phylo-networks" (usually in English) and "politics" (again mixed).


Related data are included in a figshare fileset (open data; CC-BY licence), which may get updated when another election happens.

Tuesday, September 26, 2017

Some desiderata for using splits graphs for exploratory data analysis


This is the 500th post from this blog, making it one of the longest-running blogs in phylogenetics, if not the longest. For example, among the phylogenetics blogs that I have previously listed, there has been only one post so far this year that has not been about a specific computer program.

Our first blog post was on Saturday 25 February 2012; and most weeks since then have had one or two posts. We have covered a lot of ground during that time, focusing on the use of network graphs for phylogenetic data, broadly defined (ie. including biology, linguistics, and stemmatology). However, we have not been averse to applying what are known as "phylogenetic networks" to other data, as well; and to discussing phylogenetic trees, when appropriate.


For this 500th post, I thought that I should focus on what seems to me to be one of the least appreciated aspects of biology: the need to look at data before formally analyzing it.

Phylogeneticists, for example, have a tendency to rush into some specified form of phylogenetic analysis, without first considering whether that analysis is actually suitable for the data at hand. It is therefore wise to investigate the nature of the data first, before formal analysis, using what is known as exploratory data analysis (EDA).

EDA involves getting a picture of the data, literally. That picture should be clear, as well as informative. That is, it should highlight some particular characteristics of the data, whatever they may be. Different EDA tools are likely to reveal different characteristics; there is no single tool that does it all. That is why it is called "exploration", because you need to have a look around the data using different tools.

This is where splits graphs come into play, perhaps the most important tool developed for phylogenetics over the past 50 years.

Splits graphs

Splits graphs are the best current tools for visualizing phylogenetic data. They were developed back in 1992, by Hans-Jürgen Bandelt & Andreas Dress. These graphs had a checkered career for the first 15 years, or so, but they have become increasingly popular over the past 10 years.

It is important to note that splits graphs are not intended to represent phylogenetic histories, in the sense of showing the historical connections between ancestors and descendants. This does not mean that there is any reason why they should not do so, but it is not their intended purpose. Their purpose is to display phenetic data patterns efficiently. In this sense, calling them "phylogenetic networks" may be somewhat misleading; they are data-display networks, not evolutionary networks.

A split is simply a partitioning of a group of objects into two mutually exclusive subgroups (a bipartition). In biology, these objects can be individuals, populations, species, or even higher taxonomic groups (OTUs); and in the social sciences, they might be languages or language groups, or they could be written texts, or verbal tales, or tools or any other human artifacts. Any collection of objects will contain a set of such splits, either explicitly (eg. based on character data) or implicitly (eg. based on inter-object distances). A splits graph simultaneously displays some subset of the splits.
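To make the definition concrete, here is a minimal Python sketch that extracts the splits explicitly supported by binary characters and checks whether two splits are compatible, ie. whether they could both appear in one tree. The taxa and character columns are invented; compatibility uses the standard criterion that at least one of the four pairwise intersections of the sides is empty.

```python
def make_split(taxa, side):
    """A split is an unordered pair of complementary, non-empty taxon sets."""
    side = frozenset(side)
    other = frozenset(taxa) - side
    if not side or not other:
        raise ValueError("both sides of a split must be non-empty")
    return frozenset([side, other])

def splits_from_characters(taxa, columns):
    """Collect the bipartitions induced by characters with exactly two states."""
    splits = set()
    for col in columns:
        if len(set(col)) != 2:
            continue  # constant or multistate characters define no single split
        first = col[0]
        splits.add(make_split(taxa, [t for t, s in zip(taxa, col) if s == first]))
    return splits

def is_trivial(split):
    """Trivial splits separate a single taxon from all the others."""
    return any(len(side) == 1 for side in split)

def compatible(s1, s2):
    """Two splits are compatible if some pair of their sides does not overlap."""
    return any(not (a & b) for a in s1 for b in s2)

taxa = ["A", "B", "C", "D"]
columns = ["0011", "0101", "0111"]  # one string per character, one state per taxon

ab_cd = make_split(taxa, "AB")
ac_bd = make_split(taxa, "AC")
a_bcd = make_split(taxa, "A")

found = splits_from_characters(taxa, columns)
print(len(found), compatible(ab_cd, ac_bd), compatible(ab_cd, a_bcd))
```

The incompatible pair AB|CD and AC|BD is exactly the kind of conflict that a tree cannot show and that a splits graph draws as a box.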

Ideally, a splits graph would display all of the splits; but for realistic biological data this is not likely to happen — the graph would simply be too complex for interpretation. So, a series of graphing algorithms have been developed that will display different subsets of the splits. That is, splits graphs actually form a family of closely related graphs. Technically, the Median Network is the only graph type that tries to display all of the splits; however, the result will usually be too complicated to be useful for EDA.

So, these days there is a range of splits-graph methods available for character-based data (such as Median Networks and Parsimony Splits), distance-based data (such as NeighborNet and Split Decomposition), and tree-based data (such as Consensus Networks and SuperNetworks). In population genetics, haplotype networks can be produced by methods that conceptually modify Median Networks (such as Reduced Median Networks and Median-Joining Networks).

The purpose of this post, however, is not to discuss all of the types of splits graphs, but to consider what computer tools we would need in order to successfully use this family of graphs for EDA in phylogenetics.


Desiderata

The basic idea of EDA is to have a picture of the data. So, any computer program for EDA in phylogenetics needs to be able to quickly and easily produce the splits graph, and then allow us to explore and manipulate it interactively.

To do this, the features listed below are the ones that I consider to be most helpful for EDA (and thanks to Guido Grimm and Scot Kelchner for making some of the suggestions). It would be great to have a computer program that implements all of these features, but this does not yet exist. SplitsTree has some of them, making it the current program of choice. However, there is quite some way to go before a truly suitable program could exist.

Note that these desiderata fall into several groups:
  1. evaluating the network itself
  2. comparing the network to other possible representations of the data
  3. manipulating the presentation of the network
It is desirable to be able to interactively:
  • specify which supported splits are shown in the graph — eg. show only those explicitly supported by characters
  • list the split-support values
  • highlight particular splits in the graph — eg. by clicking on one of the edges
  • identify splits for specified taxon partitions (if the split is supported) — this is the complement to the previous one, in which we specify the split from a list of objects, not from the graph itself
  • identify which splits are sensitive to the model used — eg. different network algorithms
  • identify which edges are missing when comparing a planar graph with an n-dimensional one — this would potentially be complex if one compares, say, a NeighborNet to a Median Network
  • map support values onto the graph (ie. other than split support, which is usually the edge length) — eg. bootstrap values
  • evaluate the tree-likeness of the network — ie. the extent of reticulation needed to display the data
  • map edges from other networks or trees onto the graph — this allows us to compare graphs, or to superimpose a specified tree onto the network
  • find out if the network is tree-based, by breaking it down into a defined number of trees — along with a measure of how comprehensively these trees capture the network
  • create a tree-based network by having the network be the super-set of some specified tree — eg. the NeighborNet graph could be a superset of the Neighbor-Joining tree
  • manipulate the presentation of the graph — eg. orientation, colours, fonts, etc
  • remove trivial splits — eg. those with edges shorter than some specified minimum, assuming that edge length represents split support
  • plot characters onto the graph — possibly next to the object labels, but preferably on the edges if they are associated with particular partitions
  • examine which subsets of the data are responsible for the reticulations — eg. for character-based inputs this might be a sliding window that updates the network for each region of an alignment, or for tree-based inputs it might be a tree inclusion-exclusion list.
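Two of the simpler items in this list can be sketched in Python: dropping splits whose weight falls below a chosen minimum, and generating the column ranges for a sliding window over an alignment. This is only an illustration of the ideas, not how SplitsTree or any other program implements them; the split labels, weights and window settings are made up.

```python
def filter_splits(weighted_splits, min_weight):
    """Keep only splits whose weight (edge length) reaches the minimum."""
    return {s: w for s, w in weighted_splits.items() if w >= min_weight}

def sliding_windows(n_columns, size, step):
    """Yield (start, end) column ranges; end is exclusive."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    for start in range(0, n_columns - size + 1, step):
        yield (start, start + size)

# Hypothetical split weights, keyed by a readable label.
weights = {"AB|CD": 0.31, "AC|BD": 0.02, "A|BCD": 0.12}

kept = filter_splits(weights, min_weight=0.05)
windows = list(sliding_windows(n_columns=10, size=4, step=3))

print(sorted(kept))
print(windows)
```

In an interactive tool, each window would be passed to the network inference in turn, so that one can watch which alignment regions make a given box appear or vanish.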
Other relevant posts

Here are some other blog posts that discuss the use of splits graphs for exploring genealogical data.

How to interpret splits graphs

Recognizing groups in splits graphs

Splits and neighborhoods in splits graphs

Mis-interpreting splits graphs

Tuesday, July 12, 2016

Coal — trees and networks of knowledge


The Tree of Knowledge is a well-known concept, and the tree can indeed be used to arrange information. One possible use is to describe the relationships of derivative products (ie. the chemical derivatives of other substances). Indeed, these can be viewed as having a "phylogeny", since the processing follows a time sequence.

The U.S. Geological Survey (in the U.S. Department of the Interior) has provided one such example in Geological Survey Circular 1143 Coal — a Complex Natural Resource. The centerfold of that publication shows:
Coal byproducts in tree form showing basic chemicals as branches and derivative substances as twigs and leaves. [Modified from an undated public domain illustration provided by the Virginia Surface Mining and Reclamation Association.]

However, a tree is a simplification of a network, and the network can thus show more information. In this case, the same information has previously been illustrated using a reticulating network, not a tree.

In the 7th edition (1924) of Joseph Meyer's Große Conversations-lexikon für gebildete Stände (first edition 1840-1855) there is a Steinkohle: Stammbaum der Steinkohlenerzeugnisse [Coal: family tree of coal products]:


This has three reticulations, showing coal products produced as a result of combining two different processing routes. This is thus a hybridization network.

Thanks to the Trees of Knowledge page (by Paul Michel) of the "Encyclopedias as Indicators of Change in the Social Importance of Knowledge, Education and Information" web site, for pointing out this unexpected use of trees of knowledge.

Wednesday, October 7, 2015

The Wave Theory: the predecessor of network thinking in historical linguistics


Dendrophilia

It has been mentioned in a couple of previous blogposts that tree-thinking started rather early in historical linguistics (Morrison 07/2013 and Morrison 11/2012).

Although he was not the first to draw language trees, it was August Schleicher (1821-1866) who made tree-thinking quite popular in linguistics with his two papers published in 1853 (1853a and 1853b). Note that there was no notable influence by Darwin here. It is more likely that Schleicher was influenced by stemmatics (manuscript comparison, Hoenigswald 1963: 8); and even today, historical linguistics has certain features that resemble manuscript comparison much more closely than evolutionary biology. It seems that Schleicher's enthusiasm for the drawing of language trees had quite an impact on Ernst Haeckel (1834-1919), since – as Schleicher pointed out himself (Schleicher 1863) – linguistic trees by then were concrete and not abstract like the one Darwin showed in his Origin (Darwin 1859).


Dendrophobia

Schleicher's tree-thinking, however, did not last very long in the world of historical linguistics. By the beginning of the 1870s Hugo Schuchardt (1842-1927) and Johannes Schmidt (1843-1901) published critical views, claiming that vertical descent is not all that language evolution is about (Schmidt 1872, Schuchardt 1870). Schuchardt was (at least in my opinion) really concrete and observant in his criticisms, especially pointing to the problem of borrowing between very closely related languages, which might deeply confuse the phylogenetic signal:
We connect the branches and twigs of the family tree with countless horizontal lines and it ceases to be a tree. (Schuchardt 1870: 11, my translation)
While Schuchardt's observations were based on his deep knowledge of the Romance languages, Schmidt drew his conclusions from a thorough investigation of shared homologous words in the major branches of Indo-European. What he found here were patterns of words that were in a strongly patchy distribution, with many gaps in certain languages and only a few patterns (if any) that could be found in all languages. One seemingly surprising fact was, for example, that Greek and Sanskrit shared about 39% of homologs (according to Schmidt's count, see Geisler and List 2013), Greek and Latin shared 53%, but Latin and Sanskrit only 8%. Assuming that Greek and Latin had a common ancestor, Schmidt found it very difficult to explain how the similarities of these two languages to Sanskrit could be so different (Schmidt 1872: 24). Furthermore, this pattern of patchy distributions seemed to be repeated in all branches of Indo-European that Schmidt compared in his investigation. Schmidt thus concluded:
No matter how we look at it, as long as we stick to the assumption that today's languages originated from their common proto-language via multiple furcation, we will never be able to explain all facts in a scientifically adequate way. (Schmidt 1872: 17, my translation).
Unfortunately, Schmidt did not stop with this conclusion but proposed another model of language divergence instead of the family tree model:
I want to replace [the tree] by the image of a wave that spreads out from the center in concentric circles becoming weaker and weaker the farther they get away from the center. (Schmidt 1872: 27, my translation)
Ever since then, this new model, the so-called wave theory (Wellentheorie in German), has lurked in textbooks on historical linguistics, confusing especially those who are not primarily trained in the field. What is the wave theory in the end? How could it replace the tree? While Schmidt did not give a visualization in his book from 1872, he gave one three years later (Schmidt 1875: 199):


What we can see from this figure is that we can't see anything: It displays languages in a pie-chart diagram in a quasi-geographic space. No information regarding ancestral states of the languages is given, and no temporal dynamics are shown. I find Schmidt's descriptions of the wave theory hard to understand in their core. He doesn't seem to ignore that evolution has a time dimension, but he seems to deliberately neglect it when drawing his waves.

Other scholars, like Hirt (1905), Bloomfield (1933), Meillet (1908), or Bonfante (1931), proposed similar and alternative ways to visualize Schmidt's wave, as shown in the image below. In contrast to the language trees which – after Schleicher's initial rather "realistic" tree drawings – quickly began to be schematized in historical linguistics, the correct way to draw a wave has remained a mystery to this day.


Problems with Waves and Trees

When reading Schmidt's book from 1872 and also inspecting his data, certain fallacies in his argumentation become obvious. Firstly, he claims that the low amount of shared homologs between Sanskrit and Latin would be a problem for a family tree theory — however, this is of course no problem, as long as we do not assume that the loss of words follows an evolutionary clock. Furthermore, Schmidt underestimated the epistemological aspect of our knowledge. When comparing the three languages in alternative counts of more recent etymological databases (see Geisler and List 2013 for details), the scores change rapidly, with Latin and Greek sharing 40%, Greek and Sanskrit sharing 39% and Latin and Sanskrit sharing (already) 21%. Although no complete account of Schmidt's data is available in digital form, I think we can assume that the data that forced Schmidt to assume that there is no tree behind the Indo-European languages would not scare off an evolutionary dendrophilist. Whether the tree that the different phylogenetic frameworks would present us from Schmidt's data is a tree corresponding to any reality of Indo-European language formation is another question, but the data may well be quite tree-like, despite what Schmidt saw in it.
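The point about the clock can be checked with a little arithmetic. As a rough sketch, assume (this is my illustration, not Schmidt's method) that one minus the shared proportion of homologs can be treated as a distance. For three languages, such distances fit an unrooted tree whenever the three branch lengths come out non-negative; a strict clock would additionally require the two largest distances to be equal.

```python
# Sketch only: treating (1 - shared proportion) as a distance is an assumption
# made for illustration, not something Schmidt (1872) did.
shared = {
    ("Greek", "Sanskrit"): 0.39,
    ("Greek", "Latin"): 0.53,
    ("Latin", "Sanskrit"): 0.08,
}

def dist(a, b):
    """Distance between two languages as 1 - shared proportion."""
    key = (a, b) if (a, b) in shared else (b, a)
    return 1.0 - shared[key]

def branch_lengths(x, y, z):
    """Branch lengths of the unrooted tree joining three taxa."""
    dxy, dxz, dyz = dist(x, y), dist(x, z), dist(y, z)
    return {
        x: round((dxy + dxz - dyz) / 2, 3),
        y: round((dxy + dyz - dxz) / 2, 3),
        z: round((dxz + dyz - dxy) / 2, 3),
    }

lengths = branch_lengths("Greek", "Latin", "Sanskrit")
fits_tree = all(v >= 0 for v in lengths.values())

# Under a strict clock the two largest pairwise distances would be equal.
d = sorted(dist(a, b) for a, b in shared)
clock_like = abs(d[-1] - d[-2]) < 1e-9

print(lengths, fits_tree, clock_like)
```

With Schmidt's counts the branch lengths come out as roughly 0.08 (Greek), 0.39 (Latin) and 0.53 (Sanskrit). All are non-negative, so the counts are compatible with a tree, just not with a clock-like one.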

A further problem of the wave theory is that people contrast it with the family tree model. This does not seem to be justified, since, as we can see from the visualizations shown above, the wave theory ignores the temporal dimension of divergence and convergence. In this sense, it is a pure data display model, similar to a data-display network (Morrison 2011: 5-9) to which some geographical information has been added. As long as the wave theory shows only similarities between taxonomic units based on some kind of underlying data, it is neither a "theory" nor a hypothesis. It is no opponent of the family tree, since it serves a completely different purpose.

What Schuchardt already mentioned, and what Schmidt might have been looking for, was the idea of phylogenetic networks: if we cannot ignore the fact that languages exchange material laterally as well as they inherit it vertically, we "connect the branches and twigs of the family tree with countless horizontal lines and it ceases to be a tree" (Schuchardt 1870: 11).

References
  • Bloomfield, L. (1933 [1973]). Language. London: Allen & Unwin. 
  • Bonfante, G. (1931). “I dialetti indoeuropei”. Annali del R. Istituto Orientale di Napoli 4, 69–185.
  • Darwin, C. (1859). On the origin of species by means of natural selection, or, the preservation of favoured races in the struggle for life. Electronic resource. Online available under: http://www.nla.gov.au/apps/cdview/nla.gen-vn4591931. London: John Murray.
  • Geisler, H. and J.-M. List (2013). “Do languages grow on trees? The tree metaphor in the history of linguistics”. In: Classification and evolution in biology, linguistics and the history of science. Concepts – methods – visualization. Ed. by H. Fangerau, H. Geisler, T. Halling and W. Martin. Stuttgart: Franz Steiner Verlag, 111–124.
  • Hirt, H. (1905). Die Indogermanen. Ihre Verbreitung, ihre Urheimat und ihre Kultur. Bd. 1. Strassburg: Trübner. Internet Archive: dieindogermaneni01hirtuoft.
  • Hoenigswald, H. M. (1963). “On the history of the comparative method”. English. Anthropological Linguistics 5.1, pp. 1–11. URL: http://www.jstor.org/stable/30022394.
  • Meillet, A. (1922 [1908]). Les dialectes Indo-Européens. Paris: Librairie Ancienne Honoré Champion. Internet Archive: lesdialectesindo00meil.
  • Morrison, D. A. (2011). An introduction to phylogenetic networks. Uppsala: RJR Productions.
  • Schleicher, A. (1853a). “Die ersten Spaltungen des indogermanischen Urvolkes”. Allgemeine Monatsschrift für Wissenschaft und Literatur, 786–787.
  • Schleicher, A. (1853b). “O jazyku litevském, zvlástě na slovanský. Čteno v posezení sekcí filologické král. České Společnosti Nauk dne 6. června 1853”. Časopis Českého Museum 27, 320–334. URL: http://books.google.de/books?id=cLMDAAAAYAAJ.
  • Schleicher, A. (1863). Die Darwinsche Theorie und die Sprachwissenschaft. Offenes Sendschreiben an Herrn Dr. Ernst Haeckel. Weimar: Hermann Böhlau. ZVDD: urn:nbn:de:bvb:12-bsb10588615-5.
  • Schmidt, J. (1872). Die Verwantschaftsverhältnisse der indogermanischen Sprachen. Weimar: Hermann Böhlau.
  • Schmidt, J. (1875). Zur Geschichte des Indogermanischen Vokalismus. Weimar: Hermann Böhlau.
  • Schuchardt, H. (1870 [1900]). Über die Klassifikation der romanischen Mundarten. Probe-Vorlesung, gehalten zu Leipzig am 30. April 1870. Graz. URL: http://schuchardt.uni-graz.at/cgi-bin/print.cgi?action=show&type=pdf&id=724.

Monday, September 7, 2015

The Tree of Trees is a network


A couple of years ago this paper appeared:
Marie Fisler and Guillaume Lecointre (2013) Categorizing ideas about trees: a tree of trees. PLoS One 8: e68814.
The authors note:
We study the history of the use of trees in systematics to represent the diversity of life from 1766 to 1991. We apply to those ideas a method inspired from coding homologous parts of organisms. We discretize conceptual parts of ideas, writings and drawings about trees contained in 41 main writings; we detect shared parts among authors and code them into a 91-characters matrix and use a tree representation to show who shares what with whom. In other words, we propose a hierarchical representation of the shared ideas about trees among authors: this produces a "tree of trees."
The authors continue:
Why should we choose the tree that maximizes contiguity of identical character states (i.e. the most parsimonious tree) and not another one? [That is,] why should we choose the tree maximizing consistency among characters? ... Maximizing consistency among characters is just offering a rational interpretation of the character distribution across the compared entities, by using a hierarchy from the most general to the most particular. We prefer this hierarchical representation over networks in a first step because it is what we need to test for consistency of previous categories, propose new ones and exhibit sharings (even homoplastic ones if needed).
Unfortunately, "the parsimony analysis provides 279 trees of 378 steps, with a C.I. of 0.24 and a R.I. of 0.61". In other words, there is very little consistency among the characters; and there is very little hierarchical structure in the data, as shown by my NeighborNet analysis of the same data.


The conclusion, that "we consider that networks are not useful to represent shared ideas at the present step of the study" seems rather dubious. The tree-makers do not generally form groups, but share phylogenetic ideas in a more haphazard manner. Nevertheless, the network neighborhoods shared by the various writers sampled do actually show quite clearly who shared tree ideas with whom.
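For readers unfamiliar with the index quoted above: the ensemble consistency index is the minimum number of steps the characters could require, divided by the number of steps actually observed on the tree. A rough check of the reported figure follows, assuming (the coding scheme is not described here, so this is only a guess) that each of the 91 characters is binary and so needs at least one step.

```python
# Assumption for illustration: all 91 characters are binary, so each needs
# a minimum of one step on any tree.
n_characters = 91
min_steps = n_characters * 1
observed_steps = 378  # tree length reported by the authors

consistency_index = min_steps / observed_steps
extra_steps = observed_steps - min_steps  # changes beyond the minimum (homoplasy)

print(round(consistency_index, 2), extra_steps)
```

On that assumption the arithmetic reproduces the reported value of 0.24, and it implies that roughly three-quarters of the 378 steps are extra changes that a perfectly hierarchical data set would not need.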

It is interesting that the tree ideas are shared in a network manner, rather than a tree, as this indicates that there are no really clear schools of phylogenetics represented. Indeed, the writers are inter-mingled in a way that shows no development of tree ideas over time, although the various neighborhoods do tend to associate writers of similar vintage. There are no real surprises among the compositions of these neighborhoods.

Perhaps the most interesting aspect of the network, as shown, is that both Wallace and Haeckel changed their ideas about trees through time, whereas most of the other writers were more "of their time".

It is also worth noting that Buffon, Duchesne and Rühling all illustrated reticulated networks, not trees; but this is not one of the characteristics included in the dataset. The paper's authors do acknowledge that Buffon's diagram is "a tree-like extension of maps", but they fail to mention that Linnaeus also likened biological relationships to a map (not a tree), but instead treat him as part of the outgroup (see An outline history of phylogenetic trees and networks).

Wednesday, June 3, 2015

"Basal" and "crown" are dirty words in phylogenetics


There are at least two misleading expressions that one very commonly encounters in the professional phylogenetics literature: "basal branch of the tree", and "derived species".

The first expression is used to refer to an unbranched lineage arising near the common ancestor, when compared to a more-branched lineage. For example, in the first diagram below we might say that taxon A is on a "basal branch", whereas taxon B is not. The taxa associated with taxon B are then referred to as the "crown" of the tree. But, how can one lineage be more basal than another? After all, both lineages connect to the "base" of the tree at the same point. To claim that one is basal and the other not is like saying that one brother is more basal than another in a family tree just because he has fewer children!


The second expression refers to a species that has more "derived" characters than another. For example, in the diagram we might say that taxon B is more derived than taxon A. Characters change from ancestral to derived through time (eg. scaly skin covering is ancestral while fur is derived, because the latter arose later in time). However, this does not make any species more derived. It is the characters that are derived not the species — each species has a combination of ancestral characters and derived ones (including humans).

These issues seem to arise from the tree iconography. Some people seem to conceptualize this as a pine tree rather than a bush (as drawn by Charles Darwin in the Origin). A pine tree, indeed, does have basal branches and a crown. Here is an example from a sign in my local botanical garden, which tries to explain plant phylogenetic relationships to the general public. This tree does, indeed, have basal branches and a distinct crown.


This issue seems to have started with Ernst Haeckel in the late 1800s. Haeckel's first phylogenies (see Who published the first phylogenetic tree?) were drawn as multi-branched bushes, rather similar to the diagram that Darwin himself had published. However, Haeckel then veered away from this approach when explicitly discussing the evolution of humans. Here, he drew a tree with a distinct central trunk and much smaller side-branches (presumably modeled on an oak tree, rather than a bush). This image emphasizes one particular lineage at the expense of the others, because there is one taxon obviously sitting at the crown of the tree while the others are relegated to side-branches.

E. Haeckel (1874) Anthropogenie oder Entwickelungsgeschichte des Menschen. Engelmann, Leipzig.

This approach to drawing a phylogeny can be used to put any chosen organism at the crown of the tree, not just human beings, as illustrated by the following diagram from James Scott (which looks like it is actually modeled on a pine tree). This is a fundamental characteristic of a phylogeny — it can be drawn so that any part of the diagram is at the crown. However, to be accurate it should always be drawn so that no one lineage is emphasized over any other one — there should be no taxa sitting at the crown.

J.A. Scott (1986) The Butterflies of North America: a Natural History and Field Guide. Stanford University Press, Stanford.

Distorted images occur in several ways in modern evolutionary biology. This topic has received considerable attention in the literature, and there are a number of very readable expositions of various parts of it. Here is a brief list.

Gregory T.R. (2008) Understanding evolutionary trees. Evolution: Education and Outreach 1: 121-137.

O'Hara R.J. (1992) Telling the tree: narrative representation and the study of evolutionary history. Biology and Philosophy 7: 135-160.

Crisp M.D., Cook L.G. (2005) Do early branching lineages signify ancestral traits? Trends in Ecology and Evolution 20: 122-128.

Krell F.-T., Cranston P.S. (2004) Which side of a tree is more basal? Systematic Entomology 29: 279-281.

Omland K.E., Cook L.G., Crisp M.D. (2008) Tree thinking for all biology: the problem with reading phylogenies as ladders of progress. BioEssays 30: 854-867.

Sandvik H. (2009) Anthropocentrisms in cladograms. Biology and Philosophy 24: 425-440.

Monday, April 20, 2015

Domestication networks are complicated


Phylogenetic networks were developed as a professional tool for displaying complicated evolutionary histories. However, this does not mean that such networks cannot be used elsewhere.

As an example, Pete Buchholz produces drawings of dinosaurs as the artist Ornithischophilia at the DeviantArt web site. Among these drawings are some phylogenies, and two of them are networks.

The first one is labelled Citrus is complicated, and refers to the origin of citrus cultivars.


The phylogenetic tree at the left is sourced from the American Journal of Botany, while the network at the right is from information in Wikipedia. The combination of the two appears to be original to the artist. The network is read from left to right; for example, the Limequat is a hybrid of the Key Lime and the Kumquat. Compared to the original Wikipedia text, the picture speaks a thousand words.

The second network is labelled Apples are complicated, and refers to the origin of some of the apple cultivars.


No source is given for the information, but I assume that it also comes from Wikipedia. Note that, as before, the network is read from left to right, but this time there is a time scale at the top. The artist refers to it as a "spaghetti diagram", and notes that:
Colors are based on the major parent that the "story" revolves around; purple for Honeycrisp, Yellow for Golden Delicious, Red for Jonathan, Maroon for Red Delicious, Orange for Cox's Orange Pippin, Teal for McIntosh, Green for Granny Smith, and Blue for Topaz.

Sunday, January 4, 2015

Complicated network visualizations


Networks are visually more complicated than trees, because there are extra edges representing reticulate relationships. Technically this means that some of the nodes have in-degree >1, and that there are one-to-many connections among these nodes. This can create visual clutter. I recently presented one simple way that might alleviate this (Circular phylograms for phylogenetic networks).
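As a small illustration of the in-degree criterion, here is a sketch that finds the reticulation nodes in a rooted network stored as a list of (parent, child) edges. The node names and the network itself are hypothetical, made up only for this example.

```python
from collections import Counter

# Hypothetical rooted network, given as (parent, child) edges.
edges = [
    ("root", "A"), ("root", "B"),
    ("A", "x"), ("A", "h"),
    ("B", "h"), ("B", "y"),
]

# Count incoming edges per node; a tree node has exactly one parent,
# so any node with in-degree > 1 is a reticulation.
in_degree = Counter(child for _, child in edges)
reticulations = sorted(node for node, k in in_degree.items() if k > 1)

print(reticulations)
```

In this toy network only the node h has two parents, so it is the single reticulation; removing either of its incoming edges would leave a tree.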

Another possibility is to add to the network what are called meta-nodes. These meta-nodes represent groups of nodes, so that the edges between the meta-nodes and the other nodes can represent different types of relationship. This reduces the one-to-many connections in the graph.

As pointed out by Elijah Meeks at the Digital Humanities Specialist blog, pedigrees represent a neat example of this concept. In this example, there are several types of traditional relationship that can be represented: husband, wife and child. Since these relationships are explicitly shown (ie. the direction of the relationship is explicitly shown), the figure can be drawn unrooted.


The example shown here (reproduced from Meeks' post) has the meta-nodes in grey, each representing a family. These nodes are unlabeled, while the person-nodes are labeled with the person's name and noble title. Females have pink nodes, and males blue ones. The edges connecting them to the grey nodes are colour-coded as: blue = husband, pink = wife, orange = child.

So, for example, the right-hand family node indicates that Charles I and Henrietta Maria were husband and wife, and that they had three children: Mary Henrietta, James II and Charles II.
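The effect of a meta-node on the number of connections can be sketched for this one family. The encoding below is only an illustration of the idea (it is not Meeks' code), and the label given to the family node is hypothetical, since those nodes are unlabeled in the figure.

```python
# Names taken from the family described above; the encoding is illustrative.
husband, wife = "Charles I", "Henrietta Maria"
children = ["Mary Henrietta", "James II", "Charles II"]

# Direct encoding: one marriage edge plus an edge from each parent to each child.
direct_edges = [(husband, wife, "spouse")]
direct_edges += [(p, c, "child") for p in (husband, wife) for c in children]

# Meta-node encoding: every person gets a single typed edge to the family node.
family = "family-1"  # hypothetical label
meta_edges = [(husband, family, "husband"), (wife, family, "wife")]
meta_edges += [(family, c, "child") for c in children]

print(len(direct_edges), len(meta_edges))
```

For this family the direct encoding needs 7 edges and the meta-node encoding 5. More generally, a couple with k children needs 1 + 2k edges in the first form but only 2 + k in the second, at the cost of one extra node per family.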

In this case, the reduction in one-to-many connections does make the relationships more clear, so that interpretation is easy. However, it potentially makes the network more complicated (as Meeks notes) because of "just how tangled up certain families can be" — adding the extra meta-nodes exacerbates the tangling. Meeks provides another example in his blog post.

Wednesday, December 10, 2014

Circular phylograms for phylogenetic networks


Phylogenetic trees have been drawn in many formats, including what are known as vertical, horizontal, multidirectional, radial, hyperbolic (restricted to interactive trees) and figurative (ie. looking like an actual tree). Radial, or circular, trees are used when there are many taxa — the root is placed at the centre, and the increasing length of the circumference is used to display the increasing number of nodes. An example is shown in the earlier blog post Why do we still use trees for the dog genealogy?

Here, I point out that the radial format also makes it much easier to display reticulations in an evolutionary network. My example comes from The Nam Family: a Study in Cacogenics (Arthur H. Estabrook and Charles B. Davenport. 1912. Eugenics Record Office Memoir No. 2. Cold Spring Harbor, NY). This book involves, among other things, a pedigree study of an extended family in New York state, with a large amount of inbreeding. Two large pedigrees are presented, representing the genealogies of two different parts of the extended family in a place called "Nam Hollow".


One of these pedigrees is drawn in the vertical format, with the earliest generations at the top. The other pedigree is drawn in the radial format, with the earliest generations in the centre.


The difference in choice of format seems to be a result of the fact that in the second case there is extensive reticulation within the earlier generations, and this is obviously much easier to display in the centre of a circle, with increasing circumference for the large number of descendants. Nevertheless, the first pedigree would also be easier to read in the radial format. It is surprising that this format is not used more often.

Eugenics

The study under discussion was one of several projects that arose from the eugenics movement in the USA. The reports include Hill Folk: Report on a Rural Community of Hereditary Defectives (Davenport. 1912), The Kallikak Family: a Study in the Heredity of Feeblemindedness (Henry Herbert Goddard. 1912), and The Jukes (Estabrook. 1916). Eugenics arose in the wake of research on Mendelian inheritance, applying it to the study of human societies. This was thus the initial phase of what we now call the study of human genetics, and large amounts of detailed data were collected in many parts of the world.

Unfortunately, the researchers greatly over-estimated the role of genetics in human behavior, attributing many of the by-products of poverty to "constitutional" characteristics. In particular, many of what we now consider to be environmental aspects of poverty were attributed to inbreeding (which is another feature common in poor communities). This is in contrast to previous studies of the same US families, such as that of Richard L. Dugdale (1874-1877. The Jukes: a Study in Crime, Pauperism, Disease and Heredity), which placed more emphasis on the environment as a factor in criminality, disease and poverty.

So, the eugenics researchers tended to collect data that we would now consider to be seriously biased, where the observations are inextricably confounded with interpretations. For example:
V-166 [person #166 in generation V] is a temperate, sociable, and licentious man, who married his cousin, V-183, a Nam-like, stolid shy, reticent, suspicious harlot. They had eight children ... All have the characteristic slowness in movement, and indolence and lack of ambition of the Nams. They vary little except that some are more reticent and shy than others, and there is some licentiousness. All are illiterate, and probably without the capacity for learning from books. VI-257, who is especially careless, disorderly, and shy, had an illegitimate son, who died of infantile diarrhea. Here again we see the uniformity resulting from inbreeding.
What was worse, the eugenics movement did not stop at mere scientific enquiry. They indulged, with governmental support, in what they politely called "social prophylaxis". For example:
Although our primary aim is to present the bare facts [!] we cannot altogether neglect the natural inquiry as to the proper treatment of such condition as we have described. Various possible modes of treatment will be considered.
First there is the method of laissez faire. The Nam community takes care of itself to a large extent; why do anything? Unfortunately, the community is not wholly isolated. From it families have gone to Minnesota and other points in the West and there formed new centers of degeneration. Harlots go forth from here and become prostitutes in our cities. The tendency to larceny, burglary, arson, assault, and murder have gone, with the wandering bodies in which they are incorporated, throughout the State and to great cities like New York. Nam Hollow is a social pest spot whose virus cannot be confined to its own limits. No state can afford to neglect such a breeding center of feeble-mindedness, alcoholism, sex-immorality, and infanticide as we have here. A rotten apple can infect the whole barrel of fruit. Unless we abandon the ideal of social progress throughout the State we must attempt an improvement here.
The authors seem to be almost foaming at the mouth by the end of their spiel. Option two, "improving the conditions of the persons in the Hollow" is dismissed as "supplying a veneer of good manners to a punky social body." Option three, "scattering the people" is seen as "fraught with danger". Nevertheless, this was the option preferred by the British government in the late 1700s and early 1800s, when they founded penal colonies in Australia for crimes like "stealing five cheeses". The assumption that poverty is hereditary certainly has a long history, and a wide geographical spread.

Option four, preventing the people from breeding, by isolating them, is the recommended one. The final note is: "Of course, asexualization would produce the same result; but it is doubtful if public sentiment would favor such treatment, quite within the province of the State though it be." We now know this to be a very naive conclusion. By the 1930s many western countries had active compulsory sterilization programs (see Wikipedia); and many still do, including states of the USA.

However, eugenics did have positive outcomes, among the obvious negative ones. For example, the first demonstration of simple Mendelian inheritance of a human medical condition concerned Unverricht-Lundborg disease, a form of epilepsy. This was first reported in 1891 by Heinrich Unverricht, in Estonia. However, it was Herman Lundborg, a Swedish physician, who first identified its genetic component (1903. Die progressive Myoclonus-Epilepsie (Unverricht’s Myoclonie). Almqvist and Wiksell, Uppsala).

He traced the ancestry of 17 affected people in one family from southern Sweden, showing that they were all descended from the same ancestors. The pedigree showed the pattern of disease occurrence expected from Mendelian inheritance of a single recessive locus. This study was facilitated by frequent inbreeding within the family (20% of households had first-cousin parents), which Lundborg referred to as "unwise marriages". We now know that the disease results from a mutation in the CCC-CGC-CCC-GCG repeat region of the cystatin B gene — unaffected people have 3-4 repeats while affected people have 40+ repeats.

Lundborg himself was an active member of the eugenics movement in Sweden (which was referred to as 'race biology'), and most of his writings about the epileptic family were as bad as those quoted above (their "degeneration" was attributed to the fact that "they distilled their own alcohol, and thus became drunkards"). He eventually became Professor for Racial Hygiene; and he was influential in the implementation of forced sterilization programs in Sweden, believing that "The future belongs to the racially fine people", which obviously included himself.

Wednesday, December 3, 2014

Visual complexity and phylogenetic networks

Network diagrams have become rather commonplace in the modern world. Most of them are constructed along the same lines — observed entities (objects or concepts, or groups of them) are connected by lines showing observed relationships. Such visualizations are relatively easy to create using computers, and so they represent a relatively new form of visual data analysis. The complexity of the diagrams can be both seen and quantitatively analyzed, thus forming part of what is now grandiosely called "data mining and knowledge discovery".

The Visual Complexity project has been compiling an interesting set of online network visualizations. While the author (Manuel Lima) intends this to be "a unified resource space for anyone interested in the visualization of complex networks", at the moment it is simply a magpie collection of references to web pages. There are currently nearly 800 visualizations referenced, grouped into:
  • Art
  • Music
  • Biology
  • Food Webs
  • Transportation Networks
  • Business Networks
  • Social Networks
  • Political Networks
  • Computer Systems
  • Internet
  • World Wide Web
  • Pattern Recognition
  • Semantic Networks
  • Knowledge Networks
  • Multi-Domain Representation
  • Others
Our interest is in the Biology group, of course, where we have long known about networks, including food webs, which you will notice are grouped separately. There are currently 52 networks (plus 8 in the Food Web group), covering a wide range of topics, such as:
  • Gene interaction networks
  • Protein-protein interaction networks
  • Protein "homology" networks
  • Neuron networks
  • Haplotype blocks
  • Metabolic pathways
  • Genome maps
  • Physiology maps
  • Disease maps
  • Visualizing the aging process

This is all very well. However, we are specifically interested in phylogenetic networks, which are as old-fashioned as food webs. They differ significantly from these other biological networks. Phylogenies connect observed entities (objects, or groups of them) only indirectly, via unobserved nodes, with the lines representing inferred affinity or genealogical relationships. Only at the population level is it likely that all internal nodes, representing individuals, will be observed, and that their relationships might also be observed.

There are currently three phylogenies referenced by Visual Complexity:
Only the last of these is a network, the other two being trees. Sadly, the first one also contains a dead link, which is a problem common for most multi-year internet projects.

Unfortunately, the uniqueness of phylogenies among networks is not acknowledged by the Visual Complexity site. This is not unusual amongst network researchers, most of whom have never even heard of phylogenies. Moreover, many of the people who do seem to have heard of them often fail to understand them and their interpretation, so that they do not notice the fundamental difference. Nevertheless, phylogenetic networks are among the oldest types of recorded network, and there are certainly complex versions of them dating back to the 1700s (see those by Herman and by Batsch in Affinity networks updated).

Finally, the Visual Complexity site does not yet have much from anthropology (as distinct from the social sciences in general) or anything from linguistics (other than programming languages!). These are promising areas for studies of visual complexity.

Monday, November 10, 2014

Trees as art


Trees can be many things: objects, symbols, art, or information.

As objects, they act as homes and shelter, they provide food and oxygen, and they bind soil to hold topography in place. They even provide somewhere to sit while you are waiting to discover gravity. Their most famous use as symbols is the Tree of Life, which recurs in many cultures throughout the world. This was later extended to the Tree of Knowledge, a potent intellectual symbol throughout Western history. In the modern world this latter use has been expanded, so that trees are mathematical representations of the relationships among information.

Trees have also long played a role in art, which continues in the modern works of, for example, Vincent van Gogh and Gustav Klimt.

My first introduction to this was the book The Tree (1979, Aurum Press, UK / Little, Brown and Co, USA) by John Fowles (text) and Frank Horvat (photographs). This is a meditation on the connection between the natural world and human creativity. Horvat provides moody views of trees with (almost) no human objects in sight, and Fowles (the novelist) provides a provocative essay on trees as representations of art, revealing in his usual erudite manner that he particularly dislikes the "taming the wild" aspects of horticulture and science.


More recently, there has been the hand-lithographed book The Night Life of Trees (2006, Tara Books, Chennai, India). This contains a series of tribal-art images from three Gond people of central India (Bhajju Shyam, Durga Bai and Ramsingh Urveti). (And yes, the land of the Gond is Gondwanaland, which was the source of our name for the southern land masses.)

The Gond people have previously decorated their house walls and floors with traditional tattoos and motifs; and these motifs have made their way onto paper as modern representations of the tribal art form. Other tribal art forms that have followed a similar transformation include the Aboriginal art of Australia, which bears a strong stylistic resemblance to some of the Gond art.

The Gonds are traditionally forest dwellers, and so the lives of humans and trees have been seen as closely entwined. Their lore suggests that trees are hard at work during the day providing shelter and nourishment, but at night they finally rest and their spirits are revealed. It is these spirits that the artists have tried to capture in their book.

I have reproduced two of the images here, because their intertwining forms clearly reveal a very network-like aspect of the trees. The accompanying text is taken from the book.

Snakes and Earth

The earth is held in the coils of the snake goddess. And the roots of trees coil around the earth too, holding it in place. If you want to depict the earth, you can show it in the form of a snake. It is the same thing.



The Binding Tree

Mahalain trees are found deep inside the thickest jungles, holding each other in a tight embrace. Because it clings and binds so well, Mahalain bark is known for its strength. Our ancestors from earliest times searched for it in the deep jungles and used it to build houses. A house built well with Mahalain bark is said to last a hundred years.


Both books are worth seeking out if you value art as well as science. The Gond book is now in its 9th hardback edition, and is widely available in bookstores. The Fowles book (without the photographs) is currently available as a 30th anniversary paperback edition; but you are better off finding a second-hand hardback with the pictures.

Finally, just by way of contrast, here is the Albero Trinità from Joachim of Fiore's Liber Figurarum (published in 1202), a book that uses many different visualizations to display human knowledge.


My daughter was the inspiration for writing this blog post.