Showing posts with label borrowing. Show all posts

Monday, March 25, 2019

Automatic detection of borrowing (Open problems in computational diversity linguistics 2)


The second task on my list of 10 open problems in computational diversity linguistics deals with detecting borrowings or language contact. The prototypical case of language contact would be lexical borrowing, where a word is borrowed from one language into another, such as English job, which was adopted by Germans in the rather specific meaning of temporary occupation. More complex cases involve semantic borrowing, where a way of denoting something is borrowed, not the form itself, such as, for example, the use of the word for mouse to denote a computer mouse in many languages of the world.

Even less well understood are cases where specific aspects of grammar have been transferred. German has, for example, a certain number of neuter nouns, all borrowed from Ancient Greek or Latin, in which the plural is built according to (or inspired by) the Greek or Latin model: Lexikon has Lexika as plural, Komma has Kommata as plural, and Kompositum has Komposita as plural. While these cases are sporadic in German and thus rather harmless (as are the similar examples in English), there are other cases of language contact where scholars not only suspect that plural forms have been borrowed along with the words (as in German), but that entire paradigms and strategies of grammatical marking have been adopted by one language from a neighboring variety as a result of close language contact.


Why borrowing is hard to detect

Unless we witness them happening directly, most cases of borrowing are difficult to demonstrate consistently. By comparison with lexical borrowing, however, the borrowing of grammar is probably the hardest to show, especially when dealing with abstract categories that could have actually emerged independently. The reason why borrowing is generally hard to deal with, not only in computational approaches, is that detecting borrowing and demonstrating language contact presupposes that alternative explanations are all excluded, such as universal tendencies of language change (i.e., "convergent evolution" in the biological sense), common inheritance, or simple chance.

While we need to exclude alternative possibilities to prove any of the four major types of similarities (coincidental, natural, genealogical, or contact-induced, see List 2014: 55-57), we have a much harder time doing so when dealing with borrowings, because linguistics does not have a single established procedure for the identification of borrowings. Instead, we resort to a mix of different types of evidence, which are qualitatively weighted and discussed by the experts. While historical linguistics has developed sophisticated techniques to show that language similarities are genealogical, it has not succeeded in reaching the same level of sophistication for the identification of borrowings.

In this regard, techniques for contact detection are not much different from other, more specific, types of linguistic reconstruction, such as the "philological reconstruction" of ancient pronunciations (Jarceva 1990, Sturtevant 1920), the reconstruction of detailed etymologies (Malkiel 1954), or the reconstruction of syntax (Willis 2011).

Traditional strategies for detecting borrowing

It is not easy to give an exhaustive and clear-cut overview of all of the qualitative methods that scholars make use of in order to detect borrowings among languages. This is at least partially due to the nature of "cumulative-evidence arguments" (Berg 1998) — or arguments based on consilience (Whewell 1840, Wilson 1998) — which are always more difficult to formalize than clear-cut procedures that yield simple, binary results. Despite the difficulty in determining exact workflows, we can identify a couple of proxies that scholars use to assess whether a given trait has been borrowed or not.

One important class of hints are conflicts with possible genealogical explanations. A first type of conflict is represented by similarities shared among unrelated or distantly related languages. Since, among the Germanic languages, mountain is found only in English, while similar words otherwise occur only in Romance, we can take this as evidence that the English word was borrowed. Since these conflicts arise from the supposed phylogeny of the languages under consideration, we can speak of phylogeny-related arguments for interference.

A second conflict involves the traits themselves, most prominently observed in the case of irregular sound correspondence patterns. German Damm, for example, is related to English dam, but since the expected correspondence for cognates between English and German would yield a German reflex Tamm (as it is still reflected in Old High German, see Kluge 2002), we can take this as evidence that the modern German term was borrowed (Pfeifer 1993). We can call these cases trait-related arguments for contact.

In addition to observations of conflicts, two further types of evidence are of great importance for inferring contact. The first one is areal proximity, and the second one is the assumed borrowability of traits. Given that language contact requires the direct contact of speakers of different languages, it is self-evident that geographical proximity, including proximity by means of travel routes, is a necessary argument when proposing contact relations between different varieties.

Furthermore, since direct evidence confirms that linguistic interference does not act to the same degree on all levels of linguistic organisation, the notion of borrowability also plays an important role. Although scholars tend to have different opinions about the concept, most would probably agree with the borrowability scale proposed by Aikhenvald (2007, p. 5), which ranges from "inflectional morphology" and "core vocabulary", representing aspects resistant to borrowing, up to "discourse structure" and the "structure of idioms", representing aspects that are easy to borrow. How core vocabulary can be defined, and how the borrowability of individual concepts can be determined and ranked, however, has been subject to controversial discussions (Lee and Sagart 2008, Starostin 1995, Tadmor 2009, Zenner et al. 2014).

Computational strategies for contact inference

Despite the large number of quantitative applications proposed during the past two decades, computational approaches for the inference of contact situations are still in their infancy. As of now, none of the few approaches proposed in the past can compete with the classical methods. The reasons for this are twofold. First, given the multiple types of evidence employed by the classical approaches, the formalization of the problem of borrowing detection is difficult. Second, given the limited number and suitability of datasets annotated for different types of linguistic interference, scholars have a hard time in developing algorithms, since they lack data for testing and training.

In principle, all algorithms for contact inference proposed so far make use of the strategies used in the classical approaches. Thus, they infer or determine shared traits among two or more languages, and then determine conflicts in these traits, taking geographical closeness and borrowability into account. In contrast to classical approaches, which combine different types of evidence, computational approaches are usually restricted to one type.

The automatic methods proposed so far can be divided into three classes. The first class employs phylogeny-related conflicts to identify those traits whose evolution cannot be explained with a given phylogenetic tree, explaining the conflicts as resulting from contact. Examples include work where I was involved myself (Nelson-Sathi et al. 2011, List et al. 2014), some early and interesting approaches which did not receive too much attention (Minett and Wang 2003), or have been mostly forgotten by now (Nakhleh et al. 2005), along with a recent study on grammatical features (Cathcart et al. 2018).
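The basic idea behind the first class can be sketched in a few lines of Python. This is only a minimal illustration, assuming a toy reference tree written as nested tuples and made-up sets of languages sharing a word (the tree, the language selection, and the function names are all hypothetical). A trait is flagged when the languages carrying it do not form a clade of the tree. Note that a simple loss in one language would trigger the same flag, so a conflict is at best a hint, not a demonstration of borrowing.

```python
# Toy reference tree as nested tuples (hypothetical, for illustration only).
TREE = ((("English", "German"), "Swedish"), ("French", "Spanish"))


def leaves(node):
    """Return the set of language names below a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def clades(node):
    """Yield the leaf set of every node in the tree, single leaves included."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def conflicts_with_tree(carriers, tree):
    """True if the languages sharing a trait do not form a clade of the tree."""
    return set(carriers) not in list(clades(tree))


# A word shared by English and Romance only (compare English "mountain").
print(conflicts_with_tree({"English", "French", "Spanish"}, TREE))  # True
# A word shared by English and German only fits the tree.
print(conflicts_with_tree({"English", "German"}, TREE))  # False
```

The published approaches cited above are considerably more refined than this clade test; the sketch only shows why any mismatch between trait distribution and tree is counted as a conflict.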

The second class uses techniques for automatic sequence comparison to search for similar words, but not cognate words, across different languages. Here, the most prominent examples include the work by Ark et al. (2007), and later Mennecier et al. (2016), who searched for similar words among languages known to be not related. Further examples include the work by Boc et al. (2010) and Willems et al. (2016), who experimented with tree reconciliation approaches, based on word trees derived from sequence-alignment techniques. There is also an experimental study where I was again involved myself (Hantgan and List forthcoming), in which we tried to identify borrowings by comparing two automatically inferred similarities among words from related and unrelated languages: surface similarities, as reflected by naive alignment algorithms, and deep similarities, reflected by advanced methods that take sound correspondences into account (List 2014).
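To make the second class more concrete, here is a minimal sketch, assuming two languages already known to be unrelated and word forms given as plain strings. A normalized edit distance stands in for the alignment methods used in the studies above, and the word lists, the rough ASCII transcriptions, and the threshold of 0.5 are illustrative assumptions, not values taken from any of those studies.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer word."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def loan_candidates(words_a, words_b, threshold=0.5):
    """Return concepts whose forms are similar across two unrelated languages."""
    hits = []
    for concept in sorted(set(words_a) & set(words_b)):
        dist = normalized_distance(words_a[concept], words_b[concept])
        if dist <= threshold:
            hits.append((concept, words_a[concept], words_b[concept],
                         round(dist, 2)))
    return hits


# Toy data in rough ASCII transcription (hypothetical, for illustration only).
LANG_A = {"coffee": "kafei", "water": "shui"}
LANG_B = {"coffee": "kofi", "water": "woter"}

print(loan_candidates(LANG_A, LANG_B))
```

Since the languages are assumed to be unrelated, any pair that passes the threshold is a candidate for borrowing or chance resemblance, and still needs to be checked by hand.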

The third class searches for distribution-related conflicts by comparing the amount of shared words within sublists of differing degrees of borrowability. This class is best represented by Sergey Yakhontov's (1926-2018) work on stable and unstable concept lists (Starostin 1991), which assumed that deep historical relations should surface in those parts of the lexicon that are stable and resistant to borrowing, while recent contact-induced relations would surface rather in those parts of the lexicon that are more prone to borrowing. Yakhontov's work was independently re-invented by Chén (1996), and McMahon et al. (2005); but given how difficult it turned out to distinguish concepts prone to borrowing from those resistant to borrowing, it has been largely disregarded for some time now.
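The logic of the sublist comparison can be illustrated with a small sketch. It assumes that we already know which concepts are expressed by shared words in two languages, and that we have a stable and an unstable sublist. The concepts, the sublists, and the verdict labels below are invented for illustration; they are not Yakhontov's actual lists.

```python
def shared_ratio(concepts, shared):
    """Proportion of concepts in a sublist that have shared words."""
    concepts = list(concepts)
    return sum(1 for c in concepts if c in shared) / len(concepts)


def sublist_signal(stable, unstable, shared):
    """Compare sharing in the stable and the unstable sublist."""
    s = shared_ratio(stable, shared)
    u = shared_ratio(unstable, shared)
    if s > u:
        verdict = "inheritance"  # more sharing in the stable part
    elif u > s:
        verdict = "contact"      # more sharing in the borrowable part
    else:
        verdict = "undecided"
    return s, u, verdict


# Hypothetical sublists and shared-word judgments for one language pair.
STABLE = ["hand", "eye", "name", "two"]
UNSTABLE = ["mountain", "river", "animal", "flower"]
SHARED = {"name", "mountain", "river", "flower"}

print(sublist_signal(STABLE, UNSTABLE, SHARED))
```

The verdict is only as good as the stability ranking behind the two sublists, which is exactly where the approach has run into trouble.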

Problems with computational strategies for contact inference

All three classes of approaches discussed so far have certain shortcomings. Phylogeny-based inference of borrowing, for example, tends to drastically overestimate the number of borrowed traits, simply because conflicts in a phylogeny can result from undetected borrowings in the data, but they need not (see Appendix 1 of Morrison 2011 on causes of reticulation in biology, which has many parallels to linguistics). Saying that all instances in which a dataset conflicts with a given phylogeny are borrowings is therefore generally a bad idea. It can be used as a very rough heuristic to come up with potentially wrongly annotated homologies in a dataset, which could then be checked again by experts, but deriving stronger claims from it seems problematic.

Sequence comparison techniques applied to unrelated languages are basically safe in my opinion, and the results are very reliable, unless one compares words that occur in all languages, such as "mama" and "papa" (Jakobson 1960, see also "Mama and papa" on Wikipedia).

Using methods for tree reconciliation on individual word trees, calculated from word distances based on phonetic alignment techniques or similar, yields the same problems of over-counting conflicts as we get for phylogeny-based approaches to borrowing. The problem here is a general misunderstanding of the conceptual differences between gene trees in biology, where surface similarity of gene sequences is thought to reflect evolutionary history, and word trees in linguistics. While we can use qualitative methods to draw a word tree for a given set of homologous words, the surface similarity among the words says little, if anything, about their evolutionary history.

Attempts to distinguish borrowed from inherited traits with sublists have lost their popularity in most recent studies. When properly applied, they might, indeed, provide some evidence in the search for borrowings or deep homologies. So far, however, all stability rankings of concepts that have been proposed have been based on too small a number of either concepts (we would need rankings for some 1,000 concepts at least) or languages from which the information was derived. If we could manage to get reliable counts on some 1,000 concepts for a larger sample of the world's languages, this might greatly help our field, as it would provide us with a starting point from which people could search (even qualitatively) for borrowings in their data.

Outlook

Assuming that we currently have no realistic way to operationalize arguments based on consilience, there is no direct hope of having a fully automatic method for detecting borrowings any time soon. By developing promising existing methods further, however, there is hope that we can learn a lot more about borrowing processes in the world's languages. What is needed here, of course, are the data required to apply the methods.

Beyond the automatic approaches for borrowing detection mentioned above, nobody has so far tried to use trait-related conflicts to infer borrowings. Since these are usually considered to be quite reliable by experts in historical linguistics, it seems necessary to work in this direction as well, if we want to tackle the problem of consistent automatic detection of borrowing. Here, my recently proposed framework for a consistent handling and identification of patterns of sound correspondences across multiple languages (List 2019) could definitely be useful, although it will again be challenging to find the right balance of parameters and interpretation, since not all conflicts in sound correspondences necessarily result from borrowings.

Whether it will be possible to identify even the direction of borrowings, when developing these methods further, is an open question. Borrowability accounts might help here, but again, since no clear-cut strategies are being used by scholars, it is difficult to formalize any of the existing qualitative approaches. The greatest challenge will perhaps consist in the creation of a database of known borrowings that could assist digital linguists in testing and training new approaches.

References
Aikhenvald, Alexandra Y. (2007) Grammars in contact. A cross-linguistic perspective. In: Aikhenvald, Alexandra Y. and Dixon, Robert M. W. (eds.) Grammars in Contact. Oxford:Oxford University Press. 1-66.

van der Ark, René and Mennecier, Philippe and Nerbonne, John and Manni, Franz (2007) Preliminary identification of language groups and loan words in Central Asia. In: Proceedings of the RANLP Workshop on Acquisition and Management of Multilingual Lexicons, pp. 13-20.

Berg, Thomas (1998) Linguistic Structure and Change: an Explanation from Language Processing. Gloucestershire:Clarendon Press.

Boc, Alix and Di Sciullo, Anna Maria and Makarenkov, Vladimir (2010) Classification of the Indo-European languages using a phylogenetic network approach. In: Locarek-Junge, H. and Weihs, C. (eds.) Classification as a Tool for Research. Berlin and Heidelberg:Springer. 647-655.

Cathcart, Chundra and Carling, Gerd and Larson, Filip and Johansson, Richard and Round, Erich (2018) Areal pressure in grammatical evolution. An Indo-European case study. Diachronica 35.1: 1-34.

Chén Bǎoyà 陈保亚 (1996) Lùn yǔyán jiēchù yǔ yǔyán liánméng 论语言接触与语言联盟 [Language Contact and Language Unions]. Běijīng 北京:Yǔwén 语文.

Hantgan, Abbie and List, Johann-Mattis (forthcoming) Bangime: Secret language, language isolate, or language island? Journal of Language Contact.

Jakobson, Roman (1960) Why 'Mama' and 'Papa'? In: Perspectives in Psychological Theory: Essays in Honor of Heinz Werner, pp. 124-134.

Jarceva, V. N. (1990) Lingvističeskij enciklopedičeskij slovar'. Moscow: Sovetskaja Enciklopedija.

Kluge, Friedrich (2002) Etymologisches Wörterbuch der deutschen Sprache. Berlin:de Gruyter.

Lee, Yeon-Ju and Sagart, Laurent (2008) No limits to borrowing: The case of Bai and Chinese. Diachronica 25.3: 357-385.

List, Johann-Mattis and Nelson-Sathi, Shijulal and Geisler, Hans and Martin, William (2014) Networks of lexical borrowing and lateral gene transfer in language and genome evolution. Bioessays 36.2: 141-150.

List, Johann-Mattis (2014) Sequence Comparison in Historical Linguistics. Düsseldorf: Düsseldorf University Press.

List, Johann-Mattis (2019) Automatic inference of sound correspondence patterns across multiple languages. Computational Linguistics 45.1: 137-161.

Malkiel, Yakov (1954) Etymology and the structure of word families. Word 10.2-3: 265-274.

McMahon, April and Heggarty, Paul and McMahon, Robert and Slaska, Natalia (2005) Swadesh sublists and the benefits of borrowing: an Andean case study. Transactions of the Philological Society 103: 147-170.

Mennecier, Philippe and Nerbonne, John and Heyer, Evelyne and Manni, Franz (2016) A Central Asian language survey. Language Dynamics and Change 6.1: 57–98.

Minett, James W. and Wang, William S.-Y. (2003) On detecting borrowing. Diachronica 20.2: 289–330.

Morrison, D. A. (2011) An Introduction to Phylogenetic Networks. Uppsala: RJR Productions.

Nakhleh, Luay and Ringe, Don and Warnow, Tandy (2005) Perfect Phylogenetic Networks: a new methodology for reconstructing the evolutionary history of natural languages. Language 81.2: 382-420.

Nelson-Sathi, Shijulal and List, Johann-Mattis and Geisler, Hans and Fangerau, Heiner and Gray, Russell D. and Martin, William and Dagan, Tal (2011) Networks uncover hidden lexical borrowing in Indo-European language evolution. Proceedings of the Royal Society of London B: Biological Sciences 278.1713: 1794-1803.

Pfeifer, Wolfgang (1993) Etymologisches Wörterbuch des Deutschen. Berlin: Akademie.

Starostin, Sergej Anatolévic (1991) Altajskaja problema i proischoždenije japonskogo jazyka [The Altaic Problem and the Origin of the Japanese Language]. Moscow: Nauka.

Starostin, Sergej Anatolévic (1995) Old Chinese vocabulary: A historical perspective. In: Wang, William S.-Y. (ed.) The Ancestry of the Chinese Language. Berkeley: University of California Press, pp. 225-251.

Sturtevant, Edgar H. (1920) The Pronunciation of Greek and Latin. Chicago: University of Chicago Press.

Tadmor, Uri (2009) Loanwords in the world's languages. Findings and results. In: Haspelmath, Martin and Tadmor, Uri (eds.) Loanwords in the World's Languages. Berlin and New York: de Gruyter, pp. 55-75.

Whewell, William D. D. (1847) The Philosophy of the Inductive Sciences, Founded Upon Their History. London: John W. Parker.

Willems, Matthieu and Lord, Etienne and Laforest, Louise and Labelle, Gilbert and Lapointe, François-Joseph and Di Sciullo, Anna Maria and Makarenkov, Vladimir (2016) Using hybridization networks to retrace the evolution of Indo-European languages. BMC Evolutionary Biology 16.1: 1-18.

Willis, David (2011) Reconstructing last week's weather: Syntactic reconstruction and Brythonic free relatives. Journal of Linguistics 47.2: 407-446.

Wilson, Edward O. (1998) Consilience: the Unity of Knowledge. New York: Vintage Books.

Zenner, Eline and Speelman, Dirk and Geeraerts, Dirk (2014) Core vocabulary, borrowability and entrenchment. Diachronica 31.1: 74–105.

Monday, April 30, 2018

Stratification: how linguists traditionally identify borrowings


In my previous blog post, I illustrated how important it is to take the systemic aspects of sound change into account when comparing languages. What surfaces as a surprisingly regular process is in fact a process during which the sound system of a language changes. Since the words in a given language are derived from the sound system, a change in the system will necessarily change all words in which the respective sound occurs.

On one hand, this makes it much more difficult for linguists to identify homologous words across languages. On the other hand, however, it enables us to identify borrowings, by searching for exceptions to regular sound correspondences. I will be discussing the latter here.

Sound changes and borrowing

In order to illustrate how this can be done in practice, consider 15 word pairs from German and English, starting with the ten cognates in the following table:

No.  German    English
1    Dach      thatch
2    Daumen    thumb
3    Degen     thane
4    Ding      thing
5    drei      three
6    Durst     thirst
7    denken    think
8    Dieb      thief
9    dreschen  thresh
10   Drossel   throat

When comparing these words quickly, it is easy to see that in all cases where German has a d as the initial sound, English has a th. This sound correspondence, as we call it in historical linguistics, reflects a very typical systematic similarity between English and German. We can identify it for all related words in the two languages that go back to Proto-Germanic θ-, and it results from a very regular sound change that is well accounted for in Indo-European linguistics.

Not all homologous words between English and German, however, show this correspondence, as we can easily see from the five examples provided in the next table:

No.  German  English
11   Dill    dill
12   dumm    dumb
13   Damm    dam
14   Dunst   dunst
15   Dollar  dollar

It is easy to see that these words don't fit our expected pattern (d matching th as the first consonant). It is also clear from the overall similarity of the words that it is rather unlikely that they trace back to different words, and thus turn out to be not cognate at all. One of the simplest possible explanations for the divergence from our initial d in German corresponding to θ in English, which now surfaces as d = d, is borrowing, be it from German to English, from English to German, or from some third language.
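The reasoning in the two tables can be mimicked with a short Python sketch. It is a simplification under clearly stated assumptions: spelling is used as a stand-in for sound, only the initial consonant is compared, and the most frequent pairing is simply taken to be the regular correspondence. The function and variable names are invented for this illustration.

```python
from collections import Counter

# The 15 German-English pairs from the two tables above.
PAIRS = [
    ("Dach", "thatch"), ("Daumen", "thumb"), ("Degen", "thane"),
    ("Ding", "thing"), ("drei", "three"), ("Durst", "thirst"),
    ("denken", "think"), ("Dieb", "thief"), ("dreschen", "thresh"),
    ("Drossel", "throat"), ("Dill", "dill"), ("dumm", "dumb"),
    ("Damm", "dam"), ("Dunst", "dunst"), ("Dollar", "dollar"),
]


def initial(word):
    """Return the initial consonant, treating the spelling 'th' as one sound."""
    word = word.lower()
    return "th" if word.startswith("th") else word[0]


# Count how often each pairing of initials occurs.
counts = Counter((initial(g), initial(e)) for g, e in PAIRS)

# Assumption: the most frequent pairing is the regular correspondence.
dominant, _ = counts.most_common(1)[0]

# Everything that deviates from it is a candidate for closer inspection.
exceptions = [(g, e) for g, e in PAIRS
              if (initial(g), initial(e)) != dominant]

print(dominant)
print(exceptions)
```

Frequency alone does not tell us which pattern is inherited, of course. Here we know from independent evidence that d = th is the regular correspondence, which is why the five deviating pairs are the ones that need an explanation.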

Among the five examples, the final one, Dollar, is the easiest to explain, as we are dealing with a recent borrowing of the name of the U.S. currency. English dollar itself has another cognate in German, namely Taler, the name of a currency from ancient times (for the full etymology, see Pfeifer 1993).

The other four terms in the table may seem less straightforward to explain as borrowings, as they are by no means of recent origin; but we can confirm their exceptional status by contrasting them with older Middle High German readings (11-14th century), which are listed in the following table for all 15 of our examples:

No.  German    English  Middle High German
1    Dach      thatch   dah
2    Daumen    thumb    dūm
3    Degen     thane    degan
4    Ding      thing    ding
5    drei      three    drī
6    Durst     thirst   durst
7    denken    think    denken
8    Dieb      thief    diob
9    dreschen  thresh   dreskan
10   Drossel   throat   drozze
11   Dill      dill     tilli
12   dumm      dumb     tumb
13   Damm      dam      tam
14   Dunst     dunst    tunst
15   Dollar    dollar   -

As can be easily seen from this table, examples 11-14 all have a t as the initial consonant in Middle High German, and not d, as in the other cases. The change from original Proto-Germanic d to t in German is a well-attested sound change, for which we have many examples in the form of sound correspondences (cf. day vs. Tag, do vs. tun, etc.). We can therefore conclude that the Middle High German readings like tilli vs. English dill reflect the readings we would expect if all words had changed according to the rules. Since no regular change from t in Middle High German to d in Standard High German can be attested, it is furthermore safe to assume that the words have been modified under the influence of contact with other Germanic language varieties.
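This comparison of layers can again be sketched in Python. The sketch assumes, as argued above, that no regular change turns an older initial t into a modern d, so any mismatch between the older and the modern initial consonant is flagged as a candidate for contact. Comparing only the first letter is a deliberate simplification, and the names used are hypothetical.

```python
# Modern German forms with their older readings, taken from the table above
# (Dollar is left out because it has no older reading).
FORMS = [
    ("Dach", "dah"), ("Daumen", "dūm"), ("Degen", "degan"), ("Ding", "ding"),
    ("drei", "drī"), ("Durst", "durst"), ("denken", "denken"),
    ("Dieb", "diob"), ("dreschen", "dreskan"), ("Drossel", "drozze"),
    ("Dill", "tilli"), ("dumm", "tumb"), ("Damm", "tam"), ("Dunst", "tunst"),
]


def initial_changed(modern, older):
    """True if the initial letter differs between the two stages."""
    return modern[0].lower() != older[0].lower()


# Words whose modern initial cannot be derived regularly from the older form.
suspects = [modern for modern, older in FORMS if initial_changed(modern, older)]

print(suspects)  # ['Dill', 'dumm', 'Damm', 'Dunst']
```

In real data, the list of regular changes between two stages would have to be supplied explicitly, since most words change in perfectly regular ways that a first-letter comparison would wrongly flag.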

Here, English is not the most obvious candidate for contact; and the influence is rather due to contact with neighboring language varieties in the North-West of Germany, such as Frisian or Dutch. Similar to English, they have retained the original d (cf. Dutch dille vs. English dill). If speakers of High German varieties borrowed the term from speakers of Low German varieties, they would re-introduce the original d into their language, as we can see in our examples 11-14.

Why some of these borrowings took place and some did not is hard to say. That people in the North-West, living on the coast, know more about the building of dams, for example, is probably a good explanation why High German borrowed the term: obviously, the High German speakers did not use the word tam all that frequently, but instead heard the word dam often in conversations with neighboring varieties closer to the coast. For the other words, however, it is difficult to tell what was the reason for the success of the alternative forms.

Conclusions

Despite its important role for historical language comparison, the kind of analysis described here, by which linguists infer exceptional patterns in order to identify borrowings, is not well documented, either in handbooks of historical linguistics or in the journal literature. Following Lee and Sagart (2008), it is probably best called stratification analysis, since linguists try to identify the layers of contact and inheritance which surface in the form of sound correspondences. If these layers are correctly identified, linguists can often not only determine the direction in which a borrowing occurred, but also the relative time window in which this borrowing must have happened. This is the reason why linguists can often give very detailed word histories, which show where a word was first borrowed and how it then traveled through linguistic landscapes.

As for so many methods in historical language comparison, it is difficult to identify a straightforward counterpart of this technique in biology. What probably comes closest is the usage of GC content as a proxy for the inference of directed networks of lateral gene transfer (as described in, for example, Popa et al. 2011). In contrast to lateral gene transfer in biology, however, our linguistic word histories are often much more detailed, especially in those cases where we have well-documented languages.

For the future, I hope that increased efforts to formalize the process of cognate identification, cognate annotation, and phonetic alignments in computer-assisted frameworks for historical language comparison may help to improve the way we infer borrowings in linguistics. There are so many open questions about lateral word transfer in historical linguistics that we cannot answer by sifting manually through datasets. We will need all the support we can get from automatic and semi-automatic approaches, if we want to shed some light on the many mysterious non-vertical aspects of language evolution.

References

Lee, Y.-J. and L. Sagart (2008) No limits to borrowing: The case of Bai and Chinese. Diachronica 25.3: 357-385.

Pfeifer, W. (1993) Etymologisches Wörterbuch des Deutschen. Akademie: Berlin.

Popa, O., E. Hazkani-Covo, G. Landan, W. Martin, and T. Dagan (2011) Directed networks reveal genomic barriers and DNA repair bypasses to lateral gene transfer among prokaryotes. Genome Research 21.4: 599-609.

Tuesday, February 28, 2017

Models and processes in phylogenetic reconstruction


Since I started interdisciplinary work (linguistics and phylogenetics), I have repeatedly heard the expression "model-based". This expression often occurs in the context of parsimony vs. maximum likelihood and Bayesian inference, and it is usually embedded in statements like "the advantage of ML is that it is model-based", or "but parsimony is not model-based". By now I assume that I get the gist of these sentences, but I am afraid that I often still do not get their point. The problem is the ambiguity of the word "model" in biology but also in linguistics.

What is a model? For me, a model is usually a formal way to describe a process that we deal with in our respective sciences, nothing more. If we talk about the phenomenon of lexical borrowing, for example, there are many distinct processes by which borrowing can happen.

A clear-cut case is Chinese kāfēi 咖啡 "coffee". This word was obviously borrowed from some Western language not that long ago. I do not know the exact details (which would require a rather lengthy literature review and an inspection of older sources), but that the word is not too old in Chinese is obvious. The fact that the pronunciation comes close to the word for coffee in the largest European languages (French, English, German) is a further hint, since the longer a new word has survived after having been transplanted to another language, the more it resembles other words in that language regarding its phonological structure; and the syllable does not occur in other words in Chinese. We can depict the process with the help of the following visualization:


[Figure: Lexical borrowing: direct transfer]
The visualization tells us a lot about a very rough and very basic idea as to how the borrowing of words proceeds in linguistics: Each word has a form and a function, and direct borrowing, as we could call this specific subprocess, proceeds by transferring both the form and the function from the donor language to the target language. This is a very specific type of borrowing, and many borrowing processes do not directly follow this pattern.

In the Chinese word xǐnǎo 洗脑 "brain-wash", for example, the form (the pronunciation) has not been transferred. But if we look at the morphological structure of xǐnǎo, a compound consisting of the verb xǐ "to wash" and the noun nǎo "brain", it is clear that here Chinese borrowed only the meaning. We can visualize this as follows:
[Figure: Lexical borrowing: meaning transfer]
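The two subprocesses described so far can also be written down as a tiny formal model. The following is only a sketch: the `Word` class and both functions are hypothetical names, and treating English as the donor of kāfēi is an assumption made for the example, since all we know is that the word came from some Western language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    form: str      # pronunciation or spelling
    meaning: str   # function of the word
    language: str


def direct_transfer(donor, target_language, adapted_form=None):
    """Form and meaning are both transferred; the form may be adapted."""
    return Word(adapted_form or donor.form, donor.meaning, target_language)


def meaning_transfer(donor, target_language, native_parts):
    """Only the meaning is transferred; the form is built from native parts."""
    return Word("".join(native_parts), donor.meaning, target_language)


# Assumed donor words (English is an assumption for the first one).
coffee = Word("coffee", "coffee", "English")
brainwash = Word("brain-wash", "brain-wash", "English")

print(direct_transfer(coffee, "Chinese", adapted_form="kāfēi"))
print(meaning_transfer(brainwash, "Chinese", ["xǐ", "nǎo"]))
```

Even this toy model forces a decision about what exactly is being transferred in each case, which is where the simplifications start.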

Unfortunately, I am already starting to simplify here. Chinese did not simply borrow the meaning, but it borrowed the expression, that is, the motivation to express this specific meaning in an analogous way to the expression in English. However, when borrowing meanings instead of full words, it is by no means straightforward to assume that the speakers will borrow exactly the same structure of expression they find in the donor language. The German equivalent of skyscraper, for example, is Wolkenkratzer, which literally translates as "cloudscraper".

There are many different ways to coin a good equivalent for "brain-wash" in any language of the world that are not analogous to the English expression. One could, for example, also call it "head-wash", "empty-head", "turn-head", or "screw-mind"; and the only reason we call it "brain-wash" (instead of these others) is that this word was chosen at some time when people felt the need to express this specific meaning, and the expression turned out to be successful (for whatever reason).

Thus, instead of just distinguishing between "form transfer" and "meaning transfer", as my above visualizations suggest, we can easily find many more fine-grained ways to describe the processes of lexical borrowing in language evolution. Long ago, I took the time to visualize the different types of borrowing processes mentioned in the work of Weinreich (1953[1974]) in the following graphic:

[Figure: Lexical borrowing: hierarchy following Weinreich (1953[1974])]

From my colleagues in biology, I know well that we find a similar situation in bacterial evolution with different types of lateral gene transfer (Nelson-Sathi et al. 2013). We are not even sure whether the account by Weinreich as displayed in the graphic is actually exhaustive; and the same holds for evolutionary biology and bacterial evolution.

But it may be time to get back to the models at this point, as I assume that some of you who have read this far have begun to wonder why I am spending so many words and graphics on borrowing processes when I promised to talk about models. The reason is that in my usage of the term "model" in scientific contexts, I usually have in mind exactly what I have described above. For me (and I suppose not only for me, but for many linguists, biologists, and scientists in general), models are attempts to formalize processes by classifying and distinguishing them, and flow-charts, typologies, descriptions and the identification of distinctions are an informal way to communicate them.

If we use the term "model" in this broad sense, and look back at the discussion about parsimony, maximum likelihood, and Bayesian inference, it also becomes clear that it does not make immediate sense to say that parsimony lacks a model, while the other approaches are model-based. I understand why one may want to make this strong distinction between parsimony and methods based on likelihood-thinking, but I do not understand why the term "model" needs to be employed in this context.

Nearly all recent phylogenetic analyses in linguistics use binary characters and describe their evolution with the help of simple birth-death processes. The only difference between parsimony and likelihood-based methods is how the birth-death processes are modelled stochastically. Unfortunately, we know very well that neither lexical borrowing nor "normal" lexical change can be realistically described as a birth-death process. We even know that these birth-death processes are essentially misleading (for details, see List 2016). Instead of investing our time to enhance and discuss the stochastic models driving birth-death processes in linguistics, doesn't it seem worthwhile to have a closer look at the real processes we want to describe?
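For readers unfamiliar with the encoding, the following minimal sketch shows what "binary characters" means in practice: every cognate set becomes one column that is 1 if a language has a word from that set and 0 otherwise. The cognate-set labels (H1, M1, and so on) and the function name are hypothetical and chosen only for this illustration.

```python
def binarize(cognates):
    """Turn cognate-set assignments into presence/absence characters.

    cognates maps each language to a dict from concept to cognate-set label.
    """
    characters = sorted({(concept, label)
                         for words in cognates.values()
                         for concept, label in words.items()})
    matrix = {
        language: [1 if words.get(concept) == label else 0
                   for concept, label in characters]
        for language, words in cognates.items()
    }
    return characters, matrix


# Hypothetical labels: M1 = mountain/montagne, M2 = Berg,
# H1 = hand/Hand, H2 = main.
DATA = {
    "English": {"hand": "H1", "mountain": "M1"},
    "German": {"hand": "H1", "mountain": "M2"},
    "French": {"hand": "H2", "mountain": "M1"},
}

characters, matrix = binarize(DATA)
print(characters)
for language, row in matrix.items():
    print(language, row)
```

In this encoding, the 1 that English and French share in the column for M1 looks exactly like a shared inherited trait, although the English word is a borrowing. A birth-death process that only sees the matrix has no way of telling the two apart.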

References
  • List, J.-M. (2016) Beyond cognacy: Historical relations between words and their implication for phylogenetic reconstruction. Journal of Language Evolution 1.2: 119-136.
  • Nelson-Sathi, S., O. Popa, J.-M. List, H. Geisler, W. Martin, and T. Dagan (2013) Reconstructing the lateral component of language history and genome evolution using network approaches. In: Classification and evolution in biology, linguistics and the history of science. Concepts – methods – visualization. Franz Steiner Verlag: Stuttgart. 163-180.
  • Weinreich, U. (1974) Languages in contact. With a preface by André Martinet. Mouton: The Hague and Paris.