Showing posts with label Review. Show all posts

Tuesday, July 28, 2015

Comparison of Online Y-STR Predictors (Petrejcíková et al.) [Review]

Introduction
An interesting study published in 2014 examined Slovak samples typed for 12 Y-STR (microsatellite) markers. The main scope of this paper appears to be the investigation of the efficacy of three publicly available Y-STR haplogroup predictors (Athey, Cullen and YPredictor in alphabetical order) based on these 12 Y-STRs. The abstract is shown below.

Y-SNP analysis versus Y-haplogroup predictor in the Slovak population.
Petrejcíková E, Carnogurská J, Hronská D, Bernasovská J, Boronová I, Gabriková D, Bôziková A, Maceková S. Anthropol Anz. 2014;71(3):275-85.
Human Y-chromosome haplogroups are important markers used mainly in population genetic studies. The haplogroups are defined by several SNPs according to the phylogeny and international nomenclature. The alternative method to estimate the Y-chromosome haplogroups is to predict Y-chromosome haplotypes from a set of Y-STR markers using software for Y-haplogroup prediction. The purpose of this study was to compare the accuracy of three types of Y-haplogroup prediction software and to determine the structure of Slovak population revealed by the Y-chromosome haplogroups. We used a sample of 166 Slovak males in which 12 Y-STR markers were genotyped in our previous study. These results were analyzed by three different software products that predict Y-haplogroups. To estimate the accuracy of these prediction software, Y-haplogroups were determined in the same sample by genotyping Y-chromosome SNPs. Haplogroups were correctly predicted in 98.80% (Whit Athey's Haplogroup Predictor), 97.59% (Jim Cullen's Haplogroup Predictor) and 98.19% (YPredictor by Vadim Urasin 1.5.0) of individuals. The occurrence of errors in Y-chromosome haplogroup prediction suggests that the validation using SNP analysis is appropriate when high accuracy is required. The results of SNP based haplotype determination indicate that 39.15% of the Slovak population belongs to R1a-M198 lineage, which is one of the main European lineages.
[Abstract] [Direct Link]

Are They Really Comparable?
Although all three predictors returned similar efficacy rates (~97-99%), it should be noted the authors' chief divisions of interest appear to be the conventional subclade designations currently used in both literature and the genetic genealogy community (e.g. R1a1a-M198). The authors correctly state Y-SNP testing is paramount in definitively gauging subclade classifications, especially for lines substantially downstream of a given haplogroup's phylogeny.
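
To put the similar efficacy rates in perspective, the reported percentages can be converted back into counts of correctly predicted men out of 166 and given approximate confidence intervals. The following is a minimal sketch, assuming the counts can be recovered by rounding (they are not stated in the abstract) and using a Wilson score interval, which is my own choice here rather than anything the authors applied.

```python
import math

SAMPLE_SIZE = 166  # Slovak males, from the abstract

# Reported accuracy (%) per predictor, from the abstract
reported = {"Athey": 98.80, "Cullen": 97.59, "YPredictor": 98.19}


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Assumption: counts recovered by rounding percentage x sample size
correct = {name: round(pct / 100 * SAMPLE_SIZE) for name, pct in reported.items()}
intervals = {name: wilson_interval(k, SAMPLE_SIZE) for name, k in correct.items()}

for name, k in correct.items():
    low, high = intervals[name]
    print(f"{name}: {k}/{SAMPLE_SIZE} correct, {SAMPLE_SIZE - k} errors, "
          f"95% CI {low:.1%} to {high:.1%}")
```

Under these assumptions the three predictors differ by only a handful of mispredicted individuals and their intervals overlap, so the headline accuracy alone gives little reason to prefer one over another.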

The rest of this entry determines whether these calculators display any other features which may give aspiring researchers reasons to choose one over another.

Subclade Coverage
A substantial difference is observed between the three. Athey's output is oriented around 21 categories spread across most of the major clades/subclades, although haplogroups not commonly found in West Eurasia (e.g. A-D) are unrepresented. Cullen improves on this significantly with 86 subclades, with Y-DNA I receiving the most attention (R1b to a lesser extent), as well as other improvements such as the inclusion of "A&B". YPredictor has the highest count, hosting over 100 subclades, with the majority found in Y-DNA haplogroups E, G, J, N and R. With the exception of Y-DNA M and S, all are accounted for here.

STR count
Athey is capable of handling 111 Y-STR's (21 and 27-STR versions also available) with the format being listed in either numerical or Family Tree DNA (FTDNA) order. Cullen accepts a maximum of 67 STR's. YPredictor houses approximately 82 STR's. As such, all three are capable of handling a considerable number.

Interface
All three predictors permit the use of batched data and provide different means of categorising the data as seen fit by the user. Instructions are adequately provided for all three as well. As a research utility, however, YPredictor stands out through its custom YFiler iterations (a widely used format in population genetics publications concerning Y-STRs) and debug feedback before predictions are made by the calculator.

Computational Time
This varies based on the user's CPU processing time, as well as whether they are manually entering STR values or inserting batched data. As such, this probably shouldn't be a pertinent factor in deciding which calculator to use.

Output Information
All three produce similar information (subclade prediction with probability expressed as a percentage).

Conclusion
Before summarising these findings, it is worth noting that Athey's predictor precedes Cullen's and YPredictor. As such, any perceived deficiencies in subclade breakdown or functionality are likely a result of age. Athey's predictor was widely used in the past, irrespective of the current application rate.

All three predictors are of use to genetic genealogists. This entry concludes the following "idealised" purposes for each:

  • Athey - For users keen to utilise upwards of 111 FTDNA Y-STR's as cross-validation against the other two
  • Cullen - Best for those seeking refined Y-DNA I or R1b subclade predictions
  • YPredictor - Most versatile and research-friendly, best worldwide coverage of Y-DNA subclades

As such, the three calculators certainly are comparable for making basic Y-STR predictions for West Eurasians, but obvious differences exist with respect to non-West Eurasian subclade coverage.

If compelled to make a single choice, I would recommend Cullen first to genetic genealogists of Northwest European paternal heritage (given the high frequencies of Y-DNA's I and R1b). YPredictor would be the best choice for those belonging to subclades more common outside Europe. This also explains why it has been extensively used in this blog to date. Athey's function has otherwise been usurped by the other two. 

Friday, September 5, 2014

Worldwide Population Y-DNA Collated (Xu et al.) [Review]

Approximately one week has passed since a new paper by Xu et al. was indexed by PubMed and made available online ahead of printing:

"The Y chromosome is one of the best genetic materials to explore the evolutionary history of human populations. Global analyses of Y chromosomal short tandem repeats (STRs) data can reveal very interesting world population structures and histories. However, previous Y-STR works tended to focus on small geographical ranges or only included limited sample sizes. In this study, we have investigated population structure and demographic history using 17 Y chromosomal STRs data of 979 males from 44 worldwide populations. The largest genetic distances have been observed between pairs of African and non-African populations. American populations with the lowest genetic diversities also showed large genetic distances and coancestry coefficients with other populations, whereas Eurasian populations displayed close genetic affinities. African populations tend to have the oldest time to the most recent common ancestors (TMRCAs), the largest effective population sizes and the earliest expansion times, whereas the American, Siberian, Melanesian, and isolated Atayal populations have the most recent TMRCAs and expansion times, and the smallest effective population sizes. This clear geographic pattern is well consistent with serial founder model for the origin of populations outside Africa. The Y-STR dataset presented here provides the most detailed view of worldwide population structure and human male demographic history, and additionally will be of great benefit to future forensic applications and population genetic studies."

This paper showcases a staggering 979 distinct Y-DNA 17 STR haplotypes across 44 distinct populations from across the world. These haplotypes are soon to be uploaded to the Y-STR Haplotype Resource Database (YHRD). The authors have made all the haplotypes, together with a slew of additional information, publicly available independent of the official article (raw haplotypes, Y-DNA haplogroup predictions).

In this entry, the collated results of all populations are reviewed, together with cursory inferences intended to aid their interpretation.


Method
All 979 haplotypes were retrieved through the above link. Each population dataset was run through Vadim Urasin's YPredictor (v1.5.0). A 70% prediction strength threshold was implemented. All nomenclature was reduced to the haplogroup level to avoid confusion for future readers should it change in time. These haplotypes formed the collated population results.


Results
877 haplotype predictions met the 70% threshold established. Without having access to the original study, it is apparent that the authors also used Urasin's YPredictor, given the identical predictions.
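
The thresholding and collation step can be sketched as follows. This is only an illustration: the rows are made up, the tuple layout is a hypothetical stand-in for whatever YPredictor actually exports, and treating a prediction at exactly 70% as passing is an assumption.

```python
from collections import Counter, defaultdict

THRESHOLD = 0.70  # 70% prediction strength, as in the Method section

# Made-up rows for illustration: (population, predicted haplogroup, probability)
predictions = [
    ("Irish", "R-M269", 0.99),
    ("Irish", "R-M269", 0.64),   # below threshold, dropped
    ("Irish", "H-M82", 0.81),
    ("Yakut", "I-P37.2", 0.74),
    ("Yakut", "N", 0.70),        # exactly at threshold, kept (assumption)
]


def collate(rows, threshold=THRESHOLD):
    """Count predictions per population, keeping those at or above the threshold."""
    table = defaultdict(Counter)
    for population, haplogroup, probability in rows:
        if probability >= threshold:
            table[population][haplogroup] += 1
    return table


collated = collate(predictions)
kept = sum(sum(counts.values()) for counts in collated.values())
print(f"{kept} of {len(predictions)} predictions kept")
for population, counts in collated.items():
    print(population, dict(counts))
```

By the same logic, 877 of 979 predictions passing the threshold corresponds to roughly 90% of the dataset being retained.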

The collated population results have been organised by the location of sampling by continent or region and can be found in the Data Sink. Direct links to each section accompanied by the list of populations sampled are listed below for the reader's convenience with a brief runthrough of some interesting findings under each.

1. Europe: Adygei (Russia), Chuvash (Russia), Danes (Denmark), Finns (Finland), Hungarians (Hungary), Irish (Ireland), Khanty (Russia), Komi (Russia), Russians (Archangelsk), Russians (Vologda), Yakut (Russia)

The Adygei present as expected; they are predominantly G-P15 and J-L26 with various subclades of haplogroup R. Various subclades of haplogroups N and R define the Chuvash, with an additional appearance by J-L26 and Q-MEH2. Ethnic Russian populations appear to have their own regionalised diversity on the backdrop of being predominantly R-M198 and downstream subclades (particularly R-M458). The Irish are predominantly (~81%) R-M269, although the presence of a single man with H-M82 is surprising. Finally, the Yakut too belong overwhelmingly to haplogroup N (~78%) with a single man being predicted as I-P37.2.

2. Middle-East: Druze (Israel), Samaritans (Israel), Yemenite Jews (Yemen)

The Druze are one of the better-sampled populations in this study, where they are mostly represented by various subclades of haplogroups E and G, together with R-M269 and T-L162. The Samaritans are defined (in order of decreasing frequency) exclusively by J-L26, J-P58 and E-V22. Finally, the Yemenite Jews present with a similar (though more restricted) spectrum as the Druze with some differences in frequency.

3. East Asia: Ami (Taiwan), Atayal (Taiwan), Cambodians (Cambodia), Chinese (USA), Chinese (Taiwan), Hakka (Taiwan), Japanese (USA), Koreans (S. Korea), Laotians (Laos)

The Ami are unsurprisingly defined mostly by downstream subclades of haplogroup O, although there does appear to be an I-M223 and L-M317 among them. The Atayal, also of Taiwan, are exclusively O-MSY2.2. The Cambodians appear to have even more lineages which are typically expected further west. The Japanese boast the highest frequency of D-M55 out of all the populations sampled (21.1%). The Korean results contrast with this through the presence of men with N*-LLY22g(xM128,P43,Tat) and Q-MEH2. The Laotians appear to have one man with DE*-M1, although this will require SNP testing to definitively confirm.

4. Africa: Ashkenazi Jews (S. Africa), Biaka Pygmies (CAR), Chagga (Tanzania), Ethiopian Jews (Ethiopia), Hausa (Nigeria), Ibo (Nigeria), Masai (Tanzania-Kenya), Mbuti Pygmies (Congo R.), Sandawe (Tanzania), Yoruba (Nigeria)

The Ashkenazi Jews of South Africa appear to have a Y-DNA spectrum that is completely typical of Southwest Asians (please compare with the Druze). The Bagandu are largely defined by subclades of haplogroups B and E. Tanzanians here are completely haplogroup E and T. The presence of G-M15, J-L26 and R-M269 among the Hausa is surprising and may be attributed to a colonial European presence or some other forms of interaction.  The Sandawe have some rather unusual results given their geographical position (I-P37.2 and Q-MEH2), raising the possibility these haplotypes were predicted incorrectly.

5. Australasia: Micronesians (Micronesia), Nasioi Melanesians (Solomon Islands)

Both the Micronesians and Melanesians have an unusually diverse spectrum. It is difficult to ascertain whether the parahaplogroups shown are genuine or, as described above, a result of incorrect predictions. A recent paper revealing the presence of newly discovered offshoots from haplogroup K in Southeast Asia [1] raises the possibility some of these may be genuine.

6. Americas: Karitiana (Brazil), African Americans (USA), European Americans (USA), Maya (Mexico), Pima (USA), Rondonian Surui (Brazil), Ticuna (Brazil)

The Karitiana are predominantly Q-MEH2 but appear to have some non-American admixture through E-U175. African Americans are represented as an approximately 4:6 mix of R-M269 against various haplogroup E subclades. The Maya population, like the Karitiana, are Q-MEH2 with additional markers from outside the Americas, as are the Pima. The trend continues with the Quechua people, although C-M217 and T-L162 make their first appearance here. Finally, the Rondonian Surui and Ticuna are completely Q-MEH2.


Criticisms
There are at least three areas of the authors' methodology which are deemed to be drawbacks and prevent this study from being exceptionally informative.

Firstly, the authors evidently used the YFiler sampling array to complete this investigation. In an era where commercial testees can enjoy upwards of 111 Y-STR's, the long-term usefulness of this paper's extensive worldwide sampling is cut short. Another recent paper presenting Y-STR's worldwide has done so using 23 rather than just 17. [2]

My comments are more critical of the authors' sampling strategy. More data is never strictly a burden in the world of population genetics, but the informativeness of groups such as "European Americans", "Irish" and Chinese born in the USA is questionable. For instance, these groups are already richly represented, be it in the current literature or FTDNA Project groups. The apparent issue with these samples would have been rectified if they were simply obtained from a single area, providing regional specificity which may prove useful in better establishing genetic variation within Ireland, for example.

Finally, the haplotypes could have also received a "backbone" SNP test each to definitively place them within the current phylogeny. The drawbacks of STR-alone testing became readily apparent with some of the African samples. I can only speculate it is the highly divergent nature of certain uniquely African haplotypes from Eurasian ones which produced these spurious results.


On Mutation Rates (Quick Discussion)
In this study, both BATWING and the average squared distance (ASD) method were used. Within each, four different mutation rates were implemented. On initial inspection these appear to vary wildly. However, on closer examination, it appears all the BATWING most recent common ancestor (MRCA) calculated ages are approximately twice as old as those generated by the ASD method. Even within each technique there is substantial variation; the ages obtained with the evolutionary rate appear approximately three times greater than those obtained with the others. Furthermore, these "other" mutation rates do tend to congregate around a similar value (e.g. through BATWING, the calculated global age of their Y-DNA R-M198 haplotypes was 5.5k, 6.1k and 6.2kya), which would intuitively suggest the "actual" value lies somewhere within these, whether through BATWING or ASD. The discrepancy here cannot be overstated and calls into question why some researchers are still utilising a "blanket" mutation rate across several loci which are shown to have significantly different tendencies to mutate (colloquially described as "slow", "medium" and "fast" mutators). I am uncertain whether the authors are in fact doing this, but the implications are apparent, as such discrepancies prevent rational "fitting" of these numbers into candidate prehistoric narratives. This entire topic will likely be explored in a future entry.
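
The sensitivity of an ASD-based age to the chosen mutation rate can be illustrated with a toy calculation. In its simplest textbook form, the ASD between a set of haplotypes and an assumed founder haplotype, averaged over loci, is divided by the per-locus mutation rate to give an age in generations. The sketch below assumes this simple form; the haplotypes, the founder, both mutation rates and the 25-year generation time are invented for illustration and are not the values used by Xu et al.

```python
def asd_age_years(haplotypes, founder, mu_per_generation, years_per_generation=25):
    """Age estimate from the average squared distance to a founder haplotype.

    Simplified form: age (generations) = ASD / mutation rate, using one
    blanket rate for every locus.
    """
    n_loci = len(founder)
    squared = sum(
        (allele - founder_allele) ** 2
        for haplotype in haplotypes
        for allele, founder_allele in zip(haplotype, founder)
    )
    asd = squared / (len(haplotypes) * n_loci)
    return asd / mu_per_generation * years_per_generation


# Invented three-locus haplotypes and founder
founder = (13, 24, 15)
haplotypes = [(13, 24, 15), (13, 25, 15), (14, 24, 15), (13, 24, 16)]

FAST_RATE = 0.002          # hypothetical faster rate per locus per generation
SLOW_RATE = FAST_RATE / 3  # hypothetical rate three times slower

age_fast = asd_age_years(haplotypes, founder, FAST_RATE)
age_slow = asd_age_years(haplotypes, founder, SLOW_RATE)
print(f"faster rate: {age_fast:.0f} years, slower rate: {age_slow:.0f} years")
```

Because the age scales inversely with the rate, a rate three times lower yields an age three times older from identical data, which is why the choice of rate matters so much when fitting these numbers to prehistory.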


Conclusion
Although at least three drawbacks (four including the MRCA calculations) are identified here, this study provides researchers worldwide with a plethora of data from populations that are either poorly represented in the current literature or have been entirely absent until present. The majority of the results outline the wide Y-chromosomal diversity across the world, whilst also revealing specific trends that have been established in both the current literature and in online discussion boards. An mtDNA counterpart of this paper would be a wonderful addition to see sometime in the near future.

There is a bountiful amount of data to be interpreted with pre-existing ideas/models and compared with prior studies which place a premium on each population's area. I welcome any form of dialogue regarding the results. There is, for many of us, plenty to elucidate. The conclusion does not end here; I encourage as much further investigation and thought by the readers as the data permits.

[Addendum @ 05/09/2014]: Error regarding Karitiana data. Modified and updated.


Citations
1. Karafet TM, Mendez FL, Sudoyo H, Lansing JS, Hammer MF. Improved phylogenetic resolution and rapid diversification of Y-chromosome haplogroup K-M526 in Southeast Asia. [Last Retrieved 03/09/2014]: http://www.nature.com/ejhg/journal/vaop/ncurrent/full/ejhg2014106a.html 

2. Purps J, Siegert S, Willuweit S, Nagy M, Alves C, Salazar R et al. A global analysis of Y-chromosomal haplotype diversity for 23 STR loci. [Last Retrieved 05/09/2014]: http://www.fsigenetics.com/article/S1872-4973%2814%2900084-2/abstract

Saturday, July 13, 2013

A Hidden Gem in Central Asia: Previously Unknown Y-DNA R1b Haplotype [Original Work]

1. Introduction

Central Asian Y-DNA diversity has been an area of constant intrigue in the genetics community. Wells et al.'s The Eurasian Heartland: A continental perspective on Y-chromosome diversity paved the way, with several others following in its wake. Members of the same team (including Dr. Wells) produced another paper - A Genetic Landscape Reshaped by Recent Events: Y-Chromosomal Insights into Central Asia - on the same topic in the following year, this time headed by Dr. Tatiana Zerjal. I noted a greater emphasis on East-Central Asian populations as well as a mention of Y-STR analysis in the study itself. However, none of this data was supplied, with only Y-SNP information included (shown sporadically in this entry). The age of this paper is apparent through the nomenclature used (see Method section).

Several months ago, I made a request to obtain the Y-STR data from this study to one of the co-authors, Dr. Tyler-Smith, who kindly replied with the results of all sampled populations (Data Sink > Zerjal et al. Raw Data).

In this blog entry, the Y-STR data is showcased with a special emphasis on the Y-DNA R1b-M269 which was discovered.


2. Method
Y-SNP Phylogeny in original paper (Zerjal et al.) [1]


The maximum number of compatible Y-STR's were utilised for processing in Urasin's YPredictor for easier haplogroup identification (14 of a possible 16, DYS434 and 435 were excluded). All data was run through YPredictor. Only samples with ≥70% probability were included in the final results (Data Sink > Processed Data). As discussed below, relevant findings are compared with the basic Y-SNP haplogroups shown in the original study (on right).

One point which needs to be addressed immediately is the high frequency of "_DE-M1" and "P-M45". It appears that the STR selection has led to a phantom result, rendering many of the samples useless. For instance, the original study shows the Kazakhs belong overwhelmingly to C3c-M48, [1] although the probable results shown here are mostly "_DE-M1". The exclusion of DYS434 and 435 from my level of processing likely contributed to this; if one assigns equal weight to each STR's contribution to the statistical strength of a prediction, removing two STR's from a panel numbering 16 reduces accuracy by 12.5%. Additionally, some conversion error seems to have applied with DYS437 (i.e. a value <12 is unusual). Therefore, "_DE-M1" and "P-M45" results were dismissed on account of the mismatch between predicted and likely confirmed haplogroups, probably due to a compatibility issue between the study's STR panel and YPredictor.


3. Results

As the majority of samples were removed owing to the caveat described above, this entry will take a qualitative rather than quantitative approach to analysing the general picture formed. Most of the remaining results are congruent with findings in other papers. Populations around the Caucasus are characterised by plenty of R1b-M269, J2a-M410 and G2a-P15. Tajiks and the Kyrgyz were predominantly R1a1a-M17. Mongolians and other East-Central Asian ethnic groups yielded the most O3-M122 and "NO-M14" (likely to be Y-DNA N or O suffering from the STR restrictions described in the Method section).


Y-SNP distribution in Central Asia (Zerjal et al.) [1]


3i. The R1b Signal 

R1b-M269 was found across Central Asia and not only in the Caucasus (Armenians, Azeris, Georgians, Ossetians). It was mostly detected among the Turkmen (trk1, trk2, trk4, trk6, trk7, trk22, T29, T32) with a single sample among the Uzbek (uz-s110). [1]

Analysis of the haplotypes (including DYS434 and DYS435) revealed the nine Central Asian R1b samples belonged to a secure haplotype (Data Sink > R1b Results). trk6 diverged the most, albeit with only two 1-step mutations, on DYS393 and DYS434. The rest match this haplotype exactly or have single 1-step mutations. [1] When this Central Asian R1b haplotype is compared with the Caucasian samples, a mixed picture emerges, with the poorest match being an Armenian (arm47) at 8/16, whereas the best are another Armenian (arm12) and an Azeri (az48), both at 15/16. [1]
One interesting point is that the Kurds sampled in this study (some of whom also belong to R1b-M269) are actually the displaced population positioned on the Iranian-Turkmenistani border. All of them match the Central Asian R1b haplotype to a similar degree (12-13/16). This definitively rules out the Kurds as a source for the haplotype, particularly as better matches can be found further to the west. It should be noted the Kurds themselves formed their own R1b haplotype (defined here by DYS389II=27, DYS391=10). [1]
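
The "k out of 16" comparisons above amount to deriving a modal haplotype from a core group and counting the loci at which another sample carries the identical value. A minimal sketch follows, using a reduced four-locus panel with invented allele values and hypothetical sample names rather than the actual Zerjal et al. data.

```python
from collections import Counter

LOCI = ["DYS393", "DYS434", "DYS391", "DYS389II"]  # reduced panel for illustration

# Invented allele values, listed in LOCI order
core_group = {
    "core_1": [12, 9, 11, 29],
    "core_2": [12, 9, 11, 29],
    "core_3": [13, 10, 11, 29],  # two 1-step differences from the others
}
comparison_group = {
    "other_1": [12, 9, 10, 27],
}


def modal_haplotype(haplotypes):
    """Most common allele at each locus across the supplied haplotypes."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*haplotypes)]


def match_score(haplotype, reference):
    """Return (number of identical loci, number of loci compared)."""
    return sum(a == b for a, b in zip(haplotype, reference)), len(reference)


modal = modal_haplotype(list(core_group.values()))
for name, haplotype in {**core_group, **comparison_group}.items():
    matched, total = match_score(haplotype, modal)
    print(f"{name}: {matched}/{total}")
```

A plain match count treats a 1-step and a 3-step difference alike, so it is best read alongside the size of each mutation, as done in the text for trk6.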

In summary, the data reveals that the Turkmen are particularly abundant in R1b-M269 and all belong to the same haplotype as one of the Uzbek samples. This haplotype matched some Caucasians very well, but others not so well. The Kurds living in Turkmenistan belonged to their own haplotype.


3ii. Is This Actually R1b-M269?

Attention must first return to the original paper; any potential R1b-M269 here will be present as P(xR1a)-92R7 (shown in the paper as "Haplogroup 1"). [1] Evidently, this makes up approximately half of the Turkmen lines and a quarter of Uzbek ones. Other haplogroups (such as other forms of R1b, R2a-M124, various Q subclades) presumably make up the rest of "Haplogroup 1" shown.

The next step is to verify whether or not this Central Asian R1b haplotype matches other R1b haplotypes online. As Y-DNA R1b-M269 is fortunately well-represented in the world of genetic genealogy, searching for the haplotype's matches on ySearch is a reasonable enterprise. DYS437 had to be excluded here due to a conversion issue, leaving the haplotype at 15 STR's. A genetic distance (GD) of 3 was allowed on these 15 markers. Results are shown on the right.
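
A search of this kind can be sketched locally as a filter on genetic distance. The example assumes a simple stepwise definition of GD (the sum of absolute allele differences over the compared markers), which may not be exactly how ySearch computes it; the records and allele values are invented, and no ySearch interface is being called.

```python
MAX_GD = 3
EXCLUDED = {"DYS437"}  # dropped because of the suspected conversion issue


def genetic_distance(a, b, exclude=()):
    """Sum of absolute stepwise differences over shared, non-excluded markers."""
    shared = (set(a) & set(b)) - set(exclude)
    return sum(abs(a[marker] - b[marker]) for marker in shared)


# Invented query haplotype and database records
query = {"DYS393": 12, "DYS390": 24, "DYS391": 11, "DYS437": 8}
database = {
    "rec_1": {"DYS393": 12, "DYS390": 24, "DYS391": 11, "DYS437": 15},
    "rec_2": {"DYS393": 13, "DYS390": 25, "DYS391": 10, "DYS437": 15},
    "rec_3": {"DYS393": 14, "DYS390": 22, "DYS391": 11, "DYS437": 14},
}

distances = {
    name: genetic_distance(query, record, exclude=EXCLUDED)
    for name, record in database.items()
}
hits = {name: gd for name, gd in distances.items() if gd <= MAX_GD}
print(hits)
```

Excluding the suspect marker keeps an artefactual value from inflating every distance, at the cost of one fewer marker to discriminate between candidates.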

ySearch results for Central Asian R1b haplotype
With some confidence, the search has demonstrated that the Central Asian R1b haplotype does indeed belong to R1b-M269, as all the seven matches shown (one of whom is Armenian) belong to it.

The line of inquiry was expanded one step further by comparing this haplotype with Iranian haplotypes [2] which were readily available. Due to differences in STR panels (an overlap of only 11) this proved to be inconclusive, aside from the observation that DYS389i+ii was completely different between the Central Asian modal (10-26) and the Iranian values. At this point I suspect that, much like DYS437, there is a conversion issue with DYS389 also.

Finally, a comparison was made with the R1b found in Afghanistan last year [3]. Interestingly, if DYS389i+ii and DYS437 are excluded, the two Uzbeks (samples 35 and 181) match the Central Asian R1b haplotype almost exactly based on the remaining 11 STR's. The one Tajik (sample 32) is less likely to be related due to two 1-step mutations on different STR's.


4. Conclusion

The inferences made from the data hang by a metaphorical thread due to the persistent STR issue; different labs have used different panels in the past decade, making it excruciatingly difficult to use materials from older papers. Fortunately, the presence of a specific strain of R1b-M269 in Central Asia (in Turkmen and Uzbeks) has successfully been demonstrated after select exclusions and no modifications to the data.

However, some larger questions remain. If STR limitations were not an issue, how would the Iranians from Haber et al. have compared? Would the Tajik from the other Haber et al. paper have belonged to the same haplotype in the end?

The origin of this Central Asian R1b haplotype will, I anticipate, also be a point discussed heavily among interested parties. At this point in time, I must stress that none of the evidence thus far points to anything in particular without ruling other theories out, although it leaves the door for interpretation wide open.

Having given this cautionary statement, the main thrust of this entry should be emphasised; R1b-M269 in Central Asia is a confirmed reality and here to stay. I will defer any subsequent analyses to the experts on Y-DNA R1b who grace several genetic genealogy boards for their take on the flavour of this haplotype.


5. Acknowledgement

I publicly extend my gratitude to Dr. Tyler-Smith for being so kind in sending me the raw STR's from this important paper for my research, as well as co-authoring the other two excellent studies I have cited here and in the past.


6. References

1. Zerjal T, Wells RS, Yuldasheva N, Ruzibakiev R, Tyler-Smith C. A genetic landscape reshaped by recent events: Y-chromosomal insights into central Asia. Am J Hum Genet. 2002 Sep;71(3):466-82. Epub 2002 Jul 17.

2. Haber M, Platt DE, Badro DA, Xue Y, El-Sibai M, Bonab MA. Influences of history, geography, and religion on genetic structure: the Maronites in Lebanon. Eur J Hum Genet. 2011 Mar;19(3):334-40. doi: 10.1038/ejhg.2010.177. Epub 2010 Dec 1.

3. Haber M, Platt DE, Ashrafian Bonab M, Youhanna SC, Soria-Hernanz DF, Martínez-Cruz B. Afghanistan's ethnic groups share a Y-chromosomal heritage structured by historical events. PLoS One. 2012;7(3):e34288. doi: 10.1371/journal.pone.0034288. Epub 2012 Mar 28.

Tuesday, March 26, 2013

Y-DNA Haplogroup N in India: Wayward Uralics or Lab Error? [Original Work]

Introduction
Y-DNA Haplogroup N Eurasian Distribution


Per ISOGG's 2013 SNP tree and as has been the case for years, Y-DNA Haplogroup N is defined by the M231 mutation (G->A at rs9341278) on the Y-Chromosome. With a predominantly North Eurasian distribution, it peaks in Europe among the Finnish people and various ethnic groups residing in Russia's far north through the N1c-Tat subclade. N1c-Tat specifically is frequently associated with Uralic-speaking populations in the literature.

Haplogroup N also appears to have an association with Central Asia as shown in the N Y-DNA Haplogroup Project (FTDNA) results, with several samples coming in from Kazakhstan, Uzbekistan and Mongolia. It has also been observed in Turkey (KurdishDNA blog entry) as well as appearing in 1.6% of Iran's Azeri population (Grugni et al. entry).

The finding of Haplogroup N in India through Sharma et al.'s The Indian origin of paternal haplogroup R1a1* substantiates the autochthonous origin of Brahmins and the caste system [1] is a curious one. Unfortunately, the paper did not include any Y-STR material to help understand the basis of N's presence in India.


Significance of Potential Haplogroup N in India

Linguistics provides us with a plausible scenario regarding how Haplogroup N may have arrived in the Indian Subcontinent. Contacts between early Finno-Ugric and Indo-Iranian groups took place around the Ural mountains, specifically between the forest and steppe zones. Evidence of the transmission of horsekeeping techniques, economy, deities and common words from the Andronovo archaeological horizon on the steppes into the "Andronovoid" societies living in the nearby forests is firmly established. [2]

The presence of Haplogroup N in India, if found in relevant populations and displaying MRCA values or STR clusters consistent with a Neolithic origin further north, would support the idea of Haplogroup N representing an accompanying genetic signal from the steppe zone roughly four thousand years ago, as well as serving as a genetic remnant of the interactions that undoubtedly took place between Indo-Iranian and Finno-Ugric tribes.


Current Findings

In 2009, Sharma et al. published a paper highlighting the Y-Chromosome haplogroup differences between various upper caste (Brahmin) and tribal populations across India. The paper went on to deduce that Haplogroup R1a1a in India was autochthonous in origin based on their findings [1] (now disputable and improbable based on Underhill et al.'s landmark study on Y-DNA R1a1a and recent findings by the R1a Subclades FTDNA Project, although this topic is beyond the scope of this entry).

It was this very paper by Sharma et al. which revealed the presence of Y-DNA N in India. Haplogroup N1-LLY22g was found in Brahmins from Gujarat, Madhya Pradesh and Maharashtra (3.13%, 2.38% and 3.33% respectively), as well as tribal populations from Uttar Pradesh (1.56%). Their results were extended to include greater caste differentiation (Brahmins vs. Scheduled Castes vs. Tribals); here, Brahmins were found to have five times the frequency of N1-LLY22g seen in tribal groups (0.5% vs. 0.1% respectively). [1]

Although the frequencies were arguably insignificant, the inference stood - Y-DNA Haplogroup N showed an association with the upper caste practitioners of Hinduism in India, paving the way for the scenario described in the above chapter to be considered.

However, the strength of this conclusion is weakened greatly by cross-sectional data from numerous studies concerning the Indian Subcontinent produced in the past decade:


  • Sengupta et al. (2006) revealed that, out of 1090 samples, with the majority coming from the Indian Subcontinent, the only populations revealing any Haplogroup N (N-M231) and associated downstream subclades were either East Asian (Chinese ethnicities, Cambodian) or Siberian (Yakut). [3] No groups from India belonged to Haplogroup N-M231.
  • Furthermore, Sahoo et al. (2006) also sampled individuals from across the Indian Subcontinent (n=1074) and failed to find a single instance of N-M231. [4]
  • In a recent study on various populations in Tamil Nadu (South India), Haplogroup N was completely absent in the 1680 samples tested. [5]
  • Y-DNA N1c-Tat was absent in the 607 individuals tested from tribal populations in East and Northeast India. [6]
  • Returning to the north of the country, 560 men from various upper castes and Muslim groups were tested by Zhao et al. and N1c-Tat was absent from all. [7] 
  • Focusing specifically on Saraswat Brahmins (Jammu-Kashmir), Yadav et al. found none of the approximately 109 haplotypes to belong to any derivative of Haplogroup N. [8]

Finally, the N Y-DNA Haplogroup Project at FTDNA currently does not show any samples whatsoever from the Indian Subcontinent.


Possible Explanation

Despite over 4,000 samples over five studies representing various groups from across India, not a single trace of Haplogroup N has been detected. What explains this glaring discrepancy with Sharma et al.'s findings? Differences in sampling strategy between the other studies with Sharma et al. cannot account for this; there is enough regional overlap to rule this out.

As was the case with Sengupta et al., where several Hazara haplogroup classifications were allegedly due to a laboratory error, it is probable the Haplogroup N seen here follows suit. By reasonable deduction, if one study reveals a trend that several others covering thousands of samples cannot verify, the fault most likely lies with the former.


Conclusion

Until I can physically view the purported Haplogroup N haplotypes reported in Sharma et al., it is the conclusion of this entry that they are most likely the result of a laboratory error given the complete absence of any flavour of N-M231 in India through other recent studies. If any Haplogroup N is found, it must be contrasted against Sharma et al. and should be investigated on a separate line of inquiry. As ever, details of any future cases of Haplogroup N in India should be taken into consideration. If of a Mughal background, the paternal origins are readily explained by Medieval Central Asian ancestry. If from the furthest northeast of the Indian Subcontinent, the possibility of Nepali ancestry should be sought. [9] Although prehistoric indirect influence from Finno-Ugric interactions in the second millennium BC onwards shouldn't be dismissed outright, other more recent explanations exist.


References


1. Sharma S, Rai E, Sharma P, Jena M, Singh S, Darvishi K. The Indian origin of paternal haplogroup R1a1* substantiates the autochthonous origin of Brahmins and the caste system. J Hum Genet. 2009 Jan;54(1):47-55. doi: 10.1038/jhg.2008.2. Epub 2009 Jan 9.

2. Kuz'mina EE. The Origin of the Indo-Iranians. Koninklijke Brill NV, Leiden, The Netherlands. 2007.

3. Sengupta S, Zhivotovsky LA, King R, Mehdi SQ, Edmonds CA, Chow CE. Polarity and temporality of high-resolution y-chromosome distributions in India identify both indigenous and exogenous expansions and reveal minor genetic influence of Central Asian pastoralists. Am J Hum Genet. 2006 Feb;78(2):202-21. Epub 2005 Dec 16.

4. Sahoo S, Singh A, Himabindu G, Banerjee J, Sitalaximi T, Gaikwad S. A prehistory of Indian Y chromosomes: evaluating demic diffusion scenarios. Proc Natl Acad Sci U S A. 2006 Jan 24;103(4):843-8. Epub 2006 Jan 13.

5. Arunkumar G, Soria-Hernanz DF, Kavitha VJ, Arun VS, Syama A, Ashokan KS. Population differentiation of southern Indian male lineages correlates with agricultural expansions predating the caste system. PLoS One. 2012;7(11):e50269. doi: 10.1371/journal.pone.0050269. Epub 2012 Nov 28.

6. Borkar M, Ahmad F, Khan F, Agrawal S. Paleolithic spread of Y-chromosomal lineage of tribes in eastern and northeastern India. Ann Hum Biol. 2011 Nov;38(6):736-46. doi: 10.3109/03014460.2011.617389. Epub 2011 Oct 6.

7. Zhao Z, Khan F, Borkar M, Herrera R, Agrawal S. Presence of three different paternal lineages among North Indians: a study of 560 Y chromosomes. Ann Hum Biol. 2009 Jan-Feb;36(1):46-59. doi: 10.1080/03014460802558522.

8. Yadav B, Raina A, Dogra TD. Genetic polymorphisms for 17 Y-chromosomal STR haplotypes in Jammu and Kashmir Saraswat Brahmin population. Leg Med (Tokyo). 2010 Sep;12(5):249-55. doi: 10.1016/j.legalmed.2010.05.003.

9. Gayden T, Chennakrishnaiah S, La Salvia J, Jimenez S, Regueiro M, Maloney T. Y-STR diversity in the Himalayas. Int J Legal Med. 2011 May;125(3):367-75. doi: 10.1007/s00414-010-0485-x. Epub 2010 Jul 21.

Saturday, December 22, 2012

Yaghnobi Tajiks: Preliminary Results May Reveal Iranian Plateau Affinity [Original Work]

Slipping under the radar of the genetic genealogy world is this paper by Elisabetta Cilli and her colleagues, which investigated the mitochondrial data of 62 individuals from Tajikistan's Yaghnobi population. [1]

The Yaghnobis are of interest given their geographical isolation and the East Iranic nature of their language. Spoken just northeast of the predominantly Persian (Dari) speaking capital, Dushanbe, Yaghnobi is a continuation of a fully agglutinative Soghdian dialect and represents the sole survivor of this language following the Persianization of Central Asia in Medieval times. [2] Despite its East Iranic vocabulary, Yaghnobi demonstrates several linguistic features (i.e. gender loss, past imperfective preservation from present stem of a verb) which separate it from the modern East Iranic languages immediately surrounding it. Furthering the uniqueness of the Yaghnobi language in this context is the unity it forms through these features with languages mostly spoken further west in the Iranian plateau (e.g. Persian, Gilaki, Kurdish dialects). [2]

Although the results are preliminary and are not accompanied by any published raw data, Cilli et al. have discovered some interesting connections between the Yaghnobi and relevant populations. In summary, they found the following:

MDS Plot of Results
  • The 42 individuals used for the preliminary work belonged to only 19 distinct mtDNA haplotypes. Of these, 11 were unique to the Yaghnobi.
  • The Yaghnobi have less mtDNA genetic diversity than other Central Asian populations (0.930). This is attributed to their geographical isolation and their displacement by the U.S.S.R. in the 1970s for agricultural purposes, after which a small group (300) returned and repopulated their original homelands.
  • Intriguingly, the Yaghnobi shared all of the mutual haplotypes (8/19) with populations from Iran (e.g. Gilakis, Mazandaranis and Iranians from Tehran and Esfahan) instead of other Central Asian groups, including their Tajik compatriots.
  • The Yaghnobi shared most of these mutual haplotypes with Gilakis, Kurmanji Kurds and Avars from the Caucasus (4 each).
  • However, owing to their predominantly distinct mtDNA character, the Yaghnobi are clear outliers from the general zone occupied by the reference groups. 
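For readers unfamiliar with how a diversity figure such as the one quoted above is typically obtained, the sketch below implements Nei's unbiased haplotype diversity, H = n/(n-1) × (1 − Σ pᵢ²). I am assuming this is the kind of estimator behind the published value; the source does not confirm it. The haplotype labels in the example are made up, since no raw data were published.

```python
from collections import Counter
from typing import Hashable, Sequence


def haplotype_diversity(haplotypes: Sequence[Hashable]) -> float:
    """Nei's unbiased haplotype diversity for a list of haplotype labels,
    one label per sampled individual."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two individuals are required")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (probability two draws match).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    # Hypothetical sample: labels are placeholders, not real Yaghnobi haplotypes.
    sample = ["H1"] * 6 + ["H2"] * 4 + ["H3"] * 2 + ["H4", "H5", "H6"]
    print(f"n = {len(sample)}, H = {haplotype_diversity(sample):.3f}")
```

The value approaches 1 when nearly every individual carries a different haplotype and falls towards 0 when one haplotype dominates, which is why drift in a small isolated group lowers it.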

My critique and interpretation of these results are as follows:

  • At least two instances of genetic drift (a founder effect via geographic isolation and a bottleneck due to Soviet relocation) are likely responsible for the decreased mtDNA diversity. Thus, it is most likely simply a reflection of their environment.
  • As a result of the Soviet relocation, it may be useful to determine whether results from the displaced parent population match what has been stated here. This is quite possible given the relocations occurred just over one generation ago (~40 years).
  • It is difficult to criticise the decision to test 62 individuals and utilise 42 haplotypes, given the Yaghnobi population in their homeland between 2007 and 2009 only numbered approximately 500. Approximately 8% of the entire Yaghnobi population was therefore analysed here, which is a generous proportion given the amount of attention the region has received.
  • The MDS plot would have benefited from the inclusion of populations in Europe, Southwest Asia and South Asia to comprehensively flesh out the position of Yaghnobis in Eurasia.
  • Accepting that this is a preliminary investigation, it would still have been pleasing to see some raw data published. Aside from confirming that at least one Yaghnobi matched the Cambridge Reference Sequence (CRS, thus Haplogroup H2a2a which happened to be found in all the populations tested), there is no indication as to what the other mutations looked like. Or, for that matter, what mtDNA haplogroups were even present!
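The coverage figure in the critique above is simple arithmetic, sketched here for transparency. The population size of 500 is the approximate figure quoted in the text, so the resulting fractions are approximate too.

```python
def sampling_fraction(n_sampled: int, population_size: int) -> float:
    """Share of a population covered by a sample, as a fraction between 0 and 1."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    return n_sampled / population_size


if __name__ == "__main__":
    population = 500  # approximate homeland population quoted in the text
    for label, n in (("haplotypes analysed", 42), ("individuals tested", 62)):
        print(f"{label}: {sampling_fraction(n, population):.1%}")
```

The roughly 8% figure corresponds to the 42 haplotypes actually analysed; counting all 62 individuals tested would give a somewhat higher share.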


Correlation with Y-Chromosomal Data?

The Yaghnobi have been studied at least one other time through their inclusion in Dr. Spencer Wells et al.'s seminal piece The Eurasian heartland: a continental perspective on Y-chromosome diversity. The breakdown of their Y-Chromosomal SNP data (n=31) is as follows: [3]

Y-SNP clustering reveals Yaghnobis sit near SE Europe and the Near-East

3% C-M130(xC3a3-M48)
32% J2-M172
3% K-M9(xO-M175, O3-M122, O1a-M119, O2a1-M95, N1c1-M46) (possibly parahaplogroup such as K*-M9)
10% L-M20
3% P-M45 (xQ1a1-M120, Q1a3a1-M3, R2a-M124)
32% R1-M173 (likely R1b1a1-M73 or R1b1a2-M269)
16% R1a1a-M17(xR1a-M87, private marker)
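With only 31 men sampled, each percentage above stands for a handful of individuals. The sketch below converts the listed percentages back into approximate head counts by rounding. These counts are my own reconstruction, not figures quoted from Wells et al., and the shortened haplogroup labels are used only for convenience.

```python
# Percentages as listed above for the Yaghnobi sample (n = 31).
YAGHNOBI_PERCENTAGES = {
    "C-M130": 3,
    "J2-M172": 32,
    "K-M9": 3,
    "L-M20": 10,
    "P-M45": 3,
    "R1-M173": 32,
    "R1a1a-M17": 16,
}
SAMPLE_SIZE = 31


def counts_from_percentages(percentages: dict, n: int) -> dict:
    """Approximate head counts implied by rounded percentages of n men."""
    return {name: round(pct / 100 * n) for name, pct in percentages.items()}


def combined_share(percentages: dict, labels: list) -> int:
    """Total percentage held by the selected haplogroups."""
    return sum(percentages[label] for label in labels)


if __name__ == "__main__":
    counts = counts_from_percentages(YAGHNOBI_PERCENTAGES, SAMPLE_SIZE)
    for name, count in counts.items():
        print(f"{name}: ~{count} of {SAMPLE_SIZE}")
    shared = ["J2-M172", "L-M20", "R1-M173", "R1a1a-M17"]
    print(f"J2 + L + R1 + R1a1a: {combined_share(YAGHNOBI_PERCENTAGES, shared)}%")
```

The reconstructed counts (1, 10, 1, 3, 1, 10 and 5) add up to 31, so three of the seven lineages rest on a single man each, which is worth bearing in mind when comparing these frequencies with larger datasets.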

Despite the double genetic drift undoubtedly affecting the frequencies, it is worth pointing out that the Yaghnobi presented with a Y-DNA spectrum broadly similar to that of Iran, where J2-M172, L-M20, R1-M173 and R1a1a-M17 (including subclades) comprise approximately 53% of the national average (refer to Grugni et al. analysis).

This comparison should be taken with a grain of salt given the Iranian national average also comprises non-Iranic-speaking ethnic groups, the Wells Yaghnobi data does not present with thorough downstream Y-SNP evidence, the sample size is contentious and at least two contributors of a founder effect exist. However, that the Yaghnobi appear rich in J2, L and R is certainly reminiscent of Iranic-speaking populations in the region.


Conclusions

The Yaghnobi are an exceedingly interesting population whose overall uniparental markers seem to support a connection with populations further west than one would anticipate.

Despite the misgivings about all the data concerning them to date, the mtDNA similarity does corroborate the specific linguistic features shared between the Yaghnobi language and those of the Iranian plateau, such as Kurdish or Persian.

If the data holds up in future investigations, it certainly calls into question whether the proposed model of linguistic inheritance exclusively down the paternal line (as represented by Y-DNA data) is entirely correct given this connection.

How the Yaghnobi came to display the markers within them whilst speaking an East Iranic dialect with traits akin to those found in West Iranic languages is an intriguing question. One possible scenario is that the Yaghnobi are partly descended from ancient Iranians from the Iranian plateau during the Achaemenid era. This would also account for the linguistic commonalities noted in current literature.

Time (with the assistance of more mtDNA, Y-DNA and auDNA) will help us understand what happened in Central Asia during the formative period that was the Indo-Iranian migrations.



References

1. Cilli E, Delaini P, Costazza B, Giacomello L, Panaino A, Gruppioni G. Ethno-anthropological and genetic study of the Yaghnobis; an isolated community in Central Asia. A preliminary study. J Anthropol Sci. 2011;89:189-94.

2. Windfuhr, G. The Iranian Languages. 1st ed. Routledge Language Family Series. 2009.

3. Wells RS, Yuldasheva N, Ruzibakiev R, Underhill PA, Evseeva I, Blue-Smith J. The Eurasian heartland: a continental perspective on Y-chromosome diversity. Proc Natl Acad Sci U S A. 2001 Aug 28;98(18):10244-9.

Thursday, July 19, 2012

Interpreting New Iranian Y-Chromosomal Data (Grugni et al.) [Review]


Introduction


A new study on Iranian Y-Chromosomes released just yesterday has, to my satisfaction, adequately sampled every major ethno-linguistic group and determined the inter-provincial variation between them. Grugni et al. sampled 938 unrelated Iranian men from 15 ethnic groups (including Assyrians, Zoroastrians and Turkmen) in 14 provinces across the country.


Abstract

"Knowledge of high resolution Y-chromosome haplogroup diversification within Iran provides important geographic context regarding the spread and compartmentalization of male lineages in the Middle East and southwestern Asia. At present, the Iranian population is characterized by an extraordinary mix of different ethnic groups speaking a variety of Indo-Iranian, Semitic and Turkic languages. Despite these features, only few studies have investigated the multiethnic components of the Iranian gene pool. In this survey 938 Iranian male DNAs belonging to 15 ethnic groups from 14 Iranian provinces were analyzed for 84 Y-chromosome biallelic markers and 10 STRs. The results show an autochthonous but non-homogeneous ancient background mainly composed by J2a sub-clades with different external contributions. The phylogeography of the main haplogroups allowed identifying post-glacial and Neolithic expansions toward western Eurasia but also recent movements towards the Iranian region from western Eurasia (R1b-L23), Central Asia (Q-M25), Asia Minor (J2a-M92) and southern Mesopotamia (J1-Page08). In spite of the presence of important geographic barriers (Zagros and Alborz mountain ranges, and the Dasht-e Kavir and Dash-e Lut deserts) which may have limited gene flow, AMOVA analysis revealed that language, in addition to geography, has played an important role in shaping the nowadays Iranian gene pool. Overall, this study provides a portrait of the Y-chromosomal variation in Iran, useful for depicting a more comprehensive history of the peoples of this area as well as for reconstructing ancient migration routes. In addition, our results evidence the important role of the Iranian plateau as source and recipient of gene flow between culturally and genetically distinct populations."

[PDF]


Interpretation of Results

Iranian Y-SNP Frequencies

Data from the original study can be found opposite. In addition, several contour maps showing the frequency of select Y-DNA Haplogroups found across the country are shown along the right. Armenians, Zoroastrians and Assyrians from Tehran, as well as Afro-Iranians from Hormozgan province, are excluded. Note that updated ISOGG nomenclature was applied wherever deemed appropriate (refer to SNPs for clarification of status). Frequency ranges shown on maps are from 0-100%. Please note the maps are only intended to depict general trends rather than specific figures. Refer to the figures from the study (above) for these.


- Consistent with anthropological data and historical records from South Iran, the Y-DNA Haplogroups with frequencies greater in Africa than Eurasia (B-M60 and E2-M75) peak in Hormozgan province. 

- Over half a dozen para-Haplogroups (C*-M216, F*-M89, H*-M69, IJ*-M429, J2*-M172, L*-M61, NO*-LLY22g, Q1*-P36.2 and R*-M207) were found scattered across Iran. Although the presence of para-Haplogroups within a region is often taken as an indicator of a lineage's antiquity there, both their consistency and correspondence with downstream younger clades must be considered before such a conclusion is made. As such, I do not consider H*-M69, NO*-LLY22g or C*-M216's presence in this cohort to indicate anything other than Iran's position as a geographic crossroads. The remaining ones (particularly J2*-M172, L*-M61 and R*-M207) require further investigation to elucidate whether Iran does stake the claim to the origins of each.

- Further to the above, it is likely that the R*-M207 reported in this paper is in fact R2*-M479 based on the dated SNP array used.

- C5-M356, a mysterious clade with a spotty distribution across much of Eurasia, makes a sporadic appearance across Iran. In the region, it is more commonly associated with the Indian Subcontinent.
Iranian J1c3-PAGE08

- Haplogroup G makes a strong appearance with, in my opinion, enough clade diversity to validate an origin in Iran or a close-by region. This is partially supported by its presence in every ethnic group, albeit through different subclades.

- Although IJ*-M429 has finally been found, Grugni et al.'s decision not to publish STR data does not give us the means to determine if the two Mazandarani and Persian men are in fact related within a genealogical timeframe. The significance of this find in Iran will have to remain pending.

The lacklustre SNP definition in the Y-DNA Haplogroup I found in Iran (Gilaki, Bandari, Kurdish and Armenian populations between I1-M253 and I2-M438) dissuades strong conclusions regarding the development of I-M170 relative to IJ*-M429's discovery. The lack of STRs prevents us from ascertaining whether these are recent contributions from Europe or not, or whether there is any European connection to begin with.

- Both the frequency and subclade diversity of Haplogroup J2-M172 (as well as the presence of J2*-M172 and J2a*-M410 across the country) make Iran a strong candidate for the origin of this lineage.

The strong presence of J1c3-PAGE08 is one of the surprising finds of this study. With an absence only amongst Assyrians from Azarbaijan province and a peak in Khuzestani Arabs (31.6%), I speculate this is an early Near-Eastern pastoralist nomad marker that is only accentuated in Khuzestani Arabs because the L147.1 marker (J1c3d), which is commonly associated with the expansion of Semitic languages (particularly Arabic in literature), was not tested here. Otherwise, it would be difficult to reconcile medieval Arabic admixture among Iran's Zoroastrians being comparable to (and often greater than) that among Azeris, for instance, as Azerbaijan hosted Arab garrisons following the Sassanid collapse.

- Haplogroup Q presents with a very distorted picture. 42.6% of Turkmens belonging to Q1a2-M25 is not in agreement with Wells et al.'s The Eurasian Heartland: A continental perspective on Y-chromosome diversity, where Haplogroups J, N, R1a and R1b predominated, suggesting either an extensive Founder effect has taken place (i.e. regionalisation of certain branches from a common Oghuz Turk pool) or the Golestani Turkmen values have experienced a more generic form of genetic drift.
On the matter of Turkic affinities, Azeris from Azarbaijan province have greater subclade variation than all other ethnic groups. However, the total frequency is comparable to (or less than) that of Persians nationwide. As it stands, if one were to presume Haplogroup Q in Iran was of Turkic origins, it would appear their contribution to the Persian and Azeri genepools is comparable despite linguistic differences. Although more data would certainly flesh this matter out, this diversity combined with the presence of N-M216 among Iran's Azeri population certainly gives a genetic basis for their linguistic heritage.

Haplogroup R1a1a-M17 is regularly found at frequencies greater than 15% across Iran, contrary to the assertion Dr. Wells made one decade ago on the basis of the limited samples he obtained, again from The Eurasian Heartland: A continental perspective on Y-chromosome diversity:

Iranian G2a-P15
"Intriguingly, the population of present-day Iran, speaking a major Indo-European language (Farsi), appears to have had little genetic influence from the M17-carrying Indo-Iranians."

It is somewhat ironic, however, to note that the Persians from Fars province presented one of the lowest R1a1a-M17 frequencies observed in this study. Whether sampling chance is an issue here, or the sparsity of M17 is indeed a reality, is an open question.

- The detection of both R1a1-SRY1532.2 (shown as R1a* due to old nomenclature) and R1b*-M343 reaffirms the presence of these para-Haplogroups in the region, indicating West Asia was where Haplogroup R1-M173 began differentiating into the two primary subclades we see today in Eurasia.

Haplogroup R1b1a2a-L23 is more frequent in the north and west of the country, which (together with its presence in the furthest southern and eastern poles at ~3%) suggests it likely moved in an overall south-easterly direction via diffusion, probably during the Neolithic.

- The distribution of Haplogroup R2a-M124 is, much like C5-M356, irregular. Contrary to what is shown in Haber et al.'s research, R2a is not more common in the east of the country. Instead, it can be found amongst Esfahani Persians at a frequency of 9.1%. That Iran's R2a frequency achieves its peak in the centre of the country is reminiscent of Sahoo et al.'s A prehistory of Indian Y chromosomes: Evaluating demic diffusion scenarios.


The sensationalist question of the hour: what accounts for the spike in R2a-M124 that has been picked up in Central Iran for the past half decade?

- Finally, Haplogroup T-M70 enjoys a frequency of 10.1% amongst Assyrians from Azarbaijan province, whilst also being more common among Persians across the country and Iranians from the western periphery of the country (Azeris and Kurds). This would suggest, therefore, an at least passive but deep association with ancient Near-Eastern cultures.

Criticisms of Paper

Despite the rich sampling pool, I have several immediate criticisms:

Iranian J1-M267
  • There are some issues with the sampling strategy employed by this paper. For instance, the Assyrians (Christian non-Arab Semitic-speaking minority) are represented by 39 men, whereas Persians from Esfahan (a major Iranian city) are represented by only 11.
  • Inadequate haplotype data has been released; the only offering is 8 STRs from select lineages (e.g. J1*-M267) which were used for variance analysis.
  • Furthermore, a maximum of 10 Y-STRs were analysed, rendering some of their variance calculations questionable at such a low resolution. This also does away with the possibility of MRCA and intra-subclade age calculations.
  • Grugni et al. have approached Haplogroup R1a1a-M17 in a similar vein to past studies (e.g. Haber et al., see Showcasing of Y-DNA Variation Among Afghan Ethnic Groups) by not referring to current data concerning the structure of R1a1a. As with Haber et al., R1a1a-M458 is taken as the "European" strain, despite research undertaken by the R1a1a and Subclades Y-DNA Project revealing the apparent schism between the upstream Z283 and Z93 SNPs being far more informative in this regard.
  • Haplogroup R1b1a2*-L23 is considered as a "West Eurasian" paternal contribution to the Iranian plateau rather than the possibility it may have originated within or in proximity to the country's western zone. 
  • As shown in Interpretation of Results, Grugni et al.'s use of dated nomenclature poses problems for those who may not be intimately familiar with recent Y-SNP Tree changes by ISOGG.
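To illustrate what a Y-STR variance calculation of the kind criticised above involves, here is a minimal sketch that averages the per-locus sample variance of repeat counts across a set of haplotypes. This is a generic estimator chosen for illustration and is not necessarily the exact method Grugni et al. used. The haplotypes and the number of loci are invented, since the study's STR data were largely unpublished.

```python
from statistics import variance
from typing import Sequence


def mean_str_variance(haplotypes: Sequence[Sequence[int]]) -> float:
    """Average, over loci, of the sample variance in STR repeat counts.
    Each haplotype is a sequence of repeat counts in a fixed locus order."""
    if len(haplotypes) < 2:
        raise ValueError("at least two haplotypes are required")
    if len({len(h) for h in haplotypes}) != 1:
        raise ValueError("all haplotypes must cover the same loci")
    loci = list(zip(*haplotypes))  # transpose: one tuple of repeat counts per locus
    return sum(variance(locus) for locus in loci) / len(loci)


if __name__ == "__main__":
    # Hypothetical 4-locus haplotypes; real panels would use more markers.
    toy_haplotypes = [
        (14, 12, 23, 10),
        (15, 12, 24, 10),
        (16, 12, 23, 11),
    ]
    print(f"Mean per-locus variance: {mean_str_variance(toy_haplotypes):.3f}")
```

Because the estimate is an average over loci, a panel of only 8 to 10 markers leaves it sensitive to one or two unusually variable loci, which is the resolution concern raised in the list above.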

Acknowledgements

Map of Iran courtesy of D-Maps.com.

Tuesday, July 17, 2012

The Secrets of Central Asia: Chapter II - The Nomads of West Siberia [Review]

Introduction
Molodin et al. conveniently released an exciting paper just days ago, revealing the convergence and possible origins of maternal lines in several West Siberian sites across different points of time.

The authors made the following conclusions based on the data they had gathered:

"We therefore consider the appearance of the Haplogroup T-lineage as the most likely genetic marker of the Andronovo migration wave to the region....
Apparently, the Andronovo group... assimilated the aboriginal... population, from which it obtained these East-Eurasian mtDNA haplogroups. Obviously, there was reciprocal genetic contact between the migrant and indigenous groups in the region.
...These [autochthonous] components were represented by the Eastern Eurasian haplogroups A, C and Z, and the Western Eurasian haplogroup U5a. On the other hand, the results also reveal some changes in the mtDNA pool structure throughout the Bronze Age. Some of these changes, which point to migration waves to the West Siberian forest steppe zone, are in agreement with the archaeological and anthropological evidence. The most relevant migration waves occurred during the Middle Bronze Age (represented by the migration of the Andronovo culture, probably marked by Haplogroup-T lineages) and the transition from the Bronze to the Iron Age (represented by the migration from the south, marked by the U1a, U3 and H haplogroup lineages)."

[PDF]

In this blog entry, these conclusions are scrutinised together with the deeper ancestral associations of these haplogroup lineages with modern (and other ancient) populations.


The Original Paper's Findings
A total of 92 ancient DNA (aDNA) haplotypes in the form of mitochondrial DNA (mtDNA) were retrieved from five sites stratified across seven distinct archaeological periods in a fixed portion of West Siberia known as the Baraba forest-steppe, lying between the Irtysh and Ob rivers. These haplotypes were obtained from Hypervariable Region 1 (HVR1) of mtDNA and are included in the original study (shown as Table 3).

Sampling Sites in Baraba Forest-Steppe

As no burial remains dating to the Pleistocene (11th-12th millennium BC), the earliest period in which anatomically modern humans reached this region, have been found in or around the Baraba forest-steppe, the ultimate origins of the Early Bronze Age lineages are left open to interpretation. Nonetheless, below is a summary of each archaeological culture showcased in the paper, as well as relevant extracts from the literature. [1]


Ust-Tartas (4000-3000 B.C.)

Based on anthropological data, the inhabitants of the earliest grave-containing Baraba prehistoric culture appeared to be Caucasoid-Mongoloid hybrids of a type whose distribution spanned the swathe of forest from Karelia and the Baltic through to the Ural region. Numerous Russian sources have previously described this concept as the Northern Eurasian Anthropological Formation (e.g. Bunak V.V.). Additionally, a comparison with the nearby Comb-pit Ware culture revealed enough anthropological similarities to suggest the individuals of Ust-Tartas were likely to be autochthonous and not recent migrants.

Extent of the N. Eurasian Anthropological Formation
Of the 18 mtDNA haplotypes retrieved, East Eurasian lineages (A, C, D, Z) comprised a slight majority (11/18). The authors noted "widely distributed root haplotypes" for Haplogroups C and D, which presumably indicates greater antiquity of both in the region. The two individuals belonging to haplogroup A "[represent] a subcluster that is apparently characteristic of West Siberia and the Volga-Ural Region". There was surprise at the presence of Haplogroup Z based on its absence among modern West Siberians, a topic explored later in this entry.
The seven West Eurasian mtDNA Haplogroups belonged entirely to U, comprising U2e, U4* and U5a1. The authors recalled the findings of several other recent studies on ancient DNA, stating these likely belonged to "Eastern, Central and Northern European hunter-gatherer groups". [1]

Besides affirming previous literature concerning the migration corridor between East Europe and East Asia, the haplotypes also complement the anthropological data concerning their status as Mongoloid-Caucasoid hybrids.


Odinovo (3000 B.C.) and Krotovo (Early, 2000 B.C.)

Both of these cultures, regardless of stage, represent a fairly linear continuity from the populations and traditions of the Ust-Tartas culture before them.

The Odinovo culture succeeds Ust-Tartas, although it is viewed as a synthesis between it and the Comb-Pit Ware archaeologically. Anthropological kinship between it and contemporary Baraba findings also confirms the autochthonous nature of Odinovo. However, it differs from its antecedents in grave objects, funeral rites and the presence of bronze artefacts belonging to the Seima-Turbino cultural phenomenon, a short-lived (2200-1700 B.C.) but "striking" package of metallurgical goods originating around the Sayan-Altai region in South Siberia that was oriented westwards towards Europe. [2]

In turn, the Krotovo culture is partially derived from Odinovo, although it isn't without its own influences from adjoining regions. As well as "strikingly different" funeral rites, [1] new archaeological features, including items fashioned out of chalcedony, jaspilite and enstatite, point toward interactions of some degree with the Petrovo culture found further south in Kazakhstan, where the nearest deposits of these materials lie. It is worth noting the physical type of the Krotovo people revealed no significant changes, remaining in line with the previous autochthonous type.

A total of 16 mtDNA haplotypes were recovered from both Odinovo and the Early Krotovo stage. The spectrum of mtDNA Haplogroups remains unaltered from the Ust-Tartas samples, supporting the archaeological record of continuity.

The paper goes on to elaborate on the discrepancy between the mtDNA results and the archaeological features of Krotovo by stating "our data did not allow us to detect any Central Asian genetic influence". [1] Several possible explanations may be considered:
  1. New material items from Petrovo accompanied a male-mediated migration towards Krotovo, resulting in some level of cultural assimilation
  2. In support of the above, the Petrovo culture natives may have themselves been a southward extension of the "Northern Eurasian Anthropological Formation" and belong to the same basic physical type as Ust-Tartas, Odinovo and Krotovo individuals further north, making any inter-culture interactions difficult to infer
  3. Some mode of transmission between Krotovo and Petrovo took place (trade, "package diffusion")

Further information is needed to ascertain which is more probable, including (but not restricted to) Y-Chromosomal data from all concerned cultures for evidence of (dis)continuity between Odinovo and Krotovo through southern influence, as well as anthropological data from Petrovo to determine if they were indeed of the same basic physical type.

The summation of the evidence provided, however, indicates material items from further south were brought northwards into the Baraba forest-steppe after 2000 B.C., but these cultural changes are not reflected in the native maternal lineages, implying less overt processes (or male-mediated migration) were causative.


Krotovo (Late, 1750 B.C.) and Andronovo (1500 B.C.)

The next significant period of Baraban history comes with the arrival of semi-nomadic pastoralists whose origins lay further to the west. We are, of course, referring to the founders of the Andronovo archaeological complex, whose Indo-European language, culture and even ideology eventually infiltrated deep into the Iranian plateau and Indian subcontinent through their utilisation of both horse and chariot. [3]

Schematic Tree of mtDNA Haplogroups Found
Within Baraba, despite the Krotovo population coexisting with these newcomers for a length of time (presumably due to their occupancy of different pastoralist niches), we see evidence of a shift from Seima-Turbino to Andronovo with regard to their material traditions. Andronovan dominance is also reflected in the eventual northward displacement of some Krotovo natives based on archaeological data. [1] However, cranioanalysis presents a more complicated picture; the presence of an "autochthonous Mongoloid" variant not typically seen in the Baraba forest-steppe, differing from the hybrid type seen for hundreds of years prior, may suggest the two were not in direct contact and Andronovan influence was exerted by proxy of other native groups who were displaced northwards and eastwards following their assimilation. This is anecdotally supported by Keyser et al.'s discovery of one Andronovo male (specimen S07) from near Krasnoyarsk in South Siberia carrying Y-DNA Haplogroup C*. [4] It is worth stating the physical type of those from Andronovo is commonly described as "Variants of three proto-Europoid types" with a minor Mongoloid component. [1]

40 mtDNA haplotypes from Late Krotovo (1750 B.C.) and Andronovo (1500 B.C.) sites and time periods were taken. As expected, the same spectrum of mixed West-East Eurasian lineages made an appearance, except for the strong introduction of one new Haplogroup.

Haplogroup T reaches a stable frequency of 15% in both Late Krotovo and Andronovo despite being completely absent in the 34 earlier haplotypes. The authors cite this as direct genetic evidence of Andronovan influence on Late Krotovo and postulate this lineage was, as a result, a major contingent in the Andronovo culture's spread.
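As a rough check on whether this jump could be a sampling fluke, the sketch below runs a one-sided Fisher's exact test using only the standard library. It assumes the 15% applies to the 40 pooled Late Krotovo and Andronovo haplotypes, giving about 6 carriers of Haplogroup T, set against 0 carriers among the 34 earlier haplotypes. That pooled count is my own reading of the figures above, not a number taken from the paper.

```python
from math import comb


def hypergeom_pmf(k: int, total: int, successes: int, draws: int) -> float:
    """Probability of exactly k successes when drawing `draws` items without
    replacement from `total` items containing `successes` successes."""
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)


def fisher_one_sided_less(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]:
    probability of seeing a or fewer carriers in the first group, given the margins."""
    total = a + b + c + d
    successes = a + c   # all carriers across both groups
    draws = a + b       # size of the first group
    lowest = max(0, draws - (total - successes))
    return sum(hypergeom_pmf(k, total, successes, draws) for k in range(lowest, a + 1))


if __name__ == "__main__":
    # Rows: earlier periods, later periods. Columns: Haplogroup T, other lineages.
    p_value = fisher_one_sided_less(0, 34, 6, 34)
    print(f"One-sided p-value: {p_value:.4f}")
```

Under these assumptions the p-value comes out at roughly 0.02, so the appearance of Haplogroup T is unlikely to be chance alone, although ancient samples of this size (and possibly related individuals within burial sites) mean the figure should be treated as indicative only.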

All of these events precede the Irmen culture (1400-900 B.C.), the eventual successor to Andronovo. Those Irmen individuals found in the Baraba region were found to be predominantly Caucasoid and practiced a mixed economy of agriculture and animal husbandry. Only data from the Late stage (900-800 B.C.) was considered in the study.


Baraba (Late, 1000 B.C.)

The Late Baraba culture is a consequence of a Krotovo-modified Andronovo successor (known as Suzgan) interacting with the Irmen culture (described above). This was a particularly tumultuous period in West Siberian prehistory with tribes continuously coalescing with one another, forming new identities in the process.

Anthropological data from the Late Baraba culture painted a far more diverse picture than that of the previous three millennia. The authors noted that, although sex generally had little bearing on physical type, the males were found to be more similar to a "Southern Eurasian Anthropological Formation", whereas the females were closer to the Andronovan derivatives in North Kazakhstan.

Only five mtDNA haplotypes were recovered from this period. Haplogroups A and C were once again represented, as were U5b and T, indicating that the results of the previous assimilation events had been maintained up until this point.


Irmen (Late, 900-800 B.C.)

From 1000 B.C. onwards, a complex set of migrations took place in West Siberia between the cultures formed by this point. Archaeologists attribute this to ecological changes involving climatic cooling across the region. 

The last sampled site is the Late Irmen culture, which is a continuation of the Irmen culture proper described earlier in this entry. The intricate interactions between cultures of this period are evident through multi-plural settlements in the archaeological record here.

The final 14 mtDNA haplotypes were, unexpectedly, a complete departure from the partial continuity that we have seen from Ust-Tartas up until Late Baraba. Almost all belonged to West Eurasian lineages, such as Haplogroups J, K and W. The study suggested that the ultimate origins of these lineages lay further south, in the vicinity of West Kazakhstan and West Central Asia (Turkmenistan and Uzbekistan likely implied). This suggestion will be assessed in detail later in this entry.


Confirmation of the 'Migration Corridor'?
It is remarkable to finally see genetic evidence of the migration corridor, an archaeological concept mentioned several times in Vaêdhya, imprinted in such a definitive way (visit North European Component Variation within the Eurasian Heartland for additional information).

As it stands, we can now safely conclude that prehistoric hybridisation between hunter-gatherer Paleo-European populations and those from the East took place across the Eurasian steppe. The crossover of both from opposing ends of this corridor is supported by aDNA and anthropological evidence, with the finding of a near-equal hybrid population midway between the two poles all but confirming what the raw results have already revealed. Therefore, the connection between Northeast Europe and East Asia through the Eurasian steppe (even before Proto-Indo-European's formation) can no longer be considered a hypothesis, but a verified reality of demic prehistory. If supported with autosomal DNA (auDNA) from similar gravesites, it will drastically alter our perception of the migrations that happened afterwards, as well as doing away with over-simplified models of how certain languages and cultures permeated across Eurasia.


Afanasievo: Without a trail?
It is interesting to note that, despite the study covering over 3,000 years of prehistory, there is yet to be a trace of the Afanasievo culture, the earliest known offshoot of Yamnaya in the east, across this territory. Under the Eurasian steppe theory, the Afanasievo culture is connected with pastoral nomads who spoke an early (proto) form of the Tocharian branch, an extinct Centum Indo-European language which subverts the Centum:Satem isogloss in Eurasia. [5] The only attested connection between Afanasievo and the Baraba forest-steppe is through interactions between its successor culture, the Karasuk, and the easternmost of the early Irmen. [1]

The question that persists is thus: where is the Afanasievo trail from Yamnaya through the Urals to their final archaeological seat in South Siberia? Why have none of the Baraba forest-steppe cultures shown any indication of influence, be it cultural or anthropological, from Caucasoid pastoral nomads before those of Andronovo?

One likely answer comes from Frachetti's Pastoralist Landscapes and Social Interaction in Bronze Age Eurasia, which clarifies the material culture and mode of living in Central Asia during the Bronze Age:

"The calibrated C14 dates of Afanas'evo material are generally slightly earlier than those taken from Yamnaya contexts in the western steppe, which complicates a diffusionist explanation of the emergence of pastoralists in the eastern steppe. Although their origins may be obscure, communities associated with Afanas'evo materials still represent the earliest mobile pastoralists east of the Ural Mountains... [their] incipient strategy of cattle and sheep/goat herding, supplemented by hunting and fishing.
The Afanas'evo subsistence economy might best be characterized as a mixed or transitional form between hunting/fishing and localized pastoralism, arising from local antecedents or combining native strategies with diffused domestic innovations among local populations.
...Perhaps the strongest evidence that divides the Yamnaya and Afanas'evo pastoralists in the mid-fourth millenium BCE is the discontinuity of pastoral economic strategies among societies living between these territories."
[6]

If the Afanasievo culture was itself a combination of local hunting strategies and farming practices originating further west in the Yamnaya, despite differing from contemporary societies above the Black and Caspian seas, one can postulate that the Afanasievo people would likely have intermingled with native cultures in South Siberia whilst retaining their core pastoral attributes, and that such an event would have occurred some time earlier.
The Afanasievo bearers need not have travelled through the Baraba forest-steppe either; the maps shown in Chernykh's The “Steppe Belt” of stockbreeding cultures in Eurasia during the Early Metal Age, for instance, show a straight trajectory from the Urals to the Sayan-Altai region for clarity rather than on a factual basis. Little is currently known about the journey taken by these nomads, but the findings of this paper do help in confirming that the founders of Afanasievo did not stray along the northern rim of the forest-steppe towards South Siberia.


References
1. Molodin VI, Pilipenko AS, Romaschenko AG, Zhuravlev AA, Trapezov RO. Human migrations in the southern region of the West Siberian Plain during the Bronze Age: Archaeological, palaeogenetic and anthropological data. 2012. Retrieved from: http://www.degruyter.com/dg/viewbookchapter.fullcontentlink:pdfeventlink/contentUri?t:ac=books$002f9783110266306$002f9783110266306.93$002f9783110266306.93.xml [Last Accessed 17th July 2012]

2. Chernykh E. The “Steppe Belt” of stockbreeding cultures in Eurasia during the Early Metal Age. Trabajos De Prehistoria. 2008;65:73-93.

3. Kuz'mina EE. The Origin of the Indo-Iranians. Koninklijke Brill NV, Leiden, The Netherlands. 2007.

4. Keyser C, Bouakaze C, Crubézy E, Nikolaev VG, Montagnon D. Ancient DNA provides new insights into the history of south Siberian Kurgan people. Hum Genet. 2009;126:395–410.

5. Anthony DW. The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World. Princeton University Press. 2007.

6. Frachetti MD. Pastoralist Landscapes and Social Interaction in Bronze Age Eurasia. University of California Press, Ltd. 2008.