February 14, 2018

Dystruct versus Admixture

I am not yet able to tell whether this is an alternative way ahead for autosomal archaeogenetics or just another dead end, but it seems interesting enough to mention here in any case.

It may be very important in the deciphering of the so-called "ANE" ghostly genetic influence.

Tyler A. Joseph & Itsik Pe'er. Inference of population structure from ancient DNA. bioRxiv 2018 (pre-pub). DOI:10.1101/261131

Methods for inferring population structure from genetic information traditionally assume samples are contemporary. Yet, the increasing availability of ancient DNA sequences begs revision of this paradigm. We present Dystruct (Dynamic Structure), a framework and toolbox for inference of shared ancestry from data that include ancient DNA. By explicitly modeling population history and genetic drift as a time-series, Dystruct more accurately and realistically discovers shared ancestry from ancient and contemporary samples. Formally, we use a normal approximation of drift, which allows a novel, efficient algorithm for optimizing model parameters using stochastic variational inference. We show that Dystruct outperforms the state of the art when individuals are sampled over time, as is common in ancient DNA datasets. We further demonstrate the utility of our method on a dataset of 92 ancient samples alongside 1941 modern ones genotyped at 222755 loci. Our model tends to present modern samples as the mixtures of ancestral populations they really are, rather than the artifactual converse of presenting ancestral samples as mixtures of contemporary groups.
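
For readers wondering what "a normal approximation of drift" modeled as a time series might look like, here is a minimal sketch of my own, not the authors' code. It assumes the textbook diffusion result that, after t generations in a population of effective size N, an allele frequency p is approximately normally distributed with mean p and variance p(1 - p)·t/(2N). The function names, the clipping to [0, 1] and the parameter values are all illustrative assumptions; Dystruct's actual model and its variational inference are described in the paper.

```python
import random


def drift_step(freq, generations, pop_size, rng):
    """Draw the allele frequency after `generations` of drift.

    Assumption (textbook approximation, not taken from the paper):
    the new frequency is normal with mean `freq` and variance
    freq * (1 - freq) * generations / (2 * pop_size).
    """
    variance = freq * (1.0 - freq) * generations / (2.0 * pop_size)
    new_freq = rng.gauss(freq, variance ** 0.5)
    # A normal draw can leave [0, 1]; clip it so it stays a valid frequency.
    return min(1.0, max(0.0, new_freq))


def simulate_trajectory(start_freq, time_points, pop_size, seed=0):
    """Simulate one allele frequency at successive sampling times.

    `time_points` are generation counts in increasing order, e.g. the
    times at which ancient and modern samples were taken.
    """
    rng = random.Random(seed)
    freqs = [start_freq]
    for earlier, later in zip(time_points, time_points[1:]):
        freqs.append(drift_step(freqs[-1], later - earlier, pop_size, rng))
    return freqs


if __name__ == "__main__":
    # Hypothetical numbers: four sampling times, effective size 5000.
    print(simulate_trajectory(0.3, [0, 10, 50, 200], pop_size=5000, seed=1))
```

The point of the sketch is only that each later sample depends on the earlier one through elapsed time, which is what lets a model of this kind treat ancient samples as ancestors rather than as contemporaries.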

Still digesting this one, but I do find it very intriguing that they claim Dystruct has much less time-entropy than ADMIXTURE (i.e. the relation between ancient and modern populations seems to be better identified). Using this method, the Samara (proto-Indo-European) population becomes much more clearly related to Kostenki-14 (a Gravettian hunter-gatherer from the Don area), and the Paleo-Siberian "ANE" individuals then form their own distinct cluster, with very limited impact in Europe but a much larger one in parts of Asia (not labeled: South Asia?). This Kostenki-Samara "orange" component keeps influencing Western Indo-Europeans (Corded Ware, Unetice) but at markedly decreasing frequencies of "purity".

However, the first admixture of Corded Ware is not with earlier farmers (mostly "green") but with some sort of late "hunter-gatherer" population ("brown" or "maroon" component). Only after the backlash of Bell Beaker, which in Central Europe appears as a mix of Neolithic peoples, Indo-Europeans and maybe even more of that mysterious extra HG element, do we see some "return of the farmers", which clearly persists in Unetice.

In general, modern Europeans are (Fig. 5a, not shown here) rather "greener" than Unetice, and some populations (I'm guessing Sardinians and Basques, no labels provided) have zero "orange" (IE) component, which otherwise ranges (my visual estimate) between 9% and 27%.

Fig. 5b: Ancestry estimates for 92 ancient samples. The three leftmost samples are the Pleistocene hunter-gatherers. In Dystruct, late Neolithic samples and beyond present as a mixture of hunter-gatherers, Yamnaya steppe herders, and early Neolithic samples, matching supported historical migrations of steppe herders into Eastern and Western Europe.


  1. Thank you. Very interesting how the CWC interacted with HG and the BBC had more Farmer input (from where?). It’s a great time to be alive. We know so much about our ancient ancestors' migrations. I’ve studied the Pontic Caspian Steppe Bronze Age Herders, as my mtDNA spread with those Herders' migrations. However, it had very little impact on my Autosomal DNA.

    1. Something I'm pretty sure about is that the CW invasion did not kill all the farmers. The data we have is from very specific sites and regions; West Germany, for example, is poorly studied, as are many other West European regions. The people who are really at the forefront of archaeogenetic research are based in East Germany, and to some extent in Sweden, and that implies some important sampling biases for autosomal data. For mtDNA, a simpler technique, the data are more abundant, and to my eyes they suggest that Western Europe, and very possibly regions like Rhineland-Westphalia, Denmark, etc., could be critical in explaining the changes. Of course I'd like more archaeogenetic research in France: a huge state with a most important role in European prehistory, a role still not properly ascertained. Also from the Low Countries, Iberia, Britain, even Italy: we know so little from west of the Elbe!

    2. If you read the paper, they discuss their simulations at length, and one thing they state is that it's perfectly possible to sample some key ancestral population just once: it happened to them in their virtual scenarios.

  2. Glad to see you back! This has always been my favorite blog about prehistory, and I was distraught to see it die :-)

    Looking at the paper it seems that in the authors' model the Samara_HG is not related to the Yamnaya herders, but rather to the other European hunter gatherers. I think you misread that bit.

    Though it seems kind of hard to believe that unmixed descendants of Kostenki still existed by the Bronze Age, unless they were tucked away in the Caucasus, southern Central Asia or some other region yet unsampled. The Dystruct model would suggest that the newcomers completely replaced the Mesolithic population of Russia & Ukraine.

    1. Why not? Kostenki was Gravettian and these guys were soon after the last Epigravettian. There are only so many prehistoric migrations of some size to Europe, you know: (1) the Protoaurignacian or "Aurignacoid" one c. 50 Ka BP, (2) the Gravettian one c. 32 Ka BP and (3) the Neolithic one c. 8 Ka BP. Also, it gets rid of that pesky ANE ghost, which is on its own definitely a good sign.

      As for the blog, I don't think it'll be the same. I'm less into following everything now and more in the mood of "this is worth noting". But thanks for the kind comment in any case.

    2. I guess that, more so than by the temporal distance, I was surprised by the fact that the Dystruct model gets rid of the relationship between Yamnaya and the Karelia-Samara hunter-gatherers. Though I think that said relationship mostly had to do with the quite muddled ANE ghost component. So this might actually be more accurate than earlier models.

      Modern Caucasus populations show the highest contribution of the orange Kostenki component. I guess that would make sense given the proximity to the steppe. It's a shame they didn't include the CHGs and early Iranian samples. I'd think they'd come out very 'orange' as well.

    3. Yeah, what you say about Karelia-HG is true and a bit perplexing. I had not thought about it twice, as they appear as "absolutely normal European-HGs". I did comment on the pre-pub study that in order "to be reasonably certain of this, I'd suggest reprocessing using also Iran-Neolithic and Caucasus-HG samples"...

