Thursday, April 14, 2011

IBS similarity matrix and Population Concordance Ratio for Dodecad populations

In Dienekes' Anthropology Blog, I presented a new method of comparing populations, the population concordance ratio. You can refer to that post for the rationale, definitions, and code, but for the present, I will just say that this ratio estimates the probability that two random individuals from a population A are more similar to each other than either of them is to a random individual from another population B. Its expected value ranges from 0.25 (two very similar populations) to 1 (two very dissimilar populations).

Another common way of comparing populations is by computing an identity-by-state (IBS) similarity matrix. Comparing the genomes of two individuals across many loci yields a number (IBS) ranging between 0 and 1: in humans, 0 is almost never encountered, since two random individuals will share some alleles by pure chance, while 1 indicates either monozygotic twins or a clerical error.
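As a rough illustration of how such a number can be obtained, here is a minimal Python sketch. It assumes genotypes coded as minor-allele counts (0, 1, 2) with None for missing calls, and counts the alleles shared at each locus as 2 - |g1 - g2|. This is one common convention and not necessarily the exact formula behind the spreadsheet; the function name and the toy genotypes are hypothetical.

```python
def ibs_similarity(g1, g2):
    """Mean proportion of alleles shared identical-by-state across loci.

    g1, g2: equal-length sequences of genotypes coded as minor-allele
    counts (0, 1 or 2), with None marking a missing call. The coding and
    the 2 - |g1 - g2| sharing rule are assumptions of this sketch.
    """
    shared = 0
    loci = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip loci not genotyped in both individuals
        shared += 2 - abs(a - b)  # 0, 1 or 2 alleles shared at this locus
        loci += 1
    if loci == 0:
        raise ValueError("no loci genotyped in both individuals")
    return shared / (2 * loci)


# Toy genotypes (hypothetical, not Dodecad data)
ind_a = [0, 1, 2, 2, None, 1]
ind_b = [0, 2, 0, 2, 1, 1]
print(ibs_similarity(ind_a, ind_b))
```

Comparing an individual with itself gives 1 under this rule, which is why a value of 1 between two supposedly different samples points to twins or a duplicated record.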

I have computed these two statistics over populations of the Dodecad Project with at least 5 individuals. The analysis is based on 282,409 SNPs with a 99%+ genotyping rate over the combined sample.

The results can be found in this spreadsheet.

[NOTE: I have taken down the spreadsheet on Apr 15, in order to investigate a possible error in the Brazilian_D sample]

[NOTE II: The results seem to be correct, so the spreadsheet is back up]

For the population concordance ratio, each row gives, for the row population, an estimate of the probability that two of its individuals are more similar to each other than either of them is to a member of each column population; the matrix is therefore asymmetric.
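To make the row/column reading concrete, here is a minimal Python sketch that builds such a matrix from a full IBS matrix and a list of population labels. The function names, the use of a strict "greater than" for concordance, and the toy numbers are all assumptions for illustration; the original code is in the post referenced above.

```python
from itertools import combinations


def pcr(ibs, pop_a, pop_b):
    """Share of trios (two individuals from A, one from B) in which the
    A-A pair is strictly more similar than both A-B pairs."""
    concordant = total = 0
    for i, j in combinations(pop_a, 2):
        for k in pop_b:
            total += 1
            if ibs[i][j] > ibs[i][k] and ibs[i][j] > ibs[j][k]:
                concordant += 1
    return concordant / total


def pcr_matrix(ibs, labels):
    """Nested dict: result[row_pop][col_pop] = PCR of row_pop against col_pop."""
    pops = {}
    for idx, lab in enumerate(labels):
        pops.setdefault(lab, []).append(idx)
    return {a: {b: pcr(ibs, pops[a], pops[b]) for b in pops if b != a}
            for a in pops}


# Hypothetical toy data: X is homogeneous, Y contains one outlying individual.
labels = ["X", "X", "X", "Y", "Y", "Y"]
ibs = [
    [1.000, 0.740, 0.740, 0.736, 0.736, 0.726],
    [0.740, 1.000, 0.740, 0.736, 0.736, 0.726],
    [0.740, 0.740, 1.000, 0.736, 0.736, 0.726],
    [0.736, 0.736, 0.736, 1.000, 0.735, 0.725],
    [0.736, 0.736, 0.736, 0.735, 1.000, 0.725],
    [0.726, 0.726, 0.726, 0.725, 0.725, 1.000],
]
result = pcr_matrix(ibs, labels)
print(result["X"]["Y"], result["Y"]["X"])
```

In this toy case the row for X against Y and the row for Y against X come out very different, which is the asymmetry described above: the ratio depends on how tightly the row population clusters, not only on how far apart the two populations are.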

Below are some visualizations of these statistics for the Greek_D sample.

First, the IBS similarity matrix. The values range between 0.70383 and 0.73689, so I have subtracted 0.7 in order to bring out the scale of the differences.

Second, the population concordance ratio:


5 comments:

  1. If I understood this correctly, does it mean that Greeks and other caucasoid groups are closer to kongoid (aka negroid) populations than to Mongoloid ones?
    If so, it's another major difficulty for the opinion that Indo-European is GENETICALLY connected to Uralo-Siberian, Altaic, Chukotko-Kamchatkan, Nivkh and Eskimo-Aleut; the existing similarities would be better explained by Indo-European/para-Indo-European/Nostratic populations expanding into Siberia and influencing (both lexically and in pronouns and verb endings) the "primitive" languages of the Siberian hunter-gatherer populations.

  2. There are no Sub-Saharan African populations currently in the Dodecad Project. East Africans are indeed somewhat closer than East Asians are, but this can be explained by variable amounts of Caucasoid admixture.

  3. Brazilian_D and Swedish_D PCR is 0.05? Perhaps I'm not reading this correctly, but it doesn't make sense, and it's a lot lower than 0.25.

  4. You are right, it doesn't make sense. This is the first time I've run the Brazilian_D sample, or it could be an error of sorting/spreadsheet preparation along the way. I'll investigate and update the post.

  5. What seems to be happening is that, because of the variable levels of admixture in the Brazilian sample, two random Brazilians tend not to be very similar to each other. I will have to think about exactly how this applies to admixed populations, but here are the raw data for anyone willing to take a crack at it.

    IBS similarities between 5 Brazilians:


            V1       V2       V3       V4       V5
    1 1.000000 0.733057 0.726173 0.733717 0.734259
    2 0.733057 1.000000 0.725256 0.733967 0.733061
    3 0.726173 0.725256 1.000000 0.725253 0.725146
    4 0.733717 0.733967 0.725253 1.000000 0.734226
    5 0.734259 0.733061 0.725146 0.734226 1.000000

    IBS similarities between 5 Brazilians (rows) and 6 Swedes (columns)


           V42      V43      V44      V45      V46      V47
    1 0.735585 0.733919 0.732065 0.734764 0.735230 0.734643
    2 0.736570 0.733727 0.732779 0.733094 0.735389 0.734379
    3 0.725671 0.724739 0.724271 0.726271 0.725720 0.725812
    4 0.735987 0.736539 0.736205 0.735156 0.736287 0.733187
    5 0.736539 0.733947 0.733122 0.735796 0.735827 0.734480


    60 trios (n=5, m=6, so there are n(n-1)m/2 = 60 trios). Each row has:

    1. IBS between two Brazilians
    2. IBS between first Brazilian and a Swede
    3. IBS between second Brazilian and a Swede
    4. 1 if concordant trio; 0 otherwise


    [1] 0.733057 0.735585 0.736570 0.000000
    [1] 0.733057 0.733919 0.733727 0.000000
    [1] 0.733057 0.732065 0.732779 1.000000
    [1] 0.733057 0.734764 0.733094 0.000000
    [1] 0.733057 0.735230 0.735389 0.000000
    [1] 0.733057 0.734643 0.734379 0.000000
    [1] 0.726173 0.735585 0.725671 0.000000
    [1] 0.726173 0.733919 0.724739 0.000000
    [1] 0.726173 0.732065 0.724271 0.000000
    [1] 0.726173 0.734764 0.726271 0.000000
    [1] 0.726173 0.735230 0.725720 0.000000
    [1] 0.726173 0.734643 0.725812 0.000000
    [1] 0.733717 0.735585 0.735987 0.000000
    [1] 0.733717 0.733919 0.736539 0.000000
    [1] 0.733717 0.732065 0.736205 0.000000
    [1] 0.733717 0.734764 0.735156 0.000000
    [1] 0.733717 0.735230 0.736287 0.000000
    [1] 0.733717 0.734643 0.733187 0.000000
    [1] 0.734259 0.735585 0.736539 0.000000
    [1] 0.734259 0.733919 0.733947 1.000000
    [1] 0.734259 0.732065 0.733122 1.000000
    [1] 0.734259 0.734764 0.735796 0.000000
    [1] 0.734259 0.735230 0.735827 0.000000
    [1] 0.734259 0.734643 0.734480 0.000000
    [1] 0.725256 0.736570 0.725671 0.000000
    [1] 0.725256 0.733727 0.724739 0.000000
    [1] 0.725256 0.732779 0.724271 0.000000
    [1] 0.725256 0.733094 0.726271 0.000000
    [1] 0.725256 0.735389 0.725720 0.000000
    [1] 0.725256 0.734379 0.725812 0.000000
    [1] 0.733967 0.736570 0.735987 0.000000
    [1] 0.733967 0.733727 0.736539 0.000000
    [1] 0.733967 0.732779 0.736205 0.000000
    [1] 0.733967 0.733094 0.735156 0.000000
    [1] 0.733967 0.735389 0.736287 0.000000
    [1] 0.733967 0.734379 0.733187 0.000000
    [1] 0.733061 0.736570 0.736539 0.000000
    [1] 0.733061 0.733727 0.733947 0.000000
    [1] 0.733061 0.732779 0.733122 0.000000
    [1] 0.733061 0.733094 0.735796 0.000000
    [1] 0.733061 0.735389 0.735827 0.000000
    [1] 0.733061 0.734379 0.734480 0.000000
    [1] 0.725253 0.725671 0.735987 0.000000
    [1] 0.725253 0.724739 0.736539 0.000000
    [1] 0.725253 0.724271 0.736205 0.000000
    [1] 0.725253 0.726271 0.735156 0.000000
    [1] 0.725253 0.725720 0.736287 0.000000
    [1] 0.725253 0.725812 0.733187 0.000000
    [1] 0.725146 0.725671 0.736539 0.000000
    [1] 0.725146 0.724739 0.733947 0.000000
    [1] 0.725146 0.724271 0.733122 0.000000
    [1] 0.725146 0.726271 0.735796 0.000000
    [1] 0.725146 0.725720 0.735827 0.000000
    [1] 0.725146 0.725812 0.734480 0.000000
    [1] 0.734226 0.735987 0.736539 0.000000
    [1] 0.734226 0.736539 0.733947 0.000000
    [1] 0.734226 0.736205 0.733122 0.000000
    [1] 0.734226 0.735156 0.735796 0.000000
    [1] 0.734226 0.736287 0.735827 0.000000
    [1] 0.734226 0.733187 0.734480 0.000000
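
For anyone who wants to re-derive the tally, the trio listing above can be reproduced with a short Python sketch. The numbers are copied from the two tables in this comment; the variable names are hypothetical, and a trio is treated as concordant when the Brazilian-Brazilian IBS is strictly greater than both Brazilian-Swede values, which is an assumption that matches the flags shown.

```python
from itertools import combinations

# Brazilian-Brazilian IBS (upper triangle of the first table)
within = {
    (1, 2): 0.733057, (1, 3): 0.726173, (1, 4): 0.733717, (1, 5): 0.734259,
    (2, 3): 0.725256, (2, 4): 0.733967, (2, 5): 0.733061,
    (3, 4): 0.725253, (3, 5): 0.725146,
    (4, 5): 0.734226,
}

# Brazilian (key) vs. Swedes V42..V47 (second table)
between = {
    1: [0.735585, 0.733919, 0.732065, 0.734764, 0.735230, 0.734643],
    2: [0.736570, 0.733727, 0.732779, 0.733094, 0.735389, 0.734379],
    3: [0.725671, 0.724739, 0.724271, 0.726271, 0.725720, 0.725812],
    4: [0.735987, 0.736539, 0.736205, 0.735156, 0.736287, 0.733187],
    5: [0.736539, 0.733947, 0.733122, 0.735796, 0.735827, 0.734480],
}

# Enumerate trios in the same order as the listing: each pair of
# Brazilians, then each of the six Swedes.
trios = []
for i, j in combinations(sorted(between), 2):
    for s in range(6):
        aa, a1b, a2b = within[(i, j)], between[i][s], between[j][s]
        trios.append((aa, a1b, a2b, int(aa > a1b and aa > a2b)))

concordant = sum(t[3] for t in trios)
print(len(trios), concordant, concordant / len(trios))  # 60 3 0.05
```

Only 3 of the 60 trios are concordant, which gives the 0.05 queried in comment 3. Brazilian 3 stands out: every pair that includes that individual has a lower IBS than the other Brazilian's similarity to any of the Swedes.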
