Wednesday, September 11, 2013

Detailed East Asian Calculator Update

Since my project announcement on 23andme, I have received 30 raw data files so far, 19 of which I merged with the following datasets:
  1. HGDP
  2. HapMap
  3. Rasmussen et al.
  4. Metspalu et al.
  5. Fedorova et al.
  6. Chaubey et al.
  7. Behar et al.
  8. Yunusbayev et al.
  9. Schlebusch et al.
  10. Henn et al. (2011)
  11. Haber et al.


I was not able to include the few new 23andme donors, as I had already started the cross-validation runs. Once I complete the runs, I will merge the rest of the unrelated 23andme samples and start on the calculator.
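For readers who want to reproduce the cross-validation step, here is a minimal sketch, not my actual script. It builds an Admixture command with the `--cv` flag for a given K and then picks the K with the lowest reported CV error from the log output. The file name `merged.bed` and the function names are hypothetical, and the error values in the usage note are placeholders, not my results.

```python
import re

# ADMIXTURE prints one line per run of the form "CV error (K=9): 0.51234".
CV_LINE = re.compile(r"CV error \(K=(\d+)\): (\d+\.\d+)")


def admixture_cv_command(bed_file, k, folds=5, threads=1):
    """Build the argument list for one cross-validated ADMIXTURE run."""
    return ["admixture", f"--cv={folds}", f"-j{threads}", bed_file, str(k)]


def parse_cv_errors(log_text):
    """Map each K to the CV error found in the collected log output."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)
```

Each command could be passed to `subprocess.run` with its output saved to a log file. Feeding the concatenated logs to `parse_cv_errors` and then `best_k` gives the K to look at first; with placeholder errors of 0.60 for K=2 and 0.55 for K=3, `best_k` returns 3.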

I ran a script that was supposed to run Admixture up to K=16. The other day my machine had only finished up to K=9, so I stopped it. I dug through the K=9 output to check which populations I can supervise. I have not done supervised runs before, so I do not know exactly how the algorithm will behave, but I assume it will not look for further sub-populations within the groups I am not really interested in at the moment (mainly non-East Asians). For example, if I assign all Africans to a single supervised African population, it will not look further into African subpopulations (e.g. N Africans, S Africans, Khoisan, etc.). The supervised runs should save computing time, so that Admixture extracts populations mainly among East Asians.
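As far as I understand the Admixture manual, a supervised run (`--supervised`) reads a `.pop` file with the same prefix as the input, holding one line per sample: a population label for fixed reference samples and `-` for samples whose ancestry is to be estimated. The sketch below shows how such a file could be generated. The group names and function names are hypothetical, and how Admixture relates K to the number of labels should be checked against the manual before relying on this.

```python
def supervised_pop_lines(sample_groups, fixed_groups):
    """Return one .pop line per sample, in the same order as the .fam file.

    Samples whose group is in `fixed_groups` keep their label (a fixed
    reference population); all others get "-" so they are left unassigned.
    """
    return [group if group in fixed_groups else "-" for group in sample_groups]


def write_pop_file(path, sample_groups, fixed_groups):
    """Write the .pop file; labels must not contain whitespace."""
    with open(path, "w") as handle:
        for line in supervised_pop_lines(sample_groups, fixed_groups):
            handle.write(line + "\n")
```

For example, with samples grouped as `["African", "Han", "European", "Japanese"]` and `{"African", "European"}` fixed, the lines written are `African`, `-`, `European`, `-`.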

While deciding on the supervised runs, I restarted the run on my newly acquired used desktop. Today it finished up to K=11. I want to present the results for K=9 and K=11.

The Fst table above is for K=9. I sorted it on the first column, from closest to E Asians.

The dendrogram above is for K=9.

The Fst table above is for K=11. I sorted it on the first column, from closest to E Asians.

The dendrogram above is for K=11.
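The sorting used for the Fst tables can be sketched as follows: given pairwise Fst values between the inferred components, order the other components by their distance to a chosen reference. The nested-dictionary layout and the function name are assumptions for illustration, and the numbers in the usage note are placeholders, not values from my tables.

```python
def sort_by_distance_to(fst, reference):
    """Return the other component names ordered by ascending Fst to `reference`.

    `fst` is a nested dict: fst[a][b] is the Fst between components a and b.
    """
    others = [name for name in fst if name != reference]
    return sorted(others, key=lambda name: fst[reference][name])
```

With placeholder distances from `"E_Asian"` of 0.03 to `"Siberian"`, 0.08 to `"S_Asian"` and 0.18 to `"African"`, the function returns `["Siberian", "S_Asian", "African"]`.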


The dendrograms are as expected for E Asians and other populations.

The S Asians further subdivided into two populations: S Asians and Dravidian speakers. Dravidian speakers are from South India; they speak Telugu, Malayalam, Kannada, Tamil, etc. I have a few co-workers, whom I also consider friends, who speak these languages. Once the Dravidian component was separated from the rest of the S Asian population, the S Asians became much closer to the Europeans.

The Africans further subdivided into Khoisan and Niger-Congo. The Hadza people's language is considered an isolate, but I would not be surprised if linguists start linking it to Niger-Congo, given the close genetic distance.

This further subdivision of non-E/N Asians also shows why I need supervised runs. I really want to focus on E/N Asian subgroups.