Since announcing my project on 23andMe, I have received 30 raw data files so far. I merged 19 of them with the following datasets:
- HGDP
- HapMap
- Rasmussen et al
- Metspalu et al
- Fedorova et al
- Chaubey et al
- Behar et al
- Yunusbayev et al
- Schlebusch et al
- Henn et al (2011)
- Haber et al
I was not able to include the few newest 23andMe donors, as I had already started the cross-validation runs. Once the runs are complete, I will merge the rest of the unrelated 23andMe samples and start on the calculator.
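A cross-validation sweep of this kind can be sketched roughly as follows. This is only a minimal sketch, not the script I actually ran: the file name `merged.bed`, the starting K, and the helper names are all hypothetical, and it assumes the `admixture` binary is on the PATH and prints its usual `CV error (K=...)` line for each run.

```python
import re
import subprocess
from typing import Dict, List, Optional

# Matches log lines such as "CV error (K=9): 0.52305".
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*(\d+(?:\.\d+)?)")


def admixture_cv_command(bed_path: str, k: int, threads: int = 1) -> List[str]:
    """Build the argument list for one cross-validated ADMIXTURE run."""
    return ["admixture", "--cv", bed_path, str(k), f"-j{threads}"]


def parse_cv_errors(log_text: str) -> Dict[int, float]:
    """Collect {K: cross-validation error} from ADMIXTURE log text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def lowest_cv_k(cv_errors: Dict[int, float]) -> Optional[int]:
    """Return the K with the lowest CV error, or None if nothing was parsed."""
    if not cv_errors:
        return None
    return min(cv_errors, key=lambda k: cv_errors[k])


def run_sweep(bed_path: str, k_min: int, k_max: int, threads: int = 1) -> Dict[int, float]:
    """Run ADMIXTURE once per K (hypothetical driver; needs the admixture binary)."""
    errors: Dict[int, float] = {}
    for k in range(k_min, k_max + 1):
        result = subprocess.run(
            admixture_cv_command(bed_path, k, threads),
            capture_output=True, text=True, check=True,
        )
        errors.update(parse_cv_errors(result.stdout))
    return errors
```

Calling `run_sweep("merged.bed", 2, 16, threads=4)` would cover every K up to 16. Because each K is a separate process, a sweep that is stopped partway can be resumed from the next K rather than restarted.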
I ran a script that was supposed to run Admixture up to K=16. The other day my machine had only finished up to K=9, so I stopped it. I dug through the K=9 output to check which populations I could supervise. I have not done supervised runs before, so I do not know exactly how the algorithm will behave, but I assume it will not look for further subpopulations within the groups I am not really interested in at the moment (mainly non-East Asians). For example, if I assign all Africans to a single supervised African population, it will not look further into African subpopulations (e.g. N Africans, S Africans, Khoisan). The supervised run should save computing time, so that Admixture extracts populations mainly for East Asians.
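To make the idea concrete, here is a minimal sketch of how the label file for a supervised run might be prepared. It assumes the ADMIXTURE convention of a `.pop` file with one line per individual, in the same order as the `.fam` file, holding a population label for reference samples and `-` for samples whose ancestry should be estimated. The helper names and the population labels in the usage note are hypothetical, and K is passed through unchanged: the manual should be checked for how K has to relate to the number of reference labels.

```python
from typing import Dict, List


def collapse_labels(sample_pops: List[str], coarse_map: Dict[str, str]) -> List[str]:
    """Map each sample's fine population label to a coarse supervised label.

    Populations missing from coarse_map get "-", meaning "estimate ancestry".
    """
    return [coarse_map.get(pop, "-") for pop in sample_pops]


def write_pop_file(path: str, pop_lines: List[str]) -> None:
    """Write one label per line, in the same order as the .fam file."""
    with open(path, "w", encoding="utf-8") as handle:
        for label in pop_lines:
            handle.write(label + "\n")


def supervised_command(bed_path: str, k: int, threads: int = 1) -> List[str]:
    """Build the argument list for a supervised ADMIXTURE run.

    ADMIXTURE looks for a .pop file with the same prefix as the .bed file.
    """
    return ["admixture", "--supervised", bed_path, str(k), f"-j{threads}"]
```

For example, `collapse_labels(["Yoruba", "San", "Han"], {"Yoruba": "African", "San": "African"})` gives `["African", "African", "-"]`: both African samples share one supervised label, while the Han sample is left for Admixture to estimate.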
While deciding on the supervised runs, I restarted the run on my newly acquired used desktop. Today it finished up to K=11. I want to present the results for two of these runs, K=9 and K=11.
Dendrogram above is for K=9
Fst table above is for K=11. I sorted it by the first column, by closeness to E Asians.
Dendrogram above is for K=11
The dendrograms are as expected for E Asians and other populations.
The S Asians further subdivided into two populations: S Asians and Dravidian speakers. Dravidian speakers are from South India; they speak Telugu, Malayalam, Kannada, Tamil, etc. I have a few co-workers, whom I also consider friends, who speak these languages. Once the Dravidian component was separated from the rest of the S Asian population, the S Asians became much closer to the Europeans.
The Africans further subdivided into Khoisan and Niger-Congo. The language of the Hadza people is considered an isolate, but I would not be surprised if linguists start linking it to Niger-Congo given the close genetic distance.
This further subdivision of non-E/N Asians also shows why I need supervised runs. I really want to focus on E/N Asian subgroups.