Friday, October 18, 2013

Detailed East Asian Calculator Update (K=19)

For obvious reasons, my machine is now taking longer to run higher Ks. Because of this, I have enlisted the help of a friend who works at a local university that has really fast machines. I removed the 23andMe participants whose data I do not own and used the university machine to run the higher Ks. The purpose is the same: breaking down and finding the E & N Asian subdivisions. Again, once I find the optimum K, I will rerun the set with the 23andMe participants on my machine.

I’ll try a jump in K value. Maybe I’ll try 50 just to see.

So far, the last interesting run is K=19.

To save time (since I'm doing this on a part-time basis), I will likely not detail the non-E/N Asian populations. I'll assign them as African, W Asian, S Asian, & European.

The "S Asian - Indian" component can also be read as S Asian as a whole, including populations in Pakistan.




Friday, October 4, 2013

Detailed East Asian Calculator Update (K=16)

I just completed runs up to K=21, but K=16 has the lowest cross-validation (CV) error, so I’ll present the K=16 results here.
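For anyone who wants to reproduce the comparison, here is a minimal sketch of picking the K with the lowest CV error. It assumes ADMIXTURE was run with the `--cv` flag and that each run's output was saved, so that every log contains a line of the form `CV error (K=16): 0.52234`. The function name and the idea of concatenating the logs into one string are hypothetical, for illustration only.

```python
import re

# Matches the summary line ADMIXTURE prints when run with --cv,
# e.g. "CV error (K=16): 0.52234".
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*(\d+(?:\.\d+)?)")


def best_k(log_text):
    """Return (K, cv_error) for the run with the lowest CV error.

    log_text is the concatenated text of all run logs (hypothetical input).
    """
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no 'CV error' lines found")
    k = min(errors, key=errors.get)
    return k, errors[k]
```

Calling `best_k(text)` on the combined logs returns the pair to report; for the runs described here that would be K=16.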

As expected, the data basically has 4 large clusters:

  1. Africans
  2. Papuans
  3. Europeans/Middle Easterners/S & SW Asians
  4. N & E Asians, Native Americans


In general, the Papuans & Native Americans remain as is (no further breakdown).

NORTH & EAST ASIANS

The E Asians subdivide further into 3 groups. The Nihonjin (Japanese) are the first ethnicity to appear; they are, however, still part of the larger E Asian population. The SE Asians in this run are not only from SE Asia (both Mainland and Island SE Asia) but also include non-Han people from South China (from Guangdong to Yunnan provinces). I expect the SE Asians to be very diverse and hence to break down into further subgroups.

The Eskimo-Aleuts are also forming as a subgroup of N Asians. Just to note, the Eskimo-Aleuts include populations from North America; it’s basically people in and around the Bering Strait. For the meantime I will call them “North Asians – Eskimo-Aleut”.

It would be great to have participants from the Malagasy people of Madagascar. They are descended from migrants from both mainland Africa and SE Asia (more specifically Borneo). Here’s the genetic and linguistic evidence of the SE Asian source.

One of the Hawaiian participants has a large SE Asian component. This is expected because the Polynesians originated in SE Asia (linguistic evidence of the SE Asian source). One distinction, though, is the larger Papuan component. One theory (Bellwood), which I agree with, holds that as they migrated out of Island SE Asia, they met and admixed with the local Papuan population. As they explored further to populate Oceania, they brought the Papuan genes with them.

Here’s more info on the Papuan and Negrito populations of SE Asia.

AFRICANS

I stand corrected. The Hadza people seem to be diverging from most of the other African populations. No wonder their language is considered an isolate. We now have the Khoisans and the E & W Africans.

I noticed that the “African Caribbean in Barbados” (ACB) and “African Ancestry in SW USA” (ASW) data from 1000 Genomes cluster with the West Africans. I know this could be a touchy subject, but I would like to say objectively that the ACB & ASW populations may be descendants of West Africans (mainly Niger-Congo speakers), or at least of people similar to West Africans.

EUROPEANS, MIDDLE EASTERNERS, & S/SW ASIANS

The S Asians include the Dravidians. It’s worth noting that the SW Asians and N Europeans cluster together, as do the Middle Easterners and S Europeans; I'm not sure why, but it could offer some clues to migration patterns. What I call SW Asians includes the Caucasus people. The Middle Easterners are basically Semitic-speaking people (Arabic mostly). I checked these results against the 23andMe and GEDmatch tools; some minor details do not align well, but overall the results are pretty representative. But again, the focus of this project is the E & N Asians. If you’re interested in learning more about the S Asians and the Indian Subcontinent, Zack Ajmal of the Harappa project has a lot of information. For the Europeans/Middle Easterners, Dienekes Pontikos is a great source.

NOTES


  • I initially wanted to run supervised, but I misunderstood the model. A supervised run at a given K needs K pre-identified reference populations (not just African or some other broad grouping). This defeats the purpose of the project, so no supervised runs at the moment. I will keep running unsupervised until I see further subgrouping of the N & E Asians. Perhaps once I find the optimum K, I will re-run supervised to reduce the African and European subgroups.
  • Still no luck resurrecting the Singapore Malays but I haven’t given up yet.
  • I’m looking at creating a pseudo Negrito population; a hybrid. The plan is to extract all the Negrito markers from PASNP, take someone else’s raw data, and replace the corresponding markers with the Negrito ones. Not sure what the result will look like, but we’ll see. OpenSNP has a lot of open DNA raw data. I will pick from those.
  • OpenSNP may have some N & E Asian folks. I’ll run its data with PASNP to see which ones and maybe include them in the project.
  • Current run includes 27 of the 38 donors. Out of 38, only 27 were unrelated.
  • I included one sample, SL00010 (who agreed to allow me to post it on this blog). The large SE Asian chunk needs to be subdivided further, which should happen in larger K runs.
  • My European, South Asian, Middle Eastern and African assignments may not be accurate but it’s the best the list has been giving me. I’m mainly focusing on the E & N Asian components which we’ll probably see in higher K runs.
  • The CV error at K=16 is 0.52234.
  • Based on the ADMIXTURE paper, the minimum number of markers we must have is inversely proportional to the target Fst. For this project, the smallest Fst so far is 0.015, which means I need to have at least 66,667 markers. I have about 200,000 markers for this project, which is more than enough for accuracy. By the same rule, the smallest Fst I can go down to would be 0.005; we’ll see how far K will take us :-)
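
The arithmetic in the last note can be sketched as below. The constant 1000 is an assumption: it is not quoted from the ADMIXTURE paper but is simply the value implied by the figures above (an Fst of 0.015 giving 66,667 markers). The function names are hypothetical too.

```python
import math

# Assumed proportionality constant: markers ~= SCALE / Fst.
# Inferred from the numbers in this post (Fst 0.015 -> 66,667 markers),
# not quoted from the ADMIXTURE paper.
SCALE = 1000.0


def min_markers(fst):
    """Smallest whole number of markers needed for a given target Fst."""
    return math.ceil(SCALE / fst)


def min_fst(markers):
    """Smallest Fst reachable with a given number of markers."""
    return SCALE / markers


print(min_markers(0.015))   # 66667
print(min_fst(200_000))     # 0.005
```

Under that assumed constant, 200,000 markers leave a comfortable margin over the 66,667 needed at the current smallest Fst.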