Y Chromosome Haplotype Reference Database

The Y Chromosome Haplotype Reference Database (YHRD) helps forensic practitioners interpret comparisons between evidentiary samples and reference samples typed with Y-STRs, and draw conclusions from those comparisons. Because Y-STRs are located on the non-recombining part of the Y chromosome, the profile generated by Y-STR analysis should be treated as a single trait coded by a single locus (a Haplotype). Consequently, the YHRD provides Haplotype frequencies, i.e. frequencies of whole multi-locus profiles (more than one locus typed per sample) rather than of individual alleles, for common formats consisting of 9 to 23 loci.
The database was created in 1999 and has since been curated by Lutz Roewer and Sascha Willuweit at the Institute of Legal Medicine and Forensic Sciences, Charité - Universitätsmedizin Berlin. It is endorsed by the ISFG[1] and partially funded by Applied Biosystems (which became part of Life Technologies in 2008) and Promega Corporation. All population data published in FSI: Genetics must be validated by the YHRD curators and are then included in the YHRD[2].
By September 2013, almost 115,000 9-locus Haplotypes (including almost 56,000 YFiler Haplotypes) from 851 sampling locations in 113 countries had been submitted by 237 institutes and laboratories. Geographically, about 39% of the YHRD samples are from Europe, 32% from Asia, 17% from South America, 6% from North America, 4% from Africa and 2% from Oceania/Australia [3].
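What treating the profile as one trait means in practice can be illustrated with a minimal Python sketch, assuming a plain counting estimator: a profile is reduced to the tuple of its repeat numbers at the loci of one format (here the 9-locus minimal Haplotype), and the frequency is the share of database profiles typed for that format that carry exactly the same tuple. The function names and the dictionary layout of a profile are hypothetical, and the YHRD's own frequency estimates may use other methods.

```python
from collections import Counter

# Loci of the 9-locus "minimal haplotype"; the duplicated locus DYS385
# is reported as two allele calls (a and b).
MINIMAL_LOCI = ("DYS19", "DYS389I", "DYS389II", "DYS390", "DYS391",
                "DYS392", "DYS393", "DYS385a", "DYS385b")


def haplotype_key(profile, loci=MINIMAL_LOCI):
    """Collapse a Y-STR profile (locus -> repeat number) into one haplotype.

    Loci outside the format are ignored; returns None if any format locus
    is missing, because the haplotype is then undefined for that format.
    """
    if any(locus not in profile for locus in loci):
        return None
    return tuple(profile[locus] for locus in loci)


def haplotype_frequency(query, database, loci=MINIMAL_LOCI):
    """Counting estimate: matching haplotypes / profiles typed for the format."""
    target = haplotype_key(query, loci)
    if target is None:
        raise ValueError("query profile is not typed for every locus of the format")
    keys = [k for k in (haplotype_key(p, loci) for p in database) if k is not None]
    if not keys:
        raise ValueError("no database profile is typed for this format")
    return Counter(keys)[target] / len(keys)


if __name__ == "__main__":
    # Hypothetical profiles, invented for illustration
    base = {"DYS19": 14, "DYS389I": 13, "DYS389II": 29, "DYS390": 24,
            "DYS391": 11, "DYS392": 13, "DYS393": 13,
            "DYS385a": 11, "DYS385b": 14}
    database = [
        base,
        dict(base, DYS437=14),   # extra locus: ignored for the minimal format
        dict(base, DYS19=15),    # differs at one locus: a different haplotype
        {k: v for k, v in base.items() if k != "DYS393"},  # incomplete: excluded
    ]
    print(haplotype_frequency(base, database))  # 2 matches among 3 usable profiles
```

Because a single differing locus yields a different Haplotype, the count is taken over whole tuples rather than multiplied across loci as it would be for independent autosomal markers.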
Database Structure
The database supports the most frequently used Haplotype formats (e.g. Minimal, SWGDAM, PowerPlex Y[4], YFiler[5] and PowerPlex Y23 Haplotypes[6]), each of which is backed by a differently sized database.
In population genetics, the term Metapopulation describes spatially separated population groups that are interconnected by gene flow and migration[7]. By analogy, in forensic genetics the term describes a set of geographically dispersed populations that are connected by gene flow and are therefore more similar to each other than to groups outside the Metapopulation [8].
Match calculations based on such pooled population data should not be significantly affected by sub-population structure. However, because genetic distances between Y chromosome Metapopulations can be large, the outcome of a match calculation depends on which Metapopulation is used as the reference.
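Why the choice of reference Metapopulation matters can be shown with a minimal sketch, assuming a simple likelihood ratio LR = 1/f for a matching Y-STR Haplotype and the common conservative convention of adding the questioned Haplotype to the reference sample, f = (x + 1)/(N + 1), where x is the number of observations among N reference Haplotypes. No sub-population correction is applied, and the counts are invented for illustration; they do not come from the YHRD.

```python
from typing import Dict, Tuple


def augmented_frequency(count: int, n: int) -> float:
    """(x + 1) / (N + 1): the questioned haplotype is added to the reference sample."""
    return (count + 1) / (n + 1)


def likelihood_ratio(count: int, n: int) -> float:
    """LR = 1 / f, written as (N + 1) / (x + 1) to avoid a needless reciprocal."""
    return (n + 1) / (count + 1)


# Hypothetical counts of ONE questioned haplotype in two reference Metapopulations
REFERENCES: Dict[str, Tuple[int, int]] = {
    "Metapopulation A": (12, 19_999),  # observed 12 times among 19,999 haplotypes
    "Metapopulation B": (0, 4_999),    # never observed among 4,999 haplotypes
}

if __name__ == "__main__":
    for name, (x, n) in REFERENCES.items():
        f = augmented_frequency(x, n)
        print(f"{name}: f = {f:.2e}, LR = {likelihood_ratio(x, n):,.0f}")
```

With these invented numbers the same match yields an LR of 5,000 against Metapopulation B but only about 1,538 against Metapopulation A, roughly a threefold difference driven entirely by the choice of reference data.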
Sampled individuals can be grouped into a population according to several shared characteristics, such as nationality, geography, language affiliation or ethnic ancestry. Notoriously, these definitions often cannot conclusively assign an individual's ancestry to a group (i.e. a population). Therefore, none of them should be used alone or regarded as superior to the other shared proxies.
A newer pooling approach for Y chromosome STR databases in forensic genetics is to cluster individual samples into groups according to their phylogenetic ancestry (e.g. Y-SNP-defined Haplogroups).
Currently the YHRD recognizes four separate Metapopulation structures, each with several categories: national, continental, linguistic/ethnic and phylogenetic affiliation.
National
The concept of pooling data to build "national databases" has a straightforward rationale: law enforcement agencies and forensic services rely on their national population to build reference databases. In most instances offenders and victims stem from the national population, so their genetic profiles should be represented in the database. In countries with strong population substructure, such as the USA, Brazil, the UK or China, national reference databases are often built on a historical concept of ethnic affiliation; e.g. the US population is subdivided into Caucasian, African, Hispanic, Asian and Native American populations, and the UK differentiates English, Afro-Caribbean, Indo-Pakistani and Chinese populations. Because of their importance in national legislation, national databases are searchable in the YHRD. Each national Metapopulation in the YHRD comprises all individuals sampled in a particular country, regardless of their ancestry.
Continental
Continental Metapopulations in the YHRD comprise all individuals sampled on a particular continent, regardless of their ancestry. Following the United Nations classification of geographical regions, the YHRD defines six continental Metapopulations: Africa, Asia, Europe, Latin America, North America and Oceania.
Linguistic/Ethnic
The Metapopulation structure built on "ethnicity/linguistic affiliation" takes the ancestry of sampled individuals into account to a larger extent. "Ancestry" is a term that combines historical, cultural, geographical and linguistic categories. A Metapopulation concept based on "ethnicity" is by no means ideal, fully rational or fully translatable; it simply reflects the fact that, on a global level, categories other than "nation" or "geography" describe the observed genetic clustering and inhomogeneity of Y chromosome patterns far better.
For a global reference database, the "major language group" criterion seems the most appropriate way to group data so that ancestry is taken into account and the resulting subdatabases contain genetically similar samples. The reasoning is twofold. First, language is often an inherited cultural trait, so language phyla show strong similarity to genetic traits, including the Y chromosome. Second, because languages have been studied scientifically for a long time and are mostly understood by the public, linguistic terminology is in principle more understandable and more easily translated into practice than its genetic counterpart. Besides purely linguistic categories (e.g. the Altaic language family, comprising speakers of Turkic and Mongolic languages), unifying geographic criteria were also used (e.g. Sub-Saharan Africa, comprising speakers of different African language groups living south of the Sahara).
It is important to note that the current Metapopulation structure is an a priori categorization that needs continuous evaluation and verification by statistical methods that quantify the genetic similarity or dissimilarity between samples. While the current categorization into eight large Metapopulations gains some support from a genetic distance analysis of ~41,000 Haplotypes [8], a further subdivision of the "Eurasian - European" Metapopulation was implemented solely on the basis of Y-STR Haplotypes. An AMOVA of ~12,000 European Haplotypes demonstrated that three largely homogeneous pools of European Haplotypes exist: the Western, the Eastern and the Southeastern Metapopulation [9].
Currently the YHRD has eight non-overlapping Metapopulations: Admixed, African, Afro-Asiatic, Amerindian, Australian Aboriginal, East Asian, Eskimo-Aleut and Eurasian. Some of these are further subdivided; e.g. the Eurasian Metapopulation has six subcategories, of which the European subgroup is split further into Western, Eastern and Southeastern Europeans.
Database Tools
AMOVA
Analysis of molecular variance (AMOVA) is a method for analyzing population variation using molecular data, e.g. Y-STR Haplotypes [10]. AMOVA makes it possible to evaluate and quantify the extent of differentiation between two or more population samples. It is implemented as an online tool in the YHRD that estimates ΦST and FST values.
The tool accepts Excel files and creates entry files from them. All entries highlighted in red (e.g. an ID column or population name) are ignored. Likewise, when YFiler Haplotypes are compared with reference studies that include only minimal Haplotypes, all additional loci are ignored. After an entry file has been submitted, the program asks for confirmation, and changes can still be made if necessary. Up to nine reference populations selected from the YHRD, as well as population sets, can be added to the analysis. The online calculation returns a *.csv table of pairwise FST or ΦST (RST) values, with p-values from 10,000 permutations as a test of significance. In addition, an MDS plot is generated to illustrate the genetic distances between the analyzed populations graphically. The program also lists the references for the selected population studies, which facilitates correct citation.
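The core of such a pairwise comparison can be sketched in Python. This is a minimal illustration, assuming the two-level AMOVA estimator of Excoffier et al. (1992) and an RST-like distance (the sum of squared repeat-number differences across loci); it is not the YHRD tool's code, and all function names are hypothetical. Haplotypes are tuples of repeat numbers over a shared set of loci, and the p-value comes from randomly reassigning haplotypes to the two populations.

```python
import random


def squared_distance(h1, h2):
    """RST-like distance: sum of squared repeat-number differences over loci."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def identity_distance(h1, h2):
    """FST-like distance: 0 for identical haplotypes, 1 otherwise."""
    return 0 if h1 == h2 else 1


def distance_matrix(haplotypes, dist=squared_distance):
    return [[dist(a, b) for b in haplotypes] for a in haplotypes]


def phi_st(d2, labels):
    """Two-level AMOVA Phi_ST from a squared-distance matrix and population labels."""
    n_total = len(labels)
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    n_pops = len(groups)

    def ssd(indices):
        # Sum of squared deviations: distances over all ordered pairs / (2 * size)
        return sum(d2[i][j] for i in indices for j in indices) / (2 * len(indices))

    ssd_total = ssd(range(n_total))
    ssd_within = sum(ssd(members) for members in groups.values())
    msd_among = (ssd_total - ssd_within) / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)
    # Sample-size coefficient that corrects for unequal population sizes
    n_coef = (n_total - sum(len(m) ** 2 for m in groups.values()) / n_total) / (n_pops - 1)
    sigma2_among = (msd_among - msd_within) / n_coef
    total = sigma2_among + msd_within
    return sigma2_among / total if total else 0.0


def pairwise_test(pop_a, pop_b, dist=squared_distance, n_perm=10_000, seed=1):
    """Phi_ST between two samples plus a permutation p-value."""
    haplotypes = list(pop_a) + list(pop_b)
    labels = [0] * len(pop_a) + [1] * len(pop_b)
    d2 = distance_matrix(haplotypes, dist)
    observed = phi_st(d2, labels)
    rng = random.Random(seed)
    shuffled = labels[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign haplotypes to populations at random
        if phi_st(d2, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical two-locus haplotypes (repeat numbers), for illustration only
    sample_a = [(14, 13), (14, 13), (15, 13), (14, 12)]
    sample_b = [(16, 11), (16, 12), (17, 11), (16, 11)]
    phi, p = pairwise_test(sample_a, sample_b)
    print(f"Phi_ST = {phi:.3f}, p = {p:.4f}")
```

Running `pairwise_test` over every pair of samples would fill a pairwise matrix comparable to the table the online tool exports, and that matrix could in turn be passed to a multidimensional scaling routine for a plot. Negative ΦST values, which the estimator can produce for very similar samples, are commonly interpreted as no detectable differentiation.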
Releases
Date | Release | Haplotypes | Milestone |
---|---|---|---|
August 1, 1999 | 1 | 2,517 | YHRD 1.0 |
June 16, 2000 | 1a | 3,589 | |
January 1, 2003 | 2 | 18,050 | |
August 18, 2003 | 3 | 19,482 | |
October 30, 2003 | 4 | 20,152 | |
July 11, 2003 | 5 | 20,320 | |
October 12, 2003 | 6 | 20,865 | |
December 29, 2003 | 8,9 | 21,446 | |
February 24, 2004 | 10 | 21,546 | |
February 26, 2004 | 11 | 22,872 | |
April 13, 2004 | 12 | 24,524 | YHRD 2.0 |
May 24, 2004 | 13 | 25,066 | |
July 1, 2004 | 14 | 26,325 | |
September 18, 2004 | 15 | 28,649 | |
December 17, 2004 | 16 | 32,196 | |
May 31, 2005 | 17 | 34,558 | |
October 14, 2005 | 18 | 38,761 | |
January 31, 2006 | 19 | 41,965 | |
August 1, 2006 | 20 | 46,831 | |
December 28, 2006 | 21 | 51,253 | |
April 13, 2007 | 22 | 52,655 | |
August 10, 2007 | 23 | 54,833 | |
July 23, 2008 | 24 | 59,004 | YHRD 3.0 |
October 1, 2008 | 25 | 65,165 | |
January 29, 2009 | 26 | 68,108 | |
February 13, 2009 | 27 | 72,082 | |
March 23, 2009 | 28 | 72,055 | |
June 12, 2009 | 29 | 74,742 | |
August 21, 2009 | 30 | 79,147 | |
November 16, 2009 | 31 | 81,099 | |
December 18, 2009 | 32 | 84,047 | |
March 3, 2010 | 33 | 86,568 | |
July 16, 2010 | 34 | 89,237 | |
December 30, 2010 | 35 | 91,601 | |
May 15, 2011 | 36 | 93,290 | |
June 21, 2011 | 37 | 97,575 | |
December 30, 2011 | 38 | 99,881 | |
February 17, 2012 | 39 | 101,055 | |
August 29, 2012 | 40 | 104,174 | |
October 1, 2012 | 41 | 105,498 | |
January 11, 2013 | 42 | 108,949 | |
January 18, 2013 | 43 | 112,005 | |
July 12, 2013 | 44 | 114,256 | |
References
- ^ "ISFG Homepage". Retrieved 25 September 2013.
- ^ "FSIGEN Publishing Guidelines". Retrieved 25 September 2013.
- ^ "YHRD Homepage". Retrieved 25 September 2013.
- ^ "Promega PowerPlex Y". Retrieved 25 September 2013.
- ^ "Applied Biosystems YFiler". Retrieved 25 September 2013.
- ^ "Promega PowerPlex Y23". Retrieved 25 September 2013.
- ^ Hanski, I. and Gilpin, M. (1997). Metapopulation Biology: Ecology, Genetics, and Evolution. Academic Press, San Diego.
- ^ Willuweit, S., Roewer, L. and The International Forensic Y Chromosome User Group (2007). Y chromosome haplotype reference database (YHRD): Update. Forensic Sci Int Genet 1(2): 83-87.
- ^ Roewer, L., Croucher, P. J. P., Willuweit, S., Lu, T. T., Kayser, M., Lessig, R., de Knijff, P., Jobling, M. A., Tyler-Smith, C. and Krawczak, M. (2005). Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116(4): 279-291.
- ^ Roewer, L., Kayser, M., Dieltjes, P., Nagy, M., Bakker, E., Krawczak, M. and de Knijff, P. (1996). Analysis of molecular variance (AMOVA) of Y-chromosome-specific microsatellites in two closely related human populations. Hum Mol Genet 5(7): 1029-1033.