How are SNPs Named?
The names for SNPs are a source of great confusion. Many SNPs have multiple
names and some of the names have no relationship to the nature of the SNP.
We can't go into all the complexities here, but we can describe a general
picture and give examples.
"SNP" stands for single-nucleotide polymorphism, referring to the fact that
a base pair at a particular position may mutate from one nucleotide base to another.
A cytosine (C) may be replaced with adenine (A) or guanine (G) with thymine
(T). This will also change the complementary nucleotide on the other half of
SNP Name Systems
The basic systems in place for naming a SNP include:
- By who discovered it and when
- By chromosome position and allele form
- By NCBI reference number
All of these systems are in current use.
Who found it
It's long been the custom in science that a person discovering a
phenomenon is entitled to name it. So, too, with SNPs. If you discover a new
scientific principle (Boyle's Law), comet (Halley's), spider (Aphonopelma
johnnycashi) or quantum mechanics particle (Higgs Boson), you get to name it.
(You aren't supposed to name it after yourself, but others can attribute it to you.)
With SNPs, a convention was established that names would consist of leading
letters, followed by a number representing the order in which that person (or
group) found it. For example, when Dr. Michael Hammer at the U. of Arizona found
a mutation from C (cytosine) to A (adenine) at position 22157311 (build GRCh37)
on the Y chromosome, he named it "P312"
because he'd found 311 other mutations previously.
P312 (S116, PF6547, 22157311C>A, rs34276300) turned out to be an important SNP:
it marks a hereditary boundary between two very large groups of European R1b
men who are M269+ (AKA PF6517, 22739367T>C and rs9786153). The other large group
is U106+ (AKA S21 and M405), with the NCBI name rs16981293 and
position name 8796078C>T.
This "name by finder" system works fairly well when the pace of discovery isn't
too fast for scientists to check with each other and agree on a common name. It
breaks down when (A) discoveries come faster than publication and (B) scientists
forget the "don't use your own name" rule.
The problem is that multiple people have discovered (and continue to discover)
the same mutations independently, so one SNP ends up with several names.
Dr. James Wilson at Edinburgh University also
discovered the mutation from C to A at position 22157311 and named it S116
(his 116th). And Dr. Paolo Francalacci, at Universita di Sassari in Italy,
found the same thing and named it PF6547.
The result is mutual incomprehension. Discussion groups are
filled with debates about the "proper name" for the same SNP. We've
seen it said, "I can't be P312+; I'm S116+."
This, however, is not an intentional conspiracy to confuse and mislead. It
just "grew like Topsy".
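The "I can't be P312+; I'm S116+" confusion goes away once synonyms are resolved to a single record. The following is a minimal sketch of that idea, using only names quoted in this article. The table layout and the function `same_snp` are hypothetical, and picking the first name in each group as the label is an arbitrary choice for illustration, not a ruling on the "proper name". Position names are left out because they depend on the build.

```python
# Synonym groups, limited to names quoted in this article.
# The first entry of each group serves as a label only; that choice is
# an assumption for illustration, not an official preferred name.
SYNONYM_GROUPS = [
    ["P312", "S116", "PF6547", "rs34276300"],
    ["U106", "S21", "M405", "rs16981293"],
    ["M269", "PF6517", "rs9786153"],
]

# Map every alias (case-insensitively) to the label of its group.
ALIAS_TO_LABEL = {
    alias.upper(): group[0]
    for group in SYNONYM_GROUPS
    for alias in group
}


def same_snp(name_a, name_b):
    """Return True if both names are known aliases of the same SNP."""
    label_a = ALIAS_TO_LABEL.get(name_a.strip().upper())
    label_b = ALIAS_TO_LABEL.get(name_b.strip().upper())
    return label_a is not None and label_a == label_b


print(same_snp("P312", "S116"))   # same mutation, different finders
print(same_snp("P312", "U106"))   # different SNPs
```

A name that is missing from the table is treated as "not known to be the same", so an incomplete table can produce false negatives but not false matches.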
Another way of identifying a SNP is by where it occurs on the chromosome
and its allele form. For example, P312 (AKA S116, AKA PF6547) would be
In this system, each SNP has only one name -- one at a time, that is. (See
below.) Many researchers (especially FTDNA) use this system internally to
identify and catalogue SNPs.
Position numbers change
Because scientists are still learning more of the intricacies of the Y-chromosome
and building revised models to better map the chromosome, position numbers can change with
each successive model (known as a "build"). To be clear which SNP is meant,
it's best to cite the build number for the position ID.
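As a rough illustration of the position-and-allele form, the sketch below parses a name such as 8796078C>T and writes it back out with its build attached. The names `PositionName`, `parse_position_name` and `with_build` are hypothetical, and the "(GRCh37)" suffix is just one assumed way of citing the build, not an established notation.

```python
import re
from typing import NamedTuple


class PositionName(NamedTuple):
    position: int    # coordinate on the chromosome, in some build
    from_base: str   # base before the mutation
    to_base: str     # base after the mutation


# Digits, then one base, ">", then one base (for example 8796078C>T).
POSITION_NAME = re.compile(r"^(\d+)([ACGT])>([ACGT])$")


def parse_position_name(text):
    """Split a position name into its coordinate and its two bases."""
    match = POSITION_NAME.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a position name: {text!r}")
    position, from_base, to_base = match.groups()
    return PositionName(int(position), from_base, to_base)


def with_build(name, build):
    """Format a position name together with the build it refers to."""
    return f"{name.position}{name.from_base}>{name.to_base} ({build})"


u106 = parse_position_name("8796078C>T")
print(u106)

# The C-to-A mutation at position 22157311 in build GRCh37 (P312).
print(with_build(PositionName(22157311, "C", "A"), "GRCh37"))
```

Keeping the build outside the parsed tuple is a deliberate choice here: the same `PositionName` is meaningless without it, so the caller has to supply it explicitly when formatting.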
Some (but not all) SNPs have been registered with the National Center for
Biotechnology Information (NCBI), which assigns them a name consisting of the
letters "rs" followed by a unique number. Submission for
registration is a formal process with extensive documentation requirements.
Further, NCBI registration results in publication; some researchers may not
want the SNP published.
Example: P312/S116/PF6547 is registered with NCBI as
rs34276300 [Homo sapiens].
Its NCBI page gives this information:
Problem common to all the above
None of these systems relates, in any way, to where a SNP falls on the
phylogenetic tree. In practice, SNPs are first discovered, then their
phylogenetic meanings are teased out.
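Because the three systems differ only in the shape of the name, a name can be sorted into its system by pattern alone, which also shows that none of the patterns has anywhere to record tree position. The following is a minimal sketch under that assumption. The function `naming_system` and its labels ("ncbi", "position", "finder") are hypothetical, and the patterns cover only the name shapes described in this article.

```python
import re

# Checked in order: "rs" names would otherwise also look like
# finder-style names (letters followed by digits).
PATTERNS = [
    ("ncbi", re.compile(r"^rs\d+$", re.IGNORECASE)),
    ("position", re.compile(r"^\d+[ACGT]>[ACGT]$", re.IGNORECASE)),
    ("finder", re.compile(r"^[A-Za-z]+\d+$")),
]


def naming_system(name):
    """Guess which naming system a SNP name belongs to."""
    cleaned = name.strip()
    for system, pattern in PATTERNS:
        if pattern.match(cleaned):
            return system
    return "unknown"


for name in ("P312", "8796078C>T", "rs34276300", "R-L266"):
    print(name, "->", naming_system(name))
```

A label such as R-L266 falls through to "unknown" because it combines a haplogroup letter with a SNP name, which is a different kind of identifier from the three systems covered here.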
"Who found" List
SNP Naming Convention Sun Oct 16, 2016 9:37 pm (PDT) . Posted by:
- A = YSEQ.net, Houston, Texas: Thomas Krahn, MSc (Dipl.-Ing.),
- AD = Ministry of Education (Kuwait): Dr. Mohammed Al Sharija,
- AF = University of Arizona, Tucson, Arizona: Fernando
- B = Estonian Genome Center
- BY = Big Y, Family Tree DNA, Houston, Texas
- CTS = The Wellcome Trust Sanger Institute, Hinxton, England: Chris
- DC = Dal Cais, an Irish group believed to be descended from Cas, b. CE 347, related to SNP R-L266; Dennis Wright
- DF = anonymous researcher using publicly available full-genome-sequence data, including 1000 Genomes Project data; named in honor of the DNA-Forums.org genetic genealogy community
- F = Fudan University, Shanghai, China: Li Jin, Ph.D.,
- F* = Fudan University, Shanghai, China: Chuan-Chao Wang, Hui Li, (Beginning letter F; second letter Haplogroup, i.e. FI is Fudan Haplogroup I)
- FGC = Full Genomes Corp. of Virginia and Maryland
- G = IPATIMUP Instituto de Patologia e Imunologia Molecular da Universidade do Porto (Institute of Molecular Pathology and Immunology of the University of Porto):
- IMS-JST = Institute of Medical Science-Japan Science and Technology Agency
- K = Youngmin JeongAhn, grad student; Education: Seoul National University and the University of Arizona
- KHS = Functional Genomics Research Center, Korea Research Institute of Bioscience and Biotechnology
- KL = Key Laboratory of Contemporary Anthropology, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- KMS = Segdul Kodzhakov; Albert Katchiev; Anatole Klyosov; Astrid Krahn; Thomas Krahn; Bulat Muratov; Chris Morley; Ramil Suyunov; Vadim Sozinov; Pavel Shvarev; SF "National clans DNA project"; EHP "Suyun" Ph.D. of Technical Science; Prof. Elsa Khusnutdinova, Sc.D. of Biological Sciences, Laboratory of Molecular Human Genetics, Institute of Biochemistry and Genetics, Ufa Research Centre, Russian Academy of Sciences:
- L = SNPs named in honor of the late Leo Little: Thomas Krahn, MSc (Dipl.-Ing.) of Family Tree DNA's Genomics Research Center;
- M = Peter Underhill, Ph.D. of Stanford University
- MC = Christopher McCown, University of Florida; Thomas Krahn, MSc (Dipl.-Ing.), YSEQ.net, Houston, Texas
- N = The Laboratory of Bioinformatics, Institute of Biophysics, Chinese Academy of Sciences, Beijing
- NWT = Northwest Territory, Theodore G. Schurr, Ph.D., Laboratory of Molecular Anthropology, University of Pennsylvania, Philadelphia, PA
- P = Michael Hammer, Ph.D. of University of Arizona
- Page, PAGES or PS = David C. Page, Whitehead Institute for Biomedical Research
- PF = Paolo Francalacci, Ph.D., Universita di Sassari, Sassari, Italy
- PH = Pille Hallast, Ph.D., University of Leicester, Department of Genetics, United Kingdom
- PK = Biomedical and Genetic Engineering Laboratories, Islamabad,
- PR = Primate (gorilla and chimpanzee), Thomas Krahn's WTTY
- S = James F. Wilson, D.Phil. at Edinburgh University
- SA = South America, Theodore G. Schurr, Ph.D., Laboratory of Molecular Anthropology, University of Pennsylvania, Philadelphia, PA
- SK = Mark Stoneking, Ph.D., Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- SUR = Southern Ural; SF "National clans DNA project"; B.A. Muratov; EHP "Suyun" Ph.D. of Technical Sciences; Ramil Suyunov; Prof. E.K. Khusnutdinova, Sc.D. of Biological Sciences, Laboratory of Molecular Human Genetics, Institute of Biochemistry and Genetics, Ufa Research Centre Russian Academy of Sciences; Alexander Zolotarev; Igor Rozhanskii; Bayazit Yunusbaev, Institute of Biochemistry and Genetics, Ufa Research Centre, Russian Academy of Sciences
- TSC = Gudmundur A. Thorisson and Lincoln D. Stein, The SNP Consortium, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
- U = Lynn M. Sims, University of Central Florida; Dennis Garvey, Ph.D. Gonzaga University; and Jack Ballantyne, Ph.D., University of Central Florida
- V = Rosaria Scozzari and Fulvio Cruciani, Dipartimento di Biologia e Biotecnologie “Charles Darwin”, Sapienza Universita di Roma, Rome, Italy.
- VL = Vladimir Volkov, Tomsk University, Russia
- YP = SNPs identified by citizen scientists from genetic tests, then submitted to the YFull team for verification.
- YSC = Thomas Krahn, MSc (Dipl.-Ing.) of Family Tree DNA's Genomics Research Center
- Z = Gregory Magoon, Ph.D., Richard Rocca, Vince Tilroe, David F. Reynolds, Bonnie Schrack, Peter M. Op den Velde Boots, Ray H. Banks, Roman Sychev, Victar Mas, Steve Fix, Christian Rottensteiner, Alexander R. Williamson, Ph.D. and an anonymous individual, independent researchers of publicly available whole genome sequence datasets, and Thomas Krahn, MSc (Dipl.-Ing.), with support from the genetic genealogy community.
- ZP = Peter M. Op den Velde Boots, David Stedman using Big Y and other NGS sources.
- ZS = Gregory Magoon, Ph.D., Aaron Salles Torres from samples from the 1000 Genomes Project.
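The list above can be used as a lookup table: split a finder-style name into its leading letters and its serial number, then look the letters up. The following is a minimal sketch of that lookup with a short excerpt of the list. The names `PREFIX_SOURCES` and `describe` are hypothetical, and the table would need to be extended to cover the full list.

```python
import re

# A short excerpt of the list above, keyed by the leading letters.
PREFIX_SOURCES = {
    "M": "Peter Underhill, Stanford University",
    "P": "Michael Hammer, University of Arizona",
    "PF": "Paolo Francalacci, Universita di Sassari",
    "S": "James F. Wilson, Edinburgh University",
    "BY": "Big Y, Family Tree DNA",
    "FGC": "Full Genomes Corp.",
}

# Leading letters followed by a serial number, as in P312 or PF6547.
FINDER_NAME = re.compile(r"^([A-Za-z]+)(\d+)$")


def describe(name):
    """Return (prefix, serial, source) for a finder-style name.

    Returns None if the name is not letters followed by digits;
    source is None if the prefix is not in the excerpt.
    """
    match = FINDER_NAME.match(name.strip())
    if match is None:
        return None
    prefix = match.group(1).upper()
    serial = int(match.group(2))
    return prefix, serial, PREFIX_SOURCES.get(prefix)


for name in ("P312", "S116", "PF6547"):
    print(name, "->", describe(name))
```

Note that an "rs" number also has the letters-then-digits shape, so NCBI names should be filtered out before this lookup is applied.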
We recommend these references: