FAQ about gene nomenclature
- What is the HGNC?
- What is HGNC-approved nomenclature and why do we need it?
- Where can I find information about existing human gene symbols?
- What is a stem symbol?
- Where can I find the Nomenclature Guidelines?
- Do I have to use the approved symbols?
- How should I cite HGNC nomenclature resources?
- Are there nomenclature committees for other species?
- Does the HGNC collaborate with specialist nomenclature committees and advisors?
- How should orthologs be identified?
- How should I refer to the protein encoded by a gene?
- Do alternative gene transcripts or splice variants have approved symbols?
- Where can I read more about nomenclature and related issues?
- How do I perform a quick search with a term that contains spaces or commas?
HGNC symbol reports
Requesting a gene symbol
- My gene doesn't have an approved symbol. How do I propose one?
- What is the difference between a gene symbol and a gene name?
- Will my gene symbol request remain confidential?
- Why can't punctuation be used in a gene symbol?
What is the HGNC?
The HUGO Gene Nomenclature Committee is the only worldwide authority that assigns standardised nomenclature to human genes. Please see the "About the HGNC" page for more information on the committee and our remit and history.
What is HGNC-approved nomenclature and why do we need it?
The HGNC approves both a short-form abbreviation, known as a gene symbol, and a longer, more descriptive name. Each symbol is unique, and the committee ensures that each gene is given only one approved gene symbol. This allows clear and unambiguous reference to genes in scientific communications and facilitates electronic data retrieval from databases and publications. Where possible, symbols also maintain parallel construction for different members of a gene family (see “What is a stem symbol?”) and can also be used for orthologous genes in other vertebrate species.
Where can I find information about existing human gene symbols?
You can search all approved human gene symbols using the HGNC search facility.
What is a stem symbol?
A stem (or root) symbol is used as the basis for a series of approved symbols which are defined as members of either a functional or structural gene family. Stem symbols are usually devised in consultation with scientists in the relevant field, e.g. (# denotes number in series) CYP#: cytochrome P450; HOX#: homeo box; DUSP#: dual specificity phosphatase; and SCN2A#: sodium channel, voltage-gated, type II, alpha 2 polypeptide.
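As a rough illustration of how stem symbols tie a series together, the sketch below groups a handful of symbols under the stems quoted above. The function name `group_by_stem` and the short `STEMS` list are hypothetical, and prefix matching is only a heuristic: a symbol that happens to begin with a stem is not necessarily a member of that family, so curated family data should be preferred in practice.

```python
import re
from collections import defaultdict

# Stems copied from the examples in the text; this is not a complete list.
STEMS = ["CYP", "HOX", "DUSP"]

def group_by_stem(symbols, stems=STEMS):
    """Group symbols under the longest listed stem they start with."""
    groups = defaultdict(list)
    # Try longer stems first so a symbol is filed under its most specific stem.
    ordered = sorted(stems, key=len, reverse=True)
    for symbol in symbols:
        for stem in ordered:
            # The stem must be followed by at least one series character.
            if re.fullmatch(re.escape(stem) + r"[A-Z0-9]+", symbol):
                groups[stem].append(symbol)
                break
    return dict(groups)

families = group_by_stem(["CYP2D6", "HOXA1", "DUSP1", "ADK", "CYP1A2"])
print(families)
```

Symbols that match no listed stem, such as ADK here, are simply left out of the result.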
Where can I find the Nomenclature Guidelines?
The current guidelines are available on the HGNC website, www.genenames.org.
Do I have to use the approved symbols?
We try to encourage as many researchers as possible to contribute to the development of nomenclature systems, in the hope that they will then be more likely to use them. We realise that not everyone will consistently use approved symbols, but if an approved symbol is at least mentioned in a publication, it can be used as a search term. This gives a reference point that facilitates data retrieval from a number of databases, including PubMed, GenBank, OMIM, Entrez Gene and MGI. Some journals have editorial policies that require the use of HGNC-approved symbols.
How should I cite HGNC nomenclature resources?
Authors are requested to cite:
Gray KA, Daugherty LC, Gordon SM, Seal RL, Wright MW, Bruford EA. genenames.org: the HGNC resources in 2013. Nucleic Acids Res. 2013 Jan;41(Database issue):D545-52. doi: 10.1093/nar/gks1066. Epub 2012 Nov 17. PMID: 23161694.
To cite data within the database use the following format:
HGNC Database, HUGO Gene Nomenclature Committee (HGNC), EMBL Outstation - Hinxton, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK www.genenames.org.
Please include the month and year you retrieved the data cited.
Are there nomenclature committees for other species?
Yes, we interact with other nomenclature committees and databases on a regular basis, particularly the Mouse Gene Nomenclature Committee (MGNC). Please see the following links:
- Mouse: http://www.informatics.jax.org/mgihome/nomen/
- Rat: http://rgd.mcw.edu/nomen/nomen.shtml
- Chicken: http://www.agnc.msstate.edu/
- Anolis lizard: http://lizardbase.org/pages/agnc.html
- Xenopus: http://www.xenbase.org/gene/static/geneNomenclature.jsp
- Zebrafish: https://wiki.zfin.org/display/general/ZFIN+Zebrafish+Nomenclature+Guidelines
- Drosophila: http://flybase.org/static_pages/docs/nomenclature/nomenclature3.html
- C. elegans: http://wiki.wormbase.org/index.php/Nomenclature
- Arabidopsis: http://www.arabidopsis.org/portals/nomenclature/guidelines.jsp
- S. cerevisiae: http://www.yeastgenome.org/gene_guidelines.shtml
- S. pombe: http://www.pombase.org/submit-data/gene-naming-guidelines
The HGNC is also now funded to assign standardised gene names to genes in other vertebrate species that do not have an existing gene nomenclature authority.
Does the HGNC collaborate with specialist nomenclature committees and advisors?
Yes. The HGNC website provides a table listing the other nomenclature committees we collaborate with that work on specific groups of genes/proteins, and a page listing our gene family/grouping specialist advisors.
How should orthologs be identified?
Where clear orthology can be asserted between genes in different vertebrate species, best efforts are made via interaction with other nomenclature groups to ensure they are assigned the same symbol. Human orthologs of genes first identified in other species should not be designated by a symbol beginning with H (or h) for human. When necessary to distinguish the species of origin for orthologous genes with the same gene symbol, the letter-based code for different species already established by SWISS-PROT is recommended. The code is for use in publications only and not incorporated as part of the gene symbol. The species designation is added as a prefix, in parentheses, to the gene symbol. For example: (HUMAN)G6PD; (HUMAN)HBB; orthologous mouse genes: (MOUSE)G6pd; (MOUSE)Hbb. Further examples of the species codes can be found in the Guidelines.
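The prefix convention described above can be sketched in a few lines. The helper names `with_species` and `split_species` are hypothetical, and the pattern assumes species codes consist of upper-case letters only, as in the HUMAN and MOUSE examples; other codes may need a looser pattern.

```python
import re

def with_species(code, symbol):
    """Prefix a gene symbol with a species code in parentheses."""
    return f"({code.upper()}){symbol}"

def split_species(label):
    """Split '(CODE)SYMBOL' into (code, symbol); code is None if no prefix is present."""
    match = re.fullmatch(r"\(([A-Z]+)\)(\S+)", label)
    if match is None:
        return None, label
    return match.group(1), match.group(2)

print(with_species("HUMAN", "G6PD"))
print(with_species("MOUSE", "G6pd"))
print(split_species("(MOUSE)Hbb"))
```

Because the code is not part of the gene symbol itself, `split_species` returns the bare symbol unchanged, so it can still be used for database lookups.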
Do alternative gene transcripts or splice variants have approved symbols?
The HGNC will not usually assign gene symbols to alternative transcripts or splice variants.
How should I refer to the protein encoded by a gene?
Ideally, protein names and symbols would be identical to those used for the gene. However, we are a gene nomenclature committee and do not have any guidelines pertaining to proteins or authority over protein nomenclature. There is a recommendation for the use of italics for gene symbols, and non-italicised letters for the encoded protein; but some journals have editorial policies that prevent this convention from being used, so it is not by any means universal.
Where can I read more about nomenclature and related issues?
A list of HGNC nomenclature publications is available.
How do I perform a quick search with a term that contains spaces or commas?
If you would like to search the database for a term that contains a space or a comma, such as "protein kinase", enclose the term in double quotes. Anything within double quotes is taken literally as one search term. Without the quotes, you will be searching for any entries that contain "protein" and any entries that contain "kinase".
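The effect of quoting can be illustrated with a small tokenizer. This is only a sketch of the behaviour described above, not the HGNC search implementation; the function name `split_terms` is hypothetical, and it assumes that unquoted spaces and commas both act as separators.

```python
import re

# Either a double-quoted phrase, or a run of characters without spaces, commas or quotes.
TERM = re.compile(r'"([^"]*)"|([^\s,"]+)')

def split_terms(query):
    """Split a query into search terms, keeping double-quoted phrases whole."""
    terms = []
    for quoted, bare in TERM.findall(query):
        term = quoted or bare
        if term:  # skip empty quoted strings
            terms.append(term)
    return terms

print(split_terms('"protein kinase"'))
print(split_terms('protein kinase'))
```

The first call yields a single term, while the second yields two separate terms, which is why the unquoted search returns a broader set of entries.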
HGNC symbol reports
A synonym is a symbol by which a gene has been alternatively known in the literature or databases, or which groups it into a known gene family. Synonyms are usually recorded along with the approved symbols as part of the gene entry to facilitate database searching. The following databases all contain both approved symbols and synonyms:
"Symbol Withdrawn" refers to a previously approved HGNC symbol for a gene that now has a different approved symbol. "Entry Withdrawn" refers to a previously approved HGNC symbol for a gene that has since been shown not to exist.
Mapped data are identified in Gene Symbol Reports by the disclaimer “mapped data supplied by [source]” in the header of the relevant symbol report field. Mapped data are derived from external sources and as such are not subject to our strict manual checking and curation procedures. Therefore, the HGNC are unable to guarantee the same high quality for mapped data as for our curated data.
Requesting a gene symbol
My gene doesn't have an approved symbol. How do I propose one?
Fill in a gene symbol request form and submit it to the HGNC. Remember that you need to propose both a name (description) and a symbol (short-form abbreviation) for your gene, e.g. ADK: adenosine kinase, and ideally include sequence data. Please read the Gene Symbol Request Form Notes before sending your request.
What is the difference between a gene symbol and a gene name?
The "symbol" is a unique series of Latin letters (upper case in human), often with Arabic numerals, which should ideally be no longer than six characters. The longer descriptive "name" should be concise and convey the character or function of the gene. The first letter of the symbol should be the same as that of the name, to facilitate alphabetical listing and grouping; e.g. the gene with the name "G protein-coupled receptor 1" has the symbol "GPR1".
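Before submitting a proposal, the conventions in this answer can be checked mechanically. The sketch below is a simplified, hypothetical helper (`check_symbol` is not an HGNC tool) that covers only the three points stated above; it does not check uniqueness, and it does not handle the restricted use of hyphens in symbols such as HLA-DPA1.

```python
import re

def check_symbol(symbol, name):
    """Return advisory notes for a proposed symbol/name pair; an empty list means none."""
    notes = []
    # Upper-case Latin letters, optionally with Arabic numerals.
    if not re.fullmatch(r"[A-Z0-9]+", symbol):
        notes.append("symbol should use upper-case Latin letters and Arabic numerals only")
    # Ideally no longer than six characters (a recommendation, not a hard limit).
    if len(symbol) > 6:
        notes.append("symbol is ideally no longer than six characters")
    # Symbol and name should start with the same letter.
    if symbol[:1].upper() != name[:1].upper():
        notes.append("symbol and name should start with the same letter")
    return notes

print(check_symbol("ADK", "adenosine kinase"))
print(check_symbol("GPR1", "G protein-coupled receptor 1"))
print(check_symbol("ABCDEFG1", "kinase"))
```

The notes are advisory on purpose: the text describes the length limit as an ideal, so a longer symbol is flagged rather than rejected.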
Will my gene symbol request remain confidential?
Upon submission, you can specify that you want all information associated with your request to remain confidential until publication. Both the human and mouse nomenclature committees maintain a confidential database in which symbols can be reserved prior to publication if required. For each reserved symbol we maintain confidential information, including sequences and cytogenetic locations, against which we can check any new gene symbol request. More information on confidentiality options for gene symbol requests is available on the HGNC website.
Why can't punctuation be used in a gene symbol?
Most types of punctuation marks are not permitted in symbols as they can cause difficulty in searches of electronic databases. Use of hyphens is restricted to certain groups of genes, such as components of the major histocompatibility complex (e.g. HLA-DPA1).