Symbol report help
Each gene with an approved HGNC symbol has its own Symbol Report that contains our manually curated data and links to many other external biomedical resources. The HGNC "core data" is displayed at the top of the page in a separate box and presents the approved nomenclature, the unique HGNC ID number, aliases, previous nomenclature, locus type, chromosomal location, gene group and an HCOP orthology prediction link. The table below the HGNC "core data" provides links to external resources such as homologs in other species (e.g. mouse and rat), nucleotide resources, gene resources, protein resources, clinical resources, publications and other database links.
The text that follows is a field-by-field guide to the information provided in the Symbol Report.
Core Data fields
The official gene symbol approved by the HGNC, which is typically a short form of the gene name. Symbols are approved in accordance with the Guidelines for Human Gene Nomenclature (please refer to our guidelines page).
A unique ID provided by the HGNC for each gene with an approved symbol. IDs are of the format HGNC:n, where n is a unique number. HGNC IDs remain stable even if a name or symbol changes.
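As an illustration of the HGNC:n format, the following minimal Python sketch checks whether a string looks like an HGNC ID and extracts the number. The names `HGNC_ID_PATTERN` and `parse_hgnc_id` are hypothetical and not part of any HGNC tool, and the sketch assumes that n is a positive integer written without leading zeros.

```python
import re

# Matches the "HGNC:n" format described above.
# Assumption: n is a positive integer with no leading zeros.
HGNC_ID_PATTERN = re.compile(r"HGNC:([1-9][0-9]*)")


def parse_hgnc_id(text):
    """Return the numeric part of an HGNC ID, or None if text is not in HGNC:n form."""
    match = HGNC_ID_PATTERN.fullmatch(text.strip())
    return int(match.group(1)) if match else None
```

For example, `parse_hgnc_id("HGNC:1100")` returns the integer 1100, while a bare number or a lower-case prefix returns `None`.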
Alternative symbols that have been used to refer to the gene. Aliases may be from literature, from other databases or may be added to represent membership of a gene group.
Alternative names for the gene. Aliases may be from literature, from other databases or may be added to represent membership of a gene group.
Specifies the genetic class of each gene entry. All HGNC locus types are listed below:
- gene with protein product - protein-coding genes (the protein may be predicted and of unknown function) (SO:0001217)
- RNA, cluster - region containing a cluster of small non-coding RNA genes
- RNA, long non-coding - non-protein coding genes that encode long non-coding RNAs (lncRNAs) (SO:0002127); these are at least 200 nt in length. Subtypes include intergenic (SO:0001641), intronic (SO:0002184) and antisense (SO:0002182).
- RNA, micro - non-protein coding genes that encode microRNAs (miRNAs) (SO:0001265)
- RNA, misc - non-protein coding genes that encode miscellaneous types of ncRNAs.
- RNA, ribosomal - non-protein coding genes that encode ribosomal RNAs (rRNAs) (SO:0001637)
- RNA, small nuclear - non-protein coding genes that encode small nuclear RNAs (snRNAs) (SO:0001268)
- RNA, small nucleolar - non-protein coding genes that encode small nucleolar RNAs (snoRNAs) containing C/D or H/ACA box domains (SO:0001267)
- RNA, transfer - non-protein coding genes that encode transfer RNAs (tRNAs) (SO:0001272)
- RNA, vault - non-protein coding genes that encode small transcripts of roughly 100 nucleotides with a conserved panhandle-like secondary structure, examples of which were originally identified in the vault complex (SO:0002358)
- RNA, Y - non-protein coding genes that encode RNAs found in the Ro ribonucleoprotein particle (SO:0002359)
- phenotype only - mapped phenotypes where the causative gene has not been identified (SO:0001500)
- pseudogene - genomic DNA sequences that are similar to protein-coding genes but do not encode a functional protein (SO:0000336)
- complex locus constituent - transcriptional unit that is part of a named complex locus
- endogenous retrovirus - integrated retroviral elements that are transmitted through the germline (SO:0000100)
- fragile site - a heritable locus on a chromosome that is prone to DNA breakage (SO:0002349)
- immunoglobulin gene - gene segments that undergo somatic recombination to form heavy or light chain immunoglobulin genes (SO:0000460). Also includes immunoglobulin gene segments with open reading frames that either cannot undergo somatic recombination, or encode a peptide that is not predicted to fold correctly; these are identified by inclusion of the term "non-functional" in the gene name.
- immunoglobulin pseudogene - immunoglobulin gene segments that are inactivated due to frameshift mutations and/or stop codons in the open reading frame (SO:0002098)
- readthrough - a naturally occurring transcript containing exonic sequence from two or more genes that can also be transcribed individually
- region - extents of genomic sequence that contain one or more genes, also applied to non-gene areas that do not fall into other types (SO:0000001)
- T cell receptor gene - gene segments that undergo somatic recombination to form either alpha, beta, gamma or delta chain T cell receptor genes (SO:0000460). Also includes T cell receptor gene segments with open reading frames that either cannot undergo somatic recombination, or encode a peptide that is not predicted to fold correctly; these are identified by inclusion of the term "non-functional" in the gene name.
- T cell receptor pseudogene - T cell receptor gene segments that are inactivated due to frameshift mutations and/or stop codons in the open reading frame (SO:0002099)
- transposable element - a segment of repetitive DNA that can move, or retrotranspose, to new sites within the genome (SO:0000101)
- unknown - entries where the locus type is currently unknown
- virus integration site - target sequence for the integration of viral DNA into the genome
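For readers who handle locus types programmatically, the list above can be treated as a lookup table. The Python sketch below is illustrative only: it covers a handful of the locus types, the SO accessions are copied from the list above, and the names `LOCUS_TYPE_TO_SO` and `so_accession` are hypothetical rather than part of any HGNC tool.

```python
# Subset of the locus types listed above, mapped to the SO accession given in the list.
# None marks locus types for which the list gives no accession.
LOCUS_TYPE_TO_SO = {
    "gene with protein product": "SO:0001217",
    "RNA, long non-coding": "SO:0002127",
    "RNA, micro": "SO:0001265",
    "RNA, transfer": "SO:0001272",
    "pseudogene": "SO:0000336",
    "readthrough": None,
    "unknown": None,
}


def so_accession(locus_type):
    """Return the SO accession for a locus type; raise KeyError for labels not in the table."""
    return LOCUS_TYPE_TO_SO[locus_type]
```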
Indicates the cytogenetic location of the gene or region on the chromosome. In the absence of that information one of the following may be listed:
- not on reference assembly - named gene is not annotated on the current version of the Genome Reference Consortium human reference assembly; may have been annotated on previous assembly versions or on a non-reference human assembly
- unplaced - named gene is annotated on an unplaced/unlocalized scaffold of the human reference assembly
- reserved - named gene has never been annotated on any human assembly
We do not include "start and stop" genomic coordinates in our data because these often differ between annotations; instead we provide links out to annotation resources in the gene resources section of the symbol report.
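Because this field holds either a cytogenetic location or one of the three placeholder values, code that consumes it may want to tell the two cases apart. The following Python sketch shows one way to do so. It assumes the field text matches the placeholder wording exactly as listed above; `LOCATION_PLACEHOLDERS` and `describe_location` are hypothetical names.

```python
# The three placeholder values listed above, used when no cytogenetic location is available.
LOCATION_PLACEHOLDERS = {"not on reference assembly", "unplaced", "reserved"}


def describe_location(value):
    """Classify a location field value as a placeholder or a cytogenetic location."""
    cleaned = value.strip()
    if cleaned.lower() in LOCATION_PLACEHOLDERS:
        return ("placeholder", cleaned.lower())
    # Anything else is treated as a cytogenetic location; no format validation is attempted.
    return ("cytogenetic", cleaned)
```

The sketch deliberately does not validate the cytogenetic band notation; it only separates the placeholder values from everything else.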
Links to HGNC-curated gene group pages. Each link is to the relevant gene group the gene has been assigned to, according to either sequence similarity or information from publications, specialist advisors for that group or other databases. Groups may be either structural or functional; note that a gene may belong to more than one group.
This field contains additional information related to this entry that has been manually added by an HGNC curator.
A link to orthology search results for the gene in question from the HGNC Comparison of Orthology Predictions tool (HCOP).
All other symbol report data fields
This section only appears on Symbol Reports if the gene in question is listed in an external database which is specific to certain classes of genes. A full list is provided here:
- CD - Human Cell Differentiation Antigens
- Enzyme Commission (EC) - a repository of information relative to the nomenclature of enzymes
- HomeoDB - a database of homeobox gene diversity
- HORDE - Human Olfactory Receptor Data Exploratorium
- Human Intermediate Filament Database - information on intermediate filament genes
- IMGT/GENE-DB - immunoglobulin and T cell receptor gene nomenclature database hosted at the ImMunoGeneTics information system
- IUPHAR/BPS Guide to PHARMACOLOGY - an expert-driven guide to pharmacological targets and the substances that act on them
- KIAA/HUGE - a database of Human Unidentified Gene-Encoded Large Proteins analysed by the Kazusa Human cDNA project
- Mamit-tRNAdb - a compilation of mammalian mitochondrial tRNA genes
- MEROPS - an information resource for peptidases
- miRBase - a searchable database of published miRNA sequences and annotation
- Pseudogene.org - a database of identified pseudogenes
- snoRNABase - a comprehensive database of human snoRNAs
- BioParadigms SLC tables - up-to-date information on the SLC families and their members.
- LNCipedia - a comprehensive compendium of human long non-coding RNAs.
- ncRNAdb - a database that provides comprehensive annotations of eukaryotic long non-coding RNAs.
- GtRNAdb - a database that contains tRNA gene predictions made by tRNAscan-SE on complete or nearly complete genomes.
This section contains links to orthologs of the gene in selected species. The table contains the following:
- mouse orthologs that link to the Mouse Genome Informatics resource. Mouse gene symbols are approved by the Mouse Genomic Nomenclature Committee (MGNC).
- rat orthologs that link to the Rat Genome Database. Rat gene symbols are approved by the Rat Genome and Nomenclature Committee (RGNC).
- Other vertebrate orthologs named by the Vertebrate Gene Nomenclature Committee (VGNC). VGNC is responsible for assigning standardized names to genes in vertebrate species that currently lack a nomenclature committee.
A list of links to nucleotide sequences associated with the gene. Links are to the following nucleotide resources:
- Matched Annotation from NCBI and EMBL-EBI (MANE) is a collaboration between the National Center for Biotechnology Information (NCBI) and the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI). The goal of this project is to provide a minimal set of matching RefSeq and Ensembl transcripts of human protein-coding genes, where the transcripts from a matched pair are identical (5’ UTR, coding region and 3’ UTR), but retain their respective identifiers. Currently we display only the MANE Select transcripts on our symbol reports, i.e. one high-quality representative transcript per protein-coding gene that is well supported by experimental data and represents the biology of the gene.
- GenBank/ENA/DDBJ sequence accession records. Links are to sequence accessions curated by members of the HGNC. Note that representative accessions are curated; a full list is not provided for each gene. The user is given links to all three sequence databases, at any of which they may view the sequence accession.
- RefSeq - the Reference Sequence (RefSeq) identifier for that entry, provided by the NCBI. As we do not aim to curate all variants of a gene, only one selected RefSeq is displayed per gene report. RefSeq aims to provide a comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products. RefSeq identifiers are designed to provide a stable reference for gene identification and characterization, mutation analysis, expression studies, polymorphism discovery, and comparative analyses.
- CCDS - the consensus CDS (CCDS) sequence for the gene. The CCDS project is a collaborative effort to identify a core set of human and mouse protein-coding regions that are consistently annotated and of high quality. The long-term goal is to support convergence towards a standard set of gene annotations.
Provides links to external pages dedicated to information on the gene and to genome browsers. Links are to the following pages:
- The NCBI Gene page at the NCBI provides curated sequence and descriptive information about genetic loci including official nomenclature, aliases, sequence accessions, phenotypes, EC numbers, MIM numbers, UniGene clusters, homology, map locations, and related web sites. There is also a link to the gene annotation at the NCBI Sequence Viewer, the graphical display for the NCBI Nucleotide and Protein databases.
- The Ensembl Gene View displays data associated at the gene level such as orthologs, paralogs, regulatory regions and splice variants. There is a link to the gene annotation at the Ensembl Genome Browser.
- The UCSC gene page provides information on the gene model and links to related tools and databases. There is also a link to the annotation at the UCSC Genome Browser.
- The primary mission of the Alliance of Genome Resources (AGR) is to develop and maintain sustainable genome information resources that facilitate the use of diverse model organisms in understanding the genetic and genomic basis of human biology, health and disease. The HGNC will link to the Human gene page at the AGR via HGNC ID if such a page exists.
Information on proteins encoded by the gene in question.
Links are made via UniProt protein accessions. There are four possible links that we have created via the Swiss-Prot accession:
- The UniProt page for the protein product encoded by the gene. The UniProt Protein Knowledgebase is a curated protein sequence database that provides a high level of annotation, a minimal level of redundancy and a high level of integration with other databases. We do not map to TrEMBL entries within UniProt, only to Swiss-Prot entries, as these are manually annotated and reviewed.
- The InterPro page mapped to the displayed UniProt protein accession. InterPro is an integrated database of predictive protein "signatures" used for the classification and automatic annotation of proteins and genomes.
- The PDBe page mapped to the displayed UniProt accession. PDBe is a founding member of the Worldwide Protein Data Bank which collects, organises and disseminates data on biological macromolecular structures.
- The Reactome protein-level page mapped to the displayed UniProt protein accession. Reactome is a manually curated and peer-reviewed pathway database.
AlphaFold DB provides open access to protein structure predictions. In this section we display the AlphaFold accession and link to the resource using the UniProt/SwissProt accession which we obtained from AlphaFold. We only link to the SwissProt/reviewed protein structure predictions.
Provides links to phenotypes, diseases and mutations associated with the gene. If the gene has no links to related clinical resources this section is not shown. Possible links are as follows:
- OMIM - links to the Online Mendelian Inheritance in Man page for the gene. OMIM is described as a catalog of human genes and genetic phenotypes containing textual information, and links to MEDLINE, sequence records in the Entrez system, and additional related resources.
- LSDB - links to Locus Specific Mutation Databases. These databases list both published and unpublished mutations reported on a gene-by-gene basis. The full name of the database is provided for each link.
- Genetics Home Reference - provides consumer-friendly information about the effects of genetic variation on human health.
- Orphanet - the portal for rare diseases and orphan drugs. Links go to a gene-based page with all associated rare diseases listed.
- Decipher - a database of submicroscopic chromosomal imbalance. Links are to gene-level pages with associated mutations and syndromes listed.
- COSMIC - the catalogue of somatic mutations in cancer. Gene-level COSMIC reports list all curated references, clinical studies and samples with mutations in cancer.
- LRG - a link to a Locus Reference Genomic (LRG) record that contains stable genomic and transcript reference sequences for reporting variants with clinical implications. LRGs are manually curated, have permanent identifiers and core content that never changes.
- Genetic Testing Registry - the Genetic Testing Registry (GTR) provides a central location for voluntary submission of genetic test information by providers. The scope includes the test's purpose, methodology, validity, evidence of the test's usefulness, and laboratory contacts and credentials.
- GenCC - the gene curation coalition. The Gene Curation Coalition brings together groups engaged in the evaluation of gene-disease validity with a willingness to share data publicly, to develop consistent terminology for gene curation activities and to facilitate the consistent assessment of genes that have been reported in association with disease. In the gene symbol report we display the HGNC ID that is used to identify a record within the GenCC database and the link attached to the ID will take you to the gene page within the GenCC's database.
Displays the title, (first) author, journal information and links to PubMed and Europe PubMed Central. The abstract and full list of authors can also be viewed by clicking on the '+' icon next to the links. This section aims to reference a limited number of key papers that describe the gene and/or its products, or are particularly relevant to its nomenclature and/or function; it does not aim to be an exhaustive bibliography.
Links to other external resources that provide useful information on the gene. The links are only present for databases that have entries specific to the gene in question. Here is a list of these resources:
- BioGPS - A customizable and extensible portal for aggregating gene and protein information.
- GENATLAS - links to the GENATLAS: GENE database pages. These pages contain information relevant to gene mapping, gene products and genetic diseases.
- GeneCards - provides concise genomic related information on human genes.
- GOPubMed - links to a search of the GOPubMed literature search engine with the approved symbol for the gene. GOPubMed provides several different results display options, including listing references by year, MeSH term, GO term or researcher.
- QuickGO - links to a list of all Gene Ontology (GO) terms annotated for the gene product. GO terms are controlled vocabulary terms that describe the gene product characteristics. GO terms are mapped via QuickGO to the UniProt protein accession.
- WikiGenes - gene-based pages that combine information curated by scientists in a "wiki" format with sentences mined automatically from the literature.