
Symbol Report Documentation

Each gene with an approved HGNC symbol has its own Symbol Report that contains our manually curated data and links to many other external biomedical resources.  The HGNC "core data" is displayed at the top of the page in a separate box and presents the approved nomenclature, the unique HGNC ID number, synonyms, previous nomenclature, locus type and chromosomal location.  The table below the HGNC "core data" provides links to the HGNC's curated gene family page where applicable, and links to information on homologs in other species, nucleotide sequences, gene resources, protein resources, clinical resources, publications and other database links.

C - Where links to external resources have been manually curated by a member of the HGNC, the letter C for curated is displayed after the assertion.

D - Where links have been downloaded from external resources, the letter D for downloaded is displayed after the assertion; please note that these assertions are not subject to our strict manual checking and curation procedures and hence we cannot guarantee the reliability of the data.
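
For readers who process Symbol Report data programmatically, the two provenance markers could be modelled as a small enumeration. This is only a sketch: the names LinkSource and is_manually_checked are hypothetical and not part of any HGNC software, and it assumes the curated marker is the letter C, by analogy with D for downloaded.

```python
from enum import Enum


class LinkSource(Enum):
    """Provenance marker displayed after an assertion (hypothetical model)."""
    CURATED = "C"      # manually curated by a member of the HGNC (assumed letter)
    DOWNLOADED = "D"   # downloaded from an external resource, not manually checked


def is_manually_checked(marker):
    """Return True only for links that went through manual curation.

    Raises ValueError for any marker other than "C" or "D".
    """
    return LinkSource(marker) is LinkSource.CURATED
```

A consumer that needs only manually checked links could filter on is_manually_checked and treat everything else as unverified.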

The text that follows is a field-by-field guide to the information provided in the Symbol Report.

Core Data fields

Approved Symbol - the official gene symbol approved by the HGNC, which is a short abbreviated form of the gene name.  Symbols are approved in accordance with the Guidelines for Human Gene Nomenclature (please refer to our guidelines page).

Approved Name - the full gene name approved by the HGNC; corresponds to the approved symbol above.

HGNC ID - a unique ID provided by the HGNC for each gene with an approved symbol.  IDs are of the format HGNC:n, where n is a unique number.
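
The HGNC:n format lends itself to a simple validity check. The following is a minimal sketch rather than an official validator: the function name parse_hgnc_id is hypothetical, and treating n as a positive integer without leading zeros is an assumption of this example.

```python
import re

# Pattern for the "HGNC:n" format described above. Treating n as a
# positive integer without leading zeros is an assumption of this sketch.
HGNC_ID_PATTERN = re.compile(r"HGNC:([1-9][0-9]*)")


def parse_hgnc_id(text):
    """Return the numeric part of an HGNC ID, or None if the text is malformed."""
    match = HGNC_ID_PATTERN.fullmatch(text.strip())
    return int(match.group(1)) if match else None
```

With this sketch, parse_hgnc_id("HGNC:1100") returns 1100, while parse_hgnc_id("hgnc:1100") and parse_hgnc_id("HGNC:") both return None.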

Previous Gene Symbol & Names - this field displays any symbols and/or names that were previously HGNC-approved nomenclature.  Many genes will have no data in this field as the symbol or name will never have been changed.  In some instances, only previous gene names will appear as gene names are sometimes updated without a change to the approved gene symbol.

Synonyms - alternative symbols and names that have been used to refer to the gene.  Synonyms may be from the literature, from other databases or may be added to represent membership of a gene family.

Locus type - specifies the genetic class of each gene entry.  All HGNC locus types are listed below:

  • gene with protein product - protein-coding genes (the protein may be predicted and of unknown function) (SO:0001217)
  • RNA, cluster - region containing a cluster of small non-coding RNA genes
  • RNA, long non-coding - non-protein coding genes that encode long non-coding RNAs (lncRNAs) (SO:0001877); these are at least 200 nt in length. Subtypes include intergenic (SO:0001463), intronic (SO:0001903) and antisense (SO:0001904).
  • RNA, micro - non-protein coding genes that encode microRNAs (miRNAs) (SO:0001265)
  • RNA, ribosomal - non-protein coding genes that encode ribosomal RNAs (rRNAs) (SO:0001637)
  • RNA, small nuclear - non-protein coding genes that encode small nuclear RNAs (snRNAs) (SO:0001268)
  • RNA, small nucleolar - non-protein coding genes that encode small nucleolar RNAs (snoRNAs) containing C/D or H/ACA box domains (SO:0001267)
  • RNA, small cytoplasmic - non-protein coding genes that encode small cytoplasmic RNAs (scRNAs) (SO:0001266)
  • RNA, transfer - non-protein coding genes that encode transfer RNAs (tRNAs) (SO:0001272)
  • RNA, small misc - non-protein coding genes that encode miscellaneous types of small ncRNAs, such as vault (SO:0000404) and Y (SO:0000405) RNA genes
  • phenotype only - mapped phenotypes where the causative gene has not been identified (SO:0001500)
  • pseudogene - genomic DNA sequences that are similar to protein-coding genes but do not encode a functional protein (SO:0000336)
  • RNA, pseudogene - pseudogene of a non-protein coding RNA
  • complex locus constituent - transcriptional unit that is part of a named complex locus
  • endogenous retrovirus - integrated retroviral elements that are transmitted through the germline (SO:0000100)
  • fragile site - a heritable locus on a chromosome that is prone to DNA breakage
  • immunoglobulin gene - gene segments that undergo somatic recombination to form heavy or light chain immunoglobulin genes (SO:0000460)
  • immunoglobulin pseudogene - immunoglobulin gene segments that are inactivated due to frameshift mutations and/or stop codons in the open reading frame
  • protocadherin - gene segments that constitute the three clustered protocadherins (alpha, beta and gamma)
  • readthrough - a naturally occurring transcript containing coding sequence from two or more genes that can also be transcribed individually
  • region - extents of genomic sequence that contain one or more genes, also applied to non-gene areas that do not fall into other types
  • T cell receptor gene - gene segments that undergo somatic recombination to form either alpha, beta, gamma or delta chain T cell receptor genes (SO:0000460)
  • T cell receptor pseudogene - T cell receptor gene segments that are inactivated due to frameshift mutations and/or stop codons in the open reading frame
  • transposable element - a segment of repetitive DNA that can move, or retrotranspose, to new sites within the genome (SO:0000101)
  • unknown - entries where the locus type is currently unknown
  • virus integration site - target sequence for the integration of viral DNA into the genome
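
The Sequence Ontology (SO) accessions quoted in the list above can be collected into a lookup table. This is a sketch for illustration only: the names LOCUS_TYPE_TO_SO, so_accession and locus_types_for are hypothetical, locus types listed without an accession are left out, and the subtype accessions given for long non-coding and small misc RNAs are not included.

```python
# Sequence Ontology accessions copied from the locus type list above.
# Locus types listed without an accession are omitted, as are the
# subtype accessions (lncRNA subtypes, vault and Y RNA genes).
LOCUS_TYPE_TO_SO = {
    "gene with protein product": "SO:0001217",
    "RNA, long non-coding": "SO:0001877",
    "RNA, micro": "SO:0001265",
    "RNA, ribosomal": "SO:0001637",
    "RNA, small nuclear": "SO:0001268",
    "RNA, small nucleolar": "SO:0001267",
    "RNA, small cytoplasmic": "SO:0001266",
    "RNA, transfer": "SO:0001272",
    "phenotype only": "SO:0001500",
    "pseudogene": "SO:0000336",
    "endogenous retrovirus": "SO:0000100",
    "immunoglobulin gene": "SO:0000460",
    "T cell receptor gene": "SO:0000460",
    "transposable element": "SO:0000101",
}


def so_accession(locus_type):
    """Return the SO accession for a locus type, or None if none is listed."""
    return LOCUS_TYPE_TO_SO.get(locus_type)


def locus_types_for(accession):
    """Return every locus type in the table that shares the given accession."""
    return sorted(name for name, acc in LOCUS_TYPE_TO_SO.items() if acc == accession)
```

Note that the mapping is not one-to-one: in the list above, immunoglobulin genes and T cell receptor genes share SO:0000460, so a reverse lookup has to return a list.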

Chromosomal Location - indicates the cytogenetic location of the gene or region on the chromosome.
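
Taken together, the Core Data fields can be pictured as a single record. The sketch below is one possible in-memory representation, not the HGNC's own data model: the class name CoreData, its attribute names and the method all_symbols_and_synonyms are hypothetical, and the values in the usage note are made-up placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class CoreData:
    """Hypothetical container for the Core Data fields of a Symbol Report."""
    approved_symbol: str
    approved_name: str
    hgnc_id: str                 # format "HGNC:n"
    locus_type: str
    chromosomal_location: str    # cytogenetic location
    # Many genes have no previous nomenclature, so these default to empty.
    previous_symbols: list = field(default_factory=list)
    previous_names: list = field(default_factory=list)
    synonyms: list = field(default_factory=list)

    def all_symbols_and_synonyms(self):
        """Approved symbol first, then previous symbols, then synonyms."""
        return [self.approved_symbol, *self.previous_symbols, *self.synonyms]
```

For example, a record built with the placeholder values approved_symbol="EXAMPLE1", previous_symbols=["OLDEX1"] and synonyms=["EX1"] would return ["EXAMPLE1", "OLDEX1", "EX1"] from all_symbols_and_synonyms(), which is the kind of list a search index might use to resolve outdated nomenclature.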

All other Symbol Report Data fields

Gene family - links to HGNC-curated gene family pages.  Each link is to the relevant gene family or group the gene has been assigned to, according to either sequence similarity or information from publications, specialist advisors for that family or other databases. Families/groups may be either structural or functional; note that a gene may belong to more than one family/group.

Specialist Database - this section only appears on Symbol Reports if the gene in question is listed in an external database which is specific to certain classes of genes.  A full list is provided here:

  1. CD - Human Cell Differentiation Antigens
  2. Enzyme Commission (EC) - a repository of information relative to the nomenclature of enzymes
  3. HomeoDB - a database of homeobox gene diversity
  4. HORDE - Human Olfactory Receptor Data Exploratorium
  5. Human Intermediate Filament Database - information on intermediate filament genes
  6. IMGT/GENE-DB - immunoglobulin and T cell receptor gene nomenclature database hosted at the ImMunoGeneTics information system
  7. IUPHAR - Committee on Receptor Nomenclature and Drug Classification
  8. KIAA/HUGE - a database of Human Unidentified Gene-Encoded Large Proteins analysed by the Kazusa Human cDNA project
  9. KZNF Gene Catalog - a database of hand-annotated models of all Krüppel-type (C2H2) zinc finger genes and pseudogenes in the human genome
  10. Mamit-tRNAdb - a compilation of mammalian mitochondrial tRNA genes
  11. MEROPS - an information resource for peptidases
  12. miRBase - a searchable database of published miRNA sequences and annotation
  13. Pseudogene.org - a database of identified pseudogenes
  14. snoRNABase - a comprehensive database of human snoRNAs

Homologs - this section contains information on homologs of the gene in other species.  There are four separate data fields:

  1. a link to the gene page for the mouse ortholog at the Mouse Genome Informatics resource.  The approved symbol for the mouse ortholog is displayed after the link.  Mouse gene symbols are approved by the Mouse Genomic Nomenclature Committee (MGNC).
  2. a link to the gene page for the rat ortholog at the Rat Genome Database.  The approved symbol for the rat ortholog is also displayed after the link.  Rat gene symbols are approved by the Rat Genome and Nomenclature Committee (RGNC).
  3. HCOP: a link to orthology search results for the gene in question from the HGNC Comparison of Orthology Predictions tool.
  4. TreeFam: a link to the relevant phylogenetic tree as predicted at TreeFam, a database of phylogenetic trees of animal genes.

Nucleotide Sequences - a list of links to nucleotide sequences associated with the gene.  Links are to the following nucleotide resources:

  1. GenBank/EMBL/DDBJ sequence accession records.  Links are to sequence accessions curated by members of the HGNC.  Note that only representative accessions are curated; a full list is not provided for each gene.  The user can choose to view each sequence accession at any of the three sequence databases.
  2. RefSeq - the Reference Sequence (RefSeq) identifier for that entry, provided by the NCBI.  As we do not aim to curate all variants of a gene, only one selected RefSeq is displayed per Symbol Report. RefSeq aims to provide a comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products. RefSeq identifiers are designed to provide a stable reference for gene identification and characterization, mutation analysis, expression studies, polymorphism discovery, and comparative analyses.
  3. CCDS - the consensus CDS (CCDS) sequence for the gene.  The CCDS project is a collaborative effort to identify a core set of human and mouse protein coding regions that are consistently annotated and of high quality. The long term goal is to support convergence towards a standard set of gene annotations.
  4. Vega - gene level sequence annotation at the Vertebrate Genome Annotation (VEGA) database.

Gene Resources - provides links to external pages dedicated to information on the gene and to genome browsers.  Links are to the following pages:

  1. The Entrez Gene page at the NCBI provides curated sequence and descriptive information about genetic loci including official nomenclature, synonyms, sequence accessions, phenotypes, EC numbers, MIM numbers, UniGene clusters, homology, map locations, and related web sites.  There is also a link to the gene annotation at the NCBI Sequence Viewer, the graphical display for the NCBI Nucleotide and Protein databases.
  2. The Ensembl Gene View displays data associated at the gene level such as orthologs, paralogs, regulatory regions and splice variants.  There is a link to the gene annotation at the Ensembl Genome Browser.
  3. The UCSC gene page provides information on the gene model and links to related tools and databases.  There is also a link to the annotation at the UCSC Genome Browser.
  4. The Vega Gene View presents the manually annotated gene model, curated by the Havana project or their collaborators.  There is a corresponding link to the Vega Genome Browser.

Protein Resources - information on proteins encoded by the gene in question.  There are two possible links per Symbol Report:

  1. The UniProt page for the protein product encoded by the gene.  The UniProt Protein Knowledgebase is described as a curated protein sequence database that provides a high level of annotation, a minimal level of redundancy and a high level of integration with other databases.
  2. The InterPro UniProtKB match page for the encoded protein.  InterPro is described as an integrated database of predictive protein "signatures" used for the classification and automatic annotation of proteins and genomes.  The match page shows all predicted protein signatures for the encoded protein.

Clinical Resources - provides links to phenotypes, diseases and mutations associated with the gene.  If the gene has no links to related clinical resources this section is not shown.  Possible links are as follows:

  1. OMIM - links to the Online Mendelian Inheritance in Man page for the gene.  OMIM is described as a catalog of human genes and genetic phenotypes containing textual information, and links to MEDLINE, sequence records in the Entrez system, and additional related resources.
  2. LSDB - links to Locus Specific Mutation Databases.  These databases list both published and unpublished mutations reported on a gene-by-gene basis.  The full name of the database is provided for each link.
  3. GeneTests - links to the NCBI GeneTests gene review page.  The GeneTests database provides information on genetic testing and related information on diagnosis, disease management and genetic counselling.
  4. Orphanet - the portal for rare diseases and orphan drugs.  Links go to a gene-based page with all associated rare diseases listed.
  5. Decipher - a database of submicroscopic chromosomal imbalance.  Links are to gene-level pages with associated mutations and syndromes listed.
  6. COSMIC - the catalogue of somatic mutations in cancer.  Gene-level COSMIC reports list all curated references, clinical studies and samples with mutations in cancer.
  7. LRG - a link to the Locus Reference Genomic sequence for the gene.  LRG sequences "provide a stable genomic DNA framework for reporting mutations with a permanent ID and a core content that never changes".

References - displays the PubMed IDs (PMIDs) for references pertinent to the gene.  The user can choose to view these references at either PubMed or CiteXplore.  This section does not aim to list all possible published papers on the gene but provides links to papers that first described the gene in question or papers that are particularly relevant to the nomenclature of the gene.

Other Database Links - links to other external resources that provide useful information on the gene.  The links are only present for databases that have entries specific to the gene in question.  Here is a list of these resources:

  1. GENATLAS - links to the GENATLAS: GENE database pages.  These pages contain information relevant to gene mapping, gene products and genetic diseases.
  2. GeneCards - provides concise genomic related information on human genes.
  3. GOPubMed - links to a search of the GOPubMed literature search engine with the approved symbol for the gene. GOPubMed provides several different results display options, including listing references by year, MeSH term, GO term or researcher.
  4. H-InvDB - links to the Locus view page of the H-Invitational Database.  The H-InvDB contains gene annotation models and information on gene expression, gene products and related diseases.
  5. QuickGO - links to a list of all Gene Ontology (GO) terms annotated for the gene product.  GO terms are controlled vocabulary terms that describe the gene product characteristics.  GO terms are mapped via QuickGO to the UniProt protein accession.
  6. Reactome - links to a protein-level page that lists all signalling pathways associated with the gene, as curated by the Reactome project members.  Pathways are mapped to the UniProt protein accession.
  7. WikiGenes - gene-based pages that combine information curated by scientists in a "wiki" format with sentences mined automatically from the literature.
