Symbol report help
Each gene with an approved HGNC symbol has its own Symbol Report that contains our manually curated data and links to many external biomedical resources. The HGNC "core data" is displayed at the top of the page in a separate box and presents the approved nomenclature, the unique HGNC ID number, synonyms, previous nomenclature, locus type, chromosomal location, gene families and an HCOP orthology prediction link. The table below the HGNC "core data" provides links to external resources such as homologs in other species (e.g. mouse and rat), nucleotide sequences, gene resources, protein resources, clinical resources, publications and other database links.
Table of Contents
- Core Data fields
- All other Symbol Report Data fields
-Placeholder symbol. If you have functional data about this gene or its product(s) please contact us.
D - Where links have been downloaded from external resources, the letter D for downloaded is displayed after the assertion; please note that these assertions are not subject to our strict manual checking and curation procedures and hence we cannot guarantee the reliability of the data.
The text that follows is a field-by-field guide to the information provided in the Symbol Report.
Core Data fields
The official gene symbol approved by the HGNC, which is a short abbreviated form of the gene name. Symbols are approved in accordance with the Guidelines for Human Gene Nomenclature (please refer to our guidelines page).
A unique ID provided by the HGNC for each gene with an approved symbol. IDs are of the format HGNC:n, where n is a unique number.
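As an illustration of the HGNC:n format, the following minimal Python sketch checks whether a string looks like an HGNC ID and extracts the numeric part. The function name `parse_hgnc_id` is hypothetical, and the pattern assumes that n is a positive integer written without leading zeros; the text above only states that n is a unique number.

```python
import re

# Assumption: n is a positive integer with no leading zeros.
HGNC_ID_PATTERN = re.compile(r"HGNC:([1-9][0-9]*)")


def parse_hgnc_id(text):
    """Return the numeric part of an HGNC ID, or None if text is not of the form HGNC:n."""
    match = HGNC_ID_PATTERN.fullmatch(text.strip())
    return int(match.group(1)) if match else None


# "HGNC:12345" is a made-up value used only to show the format.
print(parse_hgnc_id("HGNC:12345"))
print(parse_hgnc_id("12345"))
```

Matching the whole string rather than searching within it means that values with a missing prefix or trailing characters are rejected instead of being silently accepted.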
This field displays any symbols and/or names that were previously HGNC-approved nomenclature. Many genes will have no data in this field as the symbol or name will never have been changed. In some instances, only previous gene names will appear as gene names are sometimes updated without a change to the approved gene symbol.
Alternative symbols and names that have been used to refer to the gene. Synonyms may be from the literature, from other databases or may be added to represent membership of a gene family.
Specifies the genetic class of each gene entry. All HGNC locus types are listed below:
- gene with protein product - protein-coding genes (the protein may be predicted and of unknown function) (SO:0001217)
- RNA, Y - non-protein coding genes that encode Y RNAs (SO:0000405)
- RNA, cluster - region containing a cluster of small non-coding RNA genes
- RNA, long non-coding - non-protein coding genes that encode long non-coding RNAs (lncRNAs) (SO:0001877); these are at least 200 nt in length. Subtypes include intergenic (SO:0001463), intronic (SO:0001903) and antisense (SO:0001904).
- RNA, micro - non-protein coding genes that encode microRNAs (miRNAs) (SO:0001265)
- RNA, misc - non-protein coding genes that encode miscellaneous types of small ncRNAs
- RNA, ribosomal - non-protein coding genes that encode ribosomal RNAs (rRNAs) (SO:0001637)
- RNA, small cytoplasmic - non-protein coding genes that encode small cytoplasmic RNAs (scRNAs) (SO:0001266)
- RNA, small nuclear - non-protein coding genes that encode small nuclear RNAs (snRNAs) (SO:0001268)
- RNA, small nucleolar - non-protein coding genes that encode small nucleolar RNAs (snoRNAs) containing C/D or H/ACA box domains (SO:0001267)
- RNA, transfer - non-protein coding genes that encode transfer RNAs (tRNAs) (SO:0001272)
- RNA, vault - non-protein coding genes that encode vault RNAs (SO:0000404)
- phenotype only - mapped phenotypes where the causative gene has not been identified (SO:0001500)
- T cell receptor pseudogene - T cell receptor gene segments that are inactivated due to frameshift mutations and/or stop codons in the open reading frame
- immunoglobulin pseudogene - immunoglobulin gene segments that are inactivated due to frameshift mutations and/or stop codons in the open reading frame
- pseudogene - genomic DNA sequences that are similar to protein-coding genes but do not encode a functional protein (SO:0000336)
- T cell receptor gene - gene segments that undergo somatic recombination to form either alpha, beta, gamma or delta chain T cell receptor genes (SO:0000460). Also includes T cell receptor gene segments with open reading frames that either cannot undergo somatic recombination, or encode a peptide that is not predicted to fold correctly; these are identified by inclusion of the term “non-functional” in the gene name.
- complex locus constituent - transcriptional unit that is part of a named complex locus
- endogenous retrovirus - integrated retroviral elements that are transmitted through the germline (SO:0000100)
- fragile site - a heritable locus on a chromosome that is prone to DNA breakage
- immunoglobulin gene - gene segments that undergo somatic recombination to form heavy or light chain immunoglobulin genes (SO:0000460). Also includes immunoglobulin gene segments with open reading frames that either cannot undergo somatic recombination, or encode a peptide that is not predicted to fold correctly; these are identified by inclusion of the term “non-functional” in the gene name.
- protocadherin - gene segments that constitute the three clustered protocadherins (alpha, beta and gamma)
- readthrough - a naturally occurring transcript containing coding sequence from two or more genes that can also be transcribed individually
- region - extents of genomic sequence that contain one or more genes; also applied to non-gene areas that do not fall into other types
- transposable element - a segment of repetitive DNA that can move, or retrotranspose, to new sites within the genome (SO:0000101)
- unknown - entries where the locus type is currently unknown
- virus integration site - target sequence for the integration of viral DNA into the genome
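For users who process locus types programmatically, the Sequence Ontology (SO) identifiers quoted in the list above could be held in a simple lookup table. The sketch below is one possible arrangement, not an HGNC-provided API: the names `LOCUS_TYPE_TO_SO` and `so_term_for` are hypothetical, the lncRNA subtype identifiers are left out, and locus types listed without an SO identifier simply return `None`.

```python
# SO identifiers copied from the locus type list above.
# Locus types with no SO identifier in that list are deliberately absent.
LOCUS_TYPE_TO_SO = {
    "gene with protein product": "SO:0001217",
    "RNA, Y": "SO:0000405",
    "RNA, long non-coding": "SO:0001877",
    "RNA, micro": "SO:0001265",
    "RNA, ribosomal": "SO:0001637",
    "RNA, small cytoplasmic": "SO:0001266",
    "RNA, small nuclear": "SO:0001268",
    "RNA, small nucleolar": "SO:0001267",
    "RNA, transfer": "SO:0001272",
    "RNA, vault": "SO:0000404",
    "phenotype only": "SO:0001500",
    "pseudogene": "SO:0000336",
    "T cell receptor gene": "SO:0000460",
    "immunoglobulin gene": "SO:0000460",
    "endogenous retrovirus": "SO:0000100",
    "transposable element": "SO:0000101",
}


def so_term_for(locus_type):
    """Return the SO identifier for a locus type, or None if none is listed."""
    return LOCUS_TYPE_TO_SO.get(locus_type)


print(so_term_for("RNA, vault"))
print(so_term_for("fragile site"))
```

Note that the T cell receptor gene and immunoglobulin gene locus types share one SO identifier in the list, so the mapping cannot be reversed to recover a unique locus type.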
Links to HGNC-curated gene family pages. Each link is to the relevant gene family or group the gene has been assigned to, according to either sequence similarity or information from publications, specialist advisors for that family or other databases. Families/groups may be either structural or functional; note that a gene may belong to more than one family/group.
A link to orthology search results for the gene in question from the HGNC Comparison of Orthology Predictions tool (HCOP).
All other Symbol Report Data fields
This section only appears on Symbol Reports if the gene in question is listed in an external database which is specific to certain classes of genes. A full list is provided here:
- CD - Human Cell Differentiation Antigens
- Enzyme Commission (EC) - a repository of information relative to the nomenclature of enzymes
- HomeoDB - a database of homeobox gene diversity
- HORDE - Human Olfactory Receptor Data Exploratorium
- Human Intermediate Filament Database - information on intermediate filament genes
- IMGT/GENE-DB - immunoglobulin and T cell receptor gene nomenclature database hosted at the ImMunoGeneTics information system
- IUPHAR/BPS Guide to PHARMACOLOGY - An expert-driven guide to pharmacological targets and the substances that act on them.
- KIAA/HUGE - a database of Human Unidentified Gene-Encoded Large Proteins analysed by the Kazusa Human cDNA project
- KZNF Gene Catalog - a database of hand-annotated models of all Krüppel-type (C2H2) zinc finger genes and pseudogenes in the human genome
- Mamit-tRNAdb - a compilation of mammalian mitochondrial tRNA genes
- MEROPS - an information resource for peptidases
- miRBase - a searchable database of published miRNA sequences and annotation
- Pseudogene.org - a database of identified pseudogenes
- snoRNABase - a comprehensive database of human snoRNAs
- BioParadigms SLC tables - up-to-date information on the SLC families and their members
This section contains information on homologs of the gene in other species within a table with species symbol and database ID as the columns. The table contains the following:
A list of links to nucleotide sequences associated with the gene. Links are to the following nucleotide resources:
- GenBank/ENA/DDBJ - sequence accession records. Links are to sequence accessions curated by members of the HGNC. Note that only representative accessions are curated; a full list is not provided for each gene. Each accession can be viewed at any of the three sequence databases via the links provided.
- RefSeq - the Reference Sequence (RefSeq) identifier for that entry, provided by the NCBI. As we do not aim to curate all variants of a gene, only one selected RefSeq is displayed per gene report. RefSeq aims to provide a comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products. RefSeq identifiers are designed to provide a stable reference for gene identification and characterization, mutation analysis, expression studies, polymorphism discovery, and comparative analyses.
- CCDS - the consensus CDS (CCDS) sequence for the gene. The CCDS project is a collaborative effort to identify a core set of human and mouse protein-coding regions that are consistently annotated and of high quality. The long-term goal is to support convergence towards a standard set of gene annotations.
- Vega - gene-level sequence annotation at the Vertebrate Genome Annotation (VEGA) database.
Provides links to external pages dedicated to information on the gene and to genome browsers. Links are to the following pages:
- The NCBI gene page provides curated sequence and descriptive information about genetic loci including official nomenclature, synonyms, sequence accessions, phenotypes, EC numbers, MIM numbers, UniGene clusters, homology, map locations, and related web sites. There is also a link to the gene annotation at the NCBI Sequence Viewer, the graphical display for the NCBI Nucleotide and Protein databases.
- The Ensembl Gene View displays data associated at the gene level such as orthologs, paralogs, regulatory regions and splice variants. There is a link to the gene annotation at the Ensembl Genome Browser.
- The UCSC gene page provides information on the gene model and links to related tools and databases. There is also a link to the annotation at the UCSC Genome Browser.
- The Vega Gene View presents the manually annotated gene model, curated by the Havana project or their collaborators. There is a corresponding link to the Vega Genome Browser.
Information on proteins encoded by the gene in question. There are two possible links per Symbol Report:
- The UniProt page for the protein product encoded by the gene. The UniProt Protein Knowledgebase is described as a curated protein sequence database that provides a high level of annotation, a minimal level of redundancy and a high level of integration with other databases.
- The InterPro UniProtKB match page for the encoded protein. InterPro is described as an integrated database of predictive protein "signatures" used for the classification and automatic annotation of proteins and genomes. The match page shows all predicted protein signatures for the encoded protein.
Provides links to phenotypes, diseases and mutations associated with the gene. If the gene has no links to related clinical resources, this section is not shown. Possible links are as follows:
- OMIM - links to the Online Mendelian Inheritance in Man page for the gene. OMIM is described as a catalog of human genes and genetic phenotypes containing textual information, and links to MEDLINE, sequence records in the Entrez system, and additional related resources.
- LSDB - links to Locus Specific Mutation Databases. These databases list both published and unpublished mutations reported on a gene-by-gene basis. The full name of the database is provided for each link.
- GeneTests - links to the GeneTests database which provides information on genetic testing and related information on diagnosis, disease management and genetic counselling.
- Orphanet - the portal for rare diseases and orphan drugs. Links go to a gene-based page with all associated rare diseases listed.
- Decipher - a database of submicroscopic chromosomal imbalance. Links are to gene-level pages with associated mutations and syndromes listed.
- COSMIC - the catalogue of somatic mutations in cancer. Gene-level COSMIC reports list all curated references, clinical studies and samples with mutations in cancer.
- LRG - a link to a Locus Reference Genomic (LRG) record that contains stable genomic and transcript reference sequences for reporting variants with clinical implications. LRGs are manually curated, have permanent identifiers and core content that never changes.
- Genetic Testing Registry - the Genetic Testing Registry (GTR) provides a central location for voluntary submission of genetic test information by providers. The scope includes the test's purpose, methodology, validity, evidence of the test's usefulness, and laboratory contacts and credentials.
Displays the title, (first) author, journal information and links to PubMed and Europe PubMed Central. The abstract and full list of authors can also be viewed by clicking on the '+' icon next to the links. This section aims to reference a limited number of key papers that describe the gene and/or its products, or are particularly relevant to its nomenclature and/or function; it does not aim to be an exhaustive bibliography.
Links to other external resources that provide useful information on the gene. The links are only present for databases that have entries specific to the gene in question. Here is a list of these resources:
- BioGPS - A customizable and extensible portal for aggregating gene and protein information.
- GENATLAS - links to the GENATLAS: GENE database pages. These pages contain information relevant to gene mapping, gene products and genetic diseases.
- GeneCards - provides concise genomic related information on human genes.
- GOPubMed - links to a search of the GOPubMed literature search engine with the approved symbol for the gene. GOPubMed provides several different results display options, including listing references by year, MeSH term, GO term or researcher.
- H-InvDB - links to the Locus view page of the H-Invitational Database. The H-InvDB contains gene annotation models and information on gene expression, gene products and related diseases.
- QuickGO - links to a list of all Gene Ontology (GO) terms annotated for the gene product. GO terms are controlled vocabulary terms that describe the characteristics of the gene product. GO terms are mapped via QuickGO to the UniProt protein accession.
- Reactome - links to a protein-level page that lists all signalling pathways associated with the gene, as curated by Reactome project members. Pathways are mapped to the UniProt protein accession.
- WikiGenes - gene-based pages that combine information curated by scientists in a "wiki" format with sentences mined automatically from the literature.