See the HGNC nomenclature guidelines for more details on many of the definitions given on this page.
CD indicates the field can contain multiple values as a comma delimited list.
QCD indicates the field can contain multiple values. Each value is enclosed in double quote marks and placed in a comma delimited list.
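As a rough illustration of the difference between the two markers, the sketch below splits a CD value and a QCD value into Python lists. The function names `parse_cd` and `parse_qcd` are hypothetical, and the sample strings are invented for the example; it assumes that a QCD value may itself contain commas, which is why the quotes matter.

```python
import csv
from typing import List


def parse_cd(field: str) -> List[str]:
    """Split a CD field (plain comma delimited list) into its values."""
    return [value.strip() for value in field.split(",") if value.strip()]


def parse_qcd(field: str) -> List[str]:
    """Split a QCD field (double-quoted values in a comma delimited list)."""
    if not field.strip():
        return []
    # csv.reader respects the double quotes, so a comma inside a quoted
    # value does not start a new value. skipinitialspace lets the quote
    # that follows ", " be recognised as an opening quote.
    row = next(csv.reader([field], skipinitialspace=True))
    return [value.strip() for value in row if value.strip()]


# Invented sample values, for illustration only.
print(parse_cd("SYM1, SYM2"))
print(parse_qcd('"first name, with comma", "second name"'))
```

Splitting a QCD field on every comma would break any name that contains a comma, so the quoted form is parsed with the standard `csv` module instead.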
- HGNC ID - A unique ID provided by the HGNC. In the HTML results page this ID links to the HGNC Symbol Report for that gene.
- Approved Symbol - The official gene symbol that has been approved by the HGNC and is publicly available. Symbols are approved based on specific nomenclature guidelines. In the HTML results page this symbol links to the HGNC Symbol Report for that gene.
- Approved Name - The official gene name that has been approved by the HGNC and is publicly available. Names are approved based on specific nomenclature guidelines.
- Status - Indicates whether the gene is classified as:
- Approved - these genes have HGNC-approved gene symbols
- Approved non-human - these entries have been approved in order to maintain the orthologous gene symbol in the human gene family series. It is quite likely that most of these genes will ultimately be found in the human genome
- Entry withdrawn - these previously approved genes are no longer thought to exist
- Locus Type - Specifies the type of locus, as defined by the NCBI, described by the entry:
- gene with protein product, function known or inferred - for protein-coding genes the protein may be predicted, but there is homology to proteins of known function, not just proteins of known motifs.
- gene with protein product, function unknown - genes for which there is a protein product, which may even have a defined motif, but its function is not known.
- gene with protein product, demonstrates somatic rearrangement - To be used for such 'genes' as IGHG1, IGHG2, which define a combination of exons giving rise to a particular class of protein product. See also: for the set of exons that defines one mRNA, depending on the rearrangement.
- gene with no protein product - for RNA-coding genes that do not fall into other specific categories; to be used for such RNAs as the RNA component of enzymes, regulatory RNAs, etc.
- phenotype only - for mapped phenotypes
- pseudogene - genes are classified as pseudogenes if there is no evidence of transcription, even if there is a predicted coding sequence, if:
- a paper states they are pseudogenes
- the annotation says they are pseudogenes or
- some other expert says they are pseudogenes
- the accession aligns to multiple locations in the genome: (i) one in which there is intron/exon organization, and (ii) the others, the potential pseudogene locations, in which the alignment is almost full length but the number of exons is significantly smaller. Computationally, the current cutoff used by the NCBI is an exon ratio < 0.67
- non-human orthologue - entry describes the approved name and symbol for a gene before the human orthologue has been identified. This reserves the symbol so that it cannot be used for a non-orthologous gene, and is therefore valuable for maintaining compatibility of symbols between orthologues in different species
- RNA, micro - RNAs explicitly designated as micro RNA
- RNA, ribosomal - RNAs that are structural components of ribosomes
- RNA, small nuclear - RNAs explicitly designated as small nuclear
- RNA, small nucleolar - RNAs explicitly designated as small nucleolar
- RNA, small cytoplasmic - RNAs explicitly designated as small cytoplasmic
- RNA, transfer - RNAs explicitly designated as transfer RNAs
- duplicon - A duplicated piece of DNA, containing a gene that is approximately 97% similar to the original functional gene. Usually the duplicon gene is not functional
- region - extents of genomic sequence that contain one or more genes. For genes that undergo genomic rearrangement, the region category should be used only for more than one of these. Also applied to non-gene areas that do not fall into other types, such as regulatory elements or repetitive elements
- model, ab initio - a model that is generated only from first principles, and is not guided by EST evidence. It does not predict a protein with significant similarity to other known proteins.
- model, ab initio, with EST support - a model that is generated from first principles, and is guided by EST evidence. It does not predict a protein with significant similarity to other known proteins.
- model, ab initio, with EST support and protein similarity - a model that is generated from first principles, and is guided by EST evidence. It does predict a protein with significant similarity to other known proteins.
- model, ab initio, with protein similarity - a model that is generated from first principles, and is not guided by EST evidence. It does predict a protein with significant similarity to other known proteins.
- model, supported by EST alignments - a model that is generated from EST alignments, but not mRNA alignments.
- model, supported by mRNA alignments - a model that is generated from mRNA alignments, but splice junctions or extensions are not supported by ESTs.
- model, supported by mRNA and EST alignments - a model that is generated from mRNA alignments, and splice junctions or extensions are supported by ESTs.
- Previous Symbols CD - Symbols previously approved by the HGNC for this gene.
- Previous Names QCD - Gene names previously approved by the HGNC for this gene.
- Aliases CD - Other symbols used to refer to this gene.
- Name Aliases QCD - Other names used to refer to this gene.
- Chromosome - Indicates the location of the gene or region on the chromosome.
- Date Approved - Date the gene symbol and name were approved by the HGNC.
- Date Modified - If applicable, the date the entry was modified by the HGNC.
- Date Symbol changed - If applicable, the date the gene symbol was last changed by the HGNC from a previously approved symbol. Many genes receive approved symbols and names which are viewed as temporary (e.g. C2orf#) or are non-ideal when considered in the light of subsequent information. In the case of individual genes a change to the name (and subsequently the symbol) is only made if the original name is seriously misleading.
- Date Name changed - If applicable, the date the gene name was last changed by the HGNC from a previously approved name.
- Accession Numbers CD - Accession numbers for each entry selected by the HGNC.
- Enzyme ID CD - Enzyme entries have Enzyme Commission (EC) numbers associated with them that indicate the hierarchical functional classes to which they belong.
- Entrez Gene ID - Entrez Gene at the NCBI provides curated sequence and descriptive information about genetic loci including official nomenclature, aliases, sequence accessions, phenotypes, EC numbers, MIM numbers, UniGene clusters, homology, map locations, and related web sites. In the HTML results page this ID links to the Entrez Gene page for that gene. Entrez Gene has replaced LocusLink.
- MGD ID - Mouse Genome Database (MGD) identifier. In the HTML results page this ID links to the MGD Report for that gene.
- Specialist Database Links CD - This column contains links to specialist databases with a particular interest in that symbol/gene.
- Ensembl Gene ID - This column contains a manually curated Ensembl Gene ID.
- Specialist Database IDs CD - This column contains the IDs used by specialist databases with a particular interest in that symbol/gene; the Specialist Database Links column contains HTML links to the database in question.
It is a fixed-length comma delimited list with each position dedicated to a particular database. These are:
- miRNA miRBase the microRNA database
- HORDE ID Human Olfactory Receptor Data Exploratorium
- CD Human Cell Differentiation Antigens
- Rfam RNA families database of alignments and CMs
- snoRNABase database of human H/ACA and C/D box snoRNAs.
- LLNL Lawrence Livermore National Laboratory Human KZNF Gene Catalog
- Intermediate Filament DB Human Intermediate Filament Database
- IUPHAR Committee on Receptor Nomenclature and Drug Classification (mapped)
- IMGT/GENE-DB the international ImMunoGeneTics information system for immunoglobulins (mapped)
- MEROPS the peptidase database
Most of these IDs have undergone manual curation; however, a few are mapped from regularly updated files kindly provided by the specialist databases.
When we add new databases, these will be appended to the end of this list.
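Because each position in the Specialist Database IDs list is reserved for one database, a value can be read by position rather than by content. The following is a minimal sketch under stated assumptions: the function name `parse_specialist_ids` and the dictionary labels are hypothetical (the labels are simply shortened from the list above, in the order given there), empty positions are assumed to be empty strings, and the sample IDs are placeholders rather than real identifiers.

```python
from typing import Dict, List

# Labels in the order the databases are listed above (illustrative names).
SPECIALIST_DATABASES: List[str] = [
    "miRBase", "HORDE", "CD", "Rfam", "snoRNABase", "LLNL KZNF",
    "Intermediate Filament DB", "IUPHAR", "IMGT/GENE-DB", "MEROPS",
]


def parse_specialist_ids(field: str) -> Dict[str, str]:
    """Map each non-empty position of the fixed-length list to its database."""
    result: Dict[str, str] = {}
    # Keep empty positions when splitting: the position carries the meaning.
    for position, value in enumerate(field.split(",")):
        value = value.strip()
        if not value:
            continue
        if position < len(SPECIALIST_DATABASES):
            name = SPECIALIST_DATABASES[position]
        else:
            # Databases appended later than this label list was written.
            name = f"position_{position + 1}"
        result[name] = value
    return result


# Placeholder IDs in the third (CD) and tenth (MEROPS) positions.
sample = ",".join(["", "", "ID_A", "", "", "", "", "", "", "ID_B"])
print(parse_specialist_ids(sample))
```

Positions beyond the known labels are kept under a generic key instead of being dropped, since new databases are appended to the end of the list.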
- Pubmed IDs CD - Identifiers that link to published articles relevant to the entry in the NCBI's PubMed database.
- RefSeq IDs CD - The Reference Sequence (RefSeq) identifier for that entry, provided by the NCBI. As we do not aim to curate all variants of a gene, only one selected RefSeq is displayed per gene report. RefSeq aims to provide a comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products. RefSeq identifiers are designed to provide a stable reference for gene identification and characterization, mutation analysis, expression studies, polymorphism discovery, and comparative analyses. In the HTML results page this ID links to the RefSeq page for that entry.
- Gene Family name CD - Indicates the name of the family or families the gene has been assigned to, according to either sequence similarity or information from publications, specialist advisors for that family or other databases. Families may therefore be either structural or functional. Genes within a family are assigned hierarchical symbols with a common stem.
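The four "model, ab initio" locus types described under Locus Type differ only in two yes/no properties: whether the model is guided by EST evidence, and whether it predicts a protein with significant similarity to other known proteins. A small sketch of that relationship follows; the function name `ab_initio_locus_type` is hypothetical and the returned strings are the locus type labels as written on this page.

```python
def ab_initio_locus_type(est_support: bool, protein_similarity: bool) -> str:
    """Return the ab initio model locus type label for the two properties."""
    label = "model, ab initio"
    if est_support and protein_similarity:
        return label + ", with EST support and protein similarity"
    if est_support:
        return label + ", with EST support"
    if protein_similarity:
        return label + ", with protein similarity"
    # Neither EST evidence nor protein similarity: first principles only.
    return label


for est in (False, True):
    for similarity in (False, True):
        print(est, similarity, "->", ab_initio_locus_type(est, similarity))
```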
Mapped Data
Please note that mapped data are derived from external sources and as such are not subject to our strict checking and curation procedures. They should therefore be treated with some caution.
- GDB ID (mapped data) - GDB became a source of high quality mapping data which were made available both on-line as well as through numerous printed publications. What set GDB apart from other biological databases was its use of world-class leaders in human genetics to act as curators for the data. In order to ensure a high degree of quality, records within GDB were subjected to a process of peer-review, not unlike a traditional publication.
- Entrez Gene ID (mapped data) - Entrez Gene at the NCBI provides curated sequence and descriptive information about genetic loci including official nomenclature, aliases, sequence accessions, phenotypes, EC numbers, MIM numbers, UniGene clusters, homology, map locations, and related web sites. In the HTML results page this ID links to the Entrez Gene page for that gene. Entrez Gene has replaced LocusLink.
- OMIM ID (mapped data) - Identifier provided by Online Mendelian Inheritance in Man (OMIM) at the NCBI. This database is described as a catalog of human genes and genetic disorders containing textual information and references, copious links to MEDLINE and sequence records in the Entrez system, and links to additional related resources at NCBI and elsewhere. In the HTML results page this ID links to the OMIM page for that entry.
- RefSeq (mapped data) - The Reference Sequence (RefSeq) identifier for that entry, provided by the NCBI. As we do not aim to curate all variants of a gene, only one mapped RefSeq is displayed per gene report. RefSeq aims to provide a comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products. RefSeq identifiers are designed to provide a stable reference for gene identification and characterization, mutation analysis, expression studies, polymorphism discovery, and comparative analyses. In the HTML results page this ID links to the RefSeq page for that entry.
- UniProt ID (mapped data) - The UniProt identifier, provided by the EBI. The UniProt Protein Knowledgebase is described as a curated protein sequence database that provides a high level of annotation, a minimal level of redundancy and high level of integration with other databases. In the HTML results page this ID links to the UniProt page for that entry.
- Ensembl (mapped data) - The Ensembl ID is derived from the current build of the Ensembl database and provided by the Ensembl team.
- UCSC (mapped data) - The UCSC ID is derived from the current build of the UCSC database.
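Some identifiers, such as Entrez Gene ID and RefSeq, appear both as a curated field and as a mapped field. One possible way for a consumer of the data to respect the caution above is to prefer the curated value and record where each value came from. This is only a sketch of that idea: the function name `choose_identifier` and the source labels are hypothetical, and the sample IDs are placeholders.

```python
from typing import Optional, Tuple


def choose_identifier(curated: Optional[str],
                      mapped: Optional[str]) -> Tuple[Optional[str], str]:
    """Prefer the curated value; fall back to the mapped one and say so."""
    if curated and curated.strip():
        return curated.strip(), "curated"
    if mapped and mapped.strip():
        # Mapped data come from external sources and are not curated.
        return mapped.strip(), "mapped"
    return None, "missing"


print(choose_identifier("ID_CURATED", "ID_MAPPED"))
print(choose_identifier("", "ID_MAPPED"))
print(choose_identifier(None, None))
```

Keeping the source label next to the value lets downstream steps treat mapped identifiers with more caution than curated ones.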