
Statistics & downloads help

Details about the download files on the Statistics & downloads page and the fields they contain.

 


 

Statistics & files

The Statistics & downloads page contains tables that break down the number of approved symbol reports in the database by locus group and locus type. The tables also contain the icons listed below, which let users download the data in text (tsv) or JSON format, or open our custom download application for the chosen dataset. Genes annotated on alternative loci included in the GRC Reference Assembly are shown separately in the second table.

The icons are as follows:

  • Tab-delimited text (tsv) file. Multi-valued fields are double quoted, with the individual values separated by | inside the quotes. The file should open easily in a spreadsheet application such as Excel.
  • JSON text file (no indentation or white space), intended for loading into a JSON parser within a script or program.
  • Link to the Custom downloads page for the locus type or group, where users can specify exactly which data they wish to download.
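A minimal Python sketch of reading the tsv layout described above with the standard `csv` module. The sample rows, the `read_hgnc_tsv` helper, and the choice of `alias_symbol` as the multi-valued column are illustrative assumptions; check the header of the actual download for the columns you need.

```python
import csv
import io

# Hypothetical sample in the layout described above: tab-separated columns,
# with a multi-valued field double quoted and its values separated by "|".
SAMPLE = (
    "hgnc_id\tsymbol\talias_symbol\n"
    'HGNC:0001\tGENE1\t"ALIAS1|ALIAS2"\n'
    "HGNC:0002\tGENE2\t\n"
)


def read_hgnc_tsv(handle, multi_valued=("alias_symbol",)):
    """Yield one dict per data row, turning multi-valued columns into lists."""
    # csv removes the double quotes; the "|" inside them is split afterwards.
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        for field in multi_valued:
            value = row.get(field) or ""
            row[field] = value.split("|") if value else []
        yield row


rows = list(read_hgnc_tsv(io.StringIO(SAMPLE)))
print(rows[0]["alias_symbol"])  # ['ALIAS1', 'ALIAS2']
print(rows[1]["alias_symbol"])  # []
```

For a downloaded file, pass a handle opened with `open(path, newline="", encoding="utf-8")` (UTF-8 is an assumption) and list the multi-valued columns you care about in `multi_valued`.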

Above the tables there is a karyotype image; clicking on a specific chromosome will change the table statistics to show the data for the selected chromosome.

Beneath the tables we also have text (tsv) and JSON files for our complete HGNC dataset, our gene families dataset and our locus specific database links set.

Fields within the tsv and JSON files

hgnc_id

HGNC ID. A unique ID created by the HGNC for every approved symbol.

symbol

The HGNC approved gene symbol. Equates to the "APPROVED SYMBOL" field within the gene symbol report.

name

HGNC approved name for the gene. Equates to the "APPROVED NAME" field within the gene symbol report.

locus_group

A group name for a set of related locus types as defined by the HGNC (e.g. non-coding RNA).

locus_type

The locus type as defined by the HGNC (e.g. RNA, transfer).

status

Status of the symbol report, which can be either "Approved" or "Entry Withdrawn".

location

Cytogenetic location of the gene (e.g. 2q34).

location_sortable

Same as "location", but single-digit chromosome numbers are prefixed with a 0 so that they sort in correct numerical order (e.g. 02q34).
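The padding rule can be sketched in Python as follows. This is an approximation written from the description above, not the HGNC's own code: `sortable_location` is a hypothetical helper that only pads a leading one-digit chromosome number and leaves every other value (for example Xq28) unchanged.

```python
import re


def sortable_location(location):
    """Prefix a single-digit leading chromosome number with 0 (e.g. 2q34 -> 02q34)."""
    # Assumption: the chromosome number is the leading run of digits;
    # values starting with X, Y or other text are returned unchanged.
    match = re.match(r"\d+", location)
    if match and len(match.group()) == 1:
        return "0" + location
    return location


locations = ["12p13", "2q34", "Xq28", "1p36.33"]
print(sorted(locations, key=sortable_location))  # ['1p36.33', '2q34', '12p13', 'Xq28']
```

A plain string sort of the same list would place 12p13 before 1p36.33, which is the problem the zero padding avoids.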

alias_symbol

Other symbols used to refer to this gene, as seen in the "SYNONYMS" field in the gene symbol report.

alias_name

Other names used to refer to this gene, as seen in the "SYNONYMS" field in the gene symbol report.

prev_symbol

Symbols previously approved by the HGNC for this gene. Equates to the "PREVIOUS SYMBOLS & NAMES" field within the gene symbol report.

prev_name

Gene names previously approved by the HGNC for this gene. Equates to the "PREVIOUS SYMBOLS & NAMES" field within the gene symbol report.

gene_family

Name given to a gene family or group the gene has been assigned to. Equates to the "GENE FAMILY" field within the gene symbol report.

gene_family_id

ID used to designate a gene family or group the gene has been assigned to.

date_approved_reserved

The date the entry was first approved.

date_symbol_changed

The date the gene symbol was last changed.

date_name_changed

The date the gene name was last changed.

date_modified

Date the entry was last modified.

entrez_id

Entrez gene ID. Found within the "GENE RESOURCES" section of the gene symbol report.

ensembl_gene_id

Ensembl gene ID. Found within the "GENE RESOURCES" section of the gene symbol report.

vega_id

Vega gene ID. Found within the "GENE RESOURCES" section of the gene symbol report.

ucsc_id

UCSC gene ID. Found within the "GENE RESOURCES" section of the gene symbol report.

ena

International Nucleotide Sequence Database Collaboration (GenBank, ENA and DDBJ) accession number(s). Found within the "NUCLEOTIDE SEQUENCES" section of the gene symbol report.

refseq_accession

RefSeq nucleotide accession(s). Found within the "NUCLEOTIDE SEQUENCES" section of the gene symbol report.

ccds_id

Consensus CDS ID. Found within the "NUCLEOTIDE SEQUENCES" section of the gene symbol report.

uniprot_ids

UniProt protein accession(s). Found within the "PROTEIN RESOURCES" section of the gene symbol report.

pubmed_id

PubMed and Europe PubMed Central PMID(s).

mgd_id

Mouse Genome Informatics (MGI) database ID. Found within the "HOMOLOGS" section of the gene symbol report.

rgd_id

Rat Genome Database (RGD) gene ID. Found within the "HOMOLOGS" section of the gene symbol report.

lsdb

The name of the Locus Specific Mutation Database and URL for the gene separated by a | character, e.g. Mutations of the ATP-binding Cassette Transporter Retina|http://www.retina-international.org/files/sci-news/abcrmut.htm
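Splitting an lsdb value into its name and URL can be sketched in Python as below. The `split_lsdb` helper is hypothetical, and it assumes a value holds a single name|URL pair; splitting on the last | keeps the URL intact as the final part.

```python
def split_lsdb(value):
    """Split an lsdb value of the form 'name|URL' into (name, url)."""
    # rpartition splits on the last "|", so everything after it is the URL.
    name, _, url = value.rpartition("|")
    return name, url


example = (
    "Mutations of the ATP-binding Cassette Transporter Retina"
    "|http://www.retina-international.org/files/sci-news/abcrmut.htm"
)
name, url = split_lsdb(example)
print(name)  # Mutations of the ATP-binding Cassette Transporter Retina
print(url)   # http://www.retina-international.org/files/sci-news/abcrmut.htm
```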

cosmic

Symbol used for the gene within COSMIC, the Catalogue Of Somatic Mutations In Cancer.

bioparadigms_slc

Symbol used to link to the SLC tables database at bioparadigms.org for the gene.

http://slc.bioparadigms.org/protein?GeneName=<SYMBOL>

horde_id

Symbol used within HORDE for the gene.

http://genome.weizmann.ac.il/horde/card/index/symbol:<SYMBOL>

merops

ID used to link to the MEROPS peptidase database.

https://merops.sanger.ac.uk/cgi-bin/pepsum?mid=<ID>

imgt

Symbol used within IMGT, the international ImMunoGeneTics information system.

http://www.imgt.org/IMGT_GENE-DB/GENElect?query=2+<SYMBOL>&species=Homo+sapiens

iuphar

The objectId used to link to the IUPHAR/BPS Guide to PHARMACOLOGY database. Values have the form objectId:1; to build a link, use only the number (here 1) in place of <ID> in the example URL.

http://www.guidetopharmacology.org/GRAC/ObjectDisplayForward?objectId=<ID>
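Building the Guide to PHARMACOLOGY link from an iuphar value can be sketched in Python as follows. The `iuphar_link` helper is hypothetical; it keeps only the number after objectId: and substitutes it into the URL template given above.

```python
IUPHAR_URL = "http://www.guidetopharmacology.org/GRAC/ObjectDisplayForward?objectId={}"


def iuphar_link(value):
    """Turn an iuphar value such as 'objectId:1' into a database URL."""
    prefix, sep, number = value.partition(":")
    # Reject anything that is not exactly "objectId:<digits>".
    if not sep or prefix != "objectId" or not number.isdigit():
        raise ValueError(f"unexpected iuphar value: {value!r}")
    return IUPHAR_URL.format(number)


print(iuphar_link("objectId:1"))
# http://www.guidetopharmacology.org/GRAC/ObjectDisplayForward?objectId=1
```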

kznf_gene_catalog

ID used to link to the Human KZNF Gene Catalog.

http://znf.igb.uiuc.edu/human/action/exploreView?type=locus&id=<ID>

mamit-trnadb

ID used to link to the Mamit-tRNA database.

http://mamit-trna.u-strasbg.fr/mutations.asp?idAA=<ID>

enzyme_id

ENZYME EC accession number.

http://enzyme.expasy.org/EC/<EC ACCESSION NUMBER>

intermediate_filament_db

ID used to link to the Human Intermediate Filament Database.

http://www.interfil.org/details.php?id=<ID>
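Several of the fields above are meant to be substituted into a URL template. A minimal Python sketch of that substitution follows; the `LINK_TEMPLATES` dictionary and `build_link` helper are hypothetical, the templates are copied from this page, and percent-encoding the value is an added precaution rather than something the page specifies. The iuphar field is left out because its value needs the objectId: prefix removed first.

```python
from urllib.parse import quote

# Templates copied from the field descriptions on this page;
# "{}" marks where the symbol or ID is inserted.
LINK_TEMPLATES = {
    "bioparadigms_slc": "http://slc.bioparadigms.org/protein?GeneName={}",
    "horde_id": "http://genome.weizmann.ac.il/horde/card/index/symbol:{}",
    "merops": "https://merops.sanger.ac.uk/cgi-bin/pepsum?mid={}",
    "imgt": "http://www.imgt.org/IMGT_GENE-DB/GENElect?query=2+{}&species=Homo+sapiens",
    "kznf_gene_catalog": "http://znf.igb.uiuc.edu/human/action/exploreView?type=locus&id={}",
    "mamit-trnadb": "http://mamit-trna.u-strasbg.fr/mutations.asp?idAA={}",
    "enzyme_id": "http://enzyme.expasy.org/EC/{}",
    "intermediate_filament_db": "http://www.interfil.org/details.php?id={}",
}


def build_link(field, value):
    """Substitute a single field value into the URL template for that field."""
    template = LINK_TEMPLATES[field]
    # Percent-encode unsafe characters; letters, digits and "_.-~" are kept as-is.
    return template.format(quote(value, safe=""))


print(build_link("bioparadigms_slc", "SLC2A1"))
# http://slc.bioparadigms.org/protein?GeneName=SLC2A1
```

For a multi-valued field, split the value on | and call `build_link` once per value.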