New VGNC gene nomenclature for cow, dog and horse!
We are excited to announce that, in addition to chimpanzee gene nomenclature, the VGNC website is now populated with gene symbols for cow, dog and horse. The VGNC initially uses a software pipeline based on the HGNC Comparison of Orthology Predictions (HCOP) tool to transfer human gene symbols and names automatically to genes in other species, but only where all four orthology resources (Ensembl, NCBI Gene, OMA and PANTHER) predict the same ortholog. As you might expect, chimp is still the winner in terms of number of gene symbols (14846), as this includes manually curated non-consensus orthologs that we have approved over the past year, but the other three species are not too far behind: cow = 11965 symbols; dog = 11474 symbols; horse = 10552 symbols.
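The consensus rule described above can be sketched in a few lines of Python. This is only an illustration of the idea (transfer a symbol when all four resources predict the same ortholog), not the actual VGNC pipeline; the function names, the input layout and the gene identifiers are all hypothetical.

```python
from typing import Dict, Optional

# The four orthology resources named in the text.
RESOURCES = ("Ensembl", "NCBI Gene", "OMA", "PANTHER")


def consensus_ortholog(predictions: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the predicted ortholog only if all four resources agree on it."""
    calls = [predictions.get(resource) for resource in RESOURCES]
    if None in calls:
        return None  # at least one resource made no prediction
    return calls[0] if len(set(calls)) == 1 else None


def transfer_symbols(human_genes: Dict[str, Dict[str, Optional[str]]]) -> Dict[str, str]:
    """Map target-species gene ID -> human symbol, for consensus cases only."""
    transferred = {}
    for symbol, predictions in human_genes.items():
        ortholog = consensus_ortholog(predictions)
        if ortholog is not None:
            transferred[ortholog] = symbol
    return transferred


# Hypothetical input: symbols and gene identifiers are placeholders, not real data.
example = {
    "SYMBOL1": {resource: "cow_gene_A" for resource in RESOURCES},  # all four agree
    "SYMBOL2": {
        "Ensembl": "cow_gene_B",
        "NCBI Gene": "cow_gene_B",
        "OMA": "cow_gene_C",  # one resource disagrees, so no automatic transfer
        "PANTHER": "cow_gene_B",
    },
}

print(transfer_symbols(example))
```

In this sketch the second gene is simply left unnamed; in practice such non-consensus cases are the ones that go to manual curation.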
Nobody should ever be confused about which species they are looking at when using vertebrate.genenames.org, as we have handy species graphics that both accompany our search results and are present at the top right of every Symbol Report. VGNC Symbol Reports display the nomenclature, the unique VGNC ID, details on the species, links to the equivalent gene report in Ensembl and NCBI Gene, and links to the human ortholog on genenames.org. Where available there are also links to orthologs of the gene in the other three VGNC species. For an example, please see the chimp CDH7, cow CDH7, dog CDH7 and horse CDH7 Symbol Reports. Cow genes have an additional link to gene reports in the Bovine Genome Database where these are available.
The gene search facility on vertebrate.genenames.org works in the same way as the search on genenames.org. The most important difference is that the VGNC site allows filtering by species, using the facets provided on the left-hand side of the search results. A good example search is for ABCA4. You can also browse the entire VGNC dataset using the gene data tab, which provides links to Symbol Reports in the same format as the search results, with the same species filters. We also provide statistics and downloads for the full dataset of each species: select the species of interest from the dropdown box, and optionally filter by chromosome. Data can be downloaded as either a text or a JSON file. There is also one file that provides the complete VGNC dataset for all available VGNC species.
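For anyone who prefers to work with the JSON download locally, the species and chromosome filters can be reproduced with a short script. The following is a minimal sketch only: the field names ("symbol", "species", "chromosome") and the sample records are assumptions made for illustration and may not match the layout of the real download files.

```python
import json
from typing import Dict, List, Optional

# Illustrative stand-in for a downloaded file; field names and values are assumed.
SAMPLE_JSON = """
[
  {"symbol": "GENE1", "species": "cow", "chromosome": "1"},
  {"symbol": "GENE2", "species": "cow", "chromosome": "2"},
  {"symbol": "GENE1", "species": "dog", "chromosome": "1"}
]
"""


def filter_records(
    records: List[Dict[str, str]],
    species: Optional[str] = None,
    chromosome: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Keep records matching the given species and/or chromosome."""
    selected = []
    for record in records:
        if species is not None and record.get("species") != species:
            continue
        if chromosome is not None and record.get("chromosome") != chromosome:
            continue
        selected.append(record)
    return selected


records = json.loads(SAMPLE_JSON)
cow_chr1 = filter_records(records, species="cow", chromosome="1")
print([record["symbol"] for record in cow_chr1])
```

For a real file, `json.loads(SAMPLE_JSON)` would be replaced by `json.load()` on the opened download, with the field names adjusted to whatever the file actually contains.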
The next step for cow, dog, and horse nomenclature will be for curators to go through data manually where the HCOP orthology predictions were not entirely consistent, and for cases where there is not a one-to-one ortholog between human and the other species, as we have already been doing for chimp. This will take us some time! We also plan to add nomenclature for further vertebrate species using the automated VGNC software pipeline. Our criteria for choosing further species are the quality of the genome assembly and annotation, the perceived value as a research organism, and the level of support from the scientific community. Please contact us at firstname.lastname@example.org with feedback on the gene nomenclature or the functionality of the VGNC website.
Renaming of placeholder C#orf symbols
We still have over 350 human protein-coding genes named with a C#orf$ symbol, where # represents the chromosome on which the gene is located and $ is the next number in a numerical series. These symbols are assigned to genes where there is no identified function, family member, clear named ortholog, or predicted structural information for the gene product at the time of naming. Over time such information may become available which allows us to perform a rename. This renaming of placeholder C#orf symbols is important for two of our current major project aims: transfer of data across vertebrate species, as described above, and stabilisation of symbols to support work in the clinical community. Therefore C#orf renaming is one of our core priorities.
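The C#orf$ pattern is regular enough to recognise with a simple regular expression. The snippet below is a hedged sketch: the pattern is our reading of the description above (a chromosome number, or X or Y, followed by a serial number), and the helper name is hypothetical.

```python
import re
from typing import Optional, Tuple

# "C", a chromosome (one or two digits, X or Y), "orf", then a serial number.
# Assumption: the pattern does not check that the number is a valid chromosome.
PLACEHOLDER = re.compile(r"C(\d{1,2}|X|Y)orf(\d+)")


def parse_placeholder(symbol: str) -> Optional[Tuple[str, int]]:
    """Return (chromosome, serial number) for a C#orf$ symbol, else None."""
    match = PLACEHOLDER.fullmatch(symbol)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


for symbol in ("C4orf26", "CXorf23", "ODAPH"):
    print(symbol, parse_placeholder(symbol))
```

A check like this could be used, for example, to count how many placeholder symbols remain in a list of approved symbols.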
We have changed over 10% of C#orf symbols since May, a total of 37. One example is the gene previously known as C4orf26, which was highlighted to us as clinically relevant on separate occasions by staff at the TGMI and Genomics England projects because of its association with Amelogenesis imperfecta, type IIA4. Following consultation with researchers who have published work on the gene, we have been able to rename it ODAPH for ‘odontogenesis associated phosphoprotein’. Other examples of renames based on publications include C14orf159 to DGLUCY, ‘D-glutamate cyclase’; C19orf43 to TRIR, ‘telomerase RNA component interacting RNase’; C14orf80 to TEDC1, ‘tubulin epsilon and delta complex 1’; and C16orf59 to TEDC2, ‘tubulin epsilon and delta complex 2’. Examples of genes renamed based on newly identified family membership are CXorf23, which is now BCLAF3 for ‘BCLAF1 and THRAP3 family member 3’, and C17orf74, which was renamed SPEM2 for ‘SPEM family member 2’. If you have, or know of, any data that would help us to rename any C#orf symbols, please email us at email@example.com.
New Gene Family Resources
Our gene family set continues to grow month upon month. We have recently added quite a few examples of genes that encode subunits of complexes, including the Epidermal differentiation complex that is split into the Cornified envelope precursor family, the Small proline rich proteins, the Late cornified envelope proteins, the S100 calcium binding proteins and the S100 fused type protein family. Synaptophysins and Synaptogyrins are new additions to the large Tetraspan junctional complex superfamily and we have added individual gene family pages for the MSL histone acetyltransferase complex, NSL histone acetyltransferase complex, SMN complex and Pyruvate dehydrogenase complex.
Gene Symbols in the News
The most heavily featured gene in the news recently is undoubtedly MYBPC3 following correction of human embryos that were heterozygous for a pathogenic mutation of this gene using the CRISPR/Cas9 editing technique. In other news, the activity of the HDAC2 enzyme has been linked to silencing of genes involved in short term memory, giving hope that this enzyme could provide a therapeutic strategy for conditions such as Alzheimer disease. MUC7 also made an appearance due to a study which suggested an unusual MUC7 haplotype only found in sub-Saharan African populations may have originated from ancient interbreeding with an archaic hominin.
Paul and Beth attended the 36th International Society for Animal Genetics Conference (ISAG) in Dublin, Ireland in July. Paul gave a talk and both presented a poster on the work of the VGNC.
Beth will be presenting a poster about the VGNC at the Animal Genetics and Diseases meeting (Hinxton, UK), 20-22 September 2017.
Elspeth will be joining the discussions with our collaborators at NC-IUPHAR at their Paris meeting on 13-15 October.
Paul will be promoting the use of HGNC IDs in a poster at ASHG 2017 in Orlando, Florida this October.