Welcome to Tamsin

We would like to introduce a new member of our team: Tamsin Jones, who joined us in January to work as a curator on the Vertebrate Gene Nomenclature Committee (VGNC) project. Tamsin was previously a FlyBase curator at the University of Cambridge, where she worked on ontologies and curation of phenotype data from the Drosophila literature. Tamsin has a BSc(Hons) in Genetics from the University of Otago in New Zealand, and an MA in Organismic and Evolutionary Biology from Harvard University in the USA.

An update on our VGNC project

Largely thanks to the work of our new dedicated VGNC curator Tamsin, and our dedicated VGNC programmer Beth, alongside input from the other HGNC curators, our vertebrate gene naming is rapidly expanding. First, a little recap on how the project works: we select vertebrate species for gene naming based on the quality of their genomes and their relevance to the biomedical community. Initially, we use data from our HCOP tool to identify orthologs that are consistently predicted between human and the vertebrate species in question by four key resources: Panther, NCBI Gene, Ensembl Compara and OMA. For these ‘4 out of 4’ orthologs we automatically transfer the human gene nomenclature onto the other species, provided they have passed a set of additional criteria. The predicted orthologs that fail to meet these criteria are then looked at by a curator, beginning with the ‘3 out of 4’ dataset where three of the four key orthology resources agree.
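The triage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our production pipeline: the input layout (resource, human symbol, ortholog gene ID), the function names, the gene IDs and the `passes_additional_criteria` callback are all hypothetical, and the additional criteria themselves are not modelled. The sketch also assumes that ‘4 out of 4’ predictions failing those criteria join the ‘3 out of 4’ set for curator review.

```python
from collections import defaultdict

# The four orthology resources named in the text.
RESOURCES = frozenset({"Panther", "NCBI Gene", "Ensembl Compara", "OMA"})


def count_support(predictions):
    """Map each (human_symbol, ortholog_id) pair to the set of resources predicting it."""
    support = defaultdict(set)
    for resource, human_symbol, ortholog_id in predictions:
        if resource in RESOURCES:
            support[(human_symbol, ortholog_id)].add(resource)
    return support


def triage(predictions, passes_additional_criteria):
    """Split predicted ortholog pairs into automatic transfer and curator review."""
    auto_transfer, curator_review = [], []
    for pair, resources in sorted(count_support(predictions).items()):
        if len(resources) == 4 and passes_additional_criteria(pair):
            # All four resources agree and the extra checks pass.
            auto_transfer.append(pair)
        elif len(resources) >= 3:
            # Assumption: failed 4/4 pairs and 3/4 pairs both go to a curator.
            curator_review.append(pair)
    return auto_transfer, curator_review


# Hypothetical input: (resource, human symbol, ortholog gene ID in the other species).
example = [(r, "MTOR", "dog_gene_1") for r in RESOURCES]
example += [(r, "BRAF", "dog_gene_2") for r in ("Panther", "NCBI Gene", "OMA")]
example += [(r, "EGFR", "dog_gene_3") for r in ("Panther", "OMA")]

# A placeholder callback that accepts every pair stands in for the real criteria.
auto, review = triage(example, lambda pair: True)
print("automatic transfer:", auto)
print("curator review:", review)
```

In this toy input the MTOR pair is supported by all four resources and is transferred automatically, the BRAF pair is supported by three and is queued for a curator, and the EGFR pair is supported by only two and is left out of both lists.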

We currently have a total of 54,783 approved symbols across all four of the current VGNC species. These figures break down per species as: chimpanzee - 15,537; dog - 13,622; cow - 13,342; horse - 12,282. Many well-studied genes now have approved symbols for their orthologs in all four species, for example MTOR - chimp, dog, cow, horse; BRAF - chimp, dog, cow, horse; EGFR - chimp, dog, cow, horse.

Large gene families, especially those where genes are found in clusters, need careful manual curation from the outset. Susan recently manually curated the keratin gene family ahead of presenting a poster at the PAG conference in January, which is mentioned in our Meeting News section below. She found that the two keratin gene clusters are broadly conserved across vertebrates and was able to name most of the keratin genes present on the chimp and cow genome assemblies. The keratin gene family includes unitary pseudogenes, where a one-to-one ortholog can be identified between a pseudogene in one species and a protein coding gene in another vertebrate species; an example is the KRT89P pseudogene in human which is the ortholog of the functional cow KRT89 gene.

If you have an interest in a particular gene family and would like to help us name the gene family members across vertebrates please email vgnc@genenames.org.

Progress on replacing placeholder symbols

Renaming genes with placeholder symbols continues to be one of our current priorities. This ties in with the VGNC project mentioned above because human ‘C#orf’ gene symbols do not transfer well across species. For example, human C17orf64 (chromosome 17 open reading frame 64) is currently approved as C17H17orf64 (chromosome 17 C17orf64 homolog) in chimp, C9H17orf64 (chromosome 9 C17orf64 homolog) in dog, C19H17orf64 (chromosome 19 C17orf64 homolog) in cow and C11H17orf64 (chromosome 11 C17orf64 homolog) in horse. Replacing placeholder symbols with symbols based on function, homology or protein structure provides meaning and allows exact symbol transferral to other species. Some examples of updated placeholder symbols from the last few months are presented below:

Symbol changed from C4orf22 to CFAP299, cilia and flagella associated protein 299, now also approved as CFAP299 in chimp, dog and cow

Symbol changed from C11orf70 to CFAP300, cilia and flagella associated protein 300, now also approved as CFAP300 in chimp and dog

Symbol changed from C7orf49 to CYREN, cell cycle regulator of NHEJ, now also approved as CYREN in chimp, horse and cow

Symbol changed from C2orf71 to PCARE, photoreceptor cilium actin regulator, now also approved as PCARE in chimp, dog, cow and horse

In addition, the following are examples of our placeholder FAM# symbols, which we use to designate “family with sequence similarity #” of unknown function, that have recently been renamed thanks to publications describing functional data for the gene products:

Symbol changed from FAM212A to INKA1, inka box actin regulator 1, now also approved as INKA1 in chimp, dog, cow and horse

Symbol changed from FAM212B to INKA2, inka box actin regulator 2, now also approved as INKA2 in chimp, cow and horse

Symbol changed from FAM109A to PHETA1, PH domain containing endocytic trafficking adaptor 1, now also approved as PHETA1 in chimp, cow and dog

Symbol changed from FAM109B to PHETA2, PH domain containing endocytic trafficking adaptor 2, now also approved as PHETA2 in chimp, cow and dog

In cases where an approved symbol is missing for a particular species (e.g. there is currently no CFAP299 symbol for horse), this is usually due to either a lack of consensus between orthology resource predictions or problems with mapping between NCBI Gene and Ensembl gene IDs for that species, which may be caused by differences in the automated gene model annotations.
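The cross-species placeholder format shown earlier in this section (for example C9H17orf64 in dog for human C17orf64) follows a regular pattern, which the short Python sketch below reproduces. The function names and the regular expression are hypothetical and written only for illustration; accepting X and Y as well as numbered chromosomes is an assumption, and this is not the code we use to assign symbols.

```python
import re

# Human placeholder symbols of the form C<chromosome>orf<number>, e.g. C17orf64.
# Assumption: X and Y are accepted alongside numbered chromosomes.
_CORF = re.compile(r"^C(\d+|X|Y)orf(\d+)$")


def homolog_symbol(human_symbol, chromosome):
    """Build the placeholder symbol for an ortholog located on `chromosome`."""
    match = _CORF.match(human_symbol)
    if match is None:
        raise ValueError(f"not a C#orf placeholder symbol: {human_symbol}")
    human_chromosome, number = match.groups()
    return f"C{chromosome}H{human_chromosome}orf{number}"


def homolog_name(human_symbol, chromosome):
    """Build the matching gene name, e.g. 'chromosome 9 C17orf64 homolog'."""
    return f"chromosome {chromosome} {human_symbol} homolog"


# Chromosome locations taken from the C17orf64 example in the text.
for species, chromosome in [("chimp", 17), ("dog", 9), ("cow", 19), ("horse", 11)]:
    print(species, homolog_symbol("C17orf64", chromosome),
          "-", homolog_name("C17orf64", chromosome))
```

Because the ortholog's chromosome is embedded in the symbol, each species ends up with a different symbol for the same gene, which is why a function-based symbol such as CFAP299 transfers more cleanly.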

New Gene Family pages

We are continually adding to our gene family resource. Gene families that were recently curated include:

LIMK/TESK kinase family

Ciliogenesis and planar polarity effector complex (CPLANE)

Myogenic regulatory family (MYF)

Myozenins (MYOZ)

GBAF complex, PBAF complex and BAF complex

Enolases (ENO)

Matrilins (MATN)

Junctophilins (JPH)

Interferon induced transmembrane proteins (IFITM)

PI4KA lipid kinase complex

GDNF family ligands

Neurotrophins (NTF)

Neuroligins (NLGN)

Neurexins (NRXN)

tRNA methyltransferases (TRMT)

DAZ RNA binding protein family (DAZ)

Spotlight on a new gene family:

Recent work to reassign the placeholder symbols FAM46A, FAM46B, FAM46C and FAM46D resulted in an in-depth consultation with the research community to attempt to unify the nomenclature for all of the non-canonical RNA polymerases. The new root symbol ‘TENT’ for ‘terminal nucleotidyltransferase’ was agreed upon for the FAM46 family and for the genes that were previously approved with the slightly less informative PAPD (“PAP associated domain containing”) root symbol. The ‘TUT’ (‘terminal uridylyl transferase’) genes were also standardised: TUT1 was retained, and ZCCHC11 and ZCCHC6 were renamed TUT4 and TUT7, symbols which provide more information on gene function and were already in use in the literature. The ‘TUT’ genes were all assigned ‘TENT’ symbol aliases, as ‘TUT’ is a more specific subclass of ‘TENT’. The gene symbol ‘MTPAP’ is widely used by the community, so this symbol was retained and the gene was given the alias ‘TENT6’. You can view the entire TENT family at Terminal nucleotidyltransferases.

Gene Symbols in the News

To begin this edition of ‘Gene Symbols in the News’, we have two news stories that highlight the personal impact of genomic medicine, both of which also feature a striking coincidence. In the first story, a scientist working on the FOX gene family discovered that her disabled daughter has a random mutation in one of her FOXG1 alleles. Dr Lee describes life caring for her daughter and how she is now studying Foxg1 in the brains of mice to elucidate how the mutated FOXG1 protein causes damage to the developing brain, even though there is also expression of a non-mutated allele. The second story describes how the work of the ‘Deciphering Developmental Disorders’ project provided a diagnosis for two families with daughters who have learning difficulties. These girls both carry a mutation in the CDK13 gene, and although just 11 children in the UK have been identified with this particular mutation, the two girls featured in this article live just 20 minutes from one another. BBC journalists were present when the girls and their families met for the first time.

In other news, a study has suggested that a TRPM8 gene variant linked to incidence of migraine may have helped humans adapt to living in a cold climate; people descended from ancestors living in colder countries like Finland have a much higher incidence of this variant than those descended from ancestors from warmer climes.

An FGF21 gene variant has recently been linked to an increase in sugar and alcohol intake, a slight increase in blood pressure, a larger hip-to-waist ratio, but surprisingly overall lower levels of body fat.

The SFRP1 gene has been reported as a promising target for treating hair loss. The immunosuppressant cyclosporine A had previously been shown to trigger hair growth, at least in part via inhibition of SFRP1, but it has many side effects. A recent study has identified an alternative drug, WAY-316606, that is potentially better at inhibiting SFRP1 activity and causes fewer side effects.

Finally, analysing gene activity in recently perished corpses has been shown to be valuable in helping to accurately determine time of death - HBA1 is one such gene whose expression levels were found to be increased in 10 different tissues post-mortem.

Meeting News

In mid-January Beth and Susan attended the International Plant & Animal Genome (PAG) XXVI conference in San Diego, USA, where they took part in the EBI workshop, gave a talk in an Equine workshop about progress in the VGNC project, and presented a poster on our VGNC work entitled ‘Standardizing gene names in key vertebrate species’.

At the end of the same month, Ruth attended RNA UK 2018 by the side of Lake Windermere, UK, where she presented a poster providing an overview of RNA gene nomenclature.

In March Kris travelled to Reykjavik in Iceland to attend JSConf Iceland, a meeting for the JavaScript community.