
Search help

Our search options allow users to quickly search all of our active gene symbol reports, gene family reports and static web pages. The search server uses Apache Solr, which offers powerful full-text search, hit highlighting and faceted searching. Faceted searching allows you to search for a particular keyword and then filter the results by record/page type, locus group and locus type.

The search form can be found in the masthead of each page. Enter a search term in the search input box and click on the spy glass icon; the default option searches the whole site, including gene symbol reports and gene families. Users can also click on the dropdown to the left of the input box to search only specific areas of the site (i.e. symbols, gene families or static web pages).


The simplest way to search is to type a query word/ID into the input box in the masthead and click on the spy glass while the dropdown is set to "Search everything". This default option is a full-text search over all the indexes and fields. If reports containing the keyword/ID are found, they are displayed in order of relevance (for more information see the indexed fields for each search type). On the left-hand side of the results page, the filter options (where available) are shown with the number of reports associated with each facet. If the results include gene symbols, clicking on the facet "Gene" filters the results by this type and changes the faceting to display the locus groups and locus types relevant to the search results, enabling further filtering by locus type. Users can also change the number of results per page from the default of 10 up to 200.
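The faceting behaviour described above can be pictured with a small local sketch. This is only an illustration of the idea of counting and filtering by facet, not the search server's implementation; the record layout, the `type` and `locus_group` keys and every value in `hits` are made-up placeholders.

```python
from collections import Counter

# Hypothetical search hits. The keys and values are placeholders chosen for
# illustration; they are not real output from the search server.
hits = [
    {"id": "GENE_A", "type": "gene", "locus_group": "protein-coding gene"},
    {"id": "GENE_B", "type": "gene", "locus_group": "pseudogene"},
    {"id": "GENE_C", "type": "gene", "locus_group": "non-coding RNA"},
    {"id": "FAMILY_X", "type": "family", "locus_group": None},
    {"id": "PAGE_Y", "type": "site", "locus_group": None},
]


def facet_counts(results, field):
    """Count how many results fall under each value of one facet field."""
    return Counter(r[field] for r in results if r.get(field) is not None)


def apply_facet(results, field, value):
    """Keep only the results whose facet field equals the chosen value."""
    return [r for r in results if r.get(field) == value]


# Facets shown for the unfiltered results.
type_facets = facet_counts(hits, "type")

# Choosing one facet value narrows the results, and the remaining facets are
# recomputed from the narrowed set.
gene_hits = apply_facet(hits, "type", "gene")
locus_group_facets = facet_counts(gene_hits, "locus_group")
```

In this sketch, filtering on the gene type leaves three hits, and the locus group counts are then computed from those three hits only, which mirrors how the facet list changes after a filter is chosen.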

The results display specific fields from the search index, which differ depending on the document type. The first row of each result contains the gene symbol and gene name if the result is a gene symbol report, the family name if it is a gene family report, or the page title if it is any other page within the site. The second row shows the type of the indexed document (i.e. gene, family or site) when searching everything, along with some of the important fields that help identify the hit. The third row reports the field that the keyword/ID matched, so if the keyword matched an approved symbol within a gene symbol report the third row would say "Matches: Gene symbol", as seen in Figure 1 below.


Figure 1: Example of a basic "Search everything" search showing the facets, paging and summarised information.


The search application allows users to make advanced queries using the search box. In this section we describe how to specify the search type, use wildcards, logic operators and specify indexed fields.

Search types

Instead of searching everything, users can choose to search only gene symbol reports, gene families or the static content of the site (e.g. help or news pages) by selecting the corresponding option from the dropdown next to the search input box in the masthead.

Wildcards

Sometimes it may be useful to match records based on a query pattern rather than a keyword or ID. Our search allows users to use two wildcard operators: an asterisk (*) stands in for one or more characters and a question mark (?) stands in for a single character. Multiple wildcards can be used in the same query; for instance, searching for TP?T* will find the symbol TPST2P1.
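To make the pattern semantics concrete, here is a minimal local sketch that translates a wildcard query into a regular expression. It follows the wording above (an asterisk as one or more characters, a question mark as exactly one) and is not how the search server performs matching; the helper names `wildcard_to_regex` and `matches` are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard query into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".+")  # one or more characters, as described above
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts))


def matches(pattern, symbol):
    """Return True when the whole symbol fits the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(symbol) is not None


found = matches("TP?T*", "TPST2P1")   # ? covers "S", * covers "2P1"
not_found = matches("TP?T*", "TP53")  # fourth character is "3", not "T"
```

Here `found` is `True` and `not_found` is `False`, because the pattern requires a literal T in the fourth position.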

Logic operators

By default the search application uses the logical operator OR, so entering BRAF RNA into the search box equates to BRAF OR RNA: some of the results will contain BRAF, some will contain RNA and others may contain both. Sometimes, however, this isn't what a user wants. Let's say a user wants to find reports that contain both BRAF and RNA. Typing in BRAF RNA will unfortunately return over 7000 hits. Changing the search query to BRAF AND RNA reduces the number of hits and makes the results more pertinent. Alternatively, to find reports that do not contain a keyword/ID, users may use NOT or - before the term, as in BRAF NOT RNA or BRAF -RNA.
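The three operators behave like set operations on the reports that contain each term. The sketch below illustrates this with a tiny invented index; the report IDs and their word sets are placeholders, and the code is an analogy rather than the server's query logic.

```python
# Hypothetical mini-index: report ID -> words found in that report.
reports = {
    "report_1": {"BRAF", "kinase"},
    "report_2": {"RNA", "polymerase"},
    "report_3": {"BRAF", "RNA"},
}


def containing(term):
    """Return the IDs of the reports that contain the term."""
    return {rid for rid, words in reports.items() if term in words}


braf = containing("BRAF")
rna = containing("RNA")

either = braf | rna     # BRAF RNA, i.e. BRAF OR RNA (the default)
both = braf & rna       # BRAF AND RNA
braf_only = braf - rna  # BRAF NOT RNA, or BRAF -RNA
```

With this index, OR returns all three reports, AND returns only `report_3`, and NOT returns only `report_1`, which is why AND and NOT produce shorter result lists than the default.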

Phrases

As discussed above, the default operator is OR, so a search for protein arginine methyltransferase 1 is actually run as protein OR arginine OR methyltransferase OR 1. This can be addressed by quoting the query so that the search treats the quoted block as one term, like "protein arginine methyltransferase 1".
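The effect of quoting can be shown by making the implied OR explicit. The sketch below uses Python's standard `shlex.split`, which keeps double-quoted blocks together, as a stand-in for the search's own query parsing; `expand_default_or` is a hypothetical helper and does not handle explicit AND or NOT operators.

```python
import shlex


def expand_default_or(query):
    """Show how a query reads once the default OR is written out."""
    terms = shlex.split(query)  # double-quoted blocks stay as single terms
    shown = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(shown)


unquoted = expand_default_or("protein arginine methyltransferase 1")
quoted = expand_default_or('"protein arginine methyltransferase 1"')
```

The unquoted query expands to four OR-ed terms, while the quoted query remains a single phrase.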

Indexed fields

Users may search within a specific indexed field, using the field keys listed in the indexed fields section below and a very basic notation. To specify a field, type the indexed field key into the search box, followed immediately by a colon (:) and then the query, e.g. refseq_accession:NM_033360. If the query is not a single ID or keyword, a quoted phrase can be used after the colon, e.g. gene_alias_name:"A-kinase anchor protein, 350kDa".
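For anyone assembling such queries in a script before pasting them into the search box, the notation can be sketched as a small helper. `field_query` is a hypothetical name, and the sketch assumes the value contains no double quotes of its own.

```python
def field_query(field, value):
    """Build a 'field:value' query, quoting the value when it is a phrase."""
    # A value with whitespace is a phrase and must be quoted as one term.
    # Assumption: the value itself contains no double-quote characters.
    if any(ch.isspace() for ch in value):
        value = f'"{value}"'
    return f"{field}:{value}"


by_accession = field_query("refseq_accession", "NM_033360")
by_alias = field_query("gene_alias_name", "A-kinase anchor protein, 350kDa")
```

The first call yields refseq_accession:NM_033360 and the second yields gene_alias_name:"A-kinase anchor protein, 350kDa", matching the two examples in the text.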

Indexed fields

"Search everything" fields

ccds_id
Consensus CDS ID. This field may be found within the "NUCLEOTIDE SEQUENCES" section of the gene symbol report e.g. CCDS5863.1
ena
International Nucleotide Sequence Database Collaboration (GenBank, ENA and DDBJ) accession number(s). Found within the "NUCLEOTIDE SEQUENCES" section of the gene symbol report e.g. M95712
ensembl_gene_id
Ensembl gene ID. Found within the "GENE RESOURCES" section of the gene symbol report e.g. ENSG00000157764
entrez_id
Entrez gene ID. Found within the "GENE RESOURCES" section of the gene symbol report e.g. 673
family_alias
Other names used to refer to this family as seen within the gene family reports under the "Also known as" field e.g. LPAAT
family_id
The HGNC gene family ID which can be seen in the URL for a gene family page after the word "set" e.g. 46
family_name
The gene family name as set by the HGNC and seen at the top of the gene family reports e.g. 1-acylglycerol-3-phosphate O-acyltransferases
gene_alias_name
Other names used to refer to this gene as seen in the "SYNONYMS" field in the gene symbol report e.g. A-kinase anchor protein, 350kDa
gene_alias_symbol
Other symbols used to refer to this gene as seen in the "SYNONYMS" field in the gene symbol report e.g. BRAF1
gene_name
HGNC approved name for the gene. Equates to the "APPROVED NAME" field within the gene symbol report e.g. zinc finger protein 536
gene_prev_name
Gene names previously approved by the HGNC for this gene. Equates to the "PREVIOUS SYMBOLS & NAMES" field within the gene symbol report e.g. solute carrier family 5 (choline transporter), member 7
gene_prev_symbol
Gene symbols previously approved by the HGNC for this gene. Equates to the "PREVIOUS SYMBOLS & NAMES" field within the gene symbol report e.g. RN5S49
gene_symbol
The HGNC approved gene symbol. Equates to the "APPROVED SYMBOL" field within the gene symbol report e.g. KLF4
hgnc_id
HGNC ID. A unique ID created by the HGNC for every approved symbol e.g. 1097
locus_group
A group name for a set of related locus types as defined by the HGNC e.g. non-coding RNA.
locus_type
The locus type as set by the HGNC e.g. RNA, long non-coding
mgd_id
Mouse genome informatics database ID. Found within the "HOMOLOGS" section of the gene symbol report e.g. MGI:88190
page_content
Text contained within the static web page (i.e. text within news or help pages) e.g. "human TBX18 gene"
page_title
The title of the static web page (i.e. a news or help page) e.g. "TBX18 gene therapy"
page_url
The URL of the indexed static web page (i.e. a news or help page) e.g. hgnc-newsletter-winter-2012-2013
refseq_accession
RefSeq nucleotide accession. This field may be found within the "NUCLEOTIDE SEQUENCES" section of the gene symbol report e.g. NM_033360
rgd_id
Rat genome database gene ID. Found within the "HOMOLOGS" section of the gene symbol report e.g. RGD:2981
root_symbol
The common root gene symbol associated with a family, if a common root symbol exists e.g. AGPAT
symbol_status
Status of the symbol report, which can be either "Approved" or "Entry Withdrawn" e.g. Approved
ucsc_id
UCSC gene ID. Found within the "GENE RESOURCES" section of the gene symbol report e.g. uc001rgp.1
uniprot_id
UniProt protein accession. Found within the "PROTEIN RESOURCES" section of the gene symbol report e.g. P00568
vega_id
Vega gene ID. Found within the "GENE RESOURCES" section of the gene symbol report e.g. OTTHUMG00000020722

"Symbol search" fields

alias_name
Other names used to refer to this gene as seen in the "SYNONYMS" field in the gene symbol report e.g. A-kinase anchor protein, 350kDa
alias_symbol
Other symbols used to refer to this gene as seen in the "SYNONYMS" field in the gene symbol report e.g. BRAF1
ccds_id
Consensus CDS ID. This field may be found within the "NUCLEOTIDE SEQUENCES" section of the gene symbol report e.g. CCDS5863.1
ena
International Nucleotide Sequence Database Collaboration (GenBank, ENA and DDBJ) accession number(s). Found within the "NUCLEOTIDE SEQUENCES" section of the gene symbol report e.g. M95712
ensembl_gene_id
Ensembl gene ID. Found within the "GENE RESOURCES" section of the gene symbol report e.g. ENSG00000157764
entrez_id
Entrez gene ID. Found within the "GENE RESOURCES" section of the gene symbol report e.g. 673
hgnc_id
HGNC ID. A unique ID created by the HGNC for every approved symbol e.g. 1097
locus_group
A group name for a set of related locus types as defined by the HGNC e.g. non-coding RNA
locus_type
The locus type as set by the HGNC e.g. RNA, long non-coding
mgd_id
Mouse genome informatics database ID. Found within the "HOMOLOGS" section of the gene symbol report e.g. MGI:88190
name
HGNC approved name for the gene. Equates to the "APPROVED NAME" field within the gene symbol report e.g. zinc finger protein 536
prev_name
Gene names previously approved by the HGNC for this gene. Equates to the "PREVIOUS SYMBOLS & NAMES" field within the gene symbol report e.g. solute carrier family 5 (choline transporter), member 7
prev_symbol
Symbols previously approved by the HGNC for this gene. Equates to the "PREVIOUS SYMBOLS & NAMES" field within the gene symbol report e.g. RN5S49
refseq_accession
RefSeq nucleotide accession. This field may be found within the "NUCLEOTIDE SEQUENCES" section of the gene symbol report e.g. NM_033360
rgd_id
Rat genome database gene ID. Found within the "HOMOLOGS" section of the gene symbol report e.g. RGD:2981
status
HGNC status of the gene symbol report, which will be either "Approved" or "Entry Withdrawn" e.g. Approved
symbol
The HGNC approved gene symbol. Equates to the "APPROVED SYMBOL" field within the gene symbol report e.g. KLF4
ucsc_id
UCSC gene ID. Found within the "GENE RESOURCES" section of the gene symbol report e.g. uc001rgp.1
uniprot_ids
UniProt protein accession. Found within the "PROTEIN RESOURCES" section of the gene symbol report e.g. P00568
vega_id
Vega gene ID. Found within the "GENE RESOURCES" section of the gene symbol report e.g. OTTHUMG00000020722
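Several gene-related fields carry a different key in the "Symbol search" list than in the "Search everything" list. The sketch below collects those differences, as read off the two lists above by matching their descriptions, and rewrites the field key of a query accordingly. The mapping and the helper `to_symbol_search` are an illustrative reading of the lists, not an official lookup table.

```python
# "Search everything" key -> "Symbol search" key, paired by matching the
# descriptions in the two lists above. Other gene-related keys (e.g. ccds_id)
# appear under the same name in both lists and are left unchanged.
EVERYTHING_TO_SYMBOL = {
    "gene_alias_name": "alias_name",
    "gene_alias_symbol": "alias_symbol",
    "gene_name": "name",
    "gene_prev_name": "prev_name",
    "gene_prev_symbol": "prev_symbol",
    "gene_symbol": "symbol",
    "symbol_status": "status",
    "uniprot_id": "uniprot_ids",
}


def to_symbol_search(query):
    """Rewrite the field key of a 'field:value' query for a symbol search."""
    field, sep, value = query.partition(":")
    if not sep:
        return query  # no field key given, nothing to rewrite
    return f"{EVERYTHING_TO_SYMBOL.get(field, field)}:{value}"


renamed = to_symbol_search("gene_symbol:KLF4")
unchanged = to_symbol_search("ccds_id:CCDS5863.1")
```

In this sketch `renamed` becomes symbol:KLF4, while `unchanged` keeps its ccds_id key because that key is the same in both lists.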

"Gene family search" fields

family_alias
Other names used to refer to this family as seen within the gene family reports under the "Also known as" field e.g. LPAAT
family_id
The HGNC gene family ID which can be seen in the URL for a gene family page after the word "set" e.g. 46
family_name
The gene family name as set by the HGNC and seen at the top of the gene family reports e.g. 1-acylglycerol-3-phosphate O-acyltransferases
gene_alias_name
Other names used to refer to a gene associated with the family and seen in the "SYNONYMS" field in the gene symbol report e.g. lysophosphatidic acid acyltransferase, beta
gene_alias_symbol
Other symbols used to refer to a gene associated with the family and seen in the "SYNONYMS" field in the gene symbol report e.g. MAG1
gene_name
HGNC approved name for a gene associated with the family. Equates to the "APPROVED NAME" field within the gene symbol report e.g. zinc finger protein 536
gene_prev_name
Gene names previously approved by the HGNC for genes associated with the gene family. Equates to the "PREVIOUS SYMBOLS & NAMES" field within the gene symbol report e.g. Berardinelli-Seip congenital lipodystrophy
gene_prev_symbol
Gene symbols previously approved by the HGNC for genes associated with the gene family. Equates to the "PREVIOUS SYMBOLS & NAMES" field within the gene symbol report e.g. BSCL
gene_symbol
The HGNC approved gene symbol for a gene associated with the gene family. Equates to the "APPROVED SYMBOL" field within the gene symbol report e.g. AGPAT1
hgnc_id
HGNC ID. A unique ID created by the HGNC for every approved symbol e.g. HGNC:324
root_symbol
The common root gene symbol associated to a family if a common root symbol exists e.g. AGPAT

"Site search" fields

page_content
Text contained within a static web page (i.e. text within news or help pages) e.g. "human TBX18 gene"
page_title
The title of a static web page (i.e. a news or help page) e.g. "TBX18 gene therapy"
page_url
The URL of an indexed static web page (i.e. a news or help page) e.g. hgnc-newsletter-winter-2012-2013