SQL syntax can be used within the WHERE box to limit the data returned to a particular set. The
main operators are =, LIKE, SIMILAR TO and ~. Negative versions of each of these
operators can also be obtained (see below).
The general syntax of an SQL pattern matching command is column_name OPERATOR 'pattern'.
This specifies that you wish to select entries within column column_name that contain or match in some
way the specified pattern. See
column name
definitions for a description of the content of each
column.
Equals
- = is the most simple of the pattern matching operators available
- = can be used to select data in which the column entry is exactly equal to
'pattern'.
- A useful example may be to limit the search to pseudogenes only using:
- NOT = or != can be used to give the reverse result to =, eg to exclude pseudogenes from the
search, use. (Note the limit of 15 records to increase the search speed):
LIKE
- LIKE is useful for slightly more complex queries, in which a non-exact match is required
- It can be used to select data in which the column entry matches a pattern containing the
wildcards % or _:
- % matches 1 or more characters of any type
- _ matches any single character
- This is useful, for example, to limit the search to genes with approved symbols that begin
with a string like OPN, using:
- ILIKE is similar to LIKE but enables case insensitive matching.
- NOT LIKE gives the reverse result to LIKE
- LIKE pattern matches always cover the entire string. To match a pattern anywhere within a string,
the pattern must therefore start and end with a percent sign. For example, to select genes in which any of
the aliases, not just the first, contains TRIM use:
SIMILAR TO
- SIMILAR TO is similar to LIKE, except that
pattern is evaluated as a
regular expression
- SIMILAR TO allows more complex pattern matching to be achieved than with = or LIKE
- The regular expression syntax used is that according to the
SQL99 definition. See Postgres Documentation
for more information.
- Varying numbers of particular characters, or a range of possible characters, can be allowed to match,
using the following quantifiers:
- | denotes alternation (either of two alternatives)
- * denotes repetition of the previous item zero or more times
- + denotes repetition of the previous item one or more times
- Parentheses () may be used to group items into a single logical item
- A bracket expression [...] specifies a character class (ie any of the characters
within that class, such as [a-z]
- Examples:
- Approved symbol is OPN or RHO followed by any number of any character:
- Approved symbol is TRIM followed by 1 or more digits:
Regular expressions
- Regular expressions are useful for specifying complex patterns. The basic operators are described
below, as modified from the Postgres
Documentation. This page also has further information.
- True regular expressions permit much more flexible matching than the SIMILAR TO operator, although they
require more complex syntax and such queries are likely to take much longer to run
- Regular expression operators are used in place of = or LIKE
(eg
column_name ~ pattern)
to determine the way in which the data must match the search pattern:
- ~ Matches regular expression, case sensitive
- ~* Matches regular expression, case insensitive
- !~ Does not match regular expression, case sensitive
- !~* Does not match regular expression, case insensitive
- Character class codes can be used as short hand for groups of characters that must be matched:
- . any single character (except newline)
- [[:digit:]] any digit (0-9)
- [[:allnum:]_] any word character (a-z,0-9,_)
- [[:space:]] any whitespace character (tab, newline or space)
- [^[:digit:]] any non-digit (not 0-9)
- [^[:allnum:]_] any non-word character (not a-z,0-9,_)
- [^[:space:]] any non-whitespace character (not tab, newline or space)
- Regular expression quantifiers describe the number of times a character (or atom)
must be present, in order for the pattern to match. They can be applied to any character class:
- * a sequence of 0 or more matches of the atom
- + a sequence of 1 or more matches of the atom
- ? a sequence of 0 or 1 matches of the atom
- {m} a sequence of exactly m matches of the atom
- {m,} a sequence of m or more matches of the atom
- {m,n} a sequence of m through n (inclusive) matches of the atom; m may not exceed n
- Follow any quantifier by a ? to make it non-greedy- it will now match the
minimum number of characters that still allow the whole pattern to match
- Note that regular expressions will match at any part of the search data and
are not required to match the entire string like LIKE. As a result, wildcards
are not needed at either end of the search pattern.
- Example:
- Aliases that are TRIM (case insensitive) followed by 1 or more
numbers, to distinguish TRIM20 etc from TRIMLESS:
Terms can be combined with and/or
Examples:
- To select the entire TRIM family:
- The whole opsin family (opsins and rhodopsin):