Using Regular Expressions in Find Dialog
The sequence pattern search supports regular expressions. In the following, the characters a, b and c
are used as placeholders for any single character or grouped regular expression, and the
characters m and n stand for numbers.
- ? matches the preceding expression or the null string
- (e.g.: “AC?G” matches “AG” and “ACG”)
- * matches the null string or any number of repetitions of the preceding expression
- (e.g.: “AC*G” matches “AG”, “ACG”, “ACCG”, “ACCCG”, and so on)
- + matches one or more repetitions of the preceding expression
- (e.g.: “AC+G” matches “ACG”, “ACCG”, “ACCCG”, “ACCCCG”, and so on)
- {m} matches exactly m repetitions of the preceding expression
- (e.g.: “AC{3}G” matches “ACCCG”)
- {m,n} matches between m and n repetitions of the preceding expression, inclusive
- (e.g.: “AC{2,4}G” matches “ACCG”, “ACCCG”, and “ACCCCG”)
- {m,} matches m or more repetitions of the preceding expression
- (e.g.: “AC{2,}G” matches “ACCG”, “ACCCG”, “ACCCCG”, and so on)
- a|b matches whatever the expression a would match, or whatever the expression b would match
- (e.g.: “ACG|CAA” matches “ACG” and “CAA”)
- (abc) can be used for grouping expressions in combination with the operators above
- (e.g.: “A(CAT)?G” matches “AG” and “ACATG”)
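The operators above follow common regular-expression syntax, so the listed examples can be checked with any standard engine. The following is a minimal sketch using Python's re module, which is an assumption made for illustration only; the Find dialog's own engine may differ in details. The helper names matches_whole and check_examples are hypothetical.

```python
import re

# Patterns and example strings taken from the list above.
EXAMPLES = {
    "AC?G": ["AG", "ACG"],
    "AC*G": ["AG", "ACG", "ACCG", "ACCCG"],
    "AC+G": ["ACG", "ACCG", "ACCCG", "ACCCCG"],
    "AC{3}G": ["ACCCG"],
    "AC{2,4}G": ["ACCG", "ACCCG", "ACCCCG"],
    "AC{2,}G": ["ACCG", "ACCCG", "ACCCCG"],
    "ACG|CAA": ["ACG", "CAA"],
    "A(CAT)?G": ["AG", "ACATG"],
}


def matches_whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


def check_examples():
    """Return True if every listed string is matched by its pattern."""
    return all(
        matches_whole(pattern, text)
        for pattern, texts in EXAMPLES.items()
        for text in texts
    )
```

Whole-string matching is used here so that each example is tested exactly; a search within a longer sequence would instead look for the pattern at any position.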
Base Ambiguities
An 'X' is treated like the ambiguity code 'N'. Ambiguities are resolved by default, e.g., 'W' matches 'W', 'A' or 'T'.
If ambiguity resolution is disabled, 'W' matches only 'W'. Blank space characters are ignored.
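One way to picture the ambiguity handling is as a translation of the search pattern into a regular expression with character classes. The sketch below is not the program's actual implementation: the helper to_regex and its resolve parameter are hypothetical, the code table is the standard IUPAC nucleotide table, and it is assumed that each code matches itself plus the bases it stands for (as in the 'W' example above). How 'N' relates to other ambiguity codes in the target sequence is not specified here and is not modelled.

```python
import re

# Standard IUPAC nucleotide ambiguity codes and the bases they stand for.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def to_regex(pattern, resolve=True):
    """Hypothetical helper: translate a search pattern into a Python regex."""
    out = []
    for ch in pattern:
        if ch.isspace():
            continue  # blank space characters are ignored
        if ch == "X":
            ch = "N"  # 'X' is treated like 'N' (assumed to apply in both modes)
        if resolve and ch in IUPAC:
            # The code matches itself or any base it represents.
            out.append("[" + ch + IUPAC[ch] + "]")
        else:
            out.append(ch)
    return "".join(out)
```

With these assumptions, to_regex("AWG") yields "A[WAT]G", which matches "AWG", "AAG" and "ATG", while to_regex("AWG", resolve=False) yields "AWG" and matches only the literal string.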
Using Regular Expressions for File Naming Parameters
The regular expressions for file naming conventions are described in two steps: first, it is defined
how the read name is split into several fields (see below); second, it is defined
how these fields are used.
Standard regular expressions can be used to split the read name into fields: each field is defined
by a region enclosed in parentheses “(...)”. Parts of the file name can also be omitted from the field
definitions (these parts are then not enclosed in parentheses).
In addition to the expressions above, the following two expressions are useful
for field splitting:
- \w matches any single alphanumeric character or “_”
- (e.g.: “\w+” matches any word that contains only letters, digits or “_”)
- [^ab] matches any single character that is neither a nor b
- (e.g.: “[^_]” matches any character except “_”)
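As an illustration of field splitting, the sketch below applies two patterns to a made-up read name. The read name "S01_gyrB_F", the patterns, and the helper split_read_name are all hypothetical, and Python's re module is used only to demonstrate the syntax; it is not necessarily the engine used by the program.

```python
import re

# Three fields separated by "_": every parenthesised region is one field.
ALL_FIELDS = r"([^_]+)_([^_]+)_(\w+)"

# Same layout, but the middle part is not parenthesised and is
# therefore omitted from the field definitions.
SKIP_MIDDLE = r"([^_]+)_[^_]+_(\w+)"


def split_read_name(read_name, pattern=ALL_FIELDS):
    """Return the fields of read_name as a tuple, or None if it does not fit."""
    match = re.fullmatch(pattern, read_name)
    return match.groups() if match else None
```

For the hypothetical name "S01_gyrB_F", the first pattern yields the fields "S01", "gyrB" and "F", and the second yields only "S01" and "F". A name without any "_" separator does not fit either pattern and yields no fields.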