Regular Expressions
General
Mathematica patterns provide a powerful way to do string manipulation. But particularly if you are familiar with specialized string manipulation languages, you may sometimes find it convenient to specify string patterns using regular expression notation. You can do this in Mathematica with RegularExpression objects.
Using regular expression notation in Mathematica.
This replaces all occurrences of either of two specified substrings.
This specifies the same operation using a general Mathematica string pattern.
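The two operations described above might look like the following minimal sketch. The input string "abcd acbd", the replacement "XX", and the variable names are illustrative assumptions, not the original example.

```mathematica
(* Assumed sample input; replace every "a" or "b" using a regular expression *)
viaRegex = StringReplace["abcd acbd", RegularExpression["[ab]"] -> "XX"]

(* The same replacement written as a general string pattern *)
viaPattern = StringReplace["abcd acbd", ("a" | "b") -> "XX"]

(* Both evaluate to "XXXXcd XXcXXd" *)
```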
You can mix regular expressions with general patterns.
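As a sketch of such mixing, a RegularExpression object can be joined to an ordinary pattern object with ~~. The input string and replacement are assumptions chosen for illustration.

```mathematica
(* "[ab]" is matched by the regular expression; _ then matches any one further character *)
mixed = StringReplace["abcd acbd", (RegularExpression["[ab]"] ~~ _) -> "YY"]

(* Evaluates to "YYcd YYYY" *)
```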
RegularExpression in Mathematica supports all standard regular expression constructs.
| c | the literal character c |
| . | any character except newline |
| [c1c2...] | any of the characters c1, c2, ... |
| [c1-c2] | any character in the range c1-c2 |
| [^c1c2...] | any character except c1, c2, ... |
| p* | p repeated zero or more times |
| p+ | p repeated one or more times |
| p? | zero or one occurrence of p |
| p{m,n} | p repeated between m and n times |
| p*?, p+?, p?? | the shortest consistent strings that match |
| (p1p2...) | strings matching the sequence p1, p2, ... |
| p1|p2 | strings matching p1 or p2 |
Basic constructs in Mathematica regular expressions.
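The difference between the greedy quantifiers and their shortest-match forms, and the meaning of p{m,n}, can be seen in a small sketch. The sample strings and variable names here are assumptions made for illustration.

```mathematica
(* Greedy: .* extends as far as possible, so the whole string is one match *)
greedy = StringCases["<a><b>", RegularExpression["<.*>"]]

(* Shortest match: .*? stops at the first closing bracket *)
lazy = StringCases["<a><b>", RegularExpression["<.*?>"]]

(* Between two and three repetitions of "a" *)
bounded = StringCases["a aa aaa aaaa", RegularExpression["a{2,3}"]]

(* greedy is {"<a><b>"}, lazy is {"<a>", "<b>"}, bounded is {"aa", "aaa", "aaa"} *)
```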
This finds substrings that match the specified regular expression.
This does the same operation with a general Mathematica string pattern.
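A pair of calls along these lines would illustrate the two forms. The input string and the particular expression (a|bb)+ are assumptions, since the original inputs are not shown here.

```mathematica
(* Assumed input; find runs built from "a" and "bb" with a regular expression *)
fromRegex = StringCases["abcddbbbacbbaa", RegularExpression["(a|bb)+"]]

(* The same search written with Alternatives and Repeated *)
fromPattern = StringCases["abcddbbbacbbaa", ("a" | "bb") ..]

(* Both evaluate to {"a", "bb", "a", "bbaa"} *)
```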
There is a close correspondence between many regular expression constructs and basic general Mathematica string pattern constructs.
Correspondences between regular expression and general string pattern constructs.
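One such correspondence, sketched under the assumption that p* corresponds to RepeatedNull (written p... in infix form). The sample string is hypothetical.

```mathematica
(* "b*" in the regular expression: zero or more occurrences of "b" *)
starRegex = StringCases["ac abc abbbc", RegularExpression["ab*c"]]

(* RepeatedNull["b"] plays the same role in a general string pattern *)
starPattern = StringCases["ac abc abbbc", "a" ~~ RepeatedNull["b"] ~~ "c"]

(* Both evaluate to {"ac", "abc", "abbbc"} *)
```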
Just as in general Mathematica string patterns, there are special notations in regular expressions for various common classes of characters. Note that you need to use double backslashes (\\) to enter most of these notations in Mathematica regular expression strings.
Regular expression notations for classes of characters.
This gives each occurrence of a specified character followed by digit characters.
Here is the same thing done with a general Mathematica string pattern.
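For instance, assuming the character in question is "a" and taking a made-up input string, the two versions could be written as follows. Note the doubled backslash in \\d.

```mathematica
(* \\d is the regular expression notation for a digit character *)
digitsRegex = StringCases["a10b6a77a3", RegularExpression["a\\d+"]]

(* DigitCharacter is the corresponding general pattern object *)
digitsPattern = StringCases["a10b6a77a3", "a" ~~ (DigitCharacter ..)]

(* Both evaluate to {"a10", "a77", "a3"} *)
```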
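Mathematica supports the standard POSIX character classes [:alnum:], [:alpha:], [:ascii:], [:blank:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], [:word:], and [:xdigit:].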
This finds runs of uppercase letters.
This does the same thing.
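A sketch of both approaches, using an assumed input string. The second form, with an explicit character range, is one plausible way to "do the same thing" for plain ASCII text.

```mathematica
(* A POSIX class must itself appear inside a bracket expression *)
upperPosix = StringCases["AaBBccDDeefG", RegularExpression["[[:upper:]]+"]]

(* An explicit range gives the same result for ASCII letters *)
upperRange = StringCases["AaBBccDDeefG", RegularExpression["[A-Z]+"]]

(* Both evaluate to {"A", "BB", "DD", "G"} *)
```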
Regular expression notations for positions in strings.
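As an illustration of position notations, the sketch below assumes the standard anchors ^ (beginning of the string) and \\b (word boundary). The sample string is hypothetical.

```mathematica
(* ^ anchors the match to the beginning of the string *)
atStart = StringReplace["cat concat cat", RegularExpression["^cat"] -> "dog"]

(* \\b requires a word boundary on each side, so the "cat" inside "concat" is skipped *)
wholeWords = StringCases["cat concat cat", RegularExpression["\\bcat\\b"]]

(* atStart is "dog concat cat", wholeWords is {"cat", "cat"} *)
```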
In general Mathematica patterns, you can use constructs like x_ and x:patt to give arbitrary names to objects that are matched. In regular expressions, there is a way to do something somewhat like this using numbering: the nth parenthesized pattern object (p) in a regular expression can be referred to as \\n within the body of the pattern, and $n outside it.
This finds pairs of identical letters that appear together.
This does the same thing using a general Mathematica string pattern.
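The pair search can be sketched as follows, with "mississippi" as an assumed input. In the regular expression, \\1 refers back to whatever the first parenthesized group matched. In the general pattern, repeating the name x forces both characters to be the same.

```mathematica
(* (.) captures one character; \\1 requires the same character again *)
pairsRegex = StringCases["mississippi", RegularExpression["(.)\\1"]]

(* The named pattern x_ must match the same character both times *)
pairsPattern = StringCases["mississippi", x_ ~~ x_]

(* Both evaluate to {"ss", "ss", "pp"} *)
```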
The $1 refers to the letter matched by the first parenthesized subpattern.
Here is the Mathematica pattern version.
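A minimal sketch of using a numbered group outside the pattern: here each doubled letter is collapsed to a single one. The input string and the choice of replacement are assumptions.

```mathematica
(* "$1" in the replacement string stands for the text captured by (.) *)
collapsedRegex = StringReplace["mississippi", RegularExpression["(.)\\1"] -> "$1"]

(* Pattern version: the named character x is used on the right-hand side *)
collapsedPattern = StringReplace["mississippi", (x_ ~~ x_) :> x]

(* Both evaluate to "misisipi" *)
```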