This is documentation for Mathematica 8, which was
based on an earlier version of the Wolfram Language.

RegularExpression

RegularExpression["regex"]
represents the generalized regular expression specified by the string "regex".
  • RegularExpression supports standard regular expression syntax of the kind used in typical string manipulation languages.
  • The following basic elements can be used in regular expression strings:
  c                 the literal character c
  .                 any character except newline
  [c1c2...]         any of the characters ci
  [c1-c2]           any character in the range c1 to c2
  [^c1c2...]        any character except the ci
  p*                p repeated zero or more times
  p+                p repeated one or more times
  p?                zero or one occurrence of p
  p{m,n}            p repeated between m and n times
  p*?, p+?, p??     the shortest consistent strings that match
  (p1p2...)         strings matching the sequence p1, p2, ...
  p1|p2             strings matching p1 or p2
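The elements in this table behave much the same way in Python's re module, which uses a closely related Perl-style syntax. You can use it to experiment outside Mathematica. The sketch below is an analogy, not Mathematica code. Python raw strings take a single backslash where Mathematica strings need a doubled one ("\\"). The sample strings are made up for illustration.

```python
import re

# Python's re module is used here as a stand-in for RegularExpression;
# the sample strings are illustrative only.
dot      = re.findall(r".", "a\nb")                  # '.' skips the newline
any_of   = re.findall(r"[ab]", "cab")                # [c1c2...]
in_range = re.findall(r"[a-e]", "apple")             # [c1-c2]
negated  = re.findall(r"[^a1]", "a1b2")              # [^c1c2...]
star     = re.findall(r"ab*", "a ab abb")            # p*
plus     = re.findall(r"[0-9]+", "x12y345")          # p+
optional = re.findall(r"colou?r", "color colour")    # p?
counted  = re.findall(r"a{2,3}", "aaaa")             # p{m,n}
greedy   = re.findall(r"<b>.*</b>", "<b>x</b><b>y</b>")   # longest match
lazy     = re.findall(r"<b>.*?</b>", "<b>x</b><b>y</b>")  # p*?: shortest match
alt      = re.findall(r"cat|dog", "dog, cat, cow")   # p1|p2

print(dot, any_of, in_range, negated, star, plus)
print(optional, counted, greedy, lazy, alt)
```

Because p{m,n} is greedy, "aaaa" yields the single match "aaa" rather than two matches of "aa".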
  • The following represent classes of characters:
  \\d               digit 0-9
  \\D               nondigit
  \\s               space, newline, tab, or other whitespace character
  \\S               non-whitespace character
  \\w               word character (letter, digit, or _)
  \\W               nonword character
  [[:class:]]       characters in a named class
  [^[:class:]]      characters not in a named class
  • The following named classes can be used: alnum, alpha, ascii, blank, cntrl, digit, graph, lower, print, punct, space, upper, word, xdigit.
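The same character classes can be tried in Python. Python's re module does not understand the POSIX [[:class:]] notation, so the last line substitutes an explicit ASCII range, [A-Z]. That substitution is an assumption of this sketch, not Mathematica behavior.

```python
import re

# Python re stand-in: \d, \s, \w take one backslash in a raw string,
# where a Mathematica string writes "\\d", "\\s", "\\w".
sample = "Tab\there, 42 items_ok!"

digits    = re.findall(r"\d", sample)            # ['4', '2']
nondigits = "".join(re.findall(r"\D", "a1b2"))   # 'ab'
spaces    = re.findall(r"\s", sample)            # ['\t', ' ', ' ']
nonspace  = re.findall(r"\S+", "a b\tc")         # ['a', 'b', 'c']
words     = re.findall(r"\w+", sample)           # ['Tab', 'here', '42', 'items_ok']
nonword   = re.findall(r"\W", "a-b_c!")          # ['-', '!'] ('_' is a word character)

# No [[:upper:]] in Python's re; an ASCII-only range is used instead.
upper = re.findall(r"[A-Z]", "Hello World")      # ['H', 'W']
```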
  • The following represent positions in strings:
  ^                 the beginning of the string (or line)
  $                 the end of the string (or line)
  \\b               word boundary
  \\B               anywhere except a word boundary
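The position anchors can be tried the same way in Python. In Python's re, ^ and $ anchor only at the start and end of the whole string unless the (?m) flag is given. The strings below are illustrative.

```python
import re

text = "first line\nsecond line"

start_only  = re.findall(r"^\w+", text)        # ['first'] (start of string only)
line_starts = re.findall(r"(?m)^\w+", text)    # ['first', 'second']
line_ends   = re.findall(r"(?m)\w+$", text)    # ['line', 'line']

# \b requires a word boundary on both sides; \B forbids one.
whole_word  = re.findall(r"\bcat\b", "cat concat cats")     # ['cat'], the standalone word
inside_word = re.findall(r"\Bcat\B", "cat concat scatter")  # ['cat'], from 'scatter' only
```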
  • The following set options for all regular expression elements that follow them:
  (?i)              treat uppercase and lowercase as equivalent (ignore case)
  (?m)              make ^ and $ match start and end of lines (multiline mode)
  (?s)              allow . to match newline
  (?-c)             unset option c (for example, (?-i))
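Python's re accepts the same (?i), (?m) and (?s) flags at the start of a pattern, as the sketch below shows. It differs on unsetting. Python has no form like (?-c) that applies to everything that follows. Instead it offers a scoped group such as (?-i:...), so the last line uses that as a stand-in.

```python
import re

ignore_case = re.findall(r"(?i)abc", "ABC abc AbC")   # ['ABC', 'abc', 'AbC']
no_dotall   = re.findall(r"a.b", "a\nb a-b")          # ['a-b']
dotall      = re.findall(r"(?s)a.b", "a\nb a-b")      # ['a\nb', 'a-b']
multiline   = re.findall(r"(?m)^b", "a\nb")           # ['b']

# Python-specific: (?-i:...) switches case sensitivity back on inside the group only.
partly      = re.findall(r"(?i)a(?-i:b)", "Ab AB ab") # ['Ab', 'ab']
```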
  • \\., \\*, etc. represent the literal characters ., *, etc.
  • Analogs of named Mathematica patterns such as x:patt can be set up in regular expression strings using (regex).
  • Within a regular expression string, \\n represents the substring matched by the nth parenthesized regular expression object (regex).
  • For the purpose of functions such as StringReplace and StringCases, any "$n" appearing in the right-hand side of a rule RegularExpression["regex"]->rhs is taken to correspond to the substring matched by the nth parenthesized regular expression object in regex. "$0" represents the whole matched string.
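Here is a Python sketch of capturing groups and backreferences. Inside a pattern, Python also writes \1 for the first group. In a replacement string, it spells the references \1, \2, ... and \g<0> (the whole match), where a Mathematica right-hand side uses "$1", "$2", ... and "$0". The sample strings are made up.

```python
import re

# (\w) captures one word character; \1 must then repeat it.
doubled = re.findall(r"(\w)\1", "balloon bookkeeper")   # ['l', 'o', 'o', 'k', 'e']

# Python replacement syntax: \2 and \1 refer to the groups,
# \g<0> to the whole match.
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "x=1, y=2")   # '1=x, 2=y'
wrapped = re.sub(r"\d+", r"[\g<0>]", "a1b22")             # 'a[1]b[22]'
```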
Find words involving the characters a, b, c, d, e:
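Below is a rough analog of this search in Python's re module. The sample sentence and the pattern \b[a-e]+\b are illustrative choices, not the original example's input.

```python
import re

text = "a cab bead deface fade xyz abba"

# \b ... \b keeps each match to a whole word; [a-e]+ allows only the letters a-e.
words = re.findall(r"\b[a-e]+\b", text)   # ['a', 'cab', 'bead', 'abba']
```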
Equivalent form using string patterns:
Decide whether the string consists of words and whitespace:
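A Python sketch of the same test follows. It assumes the pattern [\w\s]+ and a hypothetical helper, words_and_whitespace. It uses re.fullmatch because the whole string has to match, as with StringMatchQ.

```python
import re

def words_and_whitespace(s: str) -> bool:
    # Hypothetical helper: True only if every character is a word
    # character or whitespace (fullmatch anchors both ends).
    return re.fullmatch(r"[\w\s]+", s) is not None

print(words_and_whitespace("hello world\tagain"))  # True
print(words_and_whitespace("hello, world"))        # False (the comma fails)
```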
Equivalent form using string patterns:
Extract any character except newline:
Either of the characters "a" and "b":
Any character between "a" and "e", including "a" and "e":
Any character except "a" and "1":
Any digit repeated one or more times:
The character "a" repeated 2 or 3 times:
Any digit:
Nondigit characters:
Space, newline, tab, or other whitespace character:
Non-whitespace characters:
Word characters:
Nonword characters:
Find all uppercase letters:
Split a string at the beginning of each line:
Split a string at the end of each line:
Insert a character at the boundary of each word:
Split a string at every position that is not a word boundary:
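The splitting and insertion tasks above can be approximated in Python. This assumes Python 3.7 or later, the first version whose re.split splits on zero-width matches. The sketch drops empty pieces; that is a choice made here, not something the examples specify.

```python
import re

text = "one\ntwo\nthree"

# (?m)^ and (?m)$ match at zero-width positions; empty pieces are filtered out.
at_line_starts = [p for p in re.split(r"(?m)^", text) if p]  # ['one\n', 'two\n', 'three']
at_line_ends   = [p for p in re.split(r"(?m)$", text) if p]  # ['one', '\ntwo', '\nthree']

# Insert "|" at every word boundary.
marked = re.sub(r"\b", "|", "ab cd")                         # '|ab| |cd|'

# Split wherever there is NOT a word boundary.
not_boundary = [p for p in re.split(r"\B", "abc de") if p]   # ['a', 'b', 'c d', 'e']
```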
StringExpression can contain RegularExpression objects:
Conditional patterns:
Use alternatives to match one or more line breaks:
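In Python the same idea might look like the sketch below. The sample text is made up. The non-capturing group (?:...) keeps re.split from returning the separators as extra list items.

```python
import re

text = "a\r\n\r\nb\nc\rd"

# Alternatives for three common line-break conventions, repeated with +;
# \r\n is listed first so a Windows-style break is consumed as one unit.
parts = re.split(r"(?:\r\n|\n|\r)+", text)   # ['a', 'b', 'c', 'd']
```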
Non-greedy matches are done by appending a question mark "?" to the quantifiers:
The backreference \\1 refers to the letter matched by the first parenthesized subexpression:
Numbered subpatterns:
Use StringMatchQ to determine string pattern matches:
Use StringCases to find matching substrings:
Use StringSplit to split a string into substrings using a delimiter pattern:
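For comparison, rough Python counterparts of these three functions are re.fullmatch, re.findall and re.split. The mapping is an analogy, and the patterns and strings are illustrative.

```python
import re

# Rough Python counterparts (an analogy, not a one-to-one mapping):
is_number = re.fullmatch(r"\d+", "12345") is not None  # cf. StringMatchQ: whole string must match
numbers   = re.findall(r"\d+", "a1b22c333")            # cf. StringCases: ['1', '22', '333']
fields    = re.split(r"\s*,\s*", "a , b,c ,d")         # cf. StringSplit: ['a', 'b', 'c', 'd']
```

One difference to keep in mind: StringSplit drops empty strings at the start and end by default, whereas re.split keeps them. For example, re.split(",", ",a,") returns ['', 'a', ''].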
New in 5.1