
RegularExpression

RegularExpression["regex"]
represents the generalized regular expression specified by the string "regex".
  • RegularExpression supports standard regular expression syntax of the kind used in typical string manipulation languages.
  • The following basic elements can be used in regular expression strings:
    c               the literal character c
    .               any character except newline
    [c1c2...]       any of the characters ci
    [c1-c2]         any character in the range c1-c2
    [^c1c2...]      any character except the ci
    p*              p repeated zero or more times
    p+              p repeated one or more times
    p?              zero or one occurrence of p
    p{m,n}          p repeated between m and n times
    p*?, p+?, p??   the shortest consistent strings that match
    (p1p2...)       strings matching the sequence p1, p2, ...
    p1|p2           strings matching p1 or p2
  • The following represent classes of characters:
    \\d             digit 0-9
    \\D             nondigit
    \\s             space, newline, tab, or other whitespace character
    \\S             non-whitespace character
    \\w             word character (letter, digit, or _)
    \\W             nonword character
    [[:class:]]     characters in a named class
    [^[:class:]]    characters not in a named class
  • The following named classes can be used: alnum, alpha, ascii, blank, cntrl, digit, graph, lower, print, punct, space, upper, word, xdigit.
  • The following represent positions in strings:
    ^               the beginning of the string (or line)
    $               the end of the string (or line)
    \\b             word boundary
    \\B             anywhere except a word boundary
  • The following set options for all regular expression elements that follow them:
    (?i)            treat uppercase and lowercase as equivalent (ignore case)
    (?m)            make ^ and $ match start and end of lines (multiline mode)
    (?s)            allow . to match newline
    (?-c)           unset options
  • \\., \\[, etc. represent literal characters ., [, etc.
  • Analogs of named Mathematica patterns such as x:expr can be set up in regular expression strings using (regex).
  • Within a regular expression string, \\n represents the substring matched by the nth parenthesized regular expression object (regex).
  • For the purpose of functions such as StringReplace and StringCases, any "$n" appearing in the right-hand side of a rule RegularExpression["regex"]->rhs is taken to correspond to the substring matched by the nth parenthesized regular expression object in regex. "$0" represents the whole matched string.
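The option settings and escaped literals described in the notes above can be sketched as follows. The input strings are hypothetical ones chosen for illustration. Inside a Mathematica string, each regex backslash is written as \\.

```mathematica
(* (?i): ignore case for everything that follows *)
StringCases["Cat cat CAT", RegularExpression["(?i)cat"]]
(* {"Cat", "cat", "CAT"} *)

(* (?s): let . match a newline as well *)
StringMatchQ["a\nb", RegularExpression["a.b"]]      (* False *)
StringMatchQ["a\nb", RegularExpression["(?s)a.b"]]  (* True *)

(* (?m): let ^ and $ match at the start and end of each line *)
StringCases["one\ntwo", RegularExpression["(?m)^\\w+$"]]
(* {"one", "two"} *)

(* \\. is a literal period rather than "any character" *)
StringCases["a.b axb", RegularExpression["a\\.b"]]
(* {"a.b"} *)
```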
Find words involving the characters a, b, c, d, e:
Equivalent form using string patterns:
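The original input and output cells for this example are not preserved here. A minimal sketch of both forms, assuming a hypothetical input string, might look like this:

```mathematica
(* Hypothetical input string chosen for illustration *)
text = "a bad cab faced the deed";

(* Regex form: whole words built only from the letters a-e *)
StringCases[text, RegularExpression["\\b[a-e]+\\b"]]
(* {"a", "bad", "cab", "deed"} *)

(* String-pattern form of the same search *)
StringCases[text,
 WordBoundary ~~ (("a" | "b" | "c" | "d" | "e") ..) ~~ WordBoundary]
(* {"a", "bad", "cab", "deed"} *)
```

"faced" and "the" are skipped because no word boundary falls between their out-of-range letters and the rest of the word.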
 
Decide whether the string consists of words and whitespace:
Equivalent form using string patterns:
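The original cells for this example are likewise missing. As a sketch with hypothetical inputs, StringMatchQ can test whether an entire string is made of word and whitespace characters:

```mathematica
(* Regex form: every character is a word or whitespace character *)
StringMatchQ["the quick brown fox", RegularExpression["(\\w|\\s)+"]]
(* True *)

(* A punctuation character makes the match fail *)
StringMatchQ["the quick brown fox!", RegularExpression["(\\w|\\s)+"]]
(* False *)

(* String-pattern form of the same test *)
StringMatchQ["the quick brown fox",
 (WordCharacter | WhitespaceCharacter) ..]
(* True *)
```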
Extract any character except newline:
Either of the characters "a" and "b":
Any character between "a" and "e", including "a" and "e":
Any character except "a" and "1":
Any digit repeated one or more times:
The character "a" repeated 2 or 3 times:
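The six captions above lost their input cells. The following sketch illustrates each in order, using hypothetical input strings rather than the page's original ones:

```mathematica
(* . matches any character except newline *)
StringCases["a\nb", RegularExpression["."]]
(* {"a", "b"} *)

(* [ab] matches either "a" or "b" *)
StringCases["abcabd", RegularExpression["[ab]"]]
(* {"a", "b", "a", "b"} *)

(* [a-e] matches any character from "a" through "e" *)
StringCases["abcdefg", RegularExpression["[a-e]"]]
(* {"a", "b", "c", "d", "e"} *)

(* [^a1] matches any character except "a" and "1" *)
StringCases["a1b2", RegularExpression["[^a1]"]]
(* {"b", "2"} *)

(* \\d+ matches a run of one or more digits *)
StringCases["a1b22c333", RegularExpression["\\d+"]]
(* {"1", "22", "333"} *)

(* a{2,3} matches "a" repeated 2 or 3 times *)
StringCases["a aa aaa aaaa", RegularExpression["a{2,3}"]]
(* {"aa", "aaa", "aaa"} *)
```

In the last case the run of four a's yields only one match, because the greedy quantifier takes three characters and the single remaining "a" is too short.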
Any digit:
Nondigit characters:
Space, newline, tab, or other whitespace character:
Non-whitespace characters:
Word characters:
Nonword characters:
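The character-class captions above can be illustrated with one hypothetical sample string, s, chosen so that the classes give different results:

```mathematica
(* Hypothetical sample with letters, digits, a space, "_" and "!" *)
s = "x1 y_2!";

StringCases[s, RegularExpression["\\d"]]  (* digits: {"1", "2"} *)
StringCases[s, RegularExpression["\\D"]]  (* nondigits: {"x", " ", "y", "_", "!"} *)
StringCases[s, RegularExpression["\\s"]]  (* whitespace: {" "} *)
StringCases[s, RegularExpression["\\S"]]  (* non-whitespace: {"x", "1", "y", "_", "2", "!"} *)
StringCases[s, RegularExpression["\\w"]]  (* word characters: {"x", "1", "y", "_", "2"} *)
StringCases[s, RegularExpression["\\W"]]  (* nonword characters: {" ", "!"} *)
```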
Find all uppercase letters:
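One way to do this is with the named class upper. The input string below is a hypothetical example:

```mathematica
(* [[:upper:]] matches a single uppercase letter *)
StringCases["Hello World", RegularExpression["[[:upper:]]"]]
(* {"H", "W"} *)
```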
Split a string at the beginning of a new line:
Split a string at the end of a new line:
Insert a character at the boundary of each word:
Split a string at every character except at the boundary of a word:
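The position elements in the four captions above match locations rather than characters. The sketch below shows plausible inputs for each caption. The strings are hypothetical, and results are deliberately not shown because they depend on how zero-width matches are handled:

```mathematica
(* Hypothetical two-line input *)
lines = "first line\nsecond line";

(* Split where a line begins: (?m) lets ^ match after each newline *)
StringSplit[lines, RegularExpression["(?m)^"]]

(* Split where a line ends: (?m) lets $ match before each newline *)
StringSplit[lines, RegularExpression["(?m)$"]]

(* Insert "|" at every word boundary *)
StringReplace["hello world", RegularExpression["\\b"] -> "|"]

(* Split at every position that is not a word boundary *)
StringSplit["hello world", RegularExpression["\\B"]]
```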
StringExpression can contain RegularExpression objects:
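As a small sketch with a hypothetical input, a RegularExpression object can be joined to an ordinary string pattern with ~~:

```mathematica
(* A letter (string pattern) followed by a run of digits (regex) *)
StringCases["a1b22c333", LetterCharacter ~~ RegularExpression["\\d+"]]
(* {"a1", "b22", "c333"} *)
```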
Conditional patterns:
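A condition can be attached by naming the regular expression and adding /;. The sketch below, with a hypothetical input, keeps only digit runs longer than one character:

```mathematica
(* x names the matched substring; the condition filters the matches *)
StringCases["a1b22c333",
 x : RegularExpression["\\d+"] /; StringLength[x] > 1]
(* {"22", "333"} *)
```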
Use alternatives to match one or more line breaks:
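A possible illustration, assuming a hypothetical string that mixes Unix and Windows line endings:

```mathematica
(* Treat any run of \r\n or \n line breaks as a single delimiter *)
StringSplit["a\n\nb\r\nc", RegularExpression["(\\r\\n|\\n)+"]]
(* {"a", "b", "c"} *)
```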
Non-greedy matches are done by appending a question mark "?" to the quantifiers:
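The difference between greedy and non-greedy quantifiers can be seen in a short comparison on a hypothetical input:

```mathematica
(* Greedy: .+ extends to the last ">" in the string *)
StringCases["<a><b>", RegularExpression["<.+>"]]
(* {"<a><b>"} *)

(* Non-greedy: .+? stops at the first ">" it can *)
StringCases["<a><b>", RegularExpression["<.+?>"]]
(* {"<a>", "<b>"} *)
```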
The \\1 refers to the letter matched by (.):
Numbered subpatterns:
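The two captions above concern references to parenthesized groups. In this sketch, which uses hypothetical inputs, \\1 is used inside the regex and "$n" on the right-hand side of a rule:

```mathematica
(* Back reference: (.) captures a character, \\1 requires it again *)
StringCases["aabbcde", RegularExpression["(.)\\1"]]
(* {"aa", "bb"} *)

(* Numbered subpatterns in a replacement: swap the two characters *)
StringReplace["ab cd", RegularExpression["(\\w)(\\w)"] -> "$2$1"]
(* "ba dc" *)

(* Numbered subpatterns with StringCases: keep only the first group *)
StringCases["x=1, y=2", RegularExpression["(\\w)=(\\d)"] -> "$1"]
(* {"x", "y"} *)
```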
Use StringMatchQ to determine string pattern matches:
Use StringCases to find matching substrings:
Use StringSplit to split a string into substrings using a delimiter pattern:
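The three functions named in the captions above accept RegularExpression objects directly. A brief sketch with hypothetical inputs:

```mathematica
(* StringMatchQ tests the whole string against the pattern *)
StringMatchQ["abc123", RegularExpression["[a-z]+\\d+"]]  (* True *)
StringMatchQ["abc", RegularExpression["[a-z]+\\d+"]]     (* False *)

(* StringCases returns the matching substrings *)
StringCases["tel 555-1234, fax 555-9876",
 RegularExpression["\\d{3}-\\d{4}"]]
(* {"555-1234", "555-9876"} *)

(* StringSplit uses the pattern as a delimiter *)
StringSplit["a, b;c", RegularExpression["[,;]\\s*"]]
(* {"a", "b", "c"} *)
```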
New in 5.1