RegularExpression

RegularExpression["regex"]

represents the generalized regular expression specified by the string "regex".

Details

  • RegularExpression can be used to represent classes of strings in functions like StringMatchQ, StringReplace, StringCases, and StringSplit.
  • RegularExpression supports standard regular expression syntax of the kind used in typical string manipulation languages.
  • The following basic elements can be used in regular expression strings:
    c               the literal character c
    .               any character except newline
    [c1c2…]         any of the characters ci
    [c1-c2]         any character in the range c1 to c2
    [^c1c2…]        any character except the ci
    p*              p repeated zero or more times
    p+              p repeated one or more times
    p?              zero or one occurrence of p
    p{m,n}          p repeated between m and n times
    p*?, p+?, p??   the shortest consistent strings that match
    (p1p2…)         strings matching the sequence p1, p2, …
    p1|p2           strings matching p1 or p2
  • The following represent classes of characters:
    \\d             digit 0–9
    \\D             nondigit
    \\s             space, newline, tab, or other whitespace character
    \\S             non-whitespace character
    \\w             word character (letter, digit, or _)
    \\W             nonword character
    [[:class:]]     characters in a named class
    [^[:class:]]    characters not in a named class
  • The following named classes can be used: alnum, alpha, ascii, blank, cntrl, digit, graph, lower, print, punct, space, upper, word, xdigit.
  • The following represent positions in strings:
    ^               the beginning of the string (or line)
    $               the end of the string (or line)
    \\b             word boundary
    \\B             anywhere except a word boundary
  • The following set options for all regular expression elements that follow them:
    (?i)            treat uppercase and lowercase as equivalent (ignore case)
    (?m)            make ^ and $ match start and end of lines (multiline mode)
    (?s)            allow . to match newline
    (?-c)           unset options
  • \\., \\[, etc. represent literal characters ., [, etc.
  • Analogs of named Wolfram Language patterns such as x:expr can be set up in regular expression strings using (regex).
  • Within a regular expression string, \\gn represents the substring matched by the nth parenthesized regular expression object (regex). The shorter \\n is often equivalent to \\gn.
  • For the purposes of functions such as StringReplace and StringCases, any $n appearing in the right-hand side of a rule RegularExpression["regex"]->rhs is taken to correspond to the substring matched by the nth parenthesized regular expression object in regex. $0 represents the whole matched string.
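A short sketch of $n and $0 in the right-hand side of a rule; the input strings are illustrative assumptions, not taken from the original page:

```wolfram
(* $1, $2, $3 refer to the three parenthesized groups, in order *)
reordered = StringReplace["2024-09-14", RegularExpression["(\\d+)-(\\d+)-(\\d+)"] -> "$3/$2/$1"]
(* "14/09/2024" *)

(* $0 stands for the whole matched substring *)
wrapped = StringReplace["x=1, y=2", RegularExpression["\\d"] -> "<$0>"]
(* "x=<1>, y=<2>" *)
```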

Examples


Basic Examples  (2)

Find words involving the characters a, b, c, d, e:
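A minimal sketch; the input string is an illustrative assumption, since the page's original example input was not preserved:

```wolfram
(* whole words built only from the letters a through e, delimited by word boundaries *)
words = StringCases["bead cab dig", RegularExpression["\\b[a-e]+\\b"]]
(* {"bead", "cab"} *)
```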

Equivalent form using string patterns:
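A sketch of the same search written as a StringExpression, using the same illustrative input (an assumption, not the original example):

```wolfram
(* WordBoundary plays the role of \\b; the alternatives list the letters a through e *)
words = StringCases["bead cab dig",
  WordBoundary ~~ ("a" | "b" | "c" | "d" | "e") .. ~~ WordBoundary]
(* {"bead", "cab"} *)
```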

Decide whether the string consists of words and whitespace:
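A minimal sketch with two illustrative inputs (assumptions, not the page's originals); StringMatchQ tests whether the whole string matches:

```wolfram
(* only word characters and whitespace: the whole string matches *)
ok = StringMatchQ["hello world", RegularExpression["[\\w\\s]+"]]
(* True *)

(* the comma is neither a word character nor whitespace *)
bad = StringMatchQ["hello, world", RegularExpression["[\\w\\s]+"]]
(* False *)
```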

Equivalent form using string patterns:
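A sketch of the string-pattern form, with the same illustrative inputs (assumptions, not the original example):

```wolfram
(* one or more characters, each a word character or a whitespace character *)
ok = StringMatchQ["hello world", (WordCharacter | WhitespaceCharacter) ..]
(* True *)
bad = StringMatchQ["hello, world", (WordCharacter | WhitespaceCharacter) ..]
(* False *)
```

One difference worth checking: WordCharacter is documented as a letter or digit, so unlike \\w it may not match _.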

Scope  (21)

Basic Constructs  (17)

Extract any character except newline:
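A minimal sketch; the input containing a newline is an illustrative assumption:

```wolfram
(* . matches every character except the newline *)
chars = StringCases["ab\ncd", RegularExpression["."]]
(* {"a", "b", "c", "d"} *)
```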

Either of the characters "a" or "b":
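A minimal sketch with an illustrative input string (an assumption, not the page's original):

```wolfram
(* a bracketed character class matches any one of its members *)
ab = StringCases["abcabd", RegularExpression["[ab]"]]
(* {"a", "b", "a", "b"} *)
```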

Any character between "a" and "e", including "a" and "e":
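A minimal sketch; the word "face" is an illustrative assumption:

```wolfram
(* [a-e] is an inclusive range, so "f" is excluded but "e" is kept *)
inRange = StringCases["face", RegularExpression["[a-e]"]]
(* {"a", "c", "e"} *)
```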

Any character except "a" and "1":
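A minimal sketch with an illustrative input (an assumption):

```wolfram
(* a leading ^ inside brackets negates the class *)
others = StringCases["a1b2", RegularExpression["[^a1]"]]
(* {"b", "2"} *)
```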

Any digit repeated one or more times:
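A minimal sketch, assuming the range form [0-9] and an illustrative input string:

```wolfram
(* + repeats the preceding class one or more times, greedily *)
runs = StringCases["a12b345c6", RegularExpression["[0-9]+"]]
(* {"12", "345", "6"} *)
```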

The character "a" repeated 2 or 3 times:
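A minimal sketch; the input is an illustrative assumption:

```wolfram
(* {2,3} allows two or three repetitions; a lone "a" does not qualify *)
reps = StringCases["a aa aaa", RegularExpression["a{2,3}"]]
(* {"aa", "aaa"} *)
```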

Any digit:
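A minimal sketch with an illustrative input (an assumption):

```wolfram
(* \\d matches a single digit, so "22" yields two matches *)
digits = StringCases["a1b22", RegularExpression["\\d"]]
(* {"1", "2", "2"} *)
```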

Nondigit characters:
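A minimal sketch using the same kind of illustrative input (an assumption):

```wolfram
(* \\D matches any single character that is not a digit *)
nondigits = StringCases["a1b22", RegularExpression["\\D"]]
(* {"a", "b"} *)
```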

Space, newline, tab, or other whitespace character:
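A minimal sketch; the input mixing a space, a tab, and a newline is an illustrative assumption:

```wolfram
(* \\s matches each whitespace character individually *)
ws = StringCases["a b\tc\nd", RegularExpression["\\s"]]
(* {" ", "\t", "\n"} *)
```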

Non-whitespace characters:
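A minimal sketch with an illustrative input (an assumption):

```wolfram
(* runs of non-whitespace characters, i.e. whitespace-separated tokens *)
tokens = StringCases["ab cd\tef", RegularExpression["\\S+"]]
(* {"ab", "cd", "ef"} *)
```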

Word characters:
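A minimal sketch; the input string is an illustrative assumption:

```wolfram
(* \\w covers letters, digits, and the underscore *)
wordRuns = StringCases["x_1, y-2!", RegularExpression["\\w+"]]
(* {"x_1", "y", "2"} *)
```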

Nonword characters:
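A minimal sketch with the same illustrative input (an assumption):

```wolfram
(* \\W matches punctuation and spaces, but not "_" *)
nonWordRuns = StringCases["x_1, y-2!", RegularExpression["\\W+"]]
(* {", ", "-", "!"} *)
```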

Find all uppercase letters:
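A minimal sketch using the named class upper; the input is an illustrative assumption:

```wolfram
(* [[:upper:]] is a POSIX-style named class for uppercase letters *)
caps = StringCases["Hello World", RegularExpression["[[:upper:]]"]]
(* {"H", "W"} *)
```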

Split a string at the beginning of each line:
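A sketch assuming an illustrative three-line input; (?m) is added explicitly so that ^ matches at every line start rather than only at the start of the string:

```wolfram
s = "one\ntwo\nthree";
(* the zero-width ^ splits just after each newline, so no characters are lost *)
lines = StringSplit[s, RegularExpression["(?m)^"]]
```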

Split a string at the end of each line:
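A sketch with an illustrative two-line input (an assumption); (?m) makes $ match just before each newline as well as at the end of the string:

```wolfram
s = "one\ntwo";
(* the zero-width $ splits before the newline, so the newline starts the next piece *)
pieces = StringSplit[s, RegularExpression["(?m)$"]]
```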

Insert a character at the boundary of each word:
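A minimal sketch; the input string and the marker "|" are illustrative assumptions:

```wolfram
(* \\b is zero-width, so the replacement inserts "|" without consuming characters *)
marked = StringReplace["one two", RegularExpression["\\b"] -> "|"]
(* "|one| |two|" *)
```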

Split a string at every position except a word boundary:
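A minimal sketch with an illustrative input (an assumption):

```wolfram
(* \\B holds only between "a" and "b" and between "c" and "d"; positions around the space are word boundaries *)
parts = StringSplit["ab cd", RegularExpression["\\B"]]
(* {"a", "b c", "d"} *)
```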

Compound Constructs  (4)

StringExpression can contain RegularExpression objects:
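A minimal sketch mixing a literal string with a RegularExpression inside ~~; the input is an illustrative assumption:

```wolfram
(* literal "b" followed by one or more digits *)
hits = StringCases["a1 b22 c333", "b" ~~ RegularExpression["\\d+"]]
(* {"b22"} *)
```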

Conditional patterns:
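One possible illustration, assuming "conditional" refers to the PCRE-style conditional group (?(n)yes|no) and that the underlying engine supports it; the input is also an illustrative assumption. A closing parenthesis is required only when group 1 matched an opening one:

```wolfram
(* (\\()? optionally captures "(", and (?(1)\\)) then demands ")" only if group 1 took part *)
nums = StringCases["(12) 34 (56", RegularExpression["(\\()?\\d+(?(1)\\))"]]
(* {"(12)", "34", "56"}: the unbalanced "(56" falls back to the bare number *)
```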

Use alternatives to match one or more line breaks:
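A minimal sketch; the input mixing "\n" and "\r\n" line endings is an illustrative assumption:

```wolfram
(* a run of one or more line breaks, each either CRLF or LF, acts as a single delimiter *)
chunks = StringSplit["a\nb\r\n\nc", RegularExpression["(\\r\\n|\\n)+"]]
(* {"a", "b", "c"} *)
```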

Non-greedy matches are done by appending a question mark "?" to the quantifiers:
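A minimal sketch contrasting the two forms on an illustrative input (an assumption):

```wolfram
(* +? stops at the first ">" it can, giving one match per tag *)
lazy = StringCases["<a><b>", RegularExpression["<.+?>"]]
(* {"<a>", "<b>"} *)

(* plain + runs to the last ">" *)
greedy = StringCases["<a><b>", RegularExpression["<.+>"]]
(* {"<a><b>"} *)
```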

Generalizations & Extensions  (1)

Here, $1 refers to the letter matched by (.):
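A minimal sketch; the input "abc" is an illustrative assumption:

```wolfram
(* each character is captured by (.) and written back twice via "$1$1" *)
doubled = StringReplace["abc", RegularExpression["(.)"] -> "$1$1"]
(* "aabbcc" *)
```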

Numbered subpatterns:
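A sketch using a back-reference to group 1 inside the pattern itself; the input is an illustrative assumption:

```wolfram
(* \\1 (or \\g1) must repeat exactly what (\\w) captured: a doubled character *)
pairs = StringCases["aa ab bb cd", RegularExpression["(\\w)\\1"]]
(* {"aa", "bb"} *)
pairsG = StringCases["aa ab bb cd", RegularExpression["(\\w)\\g1"]]
(* {"aa", "bb"} *)
```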

Properties & Relations  (3)

Use StringMatchQ to determine string pattern matches:
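A minimal sketch with two illustrative inputs (assumptions):

```wolfram
(* letters followed by digits, covering the entire string *)
yes = StringMatchQ["abc123", RegularExpression["[a-z]+\\d+"]]
(* True *)
no = StringMatchQ["123abc", RegularExpression["[a-z]+\\d+"]]
(* False *)
```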

Use StringCases to find matching substrings:
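A minimal sketch; the input is an illustrative assumption:

```wolfram
(* all substrings matching "cat" or "bat" *)
animals = StringCases["cat bat rat", RegularExpression["[cb]at"]]
(* {"cat", "bat"} *)
```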

Use StringSplit to split a string into substrings using a delimiter pattern:
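A minimal sketch with an illustrative input (an assumption):

```wolfram
(* each run of digits is treated as one delimiter *)
letters = StringSplit["a1b22c", RegularExpression["\\d+"]]
(* {"a", "b", "c"} *)
```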

Wolfram Research (2004), RegularExpression, Wolfram Language function, https://reference.wolfram.com/language/ref/RegularExpression.html.
