RegularExpression
RegularExpression["regex"]
represents the generalized regular expression specified by the string "regex".
Details
- RegularExpression can be used to represent classes of strings in functions like StringMatchQ, StringReplace, StringCases, and StringSplit.
- RegularExpression supports standard regular expression syntax of the kind used in typical string manipulation languages.
- The following basic elements can be used in regular expression strings:
    c                the literal character c
    .                any character except newline
    [c1c2…]          any of the characters ci
    [c1-c2]          any character in the range c1–c2
    [^c1c2…]         any character except the ci
    p*               p repeated zero or more times
    p+               p repeated one or more times
    p?               zero or one occurrence of p
    p{m,n}           p repeated between m and n times
    p*?, p+?, p??    the shortest consistent strings that match
    (p1p2…)          strings matching the sequence p1, p2, …
    p1|p2            strings matching p1 or p2
- The following represent classes of characters:
    \\d              digit 0–9
    \\D              nondigit
    \\s              space, newline, tab, or other whitespace character
    \\S              non-whitespace character
    \\w              word character (letter, digit, or _)
    \\W              nonword character
    [[:class:]]      characters in a named class
    [^[:class:]]     characters not in a named class
- The following named classes can be used: alnum, alpha, ascii, blank, cntrl, digit, graph, lower, print, punct, space, upper, word, xdigit.
- The following represent positions in strings:
    ^                the beginning of the string (or line)
    $                the end of the string (or line)
    \\b              word boundary
    \\B              anywhere except a word boundary
- The following set options for all regular expression elements that follow them:
    (?i)             treat uppercase and lowercase as equivalent (ignore case)
    (?m)             make ^ and $ match start and end of lines (multiline mode)
    (?s)             allow . to match newline
    (?-c)            unset options
- \\., \\[, etc. represent literal characters ., [, etc.
- Analogs of named Wolfram Language patterns such as x:expr can be set up in regular expression strings using (regex).
- Within a regular expression string, \\gn represents the substring matched by the nth parenthesized regular expression object (regex). The shorter \\n is often equivalent to \\gn.
- For the purpose of functions such as StringReplace and StringCases, any $n appearing in the right‐hand side of a rule RegularExpression["regex"]->rhs is taken to correspond to the substring matched by the nth parenthesized regular expression object in regex. $0 represents the whole matched string.
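A few of the conventions listed above can be sketched as follows. This is a minimal illustration, not an example from the reference page: the input strings and the variable names (repeated, swapped, wrapped, caseless, letters) are assumptions chosen for the sketch.

```wolfram
(* \\g1 refers back to the text captured by the first parenthesized group *)
repeated = StringCases["abab xyxy abcd", RegularExpression["(\\w\\w)\\g1"]]
(* {"abab", "xyxy"} *)

(* $1, $2, $3 on the right-hand side stand for the captured groups *)
swapped = StringReplace["2004-06-15",
  RegularExpression["(\\d+)-(\\d+)-(\\d+)"] -> "$3/$2/$1"]
(* "15/06/2004" *)

(* $0 stands for the whole matched string *)
wrapped = StringReplace["abc", RegularExpression["b"] -> "[$0]"]
(* "a[b]c" *)

(* (?i) makes the elements that follow it ignore case *)
caseless = StringCases["Cat cat CAT", RegularExpression["(?i)cat"]]
(* {"Cat", "cat", "CAT"} *)

(* [[:alpha:]] is one of the named character classes *)
letters = StringCases["ab12cd", RegularExpression["[[:alpha:]]+"]]
(* {"ab", "cd"} *)
```

Note that every backslash is doubled because the regular expression is written inside a Wolfram Language string.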
Examples
Basic Examples (2)
Scope (21)
Basic Constructs (17)
Extract any character except newline:
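One possible illustration, using a made-up two-line input string (the variable name chars is hypothetical):

```wolfram
(* "." matches each character, but skips the newline between "a" and "b" *)
chars = StringCases["a\nb", RegularExpression["."]]
(* {"a", "b"} *)
```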
Either of the characters "a" and "b":
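A minimal sketch with an assumed input string (the name eitherAB is hypothetical):

```wolfram
(* [ab] matches a single character that is either "a" or "b" *)
eitherAB = StringCases["abcabc", RegularExpression["[ab]"]]
(* {"a", "b", "a", "b"} *)
```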
Any character between "a" and "e", including "a" and "e":
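For instance, assuming the sample string below (the name inRange is hypothetical):

```wolfram
(* [a-e] matches any single character from "a" through "e", endpoints included *)
inRange = StringCases["abcdefg", RegularExpression["[a-e]"]]
(* {"a", "b", "c", "d", "e"} *)
```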
Any character except "a" and "1":
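A small sketch on an assumed input (the name notA1 is hypothetical):

```wolfram
(* [^a1] matches any single character other than "a" and "1" *)
notA1 = StringCases["a1b2", RegularExpression["[^a1]"]]
(* {"b", "2"} *)
```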
Any digit repeated one or more times:
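One way this might look, with an illustrative input string (the name digitRuns is hypothetical):

```wolfram
(* \\d+ matches maximal runs of one or more digits *)
digitRuns = StringCases["a1b22c333", RegularExpression["\\d+"]]
(* {"1", "22", "333"} *)
```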
The character "a" repeated 2 or 3 times:
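A sketch on an assumed input (the name runsOfA is hypothetical). The final run of four characters yields only one match of length three, because the leftover single "a" is too short to match:

```wolfram
(* a{2,3} matches "a" repeated two or three times, preferring three *)
runsOfA = StringCases["a aa aaa aaaa", RegularExpression["a{2,3}"]]
(* {"aa", "aaa", "aaa"} *)
```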
Space, newline, tab, or other whitespace character:
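For illustration, the sketch below replaces each whitespace character in an assumed input with an underscore (the name underscored is hypothetical):

```wolfram
(* \\s matches the space, the tab, and the newline in turn *)
underscored = StringReplace["a b\tc\nd", RegularExpression["\\s"] -> "_"]
(* "a_b_c_d" *)
```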
Split a string at the beginning of a new line:
Split a string at the end of a new line:
Insert a character at the boundary of each word:
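A minimal sketch, assuming a two-word input and "|" as the inserted character (the name marked is hypothetical):

```wolfram
(* \\b matches the zero-width position at the start or end of a word,
   so the rule inserts "|" there without consuming any characters *)
marked = StringReplace["hello world", RegularExpression["\\b"] -> "|"]
```

Because the match is zero-width, deleting the inserted "|" characters should give back the original string.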
Split a string at every character except at the boundary of a word:
Compound Constructs (4)
StringExpression can contain RegularExpression objects:
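As a sketch, the string expression below combines the built-in pattern LetterCharacter with a regular expression; the input string and the name mixed are assumptions for this example:

```wolfram
(* A letter followed by one or more digits, mixing both pattern styles *)
mixed = StringCases["a1 b22 c333", LetterCharacter ~~ RegularExpression["\\d+"]]
(* {"a1", "b22", "c333"} *)
```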
Use alternatives to match one or more line breaks:
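One possible sketch, assuming an input that mixes "\n" and "\r\n" line endings (the name paragraphs is hypothetical):

```wolfram
(* (\\n|\\r\\n)+ matches one or more line breaks of either kind *)
paragraphs = StringSplit["one\n\ntwo\r\nthree", RegularExpression["(\\n|\\r\\n)+"]]
(* {"one", "two", "three"} *)
```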
Non-greedy matches are done by appending a question mark "?" to the quantifiers:
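The difference can be sketched on an assumed input with two bracketed tags (the names greedy and lazy are hypothetical):

```wolfram
(* Greedy: .+ extends as far as possible, up to the last ">" *)
greedy = StringCases["<a><b>", RegularExpression["<.+>"]]
(* {"<a><b>"} *)

(* Non-greedy: .+? stops at the first ">" that completes a match *)
lazy = StringCases["<a><b>", RegularExpression["<.+?>"]]
(* {"<a>", "<b>"} *)
```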
Properties & Relations (3)
Use StringMatchQ to determine string pattern matches:
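A brief sketch with made-up inputs (the name matches is hypothetical). StringMatchQ tests whether the whole string matches, so the second string, which begins with digits, fails:

```wolfram
(* Lowercase letters followed by digits must cover the entire string *)
matches = StringMatchQ[{"abc123", "123abc"}, RegularExpression["[a-z]+\\d+"]]
(* {True, False} *)
```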
Use StringCases to find matching substrings:
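For example, with an assumed input string (the name numbers is hypothetical):

```wolfram
(* Collect every run of digits found in the string *)
numbers = StringCases["x=1, y=22", RegularExpression["\\d+"]]
(* {"1", "22"} *)
```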
Use StringSplit to split a string into substrings using a delimiter pattern:
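A minimal sketch, assuming a delimiter of a comma or semicolon followed by optional whitespace (the name fields is hypothetical):

```wolfram
(* [,;]\\s* matches the delimiter together with any whitespace after it *)
fields = StringSplit["a, b;c", RegularExpression["[,;]\\s*"]]
(* {"a", "b", "c"} *)
```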
Text
Wolfram Research (2004), RegularExpression, Wolfram Language function, https://reference.wolfram.com/language/ref/RegularExpression.html.