StringCases

StringCases["string",patt]

gives a list of the substrings in "string" that match the string expression patt.

StringCases["string",lhs->rhs]

gives a list of the values of rhs corresponding to the substrings that match the string expression lhs.

StringCases["string",p,n]

includes only the first n substrings that match.

StringCases["string",{p1,p2,…}]

gives substrings that match any of the p_i.

StringCases[{s1,s2,…},p]

gives the list of results for each of the s_i.

StringCases[patt]

represents an operator form of StringCases that can be applied to an expression.

Details and Options

  • String expressions can contain any of the objects specified in the notes for StringExpression.
  • With the default option setting Overlaps->False, StringCases includes only substrings that do not overlap. With Overlaps->True, it includes substrings that overlap.
  • With Overlaps->All, multiple substrings that match the same string expression are all included. With Overlaps->True, only the first such matching substring at a given position is included.
  • Setting the option IgnoreCase->True makes StringCases treat lowercase and uppercase letters as equivalent.
  • StringCases["string",RegularExpression["regex"]] gives substrings matching the specified regular expression.
  • StringCases[s,lhs:>rhs] evaluates rhs only when the pattern is found.
  • StringCases[patt][expr] is equivalent to StringCases[expr, patt].
  • StringCases[BioSequence["type","seq"],patt,…] finds cases of patt in the string "seq", yielding a list of biomolecular sequences. In this case, degenerate letters in patt are interpreted as wildcard patterns based on the type of biomolecular sequence. Use Verbatim["patt"] to match degenerate letters literally.
  • The documentation for BioSequence lists the degenerate letters supported by each type of biomolecular sequence.
  • If the biomolecular sequence operated upon by StringCases is circular, wraparound matches are possible.

Examples

Basic Examples  (3)

Find the substrings matching a pattern:
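A minimal sketch of this case, assuming a short input string chosen for illustration (it is not taken from the original page). The blank _ stands for any single character:

```wolfram
(* Each match is "a", then any one character, then "c" *)
StringCases["abcadcacb", "a" ~~ _ ~~ "c"]
(* {"abc", "adc"} *)
```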

Return only the named wildcard character in each substring:
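One way this might look, again with an illustrative input string. Naming the wildcard x and using a rule returns just the character bound to x rather than the whole match:

```wolfram
(* x_ names the middle character; the rule returns only that character *)
StringCases["abcadcacb", "a" ~~ x_ ~~ "c" -> x]
(* {"b", "d"} *)
```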

Use the operator form of StringCases:
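A short sketch of the operator form with an illustrative input string. StringCases[patt] is applied to the string afterwards, which is equivalent to StringCases[expr, patt]:

```wolfram
(* Build the operator first, then apply it to a string *)
StringCases["a" ~~ _ ~~ "c"]["abcadcacb"]
(* {"abc", "adc"} *)
```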

Scope  (11)

Use string patterns:
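For instance, with a sample sentence assumed here for illustration, __ matches any sequence of one or more characters and takes the longest match available:

```wolfram
(* "a", followed by one or more characters, followed by "e" *)
StringCases["the cat in the hat", "a" ~~ __ ~~ "e"]
(* {"at in the"} *)
```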

Use a regular expression:
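The same kind of search can be sketched with RegularExpression; the sample sentence and the regex are illustrative choices:

```wolfram
(* ".*" is greedy, so the match runs from the first "a" to the last "e" *)
StringCases["the cat in the hat", RegularExpression["a.*e"]]
(* {"at in the"} *)
```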

Use pattern matching for dates:
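A hedged sketch using DatePattern, assuming a month/day/year layout with two-digit months and days; the sample text is made up for illustration, and no output is shown because it has not been verified here:

```wolfram
(* DatePattern describes a date by its elements, in order *)
StringCases["Shipped 11/30/2005, received 12/05/2005",
 DatePattern[{"Month", "Day", "Year"}]]
```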

Mixed regular expressions and string patterns:
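As a sketch, a RegularExpression can be joined with ordinary string pattern objects using ~~; the input string is an illustrative assumption:

```wolfram
(* One lowercase letter (regex) followed by one or more digits (string pattern) *)
StringCases["a1b22c333", RegularExpression["[a-z]"] ~~ DigitCharacter ..]
(* {"a1", "b22", "c333"} *)
```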

Rules to extract values corresponding to matching substrings:
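A small illustration with an assumed sample sentence: the named pattern x__ captures the text between "a" and "e", and the rule returns only that captured part:

```wolfram
(* The whole match is "at in the"; x is the part between "a" and "e" *)
StringCases["the cat in the hat", "a" ~~ x__ ~~ "e" -> x]
(* {"t in th"} *)
```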

Include only the first two substrings that match:
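A sketch of the three-argument form, assuming a sample sentence; the third argument limits how many matches are returned:

```wolfram
(* Runs of letters, stopping after the first 2 matches *)
StringCases["the cat in the hat", LetterCharacter .., 2]
(* {"the", "cat"} *)
```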

Find occurrences of either substring:
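With a list of patterns, a substring is returned if it matches any of them. A sketch with illustrative inputs:

```wolfram
(* Matches are reported in the order they occur in the string *)
StringCases["the cat in the hat", {"the", "hat"}]
(* {"the", "the", "hat"} *)
```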

StringCases automatically threads over lists of strings:
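For example, with a list of words chosen for illustration, the result contains one sublist per input string:

```wolfram
(* "a" followed by any single character, searched in each string separately *)
StringCases[{"ability", "argument", "listable"}, "a" ~~ _]
(* {{"ab"}, {"ar"}, {"ab"}} *)
```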

Find codon-length subsequences in a DNA sequence:

Use a wildcard in the pattern found in a given biomolecular sequence:

The "Y" is a degenerate letter and is not a wildcard except in biomolecular sequences:

Additional wraparound matches may be found in circular biomolecular sequences:

Match only literal degenerate letter occurrences using Verbatim:
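The wildcard and literal readings of a degenerate letter can be contrasted in a rough sketch. The DNA sequences below are made up for illustration, and outputs are omitted because they have not been verified here; the behavior described in the comments follows the notes in Details and Options:

```wolfram
(* In a DNA sequence, "Y" in the pattern acts as a wildcard for a pyrimidine (C or T) *)
StringCases[BioSequence["DNA", "ACCTGCTT"], "CY"]

(* In an ordinary string, "Y" is just the letter Y *)
StringCases["ACCTGCTT", "CY"]

(* Verbatim matches only a literal "Y" in the sequence *)
StringCases[BioSequence["DNA", "ACYTGCTT"], Verbatim["CY"]]
```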

Options  (3)

IgnoreCase  (1)

Find all substrings "cat", including use of uppercase letters:
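A minimal sketch with an assumed input string; the matched substrings keep their original casing:

```wolfram
(* IgnoreCase -> True treats uppercase and lowercase letters as equivalent *)
StringCases["Cat cat CAT", "cat", IgnoreCase -> True]
(* {"Cat", "cat", "CAT"} *)
```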

Overlaps  (2)

Find all runs of two or more letters starting with the letter "a":

Allow overlaps between the substrings:

Allow multiple substrings to start at the same character as well:
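The three Overlaps settings can be compared on a tiny input. The string "aabb" is an illustrative choice, and the result for Overlaps -> All is left unstated because its ordering has not been verified here:

```wolfram
(* Default Overlaps -> False: one non-overlapping, longest match *)
StringCases["aabb", "a" ~~ LetterCharacter ..]
(* {"aabb"} *)

(* Overlaps -> True: at most one match per starting position *)
StringCases["aabb", "a" ~~ LetterCharacter .., Overlaps -> True]
(* {"aabb", "abb"} *)

(* Overlaps -> All: also includes shorter matches from the same start, such as "aa" and "ab" *)
StringCases["aabb", "a" ~~ LetterCharacter .., Overlaps -> All]
```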

Find subsequences in a circular DNA sequence:

Allow overlaps between the subsequences:

Applications  (3)

Extract phone numbers from a text:
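A sketch under the assumption that phone numbers appear in a simple ddd-dddd form; the text and numbers are hypothetical:

```wolfram
(* Exactly three digits, a hyphen, then exactly four digits *)
StringCases["Call 555-1234 or 555-9876 today.",
 Repeated[DigitCharacter, {3}] ~~ "-" ~~ Repeated[DigitCharacter, {4}]]
(* {"555-1234", "555-9876"} *)
```

Real-world text usually needs a more permissive pattern (area codes, spaces, parentheses).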

Find the sequence of section numbers in the US Constitution:

Find the numbers of amendments to the Constitution:

Primes whose digits are all consecutive:

Properties & Relations  (2)

StringCount gives the number of matching substrings:

The length of matching substrings:
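The relation can be checked on a small example, assuming an illustrative input string: StringCount agrees with the length of the list returned by StringCases.

```wolfram
StringCount["abcadcacb", "a" ~~ _ ~~ "c"]
(* 2 *)

Length[StringCases["abcadcacb", "a" ~~ _ ~~ "c"]]
(* 2 *)
```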

Use StringPosition to get the position of matching substrings:

Check:
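A sketch of this relation with an illustrative input string: StringPosition returns start and end positions, and StringTake at those positions reproduces the StringCases result.

```wolfram
pos = StringPosition["abcadcacb", "a" ~~ _ ~~ "c"]
(* {{1, 3}, {4, 6}} *)

(* Taking the substrings at those positions gives the same list as StringCases *)
StringTake["abcadcacb", pos]
(* {"abc", "adc"} *)
```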

Possible Issues  (1)

Use :> rather than -> if the right-hand side of a rule contains string operations:
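A minimal sketch of the delayed rule, with ToUpperCase chosen here as an example of a string operation on the right-hand side:

```wolfram
(* With :>, ToUpperCase[x] is evaluated only after x is bound to a matched character *)
StringCases["abcadcacb", "a" ~~ x_ ~~ "c" :> ToUpperCase[x]]
(* {"B", "D"} *)
```

With -> in place of :>, ToUpperCase[x] would be evaluated once, before any match, while x is still a bare symbol.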

Using -> in such cases leads to immediate evaluation and possible error messages:

Neat Examples  (1)

Find the sequence of words in the US Constitution:

Total number of words:

Number of distinct words:

Logarithmic frequency distribution of words:

Ten most common words:
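The steps above could be sketched as follows. This assumes the text is available as ExampleData[{"Text", "USConstitution"}] and that a word is a run of word characters, lowercased for counting; the variable names are illustrative:

```wolfram
text = ExampleData[{"Text", "USConstitution"}];

(* Sequence of words, lowercased so that "The" and "the" count as the same word *)
words = ToLowerCase[StringCases[text, WordCharacter ..]];

(* Total number of words and number of distinct words *)
Length[words]
Length[DeleteDuplicates[words]]

(* {word, count} pairs, most frequent first *)
tally = Reverse[SortBy[Tally[words], Last]];

(* Frequencies by rank on logarithmic axes *)
ListLogLogPlot[tally[[All, 2]]]

(* Ten most common words with their counts *)
Take[tally, 10]
```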

Wolfram Research (2004), StringCases, Wolfram Language function, https://reference.wolfram.com/language/ref/StringCases.html (updated 2020).