This is documentation for Mathematica 8, which was
based on an earlier version of the Wolfram Language.
Processing Textual Data
Mathematica provides flexible capabilities for processing large volumes of textual data. Most often, data represented as a string is converted to lists or other constructs, which can then be manipulated using Mathematica's symbolic language constructs.
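As a minimal sketch of this string-to-list workflow, the example below splits a string into words and tallies them. The sample sentence and the variable names (text, words, counts, ranked) are made up for illustration; in practice the string would typically come from a file or the web.

```mathematica
(* Sample text, invented for this example *)
text = "the cat sat on the mat and the dog sat too";

(* Split on whitespace to obtain a list of words *)
words = StringSplit[text];
Length[words]
(* 11 *)

(* Count occurrences of a substring directly in the string *)
StringCount[text, "the"]
(* 3 *)

(* Tally identical words, then put the most frequent first *)
counts = Tally[words];
ranked = SortBy[counts, -Last[#] &];
First[ranked]
(* {"the", 3} *)
```

Once the text is a list, ordinary list functions such as Sort, Select, or Map apply to it without any string-specific handling.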
Import import data from files or the web
"Text", "PDF", "TeX", "HTML" pick out plain text, table data, etc.
FindList search files for records containing particular strings
StringSplit split a string into words, sentences, etc.
StringCount count occurrences of words, etc.
StringCases find instances of a string pattern
StringExpression match symbolic string patterns
Sort sort into alphabetical order
Tally tally numbers of identical strings
Nearest find the closest-matching string from a list
FindClusters find clusters in string data
EditDistance find the edit or Levenshtein distance between strings
SequenceAlignment find matching sequences in strings
Hash find a hash code using a variety of schemes
DictionaryLookup look up words in English and other dictionaries
WordData find semantic, grammatical, morphological, and other properties of words
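A few of the functions listed above can be combined as in the following sketch. The input strings and the variable names (digits, dist, closest) are illustrative only; the example assumes that Nearest applied to a list of strings uses edit distance as its default measure.

```mathematica
(* Extract every run of digits using a symbolic string pattern *)
digits = StringCases["order 66, aisle 7, bin 1024", DigitCharacter ..]
(* {"66", "7", "1024"} *)

(* Levenshtein distance between two strings *)
dist = EditDistance["kitten", "sitting"]
(* 3 *)

(* Closest match to a misspelled word from a list of candidates *)
closest = Nearest[{"apple", "maple", "orange"}, "appel"]
(* {"apple"} *)
```

Nearest returns a list because several candidates can be equally close; First picks a single result when only one is needed.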