Mathematica has uniquely flexible capabilities for processing large volumes of textual data. Most often, data represented as a string is converted to lists or other structures, which can then be manipulated using Mathematica's powerful symbolic language constructs.
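As a minimal sketch of this string-to-list workflow, the example below splits a sentence into words and then applies ordinary list functions to the result. The sample text and the variable names `text` and `words` are illustrative assumptions, not part of the original guide.

```mathematica
(* Sample text; purely illustrative *)
text = "the quick brown fox jumps over the lazy dog the end";

(* Convert the string into a list of words; with no second argument,
   StringSplit splits on whitespace *)
words = StringSplit[text];

(* Once the data is a list, general list operations apply *)
Length[words]                          (* number of words *)
Tally[words]                           (* {word, count} pairs *)
Select[words, StringLength[#] > 4 &]   (* words longer than four characters *)
```

The same pattern (string in, list out, then symbolic manipulation) carries over to the other operations listed on this page.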
import data from files or the web
"HTML": pick out plain text, table data, etc.
search files for records containing particular strings
split a string into words, sentences, etc.
count occurrences of words, etc.
find instances of a string pattern
match symbolic string patterns
sort into alphabetical order
find the closest-matching string from a list
find clusters in string data
compute the edit (Levenshtein) distance between strings
find a hash code using a variety of schemes
look up words in an English dictionary
find semantic, grammatical, morphological, and other properties of words
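Several of the operations above can be sketched with built-in functions, as in the example below. The sample strings are made up for illustration, and which functions map to which entries is an assumption on our part. The dictionary and word-property calls may need to load additional data the first time they are used.

```mathematica
(* Hypothetical sample data *)
text = "Alice met Bob. Bob met Carol. Carol met Alice again.";

(* Split into sentences at each period, with or without a trailing space *)
sentences = StringSplit[text, ". " | "."];

(* Find instances of a string pattern: runs of letters, i.e. words *)
words = StringCases[text, LetterCharacter ..];

(* Count occurrences of a substring, and of each distinct word *)
StringCount[text, "met"]
Tally[words]

(* Test a string against a symbolic string pattern *)
StringMatchQ["Carol", "C" ~~ __]

(* Sort the distinct words into alphabetical order *)
Sort[DeleteDuplicates[words]]

(* Edit (Levenshtein) distance between two strings *)
EditDistance["kitten", "sitting"]

(* Closest-matching string from a list *)
Nearest[{"cat", "dog", "bird"}, "cot"]

(* Group similar strings into clusters *)
FindClusters[{"aaa", "aab", "abb", "xyz", "xyy", "xxz"}]

(* Hash codes under two of the available schemes *)
Hash[text, "MD5"]
Hash[text, "SHA256"]

(* Dictionary lookup by pattern, and a property of a word *)
DictionaryLookup["cat" ~~ ___]
WordData["cat", "PartsOfSpeech"]
```

Each call returns an ordinary expression (a list, number, or Boolean), so the results can be passed straight to further list or pattern-based processing.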