Mathematica has uniquely flexible capabilities for processing large volumes of textual data. Most often, data represented as a string is converted to lists or other expressions, which can then be manipulated using Mathematica's powerful symbolic language.
Import — import data from files or the web
FindList — search files for records containing particular strings
StringSplit — split a string into words, sentences, etc.
StringCount — count occurrences of words, etc.
StringCases — find instances of a string pattern
StringExpression — match symbolic string patterns
Sort — sort into alphabetical order
Tally — tally numbers of identical strings
Nearest — find the closest-matching string from a list
FindClusters — find clusters in string data
EditDistance — edit or Levenshtein distance
SequenceAlignment — find matching sequences in strings
Hash — find a hash code using a variety of schemes
DictionaryLookup — look up words in English and other dictionaries
WordData — find semantic, grammatical, morphological, etc. properties of words
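
The functions listed above are typically combined: a string is split into a list, and the list is then tallied, sorted, or compared. The following is a minimal sketch of that workflow, not a definitive recipe. The sample sentence, the variable names text and words, and the word lists are hypothetical and chosen only for illustration; in practice the text might instead come from Import.

```mathematica
(* Hypothetical sample text; real data might be loaded with Import instead *)
text = "the quick brown fox jumps over the lazy dog the end";

(* Split the string into a list of words (whitespace is the default delimiter) *)
words = StringSplit[text];

(* Count occurrences of a substring in the original string *)
StringCount[text, "the"]   (* 3 *)

(* Find every word beginning with "t" using a symbolic string pattern *)
StringCases[text, WordBoundary ~~ "t" ~~ (LetterCharacter ..) ~~ WordBoundary]

(* Sort the words into alphabetical order *)
Sort[words]

(* Tally identical words, then order the tallies by descending count *)
SortBy[Tally[words], -Last[#] &]

(* Edit (Levenshtein) distance between two strings *)
EditDistance["kitten", "sitting"]   (* 3 *)

(* Closest-matching string from a hypothetical candidate list *)
Nearest[{"apple", "maple", "orange"}, "appel"]
```

Converting to a list early, as in the StringSplit step, lets general list functions such as Sort and Tally do most of the work, while StringCount and StringCases operate directly on the original string.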