The Wolfram Language provides flexible tools for processing large volumes of textual data. Most often, data represented as a string is converted to lists or other expressions, which can then be manipulated with the language's symbolic constructs.
Import — import data from files or the web
FindList — search files for records containing particular strings
StringSplit — split a string into words, sentences, etc.
StringCount — count occurrences of words etc.
StringCases — find instances of a string pattern
StringExpression — match symbolic string patterns
Sort — sort into alphabetical order
Counts — give counts of how many times strings occur
Classify — classify strings based on training data or built-in classifiers
Nearest — find the closest-matching string from a list
FindClusters — find clusters in string data
EditDistance — edit (Levenshtein) distance between strings
SequenceAlignment — find matching sequences in strings
Hash — find a hash code using a variety of schemes
DictionaryLookup — look up words in English and other dictionaries
WordData — find semantic, grammatical, morphological, etc. properties of words
Interpreter — attempt to interpret strings in a wide variety of types
TextRecognize — do OCR on text in an image
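As a minimal sketch of the string-to-list workflow described at the top of this page, the following combines several of the functions listed above. The sample text, the variable names, and the candidate word list are hypothetical and chosen only for illustration.

```wolfram
(* Hypothetical sample text, used only for illustration *)
text = "the quick brown fox jumps over the lazy dog and the cat";

(* Convert the string to a list of words; StringSplit splits on whitespace by default *)
words = StringSplit[text];

(* Tally how often each word occurs; Counts returns an Association *)
counts = Counts[words];
counts["the"]                 (* 3 *)

(* Sort the distinct words into alphabetical order *)
Sort[Keys[counts]]

(* Count occurrences of a substring directly, without splitting first *)
StringCount[text, "the"]      (* 3 *)

(* Extract the word following each "the " using a symbolic string pattern *)
StringCases[text, "the " ~~ (w : LetterCharacter ..) :> w]
(* {"quick", "lazy", "cat"} *)

(* Edit (Levenshtein) distance between two strings *)
EditDistance["kitten", "sitting"]   (* 3 *)

(* Closest match to a misspelled word from a hypothetical candidate list *)
Nearest[{"apple", "maple", "orange"}, "appel"]
```

Once the text is a list of words, general list and association functions such as Counts, Sort, and Keys apply directly, while StringCount and StringCases operate on the original string when no intermediate list is needed.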