Text Manipulation

The Wolfram Language provides flexible capabilities for processing textual data. It can operate at the level of strings and characters, or at the level of words and sentences. It can also operate semantically, using its built-in natural language understanding.


Text Acquisition

Import import text from files or the web

"Text", "PDF", "TeX", "HTML", ... text-related import formats; elements such as "Plaintext" and "Data" pick out plaintext, table data, etc.

NotebookImport import text from a notebook

FindList search files for records containing particular strings

TextString convert arbitrary expressions to text

TextRecognize extract text from images using OCR
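As a minimal sketch of text acquisition, the snippet uses ImportString, the in-memory counterpart of Import, so that no file or network access is needed; the sample text is made up. With a real file, Import["path", "Text"] plays the same role (the path here is only a placeholder).

```wolfram
(* parse a string as if it were the contents of a plain-text file *)
text = ImportString["first line\nsecond line", "Text"];

(* split the imported text into its lines *)
lines = StringSplit[text, "\n"]
(* → {"first line", "second line"} *)

(* TextString renders an arbitrary expression as text *)
TextString[{1, 2, 3}]
(* → "{1, 2, 3}" *)
```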

Text Normalization »

ToLowerCase  ▪  ToUpperCase  ▪  RemoveDiacritics  ▪  CharacterEncoding  ▪  ...

DeleteStopwords delete standard stopwords ("the", "and", etc.) from a string

StringSplit split a string at whitespace (the default), newlines, or other delimiters

StringReplace  ▪  StringDelete  ▪  StringTrim  ▪  ...
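A short normalization pipeline can chain several of these functions. This sketch assumes a made-up input string and one reasonable order of steps (trim, strip accents, lowercase, then split on commas); it is an illustration, not a prescribed recipe.

```wolfram
raw = "  Café au Lait,  Crème Brûlée  ";

(* trim outer whitespace, remove accents, and fold case *)
clean = ToLowerCase[RemoveDiacritics[StringTrim[raw]]]
(* → "cafe au lait,  creme brulee" *)

(* split at commas, then trim each piece *)
items = StringTrim /@ StringSplit[clean, ","]
(* → {"cafe au lait", "creme brulee"} *)

(* rewrite or remove substrings *)
StringReplace["cafe au lait", " au " -> " with "]  (* → "cafe with lait" *)
StringDelete["cafe au lait", " au lait"]           (* → "cafe" *)
```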

Structural Text Manipulation

TextSentences extract a list of sentences

TextWords extract a list of words

SequenceAlignment align two strings, picking out matching and differing segments
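The following sketch, on a made-up two-sentence string, contrasts sentence-level and word-level splitting and aligns two strings that differ in one character. Exact tokenization details (for example, how punctuation is handled) are best checked against the reference pages, so only the shape of the results is indicated.

```wolfram
text = "The cat sat. The dog ran!";

(* a list of two sentences *)
TextSentences[text]

(* a list of words; "cat" is among them *)
TextWords[text]

(* shared runs appear as strings, differing runs as {s1, s2} pairs;
   here the result starts with "abc" and ends with "def" *)
SequenceAlignment["abcXdef", "abcYdef"]
```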

Searching & Pattern Matching »

StringExpression (~~) general string pattern combining literal strings and pattern objects

StringMatchQ  ▪  StringCases  ▪  StringCount  ▪  ...
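String patterns are built with StringExpression (the infix operator ~~) from pieces such as LetterCharacter, DigitCharacter, and Repeated (..). A few illustrative queries on made-up strings:

```wolfram
(* one or more letters followed by one or more digits *)
pat = LetterCharacter .. ~~ DigitCharacter ..;

StringMatchQ["abc123", pat]   (* → True *)
StringMatchQ["123abc", pat]   (* → False *)

(* extract every run of digits *)
StringCases["a1b22c333", DigitCharacter ..]
(* → {"1", "22", "333"} *)

(* capture the middle character with a named pattern and a delayed rule *)
StringCases["abc adc", "a" ~~ x_ ~~ "c" :> x]
(* → {"b", "d"} *)

(* count non-overlapping occurrences *)
StringCount["banana", "an"]   (* → 2 *)
```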

Text Analysis »

WordCounts count occurrences of words and n-grams

LetterCounts  ▪  CharacterCounts  ▪  WordCount

Classify classify strings based on training data or built-in classifiers
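The counting functions return associations that can be queried by key. A minimal example on made-up text:

```wolfram
text = "the cat and the hat";

WordCount[text]                  (* total number of words → 5 *)
WordCounts[text]["the"]          (* occurrences of "the" → 2 *)
LetterCounts["banana"]["a"]      (* occurrences of the letter "a" → 3 *)
CharacterCounts["banana"]["n"]   (* occurrences of the character "n" → 2 *)
```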

Natural Language Processing

LanguageIdentify determine the language of a text

DictionaryLookup look up words in English and other dictionaries

WordData find semantic, grammatical, morphological, etc. properties of words
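As a sketch of language identification, LanguageIdentify returns a language entity rather than a plain string. This assumes the kernel can load the language-identification data it needs, which may involve a download on first use; the sample sentence is made up.

```wolfram
(* identify the language of a clearly French sentence *)
lang = LanguageIdentify["Bonjour, je m'appelle Marie et j'habite à Paris."]
(* expected to be Entity["Language", "French"] *)
```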

Natural Language Understanding »

Interpreter attempt to interpret strings of a wide variety of types

SemanticInterpretation  ▪  SemanticImportString  ▪  AmbiguityFunction  ▪  ...
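An Interpreter object can be created once and then applied to many strings. In this sketch the input strings are made up; a string that cannot be interpreted comes back as a Failure object rather than raising an error.

```wolfram
(* build an interpreter for integers and apply it *)
toInt = Interpreter["Integer"];
toInt["42"]             (* → 42 *)

(* non-numeric input yields a Failure object *)
FailureQ[toInt["abc"]]  (* → True *)
```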

Text Generation »

StringTemplate  ▪  StringRiffle  ▪  TextString  ▪  Pluralize  ▪  ...
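A minimal sketch of template-based text generation; the template text and the slot name name are made up for illustration.

```wolfram
(* a template with a named slot, filled from an association *)
greet = StringTemplate["Hello, `name`!"];
greet[<|"name" -> "World"|>]                  (* → "Hello, World!" *)

(* join a list of strings with a separator *)
StringRiffle[{"red", "green", "blue"}, ", "]  (* → "red, green, blue" *)

(* plural form of an English noun *)
Pluralize["cat"]                              (* → "cats" *)
```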