Text Manipulation

The Wolfram Language has uniquely flexible capabilities for processing textual data. It can operate at the level of strings and characters or at the level of words and sentences. It can also operate semantically, through its extensive built-in natural language understanding capabilities as well as its ability to use LLM functionality, including through the Wolfram Prompt Repository.

Text Acquisition

Import import text from files or the web

"Text", "PDF", "TeX", "HTML" pick out plaintext, table data, etc.

NotebookImport import text from a notebook

FindList search files for records containing particular strings

TextString convert arbitrary expressions to text

TextRecognize extract text from images using OCR
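
As a minimal sketch of the acquisition functions above, the following writes a small sample file and reads it back. The file name and its contents are made up for illustration.

```wolfram
(* Write a small sample file so the example is self-contained;
   the file name "sample-text.txt" is hypothetical *)
file = FileNameJoin[{$TemporaryDirectory, "sample-text.txt"}];
Export[file, "first line about cats\nsecond line about dogs\nthird line about cats", "Text"];

(* Import the whole file as a single string *)
text = Import[file, "Text"]

(* Return the lines (records) of the file that contain "cats" *)
FindList[file, "cats"]

(* Convert an arbitrary expression to a string *)
TextString[{1, 2, 3}]
```

Import also accepts a URL in place of a file name, which requires network access.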

Text Normalization

ToLowerCase  ▪  ToUpperCase  ▪  RemoveDiacritics  ▪  CharacterEncoding  ▪  ...

DeleteStopwords delete standard stopwords ("the", "and", etc.) from a string

StringSplit split a string at newlines or other delimiters

StringReplace  ▪  StringDelete  ▪  StringTrim  ▪  ...
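
A short normalization pipeline might look like the following sketch. The sample string is invented for illustration.

```wolfram
raw = "  The Café and the Résumé  ";

(* Strip leading and trailing whitespace *)
trimmed = StringTrim[raw]            (* "The Café and the Résumé" *)

(* Fold case, then remove accents *)
lower = ToLowerCase[trimmed]         (* "the café and the résumé" *)
plain = RemoveDiacritics[lower]      (* "the cafe and the resume" *)

(* Remove standard stopwords such as "the" and "and" *)
DeleteStopwords[plain]

(* Split at whitespace by default, or at an explicit delimiter such as a newline *)
StringSplit[plain]
StringSplit["line one\nline two", "\n"]   (* {"line one", "line two"} *)

(* Replace or delete substrings *)
StringReplace[plain, "and" -> "&"]   (* "the cafe & the resume" *)
StringDelete[plain, "the "]
```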

Structural Text Manipulation

TextCases extract symbolically specified elements

TextSentences extract a list of sentences

TextWords extract a list of words

SequenceAlignment find matching sequences in text
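
The structural functions above can be tried on a short sample text, as in this sketch. TextCases with a grammatical type such as "Noun" may need to download language resources on first use.

```wolfram
text = "The cat sat on the mat. The dog barked loudly.";

(* Split into sentences *)
TextSentences[text]   (* {"The cat sat on the mat.", "The dog barked loudly."} *)

(* Split into words, dropping punctuation *)
TextWords[text]

(* Extract elements of a symbolically specified type *)
TextCases[text, "Noun"]

(* Align two strings, showing shared and differing segments *)
SequenceAlignment["kitten", "sitting"]
```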

Searching & Pattern Matching

StringExpression general string pattern

StringMatchQ  ▪  StringCases  ▪  StringCount  ▪  ...
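
String patterns are built with the ~~ operator (StringExpression) and used by the matching functions above. A few illustrative cases on made-up strings:

```wolfram
(* Does the whole string match "a" followed by anything? *)
StringMatchQ["apple", "a" ~~ ___]              (* True *)

(* Extract every run of digits *)
StringCases["a1b22c333", DigitCharacter ..]    (* {"1", "22", "333"} *)

(* Count non-overlapping occurrences of a substring *)
StringCount["banana", "an"]                    (* 2 *)

(* Name parts of a pattern and use them on the right-hand side of a rule *)
StringCases["x=10, y=20",
 (name : LetterCharacter) ~~ "=" ~~ (val : (DigitCharacter ..)) :> (name -> val)]
(* {"x" -> "10", "y" -> "20"} *)
```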

LLM-Based Text Manipulation

LLMResourceFunction apply operations from the Wolfram Prompt Repository

LLMFunction apply operations specified by natural language descriptions

LLMExampleFunction apply operations based on examples

LLMSynthesize  ▪  LLMPrompt  ▪  LLMTool  ▪  ...
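
The following sketch shows the general shape of the LLM functions above. It assumes a configured LLM service with valid credentials. The prompts and inputs are invented for illustration, "Emojify" is assumed to be available in the Wolfram Prompt Repository, and outputs vary from run to run.

```wolfram
(* Define a function from a natural-language template; `` marks the argument slot *)
summarize = LLMFunction["Summarize the following text in one sentence: ``"];
summarize["The Wolfram Language can process text at the level of characters, words and sentences."]

(* Define a function from input -> output examples *)
pluralize = LLMExampleFunction[{"cat" -> "cats", "mouse" -> "mice", "child" -> "children"}];
pluralize["goose"]

(* Apply a function prompt from the Wolfram Prompt Repository (name assumed) *)
LLMResourceFunction["Emojify"]["I love pizza and sunshine"]

(* Generate free-form text from a prompt *)
LLMSynthesize["Write a one-line greeting for a text-processing tutorial."]
```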

Text Analysis

WordCounts count occurrences of words and n-grams

LetterCounts  ▪  CharacterCounts  ▪  WordCount

Classify classify strings based on training data or built-in classifiers
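
A brief sketch of the counting functions on a made-up sentence, followed by a built-in classifier. The "Sentiment" classifier may need to download resources on first use.

```wolfram
text = "the cat and the dog and the bird";

(* Word frequencies as an association, most frequent first *)
WordCounts[text]

(* Frequencies of 2-grams (pairs of consecutive words) *)
WordCounts[text, 2]

(* Total number of words *)
WordCount[text]            (* 8 *)

(* Letter and character frequencies *)
LetterCounts["banana"]
CharacterCounts["banana"]

(* Classify a string with a built-in classifier *)
Classify["Sentiment", "I really enjoyed this book"]
```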

Natural Language Processing

LanguageIdentify determine the language of a text

DictionaryLookup look up words in English and other dictionaries

WordData find semantic, grammatical, morphological, etc. properties of words

TextStructure parse text into its grammatical structure

TextContents generate a dataset of identified elements in text

SpellingCorrectionList give a list of spelling suggestions for misspelled words
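
The functions above can be tried on short inputs, as in this sketch. The sample strings are invented, and several of these functions may download language data the first time they are used.

```wolfram
(* Identify the language of a text *)
LanguageIdentify["Bonjour tout le monde, comment allez-vous ?"]

(* Look up dictionary words matching a string pattern *)
DictionaryLookup["appl*"]

(* Query a property of a word *)
WordData["happy", "Synonyms"]

(* Parse a sentence into its grammatical structure *)
TextStructure["The cat sat on the mat."]

(* Identify elements such as places and dates in a text *)
TextContents["Marie Curie was born in Warsaw in 1867."]

(* Suggest corrections for a misspelled word *)
SpellingCorrectionList["recieve"]
```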

Natural Language Understanding

Interpreter attempt to interpret strings of a wide variety of types

SemanticInterpretation  ▪  SemanticImportString  ▪  AmbiguityFunction  ▪  ...
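
As a rough illustration, Interpreter takes a type specification and returns a function that can be applied to strings. The inputs here are made up, and SemanticInterpretation may require network access.

```wolfram
(* Interpret a string as an integer *)
Interpreter["Integer"]["42"]        (* 42 *)

(* Input that cannot be interpreted as the requested type yields a Failure object *)
Interpreter["Integer"]["forty-two and a half"]

(* Interpret a string as a date *)
Interpreter["Date"]["March 3, 2020"]

(* Free-form interpretation without specifying a type in advance *)
SemanticInterpretation["3 feet"]
```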

Text Generation

StringTemplate  ▪  StringRiffle  ▪  TextString  ▪  LLMSynthesize  ▪  ...
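
A minimal sketch of template-based and list-based text generation, using invented names and values:

```wolfram
(* Fill named slots in a template from an association *)
greeting = StringTemplate["Hello, `name`! You have `n` new messages."];
greeting[<|"name" -> "Ada", "n" -> 3|>]
(* "Hello, Ada! You have 3 new messages." *)

(* Join a list of strings with a separator *)
StringRiffle[{"alpha", "beta", "gamma"}, ", "]
(* "alpha, beta, gamma" *)

(* Convert a non-string expression to text before combining it with other strings *)
"Total: " <> TextString[12.5]
```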