Text Manipulation

The Wolfram Language has uniquely flexible capabilities for processing textual data. It can operate at the level of strings and characters or at the level of words and sentences. It can also operate semantically, through its extensive built-in natural language understanding capabilities as well as its ability to use LLM functionality, including through the Wolfram Prompt Repository.

Text Acquisition

Import — import text from files or the web

"Text", "PDF", "TeX", "HTML" — import formats from which plaintext, table data, etc. can be picked out

NotebookImport — import text from a notebook

FindList — search files for records containing particular strings

TextString — convert arbitrary expressions to text

TextRecognize — extract text from images using OCR

Text Normalization

ToLowerCase ▪ ToUpperCase ▪ RemoveDiacritics ▪ CharacterEncoding ▪ ...

DeleteStopwords — delete standard stopwords ("the", "and", etc.) from a string

StringSplit — split a string at newlines or other delimiters

StringReplace ▪ StringDelete ▪ StringTrim ▪ ...
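As a minimal sketch of how these normalization functions can be chained, the snippet below lowercases a string, strips diacritics, trims surrounding whitespace and splits at newlines. The sample text and the variable names (raw, normalized, lines, content) are illustrative assumptions, not part of the original page.

```wolfram
(* Hypothetical sample input with stray whitespace, accents and mixed case *)
raw = "  Café Société: The menu AND the wine list.\nSecond line here.  ";

(* Trim outer whitespace, lowercase, then remove accents *)
normalized = RemoveDiacritics[ToLowerCase[StringTrim[raw]]];

(* Split into lines at the newline delimiter *)
lines = StringSplit[normalized, "\n"]
(* {"cafe societe: the menu and the wine list.", "second line here."} *)

(* Drop standard stopwords such as "the" and "and" from the first line *)
content = DeleteStopwords[First[lines]]
```

Normalizing before splitting keeps each later step working on text in one consistent form.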

Structural Text Manipulation

TextCases — extract symbolically specified elements

TextSentences — extract a list of sentences

TextWords — extract a list of words

SequenceAlignment — find matching sequences in text
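The following short sketch illustrates the sentence-level and word-level functions listed above. The input string is a made-up example chosen only for illustration.

```wolfram
(* Hypothetical two-sentence input *)
text = "The parser ran twice. It found three errors.";

(* Split into sentences *)
sentences = TextSentences[text]
(* {"The parser ran twice.", "It found three errors."} *)

(* Split into words; punctuation is not returned *)
words = TextWords[text]
(* {"The", "parser", "ran", "twice", "It", "found", "three", "errors"} *)
```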

Searching & Pattern Matching

StringExpression — general string pattern

StringMatchQ ▪ StringCases ▪ StringCount ▪ ...
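A small sketch of string patterns in use: the pattern below is built with StringExpression (written with the ~~ operator) and then passed to StringCases, StringMatchQ and StringCount. The sample text and the name idPattern are assumptions made for this example.

```wolfram
(* Hypothetical sample text containing two identifiers and a date *)
text = "Order A-102 shipped on 2024-03-05; order B-77 is pending.";

(* One letter, a hyphen, then one or more digits *)
idPattern = LetterCharacter ~~ "-" ~~ (DigitCharacter ..);

(* Extract every substring that matches the pattern *)
ids = StringCases[text, idPattern]
(* {"A-102", "B-77"} *)

(* Test whether a whole string matches the pattern *)
StringMatchQ["A-102", idPattern]       (* True *)
StringMatchQ["2024-03-05", idPattern]  (* False *)

(* Count case-insensitive occurrences of a literal substring *)
orderCount = StringCount[text, "order", IgnoreCase -> True]
(* 2 *)
```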

LLM-Based Text Manipulation

LLMResourceFunction — apply operations from the Wolfram Prompt Repository

LLMFunction — apply operations specified by natural language descriptions

LLMExampleFunction — apply operations based on examples

LLMSynthesize ▪ LLMPrompt ▪ LLMTool ▪ ...

Text Analysis

WordCounts — count occurrences of words and n-grams

LetterCounts ▪ CharacterCounts ▪ WordCount

Classify — classify strings based on training data or built-in classifiers
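To illustrate the counting functions, here is a minimal sketch that counts single words and 2-grams in a short phrase. The phrase and variable names are illustrative assumptions.

```wolfram
(* Hypothetical sample phrase *)
phrase = "to be or not to be";

(* Association from each word to its number of occurrences *)
wordCounts = WordCounts[phrase];
wordCounts["to"]  (* 2 *)

(* A second argument n counts n-grams; here, pairs of adjacent words *)
bigramCounts = WordCounts[phrase, 2];
bigramCounts[{"to", "be"}]  (* 2 *)

(* Total number of words *)
total = WordCount[phrase]
(* 6 *)
```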

Natural Language Processing

LanguageIdentify — determine the language of a text

DictionaryLookup — look up words in English and other dictionaries

WordData — find semantic, grammatical, morphological, etc. properties of words

TextStructure — parse text into its grammatical structure

TextContents — generate a dataset of identified elements in text

SpellingCorrectionList — list of spelling suggestions for misspelled words

Natural Language Understanding

Interpreter — attempt to interpret strings of a wide variety of types

SemanticInterpretation ▪ SemanticImportString ▪ AmbiguityFunction ▪ ...
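As a hedged illustration of Interpreter, the sketch below converts strings to typed values and shows that uninterpretable input comes back as a Failure object rather than a value. The inputs are made up for the example, and only the simple "Integer" and "Number" types are used.

```wolfram
(* Interpret a string as an integer *)
n = Interpreter["Integer"]["42"]
(* 42 *)

(* Interpret a string as a general number *)
x = Interpreter["Number"]["3.5"]
(* 3.5 *)

(* Input that cannot be interpreted yields a Failure object *)
bad = Interpreter["Integer"]["abc"];
FailureQ[bad]
(* True *)
```

Checking results with FailureQ is one way to separate valid interpretations from rejected input.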

Text Generation

StringTemplate ▪ StringRiffle ▪ TextString ▪ LLMSynthesize ▪ ...
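A brief sketch of template-based and list-based text generation with StringTemplate and StringRiffle follows. The template text, slot names and data are hypothetical.

```wolfram
(* A template with two named slots, written between backquotes *)
template = StringTemplate["`name` has `n` items"];

(* Fill the slots from an association *)
message = template[<|"name" -> "cart", "n" -> 3|>]
(* "cart has 3 items" *)

(* Join list elements with a separator *)
joined = StringRiffle[{"a", "b", "c"}, ", "]
(* "a, b, c" *)

(* Join with left delimiter, separator and right delimiter *)
bracketed = StringRiffle[{"a", "b", "c"}, {"[", ", ", "]"}]
(* "[a, b, c]" *)
```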
