
Natural Language Processing

Natural language processing deals with understanding text and spoken words the way a human would. It is a fundamental component of many human-machine interactions (voice assistants, dictation software, voice-operated systems, ...), of text processing and analysis (suggestions, keyword spotting, translation, ...), and of much more. The Wolfram Language's natural language processing functionality combines rule-based and machine learning language models, including LLMs. It builds on advanced text mining and string manipulation capabilities and is integrated with a large visualization suite and extensive built-in linguistic data.

Text Generation & Acquisition

LLMSynthesize — generate text from a prompt using an LLM

TextRecognize ▪ ResourceData ▪ WikipediaData

Import — import text from files or the web

"Text" ▪ "PDF" ▪ "HTML" ▪ "CSV" ▪ ...

LLM-Based Operations

LLMResourceFunction — apply operations from the Wolfram Prompt Repository

LLMFunction — apply operations specified by natural language descriptions

LLMPromptGenerator — add context-dependent messages to an LLM prompt

LLMPrompt ▪ LLMTool ▪ ChatEvaluate ▪ ...
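
As a minimal sketch of the LLM-based functions listed above: the prompts below are illustrative, and the code assumes an LLM service has already been configured for the session. Outputs vary with the model and are not shown.

```wolfram
(* Assumes an LLM service is already configured for this session *)
LLMSynthesize["Write a one-sentence definition of tokenization."]

(* The `` slot is filled by the argument; the prompt wording is illustrative *)
summarize = LLMFunction["Summarize the following text in one sentence: ``"];
summarize["Natural language processing deals with understanding text and spoken words."]
```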

Text Mining

SemanticSearch — search based on the contextual meaning of terms

TextSearch — search an index or directory, returning a list of documents

Find, FindList — search files for records containing particular strings

StringTake ▪ StringReplace ▪ StringCases ▪ RegularExpression ▪ ...
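
The string functions in this group can be combined as in the following sketch. The sample text and the patterns are assumptions chosen for illustration.

```wolfram
(* Hypothetical sample text *)
text = "Order 66 shipped on 2024-05-01; order 67 is pending.";

(* Date-like substrings matched with a regular expression *)
StringCases[text, RegularExpression["\\d{4}-\\d{2}-\\d{2}"]]

(* Digits following the word "order", ignoring case *)
StringCases[text, "order " ~~ (n : DigitCharacter ..) :> n, IgnoreCase -> True]

(* Replace a substring, then take the first 8 characters of the original *)
StringReplace[text, "pending" -> "delayed"]
StringTake[text, 8]
```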

Text Normalization

RemoveDiacritics — remove diacritics such as accents, umlauts, etc.

CharacterNormalize — reduce or decompose characters to normal forms (e.g. ¼ to 1⁄4)

TextTranslation ▪ Transliterate ▪ DeleteStopwords ▪ WordStem ▪ ToLowerCase ▪ ...
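
A short sketch of a few of these normalization functions applied one at a time. The input strings are made up for illustration.

```wolfram
(* Strip accents and other diacritics *)
RemoveDiacritics["Crème brûlée at the café"]

(* Case folding *)
ToLowerCase["Natural Language Processing"]

(* Drop common function words such as "the" and "on" *)
DeleteStopwords["the cat sat on the mat"]

(* Reduce a word to its stem *)
WordStem["running"]
```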

Tokenization

StringSplit — split a string at spaces or other delimiters

StringCases — find cases of string patterns

TextCases ▪ TextSentences ▪ TextWords ▪ TextStructure
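
To illustrate the difference between string-level and linguistic tokenization, the sketch below applies three of the functions above to the same made-up sentence pair.

```wolfram
(* Hypothetical sample text *)
text = "The quick brown fox jumps. It lands softly.";

(* Split at whitespace; punctuation stays attached to the neighboring word *)
StringSplit[text]

(* Linguistic word tokens *)
TextWords[text]

(* One string per sentence *)
TextSentences[text]
```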

Feature Extraction

FeatureExtraction — extract numerical features from text

NetModel — pre-trained networks for text feature extraction

"GloVe" ▪ "BERT" ▪ "ELMo" ▪ "GPT2" ▪ ...

NetGraph ▪ LongShortTermMemoryLayer ▪ AttentionLayer

"Tokens" ▪ "SubwordTokens" ▪ "Characters" ▪ ...

Content Extraction

TextSummarize — automatically produce different types of summarization

FindTextualAnswer — attempt to find answers to questions from text

TextContents, TextCases, TextPosition — extract semantic elements in text
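
A possible use of the content extraction functions, sketched on a made-up passage. The entity type "Country" and the question are illustrative choices, and these functions may need to download resources on first use.

```wolfram
(* Hypothetical sample passage *)
text = "Lake Titicaca is a large, deep lake in the Andes on the border of Bolivia and Peru.";

(* Look for a span of the text that answers the question *)
FindTextualAnswer[text, "Where is Lake Titicaca?"]

(* Extract substrings recognized as countries *)
TextCases[text, "Country"]
```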

Text Classification

Classify — classify strings based on training data or built-in classifiers

"Language" ▪ "Profanity" ▪ "Sentiment" ▪ ...

LanguageIdentify — identify what language a text is in
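
The built-in classifiers named above can be called directly on strings, as in this sketch. The example sentences are assumptions, and the built-in classifiers may need to be downloaded on first use.

```wolfram
(* Identify the language of a text *)
LanguageIdentify["Bonjour tout le monde"]

(* Built-in classifiers are selected by name *)
Classify["Sentiment", "I really enjoyed this book."]
Classify["Language", "Der schnelle braune Fuchs"]
```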

Text Clustering

FindClusters — find clusters in string data

ClusteringTree ▪ ClusteringComponents ▪ ClusterClassify

Text Analysis

WordCounts — count of words or n-grams

CharacterCounts ▪ WordFrequency ▪ WordData ▪ PartOfSpeech ▪ ...
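
A minimal sketch of the counting functions on a made-up string. The second argument of WordCounts sets the n-gram length.

```wolfram
(* Hypothetical sample text *)
text = "the cat saw the dog and the dog saw the cat";

(* Association of each word to its count *)
WordCounts[text]

(* Counts of 2-grams, i.e. pairs of consecutive words *)
WordCounts[text, 2]

(* Counts of individual characters *)
CharacterCounts["banana"]
```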

Text Visualization

WordCloud — generate a word cloud from word frequencies or weights

Snippet — extract a snippet of text

Style, Highlighted — style text with color, font, size, background, etc.
