Natural Language Processing

Natural language processing aims to interpret text and spoken words much as a human would. It is a fundamental component of many human-machine interactions (voice assistants, dictation software, voice-operated systems, ...) and of text processing and analysis tasks (suggestions, keyword spotting, translation, ...). The Wolfram Language's natural language processing functionality combines rule-based and machine learning language models, including LLMs. It builds on advanced text mining and string manipulation capabilities and is integrated with a large visualization suite and extensive built-in linguistic data.

Text Generation & Acquisition

LLMSynthesize generate text from a prompt using an LLM

TextRecognize  ▪  ResourceData  ▪  WikipediaData

Import import text from files or the web

"Text"  ▪  "PDF"  ▪  "HTML"  ▪  "CSV"  ▪  ...
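As a rough sketch of how these acquisition functions are called: the prompt text and the file name "notes.txt" below are hypothetical placeholders, LLMSynthesize assumes an LLM service is already configured, and WikipediaData assumes network access. Results vary between runs, so no outputs are shown.

```wolfram
(* Generate text from a prompt; requires a configured LLM service *)
draft = LLMSynthesize["Write one sentence describing tokenization."];

(* Import plain text from a file; "notes.txt" is a hypothetical file name *)
notes = Import["notes.txt", "Text"];

(* Fetch the plain text of a Wikipedia article; requires network access *)
article = WikipediaData["Natural language processing"];
```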

LLM-Based Operations

LLMResourceFunction apply operations from the Wolfram Prompt Repository

LLMFunction apply operations specified by natural language descriptions

LLMPromptGenerator add context-dependent messages to an LLM prompt

LLMPrompt  ▪  LLMTool  ▪  ChatEvaluate  ▪  ...
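A minimal sketch of an operation specified by a natural language description, assuming an LLM service is configured. The prompt wording and the name toFrench are illustrative only, and the returned text depends on the model.

```wolfram
(* The `` marker is a template slot filled by the argument *)
toFrench = LLMFunction["Translate the following text to French: ``"];

(* Apply the resulting function like any other function *)
toFrench["Good morning"]
```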

Text Mining

SemanticSearch search based on the contextual meaning of terms

TextSearch search an index or directory, returning a list of documents

Find, FindList search files for records containing particular strings

StringTake  ▪  StringReplace  ▪  StringCases  ▪  RegularExpression  ▪  ...
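The string functions listed above can be combined for simple mining tasks. The sample sentence here is made up for illustration.

```wolfram
text = "Order 66 shipped on 2024-05-01; order 7 is pending.";

(* Extract every maximal run of digits *)
StringCases[text, DigitCharacter ..]
(* {"66", "2024", "05", "01", "7"} *)

(* Extract ISO-style dates with a regular expression *)
StringCases[text, RegularExpression["\\d{4}-\\d{2}-\\d{2}"]]
(* {"2024-05-01"} *)

(* Take the first 8 characters *)
StringTake[text, 8]
(* "Order 66" *)

(* Replace a literal substring *)
StringReplace[text, "pending" -> "delayed"]
(* "Order 66 shipped on 2024-05-01; order 7 is delayed." *)
```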

Text Normalization

RemoveDiacritics remove diacritics such as accents, umlauts, etc.

CharacterNormalize reduce or decompose characters to normal forms (e.g. ¼ to 1⁄4)

TextTranslation  ▪  Transliterate  ▪  DeleteStopwords  ▪  WordStem  ▪  ToLowerCase  ▪  ...
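A short sketch of typical normalization steps, using made-up input strings. Outputs are shown only where they follow directly from the function definitions; stopword removal and stemming depend on built-in linguistic data, so their results are left unstated.

```wolfram
(* Strip accents and other diacritics *)
RemoveDiacritics["café déjà vu"]
(* "cafe deja vu" *)

(* Case folding *)
ToLowerCase["Natural Language"]
(* "natural language" *)

(* Compatibility decomposition: the single character ¼ becomes 1, fraction slash, 4 *)
CharacterNormalize["¼", "NFKD"]

(* Remove common function words, then stem a word *)
DeleteStopwords["the cat sat on the mat"]
WordStem["running"]
```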

Tokenization

StringSplit split a string at spaces or other delimiters

StringCases find cases of string patterns

TextCases  ▪  TextSentences  ▪  TextWords  ▪  TextStructure
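The difference between delimiter-based and linguistic tokenization can be sketched as follows; the sample text is illustrative.

```wolfram
text = "Tokenization splits text. It can work at several levels.";

(* Split at whitespace (the default) *)
StringSplit["the quick brown fox"]
(* {"the", "quick", "brown", "fox"} *)

(* Split at any of several explicit delimiters *)
StringSplit["a,b;c", {",", ";"}]
(* {"a", "b", "c"} *)

(* Linguistic tokenization into words and sentences *)
TextWords[text]
TextSentences[text]
```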

Feature Extraction

FeatureExtraction extract numerical features from text

NetModel pre-trained networks for text feature extraction

"GloVe"  ▪  "BERT"  ▪  "ELMo"  ▪  "GPT2"  ▪  ...

NetGraph  ▪  LongShortTermMemoryLayer  ▪  AttentionLayer

"Tokens"  ▪  "SubwordTokens"  ▪  "Characters"  ▪  ...
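A minimal sketch of learning a feature extractor from a handful of strings and applying it to new text. The tiny corpus is made up, and the resulting vector depends on the method chosen automatically, so no output is shown.

```wolfram
(* A toy corpus; real use would involve far more text *)
docs = {"the cat sat on the mat", "dogs chase cats",
   "stock prices fell sharply", "markets rallied after the report"};

(* Learn an extractor from the examples *)
extractor = FeatureExtraction[docs];

(* Map a new string to a numerical feature vector *)
extractor["cats and dogs"]
```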

Content Extraction

TextSummarize automatically produce different types of summaries

FindTextualAnswer attempt to find answers to questions from text

TextContents, TextCases, TextPosition extract semantic elements in text
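As an illustration of content extraction on a made-up sentence: these functions rely on built-in models that may need to be downloaded on first use, and their results can differ between versions, so outputs are not shown.

```wolfram
text = "Marie Curie was born in Warsaw in 1867 and later moved to Paris.";

(* Substrings recognized as cities *)
TextCases[text, "City"]

(* Character positions of those substrings *)
TextPosition[text, "City"]

(* Span of text most likely to answer a question *)
FindTextualAnswer[text, "Where was Marie Curie born?"]
```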

Text Classification

Classify classify strings based on training data or built-in classifiers

"Language"  ▪  "Profanity"  ▪  "Sentiment"  ▪  ...

LanguageIdentify identify what language a text is in
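A sketch of both built-in and custom text classification. The training examples and class labels are invented and far too few for a useful classifier; they only show the call pattern. Built-in classifiers may be downloaded on first use.

```wolfram
(* Built-in classifiers, selected by name *)
Classify["Sentiment", "I really enjoyed this book."]
LanguageIdentify["Bonjour tout le monde"]

(* A custom classifier trained on labeled strings *)
reviewClassifier = Classify[{
    "great movie" -> "positive", "loved every minute" -> "positive",
    "terrible plot" -> "negative", "a waste of time" -> "negative"}];

(* Classify a new string *)
reviewClassifier["what a great film"]
```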

Text Clustering

FindClusters find clusters in string data

ClusteringTree  ▪  ClusteringComponents  ▪  ClusterClassify
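String data can be clustered directly, as in this small sketch with an invented word list. The grouping depends on the automatically chosen distance and method, so the clusters themselves are not shown.

```wolfram
words = {"apple", "apples", "applet", "banana", "bananas", "cherry", "cherries"};

(* Let the number of clusters be chosen automatically *)
FindClusters[words]

(* Request a specific number of clusters *)
FindClusters[words, 3]
```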

Text Analysis

WordCounts count of words or n-grams

CharacterCounts  ▪  WordFrequency  ▪  WordData  ▪  PartOfSpeech  ▪  ...
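A small worked example of counting words, word pairs, and characters on an invented sentence. Results are associations; the comments note individual counts rather than the full output.

```wolfram
text = "the cat saw the dog and the dog saw the cat";

(* Single-word counts: "the" occurs 4 times, "and" once *)
counts = WordCounts[text];
counts["the"]
(* 4 *)

(* Counts of 2-grams; keys are lists of two words *)
bigrams = WordCounts[text, 2];
bigrams[{"the", "dog"}]
(* 2 *)

(* Character counts *)
CharacterCounts["banana"]["a"]
(* 3 *)
```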

Text Visualization

WordCloud generate a word cloud from word frequencies or weights

Snippet extract a snippet of text

Style, Highlighted style text with color, font, size, background, etc.
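A brief sketch of the visualization functions on invented data. The weights and sample text are placeholders, and the graphical output depends on the notebook front end.

```wolfram
(* Word cloud from explicit weights *)
WordCloud[<|"language" -> 10, "text" -> 7, "token" -> 4, "model" -> 2|>]

(* Word cloud computed from the words of a string *)
WordCloud["to be or not to be that is the question whether tis nobler in the mind"]

(* First line of a multi-line text *)
Snippet["First line of text.\nSecond line of text.\nThird line of text.", 1]

(* Styled and highlighted text *)
Style["important term", Bold, Red]
Highlighted["marked passage"]
```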