Natural Language Processing
Natural language processing deals with understanding text and spoken words as a human would. It is a fundamental component of many human/machine interactions (voice assistants, dictation software, voice-operated systems, ...), of text processing and analysis (suggestions, keyword spotting, translation, ...) and much more. The Wolfram Language natural language processing functionality combines rule-based and machine learning language models, including LLMs. It builds on advanced text mining and string manipulation capabilities and is integrated with a large visualization suite and extensive built-in linguistic data.
Text Generation & Acquisition
LLMSynthesize — generate text from a prompt using an LLM
TextRecognize ▪ ResourceData ▪ WikipediaData
Import — import text from files or the web
"Text" ▪ "PDF" ▪ "HTML" ▪ "CSV" ▪ ...
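As a minimal sketch of text acquisition, the following pulls plain text from a local file, a web page and Wikipedia. The file name and URL are placeholders chosen for illustration, not resources referenced by this guide, and the last two calls assume network access.

```wolfram
(* Hypothetical local file; replace with a real path *)
fileText = Import["notes.txt", "Text"];

(* Placeholder URL; "Plaintext" extracts the readable text of the page *)
webText = Import["https://example.com", "Plaintext"];

(* Plain-text summary of a Wikipedia article *)
wikiText = WikipediaData["Natural language processing", "SummaryPlaintext"];
```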
LLM-Based Operations »
LLMResourceFunction — apply operations from the Wolfram Prompt Repository
LLMFunction — apply operations specified by natural language descriptions
LLMPromptGenerator — add context-dependent messages to an LLM prompt
LLMPrompt ▪ LLMTool ▪ ChatEvaluate ▪ ...
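A short sketch of the LLM-based functions, assuming an LLM service has already been configured for the session. The prompts are illustrative, and the generated text will vary between runs and models.

```wolfram
(* Free-form generation from a prompt *)
LLMSynthesize["Write a one-sentence definition of tokenization."]

(* A reusable operation; `` marks the slot filled by the argument *)
toFrench = LLMFunction["Translate the following text into French: ``"];
toFrench["Good morning"]
```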
Text Mining
SemanticSearch — search based on the contextual meaning of terms
TextSearch — search an index or directory, returning a list of documents
Find, FindList — search files for records containing particular strings
StringTake ▪ StringReplace ▪ StringCases ▪ RegularExpression ▪ ...
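The string functions listed above can be sketched on a small made-up sentence. The sample text is an assumption for illustration only.

```wolfram
text = "Order 66 shipped on 2024-01-15; order 67 ships on 2024-02-01.";

StringTake[text, 8]                    (* first 8 characters *)
StringReplace[text, "-" -> "/"]        (* replace every hyphen with a slash *)
StringCases[text, DigitCharacter ..]   (* maximal runs of digits *)

(* Dates written as four digits, two digits, two digits *)
StringCases[text, RegularExpression["\\d{4}-\\d{2}-\\d{2}"]]
```

String patterns such as `DigitCharacter ..` and regular expressions can be used interchangeably here; which one reads better is largely a matter of taste.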
Text Normalization »
RemoveDiacritics — remove diacritics such as accents, umlauts, etc.
CharacterNormalize — reduce or decompose characters to normal forms (e.g. ¼ to 1⁄4)
TextTranslation ▪ Transliterate ▪ DeleteStopwords ▪ WordStem ▪ ToLowerCase ▪ ...
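A few of the normalization functions applied to toy inputs, as a sketch. The sample strings are assumptions; the stop-word list and stemming rules are whatever the built-in functions provide for English.

```wolfram
RemoveDiacritics["café naïve Zürich"]       (* strips accents and umlauts *)
ToLowerCase["Natural Language"]             (* lowercases every letter *)
CharacterNormalize["¼", "NFKC"]             (* compatibility form of the fraction character *)
DeleteStopwords["the cat sat on the mat"]   (* drops common function words *)
WordStem["running"]                         (* reduces the word to its stem *)
```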
Tokenization
StringSplit — split a string at spaces or other delimiters
StringCases — find cases of string patterns
TextCases ▪ TextSentences ▪ TextWords ▪ TextStructure
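The difference between delimiter-based and linguistic tokenization can be seen on a short example. The sample sentence is made up for illustration.

```wolfram
text = "The quick brown fox jumps. It lands safely!";

StringSplit[text]     (* split at whitespace; punctuation stays attached to words *)
TextWords[text]       (* linguistic word tokens *)
TextSentences[text]   (* one string per sentence *)

(* Capitalized words, found with a regular expression *)
StringCases[text, RegularExpression["[A-Z][a-z]+"]]
```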
Feature Extraction
FeatureExtraction — extract numerical features from text
NetModel — pre-trained networks for text feature extraction
"GloVe" ▪ "BERT" ▪ "ELMo" ▪ "GPT2" ▪ ...
NetGraph ▪ LongShortTermMemoryLayer ▪ AttentionLayer
"Tokens" ▪ "SubwordTokens" ▪ "Characters" ▪ ...
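As a rough sketch of feature extraction, the following learns an extractor from three toy documents and then loads a pre-trained embedding network. The documents are invented, the extraction method is left to the automatic choice, and the NetModel call assumes network access for a one-time download.

```wolfram
docs = {"the cat sat on the mat", "a dog barked at the cat", "stocks fell sharply today"};

(* Learn an extractor from the examples; the method is chosen automatically *)
extractor = FeatureExtraction[docs];

(* Apply it to unseen text to obtain a numeric vector *)
extractor["the dog sat"]

(* Pre-trained GloVe word vectors from the Wolfram Neural Net Repository *)
glove = NetModel["GloVe 100-Dimensional Word Vectors Trained on Wikipedia and Gigaword 5 Data"];
glove["natural language"]   (* one 100-dimensional vector per token *)
```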
Content Extraction
TextSummarize — automatically produce different types of summarization
FindTextualAnswer — attempt to find answers to questions from text
TextContents, TextCases, TextPosition — extract semantic elements in text
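A sketch of content extraction on a two-sentence passage. The passage is an illustrative assumption; the functions may download models on first use, TextSummarize may need a configured LLM service, and the results depend on those models.

```wolfram
text = "Marie Curie was born in Warsaw in 1867. She later moved to Paris, where she won two Nobel Prizes.";

FindTextualAnswer[text, "Where was Marie Curie born?"]   (* answer taken from the text *)
TextCases[text, "City"]       (* substrings recognized as cities *)
TextPosition[text, "City"]    (* character ranges of those substrings *)
TextSummarize[text]           (* automatic summary *)
```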
Text Classification
Classify — classify strings based on training data or built-in classifiers
"Language" ▪ "Profanity" ▪ "Sentiment" ▪ ...
LanguageIdentify — identify what language a text is in
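Built-in and custom text classifiers can be sketched as follows. The training pairs are toy data invented for illustration; a classifier trained on four examples is not expected to be reliable, and the built-in classifiers may need a download on first use.

```wolfram
(* Built-in classifiers *)
LanguageIdentify["Bonjour tout le monde"]
Classify["Sentiment", "I really enjoyed this film."]

(* Custom classifier trained on labeled strings *)
c = Classify[{
    "great service" -> "positive",
    "loved it" -> "positive",
    "terrible food" -> "negative",
    "never again" -> "negative"}];
c["the food was great"]
```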
Text Clustering
FindClusters — find clusters in string data
ClusteringTree ▪ ClusteringComponents ▪ ClusterClassify
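A minimal clustering sketch on a handful of invented words. The distance function and method are left to the automatic defaults, so the exact grouping is not guaranteed.

```wolfram
words = {"apple", "apples", "banana", "bananas", "cherry", "cherries"};

FindClusters[words]       (* number of clusters chosen automatically *)
FindClusters[words, 3]    (* ask for exactly three clusters *)

(* A reusable function that assigns new strings to the learned clusters *)
assign = ClusterClassify[words, 3];
assign["bananna"]
```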
Text Analysis »
WordCounts — count of words or n-grams
CharacterCounts ▪ WordFrequency ▪ WordData ▪ PartOfSpeech ▪ ...
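Word and n-gram counting can be sketched on a short phrase. The phrase is an illustrative choice.

```wolfram
text = "to be or not to be";

WordCounts[text]        (* association from each word to its count *)
WordCounts[text, 2]     (* counts of 2-grams, keyed by lists of two words *)
CharacterCounts[text]   (* association from each character to its count *)
PartOfSpeech["run"]     (* possible parts of speech, from built-in word data *)
```

Because the results are associations, individual counts can be looked up directly, for example `WordCounts[text]["to"]`.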
Text Visualization
WordCloud — generate a word cloud from word frequencies or weights
Snippet — extract a snippet of text
Style, Highlighted — style text with color, font, size, background, etc.
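A short visualization sketch. The sample text and the explicit weights are made up; word cloud layout is randomized to some degree, so the picture will differ between evaluations.

```wolfram
text = "natural language processing turns language into data and data into language";

WordCloud[DeleteStopwords[text]]   (* word sizes follow word frequency *)

(* Explicit weights can be given as an association *)
WordCloud[<|"language" -> 3, "data" -> 2, "processing" -> 1|>]

Style["keyword", Bold, Red, 18]    (* styled text for display *)
Snippet[text, 1]                   (* first line of the text *)
```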