Natural Language Processing
Natural language processing deals with interpreting text and speech the way a human would. It underlies many human-machine interactions (voice assistants, dictation software, voice-operated systems, ...) as well as text processing and analysis (suggestions, keyword spotting, translation, ...). The Wolfram Language's natural language processing functionality combines rule-based and machine learning language models, including LLMs. It builds on advanced text mining and string manipulation capabilities and is integrated with a large visualization suite and extensive built-in linguistic data.
Text Acquisition
Import — import text from files or the web
"Text", "PDF", "HTML", "CSV", … — pick out plaintext, table data, etc.
TextRecognize ▪ ExampleData ▪ WikipediaData
LLMSynthesize — generate text from a prompt using an LLM
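As a minimal sketch of text acquisition, the following parses CSV content and loads a built-in sample text. ImportString is used here as a stand-in for Import so that no file path or URL has to be assumed. ExampleData may need to download its content on first use.

```wolfram
(* Parse CSV content held in a string; Import does the same for a file or URL. *)
table = ImportString["a,b\n1,2", "CSV"]
(* {{"a", "b"}, {1, 2}} *)

(* Load a built-in sample text; the result is a single string. *)
alice = ExampleData[{"Text", "AliceInWonderland"}];
StringQ[alice]
```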
Text Mining
TextSearch — search an index or directory, returning a list of documents
Find, FindList — search files for records containing particular strings
StringTake ▪ StringReplace ▪ StringCases ▪ RegularExpression ▪ ...
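The string functions listed above can be combined as in the following sketch. The sample sentence and the deliberately simple email-like regular expression are illustrative assumptions, not a complete email matcher.

```wolfram
text = "Contact alice@example.com or bob@example.org for details.";

(* Illustrative pattern: word characters or dots, "@", then a dotted domain. *)
emailPattern = RegularExpression["[\\w.]+@[\\w.]+\\.\\w+"];

(* Extract every matching substring. *)
emails = StringCases[text, emailPattern]
(* {"alice@example.com", "bob@example.org"} *)

(* Mask the same matches in place. *)
masked = StringReplace[text, emailPattern -> "<email>"]
(* "Contact <email> or <email> for details." *)

(* Take the first seven characters. *)
StringTake[text, 7]
(* "Contact" *)
```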
Text Normalization
RemoveDiacritics — remove diacritics such as accents, umlauts, etc.
CharacterNormalize — reduce or decompose characters to normal forms (e.g. ¼ to 1⁄4)
TextTranslation ▪ Transliterate ▪ DeleteStopwords ▪ WordStem ▪ ToLowerCase ▪ ...
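One possible normalization pipeline chains several of these functions. The order of steps and the sample sentence are assumptions chosen for illustration; a real pipeline depends on the language and the task.

```wolfram
raw = "The Crème Brûlée was Amazing";

(* Strip accents, then fold case. *)
normalized = ToLowerCase[RemoveDiacritics[raw]]
(* "the creme brulee was amazing" *)

(* Split into words, drop common stopwords, and stem what remains. *)
words = DeleteStopwords[TextWords[normalized]];
stems = WordStem /@ words
```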
Tokenization
StringSplit — split a string at spaces or other delimiters
StringCases — find cases of string patterns
TextCases ▪ TextSentences ▪ TextWords ▪ TextStructure
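The difference between delimiter-based and linguistic tokenization can be sketched as follows. StringSplit only looks at the delimiters it is given, while TextWords and TextSentences use language-aware segmentation. The sample strings are illustrative.

```wolfram
(* Split at explicit delimiters. *)
StringSplit["one, two;three", {", ", ";"}]
(* {"one", "two", "three"} *)

(* With no delimiter given, runs of whitespace separate the pieces. *)
StringSplit["split on   whitespace"]
(* {"split", "on", "whitespace"} *)

(* Language-aware segmentation into words and sentences. *)
text = "Tokenization comes first. Analysis comes later.";
words = TextWords[text];
sentences = TextSentences[text];
```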
LLM-Based Operations
LLMResourceFunction — apply operations from the Wolfram Prompt Repository
LLMFunction — apply operations specified by natural language descriptions
LLMExampleFunction — apply operations based on examples
LLMPrompt ▪ LLMTool ▪ ChatEvaluate ▪ ...
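A rough sketch of the two function-building approaches follows. It assumes an LLM service is already configured for the session. The prompt wording and the example pairs are illustrative, and the results depend on the model, so no outputs are shown.

```wolfram
(* Define an operation from a natural language description; `1` is a template slot. *)
pluralize = LLMFunction["Give only the plural of the English word `1`."];

(* Define an operation from input -> output examples instead of a description. *)
pluralizeByExample = LLMExampleFunction[{"cat" -> "cats", "mouse" -> "mice"}];

(* Applying either function sends a request to the configured LLM service:
   pluralize["goose"]
   pluralizeByExample["goose"] *)
```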
Feature Extraction
FeatureExtraction — extract numerical features from text
NetModel — pre-trained networks for text feature extraction
"GloVe" ▪ "BERT" ▪ "ELMo" ▪ "GPT2" ▪ ...
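As a small sketch, FeatureExtraction can learn an extractor from a handful of strings and then map each string to a numerical vector. The four sample documents and the choice of the "TFIDF" method are assumptions made for illustration. Pre-trained networks from NetModel are an alternative, but they must be downloaded first.

```wolfram
docs = {
   "the cat sat on the mat",
   "dogs chase cats",
   "stock prices fell sharply",
   "markets rallied on earnings"
   };

(* Learn a term-frequency/inverse-document-frequency extractor from the documents. *)
extractor = FeatureExtraction[docs, "TFIDF"];

(* Apply it to get one feature vector per document. *)
vectors = extractor[docs];
Length[vectors]
```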
Content Extraction
FindTextualAnswer — attempt to find answers to questions from text
TextContents, TextCases, TextPosition — extract semantic elements in text
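The content extraction functions can be tried on a short passage as sketched below. The passage and the question are illustrative. These functions rely on built-in models that may need to be downloaded on first use, and their results can vary between versions, so no outputs are shown.

```wolfram
text = "Marie Curie was born in Warsaw and later moved to Paris.";

(* Ask a question of the text; the answer is a substring of the text. *)
answer = FindTextualAnswer[text, "Where was Marie Curie born?"];

(* Extract every substring recognized as a city, and where each one occurs. *)
cities = TextCases[text, "City"];
cityPositions = TextPosition[text, "City"];
```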
Text Classification
Classify — classify strings based on training data or built-in classifiers
"Language" ▪ "Profanity" ▪ "Sentiment" ▪ ...
LanguageIdentify — identify what language a text is in
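A brief sketch of the built-in classifiers follows. The input strings are illustrative. The classifiers may be downloaded on first use, and their predictions depend on the model version, so no outputs are shown.

```wolfram
(* Built-in sentiment classifier; gives "Positive", "Negative", "Neutral" or Indeterminate. *)
sentiment = Classify["Sentiment", "I really enjoyed this book."];

(* Language identification; the result is a language entity. *)
language = LanguageIdentify["Bonjour tout le monde"];
```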
Text Clustering
FindClusters — find clusters in string data
ClusteringTree ▪ ClusteringComponents ▪ ClusterClassify
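FindClusters accepts a list of strings directly, as in this sketch. The word list and the cap of two clusters are illustrative assumptions. The exact grouping depends on the method chosen automatically, so no output is shown.

```wolfram
words = {"apple", "apply", "ape", "banana", "bandana", "band"};

(* Partition the strings into at most two clusters. *)
clusters = FindClusters[words, 2];

(* Every input word ends up in exactly one cluster. *)
Sort[Flatten[clusters]] === Sort[words]
```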
Text Analysis
WordCounts — count words or n-grams
CharacterCounts ▪ WordFrequency ▪ WordData ▪ PartOfSpeech ▪ ...
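The counting functions return associations, so individual counts can be looked up by key. The sample sentence is an illustrative assumption. Note that n-gram keys are lists of words.

```wolfram
text = "the cat saw the dog and the dog saw the cat";

(* Count single words. *)
counts = WordCounts[text];
counts["the"]
(* 4 *)

(* Count 2-grams, that is, runs of two consecutive words. *)
bigrams = WordCounts[text, 2];
bigrams[{"the", "cat"}]
(* 2 *)

(* Count characters. *)
CharacterCounts["banana"]["a"]
(* 3 *)
```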
Text Visualization
WordCloud — generate a word cloud from word frequencies or weights
Snippet — extract a snippet of text
Style, Highlighted — style text with color, font, size, background, etc.
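A short sketch of the visualization functions follows. The words and weights are made-up values for illustration, not measured frequencies.

```wolfram
(* Hypothetical weights; in practice these might come from WordCounts. *)
weights = <|"language" -> 10, "text" -> 8, "words" -> 5, "tokens" -> 3, "stems" -> 2|>;

(* Draw each word at a size that reflects its weight. *)
cloud = WordCloud[weights]

(* Style a piece of text with color and weight. *)
Style["keyword", Red, Bold]

(* Give a piece of text a highlighted background. *)
Highlighted["keyword"]
```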