Text Analysis

The Wolfram Language includes a broad set of tools for analyzing and visualizing text, both structurally (counting, parsing, comparing) and semantically (classifying, identifying entities, answering questions).

Sources of Text

Import  ▪  ExampleData  ▪  WikipediaData

WordCount total number of words in a text

WordCounts count of words or n-grams

WordFrequency frequency of words or n-grams

LetterCounts  ▪  CharacterCounts

Sort sort into alphabetical order

KeySort  ▪  TakeLargest
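As a minimal sketch of the counting functions, the following applies WordCount, WordCounts, TakeLargest, and KeySort to a short made-up sentence. The sample string and the variable names text and counts are purely illustrative.

```wolfram
(* Illustrative sample text; any string can be used *)
text = "the quick brown fox jumps over the lazy dog and the fox runs";

(* Total number of words in the text *)
WordCount[text]

(* Association mapping each word to the number of times it occurs *)
counts = WordCounts[text];

(* The two most frequent words with their counts *)
TakeLargest[counts, 2]

(* The same counts with keys in alphabetical order *)
KeySort[counts]
```

Here WordCount gives 13, and TakeLargest returns the entries for "the" (3) and "fox" (2).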

Classify classify strings based on training data or built-in classifiers

Nearest find the closest-matching string from a list

FindClusters find clusters in string data

ClusteringTree  ▪  ClusteringComponents  ▪  ClusterClassify

Dendrogram hierarchical plot of similarities

EditDistance edit or Levenshtein distance
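The example below illustrates EditDistance and Nearest on a few made-up strings. It relies on the documented default that Nearest compares strings by edit distance; the word list and the variable name words are hypothetical.

```wolfram
(* Levenshtein distance: substitute k->s, substitute e->i, insert g *)
EditDistance["kitten", "sitting"]

(* Hypothetical word list; for strings, Nearest uses EditDistance by default *)
words = {"apple", "banana", "cherry"};

(* "aple" is one insertion away from "apple" and much farther from the others *)
Nearest[words, "aple"]
```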

LanguageIdentify identify what language a text is in

DictionaryLookup  ▪  WordData  ▪  WordStem  ▪  PartOfSpeech  ▪  Transliterate

WordFrequencyData data on word frequencies in typical current and historical text

SemanticImport import text with semantic understanding
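Several of the linguistic functions above can be tried directly on single words or short phrases. This sketch assumes the relevant language and dictionary data are available (they may be downloaded on first use); the inputs are illustrative, and the exact results of LanguageIdentify and DictionaryLookup depend on that data.

```wolfram
(* Identify the language of a short phrase; the result is a language entity *)
LanguageIdentify["Bonjour tout le monde"]

(* Reduce an inflected word to its stem *)
WordStem["walking"]

(* Dictionary words matching a string pattern: here, words starting with "abbrev" *)
DictionaryLookup["abbrev" ~~ ___]
```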

LLM-Based Analysis

LLMFunction apply LLM-based operations specified by natural language to text

LLMResourceFunction apply LLM-based operations from the Wolfram Prompt Repository

LLMExampleFunction  ▪  LLMPrompt  ▪  LLMSynthesize  ▪  LLMTool

Text Visualization

Style style text with color, font, or size

WordCloud generate a word cloud from word frequencies or weights

Snippet extract a snippet of text

StringPartition partition a string into equal-size blocks

InsertLinebreaks break a string onto multiple lines
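The string-layout functions in this group behave as follows on small inputs. The strings are made up, and the line width of 20 characters (and the variable name wrapped) is an arbitrary choice for illustration.

```wolfram
(* Nonoverlapping blocks of length 3; the incomplete final block "gh" is dropped *)
StringPartition["abcdefgh", 3]

(* UpTo[3] keeps the shorter final block *)
StringPartition["abcdefgh", UpTo[3]]

(* Insert newlines so that no line is longer than 20 characters *)
wrapped = InsertLinebreaks["The Wolfram Language includes tools for analyzing text", 20]
```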

Text Parsing

TextStructure parse text into its grammatical structure

Text Comparison

SequenceAlignment  ▪  LongestCommonSubsequence  ▪  DistanceMatrix  ▪  ...
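One point worth illustrating: in the Wolfram Language, LongestCommonSubsequence looks for the longest contiguous run common to both strings, while the related function LongestCommonSequence (not listed above) allows gaps. The strings below are illustrative.

```wolfram
(* Longest contiguous common run: "bcd" *)
LongestCommonSubsequence["abcdef", "xbcdy"]

(* Longest common sequence, not necessarily contiguous: "abc" *)
LongestCommonSequence["axbycz", "abc"]

(* Alignment: matching parts appear as strings, differing parts as {s1, s2} pairs *)
SequenceAlignment["abcXdef", "abcYdef"]
```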

Content Analysis

TextContents generate a dataset of identified elements in text

Content Extraction

TextCases extract symbolically specified elements

Containing  ▪  Alternatives  ▪  Entity

TextPosition positions of symbolically specified elements

FindTextualAnswer attempt to find answers to questions from text
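As a sketch of content extraction, the following runs TextCases, TextPosition, and FindTextualAnswer on a short made-up passage, using Alternatives and Containing to combine element types. The element types "Sentence", "Noun", "Verb", and "City" are assumed to be among those TextCases accepts; results come from statistical models, so exact output may vary between versions.

```wolfram
(* Illustrative passage *)
text = "Alice met Bob in Paris. They visited the Louvre on Monday.";

(* Split the passage into sentences *)
TextCases[text, "Sentence"]

(* Words tagged as nouns or verbs, combined with Alternatives *)
TextCases[text, "Noun" | "Verb"]

(* Sentences that contain a city name, using Containing *)
TextCases[text, Containing["Sentence", "City"]]

(* Character positions of each sentence *)
TextPosition[text, "Sentence"]

(* Attempt to answer a question from the passage *)
FindTextualAnswer[text, "Where did Alice meet Bob?"]
```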

Text Normalization

TextWords  ▪  TextSentences  ▪  DeleteStopwords  ▪  RemoveDiacritics  ▪  ...
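The normalization functions can be applied to short strings as follows. The inputs are illustrative; DeleteStopwords uses its built-in English stopword list.

```wolfram
(* Words with punctuation removed *)
TextWords["Hello, world! How are you?"]

(* Sentence segmentation *)
TextSentences["Hello, world! How are you?"]

(* Strip accents and other diacritical marks *)
RemoveDiacritics["café crème"]

(* Drop common English stopwords such as "the" and "on" *)
DeleteStopwords["the cat sat on the mat"]
```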