Wolfram Language & System 11.0 (2016) | Legacy Documentation

This is documentation for an earlier version of the Wolfram Language. The current documentation is for Version 11.2.

Text Analysis

The Wolfram Language includes increasingly sophisticated tools for analyzing and visualizing text, both structurally and semantically.


Sources of Text

Import  ▪  ExampleData  ▪  WikipediaData

WordCount total number of words in a text

WordCounts count of words or n-grams

WordFrequency frequency of words or n-grams
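As a rough illustration of what these counting functions compute, here is a minimal Python sketch. It is not the Wolfram implementation. It assumes a naive tokenizer, the hypothetical helper tokenize, that lowercases the text and keeps runs of letters and apostrophes. WordCount, WordCounts and WordFrequency use their own word segmentation, so their counts may differ on real text. An n-gram here is a run of n consecutive words.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical naive tokenizer: lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def word_count(text):
    # Total number of words (cf. WordCount).
    return len(tokenize(text))

def word_counts(text, n=1):
    # Counts of words (n=1) or of n-grams, i.e. runs of n consecutive words,
    # joined with single spaces (cf. WordCounts).
    words = tokenize(text)
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))

def word_frequency(text, word):
    # Fraction of all words in the text equal to `word` (cf. WordFrequency).
    words = tokenize(text)
    return words.count(word.lower()) / len(words) if words else 0.0

text = "The cat sat on the mat. The cat slept."
print(word_count(text))                 # 9
print(word_counts(text)["the"])         # 3
print(word_counts(text, 2)["the cat"])  # 2
print(word_frequency(text, "cat"))      # 2/9, about 0.222
```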

LetterCounts  ▪  CharacterCounts
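LetterCounts and CharacterCounts differ in what they count. A minimal Python sketch of the distinction follows; the helper names are hypothetical, and counting letters exactly as they appear, without folding case, is an assumption of this sketch rather than documented LetterCounts behavior.

```python
from collections import Counter

def letter_counts(text):
    # Count alphabetic characters only (cf. LetterCounts).
    # Case is kept as-is; this is an assumption of the sketch.
    return Counter(ch for ch in text if ch.isalpha())

def character_counts(text):
    # Count every character, including spaces and punctuation (cf. CharacterCounts).
    return Counter(text)

sample = "Hello, world!"
print(letter_counts(sample)["l"])             # 3
print(character_counts(sample)[" "])          # 1
print(letter_counts(sample).most_common(1))   # [('l', 3)]
```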

Sort sort into alphabetical order

KeySort  ▪  TakeLargest
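Counts are usually reordered or trimmed before display. The Python sketch below shows the three operations on a word-to-count mapping; the data is made up for illustration and take_largest is a hypothetical helper. Python's sorted compares code points, so mixed-case words may not come out in the same order that Sort gives.

```python
from collections import Counter

counts = Counter({"the": 3, "cat": 2, "sat": 1, "mat": 1, "on": 1})

# Sort the words into alphabetical order (cf. Sort on a list of strings).
words_sorted = sorted(counts)

# Order the word -> count mapping by its keys (cf. KeySort on an association).
by_key = dict(sorted(counts.items()))

def take_largest(mapping, n):
    # Keep the n entries with the largest values (cf. TakeLargest).
    # Python's sort is stable, so ties keep their original order.
    return dict(sorted(mapping.items(), key=lambda kv: kv[1], reverse=True)[:n])

print(words_sorted)             # ['cat', 'mat', 'on', 'sat', 'the']
print(take_largest(counts, 2))  # {'the': 3, 'cat': 2}
```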

Classify classify strings based on training data or built-in classifiers

Nearest find the closest-matching string from a list

FindClusters find clusters in string data

ClusteringTree  ▪  ClusteringComponents  ▪  ClusterClassify

Dendrogram hierarchical plot of similarities

EditDistance edit or Levenshtein distance
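EditDistance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The Python sketch below computes it with the standard dynamic-programming recurrence, keeping only two rows of the table. The hypothetical helper nearest then picks the closest candidate by that distance, as a rough analogue of Nearest on a list of strings; treating edit distance as the metric is an assumption of this sketch.

```python
def edit_distance(a, b):
    # Levenshtein distance between strings a and b (cf. EditDistance).
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def nearest(target, candidates):
    # Hypothetical helper: closest candidate by edit distance (assumed metric).
    return min(candidates, key=lambda s: edit_distance(target, s))

print(edit_distance("kitten", "sitting"))                  # 3
print(nearest("kitten", ["sitting", "mitten", "bitter"]))  # mitten
```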

LanguageIdentify identify what language a text is in

DictionaryLookup  ▪  WordData  ▪  WordStem  ▪  PartOfSpeech  ▪  Transliterate

WordFrequencyData data on word frequencies in typical current and historical text

SemanticImport import text with semantic understanding

Text Visualization

Style style text with color, font, or size

WordCloud generate a word cloud from word frequencies or weights

Snippet extract a snippet of text

StringPartition partition a string into equal-size blocks
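A Python sketch of splitting a string into equal-size blocks. The helper name is hypothetical, and dropping a shorter final block is an assumption of this sketch; the StringPartition reference page gives its exact default.

```python
def string_partition(s, n):
    # Split s into consecutive, non-overlapping blocks of length n.
    # A shorter trailing block is dropped (an assumption of this sketch).
    if n <= 0:
        raise ValueError("block size must be positive")
    return [s[i:i + n] for i in range(0, len(s) - n + 1, n)]

print(string_partition("abcdefghij", 3))  # ['abc', 'def', 'ghi']
```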

InsertLinebreaks break a string onto multiple lines
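One common way to break text onto lines is greedy word wrapping. The Python sketch below is an assumption about the break rule, not a description of how InsertLinebreaks decides where to break; the helper name is hypothetical, and it also collapses runs of whitespace, which InsertLinebreaks may not do.

```python
def insert_linebreaks(text, width):
    # Greedy word wrap: add each word to the current line if the line stays
    # within `width` characters; otherwise start a new line.
    # Words longer than `width` are left unbroken on a line of their own.
    lines, current = [], ""
    for word in text.split():
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= width:
            current += " " + word
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return "\n".join(lines)

print(insert_linebreaks("the quick brown fox jumps over the lazy dog", 10))
# the quick
# brown fox
# jumps over
# the lazy
# dog
```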

Text Parsing

TextStructure parse text into its grammatical structure

Text Comparison »

SequenceAlignment  ▪  LongestCommonSubsequence  ▪  DistanceMatrix  ▪  ...
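In the Wolfram Language, LongestCommonSubsequence finds the longest contiguous run of characters shared by two strings (often called the longest common substring), not the gapped subsequence used in some textbooks. A Python sketch of the contiguous version using dynamic programming follows; the helper name is hypothetical, and ties are resolved by taking the first maximal run found in a.

```python
def longest_common_substring(a, b):
    # Longest contiguous run of characters shared by a and b.
    # run[j] holds the length of the common run ending at a[i-1] and b[j-1].
    best_len, best_end = 0, 0
    run = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        # Walk j backwards so run[j - 1] still holds the previous row's value.
        for j in range(len(b), 0, -1):
            if a[i - 1] == b[j - 1]:
                run[j] = run[j - 1] + 1
                if run[j] > best_len:
                    best_len, best_end = run[j], i
            else:
                run[j] = 0
    return a[best_end - best_len:best_end]

print(longest_common_substring("interstellar", "stellarator"))  # stellar
```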

Content Extraction

TextCases extract symbolically specified elements

Containing  ▪  Alternatives  ▪  Entity

TextPosition positions of symbolically specified elements

Text Normalization »

TextWords  ▪  TextSentences  ▪  DeleteStopwords  ▪  RemoveDiacritics  ▪  ...
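RemoveDiacritics strips accents and similar marks. A minimal Python sketch of one standard approach follows: Unicode canonical decomposition (NFD), then deleting the combining marks. It is not necessarily how RemoveDiacritics works, and letters with no decomposition, such as ø or ß, pass through this sketch unchanged.

```python
import unicodedata

def remove_diacritics(text):
    # Decompose accented characters into a base letter plus combining marks,
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(remove_diacritics("Crème brûlée à Zürich"))  # Creme brulee a Zurich
```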