Processing Textual Data
Mathematica provides flexible tools for processing large volumes of textual data. Most often, data represented as strings is converted to lists or other expressions, which can then be manipulated with Mathematica's symbolic language.
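A minimal sketch of that workflow, assuming a short hard-coded sample string (the names text, words, and counts are illustrative and do not come from this page): the string is split into a list of words, and the list is then tallied and sorted with ordinary list functions.

```wolfram
(* text is a hypothetical sample; in practice it might come from Import *)
text = "the quick brown fox jumps over the lazy dog the end";

(* Convert the string to a list of words by splitting at whitespace *)
words = StringSplit[text];

(* Tally gives {word, count} pairs for each distinct word *)
counts = Tally[words];

(* Order the pairs so that the most frequent words come first *)
SortBy[counts, -Last[#] &]
```

In this sample, "the" occurs three times and every other word once, so the pair for "the" sorts to the front. Once the text is a list, the same result could be filtered, mapped over, or plotted like any other list.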
Featured Examples
- Analyze Words in a Block of Text
- Cluster Similar Words
- Find Successive Nearest Words in Text
- Fit Word Length Data to Distributions
- Model Word Lengths by Binomial Distributions
- Search Files for Text
- Use Character Codes to Extract Special Characters from Text
- Word Length Distribution in Various Languages
Reference
Import — import data from files or the web
"Text", "PDF", "TeX", "HTML" — pick out plain text, table data, etc.
FindList — search files for records containing particular strings
StringSplit — split a string into words, sentences, etc.
StringCount — count occurrences of words etc.
StringCases — find instances of a string pattern
StringExpression — match symbolic string patterns
Sort — sort into alphabetical order
Tally — tally numbers of identical strings
Nearest — find the closest-matching string from a list
FindClusters — find clusters in string data
EditDistance — edit or Levenshtein distance
SequenceAlignment — find matching sequences in strings
Hash — find a hash code using a variety of schemes
DictionaryLookup — look up words in English and other dictionaries
WordData — find semantic, grammatical, morphological, etc. properties of words
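
As a rough illustration of a few of the functions listed above, the following sketch applies StringCount, StringCases (with a StringExpression pattern built using ~~), and EditDistance to small inputs. The variable sample and the input strings are assumptions chosen for this example only.

```wolfram
(* sample is a hypothetical input string chosen for this illustration *)
sample = "the cat and the hat";

(* Count occurrences of the substring "the"; there are 2 in sample *)
StringCount[sample, "the"]

(* Find substrings consisting of one letter followed by "at"; gives {"cat", "hat"} *)
StringCases[sample, LetterCharacter ~~ "at"]

(* Levenshtein distance between two words; gives 3 *)
EditDistance["kitten", "sitting"]
```

The pattern LetterCharacter ~~ "at" is a symbolic string pattern, so it can be combined with other pattern objects in the same way as ordinary Mathematica patterns.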
