WordCounts

WordCounts["string"]

gives an association whose keys are the distinct words identified in string, and whose values give the number of times those words appear in string.

WordCounts["string",n]

gives counts of the distinct n-grams consisting of runs of n words in string.

WordCounts[{"string1","string2",…},…]

gives the counts for each of the strings in the list.

Details and Options

  • WordCounts[string,…] identifies words in string in the same way as TextWords.
  • In WordCounts[string,n], words that are considered part of an n-gram must appear consecutively in string, not separated by nonword characters other than whitespace.
  • WordCounts has the option IgnoreCase. With the setting IgnoreCase->True, all letters are in effect converted to lowercase before words are counted.

Examples


Basic Examples  (3)

Count the distinct words in a string:
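
The code cell for this example is not present in this copy of the page. A minimal sketch, using an illustrative input string that is an assumption rather than the original example:

```wolfram
(* Illustrative input; not the string used on the original page *)
sentence = "the cat sat on the mat and the dog sat too";

(* Returns an association from each distinct word to its count *)
WordCounts[sentence]
(* "the" -> 3 and "sat" -> 2; every other word -> 1 *)
```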

Count the distinct 2-gram word sequences in a string:
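
The original code cell is missing here. A possible sketch, with an assumed input string; each key is expected to be a list holding the two consecutive words:

```wolfram
(* Illustrative input; not the string used on the original page *)
sentence = "the cat sat on the mat and the cat sat down";

(* The second argument 2 requests runs of two consecutive words *)
WordCounts[sentence, 2]
(* {"the", "cat"} and {"cat", "sat"} each occur twice; all other pairs once *)
```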

Count the distinct words in each of a list of strings:
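
The original code cell is missing here. A minimal sketch with two assumed strings; the result should be a list with one association per input string:

```wolfram
(* Illustrative inputs; not the strings used on the original page *)
sentences = {"red fish blue fish", "one fish two fish"};

(* Each string is counted separately *)
WordCounts[sentences]
(* "fish" -> 2 in both associations; every other word -> 1 *)
```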

Scope  (1)

Words can include digits and hyphens but not most punctuation:
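
The original code cell is missing here. The sketch below uses an assumed input that mixes hyphens, digits and punctuation. Following the statement above, tokens such as well-known and X-200 would be expected to survive as single words while the comma and exclamation mark are dropped. Exact tokenization is determined by TextWords, so treat this as something to try rather than a guaranteed result:

```wolfram
(* Illustrative input; not the string used on the original page *)
sentence = "The well-known X-200 printer, unlike the well-known X-100, ships today!";

(* Inspect which tokens are treated as words *)
WordCounts[sentence]
```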

Options  (2)

IgnoreCase  (2)

The default setting IgnoreCase->False treats uppercase and lowercase characters as distinct:
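
The original code cell is missing here. A minimal sketch with an assumed input string showing the default, case-sensitive behavior:

```wolfram
(* Illustrative input; not the string used on the original page *)
sentence = "The cat and the Cat saw THE cat";

(* IgnoreCase -> False is the default, so it does not need to be given *)
WordCounts[sentence]
(* "The", "the" and "THE" are separate keys; "cat" -> 2 and "Cat" -> 1 *)
```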

IgnoreCase->True treats words that differ only in case as the same:
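
The original code cell is missing here. A minimal sketch with an assumed input string, counted case-insensitively:

```wolfram
(* Illustrative input; not the string used on the original page *)
sentence = "The cat and the Cat saw THE cat";

(* Letters are in effect lowercased before counting *)
WordCounts[sentence, IgnoreCase -> True]
(* "the" -> 3 and "cat" -> 3; "and" and "saw" -> 1 *)
```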

Count n-grams regardless of case:
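
The original code cell is missing here. A sketch that combines the n-gram form with the IgnoreCase option, again with an assumed input string:

```wolfram
(* Illustrative input; not the string used on the original page *)
sentence = "The cat saw the Cat and THE CAT ran";

(* Options follow the n-gram length *)
WordCounts[sentence, 2, IgnoreCase -> True]
(* {"the", "cat"} -> 3; every other pair -> 1 *)
```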

Applications  (2)

Find the number of times the main characters Sherlock Holmes and John Watson are mentioned in some novels of Arthur Conan Doyle:

Visualize the results:
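
The code for this application, including the source of the novels, is not present in this copy of the page. The sketch below shows only the general shape of the counting and the chart. The two short texts and their labels are hypothetical placeholders, not excerpts from Conan Doyle, and would need to be replaced by the full text of each novel:

```wolfram
(* Hypothetical placeholder texts standing in for the novels *)
texts = <|
   "Novel A" -> "Holmes looked at Watson. Watson nodded, and Holmes smiled at Watson.",
   "Novel B" -> "Watson wrote while Holmes played the violin. Holmes said nothing."
   |>;

(* Word counts per text, then the counts for the two surnames (0 if absent) *)
counts = WordCounts /@ texts;
mentions = Lookup[#, {"Holmes", "Watson"}, 0] & /@ counts
(* <|"Novel A" -> {2, 3}, "Novel B" -> {2, 1}|> *)

(* One group of bars per text, one bar per character *)
BarChart[Values[mentions],
 ChartLabels -> {Keys[mentions], None},
 ChartLegends -> {"Holmes", "Watson"}]
```

Counting surnames only is a simplification chosen for this sketch; matching full names would call for the 2-gram form WordCounts[text, 2].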

Retrieve Miguel de Cervantes's novel Don Quixote from ExampleData to test the empirical Zipf law:

Generate the frequency table of all words in this text:
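
The code cells for the retrieval and counting steps are missing here. A sketch of both steps, assuming the text is stored under the ExampleData name "DonQuixoteIEnglish" and that counting is done case-insensitively (the original page's choice is not shown):

```wolfram
(* Assumed ExampleData entry for the English translation *)
text = ExampleData[{"Text", "DonQuixoteIEnglish"}];

(* Association from each distinct word to its number of occurrences *)
counts = WordCounts[text, IgnoreCase -> True];

(* Frequencies ordered from most to least frequent; show the first ten *)
frequencies = Sort[Values[counts], Greater];
Take[frequencies, 10]
```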

Zipf's law asserts that the frequency of a word plotted against its rank in the frequency table is approximately linear on a log-log scale. Test this statement on the 1,000 most frequent words:

The result is close to . Visualize the fit together with the actual data:
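
The fitting and plotting code is missing from this copy of the page. One way to sketch the test is a linear least-squares fit of log frequency against log rank. The ExampleData name, the case-insensitive counting and the variable names are assumptions, and the sketch does not reproduce the fitted expression reported on the original page:

```wolfram
(* Assumed ExampleData entry; counting is case-insensitive by choice *)
text = ExampleData[{"Text", "DonQuixoteIEnglish"}];
frequencies = Sort[Values[WordCounts[text, IgnoreCase -> True]], Greater];

(* {log rank, log frequency} pairs for the 1,000 most frequent words *)
data = N[Log[Transpose[{Range[1000], Take[frequencies, 1000]}]]];

(* Straight-line fit in log-log coordinates *)
fit = Fit[data, {1, x}, x]

(* Overlay the fitted line on the data *)
Show[
 ListPlot[data, AxesLabel -> {"log rank", "log frequency"}],
 Plot[fit, {x, 0, Log[1000.]}, PlotStyle -> Red]]
```

In its classic form, Zipf's law corresponds to a slope near -1 in these coordinates, which gives a reference point when reading the coefficient of x in the fit.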

Neat Examples  (1)

Find the 20 most frequently occurring words in a body of text:
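
The original code cell and its source text are missing here. A sketch using Alice in Wonderland from ExampleData as an assumed body of text:

```wolfram
(* Assumed text; the original page's choice is not shown *)
text = ExampleData[{"Text", "AliceInWonderland"}];

(* Keep the 20 entries with the largest counts *)
TakeLargest[WordCounts[text], 20]
```

TakeLargest is used so that the sketch does not depend on the order in which WordCounts returns its keys.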

Do the same for 2-word sequences:
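
The original code cell is missing here. The same idea applied to pairs of consecutive words, again with Alice in Wonderland as an assumed body of text:

```wolfram
(* Assumed text; the original page's choice is not shown *)
text = ExampleData[{"Text", "AliceInWonderland"}];

(* Keys are two-word lists; keep the 20 most frequent pairs *)
TakeLargest[WordCounts[text, 2], 20]
```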

Wolfram Research (2015), WordCounts, Wolfram Language function, https://reference.wolfram.com/language/ref/WordCounts.html (updated 2024).
