WordCounts[string] gives an association whose keys are the distinct words identified in string, and whose values give the number of times those words appear in string.
WordCounts[string,n] gives counts of the distinct n-grams consisting of runs of n words in string.
Details and Options
- WordCounts[string,…] identifies words in string in the same way as TextWords.
- In WordCounts[string,n], words that are considered part of an n-gram must appear consecutively in string, not separated by nonword characters other than whitespace.
- WordCounts has the option IgnoreCase. With the setting IgnoreCase->True, letters are in effect all converted to lower case before being counted.
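The points above can be sketched on a short string. The sample sentence is made up for illustration, and the comments describe the expected shape of each result rather than exact output.

```wolfram
(* hypothetical sample text, chosen only for illustration *)
sample = "The cat saw the dog. The dog saw the cat";

(* single-word counts; "The" and "the" are distinct keys by default *)
WordCounts[sample]

(* 2-gram counts; each key is a list of two consecutive words. *)
(* The period after "dog" is a nonword character, so no 2-gram *)
(* should span the sentence boundary.                          *)
WordCounts[sample, 2]

(* case-insensitive counts; "The" and "the" are counted together *)
WordCounts[sample, IgnoreCase -> True]
```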
Examples
Retrieve Miguel de Cervantes's novel Don Quixote from ExampleData to test the empirical Zipf law:
Zipf's law asserts that the frequency of a word, plotted against its rank in the frequency table, is approximately linear on a log-log scale. Test this statement on the 1,000 most frequent words:
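A minimal sketch of this test follows. It assumes that the English text is available under the ExampleData name {"Text", "DonQuixoteIEnglish"}. The variable names and the use of IgnoreCase -> True are choices made here, not necessarily those of the original example.

```wolfram
(* assumed ExampleData name for the English text of Don Quixote *)
text = ExampleData[{"Text", "DonQuixoteIEnglish"}];

(* count words, treating upper- and lowercase forms as the same word *)
counts = WordCounts[text, IgnoreCase -> True];

(* counts of the 1,000 most frequent words, sorted explicitly from largest to smallest *)
top = Take[ReverseSort[Values[counts]], UpTo[1000]];

(* rank versus count on log-log axes; a roughly straight line supports Zipf's law *)
ListLogLogPlot[top, AxesLabel -> {"rank", "count"}]

(* optional: fit a line to log(count) versus log(rank) *)
fit = LinearModelFit[Transpose[{Log[N[Range[Length[top]]]], Log[N[top]]}], x, x];
fit["BestFitParameters"]  (* {intercept, slope} *)
```

The fitted slope gives a rough numerical check to accompany the plot: under Zipf's law in its simplest form, the slope would be close to -1.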