WordFrequencyData
WordFrequencyData[word]
gives the frequency of word in typical published English text.
WordFrequencyData[{word1,word2,…}]
gives an association of the frequencies of each of the words word1, word2, ….
WordFrequencyData[word,"TimeSeries"]
gives a time series for the frequency of word in typical published English text.
WordFrequencyData[word,"TimeSeries",datespec]
gives a time series for dates specified by datespec.
WordFrequencyData[word,"prop"]
gives property prop of the word frequency.
Details and Options
- WordFrequencyData[word1|word2|…] gives the total frequency of all the alternatives word1, word2, ….
- WordFrequencyData[word,"Total",datespec] gives the total frequency of word for the dates specified by datespec.
- By default, WordFrequencyData uses the Google Books English n-gram public dataset.
- Possible options include:
    IgnoreCase                 False        whether to ignore case in word
    Language                   "English"    what language of source corpus to use
- In WordFrequencyData[word,"prop"], possible properties include:
    "Total"                    give total frequencies over a date range
    "TimeSeries"               give a time series of frequencies
    "CaseVariants"             give results for all variants of upper and lower case
    "PartsOfSpeechVariants"    give results for all variants of parts of speech
    {prop1,prop2,…}            give results for combinations of properties
- Possible date specifications include:
    All                        use all available dates for the specified source corpus
    DateObject[…]              use DateObject
    year                       use specific year
    {yearmin,yearmax}          use year range between yearmin and yearmax
    {{d1,d2,…}}                use explicit dates {d1,d2,…}
Examples
Basic Examples (4)
Scope (4)
Get the overall frequency of "atlas":
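A minimal sketch of the call this caption describes. The result is a single relative frequency whose exact value depends on the dataset version, so no output is shown here.

```wolfram
(* Frequency of "atlas" in the default English corpus *)
WordFrequencyData["atlas"]
```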
Find the frequency of multiple words at once:
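A possible input for this case; the three words are illustrative choices, not taken from the original example. Per the usage notes above, a list of words gives an association keyed by word.

```wolfram
(* Illustrative word list; the result is an association word -> frequency *)
freqs = WordFrequencyData[{"atlas", "map", "globe"}]

(* Look up a single entry in the returned association *)
freqs["map"]
```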
WordFrequencyData accepts as input TextElement with a specific "GrammaticalUnit":
Plot the historical time series for the frequency of the word "computer" since 1900:
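A sketch of how such a plot might be produced. The upper bound of 2000 is an assumption made for illustration; any later year within the corpus range could be used instead.

```wolfram
(* Time series for "computer"; the end year 2000 is an arbitrary illustrative choice *)
ts = WordFrequencyData["computer", "TimeSeries", {1900, 2000}];

(* A TimeSeries can be passed directly to DateListPlot *)
DateListPlot[ts, PlotLabel -> "computer"]
```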
Generalizations & Extensions (1)
When Alternatives is used as an input, the result is the total frequency for any of the alternatives:
Alternatives may be used in combination with other properties, such as "TimeSeries":
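The two cases above might look as follows. The words and the year range are illustrative assumptions.

```wolfram
(* Total frequency over either alternative *)
WordFrequencyData["color" | "colour"]

(* The same alternatives combined with the "TimeSeries" property *)
WordFrequencyData["color" | "colour", "TimeSeries", {1900, 2000}]
```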
Options (6)
IgnoreCase (1)
Get the frequency of a word summed over its lower- and uppercase variants by setting IgnoreCase->True. The default value is False:
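A minimal comparison of the two settings, using "internet" as an illustrative word:

```wolfram
(* Default: only the exact casing given is counted *)
WordFrequencyData["internet"]

(* Case-insensitive: variants such as "Internet" are included as well *)
WordFrequencyData["internet", IgnoreCase -> True]
```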
Language (5)
Find the frequency of a common Spanish word in a Spanish-language text corpus:
Spanish words may also appear in the corpora of other languages, but with a much lower frequency:
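One way to sketch this comparison is to query the same word against two corpora. The word "casa" is an illustrative choice, not necessarily the one used in the original example.

```wolfram
(* Map each corpus language to the frequency of "casa" in that corpus *)
AssociationMap[
  WordFrequencyData["casa", Language -> #] &,
  {"Spanish", "English"}
]
```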
A common word in French returns a high frequency value:
Popularity of the word "peace" in Spanish:
The word "Sputnik" in Russian:
Get a time series of the word "Haus" in German between 1900 and now and plot the result:
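A sketch of this query. The end year 2000 stands in for "now" and is an assumption; the corpus may extend further.

```wolfram
(* German-corpus time series for "Haus"; the end year is an illustrative choice *)
haus = WordFrequencyData["Haus", "TimeSeries", {1900, 2000}, Language -> "German"];

DateListPlot[haus, PlotLabel -> "Haus (German corpus)"]
```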
Properties & Relations (14)
"CaseVariants" (3)
A word can have many lower- and uppercase variants:
Getting the frequency of the word with IgnoreCase->True should be equivalent to getting the Total for the previous list:
Get the most popular case variation of "DOS":
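The three steps above could be sketched as follows, assuming that "CaseVariants" returns an association mapping each case variant to its frequency. Non-numeric entries, if any, are filtered out before totaling.

```wolfram
(* Assumed shape: <|variant -> frequency, ...|> *)
variants = WordFrequencyData["DOS", "CaseVariants"];
numeric = Select[variants, NumericQ];

(* Per the text, these two values should agree *)
Total[numeric]
WordFrequencyData["DOS", IgnoreCase -> True]

(* The most frequent case variant *)
Keys[TakeLargest[numeric, 1]]
```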
When asking for multiple words, the association will contain all variants of each word:
"PartOfSpeechVariants" (4)
"TimeSeries" (2)
Get the frequency of the word "war" throughout the twentieth century:
This can be plotted directly using DateListPlot:
Compare the usage of "peace" and "war" over time:
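A possible form of this comparison, assuming a list of words with "TimeSeries" gives an association of time series keyed by word:

```wolfram
(* One time series per word over the twentieth century *)
series = WordFrequencyData[{"war", "peace"}, "TimeSeries", {1900, 2000}];

(* Plot both series, labeling each curve with its word *)
DateListPlot[Values[series], PlotLegends -> Keys[series]]
```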
And compare their usage in another language too:
Plot the ratio of the words "war" and "peace" for both languages:
"Total" (5)
"Total" is the default property:
DateObject expressions can be used in the date specification:
The "Total" can be computed over a specific list of years:
Infinity can be used to specify an unbounded range:
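The date specifications described in this subsection might be written as below. The word and the years are illustrative assumptions.

```wolfram
(* "Total" is the default property, so these two calls are equivalent *)
WordFrequencyData["war"]
WordFrequencyData["war", "Total"]

(* A single year given as a DateObject *)
WordFrequencyData["war", "Total", DateObject[{1945}]]

(* An explicit list of years: note the double braces *)
WordFrequencyData["war", "Total", {{1914, 1939}}]

(* An open-ended range from 1950 onward *)
WordFrequencyData["war", "Total", {1950, Infinity}]
```

The double braces distinguish an explicit list of years from a {yearmin,yearmax} range.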
Possible Issues (1)
Words that are not in the corpus return Missing["NotAvailable"]:
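A sketch of how missing results might be handled. The string "qzxvjkwp" is a made-up token assumed to be absent from the corpus.

```wolfram
(* Hypothetical nonsense token, assumed not to occur in the corpus *)
result = WordFrequencyData["qzxvjkwp"];

(* Treat a missing result as zero frequency *)
If[MissingQ[result], 0, result]

(* Drop missing entries from a multi-word query *)
DeleteMissing[WordFrequencyData[{"atlas", "qzxvjkwp"}]]
```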
Neat Examples (11)
Popularity of the word "dog" and its translations in different languages:
The words "gold" versus "oil" over time:
Frequency of terms for telephone and television over time:
Sorting day names by popularity:
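One way this ranking could be computed, relying on the fact that Sort orders an association by its values:

```wolfram
days = {"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"};

(* Association day -> frequency, ordered from most to least frequent *)
Reverse[Sort[WordFrequencyData[days]]]
```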
Some words have lost their old orthography:
The word "democracy" is used more frequently in the twentieth century:
"Apple" with initial uppercase A became popular after 1980:
The relative frequency of part of speech variants may change over time. "Tackle" as a verb and as a noun is a good example:
Regularization of irregular verbs may explain the changes in the part of speech and orthography of some words, such as "burnt" versus "burned":
Wolfram Research (2016), WordFrequencyData, Wolfram Language function, https://reference.wolfram.com/language/ref/WordFrequencyData.html.