WordFrequencyData

WordFrequencyData[word]

gives the frequency of word in typical published English text.

WordFrequencyData[{word₁,word₂,…}]

gives an association of frequencies of the wordᵢ.

WordFrequencyData[word,"TimeSeries"]

gives a time series for the frequency of word in typical published English text.

WordFrequencyData[word,"TimeSeries",datespec]

gives a time series for dates specified by datespec.

WordFrequencyData[word,"prop"]

gives property prop of the word frequency.

Details and Options

WordFrequencyData[word₁|word₂|…] gives the total frequency of all the wordᵢ combined.
WordFrequencyData[word,"Total",datespec] gives the total frequency of word for the dates specified by datespec.
By default, WordFrequencyData uses the Google Books English n-gram public dataset.
Possible options include:

	IgnoreCase	False	whether to ignore case in word
	Language	"English"	what language of source corpus to use

In WordFrequencyData[word,"prop"], possible properties include:

	"Total"	give total frequencies over a date range
	"TimeSeries"	give a time series of frequencies
	"CaseVariants"	give results for all variants of upper and lower case
	"PartsOfSpeechVariants"	give results for all variants of parts of speech
	{prop₁,prop₂,…}	give results for combinations of properties

Possible date specifications include:

	All	use all available dates for the specified source corpus
	DateObject[…]	use DateObject
	year	use specific year
	{year_min,year_max}	use year range between year_min and year_max
	{{d₁,d₂,…}}	use explicit dates {d₁,d₂,…}

Examples


Basic Examples (4)

Get the frequency of the word "dog" in typical English:

Get the typical frequencies of several words:

Compute the ratio of the frequencies of the words "war" and "peace" in published text:

Plot the historical time series for the frequency of the word "economy":
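
The four cases above can be sketched as follows, using only the call forms listed in the usage section. The word list in the second input is an illustrative choice, and results are not shown because they depend on the corpus version available:

```wolfram
(* frequency of a single word *)
WordFrequencyData["dog"]

(* association of frequencies for several words; the word list is illustrative *)
WordFrequencyData[{"cat", "dog", "mouse"}]

(* ratio of two frequencies *)
WordFrequencyData["war"]/WordFrequencyData["peace"]

(* plot the time series of a word's frequency *)
DateListPlot[WordFrequencyData["economy", "TimeSeries"]]
```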

Scope (4)

Get the overall frequency of "atlas":

Find the frequency of multiple words at once:

WordFrequencyData accepts a TextElement with a specific "GrammaticalUnit" as input:

Plot the historical time series for the frequency of the word "computer" since 1900:

Generalizations & Extensions (1)

When Alternatives is used as an input, the result is the total frequency for any of the alternatives:

Alternatives may be used in combination with other properties, such as "TimeSeries":
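
A minimal sketch of both forms, assuming "car" and "automobile" as illustrative alternatives (these words are not taken from the original examples):

```wolfram
(* total frequency of either word *)
WordFrequencyData["car" | "automobile"]

(* the same alternatives combined into one time series *)
DateListPlot[WordFrequencyData["car" | "automobile", "TimeSeries"]]
```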

Options (6)

IgnoreCase (1)

IgnoreCase controls whether lower- and uppercase variants of a word are counted together. The default value is False:

With IgnoreCase->True, the value is usually greater than the default:
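
The comparison can be sketched like this, assuming "dos" as an illustrative word with several case variants:

```wolfram
(* default: only the exact lowercase form is counted *)
WordFrequencyData["dos"]

(* all case variants of the word are counted together *)
WordFrequencyData["dos", IgnoreCase -> True]
```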

Language (5)

Find the frequency of a common Spanish word in a Spanish-language text corpus:

Spanish words may appear in other languages, but with a much lower frequency:

A common word in French returns a high frequency value:

Popularity of the word "peace" in Spanish:

The word "Sputnik" in Russian:

Get a time series of the word "Haus" in German between 1900 and now and plot the result:
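
A hedged sketch of how the Language option might be used. The word "casa" is an illustrative choice, the option values "Spanish" and "German" are assumed to follow the same string form as the default "English", and the use of Infinity as an open upper bound is assumed to work for "TimeSeries" as it does for "Total":

```wolfram
(* a common Spanish word in the Spanish corpus *)
WordFrequencyData["casa", Language -> "Spanish"]

(* the same word in the default English corpus, for comparison *)
WordFrequencyData["casa"]

(* time series for "Haus" in the German corpus from 1900 onward *)
DateListPlot[
 WordFrequencyData["Haus", "TimeSeries", {1900, Infinity}, Language -> "German"]]
```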

Properties & Relations (14)

"CaseVariants" (3)

A word can have many lower- and uppercase variants:

Getting the frequency of the word with IgnoreCase->True should be equivalent to getting the Total for the previous list:

Get the most popular case variation of "DOS":

When asking for multiple words, the association will contain all variants of each word:
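
The steps above can be sketched as follows. The words "dos" and "unix" are illustrative, and the helper name variants is hypothetical:

```wolfram
(* association of frequencies, one entry per case variant *)
variants = WordFrequencyData["dos", "CaseVariants"]

(* summing the variants should match the case-insensitive frequency *)
Total[variants]
WordFrequencyData["dos", IgnoreCase -> True]

(* keep only the most frequent case variant *)
TakeLargest[WordFrequencyData["DOS", "CaseVariants"], 1]

(* variants for several words at once *)
WordFrequencyData[{"dos", "unix"}, "CaseVariants"]
```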

"PartOfSpeechVariants" (4)

Calculate the frequency of a word in a specific year for all part-of-speech variants:

Show different usages of the word "nuke" in 1944:

Some words may return many part-of-speech variants:

Combining this argument with "CaseVariants":

Combining with "CaseVariants" and "TimeSeries":
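
A sketch of these queries, assuming the property name is spelled "PartsOfSpeechVariants" as in the property table under Details and Options, and that properties combine in a list as that table describes:

```wolfram
(* part-of-speech variants of "nuke" in a single year *)
WordFrequencyData["nuke", "PartsOfSpeechVariants", 1944]

(* combine with case variants *)
WordFrequencyData["nuke", {"CaseVariants", "PartsOfSpeechVariants"}]

(* combine with case variants and a time series *)
WordFrequencyData["nuke", {"CaseVariants", "PartsOfSpeechVariants", "TimeSeries"}]
```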

"TimeSeries" (2)

Get the frequency of the word "war" throughout the twentieth century:

This can be plotted directly using DateListPlot:

Compare the usage of "peace" and "war" over time:

And compare their usage in another language too:

Plot the ratio of the words "war" and "peace" for both languages:
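
A minimal sketch of the comparison for the default English corpus. The variable names war, peace and ratio are hypothetical, and the range {1900, 2000} is used as an approximation of the twentieth century:

```wolfram
(* time series for each word over the same date range *)
war = WordFrequencyData["war", "TimeSeries", {1900, 2000}];
peace = WordFrequencyData["peace", "TimeSeries", {1900, 2000}];

(* a single series can be plotted directly *)
DateListPlot[war]

(* compare both series in one plot *)
DateListPlot[{peace, war}, PlotLegends -> {"peace", "war"}]

(* pointwise ratio of the two series *)
ratio = TimeSeriesThread[First[#]/Last[#] &, {war, peace}];
DateListPlot[ratio]
```

The same inputs can be repeated with the Language option to compare another corpus.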

"Total" (5)

"Total" is the default property:

For a simple date range:

DateObject expressions can be used in the date specification:

The "Total" can be computed over a specific list of years:

Infinity can be used to specify an unbounded range:
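
The date specifications listed under Details and Options can be sketched with "Total" as follows. The word and the years are illustrative choices:

```wolfram
(* "Total" is the default property, so these two are equivalent *)
WordFrequencyData["war"]
WordFrequencyData["war", "Total"]

(* a year range *)
WordFrequencyData["war", "Total", {1939, 1945}]

(* a DateObject as the date specification *)
WordFrequencyData["war", "Total", DateObject[{1944}]]

(* an explicit list of years, wrapped in an extra pair of braces *)
WordFrequencyData["war", "Total", {{1914, 1939}}]

(* an open-ended range *)
WordFrequencyData["war", "Total", {1950, Infinity}]
```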

Possible Issues (1)

Words that are not included within the corpus will return Missing["NotAvailable"]:
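
One way to handle this case is to test the result with MissingQ before using it. In this sketch, "qwertyzxcv" is a made-up string assumed to be absent from the corpus, and falling back to 0 is an illustrative design choice:

```wolfram
(* a made-up string, assumed not to occur in the corpus *)
freq = WordFrequencyData["qwertyzxcv"];

(* substitute 0 when no frequency is available *)
If[MissingQ[freq], 0, freq]
```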

Neat Examples (11)

Popularity of the word "dog" and its translations in different languages:

The words "gold" versus "oil" over time:

Frequency of terms for telephone and television over time:

Joining synonyms:

Common diseases:

Sorting day names by popularity:

Some words have lost their old orthography:

The word "democracy" became more frequent in the twentieth century:

"Apple" with initial uppercase A became popular after 1980:

The relative frequency of part-of-speech variants may change over time. "Tackle" as a verb and as a noun is a good example:

Regularization of irregular verbs may explain the changes in the part of speech and orthography of some words, such as "burnt" versus "burned":

Evolution of "ustedes" versus "vosotros" in Spanish:
