Wolfram Language & System Documentation Center

TextCases   

TextCases[text,form]

gives a list of all cases of text identified as being of type form that appear in text.

TextCases[text,{form₁,form₂,…}]

gives an association of results for all the types form_i.

TextCases[text,formspecprop]

gives the specified property for each result found.

TextCases[text,formspec{prop₁,prop₂,…}]

gives a list of properties for each result found.

TextCases[text,spec,n]

gives the first n cases found.

Details and Options

TextCases is used to perform several natural language processing tasks such as part-of-speech tagging or named entity recognition.
In TextCases[text,…], text can be a string, a file with plain text represented by File[…], a ContentObject expression or a list of these text objects.
TextCases[{text₁,text₂,…},…] gives cases for each text_i.
Identification type form can be:

	"type"	any text content type (e.g. "Noun", "City")
	Entity[…,…]	a specific entity of a text content type
	form₁|form₂|…	form matching any of the form_i
	Containing[outer,inner]	forms of type outer containing type inner
	Verbatim["string"]	a specific string to be matched exactly
	pattern	a string pattern to be matched

Possible choices for the property prop are:

	"String"	string of the identified text (default)
	"Position"	start and end position of the string in text
	"Probability"	estimated probability that the identification is correct
	"Interpretation"	standard interpretation of the identified string
	"Snippet"	a snippet around the identified string
	"HighlightedSnippet"	a snippet with the identified string highlighted
	f	apply f to the association containing all properties
	{prop₁,prop₂,…}	a list of property specifications

The following options can be given:

AcceptanceThreshold	Automatic	minimum probability to accept identification
PerformanceGoal	Automatic	favor algorithms with specific advantages
TargetDevice	"CPU"	whether CPU or GPU computation should be used for entity detection
VerifyInterpretation	False	whether interpretability should be verified

TextCases uses machine learning. Its methods, training sets and biases included therein may change and yield varied results in different versions of the Wolfram Language.
TextCases may download resources that will be stored in your local object store at $LocalBase and can be listed using LocalObjects[] and removed using ResourceRemove.

Examples

Basic Examples (6)

Find the cities in a text:

Wolfram Language code: TextCases["NYC, Los Angeles, and Chicago are the largest cities in the United States of America in 2018.", "City"]

Find the nouns in a sentence:

Wolfram Language code: TextCases["The quick brown fox jumps over the lazy dog.", "Noun"]

Find currency amounts and get interpretations:

Wolfram Language code: TextCases["The shirt cost $50 in America, but only 5€ in Italy.", "CurrencyAmount" -> "Interpretation"]

Find cities, countries and dates in text:

Wolfram Language code:

TextCases["NYC, Los Angeles, and Chicago are the largest cities in the United States of America in 2018.", {"City", "Country", "Date"}]

Obtain probabilities and interpretations:

Wolfram Language code:

TextCases["NYC, Los Angeles, and Chicago are the largest cities in the USA in 2018.", {"City", "Country", "Date"} -> {"String", "Interpretation", "Probability"}]

Find all the locations and get their geodetic positions:

Wolfram Language code:

TextCases["NYC, Los Angeles, and Chicago are the largest cities in the USA in 2018.", "Location" -> (#String -> #Interpretation&)]

Find all references to New York City in a text:

Wolfram Language code: TextCases["I love New York - I ❤ NYC", Entity["City", {"NewYork", "NewYork", "UnitedStates"}]]
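
Limit the number of results with the third argument. This is a minimal sketch that reuses the sentence from the first example; the variable names are illustrative, and which cases are found depends on the installed models:

```wolfram
text = "NYC, Los Angeles, and Chicago are the largest cities in the United States of America in 2018.";

(* return at most the first two cities found *)
firstTwo = TextCases[text, "City", 2]
```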

Scope (5)

ContentObject and Files (2)

Find instances of colors in a ContentObject:

Wolfram Language code: doc = TextSearch["ExampleData/Text", "dog"][1]

Wolfram Language code: TextCases[doc, "Color"]

Find instances of colors in a File:

Wolfram Language code: file = TextSearch["ExampleData/Text", "dog"][1, "Location"]

Wolfram Language code: TextCases[file, "Color"]
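
A list of text objects can also be given, in which case results are returned for each text. The sketch below assumes two made-up sentences; the names texts and perText are illustrative only:

```wolfram
texts = {
   "Paris is the capital of France.",
   "Rome is the capital of Italy."
};

(* one list of results per input text *)
perText = TextCases[texts, "City"]
```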

Alternatives and Containing (2)

Use Alternatives to match multiple types:

Wolfram Language code: TextCases["John and Mary went to the store.", "Noun" | "Verb"]

Wolfram Language code: TextCases["John and Mary went to the store.", "Noun" | "ProperNoun" | "Verb"]

Find all sentences in a string that contain currency amounts:

Wolfram Language code:

TextCases["I have a fairly clear idea of what I will buy at the store.  I want shoes, a computer, and a jacket.  The computer will be the most expensive, and will cost over $1000.", Containing["Sentence", "CurrencyAmount"]]

Find all sentences in a string that contain countries:

Wolfram Language code:

TextCases["On vacation, I first went to France, then I went to Belgium.  The food was amazing in both countries.", Containing["Sentence", "Country"]]

Combine Alternatives and Containing to form highly structured queries:

Wolfram Language code:

TextCases["I have a fairly clear idea of what I will buy at the store.  I want shoes, a computer, and a jacket.  The computer will be the most expensive, and will cost over $1000.  John will like my computer.", Containing["Sentence", "CurrencyAmount" | "ProperNoun"]]
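
Besides content types, the form can be Verbatim["string"] or a string pattern, as listed under Details. The following is a hedged sketch on a made-up sentence; the variable names are illustrative, and the pattern shown is just one possible string pattern:

```wolfram
text = "The cat sat on the mat. The cat slept.";

(* match the literal string "cat" exactly *)
literal = TextCases[text, Verbatim["cat"]]

(* match "m" followed by one or more letters *)
pattern = TextCases[text, "m" ~~ (LetterCharacter ..)]
```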

Return Types (1)

Specify multiple return types:

Wolfram Language code:

TextCases["He spent $50 in Boston on the 11th July. For a shirt that costs 5€ in Rome!", "CurrencyAmount"  -> {"String", "Position", "Interpretation"}]

Wolfram Language code:

TextCases["He spent $50 in Boston on the 11th July. For a shirt that costs 5€ in Rome!", {"CurrencyAmount", "City", "Date"}  -> {"String", "Position", "Interpretation"}]

Show all the available properties in an Association:

Wolfram Language code: TextCases["He spent $50 in Boston on the 11th July. For a shirt that costs 5€ in Rome!", "Date" -> Identity]

Create a dataset with the properties of several types of entities:

Wolfram Language code:

TextCases["He spent $50 in Boston on the 11th July. For a shirt that costs 5€ in Rome!", {"CurrencyAmount", "City", "Date"} -> Identity]//Dataset

Get the geodetic positions of the locations occurring in a text:

Wolfram Language code: TextCases["She took a plane from Toulouse in France to Montreal in Canada", "Location" -> (#String -> #Interpretation&)]
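
The "Probability" and "Snippet" properties from the property table can be requested in the same way, and a function can be applied to the association of all properties. This is a minimal sketch reusing the sentence from the examples above; the variable names and the choice of StringLength are illustrative:

```wolfram
text = "He spent $50 in Boston on the 11th July. For a shirt that costs 5€ in Rome!";

(* string, estimated probability and surrounding snippet for each city *)
details = TextCases[text, "City" -> {"String", "Probability", "Snippet"}]

(* apply a custom function to the association of all properties *)
lengths = TextCases[text, "City" -> (StringLength[#String] &)]
```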

Options (3)

AcceptanceThreshold (1)

By default, all the detected entities have an estimated probability higher than 0.5:

Wolfram Language code: TextCases[ExampleData[{"Text", "JFKInaugural"}], {"Country", "Date", "Person"}]

Get only the entities that are highly probable to be correct by setting a high AcceptanceThreshold:

Wolfram Language code: TextCases[ExampleData[{"Text", "JFKInaugural"}], {"Country", "Date", "Person"}, AcceptanceThreshold -> 0.9]
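
A roughly comparable filter can be applied after the fact by requesting the "Probability" property and selecting on it. This is only a sketch under the assumption that post-filtering is acceptable; it is not claimed to reproduce AcceptanceThreshold exactly, since the default threshold has already been applied to the results. The sentence and variable names are illustrative:

```wolfram
text = "We visited Toulouse and Foix in France.";

(* each result is a pair {string, probability} *)
cases = TextCases[text, "City" -> {"String", "Probability"}];

(* keep only the cases whose estimated probability is at least 0.9 *)
confident = Select[cases, Last[#] >= 0.9 &]
```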

PerformanceGoal (1)

Setting PerformanceGoal->"Speed" can make detection faster, at the cost of lower accuracy:

Wolfram Language code: AbsoluteTiming@TextCases["My favourite cities are New York and Foix", "City"]

Wolfram Language code: AbsoluteTiming@TextCases["My favourite cities are New York and Foix", "City", PerformanceGoal -> "Speed"]

VerifyInterpretation (1)

By default, some entities cannot be interpreted, either because they are not correct or because they are not yet in the knowledgebase. In these cases, a string is returned instead of an interpretation:

Wolfram Language code: TextCases["We visited Toulouse and Foix in France.", {"City", "Country"} -> "Interpretation"]

Use VerifyInterpretation to filter out the entities that cannot be interpreted:

Wolfram Language code:

TextCases["We visited Toulouse and Foix in Midi-Pyrénées in France.", {"City", "Country"} -> "Interpretation", VerifyInterpretation -> True]

Applications (6)

Word and Sentence Segmentation (2)

Word segmentation preserves syntactic elements such as email addresses, URLs, and Twitter handles:

Wolfram Language code: TextCases["His email address is user@domain.com and Twitter handle is @username.", "Word"]

Wolfram Language code: TextCases["http://www.wolfram.com is a useful resource for Wolfram Language programmers.", "Word"]//TextElement

All non-whitespace characters are captured by the forms "Word" and "Punctuation":

Wolfram Language code:

TextCases["Washington D.C. is the capital of the United States.  Mr. Anthony A. Williams was the mayor.", "Word" | "Punctuation"]//TextElement

Sentence segmentation intelligently ignores acronyms and other misleading boundaries:

Wolfram Language code:

TextCases["Washington D.C. is the capital of the United States.  Mr. Fox is the CEO of the company.  She co-founded the company with Mrs. Smith.", "Sentence"]//TextElement

Parts of Speech (2)

Return all words of a given part of speech:

Wolfram Language code: TextCases["John and Mary went to the new store.", "Noun"]

Wolfram Language code: TextCases["John and Mary went to the new store.", "Verb"]

Wolfram Language code: TextCases["John and Mary went to the new store.", "Preposition"]

Make a table of word clouds from parts of speech:

Wolfram Language code: alice = ExampleData[{"Text", "AliceInWonderland"}];

Wolfram Language code: partsOfSpeech = TextCases[alice, {"Noun", "Verb", "Adjective", "Adverb"}];

Wolfram Language code: Grid @Partition[KeyValueMap[Labeled[WordCloud[#2, ImageSize -> 200], #1]&, partsOfSpeech], 2]

Entities and Interpretable Objects (2)

Find countries:

Wolfram Language code: TextCases["On vacation, I first went to France, then I went to Belgium.", "Country"]

Return interpreted strings as Entity objects:

Wolfram Language code: TextCases["On vacation, I first went to France, then I went to Belgium.", "Country" -> "Interpretation"]

Find currency amounts in a Wikipedia article:

Wolfram Language code: dollarstore = WikipediaData@First@WikipediaSearch["Variety Store"];

Wolfram Language code: currencies = TextCases[dollarstore, "CurrencyAmount" -> "Interpretation"]

Convert to another currency:

Wolfram Language code: CurrencyConvert[currencies, "Yen"]

Properties & Relations (4)

TextCases handles the same types as TextPosition and TextContents and always identifies the same substrings as these functions for a given type:

Wolfram Language code: TextContents["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", "City"]

Wolfram Language code: TextCases["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", "City"]

Wolfram Language code: TextPosition["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", "City"]

TextCases is a generalization of TextPosition:

Wolfram Language code: TextCases["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", "City" -> "Position"]

Wolfram Language code:

TextPosition["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", {"City", "AdministrativeDivision"}]

Wolfram Language code:

TextCases["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", {"City", "AdministrativeDivision"} -> "Position"]

A dataset that is similar to the output of TextContents can be obtained using TextCases:

Wolfram Language code: Dataset@TextCases["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", "City" -> Identity]

Wolfram Language code:

Dataset@TextCases["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", "City" -> Function[<|"City" -> Last[#Interpretation][[1]], "State" -> Last[#Interpretation][[2]], "Country" -> Last[#Interpretation][[3]], "Position" -> #Position, "HighlightedSnippet" -> #HighlightedSnippet|>]]

TextSentences is equivalent to TextCases[…,"Sentence"]:

Wolfram Language code: TextCases["Hi Michael J. Jordan. I am Michael I. Jordan.", "Sentence"]//TextElement

Wolfram Language code: TextSentences["Hi Michael J. Jordan. I am Michael I. Jordan."]//TextElement

TextStructure splits texts into the same sentences:

Wolfram Language code: TextStructure["Hi Michael J. Jordan. I am Michael I. Jordan.", "PartsOfSpeech"]

TextWords is equivalent to TextCases[…,"Word"]:

Wolfram Language code: TextCases["As a matter-of-fact, my mother-in-law is in N.Y.C.", "Word"]//TextElement

Wolfram Language code: TextWords["As a matter-of-fact, my mother-in-law is in N.Y.C."]//TextElement

TextStructure splits texts into the same words and punctuation marks as TextCases[…,"Word"|"Punctuation"]:

Wolfram Language code: TextCases["As a matter-of-fact, my mother-in-law is in N.Y.C.", "Word" | "Punctuation"]//TextElement

Wolfram Language code: TextStructure["As a matter-of-fact, my mother-in-law is in New York.", "PartsOfSpeech"]

Neat Examples (2)

Many entities (cities, countries, etc.) can be located on a map. TextCases allows you to find all these entities at once.

Take the Wikipedia article about rice:

Wolfram Language code: text = WikipediaData["Rice"];

Wolfram Language code: Snippet[text, 4]

Find all entities that can be pinpointed to a location:

Wolfram Language code: locations = TextCases[text, "LocationEntity" -> "Interpretation", VerifyInterpretation -> True];

Wolfram Language code: RandomSample[locations, 10]

Visualize the locations identified and their frequency in the text:

Wolfram Language code: GeoBubbleChart[Counts[locations]]

Show the number of mentions of each continent and country:

Wolfram Language code: ReverseSort@Counts[Cases[locations, Entity["GeographicRegion", _]]]

Wolfram Language code: ReverseSort@Counts[Cases[locations, Entity["Country", _]]]

Take the Wikipedia article about world wars:

Wolfram Language code: text = WikipediaData["World War"];

Wolfram Language code: Snippet[text, 4]

Find all sentences containing dates and extract their corresponding DateObject interpretations:

Wolfram Language code: sentences = TextCases[text, "Date" -> {"HighlightedSnippet", "Interpretation"}, VerifyInterpretation -> True];

Wolfram Language code: RandomSample[sentences, 3]

Display these dates on a timeline:

Wolfram Language code: TimelinePlot[sentences[[All, 2]]]

Display the extracted sentences on a timeline:

Wolfram Language code:

TimelinePlot[Association[Rule@@@sentences], PlotRange -> {"1900", "1950"}, PlotLayout -> "Vertical", Background -> LightBlue]
