TextCases   


TextCases[text,form]

gives a list of all cases of text identified as being of type form that appear in text.

TextCases[text,{form₁,form₂,…}]

gives an association of results for all the types form_i.

TextCases[text,formspecprop]

gives the specified property for each result found.

TextCases[text,formspec{prop₁,prop₂,…}]

gives a list of properties for each result found.

TextCases[text,spec,n]

gives the first n cases found.
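The basic call forms above can be sketched as follows. The input sentence is an illustrative assumption, and because TextCases relies on machine learning, the exact results may differ between versions.

```wolfram
(* Illustrative input; this sentence is an assumption, not taken from the reference page *)
text = "Paris is the capital of France, and Berlin is the capital of Germany.";

TextCases[text, "City"]               (* list of substrings identified as cities *)
TextCases[text, {"City", "Country"}]  (* association with one entry per requested type *)
TextCases[text, "City", 1]            (* only the first case found *)
```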

Details and Options

TextCases is used to perform several natural language processing tasks such as part-of-speech tagging or named entity recognition.
In TextCases[text,…], text can be a string, a file with plain text represented by File[…], a ContentObject expression or a list of these text objects.
TextCases[{text₁,text₂,…},…] gives cases for each text_i.
Identification type form can be:

	"type"	any text content type (e.g. "Noun", "City")
	Entity[…,…]	a specific entity of a text content type
	form₁|form₂|…	form matching any of the form_i
	Containing[outer,inner]	forms of type outer containing type inner
	Verbatim["string"]	a specific string to be matched exactly
	pattern	a string pattern to be matched
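As a rough illustration of the less common form specifications in this table, the sketch below searches one sentence with an entity, a verbatim string and a string pattern. The sentence and the particular pattern are assumptions chosen for the example.

```wolfram
(* Illustrative input; an assumption for this sketch *)
text = "New York is larger than New Orleans, and NYC never sleeps.";

(* A specific entity: references to New York City only *)
TextCases[text, Entity["City", {"NewYork", "NewYork", "UnitedStates"}]]

(* An exact string *)
TextCases[text, Verbatim["NYC"]]

(* A string pattern: "New " followed by one or more letters *)
TextCases[text, "New " ~~ LetterCharacter ..]
```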

Possible choices for the property prop are:

	"String"	string of the identified text (default)
	"Position"	start and end position of the string in text
	"Probability"	estimated probability that the identification is correct
	"Interpretation"	standard interpretation of the identified string
	"Snippet"	a snippet around the identified string
	"HighlightedSnippet"	a snippet with the identified string highlighted
	f	apply f to the association containing all properties
	{prop₁,prop₂,…}	a list of property specifications
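The property specifications can be sketched like this. The input sentence is an assumption, and the last call assumes that the association passed to a function is keyed by the property names listed above.

```wolfram
(* Illustrative input; an assumption for this sketch *)
text = "Tokyo and Osaka are both in Japan.";

(* A single property for each result *)
TextCases[text, "City" -> "Position"]

(* Several properties for each result *)
TextCases[text, "City" -> {"String", "Probability"}]

(* A function applied to the association of all properties;
   assumes the association has a "String" key *)
TextCases[text, "City" -> (StringLength[#["String"]] &)]
```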

The following options can be given:

	AcceptanceThreshold	Automatic	minimum probability to accept identification
	PerformanceGoal	Automatic	favor algorithms with specific advantages
	TargetDevice	"CPU"	whether CPU or GPU computation should be used for entity detection
	VerifyInterpretation	False	whether interpretability should be verified

TextCases uses machine learning. Its methods, training sets and biases included therein may change and yield varied results in different versions of the Wolfram Language.
TextCases may download resources that will be stored in your local object store at $LocalBase and can be listed using LocalObjects[] and removed using ResourceRemove.

Examples


Basic Examples (6)

Find the cities in a text:

Find the nouns in a sentence:

Find currency amounts and get interpretations:
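A possible version of this example, assuming the content type name "CurrencyAmount" and an illustrative sentence; the interpretations would be expected to come back as Quantity expressions where one is available.

```wolfram
(* Illustrative input; an assumption for this sketch *)
text = "The ticket costs $25, but members pay only 15 euros.";

(* Ask for the interpretation instead of the matched string *)
TextCases[text, "CurrencyAmount" -> "Interpretation"]
```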

Find cities, countries and dates in text:

Obtain probabilities and interpretations:

Find all the locations and get their positions:

Find all references to New York City in a text:

Scope (5)

ContentObject and Files (2)

Find instances of colors in a ContentObject:

Find quantities in a File:
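One way to try this without an existing document is to write a small plain-text file first. The file name and its contents below are hypothetical, chosen only for this sketch.

```wolfram
(* Write a short plain-text file to the temporary directory;
   the file name and contents are hypothetical *)
path = Export[
   FileNameJoin[{$TemporaryDirectory, "textcases-sample.txt"}],
   "The package weighs 3 kilograms and is 40 centimeters long.",
   "Text"];

(* Search the file contents rather than a string *)
TextCases[File[path], "Quantity"]
```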

Alternatives and Containing (2)

Use Alternatives to match multiple types:
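A minimal sketch with an assumed input sentence. Unlike a list of forms, which gives an association keyed by type, Alternatives is described above as a single form matching any of its members.

```wolfram
(* Illustrative input; an assumption for this sketch *)
text = "Lisbon is in Portugal and Madrid is in Spain.";

(* Match anything identified as either a city or a country *)
TextCases[text, "City" | "Country"]
```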

Find all sentences in a string that contain currency amounts:

Find all sentences in a string that contain countries:
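For instance, with an assumed three-sentence input, Containing can be used with "Sentence" as the outer type and "Country" as the inner type:

```wolfram
(* Illustrative input; an assumption for this sketch *)
text = "I grew up in Canada. The weather was cold. Later I moved to Brazil.";

(* Sentences that contain at least one identified country *)
TextCases[text, Containing["Sentence", "Country"]]
```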

Combine Alternatives and Containing to form highly structured queries:

Return Types (1)

Specify multiple return types:

Show all the available properties in an Association:

Create a dataset with the properties of several types of entities:

Get the geodetic positions of the locations occurring in a text:

Options (3)

AcceptanceThreshold (1)

By default, all the detected entities have an estimated probability higher than 0.5:

Get only the entities that are highly probable to be correct by setting a high AcceptanceThreshold:
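A sketch of the comparison, using an assumed sentence and an arbitrary threshold of 0.9. Requesting "Probability" alongside "String" makes it easier to see which results the stricter setting drops.

```wolfram
(* Illustrative input; an assumption for this sketch *)
text = "Jordan visited Georgia, then flew to Paris and Victoria.";

(* Default threshold *)
TextCases[text, "Country" -> {"String", "Probability"}]

(* Stricter threshold; 0.9 is an arbitrary choice for illustration *)
TextCases[text, "Country" -> {"String", "Probability"},
 AcceptanceThreshold -> 0.9]
```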

PerformanceGoal (1)

Using PerformanceGoal->"Speed" can make detection faster, at the cost of lower accuracy:
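One rough way to observe the trade-off is to time both settings on the same input. The input below is an assumption; the first call in a session may also include time spent loading or downloading resources, so timings are best compared on repeated runs.

```wolfram
(* Illustrative input, repeated to make timing differences more visible *)
text = StringRiffle[
   ConstantArray["Alice flew from Rome to Cairo and then on to Nairobi.", 50]];

(* Default settings *)
AbsoluteTiming[TextCases[text, "City"];]

(* Favor speed over accuracy *)
AbsoluteTiming[TextCases[text, "City", PerformanceGoal -> "Speed"];]
```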

VerifyInterpretation (1)

By default, some entities cannot be interpreted, either because the identification is incorrect or because the entity is not yet in the knowledgebase. In these cases, a string is returned instead of an interpretation:

Use VerifyInterpretation to filter out the entities that cannot be interpreted:
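A sketch of the two settings side by side. The sentence is an assumption and includes a made-up place name ("Zorbonia") that may or may not be detected at all.

```wolfram
(* Illustrative input; "Zorbonia" is a made-up name for this sketch *)
text = "She has lived in France, Zorbonia and Peru.";

(* Default: uninterpretable results, if any, are returned as strings *)
TextCases[text, "Country" -> "Interpretation"]

(* Keep only the results that can be interpreted *)
TextCases[text, "Country" -> "Interpretation", VerifyInterpretation -> True]
```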

Applications (6)

Word and Sentence Segmentation (2)

Word segmentation preserves syntactic elements such as email addresses, URLs, and Twitter handles:

All the non-whitespace characters are captured by the forms "Word" and "Punctuation":
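As a small sketch with an assumed input, words and punctuation marks can be requested together with Alternatives, and the second line compares them with the input stripped of whitespace, as the statement above suggests:

```wolfram
(* Illustrative input; the address and URL are placeholders *)
text = "Write to info@example.com, or visit https://example.com today!";

tokens = TextCases[text, "Word" | "Punctuation"]

(* Compare the joined tokens with the input minus its whitespace *)
StringJoin[tokens] === StringDelete[text, Whitespace]
```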

Sentence segmentation intelligently ignores acronyms and other misleading boundaries:

Parts of Speech (2)

Return all words of a given part of speech:

Make a table of word clouds from parts of speech:

Entities and Interpretable Objects (2)

Find countries:

Return interpreted strings as Entity objects:

Find currency amounts in a Wikipedia article:

Convert to another currency:

Properties & Relations (4)

TextCases handles the same types as TextPosition and TextContents and always identifies the same substrings as these functions for a given type:

TextCases is a generalization of TextPosition:
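The relationship can be sketched by comparing TextPosition with the "Position" property of TextCases on an assumed input:

```wolfram
(* Illustrative input; an assumption for this sketch *)
text = "Vienna and Prague are connected by train.";

(* Positions only *)
TextPosition[text, "City"]

(* The same information through TextCases, plus other properties on request *)
TextCases[text, "City" -> "Position"]
TextCases[text, "City" -> {"String", "Position"}]
```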

A dataset that is similar to the output of TextContents can be obtained using TextCases:

TextSentences is equivalent to TextCases[…,"Sentence"]:
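A quick check on an assumed input that includes an abbreviation:

```wolfram
(* Illustrative input; an assumption for this sketch *)
text = "Dr. Smith arrived at 5 p.m. yesterday. The meeting started late.";

TextSentences[text]
TextCases[text, "Sentence"]

(* Compare the two results directly *)
TextSentences[text] === TextCases[text, "Sentence"]
```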

TextStructure splits texts into the same sentences:

TextWords is equivalent to TextCases[…,"Word"]:
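The same kind of check for words, again on an assumed input:

```wolfram
(* Illustrative input; an assumption for this sketch *)
text = "The quick brown fox jumps over the lazy dog.";

TextWords[text]
TextCases[text, "Word"]

(* Compare the two results directly *)
TextWords[text] === TextCases[text, "Word"]
```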

TextStructure splits texts into the same words and punctuation marks as TextCases[…,"Word"|"Punctuation"]:

Neat Examples (2)

Many entities (cities, countries, etc.) can be located on a map. TextCases allows you to find all these entities at once.

Take the Wikipedia article about rice:

Find all entities that can be pinpointed to a location:

Visualize the locations identified and their frequency in the text:

Show the number of mentions of each continent and country:

Take the Wikipedia article about world wars:

Find all sentences containing dates and extract their corresponding DateObject interpretations:

Display these dates on a timeline:

Display the extracted sentences on a timeline:
