Wolfram Language & System Documentation Center

TextPosition   

TextPosition[text,form]

gives a list of the starting and ending positions at which instances of form occur in text.

TextPosition[text,{form₁,form₂,…}]

gives an association of results for all the types form_i.

TextPosition[text,formspec,n]

gives the positions of the first n cases found.

Details and Options

In TextPosition[text,form], text can be a string, a file with plain text, a ContentObject expression or a list of these text objects.
TextPosition[{text₁,text₂,…},…] gives cases for each text_i.
Identification type form can be:

	"type"	any text content type (e.g. "Noun", "City")
	Entity[…,…]	a specific entity of a text content type
	form₁|form₂|…	form matching any of the form_i
	Containing[outer,inner]	forms of type outer containing type inner
	Verbatim["string"]	a specific string to be matched exactly
	pattern	a string pattern to be matched
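
As a minimal sketch of the last two form types (the sample sentences are made up for illustration, and no outputs are claimed), Verbatim targets one exact string, while a string pattern such as DigitCharacter.. targets any substring matching the pattern:

Wolfram Language code:

(* illustrative input, not from the original examples *)
TextPosition["The cat sat on the mat.", Verbatim["cat"]]

Wolfram Language code:

(* illustrative input; DigitCharacter.. is a standard string pattern for runs of digits *)
TextPosition["Order 66 shipped on day 12.", DigitCharacter ..]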

Possible choices for the property prop are:

	"String"	string of the identified text (default)
	"Position"	start and end position of the string in text
	"Probability"	estimated probability that the identification is correct
	"Interpretation"	standard interpretation of the identified string
	"Snippet"	a snippet around the identified string
	"HighlightedSnippet"	a snippet with the identified string highlighted
	f	apply f to the association containing all properties
	{prop₁,prop₂,…}	a list of property specifications

The following options can be given:

AcceptanceThreshold	Automatic	minimum probability to accept identification
PerformanceGoal	Automatic	favor algorithms with specific advantages
TargetDevice	"CPU"	whether CPU or GPU computation should be used for entity detection
VerifyInterpretation	False	whether interpretability should be verified

Examples

Basic Examples (6)

Find mentions of cities in a piece of text:

Wolfram Language code: TextPosition["NYC, Los Angeles, and Chicago are the largest cities in the United States of America in 2018.", "City"]

Find the nouns in a sentence:

Wolfram Language code: TextPosition["The quick brown fox jumps over the lazy dog.", "Noun"]

Find currency amounts:

Wolfram Language code: TextPosition["The shirt cost $50 in America, but only 5€ in Italy.", "CurrencyAmount"]

Find positions of cities, countries and dates in text:

Wolfram Language code:

TextPosition["NYC, Los Angeles, and Chicago are the largest cities in the United States of America in 2018.", {"City", "Country", "Date"}]

Wolfram Language code:

Function[text, Map[StringTake[text, #]&, TextPosition[text, {"City", "Country", "Date"}]]]["NYC, Los Angeles, and Chicago are the largest cities in the United States of America in 2018."]

Find all the locations and get their positions:

Wolfram Language code: TextPosition["NYC, Los Angeles, and Chicago are the largest cities in the USA in 2018.", "Location"]

Wolfram Language code:

Function[text, StringTake[text, TextPosition[text, "Location"]]]["NYC, Los Angeles, and Chicago are the largest cities in the USA in 2018."]

Find all references to New York City in a text:

Wolfram Language code: TextPosition["I love New York - I ❤ NYC", Entity["City", {"NewYork", "NewYork", "UnitedStates"}]]
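
Limit the result to the first n cases with the three-argument form. A minimal sketch reusing the sentence from the first example, with n = 2 chosen arbitrarily for illustration:

Wolfram Language code:

(* n = 2 is an arbitrary illustrative choice *)
TextPosition["NYC, Los Angeles, and Chicago are the largest cities in the United States of America in 2018.", "City", 2]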

Scope (4)

ContentObject and Files (2)

Find instances of colors in a ContentObject:

Wolfram Language code: doc = TextSearch["ExampleData/Text", "dog"][1]

Wolfram Language code: TextPosition[doc, "Color"]

Wolfram Language code: StringTake[doc["Plaintext"], TextPosition[doc, "Color"]]

Find instances of colors in a File:

Wolfram Language code: file = TextSearch["ExampleData/Text", "dog"][1, "Location"]

Wolfram Language code: TextPosition[file, "Color"]

Wolfram Language code: StringTake[Import[file, "Plaintext"], %]
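
The text argument can also be a list of text objects, in which case results are given for each element. A minimal sketch with two made-up sentences (illustrative inputs, not taken from the example data):

Wolfram Language code:

(* both strings are hypothetical sample inputs *)
TextPosition[{"I flew from Paris to Rome.", "Berlin was colder than expected."}, "City"]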

Alternatives and Containing (2)

Use Alternatives to match multiple types:

Wolfram Language code: TextPosition["John and Mary went to the store.", "Noun" | "Verb"]

Wolfram Language code: TextPosition["John and Mary went to the store.", "Noun" | "ProperNoun" | "Verb"]

Find all sentences in a string that contain currency amounts:

Wolfram Language code:

TextPosition["I have a fairly clear idea of what I will buy at the store.  I want shoes, a computer, and a jacket.  The computer will be the most expensive, and will cost over $1000.", Containing["Sentence", "CurrencyAmount"]]

Find all sentences in a string that contain countries:

Wolfram Language code:

TextPosition["On vacation, I first went to France, then I went to Belgium.  The food was amazing in both countries.", Containing["Sentence", "Country"]]

Combine Alternatives and Containing to form highly structured queries:

Wolfram Language code:

TextPosition["I have a fairly clear idea of what I will buy at the store.  I want shoes, a computer, and a jacket.  The computer will be the most expensive, and will cost over $1000.  John will like my computer.", Containing["Sentence", "CurrencyAmount" | "ProperNoun"]]

Options (3)

AcceptanceThreshold (1)

By default, all the detected entities have an estimated probability higher than 0.5:

Wolfram Language code: TextPosition[ExampleData[{"Text", "JFKInaugural"}], {"Country", "Date", "Person"}]

Wolfram Language code:

Map[StringTake[ExampleData[{"Text", "JFKInaugural"}], #]&, TextPosition[ExampleData[{"Text", "JFKInaugural"}], {"Country", "Date", "Person"}]]

Get only the entities that are highly probable to be correct by setting a high AcceptanceThreshold:

Wolfram Language code: TextPosition[ExampleData[{"Text", "JFKInaugural"}], {"Country", "Date", "Person"}, "AcceptanceThreshold" -> 0.9]

Wolfram Language code:

Map[StringTake[ExampleData[{"Text", "JFKInaugural"}], #]&, TextPosition[ExampleData[{"Text", "JFKInaugural"}], {"Country", "Date", "Person"}, "AcceptanceThreshold" -> 0.9]]

PerformanceGoal (1)

Setting PerformanceGoal->"Speed" can give faster detection, at the cost of lower accuracy:

Wolfram Language code: AbsoluteTiming@TextPosition["My favourite cities are New York and Foix", "City"]

Wolfram Language code: AbsoluteTiming@TextPosition["My favourite cities are New York and Foix", "City", PerformanceGoal -> "Speed"]

VerifyInterpretation (1)

By default, interpretability is not verified, so positions are returned even for identified strings that cannot be interpreted, either because the identification is incorrect or because the entity is not yet in the knowledgebase:

Wolfram Language code: TextPosition["We visited Toulouse and Foix in France.", "City"]

Wolfram Language code: AssociationMap[Interpreter["City"], StringTake["We visited Toulouse and Foix in France.", {{12, 19}, {25, 28}}]]

Use VerifyInterpretation to filter out the entities that cannot be interpreted:

Wolfram Language code: TextPosition["We visited Toulouse and Foix in Midi-Pyrénées in France.", "City", VerifyInterpretation -> True]

Applications (6)

Word and Sentence Segmentation (2)

Word segmentation preserves syntactic elements such as email addresses, URLs and Twitter handles:

Wolfram Language code:

wordPositions = TextPosition["His email address is user@domain.com and Twitter handle is @username. http://www.wolfram.com is a useful resource for Wolfram Language programmers.", "Word"]

Wolfram Language code:

TextElement@StringTake["His email address is user@domain.com and Twitter handle is @username. http://www.wolfram.com is a useful resource for Wolfram Language programmers.", wordPositions]

All non-whitespace characters are captured by the forms "Word" and "Punctuation":

Wolfram Language code:

tokenPositions = TextPosition["Washington D.C. is the capital of the United States.  Mr. Anthony A. Williams was the mayor.", "Word" | "Punctuation"]

Wolfram Language code:

TextElement@StringTake["Washington D.C. is the capital of the United States.  Mr. Anthony A. Williams was the mayor.", tokenPositions]

Sentence segmentation intelligently ignores acronyms and other misleading boundaries:

Wolfram Language code:

text = "Washington D.C. is the capital of the United States.  Mr. Fox is the CEO of the company.  She co-founded the company with Mrs. Smith.";
sentencePositions = TextPosition[text, "Sentence"]

Wolfram Language code: TextElement@StringTake[text, sentencePositions]

Parts of Speech (2)

Return all words of a given part of speech:

Wolfram Language code: TextPosition["John and Mary went to the new store.", "Noun"]

Wolfram Language code: TextPosition["John and Mary went to the new store.", "Verb"]

Wolfram Language code: TextPosition["John and Mary went to the new store.", "Preposition"]

Make a table of word clouds from parts of speech:

Wolfram Language code: alice = ExampleData[{"Text", "AliceInWonderland"}];

Wolfram Language code: partsOfSpeech = TextPosition[alice, {"Noun", "Verb", "Adjective", "Adverb"}]

Wolfram Language code:

Grid@Partition[KeyValueMap[Labeled[WordCloud[#2, ImageSize -> 200], #1]&, Map[StringTake[alice, #]&, partsOfSpeech]], 2]

Entities and Interpretable Objects (2)

Find countries:

Wolfram Language code:

text = "On vacation, I first went to France, then I went to Belgium.";
TextPosition[text, "Country"]

Wolfram Language code: StringTake[text, TextPosition[text, "Country"]]

Return interpreted strings as Entity objects:

Wolfram Language code: AssociationMap[Interpreter["Country"], {"France", "Belgium"}]

Find currency amounts in a Wikipedia article:

Wolfram Language code: dollarstore = WikipediaData@First@WikipediaSearch["Variety Store"];

Wolfram Language code: currencies = TextPosition[dollarstore, "CurrencyAmount"]

Get currency amounts:

Wolfram Language code: Interpreter["CurrencyAmount"][StringTake[dollarstore, currencies]]

Properties & Relations (1)

TextPosition handles the same types as TextCases and TextContents, and always identifies the same substrings as these functions for a given type:

Wolfram Language code: TextContents["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", "City"]

Wolfram Language code: TextCases["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", "City"]

Wolfram Language code: TextPosition["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", "City"]
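
One way to check this correspondence directly is to extract the substrings at the reported positions and compare them with the strings from TextCases. This is a sketch only; the comparison is expected to hold when both functions identify the same substrings, and its output is not shown here:

Wolfram Language code:

(* compares substrings taken at TextPosition results with the default string output of TextCases *)
With[{text = "Boston, Worcester, and Springfield are the largest cities in Massachusetts."}, StringTake[text, TextPosition[text, "City"]] === TextCases[text, "City"]]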

TextCases is a generalization of TextPosition:

Wolfram Language code: TextCases["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", "City" -> "Position"]

Wolfram Language code:

TextPosition["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", {"City", "AdministrativeDivision"}]

Wolfram Language code:

TextCases["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", {"City", "AdministrativeDivision"} -> "Position"]
