---
title: "TextPosition"
language: "en"
type: "Symbol"
summary: "TextPosition[text, form] gives a list of the starting and ending positions at which instances of form occur in text. TextPosition[text, {form1, form2, ...}] gives an association of results for all the types formi. TextPosition[text, formspec, n] gives the positions of the first n cases found."
keywords: 
- text position
- find entity
- find entities
- find entity positions
- named entity recognition
- sentence boundary detection
- sentence splitter
- interpreter
- semantic import
- semantic search
canonical_url: "https://reference.wolfram.com/language/ref/TextPosition.html"
source: "Wolfram Language Documentation"
related_guides: 
  - 
    title: "Text Content Types"
    link: "https://reference.wolfram.com/language/guide/TextContentTypes.en.md"
  - 
    title: "Text Analysis"
    link: "https://reference.wolfram.com/language/guide/TextAnalysis.en.md"
  - 
    title: "Natural Language Processing"
    link: "https://reference.wolfram.com/language/guide/NaturalLanguageProcessing.en.md"
related_functions: 
  - 
    title: "TextCases"
    link: "https://reference.wolfram.com/language/ref/TextCases.en.md"
  - 
    title: "TextContents"
    link: "https://reference.wolfram.com/language/ref/TextContents.en.md"
  - 
    title: "StringPosition"
    link: "https://reference.wolfram.com/language/ref/StringPosition.en.md"
  - 
    title: "Containing"
    link: "https://reference.wolfram.com/language/ref/Containing.en.md"
---
[EXPERIMENTAL]

# TextPosition

TextPosition[text, form] gives a list of the starting and ending positions at which instances of form occur in text.

TextPosition[text, {form1, form2, …}] gives an association of results for all the types formi.

TextPosition[text, formspec, n] gives the positions of the first n cases found.

## Details and Options

* In ``TextPosition[text, form]``, ``text`` can be a string, a file with plain text, a ``ContentObject`` expression or a list of these text objects.

* ``TextPosition[{text1, text2, …}, …]`` gives cases for each ``texti``.

* Identification type ``form`` can be:

|                          |                                             |
| ------------------------ | ------------------------------------------- |
| "type"                   | any text content type (e.g. "Noun", "City") |
| Entity[…, …]             | a specific entity of a text content type    |
| form1 \| form2 \| …      | form matching any of the formi              |
| Containing[outer, inner] | forms of type outer containing type inner   |
| Verbatim["string"]       | a specific string to be matched exactly     |
| pattern                  | a string pattern to be matched              |

* Possible choices for the property ``prop`` are:

|                      |                                                          |
| -------------------- | -------------------------------------------------------- |
| "String"             | string of the identified text (default)                  |
| "Position"           | start and end position of the string in text             |
| "Probability"        | estimated probability that the identification is correct |
| "Interpretation"     | standard interpretation of the identified string         |
| "Snippet"            | a snippet around the identified string                   |
| "HighlightedSnippet" | a snippet with the identified string highlighted         |
| f                    | apply f to the association containing all properties     |
| {prop1, prop2, …}    | a list of property specifications                        |

* The following options can be given:

|                       |           |                                                                    |
| --------------------- | --------- | ------------------------------------------------------------------ |
| AcceptanceThreshold   | Automatic | minimum probability to accept identification                       |
| PerformanceGoal       | Automatic | favor algorithms with specific advantages                          |
| TargetDevice          | "CPU"     | whether CPU or GPU computation should be used for entity detection |
| VerifyInterpretation  | False     | whether interpretability should be verified                        |
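
The three-argument form and the list-of-texts input described above can be sketched as follows. This is a minimal illustration, assuming the sample strings below (the two short sentences are made up for this sketch); outputs are omitted because they depend on the installed detection models:

```wl
text = "NYC, Los Angeles, and Chicago are the largest cities in the United States of America in 2018.";

(* Third argument: stop after the first 2 cases found *)
TextPosition[text, "City", 2]

(* A list of texts gives one result per text *)
TextPosition[{"I flew from Paris to Rome.", "Then I drove to Vienna."}, "City"]
```

Limiting the number of cases with ``n`` may be worthwhile on long inputs when only the first few matches are needed.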

---

## Examples (20)

### Basic Examples (6)

Find positions of cities in text:

```wl
In[1]:= TextPosition["NYC, Los Angeles, and Chicago are the largest cities in the United States of America in 2018.", "City"]

Out[1]= {{1, 3}, {6, 16}, {23, 29}}
```

---

Find the nouns in a sentence:

```wl
In[1]:= TextPosition["The quick brown fox jumps over the lazy dog.", "Noun"]

Out[1]= {{17, 19}, {41, 43}}
```

---

Find currency amounts:

```wl
In[1]:= TextPosition["The shirt cost $50 in America, but only 5€ in Italy.", "CurrencyAmount"]

Out[1]= {{16, 18}, {41, 42}}
```

---

Find positions of cities, countries and dates in text:

```wl
In[1]:= TextPosition["NYC, Los Angeles, and Chicago are the largest cities in the United States of America in 2018.", {"City", "Country", "Date"}]

Out[1]= <|"City" -> {{1, 3}, {6, 16}, {23, 29}}, "Country" -> {{61, 84}}, "Date" -> {{89, 92}}|>

In[2]:= Function[text, Map[StringTake[text, #]&, TextPosition[text, {"City", "Country", "Date"}]]]["NYC, Los Angeles, and Chicago are the largest cities in the United States of America in 2018."]

Out[2]= <|"City" -> {"NYC", "Los Angeles", "Chicago"}, "Country" -> {"United States of America"}, "Date" -> {"2018"}|>
```

---

Find all the locations and get their positions:

```wl
In[1]:= TextPosition["NYC, Los Angeles, and Chicago are the largest cities in the USA in 2018.", "Location"]

Out[1]= {{1, 3}, {6, 16}, {23, 29}, {61, 63}}

In[2]:= Function[text, StringTake[text, TextPosition[text, "Location"]]]["NYC, Los Angeles, and Chicago are the largest cities in the USA in 2018."]

Out[2]= {"NYC", "Los Angeles", "Chicago", "USA"}
```

---

Find all references to New York City in a text:

```wl
In[1]:= TextPosition["I love New York - I ❤ NYC", Entity["City", {"NewYork", "NewYork", "UnitedStates"}]]

Out[1]= {{8, 15}, {23, 25}}
```

### Scope (4)

#### ContentObject and Files (2)

Find instances of colors in a ``ContentObject``:

```wl
In[1]:= doc = TextSearch["ExampleData/Text", "dog"][1]

Out[1]=
ContentObject[Association["Location" -> File["/usr/local/Wolfram/Mathematica/12.0_2018.12.11_afterN\
IPS/SystemFiles/Components/TextSearch/ExampleData/Text/AliceInWonderland.txt"], 
  "FileName" -> "AliceInWonderland.txt", "ModificationDate" -> 
   ... eyAbsent", "CreationDate"], "FileByteCount" -> 51724, 
  "FileExtension" -> "txt", "ReferenceLocation" -> File["/usr/local/Wolfram/Mathematica/12.0_2018.1\
2.11_afterNIPS/SystemFiles/Components/TextSearch/ExampleData/Text/AliceInWonderland.txt"]]]

In[2]:= TextPosition[doc, "Color"]

Out[2]= {{569, 573}, {587, 590}, {1929, 1934}, {2694, 2698}, {8002, 8006}, {8061, 8065}, {8399, 8403}, {8941, 8945}, {18133, 18138}, {18677, 18681}, {19067, 19071}, {20089, 20093}, {25505, 25508}, {28763, 28767}, {39375, 39378}, {39453, 39457}, {39519, 39521}, {39841, 39844}, {39865, 39869}, {40630, 40634}, {41381, 41385}, {45962, 45966}, {46441, 46445}, {46547, 46551}, {46832, 46836}, {48190, 48194}, {48574, 48578}, {49633, 49637}, {50201, 50205}, {50262, 50266}, {50791, 50796}, {51043, 51046}}

In[3]:= StringTake[doc["Plaintext"], TextPosition[doc, "Color"]]

Out[3]= {"White", "pink", "ORANGE", "White", "White", "white", "white", "white", "Canary", "White", "white", "white", "blue", "green", "rose", "white", "red", "rose", "white", "White", "White", "White", "White", "White", "White", "White", "White", "White", "White", "White", "purple", "rose"}
```

---

Find instances of colors in a ``File``:

```wl
In[1]:= file = TextSearch["ExampleData/Text", "dog"][1, "Location"]

Out[1]= File["/usr/local/Wolfram/Mathematica/12.0_2018.12.11_afterNIPS/SystemFiles/Components/TextSearch/ExampleData/Text/AliceInWonderland.txt"]

In[2]:= TextPosition[file, "Color"]

Out[2]= {{569, 573}, {587, 590}, {1929, 1934}, {2694, 2698}, {8002, 8006}, {8061, 8065}, {8399, 8403}, {8941, 8945}, {18131, 18136}, {18675, 18679}, {19065, 19069}, {20087, 20091}, {25503, 25506}, {28761, 28765}, {39373, 39376}, {39451, 39455}, {39517, 39519}, {39839, 39842}, {39863, 39867}, {40628, 40632}, {41379, 41383}, {45960, 45964}, {46439, 46443}, {46545, 46549}, {46830, 46834}, {48188, 48192}, {48572, 48576}, {49631, 49635}, {50199, 50203}, {50260, 50264}, {50789, 50794}, {51041, 51044}}

In[3]:= StringTake[Import[file, "Plaintext"], TextPosition[file, "Color"]]

Out[3]= {"White", "pink", "ORANGE", "White", "White", "white", "white", "white", "Canary", "White", "white", "white", "blue", "green", "rose", "white", "red", "rose", "white", "White", "White", "White", "White", "White", "White", "White", "White", "White", "White", "White", "purple", "rose"}
```

#### Alternatives and Containing (2)

Use ``Alternatives`` to match multiple types:

```wl
In[1]:= TextPosition["John and Mary went to the store.", "Noun" | "Verb"]

Out[1]= {{15, 18}, {27, 31}}

In[2]:= TextPosition["John and Mary went to the store.", "Noun" | "ProperNoun" | "Verb"]

Out[2]= {{1, 4}, {10, 13}, {15, 18}, {27, 31}}
```

Find all sentences in a string that contain currency amounts:

```wl
In[3]:= TextPosition["I have a fairly clear idea of what I will buy at the store.  I want shoes, a computer, and a jacket.  The computer will be the most expensive, and will cost over $1000.", Containing["Sentence", "CurrencyAmount"]]

Out[3]= {{103, 168}}
```

Find all sentences in a string that contain countries:

```wl
In[4]:= TextPosition["On vacation, I first went to France, then I went to Belgium.  The food was amazing in both countries.", Containing["Sentence", "Country"]]

Out[4]= {{1, 60}}
```

---

Combine ``Alternatives`` and ``Containing`` to form highly structured queries:

```wl
In[1]:= TextPosition["I have a fairly clear idea of what I will buy at the store.  I want shoes, a computer, and a jacket.  The computer will be the most expensive, and will cost over $1000.  John will like my computer.", Containing["Sentence", "CurrencyAmount" | "ProperNoun"]]

Out[1]= {{103, 168}, {171, 197}}
```
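
---

The ``Verbatim`` and string pattern forms listed under Details can be used in the same way. A minimal sketch, assuming the sample text and the pattern below (both are chosen only for illustration); outputs are omitted:

```wl
text = "The cat sat on the mat; the Cat purred.";

(* Match one specific string exactly *)
TextPosition[text, Verbatim["cat"]]

(* Match a string pattern: any single letter followed by "at" *)
TextPosition[text, LetterCharacter ~~ "at"]

(* StringPosition on the same literal, for comparison *)
StringPosition[text, "cat"]
```

For purely literal or pattern-based searches, ``StringPosition`` is the more direct tool; these forms are mainly useful when combined with text content types, for example inside ``Alternatives`` or ``Containing``.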

### Options (3)

#### AcceptanceThreshold (1)

By default, all the detected entities have an estimated probability higher than 0.5:

```wl
In[1]:= TextPosition[ExampleData[{"Text", "JFKInaugural"}], {"Country", "Date", "Person"}]

Out[1]= <|"Country" -> {{851, 859}, {2988, 2995}, {5739, 5747}, {5830, 5838}, {6952, 6960}, {7098, 7104}, {7212, 7218}}, "Date" -> {{12, 16}, {663, 667}, {1131, 1135}, {5893, 5895}}, "Person" -> {{4197, 4203}, {5023, 5028}, {6428, 6431}}|>

In[2]:= Map[StringTake[ExampleData[{"Text", "JFKInaugural"}], #]&, TextPosition[ExampleData[{"Text", "JFKInaugural"}], {"Country", "Date", "Person"}]]

Out[2]= <|"Country" -> {"Americans", "Americas", "Americans", "Americans", "Americans", "America", "America"}, "Date" -> {"today", "today", "today", "Now"}, "Person" -> {"Mankind", "Isaiah", "Will"}|>
```

Return only the entities that are highly likely to be correct by setting a high ``AcceptanceThreshold``:

```wl
In[3]:= TextPosition[ExampleData[{"Text", "JFKInaugural"}], {"Country", "Date", "Person"}, "AcceptanceThreshold" -> 0.9]

Out[3]= <|"Country" -> {{851, 859}, {5830, 5838}, {6952, 6960}, {7212, 7218}}, "Date" -> {{12, 16}, {663, 667}, {1131, 1135}, {5893, 5895}}, "Person" -> {{5023, 5028}}|>

In[4]:= Map[StringTake[ExampleData[{"Text", "JFKInaugural"}], #]&, TextPosition[ExampleData[{"Text", "JFKInaugural"}], {"Country", "Date", "Person"}, "AcceptanceThreshold" -> 0.9]]

Out[4]= <|"Country" -> {"Americans", "Americans", "Americans", "America"}, "Date" -> {"today", "today", "today", "Now"}, "Person" -> {"Isaiah"}|>
```
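
When choosing a threshold, it can help to inspect the estimated probabilities first. The following sketch uses ``TextCases`` with the ``"Probability"`` property; the variable name ``speech`` is an assumption introduced here, and outputs are omitted:

```wl
speech = ExampleData[{"Text", "JFKInaugural"}];

(* Pair each detected person string with its estimated probability *)
TextCases[speech, "Person" -> {"String", "Probability"}]

(* Then request positions using a threshold chosen from those values *)
TextPosition[speech, "Person", AcceptanceThreshold -> 0.9]
```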

#### PerformanceGoal (1)

Setting ``PerformanceGoal -> "Speed"`` can make detection faster, at the cost of lower accuracy:

```wl
In[1]:= AbsoluteTiming@TextPosition["My favourite cities are New York and Foix", "City"]

Out[1]= {0.02516, {{25, 32}, {38, 41}}}

In[2]:= AbsoluteTiming@TextPosition["My favourite cities are New York and Foix", "City", PerformanceGoal -> "Speed"]

Out[2]= {0.00233, {{25, 32}}}
```

#### VerifyInterpretation (1)

By default, some detected entities cannot be interpreted, either because the detection is incorrect or because the entity is not yet in the knowledgebase. Their positions are still returned, even though ``Interpreter`` fails on the corresponding strings:

```wl
In[1]:= TextPosition["We visited Toulouse and Foix in France.", "City"]

Out[1]= {{12, 19}, {25, 28}}

In[2]:= AssociationMap[Interpreter["City"], StringTake["We visited Toulouse and Foix in France.", {{12, 19}, {25, 28}}]]

Out[2]=
<|"Toulouse" -> Entity["City", {"Toulouse", "MidiPyrenees", "France"}], "Foix" -> Failure["InterpretationFailure", Association["MessageTemplate" :> Interpreter::semantictype, 
  "MessageParameters" -> Association["Type" -> "city", "Input" -> "Foix"], "Type" -> "City", 
  "Input" -> "Foix"]]|>
```

Use ``VerifyInterpretation -> True`` to filter out the entities that cannot be interpreted:

```wl
In[3]:= TextPosition["We visited Toulouse and Foix in Midi-Pyrénées in France.", "City", VerifyInterpretation -> True]

Out[3]= {{12, 19}}
```

### Applications (6)

#### Word and Sentence Segmentation (2)

Word segmentation preserves syntactic elements such as email addresses, URLs and Twitter handles:

```wl
In[1]:= wordPositions = TextPosition["His email address is user@domain.com and Twitter handle is @username. http://www.wolfram.com is a useful resource for Wolfram Language programmers.", "Word"]

Out[1]= {{1, 3}, {5, 9}, {11, 17}, {19, 20}, {22, 36}, {38, 40}, {42, 48}, {50, 55}, {57, 58}, {60, 68}, {71, 92}, {94, 95}, {97, 97}, {99, 104}, {106, 113}, {115, 117}, {119, 125}, {127, 134}, {136, 146}}

In[2]:= TextElement@StringTake["His email address is user@domain.com and Twitter handle is @username. http://www.wolfram.com is a useful resource for Wolfram Language programmers.", wordPositions]

Out[2]=
TextElement[{"His", "email", "address", "is", "user@domain.com", "and", "Twitter", "handle", "is", 
  "@username", "http://www.wolfram.com", "is", "a", "useful", "resource", "for", "Wolfram", 
  "Language", "programmers"}]
```

Together, the forms ``"Word"`` and ``"Punctuation"`` capture all non-whitespace characters:

```wl
In[3]:= tokenPositions = TextPosition["Washington D.C. is the capital of the United States.  Mr. Anthony A. Williams was the mayor.", "Word" | "Punctuation"]

Out[3]= {{1, 10}, {12, 15}, {17, 18}, {20, 22}, {24, 30}, {32, 33}, {35, 37}, {39, 44}, {46, 51}, {52, 52}, {55, 57}, {59, 65}, {67, 68}, {70, 77}, {79, 81}, {83, 85}, {87, 91}, {92, 92}}

In[4]:= TextElement@StringTake["Washington D.C. is the capital of the United States.  Mr. Anthony A. Williams was the mayor.", tokenPositions]

Out[4]=
TextElement[{"Washington", "D.C.", "is", "the", "capital", "of", "the", "United", "States", ".", 
  "Mr.", "Anthony", "A.", "Williams", "was", "the", "mayor", "."}]
```

---

Sentence segmentation intelligently ignores acronyms and other misleading boundaries:

```wl
In[1]:=
text = "Washington D.C. is the capital of the United States.  Mr. Fox is the CEO of the company.  She co-founded the company with Mrs. Smith.";
sentencePositions = TextPosition[text, "Sentence"]

Out[1]= {{1, 52}, {55, 88}, {91, 133}}

In[2]:= TextElement@StringTake[text, sentencePositions]

Out[2]=
TextElement[{"Washington D.C. is the capital of the United States.", 
  "Mr. Fox is the CEO of the company.", "She co-founded the company with Mrs. Smith."}]
```

#### Parts of Speech (2)

Return all words of a given part of speech:

```wl
In[1]:= TextPosition["John and Mary went to the new store.", "Noun"]

Out[1]= {{31, 35}}

In[2]:= TextPosition["John and Mary went to the new store.", "Verb"]

Out[2]= {{15, 18}}

In[3]:= TextPosition["John and Mary went to the new store.", "Preposition"]

Out[3]= {{20, 21}}
```

---

Make a table of word clouds from parts of speech:

```wl
In[1]:= alice = ExampleData[{"Text", "AliceInWonderland"}];

In[2]:= partsOfSpeech = TextPosition[alice, {"Noun", "Verb", "Adjective", "Adverb"}]

Out[2]= <|"Noun" -> {{13, 23}, {81, 86}, {95, 98}, {115, 121}, {168, 171}, {177, 182}, {211, 218}, {223, 235}, {261, 263}, {270, 273}, {301, 308}, {313, 325},  <<1420>> , {51248, 51253}, {51271, 51278}, {51295, 51300}, {51335, 51339}, {51350, 51353}, {51388, 51393}, {51414, 51418}, {51462, 51466}, {51499, 51504}, {51561, 51570}, {51575, 51578}, {51705, 51709}},  <<2>> , "Adverb" ->  <<1>> |>

In[3]:= Grid @Partition[KeyValueMap[Labeled[WordCloud[#2, ImageSize -> 200], #1]&, Map[StringTake[alice, #]&, partsOfSpeech]], 2]

Out[3]=
|                               |                            |
| ----------------------------- | -------------------------- |
| Labeled[[image], "Noun"]      | Labeled[[image], "Verb"]   |
| Labeled[[image], "Adjective"] | Labeled[[image], "Adverb"] |
```

#### Entities and Interpretable Objects (2)

Find countries:

```wl
In[1]:=
text = "On vacation, I first went to France, then I went to Belgium.";
TextPosition[text, "Country"]

Out[1]= {{30, 35}, {53, 59}}

In[2]:= StringTake[text, TextPosition[text, "Country"]]

Out[2]= {"France", "Belgium"}
```

Return interpreted strings as ``Entity`` objects:

```wl
In[3]:= AssociationMap[Interpreter["Country"], {"France", "Belgium"}]

Out[3]= <|"France" -> Entity["Country", "France"], "Belgium" -> Entity["Country", "Belgium"]|>
```

---

Find currency amounts in a Wikipedia article:

```wl
In[1]:= dollarstore = WikipediaData@First@WikipediaSearch["Variety Store"];

In[2]:= currencies = TextPosition[dollarstore, "CurrencyAmount"]

Out[2]= {{3427, 3434}, {3800, 3807}, {4606, 4609}, {4713, 4715}, {4774, 4775}, {5186, 5187}, {5191, 5193}, {6089, 6098}, {6104, 6112}, {6547, 6555}, {6732, 6736}, {6765, 6769}, {6783, 6794}, {8455, 8456}, {8462, 8463}, {8620, 8622}, {8631, 8637}, {8685, 86 ... 12469}, {12502, 12506}, {12511, 12515}, {12581, 12590}, {12593, 12599}, {12773, 12778}, {12898, 12906}, {13022, 13032}, {13371, 13378}, {14020, 14021}, {14033, 14040}, {14146, 14152}, {14241, 14247}, {14256, 14262}, {14729, 14754}, {15351, 15354}}
```

Interpret the identified strings as currency amounts:

```wl
In[3]:= Interpreter["CurrencyAmount"][StringTake[dollarstore, currencies]]

Out[3]= {Quantity[185000, "USDollars"], Quantity[10, "USCents"], Quantity[300, "USDollars"], Quantity[10, "USCents"], Quantity[5, "USCents"], Quantity[5, "USCents"], Quantity[10, "USCents"], Quantity[5, "USCents"], Quantity[10, "USCents"], Quantity[10, "US ... , "USDollars"], Quantity[99, "USCents"], Quantity[2, "USDollars"], Quantity[3, "USDollars"], Quantity[2, "MexicanPesos"], Quantity[1.99, "BrazilianReais"], Quantity[1.2, "USDollars"], Quantity[1000, "ChileanPesos"], Quantity[100, "BritishPounds"]}
```

### Properties & Relations (1)

``TextPosition`` handles the same types as ``TextCases`` and ``TextContents``, and always identifies the same substrings as these functions for a given type:

```wl
In[1]:= TextContents["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", "City"]

Out[1]=
Dataset[{Association["String" -> "Boston", "Position" -> {1, 6}, 
   "Probability" -> 0.904606819152832, "HighlightedSnippet" -> 
    Row[{Highlighted["Boston"], ", Worcester, and Springfield are the"}]], 
  Association["String" -> "Worcester", "Po ...  ", and Springfield are the"}]], Association["String" -> "Springfield", 
   "Position" -> {24, 34}, "Probability" -> 0.9392808079719543, 
   "HighlightedSnippet" -> Row[{"Worcester, and ", Highlighted["Springfield"], 
      " are the largest"}]]}]

In[2]:= TextCases["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", "City"]

Out[2]= {"Boston", "Worcester", "Springfield"}

In[3]:= TextPosition["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", "City"]

Out[3]= {{1, 6}, {9, 17}, {24, 34}}
```

``TextCases`` is a generalization of ``TextPosition``:

```wl
In[4]:= TextCases["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", "City" -> "Position"]

Out[4]= {{1, 6}, {9, 17}, {24, 34}}

In[5]:= TextPosition["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", {"City", "AdministrativeDivision"}]

Out[5]= <|"City" -> {{1, 6}, {9, 17}, {24, 34}}, "AdministrativeDivision" -> {{1, 6}, {9, 17}, {62, 74}}|>

In[6]:= TextCases["Boston, Worcester, and Springfield are the largest cities in Massachusetts.", {"City", "AdministrativeDivision"} -> "Position"]

Out[6]= <|"City" -> {{1, 6}, {9, 17}, {24, 34}}, "AdministrativeDivision" -> {{1, 6}, {9, 17}, {62, 74}}|>
```
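
Because ``TextCases`` accepts a list of properties, strings and positions can also be requested together. A minimal sketch under that assumption, alongside an equivalent two-step route through ``TextPosition``; outputs are omitted:

```wl
text = "Boston, Worcester, and Springfield are the largest cities in Massachusetts.";

(* Request both properties in a single call *)
TextCases[text, "City" -> {"String", "Position"}]

(* Two-step alternative: find positions, then pair each with its substring *)
With[{pos = TextPosition[text, "City"]},
  {StringTake[text, #], #} & /@ pos]
```

The single call avoids analyzing the text twice, which may matter for long documents.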

## See Also

* [`TextCases`](https://reference.wolfram.com/language/ref/TextCases.en.md)
* [`TextContents`](https://reference.wolfram.com/language/ref/TextContents.en.md)
* [`StringPosition`](https://reference.wolfram.com/language/ref/StringPosition.en.md)
* [`Containing`](https://reference.wolfram.com/language/ref/Containing.en.md)

## Related Guides

* [Text Content Types](https://reference.wolfram.com/language/guide/TextContentTypes.en.md)
* [Text Analysis](https://reference.wolfram.com/language/guide/TextAnalysis.en.md)
* [Natural Language Processing](https://reference.wolfram.com/language/guide/NaturalLanguageProcessing.en.md)

## History

* [Introduced in 2015 (10.2)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn102.en.md) \| [Updated in 2019 (12.0)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn120.en.md)