TextSearch
TextSearch[source,form]
searches for files referenced by source that contain text matching form.
TextSearch[source,form,"prop"]
returns the property prop for each result.
Details and Options
- TextSearch supports the following basic search forms, which can be combined arbitrarily:
    "string"                      complete string must occur in the document
    {s1,s2,…}                     all si must occur in the document
    s1|s2|…                       at least one of the si must occur in the document
    "field"->s1                   the si must occur in the given field
    FixedOrder[s1,s2,…]           the si must occur in the order given
    Except[q]                     q cannot occur in the document
    SearchAdjustment[q,…]         q occurs and is assigned a certain weight etc.
    Between[…],LessThan[…],…      numbers etc. in particular ranges etc. occur
    SearchQueryString["query"]    complete search-engine-style query
    All                           all indexed documents are returned
- TextSearch also supports the alternative forms ContainsAll[…], ContainsAny[…] and ContainsNone[…], as well as And[…], Or[…] and Not[…].
- TextSearch allows the following to be used as sources:
    "path" or File["path"]    individual file or directory of files
    "name"                    SearchIndexObject["name"]
    SearchIndexObject[…]      search index generated by CreateSearchIndex
    {obj1,obj2,…}             list of objects
- TextSearch[source,form] returns a SearchResultObject[…] expression.
- The properties prop can be any of the properties for ContentObject, in which case TextSearch[source,form,prop] is equivalent to SearchResultObject[…][All,prop].
- The following additional properties are supported in TextSearch[source,form,prop]:
    "Count"            total number of search results
    "ContentObject"    search results as content objects in a list
    "Association"      search results in an association
- The property "Score" defines the default order in which search results are sorted.
- Possible options include:
    ContentFieldOptions       <||>    weighting options for fields
    DocumentWeightingRules    None    weighting of documents based on the values of certain fields
    MaxItems                  All     the number of items to return
- Files with an extension typical of binary files, as well as files that contain non-textual byte values, will not be indexed or searched.
- TextSearch supports many file formats that can be imported as plain text, including "TXT", "CSV", "JSON", "XML", "PDF", "NB" and "EPS".
- TextSearch does not support most image, audio and other kinds of file formats that do not have a text component.
- TextSearch reads the "Plaintext" element of a file, which is given by Import[file, "Plaintext"].
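The notes above can be illustrated with a minimal sketch. The directory, file names and file contents are hypothetical and exist only for this illustration, and "FileName" is assumed to be one of the available ContentObject properties.

```wolfram
(* Hypothetical sample corpus in a fresh temporary directory *)
dir = CreateDirectory[];
Export[FileNameJoin[{dir, "pets.txt"}], "The dog chased the cat.", "Text"];
Export[FileNameJoin[{dir, "farm.txt"}], "The dog guarded the sheep.", "Text"];
Export[FileNameJoin[{dir, "zoo.txt"}], "The tiger ignored the lion.", "Text"];

(* A directory path can be used directly as a source;
   the two-argument form gives a SearchResultObject *)
results = TextSearch[dir, "dog"]

(* List form: every term must occur in the document *)
TextSearch[dir, {"dog", "cat"}, "Count"]

(* Alternatives: at least one of the terms must occur *)
TextSearch[dir, "cat" | "lion", "Count"]

(* Forms can be combined: "dog" must occur and "sheep" must not *)
TextSearch[dir, {"dog", Except["sheep"]}, "Count"]

(* Request one property for every result
   ("FileName" is assumed to be a ContentObject property) *)
TextSearch[dir, "dog", "FileName"]

(* Restrict the number of items returned *)
TextSearch[dir, "dog", MaxItems -> 1]
```

Because the third argument is equivalent to applying [All,prop] to the result, results[All, "FileName"] should give the same list as the "FileName" search above.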
Examples
Basic Examples (3)
Create an index over a specific directory:
Search for files in the specified directory that contain the word "dog" using the index:
Search for files containing "man" but not "animal" using advanced query syntax and show a contextual snippet:
TextSearch can also query a directory without indexing it:
Search for files containing both words:
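The inputs for these examples are not reproduced here, so the following is a sketch of what they might look like rather than the original code. The sample files are hypothetical, the query string "+man -animal" assumes the usual search-engine convention of + for required and - for excluded terms, and "Snippet" is assumed to be an available ContentObject property.

```wolfram
(* Hypothetical sample files in a temporary directory *)
dir = CreateDirectory[];
Export[FileNameJoin[{dir, "a.txt"}], "A man walked his dog.", "Text"];
Export[FileNameJoin[{dir, "b.txt"}], "Man is a social animal.", "Text"];
Export[FileNameJoin[{dir, "c.txt"}], "The tiger and the lion are big cats.", "Text"];

(* Create an index over the directory *)
index = CreateSearchIndex[dir];

(* Search the index for files that contain the word "dog" *)
TextSearch[index, "dog"]

(* "man" but not "animal", with a contextual snippet for each result
   (the +/- query syntax is an assumption) *)
TextSearch[index, SearchQueryString["+man -animal"]][All, "Snippet"]

(* Query the directory directly, without using the index;
   the list form requires both words to occur *)
TextSearch[dir, {"tiger", "lion"}]
```

The same exclusion can also be written with the structured form {"man", Except["animal"]}, which avoids relying on query-string syntax.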
Options (3)
ContentFieldOptions (1)
Possible Issues (2)
Only field weights can be specified at search time, while other content field options need to be specified at index creation time:
The following search returns a result because "IgnoreCase" is True by default, and "IgnoreCase" cannot be specified at search time:
When the same option is specified at index time, no result is returned:
The "Score" fields of objects matched in different indices are not comparable in general:
Consequently, a search over multiple indices generally returns results that are not sorted by "Score":
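A sketch of how this might be observed, using two hypothetical single-file indices. The file contents are invented for illustration and the actual score values depend on the index contents, so no output is shown.

```wolfram
(* Two hypothetical directories, each indexed separately *)
dir1 = CreateDirectory[];
dir2 = CreateDirectory[];
Export[FileNameJoin[{dir1, "one.txt"}], "dog dog dog and a cat", "Text"];
Export[FileNameJoin[{dir2, "two.txt"}], "a dog among many other animals", "Text"];
index1 = CreateSearchIndex[dir1];
index2 = CreateSearchIndex[dir2];

(* Scores computed within each index separately *)
TextSearch[index1, "dog", "Score"]
TextSearch[index2, "dog", "Score"]

(* Searching both indices at once: the combined list of scores
   need not be in descending order *)
TextSearch[{index1, index2}, "dog", "Score"]
```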
Text
Wolfram Research (2015), TextSearch, Wolfram Language function, https://reference.wolfram.com/language/ref/TextSearch.html (updated 2017).