TextSearch

TextSearch[source,form]

searches for files referenced by source that contain text matching form.

TextSearch[source,form,"prop"]

returns the property prop for each result.

Details and Options

  • TextSearch supports the following basic search forms, which can be combined arbitrarily:
    "string"                      complete string must occur in the document
    {s1,s2,…}                     all si must occur in the document
    s1|s2|…                       at least one of the si must occur in the document
    "field"->s1                   the si must occur in the given field
    FixedOrder[s1,s2,…]           the si must occur in the order given
    Except[q]                     q cannot occur in the document
    SearchAdjustment[q,…]         q occurs and is assigned a certain weight etc.
    Between[…],LessThan[…],…      numbers etc. in particular ranges etc. occur
    SearchQueryString["query"]    complete search-engine-style query
    All                           all indexed documents are returned
  • TextSearch also supports the alternative forms ContainsAll[…], ContainsAny[…] and ContainsNone[…], as well as And[…], Or[…] and Not[…].
  • TextSearch allows the following to be used as sources:
    "path" or File["path"]    individual file or directory of files
    "name"                    SearchIndexObject["name"]
    SearchIndexObject[…]      search index generated by CreateSearchIndex
    {obj1,obj2,…}             list of objects
  • TextSearch[source,form] returns a SearchResultObject[…] expression.
  • The property prop can be any of the properties for ContentObject, in which case TextSearch[source,form,prop] is equivalent to SearchResultObject[…][All,prop].
  • The following additional properties are supported in TextSearch[source,form,prop]:
    "Count"            total number of search results
    "ContentObject"    search results as content objects in a list
    "Association"      search results in an association
  • The property "Score" defines the default order in which search results are sorted.
  • Possible options include:
    ContentFieldOptions       <||>    weighting options for fields
    DocumentWeightingRules    None    weighting of documents based on the values of certain fields
    MaxItems                  All     the number of items to return
  • Files with an extension typical of binary files as well as files that contain non-textual byte values will not be indexed or searched.
  • TextSearch supports many file formats that can be imported as plain text, including "TXT", "CSV", "JSON", "XML", "PDF", "NB" and "EPS".
  • TextSearch does not support most image, audio and other kinds of file formats that do not have a text component.
  • TextSearch reads the "Plaintext" element of a file, which is given by Import[file, "Plaintext"].
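
To check what text TextSearch would read from a given file, the "Plaintext" element can be imported directly. The following is a minimal sketch; the temporary file and its contents are hypothetical and only serve as an illustration.

```wolfram
(* Hypothetical sample file written to a fresh temporary directory *)
dir = CreateDirectory[];
file = FileNameJoin[{dir, "note.txt"}];
Export[file, "The quick brown fox jumps over the lazy dog.", "Text"];

(* The "Plaintext" element is the text that TextSearch reads from the file *)
text = Import[file, "Plaintext"]

(* Clean up the temporary directory *)
DeleteDirectory[dir, DeleteContents -> True];
```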

Examples

Basic Examples  (3)

Create an index over a specific directory:

Search for files in the specified directory that contain the word "dog" using the index:
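
The original input cells are not available here, so the following is only a sketch of how the two steps above might look. The temporary directory, file names and file contents are hypothetical.

```wolfram
(* Hypothetical corpus: three small text files in a temporary directory *)
dir = CreateDirectory[];
Export[FileNameJoin[{dir, "a.txt"}], "The dog chased the cat.", "Text"];
Export[FileNameJoin[{dir, "b.txt"}], "A man walked his dog.", "Text"];
Export[FileNameJoin[{dir, "c.txt"}], "Birds sing at dawn.", "Text"];

(* Build a search index over the directory *)
index = CreateSearchIndex[dir];

(* Search the index; the result is a SearchResultObject *)
results = TextSearch[index, "dog"]

(* Ask for the number of matching files instead *)
count = TextSearch[index, "dog", "Count"]

(* Clean up the index and the temporary files *)
DeleteSearchIndex[index];
DeleteDirectory[dir, DeleteContents -> True];
```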

Search for files containing "man" but not "animal" using advanced query syntax and show a contextual snippet:

TextSearch can also query a directory without indexing it:

Search for files containing both words:

Search for files containing either word:

Queries can be combined:
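
The directory queries described above might look as follows. This is a sketch under stated assumptions: the temporary directory, the sample files and the search words are hypothetical stand-ins for the original example data.

```wolfram
(* Hypothetical corpus in a temporary directory *)
dir = CreateDirectory[];
Export[FileNameJoin[{dir, "a.txt"}], "The dog chased the cat.", "Text"];
Export[FileNameJoin[{dir, "b.txt"}], "A man walked his dog.", "Text"];
Export[FileNameJoin[{dir, "c.txt"}], "Birds sing at dawn.", "Text"];

(* Query the directory directly, without calling CreateSearchIndex first *)
single = TextSearch[dir, "dog", "Count"];

(* A list requires every word to occur in the document *)
both = TextSearch[dir, {"dog", "man"}, "Count"];

(* Alternatives require at least one of the words to occur *)
either = TextSearch[dir, "cat" | "man", "Count"];

(* Forms can be nested: "dog" together with either "cat" or "man" *)
combined = TextSearch[dir, {"dog", "cat" | "man"}, "Count"];

{single, both, either, combined}

(* Clean up the temporary files *)
DeleteDirectory[dir, DeleteContents -> True];
```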

Scope  (1)

Create an index using example text:

Add a second index:

Search in both indices:
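
A list of sources can be passed as the first argument to search several indices at once. The sketch below builds its two indices from hypothetical temporary directories rather than from the example text used on the original page, which is not available here.

```wolfram
(* Two hypothetical corpora, each in its own temporary directory *)
dir1 = CreateDirectory[];
dir2 = CreateDirectory[];
Export[FileNameJoin[{dir1, "pets.txt"}], "The dog sleeps all day.", "Text"];
Export[FileNameJoin[{dir2, "walks.txt"}], "A dog runs in the park.", "Text"];

(* One index per directory *)
index1 = CreateSearchIndex[dir1];
index2 = CreateSearchIndex[dir2];

(* Search both indices with a single query *)
total = TextSearch[{index1, index2}, "dog", "Count"]

(* Clean up the indices and the temporary files *)
DeleteSearchIndex[index1];
DeleteSearchIndex[index2];
DeleteDirectory[dir1, DeleteContents -> True];
DeleteDirectory[dir2, DeleteContents -> True];
```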

Options  (3)

ContentFieldOptions  (1)

Specify weights for the fields in the index at query time:

Set a weight of 2 for the "Keywords" field:

When no weight is set, "doc1" gets a higher score:

DocumentWeightingRules  (1)

Define a "ConfidenceLevel" field and use it for document weighting:

MaxItems  (1)

Get only the first result:
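
A sketch of limiting the number of results with MaxItems, assuming a hypothetical temporary corpus in which two files contain the search word:

```wolfram
(* Hypothetical corpus in a temporary directory *)
dir = CreateDirectory[];
Export[FileNameJoin[{dir, "a.txt"}], "The dog chased the cat.", "Text"];
Export[FileNameJoin[{dir, "b.txt"}], "A man walked his dog.", "Text"];
index = CreateSearchIndex[dir];

(* Return at most one result, as a list of content objects *)
first = TextSearch[index, "dog", "ContentObject", MaxItems -> 1]

(* Clean up the index and the temporary files *)
DeleteSearchIndex[index];
DeleteDirectory[dir, DeleteContents -> True];
```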

Properties & Relations  (1)

Queries are case insensitive:

Queries only match entire words:
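
Both properties can be checked by comparing result counts. In this sketch the corpus is hypothetical; going by the statements above, the first two counts would be expected to agree and the word fragment would be expected to match nothing.

```wolfram
(* Hypothetical corpus in a temporary directory *)
dir = CreateDirectory[];
Export[FileNameJoin[{dir, "a.txt"}], "The dog chased the cat.", "Text"];
Export[FileNameJoin[{dir, "c.txt"}], "Birds sing at dawn.", "Text"];

(* The same word in different letter cases *)
lower = TextSearch[dir, "dog", "Count"];
upper = TextSearch[dir, "DOG", "Count"];

(* A fragment of the word "Birds" rather than the entire word *)
partial = TextSearch[dir, "bir", "Count"];

{lower, upper, partial}

(* Clean up the temporary files *)
DeleteDirectory[dir, DeleteContents -> True];
```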

Possible Issues  (2)

Only field weights can be specified at search time, while other content field options need to be specified at index creation time:

The following search returns a result because "IgnoreCase" is True by default, and "IgnoreCase" cannot be specified at search time:

When the same option is specified at index time, no result is returned:

The "Score" fields of objects matched in different indices are not comparable in general:

Consequently, a search over multiple indices generally returns results that are not sorted by "Score":

Introduced in 2015 (10.2) | Updated in 2017 (11.1)