TextSearch

TextSearch[source,form]

searches for files referenced by source that contain text matching form.

TextSearch[source,form,"prop"]

returns the property prop for each result.

Details and Options

  • TextSearch supports the following basic search forms, which can be combined arbitrarily:
  • "string"                      complete string must occur in the document
    {s1,s2,…}                     all si must occur in the document
    s1|s2|…                       at least one of the si must occur in the document
    "field"->s1                   the si must occur in the given field
    FixedOrder[s1,s2,…]           the si must occur in the order given
    Except[q]                     q cannot occur in the document
    SearchAdjustment[q,…]         q occurs and is assigned a certain weight etc.
    Between[…],LessThan[…],…      numbers etc. in particular ranges etc. occur
    SearchQueryString["query"]    complete search-engine-style query
  • TextSearch also supports the alternative forms ContainsAll[…], ContainsAny[…] and ContainsNone[…], as well as And[…], Or[…] and Not[…].
  • TextSearch allows the following to be used as sources:
  • "path" or File["path"]    individual file or directory of files
    "name"                    SearchIndexObject["name"]
    SearchIndexObject[…]      search index generated by CreateSearchIndex
    {obj1,obj2,…}             list of objects
  • TextSearch[source,form] returns a SearchResultObject[…] expression.
  • The property prop can be any of the properties of ContentObject, in which case TextSearch[source,form,prop] is equivalent to SearchResultObject[…][All,prop].
  • The following additional properties are supported in TextSearch[source,form,prop]:
  • "Count"            total number of search results
    "ContentObject"    search results as content objects in a list
    "Association"      search results in an association
  • The property "Score" defines the default order in which search results are sorted.
  • Possible options include:
  • ContentFieldOptions       <||>    weighting options for fields
    DocumentWeightingRules    None    weighting of documents based on the values of certain fields
    MaxItems                  All     the number of items to return
  • Files with an extension typical of binary files, as well as files that contain non-textual byte values, will not be indexed or searched.
  • TextSearch supports many file formats that can be imported as plain text, including "TXT", "CSV", "JSON", "XML", "PDF", "NB" and "EPS".
  • TextSearch does not support most image, audio and other kinds of file formats that do not have a text component.
  • TextSearch reads the "Plaintext" element of a file, which is given by Import[file, "Plaintext"].
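
The notes above can be tried on a small throwaway directory. The following is a minimal sketch, not one of the original examples: the file names and contents are hypothetical, and which files match may depend on how the search engine tokenizes and stems words.

```wolfram
(* Hypothetical sample data: three small text files in a temporary directory *)
dir = CreateDirectory[];
Export[FileNameJoin[{dir, "pets.txt"}], "The dog chased the cat.", "Text"];
Export[FileNameJoin[{dir, "zoo.txt"}], "A man fed an animal at the zoo.", "Text"];
Export[FileNameJoin[{dir, "park.txt"}], "A man walked his dog in the park.", "Text"];

(* TextSearch reads the same text that Import returns as "Plaintext" *)
Import[FileNameJoin[{dir, "pets.txt"}], "Plaintext"]

(* Search forms: ordered terms, exclusion, and the And/Not spelling of exclusion *)
TextSearch[dir, FixedOrder["man", "dog"]]
TextSearch[dir, {"man", Except["animal"]}]
TextSearch[dir, And["man", Not["animal"]]]

(* Properties: a count, a list of content objects, and per-result scores *)
TextSearch[dir, "dog", "Count"]
TextSearch[dir, "dog", "ContentObject"]
TextSearch[dir, "dog"][All, "Score"]

(* Options: return at most one result *)
TextSearch[dir, "man", MaxItems -> 1]

(* Remove the sample directory and its files *)
DeleteDirectory[dir, DeleteContents -> True];
```

Searching the directory path directly keeps the sketch free of any persistent index; for repeated queries over the same files, an index created with CreateSearchIndex is likely the better choice.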

Examples


Basic Examples  (3)

Create an index over a specific directory:

In[1]:=
Out[1]=

Search for files in the specified directory that contain the word "dog" using the index:

In[2]:=
Out[2]=

Search for files containing "man" but not "animal" using advanced query syntax and show a contextual snippet:

In[3]:=
Out[3]=
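
A minimal sketch of this index-based workflow, using hypothetical sample files rather than the original example data. The leading "-" exclusion syntax inside SearchQueryString and the "Snippet" property are assumptions based on common search-engine conventions and should be checked against the SearchQueryString and ContentObject documentation:

```wolfram
(* Hypothetical sample data in a temporary directory *)
dir = CreateDirectory[];
Export[FileNameJoin[{dir, "pets.txt"}], "The dog chased the cat.", "Text"];
Export[FileNameJoin[{dir, "zoo.txt"}], "A man fed an animal at the zoo.", "Text"];
Export[FileNameJoin[{dir, "park.txt"}], "A man walked his dog in the park.", "Text"];

(* Create an index over the directory *)
index = CreateSearchIndex[dir]

(* Search the index for files that contain "dog" *)
TextSearch[index, "dog"]

(* "man" but not "animal", written as a search-engine-style query;
   the "-" prefix and the "Snippet" property are assumptions *)
TextSearch[index, SearchQueryString["man -animal"]][All, "Snippet"]

(* Clean up the index and the sample files *)
DeleteSearchIndex[index];
DeleteDirectory[dir, DeleteContents -> True];
```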

TextSearch can also query a directory without indexing it:

In[1]:=
Out[1]=
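
As a sketch of querying without an index, the source can be given as a path string or wrapped in File, for a directory or for an individual file. The sample files below are hypothetical:

```wolfram
(* Hypothetical sample data in a temporary directory *)
dir = CreateDirectory[];
Export[FileNameJoin[{dir, "pets.txt"}], "The dog chased the cat.", "Text"];
Export[FileNameJoin[{dir, "zoo.txt"}], "A man fed an animal at the zoo.", "Text"];

(* Directory given as a path string *)
TextSearch[dir, "dog"]

(* Same directory wrapped in File *)
TextSearch[File[dir], "dog"]

(* A single file as the source *)
TextSearch[File[FileNameJoin[{dir, "pets.txt"}]], "dog"]

(* Remove the sample directory and its files *)
DeleteDirectory[dir, DeleteContents -> True];
```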

Search for files containing both words:

In[1]:=
Out[1]=

Search for files containing either word:

In[2]:=
Out[2]=

Queries can be combined:

In[3]:=
Out[3]=
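
The three query shapes above (all words, any word, and a combination) might look as follows. This is a sketch on hypothetical sample files; the words searched for in the original examples are not known:

```wolfram
(* Hypothetical sample data in a temporary directory *)
dir = CreateDirectory[];
Export[FileNameJoin[{dir, "pets.txt"}], "The dog chased the cat.", "Text"];
Export[FileNameJoin[{dir, "zoo.txt"}], "A man fed an animal at the zoo.", "Text"];
Export[FileNameJoin[{dir, "park.txt"}], "A man walked his dog in the park.", "Text"];

(* Both words must occur: a list of strings *)
TextSearch[dir, {"man", "dog"}]

(* Either word may occur: alternatives *)
TextSearch[dir, "cat" | "animal"]

(* Combined: "man" together with at least one of "dog" or "animal" *)
TextSearch[dir, {"man", "dog" | "animal"}]

(* Remove the sample directory and its files *)
DeleteDirectory[dir, DeleteContents -> True];
```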

Scope  (1)

Options  (3)

Properties & Relations  (1)

Possible Issues  (2)

See Also

TextSearchReport  SearchIndexObject  ContentObject  CreateSearchIndex  SearchResultObject  SearchIndices  FindList  StringCases

Introduced in 2015 (10.2)