TextSearch

TextSearch[source,form]

searches for files referenced by source that contain text matching form.

TextSearch[source,form,"prop"]

returns the property prop for each result.

Details and Options

TextSearch supports the following basic search forms, which can be combined arbitrarily:

	"string"	complete string must occur in the document
	{s₁,s₂,…}	all s_i must occur in the document
	s₁|s₂|…	at least one of the s_i must occur in the document
	"field"->s_i	the s_i must occur in the given field
	FixedOrder[s₁,s₂,…]	the s_i must occur in the order given
	Except[q]	q cannot occur in the document
	SearchAdjustment[q,…]	q occurs and is assigned a certain weight etc.
	Between[…],LessThan[…],…	numbers etc. in particular ranges etc. occur
	SearchQueryString["query"]	complete search-engine-style query
	All	all indexed documents are returned

TextSearch also supports the alternative forms ContainsAll[…], ContainsAny[…] and ContainsNone[…], as well as And[…], Or[…] and Not[…].
TextSearch allows the following to be used as sources:

	"path" or File["path"]	individual file or directory of files
	"name"	SearchIndexObject["name"]
	SearchIndexObject[…]	search index generated by CreateSearchIndex
	{obj₁,obj₂,…}	list of objects

TextSearch[source,form] returns a SearchResultObject[…] expression.
The property prop can be any property of ContentObject, in which case TextSearch[source,form,prop] is equivalent to SearchResultObject[…][All,prop].
The following additional properties are supported in TextSearch[source,form,prop]:

	"Count"	total number of search results
	"ContentObject"	search results as content objects in a list
	"Association"	search results in an association

The property "Score" defines the default order in which search results are sorted.
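
The relationship between the property argument and the result object can be sketched as follows. This is only an illustration: the directory dir is hypothetical, and "FileName" is assumed to be one of the ContentObject properties available for file-based results.

```wolfram
(* Hypothetical directory of text files; replace with a real path *)
dir = FileNameJoin[{$HomeDirectory, "Documents", "notes"}];

(* Without a property, the result is a SearchResultObject *)
results = TextSearch[dir, "dog"];

(* "Count" is one of the additional properties *)
TextSearch[dir, "dog", "Count"]

(* A ContentObject property requested directly, then the equivalent form
   on the result object ("FileName" is an assumed property name) *)
TextSearch[dir, "dog", "FileName"]
results[All, "FileName"]
```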
Possible options include:

	ContentFieldOptions	<||>	weighting options for fields
DocumentWeightingRules	None	weighting of documents based on the values of certain fields
MaxItems	All	the number of items to return

Files with an extension typical of binary files as well as files that contain non-textual byte values will not be indexed or searched.
TextSearch supports many file formats that can be imported as plain text. Some of these include: "TXT", "CSV", "JSON", "XML", "PDF", "NB", "EPS".
TextSearch does not support most image, audio and other kinds of file formats that do not have a text component.
TextSearch reads the "Plaintext" element of a file, which is given by Import[file, "Plaintext"].
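
To check what text a given file would contribute to a search, the same import can be run by hand. The file path below is hypothetical:

```wolfram
(* Hypothetical file path; replace with a real file *)
file = FileNameJoin[{$HomeDirectory, "Documents", "notes", "report.pdf"}];

(* The plain-text content that TextSearch reads from this file *)
Import[file, "Plaintext"]
```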

Examples

Basic Examples (3)

Create an index over a specific directory:
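
A minimal sketch, assuming a hypothetical directory dir that contains text files:

```wolfram
(* Hypothetical directory; replace with a real path *)
dir = FileNameJoin[{$HomeDirectory, "Documents", "notes"}];

(* Build a search index over every supported file in the directory *)
index = CreateSearchIndex[dir]
```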

Search for files in the specified directory that contain the word "dog" using the index:
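
One possible form of this query, with the index built from a hypothetical directory so that the snippet stands on its own:

```wolfram
(* Hypothetical directory; replace with a real path *)
dir = FileNameJoin[{$HomeDirectory, "Documents", "notes"}];
index = CreateSearchIndex[dir];

(* Returns a SearchResultObject for documents containing "dog" *)
TextSearch[index, "dog"]
```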

Search for files containing "man" but not "animal" using advanced query syntax and show a contextual snippet:
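
A sketch of such a query follows. It rests on two assumptions that are not confirmed by the text above: that the query string accepts search-engine-style + and - prefixes, and that "Snippet" is the name of the ContentObject property holding the contextual snippet. The directory is hypothetical.

```wolfram
(* Hypothetical directory; replace with a real path *)
dir = FileNameJoin[{$HomeDirectory, "Documents", "notes"}];
index = CreateSearchIndex[dir];

(* Assumed syntax: "+man" requires the word, "-animal" excludes it;
   "Snippet" is an assumed property name *)
TextSearch[index, SearchQueryString["+man -animal"], "Snippet"]
```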

TextSearch can also query a directory without indexing it:
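
For instance, passing a (hypothetical) directory path as the source instead of an index:

```wolfram
(* Hypothetical directory; no CreateSearchIndex call is needed here *)
dir = FileNameJoin[{$HomeDirectory, "Documents", "notes"}];

TextSearch[dir, "dog"]
```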

Search for files containing both words:
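
Using the list form {s₁,s₂,…} on a hypothetical directory, with illustrative search words:

```wolfram
(* Hypothetical directory; replace with a real path *)
dir = FileNameJoin[{$HomeDirectory, "Documents", "notes"}];

(* Both "cat" and "dog" must occur in a document for it to match *)
TextSearch[dir, {"cat", "dog"}]
```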

Search for files containing either word:
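
Using the alternatives form s₁|s₂|… on a hypothetical directory, with illustrative search words:

```wolfram
(* Hypothetical directory; replace with a real path *)
dir = FileNameJoin[{$HomeDirectory, "Documents", "notes"}];

(* A document matches if it contains "cat", "dog", or both *)
TextSearch[dir, "cat" | "dog"]
```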

Queries can be combined:
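
One way to combine the basic forms, shown as a sketch with a hypothetical directory and illustrative search words:

```wolfram
(* Hypothetical directory; replace with a real path *)
dir = FileNameJoin[{$HomeDirectory, "Documents", "notes"}];

(* Documents must contain "cat" or "dog", and must not contain "bird" *)
TextSearch[dir, {"cat" | "dog", Except["bird"]}]
```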

Scope (1)

Create an index using example text:

Add a second index:

Search in both indices:
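
The sketch below shows the list-of-sources form. It differs from the example described here in one respect: the two indices are built from hypothetical directories rather than from example text.

```wolfram
(* Two hypothetical directories; replace with real paths *)
dir1 = FileNameJoin[{$HomeDirectory, "Documents", "notes"}];
dir2 = FileNameJoin[{$HomeDirectory, "Documents", "reports"}];

index1 = CreateSearchIndex[dir1];
index2 = CreateSearchIndex[dir2];

(* A list of sources searches all of them in one call *)
TextSearch[{index1, index2}, "dog"]
```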

Options (3)

ContentFieldOptions (1)

Specify weights for the fields in the index at query time:

Set a weight of 2 for the "Keywords" field:

When no weight is set, "doc1" gets a higher score:

DocumentWeightingRules (1)

Define a "ConfidenceLevel" field and use it for document weighting:

MaxItems (1)

Get only the first result:
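
A minimal sketch on a hypothetical directory:

```wolfram
(* Hypothetical directory; replace with a real path *)
dir = FileNameJoin[{$HomeDirectory, "Documents", "notes"}];

(* Limit the search to a single result *)
TextSearch[dir, "dog", MaxItems -> 1]
```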

Properties & Relations (1)

Queries are case insensitive:
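
One way to observe this on a hypothetical directory is to compare result counts for two spellings of the same word; under the default settings the counts would be expected to agree:

```wolfram
(* Hypothetical directory; replace with a real path *)
dir = FileNameJoin[{$HomeDirectory, "Documents", "notes"}];

(* Compare the number of matches for lowercase and uppercase queries *)
TextSearch[dir, "dog", "Count"] === TextSearch[dir, "DOG", "Count"]
```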

Queries only match entire words:
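
As an illustration on a hypothetical directory, a query for a word fragment is not expected to match documents that contain only the longer word:

```wolfram
(* Hypothetical directory; replace with a real path *)
dir = FileNameJoin[{$HomeDirectory, "Documents", "notes"}];

(* Matches documents containing the word "dog" *)
TextSearch[dir, "dog", "Count"]

(* "do" is matched only as a whole word, not as a prefix of "dog" *)
TextSearch[dir, "do", "Count"]
```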

Possible Issues (2)

Only field weights can be specified at search time, while other content field options need to be specified at index creation time:

The following search returns a result because "IgnoreCase" is True by default, and "IgnoreCase" cannot be specified at search time:

When the same option is specified at index time, no result is returned:

The "Score" fields of objects matched in different indices are not comparable in general:

As a consequence, a search over multiple indices generally returns results that are not sorted by "Score":
