ContentFieldOptions

ContentFieldOptions is an option for CreateSearchIndex and related functions that specifies how the individual fields of the content being indexed are handled.

Details

  • ContentFieldOptions-><|"name1"->opts1,"name2"->opts2,…|> specifies that the field named name_i should be indexed using the options given in the association opts_i.
  • Possible entries in each opts_i association include:
      "BulkRetrievalOptimized"    whether to index the field to optimize for bulk retrieval
      "CamelCaseMatching"         whether to use camel case to match multiword forms
      "DeleteStopWords"           whether to delete stop words before indexing
      "IgnoreCase"                whether case is ignored for indexing and matching
      "Language"                  what language to assume the field is in
      "LengthWeighted"            whether matches in shorter fields count more
      "Searchable"                whether the field should be searchable
      "StemmingMethod"            how to stem words for indexing and matching
      "Stored"                    whether to store the literal content of the field in the index
      "Tokenized"                 whether the field should be tokenized before indexing
      "Type"                      overall type of field
      "Weight"                    how to weight this field when searching
  • Typical types of fields include: "Title", "Text", "String", "Date", "DateTime", "Integer", "Real", "Boolean".
  • Different field types are given different default weights.
  • Field types such as "Title" and "Integer" are stored by default, while those such as "Text" are not.
  • "Title" and "Text" are tokenized and undergo stop word deletion by default, unlike "String" or "Date".
  • All field types are searchable by default.
  • No field type is optimized for bulk retrieval by default.
  • By default, for all field types, a match in a longer field has a lower impact on the final score than a match in a shorter field. To disable this behavior, set "LengthWeighted" to False.
  • The default value for "StemmingMethod" is "Porter". Alternative values include "Kstem" and None.
  • If explicit options are specified in addition to a type, the explicit options override defaults for that type.
  • All->opts can be used to indicate option settings to be used for all types by default.
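
As a rough sketch of how these settings combine, the following builds an option value for three fields and passes it to CreateSearchIndex. The field names "Headline", "Body" and "SKU" and their contents are hypothetical. Supplying the content as a list of ContentObject expressions and writing All as a key next to the field names are both assumptions, not forms confirmed by the text above:

```wolfram
(* Hypothetical field names; the option keys are those listed above *)
fieldOptions = <|
   "Headline" -> <|"Type" -> "Title", "Weight" -> 10|>,
   (* "Text" fields are not stored by default; the explicit setting overrides that *)
   "Body" -> <|"Type" -> "Text", "Stored" -> True, "StemmingMethod" -> "Kstem"|>,
   "SKU" -> <|"Type" -> "String", "IgnoreCase" -> False|>,
   (* Assumption: All is given as a key alongside the field names *)
   All -> <|"LengthWeighted" -> False|>
   |>;

(* Made-up content, assumed to be accepted as a list of ContentObject expressions *)
content = {ContentObject[<|
     "Headline" -> "Release notes",
     "Body" -> "Indexing got faster in this release.",
     "SKU" -> "AB-1024"|>]};

index = CreateSearchIndex[content, ContentFieldOptions -> fieldOptions]
```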

Examples

Basic Examples  (12)

Create an example index, setting the language of "Field2" to French:

The French stop words "le" and "la" are ignored, resulting in a match:

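A minimal sketch of such an index and query, assuming the content is supplied as a list of ContentObject expressions and that a single field can be queried with a "field"->form search form in TextSearch (the field contents are invented for illustration):

```wolfram
(* Illustrative content; the field values are made up for this sketch *)
index = CreateSearchIndex[
   {ContentObject[<|"Field1" -> "the house", "Field2" -> "la maison"|>]},
   ContentFieldOptions -> <|"Field2" -> <|"Language" -> "French"|>|>];

(* The query says "le" where the content says "la"; with French stop words
   removed on both sides, only "maison" has to match *)
TextSearch[index, "Field2" -> "le maison"]
```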

Store the textual content so that it is returned in the search result:

Setting the field type to "Title" weights the field more heavily when ranking search results, and also returns its value in content objects:

If exact case is important, "IgnoreCase" can be set to False for a field:

Since the case does not match, no results are found:

"CamelCaseMatching" can be disabled for non-word content if desired:

This would match if "CamelCaseMatching" were enabled:

Stemming can be disabled for non-word content:

This would match if stemming were enabled:

The "Weight" of a field can be specified for higher result ranking:

When the match is in the "Keyword" field, the score is multiplied by the "Weight" of 10:

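One way this might look, as a sketch under stated assumptions: the two documents and their contents are hypothetical, the content is assumed to be accepted as a list of ContentObject expressions, and an unqualified query is assumed to search the "Keyword" field as well as the main text:

```wolfram
(* Two made-up documents: each mentions "otter", but in different fields *)
index = CreateSearchIndex[
   {ContentObject[<|"Keyword" -> "otter", "Plaintext" -> "notes about rivers"|>],
    ContentObject[<|"Keyword" -> "rivers", "Plaintext" -> "notes about the otter"|>]},
   ContentFieldOptions -> <|"Keyword" -> <|"Weight" -> 10|>|>];

(* The document whose "Keyword" field matches is expected to rank first *)
results = TextSearch[index, "otter"];
results[All]
```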

Non-searchable fields cannot be searched, but if stored, they can be retrieved from the resulting content objects:

Disable stop word deletion for certain fields:

The stop word "or" is only found in those fields:

By default, a match in a longer field has a lower impact on the final score than a match in a shorter field:

This behavior can be disabled by setting "LengthWeighted" to False:

Set "Tokenized" to False to require a verbatim match of a field:

The field needs to be queried explicitly, or it is not matched:

The field is only matched verbatim:

When a field is used for document weighting, setting "BulkRetrievalOptimized" to True can improve performance:

See Also

CreateSearchIndex  ContentObject  TextSearch  SearchAdjustment

Introduced in 2016 (11.0)