CreateSemanticSearchIndex

CreateSemanticSearchIndex[source]

creates a search index from the data in source.

CreateSemanticSearchIndex[source,"name"]

gives the search index the specified name.

Details and Options

  • CreateSemanticSearchIndex extracts features from text that can then be used to search the content semantically.
  • Possible values for source are:
    "string"          a plain string
    File["path"]      an individual file
    URL["url"]        the text representation of "url"
    CloudObject[…]    a cloud object
    LocalObject[…]    a local object
    {obj1,obj2,…}     a list of objects
  • Sources can be tagged; possible forms include:
    {obj1→val1,…}        a list of objects and associated values
    {obj1,…}→{val1,…}    a rule between objects and values
  • Accepted forms of vali include:
    "string"             string labels
    <|"tag1"→v1,…|>      an association of tags and metadata values
  • CreateSemanticSearchIndex supports the following options:
    DistanceFunction          EuclideanDistance          the distance function to use
    FeatureExtractor          "SentenceBERT"             how to extract features from text segments
    GeneratedAssetLocation    $GeneratedAssetLocation    the location of the index
    Method                    Automatic                  method details
    OverwriteTarget           Automatic                  whether to overwrite an existing location
    ProgressReporting         $ProgressReporting         whether to report the progress of the computation
    WorkingPrecision          "Real32"                   precision of floating-point calculations
  • Possible values for DistanceFunction include EuclideanDistance, SquaredEuclideanDistance, CosineDistance, JaccardDissimilarity and HammingDistance.
  • Possible values for FeatureExtractor include:
    "SentenceBERT"    a local model based on SentenceBERT
    f                 a custom extractor function
  • A custom extractor f must take a list of strings and return a list of the same length, containing one feature vector per string.
  • Detailed options can be given using Method→<|opt1→val1,…|>. Possible values for opti are:
    "ContextPadding"                    minimum overlap between items
    "MaximumItemLength"                 maximum length of a text segment
    "MinimumItemLength"                 minimum length of a text segment
    "SplitPattern"         Automatic    where to split long strings
  • The automatic "SplitPattern" tries to split text at paragraph breaks, newlines and word boundaries to create chunks whose lengths lie between "MinimumItemLength" and "MaximumItemLength".
  • Possible settings for WorkingPrecision include:
    "Integer8"    signed 8-bit integers from -128 through 127
    "Real32"      single-precision real (32-bit)
    "Real64"      double-precision real (64-bit)

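The automatic splitting controlled by the Method suboptions can be sketched in Python. The sketch first splits at paragraph breaks, then at newlines, and then at spaces. It then packs the resulting pieces greedily into chunks. Several details are assumptions: the function name split_text, lengths counted in characters, and the rule for merging short chunks. "ContextPadding" is not modeled, and the sketch is not the Wolfram Language implementation.

```python
import re

# Separators tried from coarsest to finest: paragraph breaks, newlines, spaces.
# This order is an assumption modeled on the description of the automatic "SplitPattern".
SEPARATORS = [r"\n\s*\n", r"\n", r" "]


def split_text(text, min_len, max_len, level=0):
    """Hypothetical chunker: split text into chunks of roughly min_len..max_len characters.

    Lengths are counted in characters (an assumption). A single word longer than
    max_len is kept whole, and a chunk shorter than min_len is merged into the
    preceding chunk, which can push that chunk past max_len.
    """
    text = text.strip()
    if len(text) <= max_len or level >= len(SEPARATORS):
        return [text] if text else []
    # Split at the current separator and drop empty pieces.
    pieces = [p for p in re.split(SEPARATORS[level], text) if p.strip()]
    # Pieces that are still too long are split again at the next, finer separator.
    parts = []
    for piece in pieces:
        parts.extend(split_text(piece, min_len, max_len, level + 1))
    # Greedily pack consecutive parts into chunks of at most max_len characters.
    chunks = []
    for part in parts:
        if chunks and len(chunks[-1]) + 1 + len(part) <= max_len:
            chunks[-1] += " " + part
        else:
            chunks.append(part)
    # Merge chunks that are shorter than min_len into their predecessor.
    merged = []
    for chunk in chunks:
        if merged and len(chunk) < min_len:
            merged[-1] += " " + chunk
        else:
            merged.append(chunk)
    return merged
```

Take a two-paragraph string with min_len=10 and max_len=40. The sketch keeps the short first paragraph whole and breaks the longer second paragraph at word boundaries.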
Examples


Basic Examples  (1)

Create a new SemanticSearchIndex:

Search in the text by semantic similarity:
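Conceptually, a semantic search ranks the stored text segments by how far their feature vectors lie from the query's feature vector. The distance is measured with the function set by the DistanceFunction option. The Python sketch below (using NumPy) illustrates only this ranking step, with the standard definitions of the distances listed in Details. The helper names and the toy vectors are assumptions for illustration and are not part of the Wolfram Language API.

```python
import numpy as np

# Standard definitions of the distances named under DistanceFunction.
# JaccardDissimilarity and HammingDistance are shown for 0/1 (Boolean) vectors.


def euclidean(u, v):
    return float(np.linalg.norm(u - v))


def squared_euclidean(u, v):
    return float(np.sum((u - v) ** 2))


def cosine(u, v):
    # 1 minus the cosine of the angle between u and v (undefined for zero vectors).
    return float(1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def jaccard(u, v):
    u, v = np.asarray(u, dtype=bool), np.asarray(v, dtype=bool)
    union = np.sum(u | v)
    # Positions where exactly one vector is set, relative to positions where either is set;
    # returning 0.0 when both vectors are all zeros is a choice made for this sketch.
    return float(np.sum(u ^ v) / union) if union else 0.0


def hamming(u, v):
    # Number of positions at which the two vectors differ.
    return int(np.sum(np.asarray(u) != np.asarray(v)))


def nearest(vectors, query, distance, k=1):
    """Return the indices of the k stored vectors closest to query under distance."""
    dists = [distance(vec, query) for vec in vectors]
    return [int(i) for i in np.argsort(dists, kind="stable")[:k]]


# Toy 3-dimensional vectors standing in for extracted text features.
stored = np.array([[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.7, 0.3, 0.1]])
query = np.array([1.0, 0.0, 0.0])
print(nearest(stored, query, cosine, k=2))  # [0, 2]
```

Swapping the distance argument is the sketch's counterpart of changing the DistanceFunction setting.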

Scope  (3)

Create an index from a string:

Create an index from a file:

Create an index from a URL:

Options  (4)

FeatureExtractor  (1)

Train a custom feature extractor:

Use it to extract features from another text:
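A custom extractor only has to meet the contract stated in Details: it takes a list of strings and returns one vector per string. The following Python sketch shows a deliberately simple "trained" extractor. Training collects a vocabulary from a text, and extraction counts the vocabulary words in each input string. The names tokenize, train_extractor and extract are hypothetical. The bag-of-words model only stands in for a learned extractor, since it matches words rather than meanings.

```python
import re

import numpy as np


def tokenize(text):
    # Lowercase runs of letters, digits or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def train_extractor(training_text):
    """Hypothetical trainer: build a bag-of-words extractor from a text."""
    vocab = {w: i for i, w in enumerate(sorted(set(tokenize(training_text))))}

    def extract(strings):
        # One row per input string; every row has len(vocab) entries.
        vectors = np.zeros((len(strings), len(vocab)), dtype=np.float32)
        for row, s in enumerate(strings):
            for word in tokenize(s):
                col = vocab.get(word)
                if col is not None:  # words unseen during training are ignored
                    vectors[row, col] += 1.0
        return vectors

    return extract


extract = train_extractor("The cat sat on the mat. The dog slept.")
print(extract(["A cat and a dog", "the mat"]).shape)  # (2, 7)
```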

GeneratedAssetLocation  (3)

Specify a custom location to store the database:

Retrieve the location:

By default, the database is stored in a local object:

Store the vector database in a file:

Retrieve the location:

Recreate the database from the file reference:

Applications  (1)

Create a reverse mapping from definitions to the words they define:

Build an index using the map:

Perform reverse lookup in a dictionary by matching the query against the definitions:
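The reverse-dictionary idea can be sketched in plain Python. The definitions serve as the indexed text, each tagged with the word it defines. A query is answered with the tag of the best-matching definition. In this sketch, word-overlap (Jaccard) similarity replaces the semantic features that CreateSemanticSearchIndex would compute. The dictionary entries are illustrative ones written for the sketch, not the data from the example above.

```python
import re

# Illustrative entries written for this sketch (hypothetical data).
DICTIONARY = {
    "umbrella": ["a device for protection against rain", "a collapsible canopy on a stick"],
    "kettle": ["a container for boiling water"],
    "ladder": ["a set of steps for climbing up or down"],
}


def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))


# Reverse mapping: each definition points back to the word it defines.
reverse_map = {definition: word for word, defs in DICTIONARY.items() for definition in defs}


def reverse_lookup(query):
    """Return the word whose definition has the highest word overlap with query.

    Jaccard similarity of word sets stands in for semantic similarity here.
    """
    q = words(query)

    def similarity(definition):
        d = words(definition)
        return len(q & d) / len(q | d) if q | d else 0.0

    best = max(reverse_map, key=similarity)
    return reverse_map[best]


print(reverse_lookup("used for boiling water"))  # kettle
```

A real semantic index would also match paraphrases that share no words with the definition, which this overlap measure cannot do.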

Wolfram Research (2024), CreateSemanticSearchIndex, Wolfram Language function, https://reference.wolfram.com/language/ref/CreateSemanticSearchIndex.html.
