Faiss Index Lookup

Faiss Index Lookup is a tool for querying a user-provided Faiss-based vector store. Combined with our Large Language Model (LLM) tool, it lets users retrieve contextually relevant information from a domain knowledge base.
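To make "querying a vector store" concrete, here is a minimal pure-Python sketch of exact nearest-neighbor search, the computation a flat (exhaustive) Faiss L2 index performs. It assumes the index uses L2 distance, for which Faiss reports squared Euclidean distances as scores. The toy store and the names `squared_l2` and `lookup` are hypothetical illustrations, not part of promptflow-vectordb; a real Faiss index runs this search in optimized native code.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (the score a Faiss flat L2 index reports)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lookup(
    store: List[Tuple[str, List[float]]], query: Sequence[float], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k (text, score) pairs with the smallest distance to query."""
    scored = ((text, squared_l2(vec, query)) for text, vec in store)
    # Smaller distance means a closer match, so keep the k smallest scores.
    return heapq.nsmallest(top_k, scored, key=lambda pair: pair[1])


# Hypothetical toy store: (text, embedding) pairs with 2-dimensional vectors.
store = [
    ("sample text #0", [0.0, 1.0]),
    ("sample text #1", [0.2, 0.9]),
    ("sample text #2", [1.0, 0.0]),
    ("sample text #3", [0.5, 0.5]),
]

# The two closest entries to [0.0, 1.0] are #0 (exact match) and then #1.
print(lookup(store, [0.0, 1.0], top_k=2))
```

Exhaustive search is exact but scales linearly with the number of stored vectors; Faiss also offers approximate index types that trade some accuracy for speed.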

Requirements

  • For AzureML users, the tool is installed in the default image, so you can use it without extra installation.

  • For local users, if your index is stored in a local path, install:

    pip install promptflow-vectordb

    If your index is stored in Azure storage, install the package with the azure extra:

    pip install promptflow-vectordb[azure]

Prerequisites

For AzureML users

  • Step 1. Prepare an accessible path on Azure Blob Storage. If you need to create a new storage account, follow this guide: Azure Storage Account.

  • Step 2. Create the Faiss-based index files on Azure Blob Storage. We support the LangChain format (index.faiss + index.pkl) for the index files, which can be prepared either with our promptflow-vectordb SDK or by following the quick guide in the LangChain documentation. To build an index with the promptflow-vectordb SDK, refer to An example code for creating Faiss index.

  • Step 3. Grant the identity used by the promptflow runtime the role that matches where your index files are stored. Please refer to Steps to assign an Azure role:

    Location                                          Role
    workspace datastores or workspace default blob    AzureML Data Scientist
    other blobs                                       Storage Blob Data Reader

For local users

  • Create the Faiss-based index files in a local path by following only step 2 above.
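Before pointing the tool at an index folder, it can help to confirm that both LangChain-format files from step 2 are present. The sketch below only checks that `index.faiss` and `index.pkl` exist as files; it does not open or validate their contents. `missing_index_files` is a hypothetical helper, not part of promptflow-vectordb.

```python
import tempfile
from pathlib import Path
from typing import List

# File names of the LangChain-format index described in step 2.
REQUIRED_FILES = ("index.faiss", "index.pkl")


def missing_index_files(folder: str) -> List[str]:
    """Return the required index files absent from folder (empty list if complete)."""
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


# Usage: simulate an index folder that is missing its .pkl file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "index.faiss").write_bytes(b"")  # empty placeholder, not a real index
    print(missing_index_files(tmp))  # ['index.pkl']
```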

Inputs

The tool accepts the following inputs:

Name     Type          Required   Description
path     string        Yes        URL or path of the vector store (see the supported formats below).
vector   list[float]   Yes        The target vector to be queried, which can be generated by the LLM tool.
top_k    integer       No         The number of top-scored entities to return. Default value is 3.

Supported formats for path:

  • local path (for local users):
    <local_path_to_the_index_folder>

  • Azure blob URL (requires the [azure] extra):
    https://<account_name>.blob.core.windows.net/<container_name>/<path_and_folder_name>

  • AML datastore URL (requires the [azure] extra):
    azureml://subscriptions/<your_subscription>/resourcegroups/<your_resource_group>/workspaces/<your_workspace>/data/<data_path>

  • public http/https URL (for public demonstration):
    http(s)://<path_and_folder_name>
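The path input accepts several formats, and two of them need the [azure] extra. The sketch below classifies a path string using only the formats documented for the path input and assembles the three inputs with the documented top_k default of 3. It is an illustration under those assumptions: `classify_index_path`, `build_tool_inputs`, the category labels, and the example URLs are hypothetical, and the real tool's own parsing and validation may differ.

```python
from typing import Any, Dict, Sequence, Tuple
from urllib.parse import urlparse


def classify_index_path(path: str) -> Tuple[str, bool]:
    """Map a `path` input to (kind, needs_azure_extra) per the documented formats."""
    parsed = urlparse(path)
    scheme = parsed.scheme.lower()
    if scheme == "azureml":
        return "aml_datastore", True
    if scheme == "https" and parsed.netloc.lower().endswith(".blob.core.windows.net"):
        return "azure_blob", True
    if scheme in ("http", "https"):
        return "public_http", False
    # Anything else (including Windows drive paths like C:\...) is treated as local.
    return "local", False


def build_tool_inputs(path: str, vector: Sequence[float], top_k: int = 3) -> Dict[str, Any]:
    """Assemble the tool inputs; top_k defaults to 3 as documented."""
    if not vector:
        raise ValueError("vector must not be empty")
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")
    return {"path": path, "vector": [float(x) for x in vector], "top_k": top_k}


# Hypothetical example paths.
print(classify_index_path("https://myaccount.blob.core.windows.net/mycontainer/faiss_index"))
print(classify_index_path("./faiss_index"))
print(build_tool_inputs("./faiss_index", [0.1, 0.2]))
```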

Outputs

The following is an example of the JSON response returned by the tool, which includes the top-k scored entities. Each entity follows the generic vector search result schema provided by our promptflow-vectordb SDK. For Faiss Index Lookup, the following fields are populated:

Field Name   Type     Description
text         string   Text of the entity.
score        float    Distance between the entity and the query vector; a lower score means a closer match.
metadata     dict     Custom key-value pairs provided by the user when creating the index.

Example output:
[
  {
    "metadata": {
      "link": "http://sample_link_0",
      "title": "title0"
    },
    "original_entity": null,
    "score": 0,
    "text": "sample text #0",
    "vector": null
  },
  {
    "metadata": {
      "link": "http://sample_link_1",
      "title": "title1"
    },
    "original_entity": null,
    "score": 0.05000000447034836,
    "text": "sample text #1",
    "vector": null
  },
  {
    "metadata": {
      "link": "http://sample_link_2",
      "title": "title2"
    },
    "original_entity": null,
    "score": 0.20000001788139343,
    "text": "sample text #2",
    "vector": null
  }
]
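In a retrieval flow, the lookup results are usually turned into prompt context for the LLM tool. The sketch below shows one way a downstream Python step might do that: it keeps entities whose distance score is at or below a threshold and appends each entity's metadata link. `build_context` and the `max_score` threshold of 0.1 are hypothetical choices for illustration, and the input is a trimmed copy of the example output (the null `original_entity` and `vector` fields are omitted).

```python
import json
from typing import Any, Dict, List


def build_context(results: List[Dict[str, Any]], max_score: float = 0.1) -> str:
    """Join the text of results whose distance score is at most max_score.

    Scores are distances, so smaller is closer; max_score is an arbitrary example value.
    """
    lines = []
    for entity in sorted(results, key=lambda e: e["score"]):
        if entity["score"] <= max_score:
            link = (entity.get("metadata") or {}).get("link", "")
            lines.append(f"{entity['text']} (source: {link})")
    return "\n".join(lines)


results = json.loads("""[
  {"metadata": {"link": "http://sample_link_0", "title": "title0"}, "score": 0, "text": "sample text #0"},
  {"metadata": {"link": "http://sample_link_1", "title": "title1"}, "score": 0.05000000447034836, "text": "sample text #1"},
  {"metadata": {"link": "http://sample_link_2", "title": "title2"}, "score": 0.20000001788139343, "text": "sample text #2"}
]""")

print(build_context(results))
# sample text #0 (source: http://sample_link_0)
# sample text #1 (source: http://sample_link_1)
```

A fixed threshold depends on the embedding model and index metric, so in practice it would need tuning; relying on top_k alone is the simpler alternative.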