Faiss Index Lookup

Faiss Index Lookup is a tool for querying a user-provided Faiss-based vector store. Combined with the Large Language Model (LLM) tool, it lets users extract contextually relevant information from a domain knowledge base.

Requirements

For AzureML users, the tool is preinstalled in the default image, so no extra installation is needed.
For local users, if your index is stored in a local path, run:

pip install promptflow-vectordb

If your index is stored in Azure storage, run:

pip install promptflow-vectordb[azure]

Prerequisites

For AzureML users

Step 1. Prepare an accessible path on Azure Blob Storage. If you need to create a new storage account, see the guide: Azure Storage Account.
Step 2. Create the Faiss-based index files on Azure Blob Storage. The LangChain format (index.faiss + index.pkl) is supported for the index files, which can be prepared either with the promptflow-vectordb SDK or by following the quick guide in the LangChain documentation. To build the index with the promptflow-vectordb SDK, refer to the instructions in "An example code for creating Faiss index".
Step 3. Depending on where you put your index files, grant the identity used by the promptflow runtime the appropriate role. Refer to "Steps to assign an Azure role":

Location	Role
workspace datastores or workspace default blob	AzureML Data Scientist
other blobs	Storage Blob Data Reader

For local users

Create the Faiss-based index files in a local path by following only step 2 above.

Inputs

The tool accepts the following inputs:

Name	Type	Description	Required
path	string	URL or path for the vector store. Local path (for local users): `<local_path_to_the_index_folder>`. Azure blob URL format (with [azure] extra installed): https://`<account_name>`.blob.core.windows.net/`<container_name>`/`<path_and_folder_name>`. AML datastore URL format (with [azure] extra installed): azureml://subscriptions/`<your_subscription>`/resourcegroups/`<your_resource_group>`/workspaces/`<your_workspace>`/data/`<data_path>`. Public http/https URL (for public demonstration): http(s)://`<path_and_folder_name>`.	Yes
vector	list[float]	The target vector to be queried, which can be generated by the LLM tool.	Yes
top_k	integer	The count of top-scored entities to return. Default value is 3.	No
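The four accepted forms of `path` can be told apart by their URL scheme and host. The sketch below is a hypothetical helper (neither `classify_index_path` nor `needs_azure_extra` is part of the tool). It shows one way to check, before running a flow, which kind of location a value points to and whether the [azure] extra would be needed according to the table above.

```python
from urllib.parse import urlparse


def classify_index_path(path: str) -> str:
    """Label a `path` input as local, azure_blob, aml_datastore, or public_url."""
    parsed = urlparse(path)
    scheme = parsed.scheme.lower()
    host = (parsed.hostname or "").lower()

    if scheme == "azureml":
        return "aml_datastore"
    if scheme in ("http", "https"):
        # Azure Blob Storage URLs use the <account_name>.blob.core.windows.net host.
        if host.endswith(".blob.core.windows.net"):
            return "azure_blob"
        return "public_url"
    # Anything else (relative paths, absolute paths, drive letters) is treated as local.
    return "local"


def needs_azure_extra(path: str) -> bool:
    """True when the location requires promptflow-vectordb[azure] per the inputs table."""
    return classify_index_path(path) in {"azure_blob", "aml_datastore"}


if __name__ == "__main__":
    for candidate in (
        "./my_index_folder",
        "https://myaccount.blob.core.windows.net/mycontainer/my_index",
        "azureml://subscriptions/s/resourcegroups/rg/workspaces/ws/data/my_index",
        "https://example.com/my_index",
    ):
        print(candidate, "->", classify_index_path(candidate))
```

The account, container, and folder names in the usage block are placeholders, not real locations.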

Outputs

The following is an example of the JSON-format response returned by the tool, which includes the top-k scored entities. Each entity follows the generic vector search result schema provided by the promptflow-vectordb SDK. For the Faiss Index Lookup tool, the following fields are populated:

Field Name	Type	Description
text	string	Text of the entity
score	float	Distance between the entity and the query vector
metadata	dict	Custom key-value pairs provided by the user when creating the index

Output

[
  {
    "metadata": {
      "link": "http://sample_link_0",
      "title": "title0"
    },
    "original_entity": null,
    "score": 0,
    "text": "sample text #0",
    "vector": null
  },
  {
    "metadata": {
      "link": "http://sample_link_1",
      "title": "title1"
    },
    "original_entity": null,
    "score": 0.05000000447034836,
    "text": "sample text #1",
    "vector": null
  },
  {
    "metadata": {
      "link": "http://sample_link_2",
      "title": "title2"
    },
    "original_entity": null,
    "score": 0.20000001788139343,
    "text": "sample text #2",
    "vector": null
  }
]
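Because `score` is a distance, a lower value means a closer match. The sketch below shows one possible way to turn a response like the one above into a context string for a downstream LLM prompt. The helper `build_context`, its `max_distance` parameter, and the `[title] text` layout are illustrative assumptions, not part of the tool. The sample data is a trimmed copy of the example response.

```python
import json

# Trimmed copy of the example response above (only the populated fields).
sample_results = json.loads("""
[
  {"metadata": {"link": "http://sample_link_0", "title": "title0"},
   "score": 0, "text": "sample text #0"},
  {"metadata": {"link": "http://sample_link_1", "title": "title1"},
   "score": 0.05000000447034836, "text": "sample text #1"},
  {"metadata": {"link": "http://sample_link_2", "title": "title2"},
   "score": 0.20000001788139343, "text": "sample text #2"}
]
""")


def build_context(entities, max_distance=None):
    """Join entity texts into one string, closest match first.

    `max_distance` is an optional cutoff: entities whose distance
    exceeds it are dropped.
    """
    ranked = sorted(entities, key=lambda e: e["score"])
    if max_distance is not None:
        ranked = [e for e in ranked if e["score"] <= max_distance]
    lines = []
    for entity in ranked:
        # metadata may be missing or null, so fall back to an empty dict.
        title = (entity.get("metadata") or {}).get("title", "untitled")
        lines.append(f"[{title}] {entity['text']}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_context(sample_results, max_distance=0.1))
```

A suitable cutoff depends on the embedding model and on the distance metric of the index, so any threshold such as the `0.1` used here would need to be tuned on real data.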