Vector DB Lookup#

Vector DB Lookup is a vector search tool that allows users to search top k similar vectors from vector database. This tool is a wrapper for multiple third-party vector databases. The list of current supported databases is as follows.

Name	Description
Azure Cognitive Search	Microsoft’s cloud search service with built-in AI capabilities that enrich all types of information to help identify and explore relevant content at scale.
Qdrant	Qdrant is a vector similarity search engine that provides a production-ready service with a convenient API to store, search and manage points (i.e. vectors) with an additional payload.
Weaviate	Weaviate is an open source vector database that stores both objects and vectors. This allows for combining vector search with structured filtering.

This tool will support more vector databases.

Requirements#

For AzureML users, the tool is installed in default image, you can use the tool without extra installation.
For local users,

pip install promptflow-vectordb

Prerequisites#

The tool searches data from a third-party vector database. To use it, you should create resources in advance and establish connection between the tool and the resource.

Azure Cognitive Search:
- Create resource Azure Cognitive Search.
- Add “Cognitive search” connection. Fill “API key” field with “Primary admin key” from “Keys” section of created resource, and fill “API base” field with the URL, the URL format is https://{your_serive_name}.search.windows.net.
Qdrant:
- Follow the installation to deploy Qdrant to a self-maintained cloud server.
- Add “Qdrant” connection. Fill “API base” with your self-maintained cloud server address and fill “API key” field.
Weaviate:
- Follow the installation to deploy Weaviate to a self-maintained instance.
- Add “Weaviate” connection. Fill “API base” with your self-maintained instance address and fill “API key” field.

Inputs#

The tool accepts the following inputs:

Azure Cognitive Search:

Name	Type	Description	Required
connection	CognitiveSearchConnection	The created connection for accessing to Cognitive Search endpoint.	Yes
index_name	string	The index name created in Cognitive Search resource.	Yes
text_field	string	The text field name. The returned text field will populate the text of output.	No
vector_field	string	The vector field name. The target vector is searched in this vector field.	Yes
search_params	dict	The search parameters. It’s key-value pairs. Except for parameters in the tool input list mentioned above, additional search parameters can be formed into a JSON object as search_params. For example, use `{"select": ""}` as search_params to select the returned fields, use `{"search": ""}` to perform a hybrid search.	No
search_filters	dict	The search filters. It’s key-value pairs, the input format is like `{"filter": ""}`	No
vector	list	The target vector to be queried, which can be generated by Embedding tool.	Yes
top_k	int	The count of top-scored entities to return. Default value is 3	No

Qdrant:

Name	Type	Description	Required
connection	QdrantConnection	The created connection for accessing to Qdrant server.	Yes
collection_name	string	The collection name created in self-maintained cloud server.	Yes
text_field	string	The text field name. The returned text field will populate the text of output.	No
search_params	dict	The search parameters can be formed into a JSON object as search_params. For example, use `{"params": {"hnsw_ef": 0, "exact": false, "quantization": null}}` to set search_params.	No
search_filters	dict	The search filters. It’s key-value pairs, the input format is like `{"filter": {"should": [{"key": "", "match": {"value": ""}}]}}`	No
vector	list	The target vector to be queried, which can be generated by Embedding tool.	Yes
top_k	int	The count of top-scored entities to return. Default value is 3	No

Weaviate:

Name	Type	Description	Required
connection	WeaviateConnection	The created connection for accessing to Weaviate.	Yes
class_name	string	The class name.	Yes
text_field	string	The text field name. The returned text field will populate the text of output.	No
vector	list	The target vector to be queried, which can be generated by Embedding tool.	Yes
top_k	int	The count of top-scored entities to return. Default value is 3	No

Outputs#

The following is an example JSON format response returned by the tool, which includes the top-k scored entities. The entity follows a generic schema of vector search result provided by promptflow-vectordb SDK.

Azure Cognitive Search:

For Azure Cognitive Search, the following fields are populated:

Field Name	Type	Description
original_entity	dict	the original response json from search REST API
score	float	@search.score from the original entity, which evaluates the similarity between the entity and the query vector
text	string	text of the entity
vector	list	vector of the entity

Output

[
  {
    "metadata": null,
    "original_entity": {
      "@search.score": 0.5099789,
      "id": "",
      "your_text_filed_name": "sample text1",
      "your_vector_filed_name": [-0.40517663431890405, 0.5856996257406859, -0.1593078462266455, -0.9776269170785785, -0.6145604369828972],
      "your_additional_field_name": ""
    },
    "score": 0.5099789,
    "text": "sample text1",
    "vector": [-0.40517663431890405, 0.5856996257406859, -0.1593078462266455, -0.9776269170785785, -0.6145604369828972]
  }
]

Qdrant:

For Qdrant, the following fields are populated:

Field Name	Type	Description
original_entity	dict	the original response json from search REST API
metadata	dict	payload from the original entity
score	float	score from the original entity, which evaluates the similarity between the entity and the query vector
text	string	text of the payload
vector	list	vector of the entity

Output

[
  {
    "metadata": {
      "text": "sample text1"
    },
    "original_entity": {
      "id": 1,
      "payload": {
        "text": "sample text1"
      },
      "score": 1,
      "vector": [0.18257418, 0.36514837, 0.5477226, 0.73029673],
      "version": 0
    },
    "score": 1,
    "text": "sample text1",
    "vector": [0.18257418, 0.36514837, 0.5477226, 0.73029673]
  }
]

Weaviate:

For Weaviate, the following fields are populated:

Field Name	Type	Description
original_entity	dict	the original response json from search REST API
score	float	certainty from the original entity, which evaluates the similarity between the entity and the query vector
text	string	text in the original entity
vector	list	vector of the entity

Output

[
  {
    "metadata": null,
    "original_entity": {
      "_additional": {
        "certainty": 1,
        "distance": 0,
        "vector": [
          0.58,
          0.59,
          0.6,
          0.61,
          0.62
        ]
      },
      "text": "sample text1."
    },
    "score": 1,
    "text": "sample text1.",
    "vector": [
      0.58,
      0.59,
      0.6,
      0.61,
      0.62
    ]
  }
]