Vector DB Lookup#
Vector DB Lookup is a vector search tool that lets users retrieve the top-k most similar vectors from a vector database. The tool is a wrapper for multiple third-party vector databases. The currently supported databases are listed below.
| Name | Description |
|---|---|
| Azure Cognitive Search | Microsoft’s cloud search service with built-in AI capabilities that enrich all types of information to help identify and explore relevant content at scale. |
| Qdrant | Qdrant is a vector similarity search engine that provides a production-ready service with a convenient API to store, search and manage points (i.e. vectors) with an additional payload. |
| Weaviate | Weaviate is an open source vector database that stores both objects and vectors. This allows for combining vector search with structured filtering. |
This tool will support more vector databases.
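To make "top-k similar vectors" concrete, the following is a minimal brute-force sketch using cosine similarity in plain Python. It is only a conceptual illustration: the function names and sample data are hypothetical, and the supported databases typically rely on index structures rather than a full scan like this one.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(query, stored, k=3):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in stored]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical stored entities as (text, vector) pairs.
stored = [
    ("sample text1", [1.0, 0.0]),
    ("sample text2", [0.0, 1.0]),
    ("sample text3", [1.0, 1.0]),
]
results = top_k_similar([1.0, 0.0], stored, k=2)
for score, text in results:
    print(f"{score:.4f}  {text}")
```

The tool plays the role of `top_k_similar` here, with the stored vectors living in the external database instead of a local list.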
Requirements#
For AzureML users, the tool is installed in the default image, so you can use it without extra installation.
For local users, install the package:
pip install promptflow-vectordb
Prerequisites#
The tool searches data in a third-party vector database. To use it, create the required resources in advance and establish a connection between the tool and the resource.
Azure Cognitive Search:
Create an Azure Cognitive Search resource.
Add a “Cognitive search” connection. Fill the “API key” field with the “Primary admin key” from the “Keys” section of the created resource, and fill the “API base” field with the service URL, which has the format https://{your_service_name}.search.windows.net.
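As a small illustration of the URL format above, the sketch below builds the “API base” value from a service name. The helper name and the example service name are hypothetical and not part of the tool.

```python
def cognitive_search_api_base(service_name):
    """Build the "API base" URL for a given Cognitive Search service name."""
    name = service_name.strip()
    if not name:
        raise ValueError("service_name must not be empty")
    return f"https://{name}.search.windows.net"


# "my-search-service" is a placeholder; use the name of your own resource.
api_base = cognitive_search_api_base("my-search-service")
print(api_base)
```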
Qdrant:
Follow the Qdrant installation guide to deploy Qdrant to a self-maintained cloud server.
Add a “Qdrant” connection. Fill the “API base” field with the address of your self-maintained cloud server, and fill in the “API key” field.
Weaviate:
Follow the Weaviate installation guide to deploy Weaviate to a self-maintained instance.
Add a “Weaviate” connection. Fill the “API base” field with the address of your self-maintained instance, and fill in the “API key” field.
Inputs#
The tool accepts the following inputs:
Azure Cognitive Search:
| Name | Type | Description | Required |
|---|---|---|---|
| connection | CognitiveSearchConnection | The created connection for accessing the Cognitive Search endpoint. | Yes |
| index_name | string | The name of the index created in the Cognitive Search resource. | Yes |
| text_field | string | The text field name. The returned text field populates the text of the output. | No |
| vector_field | string | The vector field name. The target vector is searched in this vector field. | Yes |
| search_params | dict | The search parameters, as key-value pairs. Search parameters other than the tool inputs listed in this table can be passed as a JSON object in search_params. For example, use `{"select": ""}` to select the returned fields, or `{"search": ""}` to perform a hybrid search. | No |
| search_filters | dict | The search filters, as key-value pairs. The input format is `{"filter": ""}`. | No |
| vector | list | The target vector to be queried, which can be generated by the Embedding tool. | Yes |
| top_k | int | The number of top-scored entities to return. The default value is 3. | No |
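The following sketch shows how the Azure Cognitive Search inputs might be assembled in Python. The index name, field names, and the `category` filter field are hypothetical placeholders; the filter string assumes the OData filter syntax used by Azure Cognitive Search. The connection input is omitted because it is created in advance.

```python
import json

# Hypothetical index and field names; replace them with your own.
azure_inputs = {
    "index_name": "my-index",
    "text_field": "content",
    "vector_field": "content_vector",
    # Extra search parameters: restrict the returned fields and add a text
    # query so that the search becomes a hybrid search.
    "search_params": {"select": "id,content", "search": "sample text"},
    # Filter in the {"filter": "..."} shape; "category" is a made-up field.
    "search_filters": {"filter": "category eq 'docs'"},
    "vector": [0.1, 0.2, 0.3],  # normally produced by the Embedding tool
    "top_k": 3,
}

# The dict-typed inputs are plain JSON objects.
search_params_json = json.dumps(azure_inputs["search_params"])
search_filters_json = json.dumps(azure_inputs["search_filters"])
print(search_params_json)
print(search_filters_json)
```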
Qdrant:
| Name | Type | Description | Required |
|---|---|---|---|
| connection | QdrantConnection | The created connection for accessing the Qdrant server. | Yes |
| collection_name | string | The name of the collection created in the self-maintained cloud server. | Yes |
| text_field | string | The text field name. The returned text field populates the text of the output. | No |
| search_params | dict | The search parameters, passed as a JSON object. For example, use `{"params": {"hnsw_ef": 0, "exact": false, "quantization": null}}` to set search_params. | No |
| search_filters | dict | The search filters, as key-value pairs. The input format is `{"filter": {"should": [{"key": "", "match": {"value": ""}}]}}`. | No |
| vector | list | The target vector to be queried, which can be generated by the Embedding tool. | Yes |
| top_k | int | The number of top-scored entities to return. The default value is 3. | No |
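As a sketch of the Qdrant input shapes described above, the helper below builds a `should` filter from key/value pairs and serializes the search parameters to JSON. The helper name `match_any`, the `city` payload key, and the chosen values are hypothetical.

```python
import json


def match_any(conditions):
    """Build search_filters that match any of the given (key, value) pairs."""
    return {
        "filter": {
            "should": [
                {"key": key, "match": {"value": value}}
                for key, value in conditions
            ]
        }
    }


# "city" is a made-up payload key used only for illustration.
search_filters = match_any([("city", "London"), ("city", "Berlin")])

# Python False serializes to JSON false, matching the documented format.
search_params = {"params": {"hnsw_ef": 128, "exact": False}}

print(json.dumps(search_filters))
print(json.dumps(search_params))
```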
Weaviate:
| Name | Type | Description | Required |
|---|---|---|---|
| connection | WeaviateConnection | The created connection for accessing Weaviate. | Yes |
| class_name | string | The class name. | Yes |
| text_field | string | The text field name. The returned text field populates the text of the output. | No |
| vector | list | The target vector to be queried, which can be generated by the Embedding tool. | Yes |
| top_k | int | The number of top-scored entities to return. The default value is 3. | No |
Outputs#
The following are example JSON responses returned by the tool, each containing the top-k scored entities. Each entity follows the generic vector search result schema provided by the promptflow-vectordb SDK.
Azure Cognitive Search:
For Azure Cognitive Search, the following fields are populated:
| Field Name | Type | Description |
|---|---|---|
| original_entity | dict | The original JSON response from the search REST API. |
| score | float | @search.score from the original entity, which evaluates the similarity between the entity and the query vector. |
| text | string | Text of the entity. |
| vector | list | Vector of the entity. |
Output
[ { "metadata": null, "original_entity": { "@search.score": 0.5099789, "id": "", "your_text_field_name": "sample text1", "your_vector_field_name": [-0.40517663431890405, 0.5856996257406859, -0.1593078462266455, -0.9776269170785785, -0.6145604369828972], "your_additional_field_name": "" }, "score": 0.5099789, "text": "sample text1", "vector": [-0.40517663431890405, 0.5856996257406859, -0.1593078462266455, -0.9776269170785785, -0.6145604369828972] } ]
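A downstream step could read this output roughly as follows. The JSON string is an abbreviated copy of the sample above (the vector is shortened and some fields are dropped), and the variable names are illustrative only.

```python
import json

# Abbreviated copy of the sample output; vector values are shortened.
raw_output = """
[
  {
    "metadata": null,
    "original_entity": {"@search.score": 0.5099789, "id": ""},
    "score": 0.5099789,
    "text": "sample text1",
    "vector": [-0.405, 0.586]
  }
]
"""

entities = json.loads(raw_output)

# The top-level score mirrors @search.score from the original entity.
scores_match = all(
    entity["score"] == entity["original_entity"]["@search.score"]
    for entity in entities
)
texts = [entity["text"] for entity in entities]
print(texts, scores_match)
```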
Qdrant:
For Qdrant, the following fields are populated:
| Field Name | Type | Description |
|---|---|---|
| original_entity | dict | The original JSON response from the search REST API. |
| metadata | dict | Payload from the original entity. |
| score | float | Score from the original entity, which evaluates the similarity between the entity and the query vector. |
| text | string | Text of the payload. |
| vector | list | Vector of the entity. |
Output
[ { "metadata": { "text": "sample text1" }, "original_entity": { "id": 1, "payload": { "text": "sample text1" }, "score": 1, "vector": [0.18257418, 0.36514837, 0.5477226, 0.73029673], "version": 0 }, "score": 1, "text": "sample text1", "vector": [0.18257418, 0.36514837, 0.5477226, 0.73029673] } ]
Weaviate:
For Weaviate, the following fields are populated:
| Field Name | Type | Description |
|---|---|---|
| original_entity | dict | The original JSON response from the search REST API. |
| score | float | Certainty from the original entity, which evaluates the similarity between the entity and the query vector. |
| text | string | Text in the original entity. |
| vector | list | Vector of the entity. |
Output
[ { "metadata": null, "original_entity": { "_additional": { "certainty": 1, "distance": 0, "vector": [ 0.58, 0.59, 0.6, 0.61, 0.62 ] }, "text": "sample text1." }, "score": 1, "text": "sample text1.", "vector": [ 0.58, 0.59, 0.6, 0.61, 0.62 ] } ]
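Because all three databases return entities in the same generic schema, one consumer can handle any of them. The sketch below joins the text of entities above a score threshold, for example to build context for a prompt. The function name `build_context`, the threshold, and the second sample entity are hypothetical.

```python
def build_context(entities, min_score=0.0):
    """Join the text of entities whose score is at least min_score."""
    kept = [e for e in entities if e["score"] >= min_score and e.get("text")]
    # Highest-scored entities first.
    kept.sort(key=lambda e: e["score"], reverse=True)
    return "\n".join(e["text"] for e in kept)


# Entities in the generic schema; the second one is made up for illustration.
entities = [
    {"metadata": None, "original_entity": {}, "score": 0.4,
     "text": "sample text2.", "vector": [0.1, 0.2]},
    {"metadata": None, "original_entity": {}, "score": 1,
     "text": "sample text1.", "vector": [0.58, 0.59]},
]

context = build_context(entities, min_score=0.5)
print(context)
```

Only the `score` and `text` fields are used here, so the same function applies to the Azure Cognitive Search, Qdrant, and Weaviate outputs alike.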