Task 03 - Implement vector search of audio transcriptions and summaries (30 minutes)

Introduction

After seeing you generate a call transcript and summary in the prior tasks of this exercise, the Contoso Suites development team is excited at the prospect of saving these transcripts and then searching through them. They would like to use Cosmos DB to store each transcript and then perform semantic search, that is, searching by concepts or meanings rather than specific words. The primary benefit of semantic search is that requesters can retrieve documents based on the meaning of their query rather than on exact term matches. A search can therefore return documents that the requester intended to find even when those documents contain none of the precise words in the request. Further, we will search on the abstractive summary of the transcript, which condenses the transcript to approximately two sentences. Searching the summaries rather than the full texts should reduce the number of false positive results when using a vector comparison technique such as cosine similarity.
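To make the idea of comparing vectors by cosine similarity concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are toy values chosen for illustration, not real embeddings, and the function name is only an assumption for this example.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors for illustration only; the embeddings in this task have 1536 dimensions.
query = [1.0, 0.0, 1.0]
similar_summary = [2.0, 0.0, 2.0]    # same direction, different magnitude
unrelated_summary = [0.0, 3.0, 0.0]  # orthogonal to the query

print(cosine_similarity(query, similar_summary))    # close to 1: same direction
print(cosine_similarity(query, unrelated_summary))  # 0: nothing in common
```

Because the score depends only on direction and not on magnitude, two texts of very different lengths can still score as highly similar if their embeddings point the same way.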

Description

In this task, you will implement functionality to save the text of a call transcription, as well as its embedding (via Azure OpenAI), in a Cosmos DB container. You will then make use of Cosmos DB’s vector similarity capability to perform semantic search in the existing Streamlit application.

Success Criteria

You are saving transcripts and abstractive summaries in Cosmos DB, as well as an embedding of the abstractive summary.
You are able to perform a search on the Call Center Search page, returning call transcripts based on a text query.

Key Tasks

01: Create call transcripts container

Create a new Cosmos DB container in the ContosoSuites database. Name this container CallTranscripts. It should have a partition key of /call_id and the same vector embedding policy that you used in Exercise 03: a container vector policy using the cosine distance function, based on a float32 data type, and 1536 dimensions, using an index type of diskANN. Designate the field for storing vector data as request_vector.

Container vector policies and vector indexing policies must be defined at the time of container creation. In order to create a container, perform the following steps:

In the Azure portal, navigate to your Cosmos DB resource.
Select Data Explorer in the left-hand menu.
On the Data Explorer page, select New Container.
In the New Container dialog:
1. Select Use existing under Database id and select the ContosoSuites database from the dropdown list.
2. Enter CallTranscripts into the Container id box.
3. Enter /call_id into the Partition key box.
4. Expand the Container Vector Policy section of the dialog, select Add vector embedding, and then enter the following values into the specified fields:
  - Path: Enter “/request_vector”.
  - Data type: Select float32.
  - Distance function: Select cosine.
  - Dimensions: Enter 1536. This is the number of dimensions generated by the text-embedding-ada-002 model in Azure OpenAI.
  - Index type: Select diskANN. Given the number of dimensions being specified, 1536, the flat index type will not work, as it only supports a maximum of 505 dimensions for vectors. The quantizedFlat index could also be used here. diskANN is a more efficient index type, but given the amount of data we are working with in this lab, you likely will not notice any difference in performance.
5. Select OK to create the container.
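The values entered in the dialog above correspond to two JSON-style policies: a vector embedding policy and an indexing policy with a vector index. The following is a minimal sketch, using plain dictionaries, of what those policies might look like if you wanted to script them. The helper name build_vector_policies is hypothetical, and the excluded paths are an assumption rather than something the portal steps specify. Recent versions of the azure-cosmos Python SDK are understood to accept such dictionaries through vector_embedding_policy and indexing_policy arguments when creating a container, but check that against your SDK version.

```python
FLAT_INDEX_MAX_DIMENSIONS = 505  # limit for the flat index type, per the steps above


def build_vector_policies(path, dimensions, index_type="diskANN",
                          data_type="float32", distance_function="cosine"):
    """Return (vector_embedding_policy, indexing_policy) as plain dicts.

    Hypothetical helper for illustration; it only builds data and does not
    contact Cosmos DB.
    """
    if index_type == "flat" and dimensions > FLAT_INDEX_MAX_DIMENSIONS:
        raise ValueError(
            f"flat index supports at most {FLAT_INDEX_MAX_DIMENSIONS} dimensions"
        )

    vector_embedding_policy = {
        "vectorEmbeddings": [
            {
                "path": path,
                "dataType": data_type,
                "distanceFunction": distance_function,
                "dimensions": dimensions,
            }
        ]
    }
    indexing_policy = {
        "includedPaths": [{"path": "/*"}],
        # Assumption: exclude the vector path from the regular index so that
        # only the vector index covers it.
        "excludedPaths": [{"path": '/"_etag"/?'}, {"path": f"{path}/*"}],
        "vectorIndexes": [{"path": path, "type": index_type}],
    }
    return vector_embedding_policy, indexing_policy


embedding_policy, indexing_policy = build_vector_policies("/request_vector", 1536)
print(embedding_policy)
print(indexing_policy["vectorIndexes"])
```

Requesting index_type="flat" with 1536 dimensions raises a ValueError in this sketch, mirroring the limit described in the Index type step.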

02: Make embedding request

Open the 4_Call_Center.py file in the ContosoSuitesDashboard folder. Complete the function make_azure_openai_embedding_request(), accepting a string of text and creating an embedding based on it. Be sure to use the embedding_deployment_name secret, as the GPT-4o model deployment will not be appropriate for generating embeddings.

The completed version of the make_azure_openai_embedding_request() function is as follows:

def make_azure_openai_embedding_request(text):
    """Create and return a new embedding request. Key assumptions:
    - Azure OpenAI endpoint and embedding deployment name stored in Streamlit secrets.
    - Authentication uses a token from DefaultAzureCredential rather than an API key."""

    token_provider = get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )
    aoai_endpoint = st.secrets["aoai"]["endpoint"]
    aoai_embedding_deployment_name = st.secrets["aoai"]["embedding_deployment_name"]

    client = openai.AzureOpenAI(
        azure_ad_token_provider=token_provider,
        api_version="2024-06-01",
        azure_endpoint=aoai_endpoint
    )
    # Create and return a new embedding request
    return client.embeddings.create(
        model=aoai_embedding_deployment_name,
        input=text
    )

If your editor reports errors on DefaultAzureCredential or get_bearer_token_provider, you may need to add the following import at the top of the file:

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
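The solutions in this task read several values from Streamlit secrets, and a missing entry typically surfaces only as a KeyError at request time. As a hedged sketch, the following checks a nested mapping for the secret names that the solutions use. The function name find_missing_secrets and the placeholder values are hypothetical, and the example uses a plain dictionary on the assumption that st.secrets behaves like a nested mapping.

```python
# Secret names taken from the solution code in this task.
REQUIRED_SECRETS = {
    "aoai": ["endpoint", "embedding_deployment_name"],
    "cosmos": ["client_id", "endpoint", "database_name"],
}


def find_missing_secrets(secrets, required=REQUIRED_SECRETS):
    """Return a list of 'section.key' names that are absent or empty."""
    missing = []
    for section, keys in required.items():
        section_values = secrets.get(section, {})
        for key in keys:
            if not section_values.get(key):
                missing.append(f"{section}.{key}")
    return missing


# Placeholder values for illustration only; these are not real resources.
example_secrets = {
    "aoai": {
        "endpoint": "https://example.openai.azure.com/",
        "embedding_deployment_name": "example-embedding-deployment",
    },
    "cosmos": {
        "endpoint": "https://example.documents.azure.com:443/",
        "database_name": "ContosoSuites",
        # client_id intentionally left out to show a reported gap
    },
}

print(find_missing_secrets(example_secrets))
```

In the Streamlit pages, a check like this could run once at startup and show a clear message instead of failing partway through a request.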

03: Generate embeddings

Complete the function generate_embeddings_for_call_contents(). This function should normalize the incoming text before calling make_azure_openai_embedding_request(). Then, return the embedding itself, not the entire response object.

The completed version of the generate_embeddings_for_call_contents() function is as follows:

def generate_embeddings_for_call_contents(call_contents):
    """Generate embeddings for call contents. Key assumptions:
    - Call contents is a single string.
    - Azure OpenAI endpoint and embedding deployment name stored in Streamlit secrets."""

    # Normalize the text for tokenization
    normalized_content = normalize_text(call_contents)

    # Call make_azure_openai_embedding_request() with the normalized content
    response = make_azure_openai_embedding_request(normalized_content)

    return response.data[0].embedding
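The solution relies on a normalize_text() helper that already exists in the file and then reads the first embedding out of the response. The sketch below shows one plausible shape for both steps. This normalize_text() is a hypothetical stand-in, not the lab's actual helper, and the response is faked with SimpleNamespace so that the example runs without calling Azure OpenAI.

```python
import re
from types import SimpleNamespace


def normalize_text(text):
    """Hypothetical stand-in for the lab's normalize_text() helper."""
    text = re.sub(r"\s+", " ", text)            # collapse newlines, tabs, repeated spaces
    text = re.sub(r"\s+([.,!?])", r"\1", text)  # remove whitespace before punctuation
    return text.strip()


def extract_embedding(response):
    """Return the embedding vector for the first (and only) input string."""
    return response.data[0].embedding


# Fake response mimicking the attributes used above: a data list whose
# entries each carry an embedding attribute.
fake_response = SimpleNamespace(
    data=[SimpleNamespace(index=0, embedding=[0.1, 0.2, 0.3])]
)

print(normalize_text("  Hello ,\n world !  "))
print(extract_embedding(fake_response))
```

Returning only the list of floats keeps the value directly usable as the request_vector field and as the query parameter for vector search.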

04: Save transcript

Complete the function save_transcript_to_cosmos_db(). This function should create a new record in the CallTranscripts container.

The completed version of the save_transcript_to_cosmos_db() function is as follows:

def save_transcript_to_cosmos_db(transcript_item):
    """Save embeddings to Cosmos DB vector store. Key assumptions:
    - transcript_item is a JSON object containing call_id (int), 
        call_transcript (string), and request_vector (list).
    - Cosmos DB endpoint, client_id, and database name stored in Streamlit secrets."""

    cosmos_client_id = st.secrets["cosmos"]["client_id"]
    cosmos_credentials = DefaultAzureCredential(managed_identity_client_id=cosmos_client_id)

    cosmos_endpoint = st.secrets["cosmos"]["endpoint"]
    cosmos_database_name = st.secrets["cosmos"]["database_name"]
    cosmos_container_name = "CallTranscripts"

    # Create a CosmosClient
    client = CosmosClient(url=cosmos_endpoint, credential=cosmos_credentials)
    # Load the Cosmos database and container
    database = client.get_database_client(cosmos_database_name)
    container = database.get_container_client(cosmos_container_name)

    # Insert the call transcript
    container.create_item(body=transcript_item)
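The function above expects a ready-made transcript_item. As a minimal sketch of how such an item might be assembled, the helper below gathers the fields named in this task and checks the vector length against the container's policy. The name build_transcript_item is hypothetical, and generating the id with uuid4 is an assumption; the lab's page may build the item differently.

```python
import uuid

EXPECTED_DIMENSIONS = 1536  # must match the container vector policy


def build_transcript_item(call_id, call_transcript, abstractive_summary,
                          request_vector):
    """Assemble a document for the CallTranscripts container (illustrative)."""
    if len(request_vector) != EXPECTED_DIMENSIONS:
        raise ValueError(
            f"expected {EXPECTED_DIMENSIONS} dimensions, got {len(request_vector)}"
        )
    return {
        "id": str(uuid.uuid4()),  # assumption: a random unique string id
        "call_id": call_id,       # partition key value (/call_id)
        "call_transcript": call_transcript,
        "abstractive_summary": abstractive_summary,
        "request_vector": list(request_vector),
    }


# Placeholder content for illustration; a real vector comes from the
# embedding of the abstractive summary.
item = build_transcript_item(
    call_id=1,
    call_transcript="Full text of the call...",
    abstractive_summary="Short summary of the call.",
    request_vector=[0.0] * EXPECTED_DIMENSIONS,
)
print(sorted(item.keys()))
```

Validating the vector length before saving catches a mismatched embedding model early, rather than at insert or query time.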

05: Update call center search page

Open the 5_Call_Center_search.py file. Replace the make_azure_openai_embedding_request() stub function with the function you created in 4_Call_Center.py. The function is exactly the same in both files.

The completed version of the make_azure_openai_embedding_request() function is as follows:

def make_azure_openai_embedding_request(text):
    """Create and return a new embedding request. Key assumptions:
    - Azure OpenAI endpoint and embedding deployment name stored in Streamlit secrets.
    - Authentication uses a token from DefaultAzureCredential rather than an API key."""

    token_provider = get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )
    aoai_endpoint = st.secrets["aoai"]["endpoint"]
    aoai_embedding_deployment_name = st.secrets["aoai"]["embedding_deployment_name"]

    client = openai.AzureOpenAI(
        azure_ad_token_provider=token_provider,
        api_version="2024-06-01",
        azure_endpoint=aoai_endpoint
    )
    # Create and return a new embedding request
    return client.embeddings.create(
        model=aoai_embedding_deployment_name,
        input=text
    )

06: Make Cosmos DB vector search request

Complete the make_cosmos_db_vector_search_request() function. It should perform a VectorDistance() query and return up to max_results results with a minimum similarity score of minimum_similarity_score. The columns it should return include id, call_id, call_transcript, abstractive_summary, and the results of VectorDistance() as SimilarityScore.

The completed version of the make_cosmos_db_vector_search_request() function is as follows:

def make_cosmos_db_vector_search_request(query_embedding, max_results=5, minimum_similarity_score=0.5):
    """Create and return a new vector search request. Key assumptions:
    - Query embedding is a list of floats based on a search string.
    - Cosmos DB endpoint, client_id, and database name stored in Streamlit secrets."""

    cosmos_client_id = st.secrets["cosmos"]["client_id"]
    cosmos_credentials = DefaultAzureCredential(managed_identity_client_id=cosmos_client_id)

    cosmos_endpoint = st.secrets["cosmos"]["endpoint"]
    cosmos_database_name = st.secrets["cosmos"]["database_name"]
    cosmos_container_name = "CallTranscripts"

    # Create a CosmosClient
    client = CosmosClient(url=cosmos_endpoint, credential=cosmos_credentials)
    # Load the Cosmos database and container
    database = client.get_database_client(cosmos_database_name)
    container = database.get_container_client(cosmos_container_name)

    results = container.query_items(
        query=f"""
            SELECT TOP {max_results}
                c.id,
                c.call_id,
                c.call_transcript,
                c.abstractive_summary,
                VectorDistance(c.request_vector, @request_vector) AS SimilarityScore
            FROM c
            WHERE
                VectorDistance(c.request_vector, @request_vector) > {minimum_similarity_score}
            ORDER BY
                VectorDistance(c.request_vector, @request_vector)
            """,
        parameters=[
            {"name": "@request_vector", "value": query_embedding}
        ],
        enable_cross_partition_query=True
    )

    # Return the query results (an iterable of matching items)
    return results
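To see what the query is expected to do without connecting to Cosmos DB, the following sketch reproduces its logic over an in-memory list: score every item against the query, keep those above the minimum, order from most to least similar, and take the top results. The names local_vector_search and sample_items are hypothetical, the two-dimensional vectors are toy values, and the ordering assumes that Cosmos DB returns the most similar items first when ordering by VectorDistance() with the cosine distance function.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def local_vector_search(items, query_embedding, max_results=5,
                        minimum_similarity_score=0.5):
    """In-memory imitation of the vector search query (illustrative only)."""
    scored = []
    for item in items:
        score = cosine_similarity(item["request_vector"], query_embedding)
        if score > minimum_similarity_score:  # mirrors the WHERE clause
            scored.append({
                "id": item["id"],
                "call_id": item["call_id"],
                "call_transcript": item["call_transcript"],
                "abstractive_summary": item["abstractive_summary"],
                "SimilarityScore": score,
            })
    # Most similar first (assumed ORDER BY behavior), then TOP max_results.
    scored.sort(key=lambda row: row["SimilarityScore"], reverse=True)
    return scored[:max_results]


# Toy data for illustration only.
sample_items = [
    {"id": "a", "call_id": 1, "call_transcript": "...",
     "abstractive_summary": "Exact match", "request_vector": [1.0, 0.0]},
    {"id": "b", "call_id": 2, "call_transcript": "...",
     "abstractive_summary": "Partial match", "request_vector": [1.0, 1.0]},
    {"id": "c", "call_id": 3, "call_transcript": "...",
     "abstractive_summary": "Unrelated", "request_vector": [0.0, 1.0]},
]

for row in local_vector_search(sample_items, [1.0, 0.0]):
    print(row["id"], round(row["SimilarityScore"], 3))
```

Raising minimum_similarity_score in this sketch removes the weaker matches first, which is the same effect you should observe when moving the Minimum Similarity Score slider on the search page.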

07: Test and deploy

After filling in these code segments, re-run the application and navigate to the Call Center page. Ensure that you can still generate a transcript of the sample call audio. Then, run the function to save your transcript into Cosmos DB. After that, switch to the Call Center Search page and use the term “Airport Gateway hotel” for your search. Ensure that you get one result back. Then, increase the Minimum Similarity Score slider to a point higher than your similarity score and ensure you get zero records back.

Then, deploy the application and ensure that the functionality behaves as expected when running as an App Service.