
Task 02 - Query data from Azure AI Search (20 minutes)

Introduction

The ChatGPT series of models available in Azure OpenAI has been trained on a wide variety of online datasets, but these datasets tend to be publicly available. For private and proprietary datasets, we cannot expect the model to already have the information, so we need a different approach. In this second task of the exercise, we will push data into an Azure storage account and convert it into an Azure AI Search index.

Before our GPT-4 model can reference our proprietary datasets, we first need to vectorize the data; that is, convert data inputs into collections of floating-point numbers. We can store these vectorized results (also known as embeddings) in a variety of data stores. In this exercise, we will use Azure AI Search. Azure AI Search allows us to store not only the embeddings but also the raw text data. This will allow us to perform hybrid lookup operations, meaning keyword search against the raw text combined with vector search against the embeddings.
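
To make the idea of hybrid lookup concrete, here is a toy sketch in plain Python. It is not how Azure AI Search is implemented: the bag-of-words embed() function stands in for a real embedding model, and the documents, vocabulary, and function names are made up for illustration. It ranks documents once by keyword overlap and once by cosine similarity, then merges the two rankings with reciprocal rank fusion, which is the general technique Azure AI Search documents for combining hybrid results.

```python
import math
from collections import Counter

# Hypothetical mini-corpus; real documents would come from the uploaded files.
DOCS = {
    "faq-ev": "guests may use the ev charging station for a fee",
    "faq-pool": "the pool is open to guests from dawn until dusk",
    "faq-pets": "pets are welcome for a nightly fee",
}


def tokenize(text):
    return text.lower().split()


VOCAB = sorted({word for text in DOCS.values() for word in tokenize(text)})


def embed(text):
    """Toy embedding: word counts over VOCAB (a stand-in for a real model)."""
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_rank(query):
    """Rank documents by how many query words appear in the raw text."""
    words = set(tokenize(query))
    scores = {d: len(words & set(tokenize(t))) for d, t in DOCS.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


def vector_rank(query):
    """Rank documents by cosine similarity between toy embeddings."""
    q = embed(query)
    scores = {d: cosine(q, embed(t)) for d, t in DOCS.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_rank(query, k=60):
    """Merge both rankings with reciprocal rank fusion: sum of 1 / (k + rank)."""
    fused = Counter()
    for ranking in (keyword_rank(query), vector_rank(query)):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))
```

Calling hybrid_rank("ev station fee") places the EV document first because it leads both the keyword and the vector ranking.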

Description

In this task, you will load data that Contoso Suites staff has provided to you into Azure Blob Storage. This data contains a summary, in JSON format, of several of their resort properties and the hotels located on those resorts. This is an example of the type of data the company would like to use to enhance chat results, so they would like you to incorporate this data into the Azure OpenAI proof of concept. You can find these files in the src/data folder of the repository.

Success Criteria

  • You have created an Azure Blob Storage container and uploaded the Resorts and Hotels data files.
  • You are able to view the files in Azure Blob Storage.
  • You have executed code to generate a new Azure AI Search index.
  • Your front-end application can handle chat against the contoso-suites-faq index.

Key Tasks

01: Import files into storage account

Import the data from Resorts.txt, Hotels.txt, and FAQ.txt into a container named contoso-suites.

Expand this section to view the solution

Make sure you use the storage account you created in exercise 1, as the storage account must be in the same region as Azure AI Search.

One approach to this is as follows:

  1. Navigate to the storage account in the Azure portal.
  2. Select the Containers option from the Data storage menu.
  3. Create a new container using the + Container option. Name the container contoso-suites.
  4. Inside the “contoso-suites” container, select the Upload option and choose each text file.
  5. The files do not need to be in separate folders in the blob storage container.

Alternatively, you may right-click on each file in Visual Studio Code and select Upload to Azure Storage…, a feature that the Azure Storage extension for Visual Studio Code offers. From there, choose the appropriate storage account and the contoso-suites container. If you have not already created the contoso-suites container, you will need to do so before uploading the first file. Confirm that the files have been uploaded into the storage account before continuing.
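
If you would rather script the upload, a sketch along these lines might work. It assumes a container client object exposing an upload_blob(name=..., data=..., overwrite=...) method, as the ContainerClient class in the azure-storage-blob package does. The upload_files helper and the wiring shown in the trailing comment are illustrative assumptions and are not part of the repository.

```python
from pathlib import Path


def upload_files(container_client, paths):
    """Upload each file to the container root and return the blob names used.

    `container_client` is assumed to behave like
    azure.storage.blob.ContainerClient (see the example wiring below).
    """
    uploaded = []
    for path in map(Path, paths):
        with path.open("rb") as handle:
            # Flat layout: the blob name is just the file name, no folders.
            container_client.upload_blob(
                name=path.name, data=handle, overwrite=True
            )
        uploaded.append(path.name)
    return uploaded


# Example wiring (assumes `pip install azure-storage-blob` and a connection
# string for the storage account from exercise 1; not run here):
#
#   from azure.storage.blob import BlobServiceClient
#   service = BlobServiceClient.from_connection_string(connection_string)
#   client = service.get_container_client("contoso-suites")
#   upload_files(client, ["src/data/Resorts.txt",
#                         "src/data/Hotels.txt",
#                         "src/data/FAQ.txt"])
```

Passing the client in as a parameter keeps the helper easy to exercise with a stand-in object before pointing it at a real storage account.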

02: Build an Azure AI Search index

Create a contoso-suites-faq index in Azure AI Search, built on the three files in the contoso-suites container in your storage account. You should embed with text-embedding-ada-002 and support Hybrid (Vector + Keyword) search.

Expand this section to view the solution
  1. In the Azure portal, navigate to the resource group you have created and select the Search service in the resource group.
  2. Navigate to the Overview menu option in the Search service. Then, select the Import and vectorize data menu option.

  3. Select Azure Blob Storage as the data connection type.
  4. In the Configure your Azure Blob Storage form, select your subscription, the storage account for this training, and the contoso-suites blob container. Select the checkbox to authenticate using managed identity. Choose User-assigned from the drop-down and select the User-assigned managed identity that was created in your resource group. Then select Next.

    Configure your Azure Blob Storage account.

  5. In the Vectorize your text form, ensure that the kind of service is Azure OpenAI and choose the Azure OpenAI service associated with your resource group. After that, pick text-embedding-ada-002 as the model deployment. Select authentication type User assigned identity and choose your user-assigned managed identity. Check the box acknowledging that connecting to an Azure OpenAI service will incur additional costs and then select Next to continue.

    Select your Azure OpenAI service and the text-embedding-ada-002 model deployment.

  6. On the Vectorize and enrich your images page, select Next without checking any boxes.
  7. On the Advanced settings page, select Next without changing any settings.
  8. On the Review and create page, enter contoso-suites-faq as your object names prefix and then select Create.

    Add contoso-suites-faq as the object name prefix and create the index.

  9. You can navigate to the Indexers page in Search management. Within a minute or two, you should see a Success status and three documents succeeded.

    The Indexers menu option shows that index preparation was successful.

  10. Then, navigate to the Indexes menu option. It may take several minutes for the index to populate, but you should eventually see results.

    The Indexes menu option shows a set of documents.

03: Enable chat against FAQ data

Update the front-end code in src/ContosoSuitesDashboard/ to enable chat operations against the FAQ index.

  1. Create a file called secrets.toml in src/ContosoSuitesDashboard/.streamlit/. Pattern it after secrets.template.toml but include your Azure OpenAI, Azure AI Search, Azure AI Services Speech service, Azure AI Services Language service, Web API endpoint, and Cosmos DB details.
  2. The Contoso Suites team has followed instructions online on how to interact with an Azure OpenAI model deployment. They would like you to extend the code in src/ContosoSuitesDashboard/pages/1_Chat_with_Data.py to ensure any queries make use of the Azure AI Search index you have created rather than relying on the inbuilt knowledge of the default GPT-4 model.
  3. You can test your work by running the following queries. The first two queries should respond with relevant information. The final two queries should not give you a valid response because they ask for information outside the scope of our data.
    1. Which hotels on the island of Curacao have EV stations?
    2. Do I need to pay extra money to use an EV station when I am a guest of the hotel?
    3. How tall is the highest mountain peak in the Alps?
    4. What is the current political situation in Thailand?

If you are using GitHub Codespaces, select the Open in Browser button when you receive an informational message that “Your application running on port 8501 is available.”
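
Because a missing entry in secrets.toml tends to surface only as a runtime error inside a page, a quick check such as the sketch below can help. The section and key names are the ones this task refers to; the missing_secrets helper itself is hypothetical, and secrets.template.toml may define additional entries that it does not cover.

```python
# Section -> keys mentioned in this task (the template may contain more).
REQUIRED_SECRETS = {
    "aoai": ["key", "endpoint"],
    "search": ["endpoint", "key", "index_name"],
    "speech": ["key", "region"],
    "language": ["key", "endpoint"],
    "api": ["endpoint"],
    "cosmos": ["endpoint", "client_id"],
}


def missing_secrets(secrets, required=REQUIRED_SECRETS):
    """Return 'section.key' names that are absent or empty in `secrets`.

    `secrets` can be any nested mapping, for example st.secrets in a
    Streamlit page or a plain dict loaded from the TOML file.
    """
    missing = []
    for section, keys in required.items():
        values = secrets[section] if section in secrets else {}
        for key in keys:
            if key not in values or not values[key]:
                missing.append(f"{section}.{key}")
    return missing
```

In a Streamlit page this could be called as missing_secrets(st.secrets), with any returned names shown to the user before the first Azure call is made.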

Expand this section to view the solution
  1. Create a file called secrets.toml in src/ContosoSuitesDashboard/.streamlit/. Copy the contents of secrets.template.toml as a starting point. Then, fill in the details from the Azure services you deployed.
    1. For Azure OpenAI secrets:
      1. In the Azure portal, find the resource group you created.
      2. Navigate to the Azure OpenAI service in your resource group.
      3. In the Resource Management menu, select the Keys and Endpoint entry. Copy the value of KEY 1 and save it as key in the [aoai] section of your secrets file. Copy the value of Endpoint and save it as endpoint.
    2. For Azure AI Search service secrets:
      1. Return to the resource group and then select your Azure AI Search service.
      2. Copy the value of Url from the Essentials panel and save it as endpoint in the [search] section of your secrets file.

        Select the Azure AI Search service URL and save it to the Secrets file.

      3. In the Settings menu, select the Keys entry. Copy the value of Primary admin key and save it as key in the [search] section of your secrets file.
    3. For Azure AI Speech service secrets:
      1. Return to the resource group and select your Speech service.
      2. In the Resource Management menu, select the Keys and Endpoint entry. Copy the value of KEY 1 and save it as key in the [speech] section of your secrets file. Copy the value of Location/Region and save it as region.
    4. For Azure AI Language service secrets:
      1. Return to the resource group and select your Language service.
      2. In the Resource Management menu, select the Keys and Endpoint entry. Copy the value of KEY 1 and save it as key in the [language] section of your secrets file. Copy the value of Endpoint and save it as endpoint.
    5. For API secrets:
      1. For now, set the value of endpoint to http://localhost:5292. In Exercise 2, when you run the Web API code locally, you will see the URL it uses for hosting. If the hosting port differs from 5292, change your secret to match that hosting port.
      2. Return to the resource group and select the App Service named {your_unique_id}-api.
      3. Copy the value of Default domain and save it for later (and include https:// if it is not there when you copy the value). You will need to change the value of [api][endpoint] to this URL when you deploy the Streamlit application to Azure App Services, so you will need this URL in the next task.
    6. For Cosmos DB secrets:
      1. Return to the resource group and select the Azure Cosmos DB account.
      2. In the Settings menu, navigate to the Keys option. Copy the value of URI and save it as endpoint in the [cosmos] section of your secrets file.
      3. Find and copy the ClientId of the User-Assigned Managed Identity in your resource group. Paste it as the client_id value in the [cosmos] section of your secrets file.
  2. Open the file src/ContosoSuitesDashboard/pages/1_Chat_with_Data.py. The code will run as-is, but will not have knowledge of your search index. To support chat with data, make the following changes to the Python script.
    1. Add the search secrets to the create_chat_completion() function, below the Azure OpenAI secrets and above the call to create a client.

        search_endpoint = st.secrets["search"]["endpoint"]
        search_key = st.secrets["search"]["key"]
        search_index_name = st.secrets["search"]["index_name"]
      

      Python is a whitespace-significant language, so you will need to ensure that any code you add is appropriately indented. If you are not familiar with whitespace rules in Python, the Python extension for Visual Studio Code will help track whitespace-related errors.

    2. You may also wish to update the docstring for create_chat_completion() to reference this new assumption.
    3. Change the return statement of create_chat_completion() so that the chat completion request includes an Azure AI Search data source.

        return client.chat.completions.create(
            model=aoai_deployment_name,
            messages=[
                {"role": m["role"], "content": m["content"]}
                for m in messages
            ],
            stream=True,
            extra_body={
                "data_sources": [
                    {
                        "type": "azure_search",
                        "parameters": {
                            "endpoint": search_endpoint,
                            "index_name": search_index_name,
                            "authentication": {
                                "type": "api_key",
                                "key": search_key
                            }
                        }
                    }
                ]
            }
        )
      
  3. In order to test your code, navigate to the src/ContosoSuitesDashboard/ folder in your terminal. Then, run the following command to begin the Streamlit dashboard.

     python -m streamlit run Index.py
    

If you are using GitHub Codespaces, select the Open in Browser button when you receive an informational message that “Your application running on port 8501 is available.”
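
One optional refactoring, shown here only as a sketch, is to build the extra_body payload in a small helper so it can be inspected or unit-tested without calling Azure. The payload mirrors the one passed to client.chat.completions.create() in step 2; the build_search_extra_body name is an assumption for illustration and does not exist in the repository.

```python
def build_search_extra_body(search_endpoint, search_index_name, search_key):
    """Return the extra_body dict that attaches an Azure AI Search data source."""
    return {
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": search_endpoint,
                    "index_name": search_index_name,
                    "authentication": {
                        "type": "api_key",
                        "key": search_key,
                    },
                },
            }
        ]
    }


# Illustrative placeholder values only; real ones come from st.secrets["search"].
body = build_search_extra_body(
    "https://example.search.windows.net", "contoso-suites-faq", "<search-key>"
)
print(body["data_sources"][0]["parameters"]["index_name"])
```

The return statement could then pass extra_body=build_search_extra_body(search_endpoint, search_index_name, search_key) in place of the inline dictionary.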

Navigate to the Chat with Data page and then ask each of the following questions in turn. Your answers should be fairly similar to the summarized answers below.

  1. Which hotels on the island of Curacao have EV stations?
    1. Answer: Seaside Luxury Resort in Curacao Willemstad and The Executive Suites in Curacao Westpunt
  2. Do I need to pay extra money to use an EV station when I am a guest of the hotel?
    1. Answer: Usage fee will vary by location
  3. How tall is the highest mountain peak in the Alps?
    1. Answer: The requested information is not available in the retrieved data. Please try another query or topic.
  4. What is the current political situation in Thailand?
    1. Answer: The requested information is not available in the retrieved data. Please try another query or topic.
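
To compare responses with the summaries above outside the dashboard, the streamed chunks returned by create_chat_completion() have to be joined into one string. The sketch below assumes each chunk follows the shape used by the openai Python package (chunk.choices[0].delta.content) and skips chunks that have no choices or no text. The collect_stream_text, is_out_of_scope, and fake_chunk helpers are hypothetical, and the fallback wording is taken from the expected answers above; the model may phrase it differently in practice.

```python
from types import SimpleNamespace

# Opening sentence of the fallback answer listed in the expected results.
FALLBACK = "The requested information is not available in the retrieved data."


def collect_stream_text(chunks):
    """Concatenate the text deltas from an iterable of streamed chat chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices at all
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


def is_out_of_scope(answer):
    """True when the answer starts with the 'not available' fallback message."""
    return answer.strip().startswith(FALLBACK)


def fake_chunk(text):
    """Build a stand-in chunk for demonstration; real chunks come from the API."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Demonstration data only: two text chunks, one empty chunk, one with no text.
demo = [
    fake_chunk("Usage fee "),
    SimpleNamespace(choices=[]),
    fake_chunk(None),
    fake_chunk("will vary"),
]
answer = collect_stream_text(demo)
```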