# Create a Question Answering (QA) System in Under 20 Minutes

This notebook demonstrates how to create a Question Answering (QA) webservice in under 20 minutes. We use Azure Machine Learning ([AzureML](https://azure.microsoft.com/en-us/services/machine-learning-service/)) Service to deploy a pre-trained [AllenNLP model](https://allennlp.org/models
), [BiDAF](https://www.semanticscholar.org/paper/Bidirectional-Attention-Flow-for-Machine-Seo-Kembhavi/007ab5528b3bd310a80d553cccad4b78dc496b02
), using Azure Container Instances ([ACI](https://azure.microsoft.com/en-us/services/container-instances/)).

![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/nlp/examples/question_answering/bidaf_quickstart.png)

## Table of Contents

1. [Deploy Model](#1.-Deploy-Model)
    - [1.1 Link to or Create a Workspace](#1.1-Link-to-or-Create-a-Workspace)
    - [1.2 Register BiDAF model for Deployment](#1.2-Register-BiDAF-model-for-Deployment)  
    - [1.3 Create Scoring Script](#1.3-Create-Scoring-Script)  
    - [1.4 Create a YAML File for the Environment](#1.4-Create-a-YAML-File-for-the-Environment)  
    - [1.5 Image Creation](#1.5-Image-Creation)
    - [1.6 Deploy the Image as a Web Service to Azure Container Instance](#1.6-Deploy-the-Image-as-a-Web-Service-to-Azure-Container-Instance)
    
2. [Test Deployed Webservice](#2.-Test-Deployed-Webservice)
    - [2.1 Real-time Scoring](#2.1-Real-time-Scoring)
    - [2.2 Batch Scoring](#2.2-Batch-Scoring)  
    
3. [Conclusion](#Conclusion)

In [1]:
import sys
sys.path.append("../../")
import json
import urllib
import scrapbook as sb

#import utils
from utils_nlp.common.timer import Timer
from utils_nlp.azureml import azureml_utils

from azureml.core.webservice import AciWebservice, Webservice
from azureml.core.image import ContainerImage
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import Model

In [2]:
CPU_CORES = 1
MEMORY_GB = 8
DEPLOYMENT_PYTHON_VERSION = '3.6.8'
DEPLOYMENT_CONDA_PACKAGES = ['jsonnet','cmake','regex','pytorch','torchvision']
DEPLOYMENT_PIP_PACKAGES = ['allennlp==0.8.4','azureml-sdk==1.0.48']
CONTAINER_TAGS = {'area': "nlp", 'type': "question-answering BiDAF"}
MODEL_TAGS = {"bidaf": "demo"}
config_path = (
    "./.azureml"
)  # Path to the directory containing config.json with azureml credentials

webservice_name = "aci-bidaf-service" #name for webservice; must be unique within your workspace

# Azure resources
subscription_id = "YOUR_SUBSCRIPTION_ID"
resource_group = "YOUR_RESOURCE_GROUP_NAME"  
workspace_name = "YOUR_WORKSPACE_NAME"  
workspace_region = "YOUR_WORKSPACE_REGION" #Possible values eastus, eastus2 and so on.

## 1. Deploy Model

### 1.1 Link to or Create a Workspace

The following cell looks to set up the connection to your [Azure Machine Learning service Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace). You can choose to connect to an existing workspace or create a new one. 

**To access an existing workspace:**
1. If you have a `config.json` file, you do not need to provide the workspace information; you will only need to update the `config_path` variable that is defined above which contains the file.
2. Otherwise, you will need to supply the following:
    * The name of your workspace
    * Your subscription id
    * The resource group name

**To create a new workspace:**

Set the following information:
* A name for your workspace
* Your subscription id
* The resource group name
* [Azure region](https://azure.microsoft.com/en-us/global-infrastructure/regions/) to create the workspace in, such as `eastus2`. 

This will automatically create a new resource group for you in the region provided if a resource group with the name given does not already exist. 

In [None]:
ws = azureml_utils.get_or_create_workspace(
    config_path=config_path,
    subscription_id=subscription_id,
    resource_group=resource_group,
    workspace_name=workspace_name,
    workspace_region=workspace_region,
)

In [4]:
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep='\n')

### 1.2 Register BiDAF model for Deployment

This step downloads the pre-trained [AllenNLP](https://allennlp.org/models) pretrained model and registers the model in our Workspace. The pre-trained AllenNLP model we use is called Bidirectional Attention Flow for Machine Comprehension ([BiDAF](https://www.semanticscholar.org/paper/Bidirectional-Attention-Flow-for-Machine-Seo-Kembhavi/007ab5528b3bd310a80d553cccad4b78dc496b02
)) It achieved state-of-the-art performance on the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) dataset in 2017 and is a well-respected, performant baseline for QA. AllenNLP's pre-trained BIDAF model is trained on the SQuAD training set and achieves an EM score of 68.3 on the SQuAD development set. See the [BIDAF deep dive notebook](https://github.com/microsoft/nlp-recipes/examples/question_answering/bidaf_deep_dive.ipynb
) for more information on this algorithm and AllenNLP implementation.

In [5]:
bidaf_model_url = 'https://s3-us-west-2.amazonaws.com/allennlp/models/bidaf-model-2017.09.15-charpad.tar.gz'
urllib.request.urlretrieve(bidaf_model_url, filename="bidaf.tar.gz")
!tar xvzf bidaf.tar.gz

x config.json
x vocabulary/
x vocabulary/non_padded_namespaces.txt
x vocabulary/tokens.txt
x weights.th


Registering a model means registering one or more files that make up a model (in our case, we register all the files contained in the downloaded .tar.gz file). Here we demonstrate how to register a model using the AzureML SDK, but see the [model registration](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#registermodel
) documentation for other registration methods.


**Note**: If you have already registered the model, you need not re-register it. Rather, just retrieve the pre-existing model in your Workspace with `bidaf_model = Model(ws, name='bidaf')`

In [6]:
bidaf_model = Model.register(workspace = ws,
                       model_path ="bidaf.tar.gz",
                       model_name = "bidaf",
                       tags = MODEL_TAGS,
                       description = "BiDAF Pretrained Model")

Registering model bidaf


### 1.3 Create Scoring Script

In this section we show an example of an entry script, score.py, which is called from the deployed webservice. The script must contain:

1. init() - This function loads the model in a global object.  
2. run() - This function is used for model prediction. The inputs and outputs to run() typically use JSON for serialization and deserilization. 

Our scoring script allows for both real-time and batch prediction. Each observation is a dictionary with two keys: _question_ and _passage_. With batch prediction we pass in a list of observations and use AllenNLPs `predict_batch_json()` method. For real-time prediction we pass in a single observation and use AllenNLPs `predict()` method.

In [7]:
%%writefile score.py
import json
from allennlp.predictors import Predictor
from azureml.core.model import Model

def init():
    global model
    bidaf_dir_path = Model.get_model_path('bidaf')
    model = Predictor.from_path(bidaf_dir_path)

def run(rawdata):
    try:
        data = json.loads(rawdata)
        
        # if one question-passage pair was passed
        if type(data) == dict:
            passage = data['passage']
            question = data['question']
            result = model.predict(question, passage)["best_span_str"]
        
        # if multiple question-passage pairs were passed
        elif type(data) == list:
            result = model.predict_batch_json(data)
            result = [i["best_span_str"] for i in result]

    except Exception as e:
        result = str(e)
        return json.dumps({"error": result})
    return json.dumps({"result":result})

Overwriting score.py


### 1.4 Create a YAML File for the Environment 

To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. The following cells create a file, bidafenv.yml, which specifies the dependencies from the run.

In [8]:
myenv = CondaDependencies.create(conda_packages= DEPLOYMENT_CONDA_PACKAGES,
                                 pip_packages= DEPLOYMENT_PIP_PACKAGES, 
                                 python_version = DEPLOYMENT_PYTHON_VERSION)
myenv.add_channel('conda-forge')
myenv.add_channel('pytorch')

conda_env_file_name = 'bidafenv.yml'
myenv.save_to_file('.', conda_env_file_name)

'bidafenv.yml'

### 1.5 Image Creation

In this step we create a container image which is wrapper containing the entry script, yaml file with package dependencies and the model. The created image is then deployed as a webservice in the next step. This step can take up to 10 minutes and even longer if the model is large.

In [9]:
image_config = ContainerImage.image_configuration(execution_script = "score.py",
                                                  runtime = "python",
                                                  conda_file = conda_env_file_name,
                                                  description = "Image with BiDAF model",
                                                  tags = CONTAINER_TAGS)

image = ContainerImage.create(name = "bidaf-image",
                              models = [bidaf_model],
                              image_config = image_config,
                              workspace = ws)

image.wait_for_creation(show_output = True)

Creating image
Running.......................................................................................................................................
Succeeded
Image creation operation finished for image bidaf-image:36, operation "Succeeded"


If the above step fails, then use the below command to see logs

In [10]:
# print(image.image_build_log_uri)

### 1.6 Deploy the Image as a Web Service to Azure Container Instance

Azure Container Instances are mostly used for deploying your models as a web service if one or more of the following conditions are true:  
1. You need to quickly deploy and validate your model.
2. You are testing a model that is under development.  


To set them up properly, we need to indicate the number of CPU cores and the amount of memory we want to allocate to our web service.

In [11]:
#Set the web service configuration
aci_config = AciWebservice.deploy_configuration(cpu_cores = CPU_CORES, 
                                               memory_gb = MEMORY_GB)

The final step to deploying our web service is to call WebService.deploy_from_image(). This function uses the Docker image and the deployment configuration we created above to perform the following:  
1. Deploy the docker image to an Azure Container Instance
2. Call the init() function in our scoring file
3. Provide an HTTP endpoint for scoring calls  

The deploy_from_image method requires the following parameters:
1. workspace: the workspace containing the service
2. name: a unique name used to identify the service in the workspace
3. image: a docker image object that contains the environment needed for scoring/inference
4. deployment_config: a configuration object describing the compute type

**Note**: The web service creation can take a few minutes

In [12]:
# deploy image as web service
aci_service = Webservice.deploy_from_image(workspace = ws, 
                                           name = webservice_name,
                                           image = image,
                                           deployment_config = aci_config)

aci_service.wait_for_deployment(show_output = True)
print(aci_service.state)

Creating service
Running.............................................
SucceededACI service creation operation finished, operation "Succeeded"
Healthy


Fetch logs to debug in case of failures.

In [13]:
# print(aci_service.get_logs())

If you want to reuse an existing service versus creating a new one, call the webservice with the name of the service. You can look up all the deployed webservices under deployment in the Azure Portal. Below is an example:

In [14]:
# aci_service = Webservice(workspace=ws, name='<<serive-name>>')

# to use the webservice
# aci_service.run()

**Conclusion**: Now we have a deployed webservice and deploying the model took less than 20 minutes!

## 2. Test Deployed Webservice

Depending on the needs of our QA system, we can either do real-time or batch scoring. We show an example of both types of scoring below using the following example [passage](https://www.semanticscholar.org/paper/Bidirectional-Attention-Flow-for-Machine-Seo-Kembhavi/007ab5528b3bd310a80d553cccad4b78dc496b02) and questions:

In [15]:
passage = "Machine Comprehension (MC), answering questions about a given context, \
requires modeling complex interactions between the context and the query. Recently,\
attention mechanisms have been successfully extended to MC. Typically these mechanisms\
use attention to summarize the query and context into a single vector, couple \
attentions temporally, and often form a uni-directional attention. In this paper \
we introduce the Bi-Directional Attention Flow (BIDAF) network, a multi-stage \
hierarchical process that represents the context at different levels of granularity \
and uses a bi-directional attention flow mechanism to achieve a query-aware context \
representation without early summarization. Our experimental evaluations show that \
our model achieves the state-of-the-art results in Stanford QA (SQuAD) and\
CNN/DailyMail Cloze Test datasets."

question1 = "What is BIDAF?"
question2 = "What datasets does BIDAF achieve state-of-the-art results on?"
question3 = "What do attention mechanisms do?"

### 2.1 Real-time Scoring

We prepare data for predicting answers for one passage-question pair by creating a dictionary with _question_ and _passage_ keys

In [16]:
data = {"passage": passage, "question":question1}
data = json.dumps(data)

In [17]:
with Timer() as t:
    score = aci_service.run(input_data=data)
    t.stop()
    print("Time elapsed: {}".format(t))
    
result = json.loads(score)
try:
    output = result["result"]
    sb.glue("answer", output)
    print("Answer:", output)
except:
    print(result["error"])

Time elapsed: 0.5916
Answer: Bi-Directional Attention Flow


We see that the model responded to the question "What is BiDAF?" with "Bi-Directional Attention Flow".

### 2.2 Batch Scoring

We prepare the data for batch scoring by creating a list of dictionaries with _passage_ and _question_ keys.

In [18]:
data_multiple = [{"passage": passage, "question":i} for i in [question1, question2, question3]]
data_multiple = json.dumps(data_multiple)

In [19]:
with Timer() as t:
    score = aci_service.run(input_data=data_multiple)
    t.stop()
    print("Time elapsed: {}".format(t))
    
result = json.loads(score)
try:
    output = result["result"]
    print(output)
except:
    print(result["error"])

Time elapsed: 0.5267
['Bi-Directional Attention Flow', 'Stanford QA (SQuAD) andCNN/DailyMail Cloze Test', 'have been successfully extended to MC']


We see that the model responded to the question "What is BiDAF?" with "Bi-Directional Attention Flow", the question "What datasets does BIDAF achieve state-of-the-art results on?" with "Stanford QA (SQuAD) and CNN/DailyMail Cloze Test", and the question "What do attention mechanisms do?" with "summarize the query and context into a single vector, couple attentions temporally, and often form a uni-directional attention". All these answers make sense given the passage and demonstrate that the AllenNLP pre-trained model is a good model for a deployed QA system. 

## Conclusion

This notebook demonstrated how to produce a fast QA service in under 20 minutes using Azure Container Instances (ACI). We deployed a popular pre-trained model, BiDAF, provided by AllenNLP, which was state-of-the-art in 2017 and performs well on our example queries. 