Task 02 - Transcribe a live conversation with the Speech service (30 minutes)

Introduction

The prior task had us build a ConversationTranscriber to perform speech-to-text on an existing WAV file. In this task, we will extend the concept to support live transcription of speech to text from your microphone. This process is similar to transcription of audio files but has an additional complication in Streamlit due to the way its page refresh model works.

Description

In this task, you will extend the functionality you developed in the prior task to support microphone input. You will use a Streamlit Extras stateful button to start and stop the microphone. After you stop the microphone, you will write out all of the transcribed messages, using Streamlit’s session state to hold messages between page refreshes.

The key tasks are as follows:

In the Streamlit dashboard, open the page 2_Call_Center.py. Navigate to the main() function and into the “Perform a Live Call” section. Implement the TODO items to fill out the function call.
Implement the TODO items in the create_live_transcription_request() function to complete this function.
Select the Record button to begin recording. Then, speak into your microphone. You should see transcribed messages appear as print statements in your console. When you are done, select the Record button again to stop recording and print the messages that finished transcribing. Try using a message such as the following: “Hello. My name is Felix Henderson. I recently stayed at The Sunrise Retreat and wanted to let you know that I had a fantastic experience. The breakfast was fantastic and rooms were exquisite. I was really surprised with how well run the entire hotel was. I greatly enjoyed my time there and will definitely be back.”

Streamlit refreshes the page on each interaction, so it is not possible cleanly to await any closing messages and shut down the listening process–each button click immediately stops any running operations and refreshes the page. Because of this, the approach we take is constantly to update session state with transcription results. This allows us to capture transcribed results because we know the cancellation event will not run.

Success Criteria

You have completed the create_live_transcription_request() function.
You are able to transcribe speech from your microphone.

Learning Resources

Solution

Expand this section to view the solution

The code to implement the “Perform a Live Call” section in the main() function is as follows:

  start_recording = stx.button("Record", key="recording_in_progress")
  if start_recording:
      with st.spinner("Transcribing your conversation..."):
          create_live_transcription_request(speech_key, speech_region)

  if 'transcription_results' in st.session_state:
      st.write(st.session_state.transcription_results)
  else:
      print("Nothing in transcription results!")

The create_live_transcription_request() function uses the Azure AI Services Speech service to accept microphone input and perform speech-to-text transcription. It then returns the transcribed text as a list of utterances after you de-select the Record button.

The code for the completed create_live_transcription_request() function is as follows:

# Creates speech configuration with subscription information
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
speech_config.speech_recognition_language=speech_recognition_language
transcriber = speechsdk.transcription.ConversationTranscriber(speech_config)

done = False

def handle_final_result(evt):
    all_results.append(evt.result.text)
    print(evt.result.text)

all_results = []

def stop_cb(evt: speechsdk.SessionEventArgs):
    """callback that signals to stop continuous transcription upon receiving an event `evt`"""
    print('CLOSING {}'.format(evt))
    nonlocal done
    done = True

# Subscribe to the events fired by the conversation transcriber
transcriber.transcribed.connect(handle_final_result)
transcriber.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
transcriber.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
transcriber.canceled.connect(lambda evt: print('CANCELLED {}'.format(evt)))
# stop continuous transcription on either session stopped or canceled events
transcriber.session_stopped.connect(stop_cb)
transcriber.canceled.connect(stop_cb)

transcriber.start_transcribing_async()

# Streamlit refreshes the page on each interaction,
# so a clean start and stop isn't really possible with button presses.
# Instead, we're constantly updating transcription results, so that way,
# when the user clicks the button to stop, we can just stop updating the results.
# This might not capture the final message, however, if the user stops before
# we receive the message--we won't be able to call the stop event.
while not done:
    st.session_state.transcription_results = all_results
    time.sleep(1)

return