
Task 01 - Transcribe a recorded support call with the Speech service (30 minutes)

Introduction

In the previous exercise, we used the Azure AI Services Speech service’s recognize_once_async() method to perform basic speech-to-text for an utterance, sending the resulting text to the Azure OpenAI Service. This approach works for short phrases of up to approximately 15 seconds. For longer recordings or dialogues, we will need to use the transcription option built into the Speech service. In this task, we will enable the Contoso Suites call center to transcribe customer calls in an offline fashion.
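
For reference, the single-utterance approach from the previous exercise looked roughly like the following sketch (the SPEECH_KEY and SPEECH_REGION environment variables are assumptions standing in for your Speech service credentials):

    import os
    import azure.cognitiveservices.speech as speechsdk

    # Assumption: credentials are supplied via environment variables.
    speech_key = os.environ["SPEECH_KEY"]
    speech_region = os.environ["SPEECH_REGION"]

    # Configure the Speech service and a recognizer bound to the default microphone.
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

    # recognize_once_async() completes after a single utterance, which is why it
    # does not suit longer recordings such as full support calls.
    result = recognizer.recognize_once_async().get()
    print(result.text)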

Description

In this task, you will use your existing Azure AI Services Speech service from a new Streamlit application page. You will write code to accept a WAV file upload and then use the ConversationTranscriber to perform speech-to-text transcription.

The key tasks are as follows:

  1. In the Streamlit dashboard, open the page 2_Call_Center.py. Navigate to the “Simulate a Call” section of the main() function and implement the TODO items to complete the function call.
  2. Complete the create_transcription_request() function by implementing its TODO items.
  3. Upload the file 01_Customer_Call.wav and perform a transcription. You should see the output of the transcription service as a list of strings.

Success Criteria

  • You have completed the create_transcription_request() function.
  • You are able to upload a WAV file and obtain a transcription.

Learning Resources

Solution

  • The code to implement the “Simulate a Call” section in the main() function is as follows:

      uploaded_file = st.file_uploader("Upload an audio file", type="wav")
      if uploaded_file is not None and ('file_transcription' not in st.session_state or st.session_state.file_transcription is False):
          # Play back the uploaded call so the user can confirm the right file.
          st.audio(uploaded_file, format='audio/wav')
          with st.spinner("Transcribing the call..."):
              all_results = create_transcription_request(uploaded_file, speech_key, speech_region)
              # Cache the results in session state so reruns do not re-transcribe the file.
              st.session_state.file_transcription_results = all_results
              st.session_state.file_transcription = True
          st.success("Transcription complete!")

      if 'file_transcription_results' in st.session_state:
          st.write(st.session_state.file_transcription_results)
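
    Because Streamlit re-runs the script on every interaction, the file_transcription flag and the file_transcription_results entry in st.session_state ensure the uploaded call is transcribed only once and the results persist across re-runs.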
    
  • The create_transcription_request() function uses the Azure AI Services Speech service to accept a WAV file as input and perform speech-to-text transcription. It then returns the transcribed text as a list of utterances.

    • The code for the completed create_transcription_request() function is as follows:

      # Requires the following imports at the top of the file:
      #   import time
      #   import azure.cognitiveservices.speech as speechsdk
      #   from scipy.io import wavfile
      # Create an instance of a speech config with specified subscription key and service region.
      speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
      speech_config.speech_recognition_language = speech_recognition_language
      
      # Prepare audio settings for the wave stream (16 kHz, 16-bit, mono PCM)
      channels = 1
      bits_per_sample = 16
      samples_per_second = 16000
      
      # Create audio configuration using the push stream
      wave_format = speechsdk.audio.AudioStreamFormat(samples_per_second, bits_per_sample, channels)
      stream = speechsdk.audio.PushAudioInputStream(stream_format=wave_format)
      audio_config = speechsdk.audio.AudioConfig(stream=stream)
      
      transcriber = speechsdk.transcription.ConversationTranscriber(speech_config, audio_config)
      all_results = []
      
      def handle_final_result(evt):
          # Collect each finalized utterance emitted by the transcriber
          all_results.append(evt.result.text)
      
      done = False
      
      def stop_cb(evt):
          print('CLOSING on {}'.format(evt))
          nonlocal done
          done = True
      
      # Subscribe to the events fired by the conversation transcriber
      transcriber.transcribed.connect(handle_final_result)
      transcriber.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
      transcriber.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
      transcriber.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
      # stop continuous transcription on either session stopped or canceled events
      transcriber.session_stopped.connect(stop_cb)
      transcriber.canceled.connect(stop_cb)
      
      # Start continuous transcription; results arrive via the events subscribed above
      transcriber.start_transcribing_async()
      
      # Read the whole WAV file at once and stream it to the SDK. The sample rate
      # returned by wavfile.read is discarded because the push stream format above
      # already assumes 16 kHz, 16-bit, mono audio.
      _, wav_data = wavfile.read(audio_file)
      stream.write(wav_data.tobytes())
      stream.close()
      while not done:
          time.sleep(.5)
      
      transcriber.stop_transcribing_async()
      return all_results
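
  • To verify the completed function, you can call it directly with the sample file from this task (a minimal sketch, assuming speech_key and speech_region hold the same Speech service credentials used above):

      results = create_transcription_request("01_Customer_Call.wav", speech_key, speech_region)
      for utterance in results:
          print(utterance)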