# Task 01 - Transcribe a recorded support call with the Speech service (30 minutes)

## Introduction
In the previous exercise, we used the Azure AI Services Speech service’s `recognize_once_async()` method to perform basic speech-to-text for a single utterance, sending the resulting text to the Azure OpenAI Service. This approach works for short phrases of up to approximately 15 seconds. For longer recordings or dialogues, we need to use the transcription option built into the Speech service. In this task, we will enable the Contoso Suites call center to transcribe customer calls in an offline fashion.
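For comparison, the single-shot approach from the previous exercise looks roughly like the following sketch (`speech_key` and `speech_region` are illustrative placeholders for your Speech service credentials, not names from the lab code):

```python
import azure.cognitiveservices.speech as speechsdk

# speech_key and speech_region are placeholders for your Speech service credentials.
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

# recognize_once_async() returns after the first recognized utterance
# (roughly 15 seconds of audio at most), which is why it cannot cover
# a full call recording.
result = recognizer.recognize_once_async().get()
print(result.text)
```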
## Description

In this task, you will use your existing Azure AI Services Speech service from a new Streamlit application page. You will write code to accept a WAV file upload and then use the `ConversationTranscriber` to perform speech-to-text transcription.
The key tasks are as follows:

- In the Streamlit dashboard, open the page `2_Call_Center.py`. Navigate to the `main()` function and into the "Simulate a Call" section. Implement the TODO items to fill out the function call.
- Implement the TODO items in the `create_transcription_request()` function to complete this function.
- Upload the file `01_Customer_Call.wav` and perform a transcription. You should see the output of the transcription service as a list of strings.
## Success Criteria

- You have completed the `create_transcription_request()` function.
- You are able to upload a WAV file and obtain a transcription.
## Learning Resources

- `st.file_uploader`
- `st.audio`
- `st.spinner`
- `st.success`
- Streamlit Session State
- How to use the audio input stream
- AudioConfig Class
- AI Services Speech SDK Transcription sample
## Solution
- The code to implement the "Simulate a Call" section in the `main()` function is as follows:

  ```python
  uploaded_file = st.file_uploader("Upload an audio file", type="wav")
  if uploaded_file is not None and ('file_transcription' not in st.session_state or st.session_state.file_transcription is False):
      st.audio(uploaded_file, format='audio/wav')
      with st.spinner("Transcribing the call..."):
          all_results = create_transcription_request(uploaded_file, speech_key, speech_region)
          st.session_state.file_transcription_results = all_results
          st.session_state.file_transcription = True
      st.success("Transcription complete!")
  if 'file_transcription_results' in st.session_state:
      st.write(st.session_state.file_transcription_results)
  ```
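  Note the use of Streamlit session state here: the `file_transcription` flag keeps the relatively slow transcription call from re-running every time Streamlit reruns the script, while `file_transcription_results` preserves the transcribed utterances for display across reruns.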
- The `create_transcription_request()` function uses the Azure AI Services Speech service to accept a WAV file as input and perform speech-to-text transcription. It then returns the transcribed text as a list of utterances.
- The code for the completed `create_transcription_request()` function is as follows:

  ```python
  # Create an instance of a speech config with the specified subscription key and service region.
  speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
  speech_config.speech_recognition_language = speech_recognition_language

  # Prepare audio settings for the wave stream.
  channels = 1
  bits_per_sample = 16
  samples_per_second = 16000

  # Create the audio configuration using a push stream.
  wave_format = speechsdk.audio.AudioStreamFormat(samples_per_second, bits_per_sample, channels)
  stream = speechsdk.audio.PushAudioInputStream(stream_format=wave_format)
  audio_config = speechsdk.audio.AudioConfig(stream=stream)

  transcriber = speechsdk.transcription.ConversationTranscriber(speech_config, audio_config)

  all_results = []
  def handle_final_result(evt):
      all_results.append(evt.result.text)

  done = False
  def stop_cb(evt):
      print('CLOSING on {}'.format(evt))
      nonlocal done
      done = True

  # Subscribe to the events fired by the conversation transcriber.
  transcriber.transcribed.connect(handle_final_result)
  transcriber.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
  transcriber.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
  transcriber.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))

  # Stop continuous transcription on either a session stopped or a canceled event.
  transcriber.session_stopped.connect(stop_cb)
  transcriber.canceled.connect(stop_cb)

  transcriber.start_transcribing_async()

  # Read the whole wave file at once and stream it to the SDK.
  _, wav_data = wavfile.read(audio_file)
  stream.write(wav_data.tobytes())
  stream.close()

  while not done:
      time.sleep(.5)

  transcriber.stop_transcribing_async()

  return all_results
  ```
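  The snippet above is the body of `create_transcription_request()`, so it assumes surrounding imports and a function signature. The following is a minimal sketch of that scaffolding; the parameter list and the default `speech_recognition_language` value are assumptions inferred from the variables the body references, so check them against the starter code:

  ```python
  # Sketch of the assumed scaffolding; the signature and default language
  # are inferred from the solution body, not copied from the starter code.
  import time

  import azure.cognitiveservices.speech as speechsdk
  from scipy.io import wavfile  # used to read the uploaded WAV file


  def create_transcription_request(audio_file, speech_key, speech_region,
                                   speech_recognition_language="en-US"):
      # ... the solution body shown above goes here, indented one level ...
      ...
  ```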