Get started with speech

Microsoft’s speech services

Speech Services, part of Azure Cognitive Services, unify speech-to-text, text-to-speech, and speech translation under a single Azure subscription. You can speech-enable your applications, tools, and devices with the Speech SDK, the Speech Devices SDK, or the REST APIs.

Speech architecture

Speech-to-text

Speech-to-text from Azure Speech Services enables real-time transcription of audio streams into text that your applications, tools, or devices can consume, display, and act on as command input. The service is powered by the same recognition technology that Microsoft uses for Cortana and Office products, and it works together with the translation and text-to-speech services.
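As a rough illustration of the REST route, the sketch below builds (but does not send) a transcription request for a short WAV clip using only the Python standard library. The endpoint path and header names follow the Speech service REST API for short audio as commonly documented, so check them against the current reference before relying on them. The helper name `build_stt_request`, the region `westus`, the key placeholder, and the dummy audio bytes are all assumptions made for this example.

```python
import urllib.parse
import urllib.request


def build_stt_request(wav_bytes, region, key, language="en-US"):
    """Build a POST request that asks the service to transcribe a short WAV clip.

    Hypothetical helper for illustration; it only constructs the request.
    """
    query = urllib.parse.urlencode({"language": language, "format": "simple"})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        # Subscription key from your Azure Speech resource (placeholder here).
        "Ocp-Apim-Subscription-Key": key,
        # Assumes 16 kHz, 16-bit, mono PCM audio in a WAV container.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=wav_bytes, headers=headers, method="POST")


# Placeholder values: replace with a real region, key, and WAV file contents.
stt_request = build_stt_request(b"RIFF", region="westus", key="YOUR_SUBSCRIPTION_KEY")
print(stt_request.full_url)

# To actually call the service (needs a valid key and real audio):
# with urllib.request.urlopen(stt_request) as response:
#     print(response.read().decode("utf-8"))
```

The request is built separately from the network call so it can be inspected or logged first. For continuous or streaming recognition, the Speech SDK is usually the better fit than this one-shot REST call.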

Text-to-speech

Text-to-speech from Azure Speech Services enables your applications, tools, or devices to convert text into natural, human-like synthesized speech. Choose from standard and neural voices, or create a custom voice unique to your product or brand.
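A minimal sketch of what a synthesis request might look like over REST, again using only the Python standard library and stopping short of sending anything. The text is wrapped in SSML, which is where the voice is selected. The endpoint, header names, and output format string reflect the text-to-speech REST API as commonly documented and should be verified against the current reference. The helper names, the region, the key placeholder, the `User-Agent` value, and the voice `en-US-JennyNeural` are assumptions chosen for illustration.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", language="en-US"):
    """Wrap plain text in a minimal SSML document that selects one voice."""
    return (
        f"<speak version='1.0' xml:lang={quoteattr(language)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


def build_tts_request(text, region, key, voice="en-US-JennyNeural"):
    """Build a POST request asking the service to synthesize `text` as WAV audio.

    Hypothetical helper for illustration; it only constructs the request.
    """
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        # Requested audio format; other formats are available.
        "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
        "User-Agent": "speech-overview-example",
    }
    body = build_ssml(text, voice=voice).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values: replace with a real region and key.
tts_request = build_tts_request("Tom & Jerry say hello.", "westus", "YOUR_SUBSCRIPTION_KEY")
print(tts_request.data.decode("utf-8"))

# To actually call the service and save the audio (needs a valid key):
# with urllib.request.urlopen(tts_request) as response:
#     open("hello.wav", "wb").write(response.read())
```

Escaping the text before embedding it keeps characters such as `&` and `<` from breaking the SSML document.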

See also:

Get started with Custom Speech service

What is Custom Speech?

Access Custom Speech portal

Get started with a device and speech

Get the Speech Devices SDK and find suitable development kits.

Testing a speech platform device

The signal paths and architecture used for testing a Microsoft Windows Speech Platform device are described below.

[Figure: Speech testing]

Capture streams carry audio signals acquired by the integrated microphone(s) and pre-processed for use by a speech recognition engine or keyword spotter. Render streams carry audio signals destined for playback through the device speakers or playback accessories; they also serve as the reference signal that echo cancellation algorithms need.
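To show why the render stream matters for echo cancellation, here is a toy sketch in plain Python. It uses a normalized least-mean-squares (NLMS) adaptive filter, a common textbook technique, and is not the algorithm any particular Windows device ships with. The filter learns how the render signal leaks into the capture signal and subtracts that estimate. The echo path (a 2-sample delay at 60% amplitude), the signal lengths, and all names are assumptions invented for the demonstration.

```python
import random


def nlms_echo_cancel(capture, render, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `render` from `capture`."""
    weights = [0.0] * taps   # estimated echo path (speaker to microphone)
    history = [0.0] * taps   # most recent render samples, newest first
    cleaned = []
    for mic_sample, ref_sample in zip(capture, render):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        residual = mic_sample - echo_estimate
        # Normalize the update by the reference power so the step size
        # behaves consistently regardless of playback volume.
        power = sum(x * x for x in history) + eps
        weights = [w + step * residual * x / power
                   for w, x in zip(weights, history)]
        cleaned.append(residual)
    return cleaned


def energy(samples):
    """Sum of squared sample values."""
    return sum(s * s for s in samples)


random.seed(0)
# Render stream: stand-in for audio being played through the speaker.
render = [random.uniform(-1.0, 1.0) for _ in range(4000)]
# Capture stream: hypothetical echo only (no near-end speech), arriving
# 2 samples late at 60% of the original amplitude.
capture = [0.0, 0.0] + [0.6 * s for s in render[:-2]]

cleaned = nlms_echo_cancel(capture, render)

# Compare echo energy before and after cancellation once the filter has adapted.
print("echo energy in:", energy(capture[-500:]))
print("echo energy out:", energy(cleaned[-500:]))
```

Without the render stream as a reference, the filter would have nothing to correlate against, which is why both streams are exercised when testing a speech device.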
