AI & ML Academy - Customizable AI Models

Welcome to the AI & ML Academy (AIA) - Customizable AI Models!

In this section, we give an overview of customizable AI models in Azure AI Services across three pillars (vision, language, and speech) and the services that reside within each. Each pillar also supports a level of customization through services such as Custom Vision and Custom Speech, which we introduce. Finally, we provide some best practices for each of these services.

AI Services

Azure AI Services are cloud-based artificial intelligence (AI) services that help you build cognitive intelligence into your applications. They are available as REST APIs, client library SDKs, and user interfaces, so you can add cognitive features to your applications without having AI or data science skills. These AI services enable you to build cognitive solutions that can see, hear, speak, understand, and even make decisions.


For vision, AI Services offers (1) off-the-shelf APIs that are pre-trained to tag and analyze your images and videos, and (2) customizable models that you can train using your own data.

Computer Vision

Analyze content in images and video with a turn-key API service

Custom Vision

Build Custom Image Classification & Object Detection models for your scenario

Face API

Detect and identify people in images

  • Face API Learning Path - Get started detecting and analyzing faces
  • Transparency Note - Understand how Face API works, the choices you can make as System Owner that influence accuracy, and the importance of thinking about the whole system, including the technology, the people, and the environment


Cognitive Services for Language similarly provides pre-trained, pre-configured models to use in a turn-key fashion, as well as customizable services that enable you to leverage your own data with the provided platform and tooling.

With an Azure subscription, navigate to the Language Studio to explore all of the tools offered for Natural Language Processing within Azure AI Services.

Key functionality includes:

  • Named Entity Recognition (NER): Identify and categorize named entities from input documents, including names of people, locations, organizations, events, products, addresses, phone numbers, emails, URLs, IP addresses, dates & times, and quantities.
  • Personally identifiable information (PII) and protected health information (PHI) detection: Identify, categorize, and redact sensitive information from documents, including names of people, job roles, organizations, addresses, phone numbers, emails, URLs, IP addresses, dates & times, quantities, ABA routing numbers, SWIFT codes, credit card numbers, International Bank Account Numbers (IBANs), and country/region-specific identification (e.g. U.S. Social Security Numbers).
  • Language detection: Determine which language a document is written in.
  • Sentiment analysis and opinion mining: Leverage Sentiment Analysis to label text as positive, neutral, or negative. Use Opinion Mining to gather more granular information about sentiment, including the subject the text is referring to as well as the associated opinion or sentiment.
  • Summarization: Summarize documents or conversations.
  • Key phrase extraction: Identify and extract the main concepts in text.
  • Entity linking: Identify entities in text and provide a Wikipedia link for more information.
  • Text Analytics for Health: Extract and label medical information from health documents, such as doctor’s notes, discharge summaries, clinical documents, and electronic health records.
  • Custom Text Classification: Train your own text classification models using your data.
  • Conversational language understanding: Predict what the user’s intent is when they say a particular phrase or sentence so that you can reply accordingly.
  • Question answering: Find the most appropriate answer for a user’s question for a conversational client application.
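As a rough illustration of how these features are reached over REST, the sketch below builds (but does not send) a request for sentiment analysis with opinion mining. The route, the API version string, the request body shape, and the helper name build_sentiment_request are assumptions made for illustration, not details taken from this page; check them against the current Language service reference before relying on them.

```python
import json
import urllib.request

# Assumed API version; confirm against the current Language service reference.
API_VERSION = "2023-04-01"


def build_sentiment_request(endpoint, key, texts, language="en"):
    """Build (but do not send) a sentiment analysis request with opinion mining.

    Hypothetical helper: the route and body layout below are assumptions.
    """
    body = {
        "kind": "SentimentAnalysis",
        "parameters": {"opinionMining": True},
        "analysisInput": {
            "documents": [
                # Each document needs an id that is unique within the request.
                {"id": str(i), "language": language, "text": text}
                for i, text in enumerate(texts, start=1)
            ]
        },
    }
    url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The resource key is passed in this header.
            "Ocp-Apim-Subscription-Key": key,
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder endpoint and key; substitute the values from your own resource.
    req = build_sentiment_request(
        "https://example.cognitiveservices.azure.com",
        "YOUR_KEY",
        ["The room was clean but the service was slow."],
    )
    print(req.get_method(), req.full_url)
    # To send it with a real endpoint and key: urllib.request.urlopen(req)
```

Under the same assumptions, changing the "kind" value and its parameters would select a different feature from the list above. The client library SDKs wrap this request construction for you and are usually the more convenient choice in application code.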

To start getting hands-on, refer to the following resources:


Cognitive Services for Speech likewise offers both pre-trained models and customizable services for speech to text, text to speech, speaker recognition, and speech translation.

Speech to Text

Text to Speech

Speaker Recognition

Speech Translation

Custom Neural Voice

Content Creation