AI & ML Academy - Customizable AI Models
Welcome to the AI & ML Academy (AIA) - Customizable AI Models!
In this section, we give an overview of customizable AI models in Azure AI Services and the services that reside within each pillar: Vision, Language, and Speech. Each pillar also supports a level of customization through services like Custom Vision and Custom Speech, which we introduce below. Finally, we provide some best practices for each of these services.
When to Use Pre-built vs. Custom AI Models
Decision Framework
Before investing resources in custom AI model development, follow this decision framework:
- Evaluate Pre-built Models First
  - Test existing Azure AI Services with your data
  - Document performance and metrics
  - Determine if the performance is acceptable for your use case
- Assess Customization Requirements
  - Identify if you need domain-specific concepts
  - Determine if you require higher accuracy for specific tasks
  - Consider if you need to detect or recognize unique objects or entities
- Calculate Resource Requirements
  - Data requirements: Assess your current data collection. Do you have enough high-quality samples for training? Consider both quantity (typically hundreds or thousands of examples) and diversity. Although some models can be trained with as few as 50 images, model quality generally improves with more samples.
  - Time investment: Estimate the hours required for data preparation, labeling, model training, validation, and deployment.
  - Technical expertise: Determine whether your team has the necessary skills in data science, engineering, and domain knowledge, or whether you’ll need to acquire these resources.
  - Infrastructure costs: Calculate expenses for compute resources, storage, and potential specialized hardware.
  - Ongoing maintenance: Budget for regular model retraining, performance monitoring, and addressing model drift over time.
- Make the Decision
  - Use pre-built models if they meet 80%+ of your needs, you have limited data, or you need quick deployment
  - Invest in custom models if you have domain-specific requirements, sufficient training data, and performance gaps in pre-built solutions
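The "Make the Decision" rules can be expressed as a small helper, which is handy when you want the same criteria applied consistently across projects. This is a minimal sketch, not an Azure tool: the `Assessment` fields, the 0.8 coverage threshold (taken from the "80%+" rule of thumb), and the default minimum of 50 examples are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Hypothetical summary of the evaluation steps above."""
    prebuilt_coverage: float          # fraction of requirements the pre-built model meets (0.0-1.0)
    has_domain_specific_needs: bool   # unique concepts, objects, or entities
    labeled_examples: int             # high-quality samples available for training
    needs_quick_deployment: bool


def choose_model_strategy(a: Assessment, min_examples: int = 50) -> str:
    """Return "pre-built" or "custom" following the rules of thumb above.

    The 0.8 threshold and min_examples default are assumptions, not Azure requirements.
    """
    # Pre-built wins if it covers most needs, data is scarce, or speed matters.
    if (
        a.prebuilt_coverage >= 0.8
        or a.labeled_examples < min_examples
        or a.needs_quick_deployment
    ):
        return "pre-built"
    # At this point there is a performance gap and enough data; custom needs a domain reason.
    if a.has_domain_specific_needs:
        return "custom"
    return "pre-built"


print(choose_model_strategy(Assessment(0.65, True, 400, False)))  # custom
```

Ordering the checks this way makes "pre-built" the default, so a custom model is chosen only when all three conditions for investing in one hold.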
AI Services
Azure AI Services are cloud-based artificial intelligence (AI) services that help you build cognitive intelligence into your applications. They are available as REST APIs, client library SDKs, and user interfaces. You can add cognitive features to your applications without needing AI or data science skills. These services enable you to build cognitive solutions that can see, hear, speak, understand, and even make decisions.
Vision
AI Services for Vision offers (1) off-the-shelf APIs pre-trained to tag and analyze your images and videos, and (2) customizable models that you can train on your own data.
Best Practices for Vision AI
- Pre-built vs. Custom Decision Process
  - Start with the Azure Computer Vision API to establish a baseline
  - Document specific failure cases and performance metrics
  - Only proceed to Custom Vision when clear gaps are identified
- Data Preparation for Custom Vision
  - Collect 50+ images per category (more for complex scenarios)
  - Ensure diverse examples that represent real-world conditions
  - Include variations in lighting, angles, backgrounds, and occlusion
  - Label consistently and precisely, especially for object detection
- Training and Optimization
  - Use balanced datasets across all categories
  - Start with Quick Training before considering Advanced Training
  - Monitor performance metrics (Precision, Recall, AP) for each class
  - Improve iteratively by adding images where the model fails
- Deployment and Monitoring
  - Export models to the appropriate format for your deployment target (export requires a compact domain)
  - Implement A/B testing comparing custom vs. pre-built models
  - Set up monitoring for drift and performance degradation
  - Schedule regular retraining with new data
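To make the "balanced datasets" and "per-class metrics" advice concrete, the sketch below checks tag counts against the 50-image guideline and computes per-class precision and recall from true and predicted labels on a held-out set. It is plain Python for illustration. The function names and the 2:1 imbalance threshold are hypothetical. Custom Vision itself reports precision, recall, and AP per tag after each training iteration.

```python
from collections import Counter


def audit_tags(labels, min_per_tag=50, max_ratio=2.0):
    """Flag tags with too few images and report the largest/smallest count ratio.

    min_per_tag follows the 50+ images guideline; max_ratio is a hypothetical
    threshold for "balanced enough". Assumes labels is non-empty.
    """
    counts = Counter(labels)
    too_few = sorted(tag for tag, n in counts.items() if n < min_per_tag)
    ratio = max(counts.values()) / min(counts.values())
    return too_few, ratio, ratio <= max_ratio


def per_class_precision_recall(y_true, y_pred):
    """Compute (precision, recall) for each class from single-label predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    metrics = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[c] = (precision, recall)
    return metrics


print(audit_tags(["cat"] * 60 + ["dog"] * 20))  # (['dog'], 3.0, False)
print(per_class_precision_recall(["cat", "cat", "dog", "dog"],
                                 ["cat", "dog", "dog", "dog"]))
```

A tag that shows low recall here is a good candidate for the "add images where the model fails" step.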
Computer Vision
Analyze content in images and video with a turn-key API service
- Vision Studio - Explore functionality by trying each API (requires an Azure account)
- Computer Vision Learning Path - Get started analyzing images with the API
Custom Vision
Build Custom Image Classification & Object Detection models for your scenario
- Learning Path for Object Detection - Get started using AI to recognize objects in images using the Custom Vision service
- Learning Path for Image Classification - Get started classifying images using the Custom Vision service
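After a Custom Vision model is published, an application typically keeps only the predictions above a probability threshold. The sketch below assumes the JSON shape returned by the Custom Vision prediction API: a `predictions` list whose items carry `tagName`, `probability`, and, for object detection, a normalized `boundingBox` with `left`, `top`, `width`, and `height`. Verify these field names against the current API reference. The 0.5 threshold and the sample payload are made up for illustration.

```python
import json


def filter_detections(response_json: str, threshold: float = 0.5):
    """Keep predictions at or above threshold, highest probability first."""
    payload = json.loads(response_json)
    kept = [
        (p["tagName"], p["probability"], p.get("boundingBox"))  # boundingBox absent for classification
        for p in payload.get("predictions", [])
        if p["probability"] >= threshold
    ]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Illustrative payload; real responses also include id, project, iteration, and created fields.
sample = json.dumps({
    "predictions": [
        {"tagName": "helmet", "probability": 0.92,
         "boundingBox": {"left": 0.1, "top": 0.2, "width": 0.3, "height": 0.25}},
        {"tagName": "vest", "probability": 0.31,
         "boundingBox": {"left": 0.5, "top": 0.4, "width": 0.2, "height": 0.3}},
    ]
})

for tag, prob, box in filter_detections(sample):
    print(tag, prob, box)
```

Because bounding-box coordinates are fractions of the image size, multiply them by the image width and height to draw boxes in pixels.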
Face API
Detect and identify people in images
- Face API Learning Path - Get started detecting and analyzing faces
- Transparency Note - Understand how Face API works, the choices you can make as a system owner that influence accuracy, and the importance of thinking about the whole system, including the technology, the people, and the environment
Language
AI Services for Language similarly provides pre-trained, pre-configured models to use in a turn-key fashion, as well as customizable services that enable you to leverage your own data with the provided platform and tooling.
Best Practices for Language AI
- Pre-built vs. Custom Decision Process
  - Test standard language models with text samples
  - Identify domain-specific terminology or concepts that the models do not recognize
  - Measure accuracy for key tasks (NER, classification, sentiment)
- Data Preparation for Custom Language Models
  - Collect 100+ examples per category/entity type
  - Ensure data represents the diversity of language in your domain
  - Balance datasets across all categories
  - Create clear annotation guidelines for consistent labeling
- Training and Optimization
  - Start with simpler models before moving to more complex ones
  - Evaluate domain-specific performance metrics
  - Test with edge cases and ambiguous examples
- Deployment and Governance
  - Implement regular evaluation with human review
  - Create feedback loops to collect correction data
  - Monitor for bias in model outputs
  - Schedule periodic retraining with new examples
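For the "evaluate domain-specific performance metrics" step, custom NER is commonly scored at the entity level. The sketch below computes micro-averaged precision, recall, and F1 using exact matches on (start, end, label) spans. The tuple representation and the exact-match rule are assumptions for illustration; Language Studio reports its own precision, recall, and F1 after training, and this kind of script is useful for scoring your own held-out or human-reviewed data.

```python
def entity_scores(gold, predicted):
    """Micro-averaged precision, recall, and F1 over exact-match entity spans.

    gold and predicted are lists (one entry per document) of sets of
    (start_offset, end_offset, label) tuples.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # same span and same label
        fp += len(p - g)  # predicted but not in the gold annotations
        fn += len(g - p)  # annotated but missed by the model
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [{(0, 8, "Drug"), (20, 27, "Dosage")}, {(5, 12, "Drug")}]
pred = [{(0, 8, "Drug"), (20, 27, "Condition")}, {(5, 12, "Drug")}]
print(entity_scores(gold, pred))  # 2 correct, 1 wrong label, 1 missed: all three scores are 2/3
```

Exact matching is strict: a correct span with the wrong label counts as both a false positive and a false negative, which is what surfaces labeling-guideline problems.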
With an Azure subscription, navigate to the Language Studio to explore all of the tools offered for Natural Language Processing within Azure AI Services.
Key functionality includes:
- Named Entity Recognition (NER): Identify and categorize named entities from input documents, including names of people, locations, organizations, events, products, addresses, phone numbers, emails, URLs, IP addresses, dates & times, and quantities.
- Custom Named Entity Recognition: Build custom models to extract domain-specific entities.
- Personally identifiable information (PII) and protected health information (PHI) detection: Identify, categorize, and redact sensitive information from documents, including names of people, job roles, organizations, addresses, phone numbers, emails, URLs, IP addresses, dates & times, quantities, ABA routing numbers, SWIFT codes, credit card numbers, International Bank Account Numbers (IBANs), and country/region-specific identification (e.g., U.S. Social Security Numbers).
- Language detection: Determine which language a document is written in.
- Sentiment analysis and opinion mining: Leverage Sentiment Analysis to label text as positive, neutral, or negative. Use Opinion Mining to gather more granular information about sentiment, including the subject the text is referring to as well as the associated opinion or sentiment.
- Summarization: Summarize documents or conversations.
- Key phrase extraction: Identify and extract the main concepts in text.
- Entity linking: Identify entities in text and provide a Wikipedia link for more information.
- Text Analytics for Health: Extract and label medical information from health documents, such as doctor’s notes, discharge summaries, clinical documents, and electronic health records.
- Custom Text Classification: Train your own text classification models using your data.
- Conversational language understanding: Predict what the user’s intent is when they say a particular phrase or sentence so that you can reply accordingly.
- Question answering: Find the most appropriate answer for a user’s question for a conversational client application.
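Many of the pre-built features above are exposed through a single REST operation in which a `kind` field selects the task. The sketch below only builds the request with the Python standard library, so it can be inspected offline without credentials. It assumes the `/language/:analyze-text` route, the `2023-04-01` API version, the `Ocp-Apim-Subscription-Key` header, and task names such as `SentimentAnalysis` and `LanguageDetection`; confirm all of these against the current Language service REST reference. The endpoint and key are placeholders.

```python
import json
import urllib.request

API_VERSION = "2023-04-01"  # assumed API version; check the REST reference


def build_analyze_text_request(endpoint, key, kind, texts, language="en"):
    """Build (but do not send) a POST request for the Language analyze-text API."""
    url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version={API_VERSION}"
    documents = []
    for i, text in enumerate(texts, start=1):
        doc = {"id": str(i), "text": text}
        if language is not None:  # pass language=None for LanguageDetection
            doc["language"] = language
        documents.append(doc)
    body = {"kind": kind, "analysisInput": {"documents": documents}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_analyze_text_request(
    "https://<your-resource>.cognitiveservices.azure.com",  # placeholder endpoint
    "<your-key>",                                            # placeholder key
    "SentimentAnalysis",
    ["The staff were friendly and the room was clean."],
)
print(req.full_url)
# With a real endpoint and key, urllib.request.urlopen(req) would send it.
```

For production code, the Python client library used in the samples listed below handles authentication, retries, and paging for you; the raw request is shown here only to make the request shape visible.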
To start getting hands-on, refer to the following resources:
- GitHub Python Samples for Text: Common scenario operations with the Azure Text Analytics client library for Python
- Learning Path for Common Pre-configured Language APIs: Get started extracting insights from text
- Learning Path for Customizable Language solutions: Build custom text classification and custom NER models using the Language service
Speech
AI Services for Speech also offers both pre-trained models and customizable services for speech to text, text to speech, speaker recognition, and speech translation.
Best Practices for Speech AI
- Pre-built vs. Custom Decision Process
  - Test standard speech models with audio samples
  - Measure Word Error Rate (WER) against your specific use case
  - Document accuracy issues with domain-specific terminology, accents, or background noise
  - Proceed to custom models only when clear performance gaps exist
- Data Preparation for Custom Speech and Voice Models
  - Collect diverse audio samples (at least 30 minutes for basic customization)
  - Ensure audio quality matches expected real-world conditions
  - Include representative background noise, accents, and speaking styles
  - Prepare accurate transcriptions for training data
  - For Custom Neural Voice, record 300-2000 utterances in studio conditions
- Training and Optimization
  - Start with language model adaptation (plain or structured text) for domain-specific terminology and specialized vocabulary
  - Consider acoustic model adaptation (audio with human-labeled transcripts) for accents, speaking styles, or background noise
  - Evaluate models with test sets that represent challenging cases
  - Iterate with additional data focused on error cases
- Deployment and Governance
  - Implement responsible AI practices, especially for voice synthesis
  - Obtain proper consent for voice talent recordings
  - For voice synthesis, clearly disclose when AI-generated voices are used
  - Monitor for drift in accuracy as usage environments change
  - Schedule periodic retraining with new data samples
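Word Error Rate, mentioned in the best practices above, is the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions. The sketch below implements it with dynamic programming. The lowercase-and-split normalization is a simplifying assumption; real evaluations usually also normalize punctuation and numbers before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


# Domain term split by a generic model: 2 substitutions + 2 insertions over 6 words.
print(word_error_rate("take two tablets of ibuprofen daily",
                      "take to tablets of i be profen daily"))  # 4/6, about 0.67
```

Running this over a test set of domain-specific utterances, before and after customization, gives the baseline-versus-custom comparison the decision process calls for.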
Speech to Text
- Speech to Text Overview: Explore the capabilities of the STT service.
Text to Speech
- Text to Speech Overview: Explore the capabilities of the TTS service.
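Text to speech input can be plain text or SSML (Speech Synthesis Markup Language, a W3C standard), which selects the voice and controls pronunciation and prosody. The sketch below builds a minimal SSML document with the standard library and escapes the input text so characters like `&` and `<` remain valid XML. The voice name `en-US-JennyNeural` is an example of Azure's prebuilt neural voice naming; check the current voice list before relying on it.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap text in a minimal single-voice SSML document."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"  # escape &, <, >
        "</speak>"
    )


print(build_ssml("Orders over $50 ship free & fast."))
```

The resulting string can be passed to an SSML-aware synthesis call, for example `SpeechSynthesizer.speak_ssml_async` in the Speech SDK for Python.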
Speaker Recognition
- Speaker Recognition Overview: Explore Speaker Recognition and some commonly asked questions.
Speech Translation
- Speech Translation QuickStart: Get hands-on with translating speech from a microphone in the language of your choice.
Custom Neural Voice
- Custom Neural Voice Overview: Explore the different project types for Custom Neural Voice. This page is followed by a great step-by-step guide on how to get started.
- How to Create a Custom Neural Voice: A great tutorial along with best practices.
- Transparency Note and Use Cases: Understand some key considerations when using Custom Neural Voice
- Guidelines for responsible deployment of synthetic voice technology: Understand how to responsibly use synthetic voice technology.
- Data, privacy, and security for Custom Neural Voice: Understand how the data will be used and processed.
Content Creation
- Speech Studio: Test the features of Content Creation.
- Audio Content Creation: Tutorial on how to convert text to speech using Microsoft AI voices.