Check out Part 1 - A Lap Around Azure Media Player if you would like a quick intro before diving in.
Once you start taking advantage of Azure Media Services to deliver audio and video content to your users, you also gain access to an expanding suite of capabilities: Azure Media Analytics. Using artificial intelligence and machine learning technologies, Azure Media Analytics enables a variety of insights and capabilities, including indexing, facial and emotion detection, optical character recognition, and time lapsing.
The quickest way to use Azure Media Analytics is through a SaaS application called Video Indexer. This service is free for 10 hours, after which you can connect it to your Azure Media Services account and move to a pricing model based on storage and processing speed. Video Indexer includes a number of preprocessed videos, or you can upload your own.
In the sample videos, you can search for terms—like “football”—and get a feel for how the indexing could work for your application needs.
If you select one of the videos from the original sample list, you’ll see even more insights gleaned from that video, including keywords, sentiment analysis, and facial recognition.
While the output is compelling, you might be wondering how to make use of it in the context of your own applications. First, you can connect Video Indexer with your own Media Services account in Azure. Within the Azure portal, you will then have access to the various assets produced by Video Indexer (and you can invoke its capabilities from there as well). Below you can clearly see the various outputs of the process, including caption files (with the .vtt, .smi, and .ttml extensions), an audio index file (.aib), and the XML-formatted keywords file.
For this particular video from Vimeo, here’s a snippet of the keywords XML showing the extracted keywords, locations in the clip, and confidence rating.
<rss version="2.0" xmlns:mavis="http://www.microsoft.com/dtds/mavis/">
  <channel>
    <mavis:keywords>dude crazy running,specific complex handshaking,parallel universe,girls,birthday,kid,month,julian,familiarity,sense,handshake,grabbing,fun,friends,life,friendship,shot</mavis:keywords>
    <items>
      <mavis:keyword Content="dude crazy running" Count="1" AvgConfidence="0.90">
        <mavis:keyworddetail Confidence="0.90" Offset="24.86" />
      </mavis:keyword>
      <mavis:keyword Content="specific complex handshaking" Count="1" AvgConfidence="0.80">
        <mavis:keyworddetail Confidence="0.80" Offset="77.79" />
      </mavis:keyword>
      <mavis:keyword Content="parallel universe" Count="1" AvgConfidence="1.00">
        <mavis:keyworddetail Confidence="1.00" Offset="111.74" />
      </mavis:keyword>
      <mavis:keyword Content="girls" Count="1" AvgConfidence="0.85">
        <mavis:keyworddetail Confidence="0.85" Offset="35.46" />
      </mavis:keyword>
      <mavis:keyword Content="birthday" Count="2" AvgConfidence="0.80">
        <mavis:keyworddetail Confidence="0.61" Offset="36.25" />
        <mavis:keyworddetail Confidence="1.00" Offset="39.48" />
      </mavis:keyword>
      …
    </items>
  </channel>
</rss>
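Because the keywords file is plain XML, standard tooling can pull out the data. Here's a minimal sketch in Python using the standard library's `xml.etree` module; the element and attribute names follow the snippet above, and the sample string is a trimmed-down stand-in for the real file:

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by the mavis:* elements in the keywords file.
MAVIS = "{http://www.microsoft.com/dtds/mavis/}"

def keyword_offsets(xml_text: str) -> dict[str, list[float]]:
    """Map each extracted keyword to the offsets (in seconds) where it occurs."""
    root = ET.fromstring(xml_text)
    result = {}
    for kw in root.iter(f"{MAVIS}keyword"):
        details = kw.findall(f"{MAVIS}keyworddetail")
        result[kw.get("Content")] = [float(d.get("Offset")) for d in details]
    return result

# A trimmed sample mirroring the structure of the keywords XML above.
sample = """<rss version="2.0" xmlns:mavis="http://www.microsoft.com/dtds/mavis/">
  <channel><items>
    <mavis:keyword Content="birthday" Count="2" AvgConfidence="0.80">
      <mavis:keyworddetail Confidence="0.61" Offset="36.25" />
      <mavis:keyworddetail Confidence="1.00" Offset="39.48" />
    </mavis:keyword>
  </items></channel>
</rss>"""

print(keyword_offsets(sample))  # {'birthday': [36.25, 39.48]}
```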
In the snippet of XML above, you'll see that "birthday" appears twice, near the 36- and 39-second marks. You might use this information to present the user with a list of links that jump directly to the portions of the video containing the keyword. If you use Video Indexer to play back the video, the t query parameter can be used to specify the start time, in seconds.
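Using the "birthday" offsets from the keywords XML above, building those deep links is a one-liner; the helper and base URL below are illustrative (the only assumption carried over from the text is that t holds the start time in whole seconds):

```python
def timecoded_links(base_url: str, offsets: list[float]) -> list[str]:
    # The t query parameter specifies the playback start time in seconds,
    # so truncate the fractional offsets from the keywords file.
    return [f"{base_url}?t={int(offset)}" for offset in offsets]

# Offsets for "birthday" taken from the keywords XML above.
print(timecoded_links("https://example.com/my-video", [36.25, 39.48]))
# ['https://example.com/my-video?t=36', 'https://example.com/my-video?t=39']
```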
For even more programmatic control, an underlying API drives the functionality exposed by the Azure portal and the Video Indexer. The key method, Get Video Index, returns a JSON document containing all the results of the indexing operations, and other APIs can be used to submit and process videos as well as list and search videos for specific content.
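As a rough sketch of calling Get Video Index: the endpoint shape below matches the pattern used by the Video Indexer REST API at the time of writing, but the region, account ID, video ID, and access token are all placeholders, and you should confirm the exact URL and auth flow against the current API reference:

```python
import json
import urllib.request

def video_index_url(location: str, account_id: str, video_id: str, access_token: str) -> str:
    # Assumed endpoint shape for the Get Video Index operation; verify the
    # region and authentication details in the Video Indexer API reference.
    return (
        f"https://api.videoindexer.ai/{location}/Accounts/{account_id}"
        f"/Videos/{video_id}/Index?accessToken={access_token}"
    )

url = video_index_url("trial", "<account-id>", "<video-id>", "<access-token>")
# The call itself (uncomment with real credentials) returns the full JSON
# index document: transcript, keywords, faces, sentiment, and more.
# index = json.load(urllib.request.urlopen(url))
```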
And lastly, from a user experience perspective, the insights widgets and player that you see within the Video Indexer app can be embedded and customized as part of your own application.
Pretty cool stuff. Go check it out!