Video Chapters Generation¶
Generate video chapters using Azure Content Understanding and Azure OpenAI.
%pip install -r ../requirements.txt
Load environment variables¶
from dotenv import load_dotenv
import os
load_dotenv(dotenv_path=".env", override=True)
AZURE_AI_SERVICE_ENDPOINT = os.getenv("AZURE_AI_SERVICE_ENDPOINT")
AZURE_AI_SERVICE_API_VERSION = os.getenv("AZURE_AI_SERVICE_API_VERSION", "2024-12-01-preview")
AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
AZURE_OPENAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION", "2024-08-01-preview")
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME = os.getenv("AZURE_OPENAI_CHAT_DEPLOYMENT_NAME")
AUTHENTICATION_URL = os.getenv("AUTHENTICATION_URL")
File to Analyze¶
from pathlib import Path
VIDEO_LOCATION = Path("../data/FlightSimulator.mp4")
Create a custom analyzer and submit the video to generate the description¶
The custom analyzer schema is defined in ../analyzer_templates/video_content_understanding.json. The main custom field is segmentDescription, as we need the descriptions of the video segments to feed into ChatGPT to generate the scenes and chapters. Adding transcripts helps increase the accuracy of the scene/chapter segmentation results. To get transcripts, set the returnDetails parameter in the config field to True.
In this example, we will use the utility class AzureContentUnderstandingClient to load the analyzer schema from the template file and submit it to the Azure Content Understanding service. Then, we will analyze the video to get the segment descriptions and transcripts.
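For orientation, an analyzer template for this scenario roughly follows the shape sketched below. The field and value names here are illustrative, not the exact file contents; consult ../analyzer_templates/video_content_understanding.json for the authoritative schema.

```python
import json

# Illustrative sketch of the analyzer template structure: a custom
# "segmentDescription" field plus returnDetails to get transcripts back.
# The actual template file is the source of truth.
analyzer_template = {
    "description": "Video segment description analyzer",
    "config": {
        "returnDetails": True,  # required to receive transcripts
    },
    "fieldSchema": {
        "fields": {
            "segmentDescription": {
                "type": "string",
                "description": "Detailed description of the video segment.",
            }
        }
    },
}

print(json.dumps(analyzer_template, indent=2))
```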
import sys
from pathlib import Path
import json
import uuid
# add the parent directory to the path to use shared modules
parent_dir = Path.cwd().parent
sys.path.append(str(parent_dir))
from python.content_understanding_client import AzureContentUnderstandingClient
from azure.identity import AzureCliCredential, get_bearer_token_provider
credential = AzureCliCredential()
token_provider = get_bearer_token_provider(credential, AUTHENTICATION_URL)
# The analyzer template is used to define the schema of the output
ANALYZER_TEMPLATE_PATH = "../analyzer_templates/video_content_understanding.json"
ANALYZER_ID = "video_scene_chapter" + "_" + str(uuid.uuid4()) # Unique identifier for the analyzer
# Create the Content Understanding (CU) client
cu_client = AzureContentUnderstandingClient(
endpoint=AZURE_AI_SERVICE_ENDPOINT,
api_version=AZURE_AI_SERVICE_API_VERSION,
token_provider=token_provider,
x_ms_useragent="azure-ai-content-understanding-python/video_chapter_generation", # This header is used for sample usage telemetry, please comment out this line if you want to opt out.
)
# Use the client to create an analyzer
response = cu_client.begin_create_analyzer(
ANALYZER_ID, analyzer_template_path=ANALYZER_TEMPLATE_PATH)
result = cu_client.poll_result(response)
print(json.dumps(result, indent=2))
Use the created analyzer to extract video content¶
This may take some time depending on the video length; try a short video first to get results faster.
# Submit the video for content analysis
response = cu_client.begin_analyze(ANALYZER_ID, file_location=VIDEO_LOCATION)
# Wait for the analysis to complete and get the content analysis result
video_cu_result = cu_client.poll_result(
response, timeout_seconds=3600) # 1 hour timeout for long videos
# Print the content analysis result
print(f"Video Content Understanding result: {video_cu_result}")
# Optional - Delete the analyzer if it is no longer needed
cu_client.delete_analyzer(ANALYZER_ID)
Aggregate video segments to generate video scenes¶
ChatGPT will be used to combine the segment descriptions and transcripts into scenes and to provide a concise description for each scene.
After running this step, you will have a metadata JSON file of video scenes that can be used to generate video chapters. Each scene has start and end timestamps, a short description, and the corresponding transcripts if available.
from python.utility import OpenAIAssistant, generate_scenes
# Create an OpenAI Assistant to interact with Azure OpenAI
openai_assistant = OpenAIAssistant(
aoai_end_point=AZURE_OPENAI_ENDPOINT,
aoai_api_version=AZURE_OPENAI_API_VERSION,
deployment_name=AZURE_OPENAI_CHAT_DEPLOYMENT_NAME,
aoai_api_key=None,
)
# Generate the scenes using the video segment result from Azure Content Understanding
scene_result = generate_scenes(video_cu_result, openai_assistant)
# Write the scene result to a json file
scene_output_json_file = "./scene_results.json"
with open(scene_output_json_file, "w") as f:
f.write(scene_result.model_dump_json(indent=2))
print(f"Scene result is saved to {scene_output_json_file}")
# Print the scene result for debugging purposes
print(scene_result.model_dump_json(indent=2))
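If you want to consume the saved scene metadata in a downstream script, you can load the JSON back with the standard library. The sketch below parses a minimal inline example with the overall shape described above; the field names are assumptions for illustration, and the actual keys in scene_results.json may differ.

```python
import json

# A minimal, hypothetical example of the scene metadata shape described
# above; check scene_results.json for the actual field names.
example_scene_json = """
{
  "scenes": [
    {
      "startTime": "0:00:00",
      "endTime": "0:00:42",
      "description": "Opening shot of the flight simulator cockpit.",
      "transcripts": ["Welcome aboard."]
    }
  ]
}
"""

scenes = json.loads(example_scene_json)["scenes"]
for scene in scenes:
    print(f'{scene["startTime"]} - {scene["endTime"]}: {scene["description"]}')
```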
Create video chapters¶
Create video chapters by combining the video scenes with ChatGPT. After running this step, you will have a video chapters JSON file. Each chapter has start and end timestamps, a title, and a list of the scenes that belong to the chapter.
from python.utility import generate_chapters
# Generate the chapters using the scenes result
chapter_result = generate_chapters(scene_result, openai_assistant)
# Write the chapter result to a json file
chapter_output_json_file = "./chapter_results.json"
with open(chapter_output_json_file, "w") as f:
f.write(chapter_result.model_dump_json(indent=2))
print(f"Chapter result is saved to {chapter_output_json_file}")
# Print out the chapter result for debugging purposes
print(chapter_result.model_dump_json(indent=2))
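As a final downstream example, the chapter metadata can be rendered into a simple table of contents (for instance, in the timestamp-plus-title format used by video descriptions). The snippet below uses a minimal, hypothetical chapter structure matching the description above; the actual keys in chapter_results.json may differ.

```python
# Hypothetical chapter metadata matching the description above: each
# chapter has start/end timestamps, a title, and the scenes it covers.
example_chapters = [
    {"startTime": "0:00:00", "endTime": "0:03:15",
     "title": "Introduction", "scenes": [0, 1]},
    {"startTime": "0:03:15", "endTime": "0:07:40",
     "title": "Takeoff", "scenes": [2, 3, 4]},
]

# Render a simple table of contents: "start  title"
toc_lines = [f'{c["startTime"]}  {c["title"]}' for c in example_chapters]
print("\n".join(toc_lines))
```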