
Video Annotation Summary For Action Recognition

To create a training or evaluation set for action recognition, the ground-truth start and end positions of actions in videos need to be annotated. We looked into various tools for this, and the one we liked most by far is the VGG Image Annotator (VIA), written by the Visual Geometry Group (VGG) at Oxford.

Instructions For Using VIA Tool

We now provide a few tips on how to use the VIA tool. A fully functioning live demo of the tool is available online.

Screenshot of VIA Tool

How to use the tool for action recognition:

Scripts for use with the VIA Tool

The VIA tool outputs annotations as a CSV file. Often, however, we need each annotated action to be written out as its own clip in a separate file. These clips can then serve as training examples for action recognition models. We provide some scripts to aid in the construction of such datasets.
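As a rough illustration of what such a script might do, the sketch below turns annotated segments into one `ffmpeg` command per clip. It is not one of the scripts referred to above. The CSV layout (`video,start,end,label`) is a simplified assumption and not VIA's actual export format, and the names `Segment`, `parse_segments`, and `clip_command` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class Segment:
    video: str    # path to the source video
    start: float  # action start, in seconds
    end: float    # action end, in seconds
    label: str    # action class name


def parse_segments(csv_text: str) -> List[Segment]:
    """Parse CSV text with the ASSUMED header: video,start,end,label."""
    segments = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        seg = Segment(row["video"], float(row["start"]), float(row["end"]), row["label"])
        if seg.end <= seg.start:
            raise ValueError(f"Segment does not end after it starts: {row}")
        segments.append(seg)
    return segments


def clip_command(seg: Segment, index: int, out_dir: str = "clips") -> List[str]:
    """Build an ffmpeg command that writes one annotated action to its own file.

    Clips are grouped into one sub-directory per label (an assumed layout).
    """
    out_path = Path(out_dir) / seg.label / f"{Path(seg.video).stem}_{index}.mp4"
    return [
        "ffmpeg", "-y",                      # overwrite existing output
        "-ss", f"{seg.start:.3f}",           # seek to the action start
        "-i", seg.video,
        "-t", f"{seg.end - seg.start:.3f}",  # keep only the action's duration
        out_path.as_posix(),
    ]
```

Each command can then be run with `subprocess.run(cmd, check=True)` once the output directory exists, since `ffmpeg` does not create missing directories. The sketch re-encodes the clip rather than stream-copying it, which is slower but avoids cuts snapping to the nearest keyframe.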

Annotation Tools Comparison

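Below is a list of alternative UIs for annotating actions; in our opinion, however, the VIA tool is by far the best performer. We distinguish between two annotation types: fixed-length clip annotation, where the video is split into clips of fixed length that are then labeled, and segmentation annotation, where the start and end of each action are marked directly in the video.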

See also the HACS Dataset web page for some examples showing these two types of annotations.

| Tool Name | Annotation Type | Pros | Cons | Open Source? |
| --- | --- | --- | --- | --- |
| MuViLab | Fixed-length clip annotation | <ul><li>Accelerates clip annotation by displaying many clips at the same time</li><li>Especially helpful when the actions are sparse</li></ul> | <ul><li>Not useful when the actions are very short (e.g. one second)</li></ul> | Open source on GitHub |
| VIA (VGG Image Annotator) | Segmentation annotation | <ul><li>Lightweight, no prerequisite besides downloading a zip file</li><li>Actively developed GitLab project</li><li>Support for: annotating video with high precision (to the millisecond and frame), previewing the annotated clips, exporting the start and end times of the actions to CSV, annotating multiple actions in different tracks on the same video</li><li>Easy to ramp up and use</li></ul> | <ul><li>Can be unstable, e.g. the tool sometimes becomes unresponsive</li></ul> | Open source on GitLab |
| ANVIL | Segmentation annotation | <ul><li>Support for high-precision annotation and for exporting the start and end times</li></ul> | <ul><li>Heavier prerequisites, with Java required</li><li>Harder to ramp up than VIA, with lots of specifications, etc.</li><li>Java-related issues can make the tool difficult to run</li></ul> | Not open source, but free to download |
| Action Annotation Tool | Segmentation annotation | <ul><li>Adds labels to key frames in the video</li><li>Supports high precision, down to milliseconds</li></ul> | <ul><li>Much less convenient than VIA or ANVIL</li><li>Not in active development</li></ul> | Open source on GitHub |
