Azure Batch Service

Background

Azure Batch Service is a cloud-scale job-scheduling service. Users can submit a parallel job consisting of multiple tasks with a given set of dependencies, and Azure Batch Service executes those tasks on Azure in dependency order, exploiting as much parallelism as possible between independent tasks. Batch is a popular service that manages hundreds of thousands of VMs on Azure.
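To make "dependency order with as much parallelism as possible" concrete, here is a minimal sketch, not Batch's actual scheduler, that groups tasks into waves using a level-by-level variant of Kahn's topological sort. Every task in a wave depends only on tasks in earlier waves, so tasks within one wave are independent and could run in parallel. The function name `schedule_waves` and the dependency-map input format are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def schedule_waves(tasks, deps):
    """Group tasks into waves of mutually independent tasks (hypothetical helper).

    tasks: iterable of task ids (assumed to be mutually comparable, e.g. strings)
    deps:  dict mapping a task id to the set of task ids it depends on
    Returns a list of waves; each wave is a sorted list of task ids.
    """
    tasks = list(tasks)
    # Copy each task's unmet dependencies so the caller's dict is not mutated.
    remaining = {t: set(deps.get(t, ())) for t in tasks}
    # Reverse edges: for each task, which tasks are waiting on it.
    dependents = defaultdict(list)
    for t, ds in remaining.items():
        for d in ds:
            dependents[d].append(t)

    ready = sorted(t for t, ds in remaining.items() if not ds)
    waves, scheduled = [], 0
    while ready:
        waves.append(ready)
        scheduled += len(ready)
        next_ready = []
        for t in ready:
            # "Completing" t unblocks any dependent whose last unmet dependency was t.
            for child in dependents[t]:
                remaining[child].discard(t)
                if not remaining[child]:
                    next_ready.append(child)
        ready = sorted(next_ready)

    # Tasks that never became ready must sit on a dependency cycle.
    if scheduled != len(tasks):
        raise ValueError("dependency cycle detected")
    return waves


if __name__ == "__main__":
    # a and b are independent; c needs both; d needs c.
    print(schedule_waves("abcd", {"c": {"a", "b"}, "d": {"c"}}))
```

A production scheduler would more likely start each task as soon as its own dependencies finish instead of waiting for a whole wave to complete. The waves here only show which tasks are independent of one another.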

By integrating scheduling with virtual machine (VM) management, Azure Batch Service supports auto-scaling: it spins VMs up or down according to the needs of the job. This differs from many other schedulers, such as YARN or Mesos, which must be installed on a pre-created set of VMs.

Challenge

The Batch team wanted to invest in a new microservices-based architecture that would scale reliably to meet the demands of the service. This complex, responsive design required each microservice to be able to:

Solution

The team coded three of their core microservices with Coyote, using Coyote’s state machine programming model for fully asynchronous, non-blocking computation. The team also wrote detailed functional specifications, as well as models of external services, to allow exhaustive testing of concurrent behaviors and failures. Together, these services totaled more than 100,000 lines of code.
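For readers unfamiliar with the state machine style, the following is a minimal Python sketch of the general idea. It is not Coyote’s API (Coyote is a .NET library), and the states and event names (`start`, `complete`, `vm_lost`) are hypothetical. Senders only enqueue events, so sending never blocks. The machine then handles one event at a time according to a transition table for its current state. Because all communication flows through such inboxes, a test harness can control the order in which events are delivered, which is the kind of control that systematic concurrency testing relies on.

```python
from collections import deque
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RUNNING = auto()
    DONE = auto()


# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (State.IDLE, "start"): State.RUNNING,
    (State.RUNNING, "complete"): State.DONE,
    (State.RUNNING, "vm_lost"): State.IDLE,  # e.g. go back to IDLE so the task can be retried
}


class TaskMachine:
    """A toy event-driven state machine with its own inbox."""

    def __init__(self):
        self.state = State.IDLE
        self.inbox = deque()
        self.log = []

    def send(self, event):
        # Non-blocking: the sender only enqueues; the machine handles it later.
        self.inbox.append(event)

    def step(self):
        """Handle one queued event. Returns False when the inbox is empty."""
        if not self.inbox:
            return False
        event = self.inbox.popleft()
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            # Events with no transition in the current state are ignored and logged.
            self.log.append(f"ignored {event} in {self.state.name}")
        else:
            self.state = next_state
            self.log.append(f"{event} -> {next_state.name}")
        return True


if __name__ == "__main__":
    m = TaskMachine()
    for e in ["start", "vm_lost", "start", "complete", "start"]:
        m.send(e)
    while m.step():
        pass
    print(m.state, m.log)
```

Running the usage example drives the machine through a lost VM and a retry before it finishes in `DONE`. The final `start` is ignored because `DONE` has no outgoing transitions.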

Coyote’s key advantages

The Batch team reported several key advantages of developing and testing their code using Coyote.

Read more about it in this SoCC ’21 paper.