# How to test for resilience with Azure Chaos Studio

# Prepare your app for chaos

When you run a mission-critical application, it needs to stay up and running in every circumstance. It could be that its cache is unavailable, that the database fails or that the connection to another service is temporarily blocked. The application should be able to deal with all of these circumstances. But how do you test for that? Azure Chaos Studio (opens new window) can help, by using chaos engineering and fault injection to create real-world chaos. This helps you to understand and improve your application and infrastructure resilience, to keep it up and running when things go wrong.

In this post, we'll use Azure Chaos Studio (opens new window) to target Azure Cache for Redis (opens new window) that is used by an existing website.

# Prerequisites

If you want to follow along, you'll need the following:

# Configure and use Azure Chaos Studio

You can use Azure Chaos Studio to create chaos experiments that test the resilience of your applications. To get started, we first need to make sure that the chaos resource provider is registered for our Azure subscription.

  1. Go to the Azure portal (opens new window)
  2. Click on the menu on the top left and select Subscriptions
  3. Select the Azure subscription that you want to use for Azure Chaos Studio
  4. In the subscription, navigate to the Resource providers menu
  5. In the "Filter by name..." field, enter "Chaos"
  6. You'll find the Microsoft.Chaos resource provider. If the Status is NotRegistered, select it and click Register to register the provider

(Register the Microsoft.Chaos resource provider)

Next, we need to enable targets for Chaos Studio. This enables Azure resources to be directed and monitored by Azure Chaos Studio. Some resources can be enabled directly and others, like VMs, can be enabled with agents.

  1. In the Azure portal, search for Chaos Studio and select the result. This will open Azure Chaos Studio
  2. Click on the Targets menu. Here you'll see the Azure resources that can be enabled and used by Azure Chaos Studio
  3. Select the Azure Cache for Redis that you want to use. Click on Enable targets > Enable service-direct targets
  4. Next, select the Azure Cache for Redis again. This opens the Manage actions menu which allows you to enable the capabilities that Chaos Studio has for the resource
  5. Select Azure Cache for Redis (Reboot) and click Save

(Enable Chaos Studio targets)

Now that we have targets for Chaos Studio, we can create an experiment. A chaos experiment consists of actions that are executed against targets in one or more steps.

  1. In Azure Chaos Studio, click on the Experiments menu
  2. Click Add an experiment
    1. Select a Resource Group
    2. Enter a Name for the experiment
    3. Select a Location
    4. Click Next: Experiment designer
  3. We'll create one step with one branch
    1. Click Add Action and Add Fault after that
    2. Create a new fault by selecting Azure Cache for Redis (Reboot). There are many faults that can be used by Chaos Studio. You can find a list of them here (opens new window)
    3. Select Unknown for rebootType
    4. Click Next: Target resources
  4. Next, select the Azure Cache for Redis as the target for this fault and click Add
  5. Click Create to create the experiment

(Create a Chaos Experiment)

Before we can start the experiment, we need to allow it to execute faults on the target Azure Cache for Redis. This step is built in so that we don't accidentally create chaos in the wrong applications.

  1. Navigate to the Azure Cache for Redis
  2. Click on the Access Control (IAM) menu
  3. Select Add role assignment
  4. We will assign the chaos experiment the Contributor role. So click Contributor and click Next
  5. For Assign access to, select Managed identity
  6. Click Select members
  7. In the Managed identity dropdown, select Chaos Experiment, and select the experiment that we just created
  8. Click Select to add the experiment as a member
  9. Now click Review + assign to assign the role to Azure Cache for Redis

(Add role assignment to Azure Cache for Redis)

Everything is in place to start the chaos experiment.

  1. Go back to Azure Chaos Studio and select the experiment
  2. Click Start to run the experiment
  3. After a while, the experiment results will be visible. Click on the details link to see the results. This shows the steps, branches and faults that were executed and if they succeeded or failed. In our case, the fault rebooted the Azure Cache for Redis

(The experiment ran successfully)

  1. Additionally, you can investigate the resources that are affected by the experiment. In our case, the Azure Cache for Redis is used by a website that runs in an Azure App Service Web App, which has Application Insights enabled. With that, we can see that its availability wasn't affected by the experiment, which is the outcome that we want

(The Web App availability is unaffected)

# Conclusion

Azure Chaos Studio (opens new window) is a great tool to model and execute real-world chaos for your applications by injecting many types of faults (opens new window) into your solution. You can use this to test and improve the resilience of your applications and infrastructure. Go and check it out!