What is a Service Mesh?

Before looking at Istio specifically, it is worth stepping back to understand what a service mesh is and why you might want one in the first place.

A service mesh is a dedicated infrastructure layer that manages communication between the services in your application. It sits between your workloads and the network, handling concerns such as routing, retries, encryption, identity, and telemetry without each application needing to implement those concerns itself.

The Problem a Service Mesh Solves

In a small system with one or two services, service-to-service communication is usually straightforward. As an application grows into many services - often deployed and owned by different teams - a number of cross-cutting concerns start to appear repeatedly:

How do services find each other?
How is traffic encrypted between services?
How do you confirm the identity of the caller?
How do you roll out a new version of a service safely?
How do you handle transient failures, retries, and timeouts consistently?
How do you understand which services are calling which, and how often?

Without a service mesh, each application team typically solves these problems in their own code, often using a different library or framework per language. This leads to inconsistent behaviour, duplicated effort, and policies that are hard to change centrally.

A service mesh moves these concerns out of the application and into the platform.

How a Service Mesh Works

Most service meshes share a similar architecture made up of two parts:

Data plane - lightweight network proxies that sit alongside your workloads and intercept the traffic going in and out of them. The proxies are where the actual work happens: TLS, routing decisions, retries, metrics, and so on.
Control plane - a central component that configures the proxies. Platform and application teams define policy using the mesh’s APIs, and the control plane translates that policy into proxy configuration and distributes it.

Application code usually does not need to change. From the workload’s point of view, it is still making a normal network call; the proxy transparently applies the mesh’s policies on the way out and on the way in.

flowchart LR
    subgraph Pod A
        AppA["App"] --> ProxyA["Proxy"]
    end
    subgraph Pod B
        ProxyB["Proxy"] --> AppB["App"]
    end
    Control["Control Plane"] -. config .-> ProxyA
    Control -. config .-> ProxyB
    ProxyA -->|mTLS| ProxyB
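
The split between the two planes can be sketched in a few lines of Python. This is a toy model for illustration only, not how Istio or any other mesh is implemented: the names `ProxyConfig`, `Proxy`, `ControlPlane`, `admit`, and `push` are hypothetical, and a real proxy intercepts network traffic rather than being called as a function.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class ProxyConfig:
    """Hypothetical policy pushed to a proxy; not any real mesh's schema."""
    allowed_callers: FrozenSet[str] = frozenset()
    timeout_seconds: float = 5.0


@dataclass
class Proxy:
    """Data plane: sits beside one workload and enforces the current config."""
    workload: str
    config: ProxyConfig = field(default_factory=ProxyConfig)

    def admit(self, caller: str) -> bool:
        # Rejected callers never reach the workload, so the application
        # needs no authorisation code of its own for this check.
        return caller in self.config.allowed_callers


@dataclass
class ControlPlane:
    """Control plane: holds the proxy registry and distributes policy."""
    proxies: List[Proxy] = field(default_factory=list)

    def register(self, proxy: Proxy) -> None:
        self.proxies.append(proxy)

    def push(self, workload: str, config: ProxyConfig) -> int:
        """Send config to every proxy fronting `workload`; return the count updated."""
        targets = [p for p in self.proxies if p.workload == workload]
        for proxy in targets:
            proxy.config = config
        return len(targets)
```

In this sketch, policy changes only by calling `push` on the control plane; the workload behind each `Proxy` is never modified, which is the property the architecture is designed to give you.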

What a Service Mesh Typically Provides

Although features vary between implementations, most service meshes offer capabilities in the following areas:

Traffic management - routing, traffic splitting, canary releases, header-based routing, retries, timeouts, and circuit breaking.
Security - automatic mutual TLS (mTLS) between services, workload identity, and fine-grained authorisation policies that decide which services can call which.
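Observability - consistent metrics, distributed traces, and access logs for every call between services, without each team instrumenting their own code. Tracing is the partial exception: applications generally still need to forward trace context headers so that the spans from a single request can be linked together.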
Resilience - patterns such as retries with backoff, request hedging, and outlier detection that improve the behaviour of the system under load and failure.
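
To make two of these capabilities concrete, the sketch below shows the logic behind weighted traffic splitting and retries with exponential backoff. It is a simplified illustration under stated assumptions, not the algorithm any particular mesh uses: `pick_version` and `call_with_retries` are hypothetical helper names, and only `ConnectionError` is treated as a transient failure here.

```python
import random
import time
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")


def pick_version(weights: Dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Choose a backend version in proportion to its weight (traffic splitting)."""
    rng = rng or random.Random()
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


def call_with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry a failing call, doubling the delay after each failed attempt."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            # Give up and surface the error once the last attempt has failed.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

A canary release is then a matter of changing the weights, for example from `{"v1": 100, "v2": 0}` to `{"v1": 90, "v2": 10}`. In a mesh, the proxies run this kind of logic on every request, so the weights and retry limits live in mesh configuration rather than in each service's code.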

Service Mesh vs Ingress Controller

It is easy to confuse a service mesh with an ingress controller. They are related but solve different problems:

An ingress controller is concerned with north-south traffic - requests coming into the cluster from outside.
A service mesh is primarily concerned with east-west traffic - communication between services already inside the cluster.

Many service meshes, including Istio, also provide their own gateway component that can handle north-south traffic, but the defining feature of a mesh is what it does for service-to-service traffic inside the cluster.

When a Service Mesh Makes Sense

A service mesh is most valuable when:

You have many services that communicate with each other.
You need consistent security policies, especially encryption in transit and identity-based authorisation.
You want to release changes safely using techniques such as traffic splitting and canary deployments.
You need clear visibility into how services depend on and call each other.
Multiple teams are deploying into the same platform and you want consistent behaviour across them.

A service mesh is usually not the right first step for:

Very small applications with only a handful of services.
Simple lift-and-shift workloads that do not need advanced traffic or security policies.
Teams that are still learning the basics of Kubernetes - a mesh adds another layer of concepts to understand and operate.

Tip

A service mesh is a powerful tool, but it is not free. It adds new components to operate, new resources to understand, and some runtime overhead. Adopt one when you have a concrete problem it solves, not because it is a popular pattern.

With that context in mind, the next section looks at Istio specifically - one of the most widely used service mesh implementations, and the one available as a managed add-on in AKS.