Common Kubernetes Issues

Kubernetes error messages and failure modes can be confusing, especially for newcomers. This guide covers the most common ones, explains what they mean, and walks through step-by-step solutions.

ImagePullBackOff / ErrImagePull

What it means: Kubernetes cannot pull the container image from the registry.

Common causes:

  • Image name is incorrect or doesn’t exist
  • Image registry is unreachable
  • Authentication issues with private registries
  • Network connectivity problems

How to diagnose:

# Check pod status
kubectl get pods

# Get detailed information
kubectl describe pod <pod-name>

# Check events
kubectl get events --field-selector involvedObject.name=<pod-name>

How to fix:

  1. Verify image name and tag:

    # Check if the image exists
    docker pull <image-name>
    
    # Verify the image name in your deployment
    kubectl get deployment <deployment-name> -o yaml | grep image
  2. Check image registry credentials (a declarative way to reference the secret is shown after these steps):

    # Create image pull secret for private registries
    kubectl create secret docker-registry myregistrykey \
      --docker-server=<registry-server> \
      --docker-username=<username> \
      --docker-password=<password> \
      --docker-email=<email>
    
    # Add to deployment
    kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"myregistrykey"}]}}}}'
  3. Fix image reference:

    # Update deployment with correct image
    kubectl set image deployment/<deployment-name> <container-name>=<correct-image>
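
As an alternative to the kubectl patch in step 2, the pull secret can be referenced declaratively in the deployment's pod template. A minimal fragment, assuming the secret created above is named myregistrykey:

spec:
  template:
    spec:
      imagePullSecrets:
        - name: myregistrykey
      containers:
        - name: <container-name>
          image: <registry-server>/<image-name>:<tag>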

CrashLoopBackOff

What it means: The container starts but crashes repeatedly, and Kubernetes keeps trying to restart it.

Common causes:

  • Application exits immediately due to configuration errors
  • Missing environment variables or configuration
  • Application cannot bind to specified port
  • Insufficient resources
  • Wrong command or entrypoint

How to diagnose:

# Check pod status and restart count
kubectl get pods

# Check container logs
kubectl logs <pod-name> --previous

# Get detailed pod information
kubectl describe pod <pod-name>
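
The exit code from the last crash often narrows things down (for example, 137 usually means the container was killed, typically for exceeding its memory limit). One way to read it, assuming the first container in the pod is the one crashing:

# Show the last termination state, including exit code and reason
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'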

How to fix:

  1. Check application logs:

    # Check current logs
    kubectl logs <pod-name>
    
    # Check previous container logs
    kubectl logs <pod-name> --previous
  2. Verify configuration:

    # Check environment variables
    kubectl describe pod <pod-name> | grep -A 10 "Environment"
    
    # Check mounted volumes
    kubectl describe pod <pod-name> | grep -A 10 "Mounts"
  3. Test application locally:

    # Run container locally to debug
    docker run -it <image-name> /bin/sh
  4. Check resource limits:

    # Increase resource limits in the deployment's container spec
    resources:
      limits:
        memory: "512Mi"
        cpu: "500m"
      requests:
        memory: "256Mi"
        cpu: "250m"

Pending

What it means: The pod cannot be scheduled on any node.

Common causes:

  • Insufficient resources on nodes
  • Node selector constraints not met
  • Taints and tolerations preventing scheduling
  • Persistent volume issues
  • Image pull secrets missing

How to diagnose:

# Check pod status
kubectl get pods

# Check scheduling events
kubectl describe pod <pod-name>

# Check node resources
kubectl top nodes

# Check node conditions
kubectl get nodes -o wide

How to fix:

  1. Check resource availability:

    # Check node resources
    kubectl describe nodes
    
    # Check resource quotas
    kubectl describe resourcequota -n <namespace>
  2. Review scheduling constraints (a toleration example follows these steps):

    # Check node selectors
    kubectl get pod <pod-name> -o yaml | grep -A 5 nodeSelector
    
    # Check taints and tolerations
    kubectl describe node <node-name> | grep Taints
  3. Scale cluster or adjust resources:

    # Reduce resource requests
    kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container-name>","resources":{"requests":{"cpu":"100m","memory":"128Mi"}}}]}}}}'
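
If the taint check in step 2 shows a taint the pod does not tolerate, adding a matching toleration to the pod template allows scheduling. A minimal sketch; the key, value, and effect are placeholders that must match the node's actual taint:

tolerations:
  - key: "<taint-key>"
    operator: "Equal"
    value: "<taint-value>"
    effect: "NoSchedule"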

Service and Networking Issues

Service Endpoints Not Found

What it means: Service has no endpoints, meaning no pods are backing the service.

Common causes:

  • No pods match the service selector
  • Pods are not ready
  • Label selectors don’t match

How to diagnose:

# Check service endpoints
kubectl get endpoints <service-name>

# Check service selector
kubectl describe service <service-name>

# Check pod labels
kubectl get pods --show-labels

How to fix:

  1. Verify label selectors (a relabeling example follows these steps):

    # Check service selector
    kubectl get service <service-name> -o yaml | grep -A 3 selector
    
    # Check pod labels
    kubectl get pods -l <selector-key>=<selector-value>
  2. Ensure pods are ready:

    # Check pod readiness
    kubectl get pods -o wide
    
    # Check readiness probe configuration
    kubectl describe pod <pod-name>
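
If step 1 shows that the pod labels and the service selector genuinely differ, either update the service's selector or relabel the pods. For example, with placeholder label key and value:

# Relabel an existing pod to match the service selector
kubectl label pod <pod-name> <selector-key>=<selector-value> --overwrite

# For a lasting fix, change the labels in the deployment's pod template instead
kubectl edit deployment <deployment-name>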

DNS Resolution Issues

What it means: Pods cannot resolve service names or external domains.

Common causes:

  • DNS pods not running
  • Incorrect DNS configuration
  • Network policies blocking DNS
  • Service name typos

How to diagnose:

# Check DNS pods
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Test DNS resolution from a pod
kubectl run dns-test --image=busybox --restart=Never --rm -it -- nslookup kubernetes.default.svc.cluster.local

How to fix:

  1. Check DNS pods:

    # Restart DNS pods if needed (their Deployment recreates them)
    kubectl delete pods -n kube-system -l k8s-app=kube-dns
  2. Test DNS resolution:

    # Test service resolution
    kubectl exec -it <pod-name> -- nslookup <service-name>
    
    # Test external resolution
    kubectl exec -it <pod-name> -- nslookup google.com
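
If resolution still fails, the CoreDNS logs and the pod's resolver configuration usually point to the culprit:

# Check CoreDNS logs for errors
kubectl logs -n kube-system -l k8s-app=kube-dns

# Confirm the pod is pointed at the cluster DNS service
kubectl exec -it <pod-name> -- cat /etc/resolv.conf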

Connection Refused / Connection Timeout

What it means: Network connectivity issues between pods or to services.

Common causes:

  • Incorrect port configuration
  • Application not listening on expected port
  • Network policies blocking traffic
  • Firewall rules

How to diagnose:

# Check service configuration
kubectl describe service <service-name>

# Test connectivity from within cluster
kubectl run nettest --image=busybox --restart=Never --rm -it -- wget -qO- <service-name>:<port>

# Check if the application is listening (if netstat is missing, try ss -tulpn)
kubectl exec -it <pod-name> -- netstat -tulpn

How to fix:

  1. Verify port configuration:

    # Check service ports
    kubectl get service <service-name> -o yaml
    
    # Check container ports
    kubectl get pod <pod-name> -o yaml | grep -A 5 ports
  2. Test application connectivity:

    # Port forward to test directly
    kubectl port-forward pod/<pod-name> 8080:8080
    
    # Test locally
    curl localhost:8080
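
The causes above also mention network policies; if any policy selects the target pods, only traffic it explicitly allows will get through. To check whether policies apply:

# List network policies in the namespace
kubectl get networkpolicy -n <namespace>

# See which pods and traffic a policy selects
kubectl describe networkpolicy <policy-name> -n <namespace>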

Storage Issues

VolumeMountError / VolumeAttachError

What it means: Problems mounting or attaching volumes to pods.

Common causes:

  • Persistent Volume (PV) not available
  • Incorrect volume configuration
  • Storage class issues
  • Node-specific storage problems

How to diagnose:

# Check PVC status
kubectl get pvc

# Check PV status
kubectl get pv

# Check storage class
kubectl get storageclass

# Check pod events
kubectl describe pod <pod-name>

How to fix:

  1. Check PVC and PV status:

    # Describe PVC for details
    kubectl describe pvc <pvc-name>
    
    # Check if PV is bound
    kubectl get pv -o wide
  2. Verify storage class:

    # List available storage classes
    kubectl get storageclass
    
    # Check default storage class
    kubectl get storageclass -o yaml | grep "is-default-class"
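
For attach errors on CSI-backed volumes, it can also help to inspect the VolumeAttachment objects and the events recorded against the claim (the details vary by storage driver):

# Show volume attachments and the nodes they are attached to
kubectl get volumeattachment

# Show recent events for the claim
kubectl get events --field-selector involvedObject.name=<pvc-name>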

No Persistent Volumes Available

What it means: No PV matches the PVC requirements.

Common causes:

  • No PVs with sufficient capacity
  • Access mode mismatch
  • Storage class not found
  • Node selector constraints

How to fix:

  1. Create appropriate PV:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: example-pv
    spec:
      capacity:
        storage: 1Gi
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: standard
      hostPath:
        path: /data
  2. Adjust PVC requirements:

    # Edit PVC to match available PVs
    kubectl edit pvc <pvc-name>
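
For reference, a PVC that would bind to the PV from step 1 needs a compatible access mode, the same storage class, and a request no larger than the PV's capacity. A minimal sketch with illustrative names and sizes:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 1Gi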

Resource and Quota Issues

Insufficient CPU/Memory

What it means: Pods cannot be scheduled due to resource constraints.

Common causes:

  • Resource requests exceed node capacity
  • Resource quotas exceeded
  • Resource limits too restrictive

How to diagnose:

# Check node resources
kubectl top nodes

# Check resource quotas
kubectl describe resourcequota

# Check pod resource requests
kubectl describe pod <pod-name> | grep -A 10 "Requests"

How to fix:

  1. Adjust resource requests:

    resources:
      requests:
        memory: "64Mi"
        cpu: "50m"
      limits:
        memory: "128Mi"
        cpu: "100m"
  2. Scale cluster:

    # Add more nodes (cloud specific)
    # For AKS
    az aks scale --resource-group <rg> --name <cluster> --node-count 3
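
Before scaling, it is often worth finding out which workloads consume the most resources; right-sizing a few heavy pods may be enough:

# List pods across all namespaces, heaviest memory consumers first
kubectl top pods -A --sort-by=memory

# The same, sorted by CPU
kubectl top pods -A --sort-by=cpu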

Quota Exceeded

What it means: Resource usage exceeds defined quotas.

How to diagnose:

# Check current quota usage
kubectl describe resourcequota

# Check namespace resource usage
kubectl top pods -n <namespace>

How to fix:

  1. Increase quota limits:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-quota
    spec:
      hard:
        requests.cpu: "4"
        requests.memory: 8Gi
        limits.cpu: "8"
        limits.memory: 16Gi
  2. Optimize resource usage:

    # Reduce resource requests in deployments
    kubectl patch deployment <name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container-name>","resources":{"requests":{"cpu":"100m"}}}]}}}}'
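
After adjusting the quota, apply it to the namespace and confirm the new limits are in effect (the file name quota.yaml is only an example):

# Apply the updated quota
kubectl apply -f quota.yaml -n <namespace>

# Verify the new limits and current usage
kubectl describe resourcequota compute-quota -n <namespace>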

Troubleshooting Checklist

When encountering issues, follow this systematic approach:

  1. Check Pod Status:

    kubectl get pods -o wide
  2. Review Events:

    kubectl get events --sort-by=.metadata.creationTimestamp
  3. Examine Logs:

    kubectl logs <pod-name> --previous
  4. Describe Resources:

    kubectl describe pod <pod-name>
  5. Check Resource Usage:

    kubectl top nodes
    kubectl top pods
  6. Verify Configuration:

    kubectl get <resource> -o yaml
  7. Test Connectivity:

    kubectl exec -it <pod-name> -- /bin/sh

Prevention Best Practices

  • Always set resource requests and limits
  • Use health checks (readiness and liveness probes); a sample configuration follows this list
  • Validate configurations with dry-run
  • Monitor cluster resources regularly
  • Use proper image tags (avoid ‘latest’)
  • Implement proper logging and monitoring
  • Use namespaces for resource isolation
  • Keep cluster and node images updated
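
To illustrate the health-check recommendation, a minimal probe configuration for a container might look like the following; the path and port are assumptions about the application:

readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20

Manifests can also be validated before they reach the cluster with kubectl apply --dry-run=server -f <manifest>, which reports schema and admission errors without creating anything.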

By understanding these common issues and their solutions, you’ll be better equipped to troubleshoot and maintain healthy Kubernetes environments.