Common Kubernetes Issues

Kubernetes error messages and failure modes can be confusing, especially for newcomers. This guide covers the most common ones, explains what they mean, and walks through step-by-step solutions.

ImagePullBackOff / ErrImagePull

What it means: Kubernetes cannot pull the container image from the registry.

Common causes:

  • Image name is incorrect or doesn’t exist
  • Image registry is unreachable
  • Authentication issues with private registries
  • Network connectivity problems

How to diagnose:

# Check pod status
kubectl get pods

# Get detailed information
kubectl describe pod <pod-name>

# Check events
kubectl get events --field-selector involvedObject.name=<pod-name>

How to fix:

  1. Verify image name and tag:

    # Check if the image exists
    docker pull <image-name>
    
    # Verify the image name in your deployment
    kubectl get deployment <deployment-name> -o yaml | grep image
  2. Check image registry credentials (a declarative way to reference the secret is shown after these steps):

    # Create image pull secret for private registries
    kubectl create secret docker-registry myregistrykey \
      --docker-server=<registry-server> \
      --docker-username=<username> \
      --docker-password=<password> \
      --docker-email=<email>
    
    # Add to deployment
    kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"myregistrykey"}]}}}}'
  3. Fix image reference:

    # Update deployment with correct image
    kubectl set image deployment/<deployment-name> <container-name>=<correct-image>
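
As an alternative to the kubectl patch in step 2, the pull secret can be referenced declaratively in the deployment's pod template. A minimal fragment, assuming the secret created above is named myregistrykey:

spec:
  template:
    spec:
      imagePullSecrets:
        - name: myregistrykey
      containers:
        - name: <container-name>
          image: <registry-server>/<image-name>:<tag>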

CrashLoopBackOff

What it means: The container starts but crashes repeatedly, and Kubernetes keeps trying to restart it.

Common causes:

  • Application exits immediately due to configuration errors
  • Missing environment variables or configuration
  • Application cannot bind to specified port
  • Insufficient resources
  • Wrong command or entrypoint

How to diagnose:

# Check pod status and restart count
kubectl get pods

# Check container logs
kubectl logs <pod-name> --previous

# Get detailed pod information
kubectl describe pod <pod-name>
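
The exit code from the last crash often narrows things down (for example, 137 usually means the container was killed, typically for exceeding its memory limit). One way to read it, assuming the first container in the pod is the one crashing:

# Show the last termination state, including exit code and reason
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'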

How to fix:

  1. Check application logs:

    # Check current logs
    kubectl logs <pod-name>
    
    # Check previous container logs
    kubectl logs <pod-name> --previous
  2. Verify configuration:

    # Check environment variables
    kubectl describe pod <pod-name> | grep -A 10 "Environment"
    
    # Check mounted volumes
    kubectl describe pod <pod-name> | grep -A 10 "Mounts"
  3. Test application locally:

    # Run container locally to debug
    docker run -it <image-name> /bin/sh
  4. Check resource limits:

    # Increase resource limits in the deployment's container spec
    resources:
      limits:
        memory: "512Mi"
        cpu: "500m"
      requests:
        memory: "256Mi"
        cpu: "250m"

Pending

What it means: The pod cannot be scheduled on any node.

Common causes:

  • Insufficient resources on nodes
  • Node selector constraints not met
  • Taints and tolerations preventing scheduling
  • Persistent volume issues
  • Image pull secrets missing

How to diagnose:

# Check pod status
kubectl get pods

# Check scheduling events
kubectl describe pod <pod-name>

# Check node resources
kubectl top nodes

# Check node conditions
kubectl get nodes -o wide

How to fix:

  1. Check resource availability:

    # Check node resources
    kubectl describe nodes
    
    # Check resource quotas
    kubectl describe resourcequota -n <namespace>
  2. Review scheduling constraints (a toleration example follows these steps):

    # Check node selectors
    kubectl get pod <pod-name> -o yaml | grep -A 5 nodeSelector
    
    # Check taints and tolerations
    kubectl describe node <node-name> | grep Taints
  3. Scale cluster or adjust resources:

    # Reduce resource requests
    kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container-name>","resources":{"requests":{"cpu":"100m","memory":"128Mi"}}}]}}}}'
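
If the taint check in step 2 shows a taint the pod does not tolerate, adding a matching toleration to the pod template allows scheduling. A minimal sketch; the key, value, and effect are placeholders that must match the node's actual taint:

tolerations:
  - key: "<taint-key>"
    operator: "Equal"
    value: "<taint-value>"
    effect: "NoSchedule"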

Service and Networking Issues

Service Endpoints Not Found

What it means: Service has no endpoints, meaning no pods are backing the service.

Common causes:

  • No pods match the service selector
  • Pods are not ready
  • Label selectors don’t match

How to diagnose:

# Check service endpoints
kubectl get endpoints <service-name>

# Check service selector
kubectl describe service <service-name>

# Check pod labels
kubectl get pods --show-labels

How to fix:

  1. Verify label selectors (a relabeling example follows these steps):

    # Check service selector
    kubectl get service <service-name> -o yaml | grep -A 3 selector
    
    # Check pod labels
    kubectl get pods -l <selector-key>=<selector-value>
  2. Ensure pods are ready:

    # Check pod readiness
    kubectl get pods -o wide
    
    # Check readiness probe configuration
    kubectl describe pod <pod-name>
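
If step 1 shows that the pod labels and the service selector genuinely differ, either update the service's selector or relabel the pods. For example, with placeholder label key and value:

# Relabel an existing pod to match the service selector
kubectl label pod <pod-name> <selector-key>=<selector-value> --overwrite

# For a lasting fix, change the labels in the deployment's pod template instead
kubectl edit deployment <deployment-name>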

DNS Resolution Issues

What it means: Pods cannot resolve service names or external domains.

Common causes:

  • DNS pods not running
  • Incorrect DNS configuration
  • Network policies blocking DNS
  • Service name typos

How to diagnose:

# Check DNS pods
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Test DNS resolution from a pod
kubectl run dns-test --image=busybox --restart=Never --rm -it -- nslookup kubernetes.default.svc.cluster.local

How to fix:

  1. Check DNS pods:

    # Restart DNS pods if needed (their Deployment recreates them)
    kubectl delete pods -n kube-system -l k8s-app=kube-dns
  2. Test DNS resolution:

    # Test service resolution
    kubectl exec -it <pod-name> -- nslookup <service-name>
    
    # Test external resolution
    kubectl exec -it <pod-name> -- nslookup google.com
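
If resolution still fails, the CoreDNS logs and the pod's resolver configuration usually point to the culprit:

# Check CoreDNS logs for errors
kubectl logs -n kube-system -l k8s-app=kube-dns

# Confirm the pod is pointed at the cluster DNS service
kubectl exec -it <pod-name> -- cat /etc/resolv.conf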

Connection Refused / Connection Timeout

What it means: Network connectivity issues between pods or to services.

Common causes:

  • Incorrect port configuration
  • Application not listening on expected port
  • Network policies blocking traffic
  • Firewall rules

How to diagnose:

# Check service configuration
kubectl describe service <service-name>

# Test connectivity from within cluster
kubectl run nettest --image=busybox --restart=Never --rm -it -- wget -qO- <service-name>:<port>

# Check if the application is listening (if netstat is missing, try ss -tulpn)
kubectl exec -it <pod-name> -- netstat -tulpn

How to fix:

  1. Verify port configuration:

    # Check service ports
    kubectl get service <service-name> -o yaml
    
    # Check container ports
    kubectl get pod <pod-name> -o yaml | grep -A 5 ports
  2. Test application connectivity:

    # Port forward to test directly
    kubectl port-forward pod/<pod-name> 8080:8080
    
    # Test locally
    curl localhost:8080
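
The causes above also mention network policies; if any policy selects the target pods, only traffic it explicitly allows will get through. To check whether policies apply:

# List network policies in the namespace
kubectl get networkpolicy -n <namespace>

# See which pods and traffic a policy selects
kubectl describe networkpolicy <policy-name> -n <namespace>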

Storage Issues

VolumeMountError / VolumeAttachError

What it means: Problems mounting or attaching volumes to pods.

Common causes:

  • Persistent Volume (PV) not available
  • Incorrect volume configuration
  • Storage class issues
  • Node-specific storage problems

How to diagnose:

# Check PVC status
kubectl get pvc

# Check PV status
kubectl get pv

# Check storage class
kubectl get storageclass

# Check pod events
kubectl describe pod <pod-name>

How to fix:

  1. Check PVC and PV status:

    # Describe PVC for details
    kubectl describe pvc <pvc-name>
    
    # Check if PV is bound
    kubectl get pv -o wide
  2. Verify storage class:

    # List available storage classes
    kubectl get storageclass
    
    # Check default storage class
    kubectl get storageclass -o yaml | grep "is-default-class"
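
For attach errors on CSI-backed volumes, it can also help to inspect the VolumeAttachment objects and the events recorded against the claim (the details vary by storage driver):

# Show volume attachments and the nodes they are attached to
kubectl get volumeattachment

# Show recent events for the claim
kubectl get events --field-selector involvedObject.name=<pvc-name>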

No Persistent Volumes Available

What it means: No PV matches the PVC requirements.

Common causes:

  • No PVs with sufficient capacity
  • Access mode mismatch
  • Storage class not found
  • Node selector constraints

How to fix:

  1. Create appropriate PV:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: example-pv
    spec:
      capacity:
        storage: 1Gi
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: standard
      hostPath:
        path: /data
  2. Adjust PVC requirements:

    # Edit PVC to match available PVs
    kubectl edit pvc <pvc-name>
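
For reference, a PVC that would bind to the PV from step 1 needs a compatible access mode, the same storage class, and a request no larger than the PV's capacity. A minimal sketch with illustrative names and sizes:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 1Gi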

Resource and Quota Issues

Insufficient CPU/Memory

What it means: Pods cannot be scheduled due to resource constraints.

Common causes:

  • Resource requests exceed node capacity
  • Resource quotas exceeded
  • Resource limits too restrictive

How to diagnose:

# Check node resources
kubectl top nodes

# Check resource quotas
kubectl describe resourcequota

# Check pod resource requests
kubectl describe pod <pod-name> | grep -A 10 "Requests"

How to fix:

  1. Adjust resource requests:

    resources:
      requests:
        memory: "64Mi"
        cpu: "50m"
      limits:
        memory: "128Mi"
        cpu: "100m"
  2. Scale cluster:

    # Add more nodes (cloud specific)
    # For AKS
    az aks scale --resource-group <rg> --name <cluster> --node-count 3
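
Before scaling, it is often worth finding out which workloads consume the most resources; right-sizing a few heavy pods may be enough:

# List pods across all namespaces, heaviest memory consumers first
kubectl top pods -A --sort-by=memory

# The same, sorted by CPU
kubectl top pods -A --sort-by=cpu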

Quota Exceeded

What it means: Resource usage exceeds defined quotas.

How to diagnose:

# Check current quota usage
kubectl describe resourcequota

# Check namespace resource usage
kubectl top pods -n <namespace>

How to fix:

  1. Increase quota limits:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-quota
    spec:
      hard:
        requests.cpu: "4"
        requests.memory: 8Gi
        limits.cpu: "8"
        limits.memory: 16Gi
  2. Optimize resource usage:

    # Reduce resource requests in deployments
    kubectl patch deployment <name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container-name>","resources":{"requests":{"cpu":"100m"}}}]}}}}'
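
After adjusting the quota, apply it to the namespace and confirm the new limits are in effect (the file name quota.yaml is only an example):

# Apply the updated quota
kubectl apply -f quota.yaml -n <namespace>

# Verify the new limits and current usage
kubectl describe resourcequota compute-quota -n <namespace>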

Troubleshooting Checklist

When encountering issues, follow this systematic approach:

  1. Check Pod Status:

    kubectl get pods -o wide
  2. Review Events:

    kubectl get events --sort-by=.metadata.creationTimestamp
  3. Examine Logs:

    kubectl logs <pod-name> --previous
  4. Describe Resources:

    kubectl describe pod <pod-name>
  5. Check Resource Usage:

    kubectl top nodes
    kubectl top pods
  6. Verify Configuration:

    kubectl get <resource> -o yaml
  7. Test Connectivity:

    kubectl exec -it <pod-name> -- /bin/sh

Prevention Best Practices

  • Always set resource requests and limits
  • Use health checks (readiness and liveness probes); a sample configuration follows this list
  • Validate configurations with dry-run
  • Monitor cluster resources regularly
  • Use proper image tags (avoid ‘latest’)
  • Implement proper logging and monitoring
  • Use namespaces for resource isolation
  • Keep cluster and node images updated
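
To illustrate the health-check recommendation, a minimal probe configuration for a container might look like the following; the path and port are assumptions about the application:

readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20

Manifests can also be validated before they reach the cluster with kubectl apply --dry-run=server -f <manifest>, which reports schema and admission errors without creating anything.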

By understanding these common issues and their solutions, you’ll be better equipped to troubleshoot and maintain healthy Kubernetes environments.