Common Kubernetes Issues
Kubernetes can surface a variety of error messages that are confusing, especially for newcomers. This guide covers the most common Kubernetes errors, explains what they mean, and provides step-by-step solutions to resolve them.
Pod-Related Issues
ImagePullBackOff / ErrImagePull
What it means: Kubernetes cannot pull the container image from the registry.
Common causes:
- Image name is incorrect or doesn’t exist
- Image registry is unreachable
- Authentication issues with private registries
- Network connectivity problems
How to diagnose:
# Check pod status
kubectl get pods
# Get detailed information
kubectl describe pod <pod-name>
# Check events
kubectl get events --field-selector involvedObject.name=<pod-name>
How to fix:
Verify image name and tag:
# Check if the image exists
docker pull <image-name>
# Verify the image name in your deployment
kubectl get deployment <deployment-name> -o yaml | grep image
Check image registry credentials:
# Create image pull secret for private registries
kubectl create secret docker-registry myregistrykey \
  --docker-server=<registry-server> \
  --docker-username=<username> \
  --docker-password=<password> \
  --docker-email=<email>
# Add to deployment
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"myregistrykey"}]}}}}'
Fix image reference:
# Update deployment with correct image
kubectl set image deployment/<deployment-name> <container-name>=<correct-image>
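Alternatively, declare the pull secret directly in the deployment manifest so it survives redeployments. A minimal sketch of the pod template, assuming the secret created above:
# Reference the pull secret in the pod template
spec:
  template:
    spec:
      imagePullSecrets:
        - name: myregistrykey
      containers:
        - name: <container-name>
          image: <correct-image>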
CrashLoopBackOff
What it means: The container starts but crashes repeatedly, and Kubernetes keeps trying to restart it.
Common causes:
- Application exits immediately due to configuration errors
- Missing environment variables or configuration
- Application cannot bind to specified port
- Insufficient resources
- Wrong command or entrypoint
How to diagnose:
# Check pod status and restart count
kubectl get pods
# Check container logs
kubectl logs <pod-name> --previous
# Get detailed pod information
kubectl describe pod <pod-name>
How to fix:
Check application logs:
# Check current logs
kubectl logs <pod-name>
# Check previous container logs
kubectl logs <pod-name> --previous
Verify configuration:
# Check environment variables
kubectl describe pod <pod-name> | grep -A 10 "Environment"
# Check mounted volumes
kubectl describe pod <pod-name> | grep -A 10 "Mounts"
Test application locally:
# Run container locally to debug
docker run -it <image-name> /bin/sh
Check resource limits:
# Increase resource limits in deployment
resources:
  limits:
    memory: "512Mi"
    cpu: "500m"
  requests:
    memory: "256Mi"
    cpu: "250m"
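If the container exits too quickly to inspect, a common debugging trick is to temporarily override its command so it stays up long enough to exec into. This is a sketch of a temporary workaround, not a fix; remove the override once you have diagnosed the crash:
# Keep the first container alive for an hour of debugging
kubectl patch deployment <deployment-name> --type=json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/command","value":["sleep","3600"]}]'
# Open a shell in the now-running pod
kubectl exec -it <pod-name> -- /bin/sh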
Pending
What it means: The pod cannot be scheduled on any node.
Common causes:
- Insufficient resources on nodes
- Node selector constraints not met
- Taints and tolerations preventing scheduling
- Persistent volume issues
- Image pull secrets missing
How to diagnose:
# Check pod status
kubectl get pods
# Check scheduling events
kubectl describe pod <pod-name>
# Check node resources
kubectl top nodes
# Check node conditions
kubectl get nodes -o wide
How to fix:
Check resource availability:
# Check node resources
kubectl describe nodes
# Check resource quotas
kubectl describe resourcequota -n <namespace>
Review scheduling constraints:
# Check node selectors
kubectl get pod <pod-name> -o yaml | grep -A 5 nodeSelector
# Check taints and tolerations
kubectl describe node <node-name> | grep Taints
Scale cluster or adjust resources:
# Reduce resource requests
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container-name>","resources":{"requests":{"cpu":"100m","memory":"128Mi"}}}]}}}}'
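If a taint is what blocks scheduling and the pod legitimately belongs on that node, either remove the taint or give the pod a matching toleration. A minimal sketch, assuming a taint of key=value:NoSchedule:
# Remove the taint from the node (note the trailing dash)
kubectl taint nodes <node-name> key=value:NoSchedule-
# Or add a matching toleration to the pod spec
tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"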
Service and Networking Issues
Service Endpoints Not Found
What it means: Service has no endpoints, meaning no pods are backing the service.
Common causes:
- No pods match the service selector
- Pods are not ready
- Label selectors don’t match
How to diagnose:
# Check service endpoints
kubectl get endpoints <service-name>
# Check service selector
kubectl describe service <service-name>
# Check pod labels
kubectl get pods --show-labels
How to fix:
Verify label selectors:
# Check service selector
kubectl get service <service-name> -o yaml | grep -A 3 selector
# Check pod labels
kubectl get pods -l <selector-key>=<selector-value>
Ensure pods are ready:
# Check pod readiness
kubectl get pods -o wide
# Check readiness probe configuration
kubectl describe pod <pod-name>
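If the labels and the selector disagree, fix whichever side is wrong. Labeling a running pod works for a quick test, but pods created by a deployment inherit labels from the pod template, so fix the template for a lasting change:
# Quick test: add the label the service selector expects
kubectl label pod <pod-name> <selector-key>=<selector-value>
# Lasting fix: update the labels in the deployment's pod template
kubectl edit deployment <deployment-name>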
DNS Resolution Issues
What it means: Pods cannot resolve service names or external domains.
Common causes:
- DNS pods not running
- Incorrect DNS configuration
- Network policies blocking DNS
- Service name typos
How to diagnose:
# Check DNS pods
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Test DNS resolution from a pod
kubectl run dns-test --image=busybox --rm -it -- nslookup kubernetes.default.svc.cluster.local
How to fix:
Check DNS pods:
# Restart DNS pods if needed
kubectl delete pods -n kube-system -l k8s-app=kube-dns
Test DNS resolution:
# Test service resolution
kubectl exec -it <pod-name> -- nslookup <service-name>
# Test external resolution
kubectl exec -it <pod-name> -- nslookup google.com
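If resolution still fails, the DNS server's own logs and configuration usually show why. On most clusters the DNS service is CoreDNS, whose configuration lives in a ConfigMap named coredns in kube-system:
# Check CoreDNS logs for errors
kubectl logs -n kube-system -l k8s-app=kube-dns
# Inspect the CoreDNS configuration
kubectl get configmap coredns -n kube-system -o yaml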
Connection Refused / Connection Timeout
What it means: There is a network connectivity problem between pods, or between clients and a service.
Common causes:
- Incorrect port configuration
- Application not listening on expected port
- Network policies blocking traffic
- Firewall rules
How to diagnose:
# Check service configuration
kubectl describe service <service-name>
# Test connectivity from within cluster
kubectl run nettest --image=busybox --rm -it -- wget -qO- <service-name>:<port>
# Check if application is listening
kubectl exec -it <pod-name> -- netstat -tulpn
How to fix:
Verify port configuration:
# Check service ports
kubectl get service <service-name> -o yaml
# Check container ports
kubectl get pod <pod-name> -o yaml | grep -A 5 ports
Test application connectivity:
# Port forward to test directly
kubectl port-forward pod/<pod-name> 8080:8080
# Test locally
curl localhost:8080
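Network policies can silently drop traffic even when the ports are correct, so check whether any apply to the namespaces involved:
# List network policies in the pod's namespace
kubectl get networkpolicy -n <namespace>
# Inspect a policy's ingress and egress rules
kubectl describe networkpolicy <policy-name> -n <namespace>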
Storage Issues
VolumeMountError / VolumeAttachError
What it means: Kubernetes cannot mount or attach a volume to the pod.
Common causes:
- Persistent Volume (PV) not available
- Incorrect volume configuration
- Storage class issues
- Node-specific storage problems
How to diagnose:
# Check PVC status
kubectl get pvc
# Check PV status
kubectl get pv
# Check storage class
kubectl get storageclass
# Check pod events
kubectl describe pod <pod-name>
How to fix:
Check PVC and PV status:
# Describe PVC for details
kubectl describe pvc <pvc-name>
# Check if PV is bound
kubectl get pv -o wide
Verify storage class:
# List available storage classes
kubectl get storageclass
# Check default storage class
kubectl get storageclass -o yaml | grep "is-default-class"
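If no storage class is marked as default, PVCs that omit storageClassName will never be dynamically provisioned. You can designate a default with the standard annotation:
# Mark a storage class as the cluster default
kubectl patch storageclass <storageclass-name> \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'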
No Persistent Volumes Available
What it means: No PV matches the PVC requirements.
Common causes:
- No PVs with sufficient capacity
- Access mode mismatch
- Storage class not found
- Node selector constraints
How to fix:
Create appropriate PV:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: standard
  hostPath:
    path: /data
Adjust PVC requirements:
# Edit PVC to match available PVs
kubectl edit pvc <pvc-name>
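A PVC only binds when its storage class, access modes, and requested size are all compatible with an available PV. A sketch of a claim that would bind the example PV above:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 1Gi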
Resource and Quota Issues
Insufficient CPU/Memory
What it means: Pods cannot be scheduled due to resource constraints.
Common causes:
- Resource requests exceed node capacity
- Resource quotas exceeded
- Resource limits too restrictive
How to diagnose:
# Check node resources
kubectl top nodes
# Check resource quotas
kubectl describe resourcequota
# Check pod resource requests
kubectl describe pod <pod-name> | grep -A 10 "Requests"
How to fix:
Adjust resource requests:
resources:
  requests:
    memory: "64Mi"
    cpu: "50m"
  limits:
    memory: "128Mi"
    cpu: "100m"
Scale cluster:
# Add more nodes (cloud specific)
# For AKS
az aks scale --resource-group <rg> --name <cluster> --node-count 3
Quota Exceeded
What it means: Resource usage exceeds defined quotas.
How to diagnose:
# Check current quota usage
kubectl describe resourcequota
# Check namespace resource usage
kubectl top pods -n <namespace>
How to fix:
Increase quota limits:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
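Apply the updated quota and confirm the new limits are in effect (the filename is a placeholder):
# Apply and verify the quota
kubectl apply -f compute-quota.yaml -n <namespace>
kubectl describe resourcequota compute-quota -n <namespace>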
Optimize resource usage:
# Reduce resource requests in deployments
kubectl patch deployment <name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"container","resources":{"requests":{"cpu":"100m"}}}]}}}}'
Troubleshooting Checklist
When encountering issues, follow this systematic approach:
Check Pod Status:
kubectl get pods -o wide
Review Events:
kubectl get events --sort-by=.metadata.creationTimestamp
Examine Logs:
kubectl logs <pod-name> --previous
Describe Resources:
kubectl describe pod <pod-name>
Check Resource Usage:
kubectl top nodes
kubectl top pods
Verify Configuration:
kubectl get <resource> -o yaml
Test Connectivity:
kubectl exec -it <pod-name> -- /bin/sh
Prevention Best Practices
- Always set resource requests and limits
- Use health checks (readiness and liveness probes)
- Validate configurations with dry-run (see the example after this list)
- Monitor cluster resources regularly
- Use proper image tags (avoid ‘latest’)
- Implement proper logging and monitoring
- Use namespaces for resource isolation
- Keep cluster and node images updated
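As an example of the dry-run practice above, kubectl offers both validation modes (the filename is a placeholder):
# Client-side validation: checks the manifest schema locally
kubectl apply -f deployment.yaml --dry-run=client
# Server-side validation: runs admission checks without persisting anything
kubectl apply -f deployment.yaml --dry-run=server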
By understanding these common issues and their solutions, you’ll be better equipped to troubleshoot and maintain healthy Kubernetes environments.