Kubectl Troubleshooting Tools

When working with Kubernetes, effective troubleshooting is crucial for maintaining healthy applications and clusters. The kubectl command-line tool provides a comprehensive set of commands for diagnosing and resolving issues. This guide covers the essential kubectl troubleshooting tools and techniques.

Core Troubleshooting Commands

1. kubectl describe

The describe command provides detailed information about Kubernetes resources, including events, status, and configuration.

Syntax

kubectl describe <resource-type> <resource-name>
kubectl describe <resource-type>/<resource-name>

Common Examples

# Describe a specific pod
kubectl describe pod my-app-pod

# Describe all pods in the current namespace
kubectl describe pods

# Describe a pod in a specific namespace
kubectl describe pod my-app-pod -n production

# Describe a deployment
kubectl describe deployment my-app

# Describe a service
kubectl describe service my-service

# Describe a node
kubectl describe node worker-node-1

Key Information Provided

Resource metadata: Name, namespace, labels, annotations
Spec configuration: Desired state and configuration
Status: Current state and conditions
Events: Recent events related to the resource
Volumes: Mounted volumes and their status
Network: IP addresses, ports, and endpoints

Example Output Analysis

kubectl describe pod failing-pod

Look for:

Status: Running, Pending, Failed, etc.
Restart Count: A high or climbing count means the container keeps crashing or failing its liveness probe
Events: Error messages, scheduling issues, image pull problems
Conditions: Ready, Initialized, PodScheduled status
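When the describe output is long, the Events section is often the quickest part to read first. The following is a minimal sketch that prints only that section. The function name pod_events is hypothetical, and the sketch assumes the output contains a line starting with "Events:", as current kubectl versions print.

```shell
# pod_events: print only the Events section of `kubectl describe pod`.
# Extra arguments (for example -n production) are passed through to kubectl.
pod_events() {
  kubectl describe pod "$@" | awk '/^Events:/ { show = 1 } show'
}
```

For example, pod_events failing-pod -n production prints the "Events:" line and everything after it.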

2. kubectl logs

The logs command retrieves container logs, essential for debugging application issues.

Syntax

kubectl logs <pod-name>
kubectl logs <pod-name> -c <container-name>

Advanced Log Options

# Get logs from the previous container instance
kubectl logs my-pod --previous

# Follow logs in real-time
kubectl logs my-pod -f

# Get logs from the last hour
kubectl logs my-pod --since=1h

# Get logs since a specific time (RFC 3339 timestamp)
kubectl logs my-pod --since-time=2024-01-01T10:00:00Z

# Get last 100 lines
kubectl logs my-pod --tail=100

# Get logs from all containers in a pod
kubectl logs my-pod --all-containers=true

# Get logs from a specific container in a multi-container pod
kubectl logs my-pod -c sidecar-container

# Get logs from all pods with a specific label
# (with a selector, only the last 10 lines per pod are shown unless --tail is set)
kubectl logs -l app=my-app

# Get logs with timestamps
kubectl logs my-pod --timestamps

Debugging with Logs

# Check for errors in application logs
kubectl logs my-app-pod | grep -i error

# Monitor logs continuously, showing only lines that mention a failure
kubectl logs my-app-pod -f | grep -iE "exception|error|fail"

# Save logs to a file for analysis
kubectl logs my-app-pod > app-logs.txt
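For a crashing pod, the logs of the previous container instance are usually more useful than the current ones. The sketch below saves both to files. The function name save_pod_logs and the file naming scheme are assumptions made for illustration.

```shell
# save_pod_logs: write current and, if present, previous container logs to files.
# Usage: save_pod_logs <pod-name> [extra kubectl args, e.g. -n production]
save_pod_logs() {
  pod=$1
  shift
  kubectl logs "$pod" --timestamps "$@" > "${pod}.log" || return 1
  # --previous fails when the container has never restarted; treat that as normal.
  if ! kubectl logs "$pod" --previous --timestamps "$@" > "${pod}.previous.log" 2>/dev/null; then
    rm -f "${pod}.previous.log"
  fi
  return 0
}
```

Running save_pod_logs my-app-pod leaves my-app-pod.log in the current directory, plus my-app-pod.previous.log if the container has restarted at least once.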

3. kubectl exec

The exec command allows you to execute commands inside running containers for interactive debugging.

Syntax

kubectl exec <pod-name> -- <command>
kubectl exec -it <pod-name> -- <shell>

Common Debugging Commands

# Start an interactive shell session (fall back to /bin/sh if the image has no bash)
kubectl exec -it my-pod -- /bin/bash
kubectl exec -it my-pod -- /bin/sh

# Execute a specific command
kubectl exec my-pod -- ls -la /app

# Check processes running in the container
kubectl exec my-pod -- ps aux

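# Check network connectivity (the image must include ping and nslookup;
# -c 3 stops ping after three packets instead of running until interrupted)
kubectl exec my-pod -- ping -c 3 google.com
kubectl exec my-pod -- nslookup kubernetes.default.svc.cluster.local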

# Check disk usage
kubectl exec my-pod -- df -h

# Check environment variables
kubectl exec my-pod -- env

# Test application endpoints
kubectl exec my-pod -- curl localhost:8080/health

# For multi-container pods, specify the container
kubectl exec -it my-pod -c my-container -- /bin/bash

Network Troubleshooting Examples

# Test DNS resolution
kubectl exec my-pod -- nslookup my-service

# Check if a service is reachable
kubectl exec my-pod -- curl my-service:80

# Test external connectivity
kubectl exec my-pod -- wget -qO- http://httpbin.org/ip

# Check listening ports
kubectl exec my-pod -- netstat -tulpn
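The DNS and reachability checks above can be combined into one helper, sketched here under a few assumptions: the function name check_from_pod is hypothetical, the target speaks plain HTTP, and the container image ships nslookup and wget. A failure may therefore mean that a tool is missing rather than that the network is broken.

```shell
# check_from_pod: test DNS and HTTP reachability of a host from inside a pod.
# Usage: check_from_pod <pod-name> <host> <port>
check_from_pod() {
  pod=$1 host=$2 port=$3
  if kubectl exec "$pod" -- nslookup "$host" >/dev/null 2>&1; then
    echo "dns: ok"
  else
    echo "dns: FAILED"
    return 1
  fi
  # -T 5 gives up after five seconds instead of hanging on a dropped connection.
  if kubectl exec "$pod" -- wget -q -T 5 -O /dev/null "http://${host}:${port}/" >/dev/null 2>&1; then
    echo "http: ok"
  else
    echo "http: FAILED"
    return 1
  fi
}
```

A call such as check_from_pod my-pod my-service 80 separates name resolution problems from connection problems, which narrows down where to look next.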

4. kubectl get events

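Events record what is happening in your cluster, such as scheduling decisions, image pulls, and probe failures. By default the API server retains them for only one hour, and kubectl get events does not guarantee chronological order unless you sort the output.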

Syntax

kubectl get events
kubectl get events --sort-by=.metadata.creationTimestamp

Filtering Events

# Get events for a specific namespace
kubectl get events -n production

# Get events that involve pods only
kubectl get events --field-selector involvedObject.kind=Pod

# Get events for a specific resource
kubectl get events --field-selector involvedObject.name=my-pod

# Get warning events only (event types are Normal and Warning)
kubectl get events --field-selector type!=Normal

# Sort events by timestamp
kubectl get events --sort-by='.lastTimestamp'

# Watch events in real-time
kubectl get events --watch

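Event Reasons and Pod Statuses to Watch For

FailedScheduling: Pod cannot be scheduled
FailedMount: Volume mount failures
ImagePullBackOff: Container image pull issues (a pod status; the matching events usually have reason Failed or BackOff)
CrashLoopBackOff: Container keeps crashing (a pod status; the matching events usually have reason BackOff)
NetworkNotReady: Network configuration issues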
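Counting events by their reason field gives a quick picture of which of these problems dominates. A minimal sketch follows; the function name event_reasons is hypothetical.

```shell
# event_reasons: count events per reason, most frequent first.
# Extra arguments (for example -n production or --all-namespaces) go to kubectl.
event_reasons() {
  kubectl get events "$@" -o jsonpath='{range .items[*]}{.reason}{"\n"}{end}' |
    awk 'NF { count[$0]++ } END { for (r in count) print count[r], r }' |
    sort -rn
}
```

Adding --field-selector type=Warning to the call restricts the count to warnings.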

5. kubectl get with Wide Output

The -o wide output format adds columns to kubectl get, such as the pod IP and node name for pods, or the internal IP, OS image, and kernel version for nodes.

# Get pods with additional information
kubectl get pods -o wide

# Get nodes with detailed information
kubectl get nodes -o wide

# Get services with their label selectors
kubectl get services -o wide

# Get the common workload resources in the current namespace
# (get all omits types such as ConfigMaps, Secrets, and Ingresses)
kubectl get all -o wide

6. kubectl top

The top command shows current CPU and memory usage. It requires the Metrics Server to be installed in the cluster.

# Get CPU and memory usage for nodes
kubectl top nodes

# Get CPU and memory usage for pods
kubectl top pods

# Get resource usage for pods in a specific namespace
kubectl top pods -n production

# Sort by CPU usage
kubectl top pods --sort-by=cpu

# Sort by memory usage
kubectl top pods --sort-by=memory

7. kubectl debug

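The debug command helps you troubleshoot workloads that you cannot inspect with exec, for example crashed containers or minimal images without a shell. It can add an ephemeral debug container to a running pod, create a modified copy of a pod, or start a debugging pod on a node.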

Syntax

# Add an ephemeral debug container to a running pod and attach to it
kubectl debug -it <pod-name> --image=<debug-image>

# Create a copy of a pod with an added debug container
kubectl debug -it <pod-name> --image=<debug-image> --copy-to=<new-pod-name>
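Two further uses of kubectl debug are worth a sketch: targeting a specific container so that the debug container can see its processes, and debugging a node. The function names and the busybox:1.36 default image are assumptions. The --target flag also depends on support from the container runtime.

```shell
# debug_container: attach an ephemeral debug container that shares the
# process namespace of an existing container in a running pod.
# Usage: debug_container <pod-name> <target-container> [image]
debug_container() {
  kubectl debug -it "$1" --image="${3:-busybox:1.36}" --target="$2"
}

# debug_node: start a pod on a node with the node's filesystem mounted at /host.
# Usage: debug_node <node-name> [image]
debug_node() {
  kubectl debug "node/$1" -it --image="${2:-busybox:1.36}"
}
```

The pod created by debug_node is not removed automatically, so delete it once the session is finished.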

Advanced Troubleshooting Techniques

Resource Status Checking

# Check the status of common workload resources in every namespace
kubectl get all --all-namespaces

# List pods in the Failed phase
kubectl get pods --field-selector=status.phase=Failed

# List pods whose phase is not Running
# (includes Succeeded pods; a Running pod can still be unready)
kubectl get pods --field-selector=status.phase!=Running

# List persistent volume claims
kubectl get pvc -o wide
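The pod phase does not say whether every container is ready, so a phase filter can miss a Running pod that is failing its readiness probe. A minimal sketch that compares the two numbers in the READY column instead; the function name unready_pods is hypothetical.

```shell
# unready_pods: list pods whose READY column shows fewer ready containers
# than total containers, skipping pods that have completed.
# Relies on the default column layout (NAME READY STATUS ...); do not pass
# --all-namespaces, which adds a leading NAMESPACE column.
unready_pods() {
  kubectl get pods --no-headers "$@" |
    awk '{ split($2, r, "/"); if (r[1] != r[2] && $3 != "Completed") print $1, $2, $3 }'
}
```

For example, unready_pods -n production prints the name, ready count, and status of each pod that is not fully ready.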

Configuration Debugging

# Get the YAML configuration of a resource
kubectl get pod my-pod -o yaml

# Get the JSON configuration
kubectl get pod my-pod -o json

# Explain resource fields
kubectl explain pod.spec.containers

# Validate a YAML file without applying it
# (client-side checks only; use --dry-run=server to validate against the API server)
kubectl apply --dry-run=client -f my-config.yaml
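A server-side dry run combined with kubectl diff shows both whether the API server would accept a manifest and what it would change. The sketch below wraps the two; the function name preflight and its messages are made up for illustration.

```shell
# preflight: validate a manifest against the API server and report whether
# applying it would change anything.
# Usage: preflight <manifest.yaml>
preflight() {
  kubectl apply --dry-run=server -f "$1" || return 1
  # kubectl diff exits 0 when nothing would change, 1 when there are
  # differences, and with a higher code on error.
  rc=0
  kubectl diff -f "$1" || rc=$?
  case $rc in
    0) echo "no changes" ;;
    1) echo "changes pending" ;;
    *) echo "diff failed"; return 1 ;;
  esac
}
```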

Port Forwarding for Testing

# Forward a local port to a pod
kubectl port-forward pod/my-pod 8080:80

# Forward to a service
kubectl port-forward service/my-service 8080:80

# Forward to a deployment
kubectl port-forward deployment/my-app 8080:80
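Port forwarding blocks the terminal, so a one-off check is easier with a helper that opens the tunnel, sends one request, and cleans up. This is only a sketch: the function name probe_via_port_forward is hypothetical, curl is assumed to be installed locally, and the fixed two-second wait is a crude guess at how long the tunnel takes to come up.

```shell
# probe_via_port_forward: forward a local port, send one HTTP request, clean up.
# Usage: probe_via_port_forward <target> <local-port> <remote-port> [path]
probe_via_port_forward() {
  kubectl port-forward "$1" "$2:$3" >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2
  rc=0
  # -f makes curl return a non-zero status on HTTP errors (4xx and 5xx).
  curl -fsS --max-time 5 "http://localhost:$2${4:-/}" || rc=$?
  kill "$pf_pid" 2>/dev/null || true
  return "$rc"
}
```

For example, probe_via_port_forward service/my-service 8080 80 /health prints the response body and returns curl's exit status.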

Common Troubleshooting Scenarios

Scenario 1: Pod Won’t Start

# Check pod status
kubectl get pods

# Get detailed information
kubectl describe pod failing-pod

# Check events
kubectl get events --field-selector involvedObject.name=failing-pod

# Check logs if container started
kubectl logs failing-pod
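If the container never started, there are no logs to read, and the container status fields usually hold the reason. A minimal sketch that prints them per container; the function name container_states is hypothetical.

```shell
# container_states: for each container print its name, the reason it is
# waiting (if any), and the reason it last terminated (if any), tab-separated.
# Usage: container_states <pod-name> [extra kubectl args]
container_states() {
  kubectl get pod "$@" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.waiting.reason}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}
```

Typical values are ImagePullBackOff or CrashLoopBackOff in the second column and Error or OOMKilled in the third. Empty columns mean the field is not set.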

Scenario 2: Service Not Accessible

# Check service configuration
kubectl describe service my-service

# Check endpoints
kubectl get endpoints my-service

# Verify pods are running and ready
kubectl get pods -l app=my-app

# Test from within the cluster (--restart=Never stops the pod from restarting after wget exits)
kubectl run test-pod --image=busybox --restart=Never --rm -it -- wget -qO- my-service:80
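A service with no ready endpoints is one of the most common causes here, usually because its selector does not match the pod labels or the pods are failing their readiness probes. The sketch below prints both sides for comparison; the function name service_backends is hypothetical.

```shell
# service_backends: print a service's selector and the ready pod IPs behind it.
# Usage: service_backends <service-name> [extra kubectl args]
service_backends() {
  svc=$1
  shift
  echo "selector: $(kubectl get service "$svc" "$@" -o jsonpath='{.spec.selector}')"
  ips=$(kubectl get endpoints "$svc" "$@" -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -n "$ips" ]; then
    echo "ready endpoints: $ips"
  else
    echo "ready endpoints: none (check pod labels and readiness probes)"
  fi
}
```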

Scenario 3: High Resource Usage

# Check resource usage
kubectl top pods

# Check resource limits
kubectl describe pod high-usage-pod

# Check for resource quotas
kubectl describe resourcequota
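Usage figures mean more when read next to the configured requests and limits. A minimal sketch that prints both for one pod follows. The function name usage_vs_limits is hypothetical, and the first command needs the Metrics Server.

```shell
# usage_vs_limits: show live per-container usage, then each container's
# configured requests and limits.
# Usage: usage_vs_limits <pod-name> [extra kubectl args]
usage_vs_limits() {
  kubectl top pod "$@" --containers
  kubectl get pod "$@" -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests}{"\t"}{.resources.limits}{"\n"}{end}'
}
```

Memory usage close to its limit suggests that the container is at risk of being OOM-killed. CPU usage at its limit suggests throttling.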

Scenario 4: Networking Issues

# Check DNS resolution
kubectl exec test-pod -- nslookup kubernetes.default.svc.cluster.local

# Check network policies
kubectl get networkpolicies

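# Test pod-to-pod communication (some network setups drop ICMP, so a failed ping
# alone does not prove that TCP traffic is blocked)
kubectl exec pod1 -- ping -c 3 <pod2-ip>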

Troubleshooting Cheat Sheet

Quick Commands for Common Issues

# Pods whose status is not Running (also lists Completed pods)
kubectl get pods --all-namespaces | grep -v Running

# Recent events
kubectl get events --sort-by=.metadata.creationTimestamp --all-namespaces | tail -20

# Resource usage
kubectl top pods --all-namespaces --sort-by=memory

# Failed pods
kubectl get pods --field-selector=status.phase=Failed --all-namespaces

# Pending pods
kubectl get pods --field-selector=status.phase=Pending --all-namespaces

# Node status
kubectl get nodes -o wide

# System pods status
kubectl get pods -n kube-system

Troubleshooting Workflow

1. Identify the Problem: Start with kubectl get commands to see the current state
2. Gather Information: Use kubectl describe to get detailed information
3. Check Events: Look for recent events that might explain the issue
4. Examine Logs: Check application and system logs for errors
5. Interactive Debugging: Use kubectl exec to investigate inside containers
6. Test Connectivity: Use port-forwarding and exec to test network connectivity
7. Monitor Resources: Check CPU, memory, and storage usage
8. Validate Configuration: Ensure configurations are correct and applied
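The first steps of this workflow can be sketched as one helper that gathers state, events, and logs for a single pod. The function name triage_pod, the section labels, and the 50-line tail are assumptions, and the namespace falls back to default when none is given.

```shell
# triage_pod: run the first workflow steps for one pod and print each result
# under a labelled heading.
# Usage: triage_pod <pod-name> [namespace]
triage_pod() {
  pod=$1
  ns=${2:-default}
  echo "== state =="
  kubectl get pod "$pod" -n "$ns" -o wide
  echo "== events =="
  kubectl get events -n "$ns" --field-selector "involvedObject.name=$pod" --sort-by=.lastTimestamp
  echo "== logs (last 50 lines) =="
  kubectl logs "$pod" -n "$ns" --all-containers=true --tail=50
  echo "== previous logs (last 50 lines) =="
  # --previous fails when no container has restarted; report that instead.
  kubectl logs "$pod" -n "$ns" --all-containers=true --tail=50 --previous 2>/dev/null ||
    echo "(no previous container instance)"
}
```

Redirecting the output to a file, for example triage_pod failing-pod production > triage.txt, gives a snapshot to attach to an incident report before moving on to interactive debugging.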

By mastering these kubectl troubleshooting tools and techniques, you’ll be well-equipped to diagnose and resolve issues in your Kubernetes environments efficiently.