How do I debug Kubernetes ingress controllers for HTTP 502 or 504 errors?

Debugging HTTP 502 or 504 errors from a Kubernetes Ingress controller calls for a systematic approach to identify the root cause. The controller acts as a reverse proxy, and these status codes typically indicate a communication problem between it and the backend Services or pods. Here’s a step-by-step guide to troubleshoot these errors:

1. Understand HTTP 502 and 504 Errors

HTTP 502 (Bad Gateway): The Ingress controller tried to reach the backend but got an invalid response or none at all, for example because the connection was refused, reset, or closed before a complete response arrived.
HTTP 504 (Gateway Timeout): The Ingress controller did not receive a response from the backend within its configured timeout.

2. Check the Ingress Resource Configuration

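Inspect the Ingress manifest:

    kubectl describe ingress <ingress-name>

Ensure that:
- The host and path rules match your desired configuration.
- The backend service name and port are correctly defined (backend.service.name and backend.service.port in networking.k8s.io/v1; serviceName and servicePort in the older v1beta1 APIs).
Verify the annotations (e.g., for timeouts, load balancing, etc.), as these can affect behavior. The examples below are specific to the NGINX Ingress controller (ingress-nginx), and the values are in seconds:

    annotations:
      nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
      nginx.ingress.kubernetes.io/proxy-read-timeout: "60"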
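
To see at a glance which Service and port each path routes to, the backend references can be pulled out with a JSONPath query. This is a minimal sketch that assumes a networking.k8s.io/v1 Ingress; the function name `list_ingress_backends` and its default namespace are illustrative, not part of kubectl.

```bash
# Print "path -> service:port" for every path of an Ingress.
# Assumes the networking.k8s.io/v1 schema (backend.service.*).
# A backend that references a named port prints the name instead of a number.
list_ingress_backends() {
  local ingress="$1" namespace="${2:-default}"
  kubectl get ingress "$ingress" -n "$namespace" \
    -o jsonpath='{range .spec.rules[*]}{range .http.paths[*]}{.path}{" -> "}{.backend.service.name}{":"}{.backend.service.port.number}{.backend.service.port.name}{"\n"}{end}{end}'
}
```

Calling `list_ingress_backends my-ingress my-namespace` (both names are placeholders) prints one line per path, which can then be compared against the Service definition in the next step.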

3. Check the Backend Service

Verify the associated service configuration:

    kubectl get service <service-name> -o yaml

Ensure:
- The service type (ClusterIP, NodePort, etc.) is appropriate.
- The service port matches the one configured in the Ingress resource.
- The service targetPort matches the port the application container actually listens on.
If the service uses a selector, ensure it matches the labels of the target pods.
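
A quick way to confirm that the selector actually matches ready pods is to check whether the Service has any endpoint addresses. The helper below is a sketch; `service_has_endpoints` is a hypothetical name, and it relies on the Endpoints resource, which newer clusters complement with EndpointSlices.

```bash
# Succeeds (exit 0) if the Service has at least one ready endpoint address.
# An empty result usually means the selector matches no ready pods.
service_has_endpoints() {
  local service="$1" namespace="${2:-default}"
  local addresses
  addresses="$(kubectl get endpoints "$service" -n "$namespace" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')"
  [ -n "$addresses" ]
}
```

For example, `service_has_endpoints my-service my-namespace || echo "no ready endpoints"` flags a Service that the Ingress controller has nothing to route to.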

4. Inspect the Backend Pods

Check the status of the backend pods:

    kubectl get pods -l <label-selector>

Ensure:
- The pods are running and ready.
- The pods’ containers are healthy (check readiness/liveness probes).
View pod logs to identify any issues:

    kubectl logs <pod-name>

If the application is exposing a specific port, test connectivity within the cluster (this requires curl to be available in the container image):

    kubectl exec -it <pod-name> -- curl http://<service-name>:<service-port>
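
Readiness and restart counts are easier to compare across pods when they are printed side by side. The following sketch wraps `kubectl get pods` with custom columns; the function name `pod_health_summary` is an assumption made for illustration.

```bash
# List each matching pod with per-container readiness and restart counts.
# Pods that keep restarting or report READY=false often line up with
# intermittent 502 responses.
pod_health_summary() {
  local selector="$1" namespace="${2:-default}"
  kubectl get pods -l "$selector" -n "$namespace" \
    -o custom-columns='NAME:.metadata.name,READY:.status.containerStatuses[*].ready,RESTARTS:.status.containerStatuses[*].restartCount'
}
```

If a pod shows restarts, `kubectl logs <pod-name> --previous` prints the logs of the container instance that exited, which is often where the crash reason appears.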

5. Inspect the Ingress Controller

Get logs for the Ingress controller pod:

    kubectl logs -n <ingress-namespace> <ingress-controller-pod>

Look for errors or warnings related to the HTTP 502/504 responses.
Check the Ingress controller’s deployment and configuration:

    kubectl describe deployment -n <ingress-namespace> <ingress-controller-deployment>
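
When scanning the controller logs, it can help to sort upstream error messages by the status code they usually produce. The sketch below assumes an NGINX-based controller and matches substrings of common NGINX upstream error messages; `classify_upstream_error` is a hypothetical helper, and the mapping is a heuristic, since retries against other upstreams can change the final status.

```bash
# Map an NGINX error-log line to the status code it most likely produced.
# Each pattern is a substring of a message NGINX logs for upstream failures.
classify_upstream_error() {
  case "$1" in
    *"upstream timed out"*) echo "504" ;;
    *"Connection refused"*) echo "502" ;;
    *"Connection reset by peer"*) echo "502" ;;
    *"upstream prematurely closed connection"*) echo "502" ;;
    *"no live upstreams"*) echo "502" ;;
    *) echo "unknown" ;;
  esac
}
```

Piping the controller logs through a `while IFS= read -r line` loop that calls this function gives a rough count of timeout-related versus connection-related failures.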

6. Check Networking and DNS

Ensure the DNS resolution is working correctly for the backend services:

    kubectl exec -it <pod-name> -- nslookup <service-name>

Test connectivity from the Ingress controller pod to the backend service. The controller usually runs in a different namespace from the backend, so use the namespace-qualified service name:

    kubectl exec -n <ingress-namespace> <ingress-controller-pod> -- curl http://<service-name>.<service-namespace>:<service-port>
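
Short service names only resolve from pods in the same namespace, so cross-namespace tests are more reliable with the fully qualified name. This small sketch builds it; `service_fqdn` is an illustrative name, and `cluster.local` is only the default cluster domain, which may differ in your cluster.

```bash
# Build the fully qualified in-cluster DNS name of a Service.
# cluster.local is the default cluster domain; pass a third argument
# if your cluster uses a different one.
service_fqdn() {
  local service="$1" namespace="$2" domain="${3:-cluster.local}"
  printf '%s.%s.svc.%s\n' "$service" "$namespace" "$domain"
}

# Example with placeholder names:
#   nslookup "$(service_fqdn web prod)"   # looks up web.prod.svc.cluster.local
```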

7. Verify Load Balancer and Firewall Rules

If using an external load balancer:
- Check that it is forwarding traffic to the Ingress controller.
- Ensure health checks on the load balancer are passing.
Verify any firewall or network policies that might block traffic between the Ingress controller and the backend services.

8. Check Timeouts

HTTP 504 errors occur when the backend takes longer to respond than the timeouts configured on the Ingress controller. If the backend legitimately needs more time, increase the timeout values (in seconds) in the Ingress annotations, for example:

    annotations:
      nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
      nginx.ingress.kubernetes.io/proxy-send-timeout: "120"

Check the application’s processing time to ensure it can respond within the timeout window.
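
To tell whether a 504 really comes from a proxy timeout, it helps to measure how long the request takes and compare that with the configured value. The sketch below uses curl's `-w` timing output; the function names `time_request` and `hit_timeout` are assumptions for illustration.

```bash
# Measure a request end to end; prints "<status> <seconds>".
time_request() {
  curl -s -o /dev/null -w '%{http_code} %{time_total}\n' "$1"
}

# Succeeds if a measured duration (seconds, may be fractional)
# is at or above the configured timeout.
hit_timeout() {
  local seconds="$1" timeout="$2"
  awk -v s="$seconds" -v t="$timeout" 'BEGIN { exit !(s >= t) }'
}
```

A 504 whose duration sits right at the proxy-read-timeout value suggests the backend is simply slow, whereas a much faster failure points elsewhere, such as a load balancer in front of the controller.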

9. Debug with Tools

Use kubectl port-forward to directly access the backend service and verify its behavior. The port-forward command keeps running, so issue the curl from a second terminal:

    kubectl port-forward svc/<service-name> <local-port>:<service-port>
    curl http://localhost:<local-port>

Use curl with detailed output to see the response headers and status:

    curl -v http://<ingress-host>/<path>

10. Inspect Custom Configurations

If you are using an Ingress controller other than ingress-nginx (e.g., Traefik, HAProxy) or a service mesh gateway such as Istio, review its configuration and logs.
Some controllers may require specific annotations or CRDs to behave as expected.

11. Monitor Metrics

Enable monitoring for the Ingress controller using tools like Prometheus and Grafana.
Look for metrics that indicate high latency, connection errors, or dropped requests.

12. Test with Simplified Configuration

Create a minimal Ingress resource and a simple backend service (e.g., an nginx pod) to rule out complex configurations as the cause of the issue.
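
One possible shape for such a minimal test is sketched below. It generates a networking.k8s.io/v1 Ingress manifest for an existing Service; the function name, the `-minimal` name suffix, the example host, and the `nginx` ingress class are all assumptions to adapt to your cluster.

```bash
# Print a minimal Ingress manifest that routes "/" on one host to one Service.
# Arguments: host, service name, service port (default 80),
# ingress class (default "nginx", an assumption).
minimal_ingress_manifest() {
  local host="$1" service="$2" port="${3:-80}" class="${4:-nginx}"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${service}-minimal
spec:
  ingressClassName: ${class}
  rules:
  - host: ${host}
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: ${service}
            port:
              number: ${port}
EOF
}

# Example with placeholder names (not run here):
#   kubectl create deployment debug-web --image=nginx
#   kubectl expose deployment debug-web --port=80
#   minimal_ingress_manifest debug.example.com debug-web | kubectl apply -f -
```

If this bare setup works while the real Ingress fails, the difference between the two manifests (annotations, paths, ports) is the place to look.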

13. Check for Known Issues

Review the documentation and GitHub issues for your specific Ingress controller (e.g., NGINX, Traefik, HAProxy) for any known bugs or limitations.

14. Work with Load Balancer Logs

If using a cloud provider’s load balancer, inspect its logs for errors or misconfigurations.

By systematically following these steps, you should be able to identify and resolve the root cause of the HTTP 502 or 504 errors in your Kubernetes Ingress setup.