Troubleshooting Guide
How to troubleshoot ESB3027 AgileTV CDN Manager
This guide helps diagnose common issues with the acd-manager deployment and its associated pods.
1. Check Pod Status
Verify all pods are running:
kubectl get pods
Expected:
- Most pods should be in the Running state with READY as 1/1 or 2/2.
- Pods showing 0/1 or 0/2 are not fully ready, which indicates a potential issue.
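The READY check can be scripted when the pod list is long. The following is a minimal sketch, not part of the product: `unready_pods` is a hypothetical helper name, and it assumes the default column order of `kubectl get pods` (NAME, READY, STATUS, ...). Pods in the Completed state are skipped because finished jobs legitimately show 0/1.

```shell
# Hypothetical helper: list pods whose READY count is below the container total.
# Assumes the default `kubectl get pods` columns: NAME READY STATUS RESTARTS AGE.
unready_pods() {
  kubectl get pods --no-headers "$@" | awk '
    {
      split($2, r, "/")                 # READY looks like "1/2"
      if ($3 != "Completed" && r[1] != r[2])
        print $1, $2, $3                # name, ready count, status
    }'
}
```

Extra arguments are passed through to kubectl, so `unready_pods -n <namespace>` limits the check to one namespace.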
2. Investigate Unready or Failed Pods
Example:
kubectl describe pod acd-manager-6c85ddd747-rdlg6
- Look for events such as CrashLoopBackOff, ImagePullBackOff, or ErrImagePull.
- Check container statuses for error messages.
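When only the container state is of interest, the waiting reason can be read directly instead of scanning the full describe output. This is a sketch under the assumption that the pod has regular (non-init) containers; `pod_waiting_reasons` is a hypothetical helper name.

```shell
# Hypothetical helper: print each container name and its waiting reason for one pod.
# The reason column is empty for containers that are not in a waiting state.
pod_waiting_reasons() {
  kubectl get pod "$1" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.waiting.reason}{"\n"}{end}'
}
```

For example, `pod_waiting_reasons acd-manager-6c85ddd747-rdlg6` prints one line per container in that pod.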
3. Check Pod Logs
Fetch logs for troubleshooting:
kubectl logs acd-manager-6c85ddd747-rdlg6
- For pods with multiple containers:
kubectl logs <pod_name> -c <container_name>
- Focus on recent errors or exceptions.
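To narrow long logs down to recent errors, the output can be limited and filtered. The sketch below is illustrative only: `recent_errors` is a hypothetical helper, and the keyword list and the 500-line window are assumptions that may not match the log format of every component.

```shell
# Hypothetical helper: show recent log lines that look like errors.
# Usage: recent_errors <pod_name> [extra `kubectl logs` flags]
recent_errors() {
  kubectl logs "$@" --all-containers=true --tail=500 \
    | grep -iE 'error|exception|fatal|panic'   # assumed keywords, adjust as needed
}
```

For a pod that keeps restarting, `kubectl logs <pod_name> --previous` shows the output of the previous container instance, which usually contains the reason for the crash.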
4. Verify Connectivity and Dependencies
- PostgreSQL: Confirm the acd-cluster-postgresql-0 pod is healthy and accepting connections.
- Kafka: Check that the kafka-controller pods are running and not experiencing issues.
- Redis: Ensure the Redis master and replicas are healthy.
- Grafana, Prometheus, VictoriaMetrics: Confirm these services are operational.
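The dependency checks above can be sketched as two small helpers. Both names are hypothetical. The name patterns for Redis, Grafana, Prometheus, and VictoriaMetrics pods are assumptions, and `postgres_ready` assumes the PostgreSQL container image ships the standard `pg_isready` utility.

```shell
# Hypothetical helper: show the status of pods that look like dependencies.
# The name patterns are assumptions; adjust them to the pod names in your cluster.
dependency_pods() {
  kubectl get pods --no-headers "$@" \
    | grep -E 'postgresql|kafka|redis|grafana|prometheus|victoria'
}

# Hypothetical helper: ask PostgreSQL whether it accepts connections.
# Assumes pg_isready is available inside the container.
postgres_ready() {
  kubectl exec "${1:-acd-cluster-postgresql-0}" -- pg_isready
}
```

`postgres_ready` returns a zero exit status when the server accepts connections, so it can be used in a condition such as `postgres_ready || echo "PostgreSQL is not ready"`.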
5. Check Resource Usage
High CPU or memory usage can cause pods to crash or become unresponsive:
kubectl top pods
Actions:
- Increase CPU or memory requests and limits, or add replicas, if needed.
- Review resource quotas and limits.
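To spot heavy consumers quickly, the output of `kubectl top pods` can be filtered against a threshold. This is a sketch with a hypothetical helper name, `pods_over_memory`. It assumes the Metrics API (for example metrics-server) is available and that memory is reported as a value ending in Mi or Gi.

```shell
# Hypothetical helper: list pods using more memory than a threshold given in MiB.
# Usage: pods_over_memory <limit_mib> [extra `kubectl top pods` flags]
# Assumes the columns NAME CPU MEMORY, so do not combine it with --all-namespaces.
pods_over_memory() {
  limit_mib=$1
  shift
  kubectl top pods --no-headers "$@" | awk -v limit="$limit_mib" '
    {
      mem = $3
      if (mem ~ /Gi$/)      { sub(/Gi$/, "", mem); mem = mem * 1024 }
      else if (mem ~ /Mi$/) { sub(/Mi$/, "", mem); mem = mem + 0 }
      else next             # skip units this sketch does not handle
      if (mem > limit) print $1, $3
    }'
}
```

For example, `pods_over_memory 1024` lists pods that currently use more than 1 GiB. `kubectl top pods --sort-by=memory` gives a sorted overview without any threshold.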
6. Check Events in Namespace
kubectl get events --sort-by='.lastTimestamp'
- Look for warnings or errors related to pod scheduling, network issues, or resource constraints.
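In a busy namespace, a count of warning events per reason gives a faster overview than the raw list. The following is a minimal sketch; `warning_reason_counts` is a hypothetical helper name.

```shell
# Hypothetical helper: count Warning events per reason, most frequent first.
warning_reason_counts() {
  kubectl get events --field-selector type=Warning "$@" \
    -o jsonpath='{range .items[*]}{.reason}{"\n"}{end}' \
    | sort | uniq -c | sort -rn \
    | awk '{print $1, $2}'     # normalize the padding added by uniq -c
}
```

Each output line contains a count followed by the event reason. A reason with a high count, such as repeated scheduling or back-off warnings, is a good starting point for `kubectl describe pod`.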
7. Restart Problematic Pods
Sometimes, restarting pods can resolve transient issues:
kubectl delete pod <pod_name>
If the pod is managed by a controller such as a Deployment or StatefulSet, Kubernetes recreates it automatically.
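Deleting a pod restarts only that one pod. When all pods of a workload should be restarted, a rolling restart avoids taking them down at the same time. The sketch below assumes the workload is a Deployment; `restart_deployment` is a hypothetical helper, and the 5-minute timeout is an arbitrary choice.

```shell
# Hypothetical helper: restart all pods of a Deployment and wait for completion.
# Usage: restart_deployment <deployment_name>
restart_deployment() {
  kubectl rollout restart "deployment/$1" &&
    kubectl rollout status "deployment/$1" --timeout=5m
}
```

For example, `restart_deployment acd-manager`, assuming the Deployment carries that name in your installation.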
8. Verify Configurations and Secrets
- Check ConfigMaps and Secrets for correctness:
kubectl get configmaps
kubectl get secrets
- Confirm environment variables and mounted volumes are correctly configured.
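Listing ConfigMaps and Secrets shows only that they exist. To check their content, the values need to be read and, for Secrets, decoded. The helpers below are a sketch with hypothetical names; the Secret name and key are placeholders, and `pod_env` assumes the container image includes the `env` command.

```shell
# Hypothetical helper: print one key of a Secret, decoded from base64.
# Usage: secret_value <secret_name> <key>
# Keys that contain dots must be escaped in the JSONPath expression.
secret_value() {
  kubectl get secret "$1" -o jsonpath="{.data.$2}" | base64 --decode
}

# Hypothetical helper: print the environment variables seen by a running pod.
# Usage: pod_env <pod_name>
pod_env() {
  kubectl exec "$1" -- env
}
```

A ConfigMap can be inspected in full with `kubectl get configmap <name> -o yaml`. Decoded secret values are sensitive, so avoid pasting them into tickets or chat.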
9. Check Cluster Network
- Ensure network policies or firewalls are not blocking communication between pods and external services.
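One way to test this is to list the policies with `kubectl get networkpolicy` and then probe a TCP port from inside an affected pod. The following sketch uses a hypothetical helper, `pod_can_reach`, and assumes the container image includes `nc` (netcat), which is not guaranteed.

```shell
# Hypothetical helper: test TCP reachability of host:port from inside a pod.
# Usage: pod_can_reach <pod_name> <host> <port>
# Assumes nc is installed in the container image.
pod_can_reach() {
  kubectl exec "$1" -- nc -z -w 3 "$2" "$3"
}
```

The helper returns a zero exit status when the connection succeeds, for example `pod_can_reach <pod_name> <host> <port> && echo reachable`.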
10. Additional Tips
- Upgrade or Rollback: If recent changes caused issues, consider rolling back or upgrading the deployment.
- Monitoring: Use Grafana and VictoriaMetrics dashboards for real-time insights.
- Documentation: Consult application-specific logs and documentation for known issues.
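For workloads managed directly as Kubernetes Deployments, a rollback can be sketched as follows. This is an assumption about how the deployment is managed: if the installation is handled by Helm or another installer, use that tool's own rollback mechanism instead. `rollback_deployment` is a hypothetical helper name.

```shell
# Hypothetical helper: roll a Deployment back to the previous or a given revision.
# Usage: rollback_deployment <deployment_name> [revision]
rollback_deployment() {
  if [ -n "${2:-}" ]; then
    kubectl rollout undo "deployment/$1" --to-revision="$2"
  else
    kubectl rollout undo "deployment/$1"
  fi
}
```

Available revisions can be listed beforehand with `kubectl rollout history deployment/<deployment_name>`.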
Summary Table
| Issue Type | Common Checks | Commands |
|---|---|---|
| Pod Not Ready | Describe pod, check logs | kubectl describe pod, kubectl logs |
| Connectivity | Verify service endpoints | kubectl get svc, curl from within pods |
| Resource Limits | Monitor resource usage | kubectl top pods |
| Events & Errors | Check cluster events | kubectl get events |
| Configuration | Validate configs and secrets | kubectl get configmaps, kubectl get secrets |
If issues persist, consider scaling the affected components down and back up, or consult logs and metrics for deeper analysis.