Troubleshooting Guide

How to troubleshoot ESB3027 AgileTV CDN Manager

This guide helps diagnose common issues with the acd-manager deployment and its associated pods.


1. Check Pod Status

Verify all pods are running (add -n <namespace> if the deployment is not in your current namespace):

kubectl get pods

Expected:

  • Most pods should be in Running state with READY as 1/1 or 2/2.
  • Pods marked as 0/1 or 0/2 are not fully ready, indicating potential issues.
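On a deployment with many pods, the READY check above can be automated. The sketch below is one possible filter, not part of the product tooling: the function name unready_pods is hypothetical, and skipping pods in the Completed state (finished jobs) is an assumption.

```shell
# Print name, READY, and STATUS for every pod whose ready count is below its total.
# Reads `kubectl get pods --no-headers` output on stdin.
# Assumption: pods in the Completed state are finished jobs and are skipped.
unready_pods() {
  awk '{
    split($2, ready, "/")    # READY column looks like "1/2"
    if (ready[1] != ready[2] && $3 != "Completed") print $1, $2, $3
  }'
}

# Usage against a live cluster:
#   kubectl get pods --no-headers | unready_pods
```

An empty result means every pod that is not Completed reports all of its containers ready.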

2. Investigate Unready or Failed Pods

Example:

kubectl describe pod acd-manager-6c85ddd747-rdlg6
  • In the container State and Last State fields and in the Events section, look for reasons such as CrashLoopBackOff, ImagePullBackOff, or ErrImagePull.
  • Check container statuses for error messages.
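The describe output is long, so it can help to reduce it to the fields that usually explain a failure. The following is a minimal sketch; the function name pod_failure_summary is hypothetical, and the selected field labels are the ones kubectl normally prints for each container.

```shell
# Keep only the container state lines from `kubectl describe pod` output on stdin:
# State, Last State, Reason, Exit Code, and Restart Count.
pod_failure_summary() {
  grep -E '^[[:space:]]*(State|Last State|Reason|Exit Code|Restart Count):'
}

# Usage against a live cluster:
#   kubectl describe pod acd-manager-6c85ddd747-rdlg6 | pod_failure_summary
```

A high Restart Count together with a Last State of Terminated and a non-zero Exit Code usually means the logs of the previous container instance are the next place to look.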

3. Check Pod Logs

Fetch logs for troubleshooting:

kubectl logs acd-manager-6c85ddd747-rdlg6
  • For pods with multiple containers:
kubectl logs <pod_name> -c <container_name>
  • Focus on recent errors or exceptions.
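To focus on recent errors, the log stream can be limited in time and filtered by keyword. This is a sketch only: the function name log_errors is hypothetical, and the keyword list is an assumption about how the containers word their log messages.

```shell
# Print only log lines that mention an error-like keyword (case-insensitive).
# Assumption: the keyword list below matches the application's log wording.
log_errors() {
  grep -i -E 'error|exception|fatal|panic'
}

# Usage against a live cluster:
#   kubectl logs <pod_name> --since=1h | log_errors
#   kubectl logs <pod_name> --previous --tail=200 | log_errors   # previous (crashed) container instance
```

The --previous flag is useful for pods in CrashLoopBackOff, because the current container may not have run long enough to log anything.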

4. Verify Connectivity and Dependencies

  • PostgreSQL: Confirm the acd-cluster-postgresql-0 pod is healthy and accepting connections.
  • Kafka: Check kafka-controller pods are running and not experiencing issues.
  • Redis: Ensure Redis master and replicas are healthy.
  • Grafana, Prometheus, VictoriaMetrics: Confirm these services are operational.
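The dependency checks above can be run in one pass over the pod list. The sketch below assumes that each dependency can be recognised by a fragment of its pod name; only acd-cluster-postgresql-0 and kafka-controller come from this guide, and the other fragments in the usage line are guesses that may need adjusting. The function name check_dependencies is hypothetical.

```shell
# For each name fragment given as an argument, print OK when at least one pod
# matches and every matching pod is Running with all containers ready;
# print PROBLEM otherwise. Reads `kubectl get pods --no-headers` on stdin.
check_dependencies() {
  pods=$(cat)    # read the pod list once so it can be scanned per fragment
  for dep in "$@"; do
    if printf '%s\n' "$pods" | awk -v dep="$dep" '
         index($1, dep) {
           n++
           split($2, r, "/")
           if ($3 != "Running" || r[1] != r[2]) bad++
         }
         END { exit ((n > 0 && bad == 0) ? 0 : 1) }'; then
      echo "OK $dep"
    else
      echo "PROBLEM $dep"
    fi
  done
}

# Usage against a live cluster (fragments after kafka-controller are assumptions):
#   kubectl get pods --no-headers | check_dependencies postgresql kafka-controller redis grafana prometheus victoria
```

A PROBLEM line only narrows the search; the describe and logs steps above still apply to the affected pod.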

5. Check Resource Usage

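High CPU or memory usage can cause pods to crash or become unresponsive. The command below requires the Metrics API (typically provided by metrics-server) to be available in the cluster: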

kubectl top pods

Actions:

  • If pods are consistently close to their limits, raise their CPU or memory requests and limits, or add replicas.
  • Review resource quotas and limits.
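To spot memory-heavy pods quickly, the output of kubectl top can be filtered against a threshold. This is a sketch under the assumption that memory is reported with an Mi suffix, which is the usual case; the function name high_memory_pods and the 1024 MiB threshold in the usage line are illustrative only.

```shell
# Print pods whose memory usage exceeds the threshold (in MiB) given as $1.
# Reads `kubectl top pods --no-headers` output on stdin: NAME CPU MEMORY.
# Assumption: memory values carry an Mi suffix.
high_memory_pods() {
  awk -v limit="$1" '{
    mem = $3
    sub(/Mi$/, "", mem)               # strip the unit to compare numerically
    if (mem + 0 > limit + 0) print $1, $3
  }'
}

# Usage against a live cluster:
#   kubectl top pods --no-headers | high_memory_pods 1024
#   kubectl top pods --sort-by=memory     # alternative: let kubectl sort the list
```

Comparing the reported usage with the limits shown by kubectl describe pod indicates whether a pod is at risk of being terminated for exceeding its memory limit.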

6. Check Events in Namespace

kubectl get events --sort-by='.lastTimestamp'
  • Look for warnings or errors related to pod scheduling, network issues, or resource constraints.
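When the event list is long, grouping warnings by reason shows which problem dominates. The sketch below assumes the default column order of kubectl get events (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE); the function name warning_reasons is hypothetical.

```shell
# Count Warning events per reason, most frequent first.
# Reads `kubectl get events --no-headers` output on stdin.
# Assumption: default columns, so TYPE is field 2 and REASON is field 3.
warning_reasons() {
  awk '$2 == "Warning" { count[$3]++ }
       END { for (r in count) print count[r], r }' | sort -rn
}

# Usage against a live cluster:
#   kubectl get events --no-headers | warning_reasons
#   kubectl get events --field-selector type=Warning --sort-by='.lastTimestamp'   # list warnings only
```

Reasons such as FailedScheduling point at resource constraints, while BackOff points at a crashing container.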

7. Restart Problematic Pods

Sometimes, restarting pods can resolve transient issues:

kubectl delete pod <pod_name>

If the pod is managed by a controller such as a Deployment or StatefulSet, Kubernetes recreates it automatically.
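For pods that belong to a Deployment, a rolling restart is a gentler alternative to deleting pods one by one, since replacement pods are started before the old ones are removed. The sketch below is one way to wrap it; the function name restart_and_wait and the 120 second timeout are assumptions, and acd-manager is used as the deployment name as referred to in this guide.

```shell
# Trigger a rolling restart of a Deployment and wait until it has completed.
# $1 is the deployment name. The timeout value is an arbitrary choice.
restart_and_wait() {
  kubectl rollout restart "deployment/$1" &&
  kubectl rollout status "deployment/$1" --timeout=120s
}

# Usage against a live cluster:
#   restart_and_wait acd-manager
```

If the rollout status command times out, the new pods are probably failing for the same reason as the old ones, and the describe and logs steps above apply.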


8. Verify Configurations and Secrets

  • Check ConfigMaps and Secrets for correctness:
kubectl get configmaps
kubectl get secrets
  • Confirm environment variables and mounted volumes are correctly configured.
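Listing Secrets shows only their names, and their values are stored base64-encoded. To check a single value, it can be extracted and decoded as sketched below. The function name secret_value is hypothetical, and the secret and key names in the usage line are placeholders. Keys containing dots or hyphens may need escaping in the jsonpath expression.

```shell
# Print the decoded value of one key in a Secret.
# $1 is the Secret name, $2 is the key inside its data section.
secret_value() {
  kubectl get secret "$1" -o "jsonpath={.data.$2}" | base64 --decode
}

# Usage against a live cluster (names are placeholders):
#   secret_value <secret_name> <key>
#   kubectl get configmap <configmap_name> -o yaml    # inspect a ConfigMap in full
```

Decoded secret values are sensitive, so avoid pasting them into tickets or shared logs.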

9. Check Cluster Network

  • Ensure network policies or firewalls are not blocking communication between pods and external services.
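One way to test pod-to-service connectivity is to issue a request from inside a running pod. The sketch below assumes that curl is installed in the container image, which may not be the case; the function name probe_from_pod and the 5 second timeout are illustrative.

```shell
# Run curl inside a pod and print only the HTTP status code of the response.
# $1 is the pod name, $2 is the URL to probe.
# Assumption: the container image includes curl.
probe_from_pod() {
  kubectl exec "$1" -- curl -sS -o /dev/null -w '%{http_code}' --max-time 5 "$2"
}

# Usage against a live cluster (URL is a placeholder):
#   probe_from_pod <pod_name> http://<service_name>:<port>/
#   kubectl get networkpolicy     # list policies that may restrict traffic
```

A timeout where a response is expected suggests that a network policy or firewall is dropping the traffic, whereas a connection refused error suggests that the target service is not listening.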

10. Additional Tips

  • Upgrade or Rollback: If recent changes caused issues, consider rolling back or upgrading the deployment.
  • Monitoring: Use Grafana and VictoriaMetrics dashboards for real-time insights.
  • Documentation: Consult application-specific logs and documentation for known issues.

Summary Table

Issue Type        Common Checks                  Commands
Pod Not Ready     Describe pod, check logs       kubectl describe pod, kubectl logs
Connectivity      Verify service endpoints       kubectl get svc, curl from within pods
Resource Limits   Monitor resource usage         kubectl top pods
Events & Errors   Check cluster events           kubectl get events
Configuration     Validate configs and secrets   kubectl get configmaps, kubectl get secrets

If issues persist, consider scaling the affected components down and back up, or consult logs and metrics for deeper analysis.