Troubleshooting Guide

How to troubleshoot ESB3027 AgileTV CDN Manager

This guide helps diagnose common issues with the acd-manager deployment and its associated pods.

1. Check Pod Status

Verify all pods are running:

kubectl get pods

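Expected: every pod reports a STATUS of Running with all of its containers ready (for example 1/1 in the READY column); pods created by finished jobs may show Completed instead.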
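The status check can be scripted. The following is a minimal sketch, assuming the default `kubectl get pods` column layout (NAME, READY, STATUS); the helper name `unhealthy_pods` is illustrative and not part of the product.

```shell
# Print pods that are not healthy: STATUS is not Running, or the pod is
# Running but not all of its containers are ready (READY shows e.g. 1/2).
# Pods in the Completed state are skipped.
# Reads the output of `kubectl get pods --no-headers` on stdin.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")
    if ($3 == "Completed") next
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Usage (requires cluster access):
#   kubectl get pods --no-headers | unhealthy_pods
```

No output means every listed pod passed the check.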

If a pod is not in the expected state, describe it to see its events and container statuses. Example:

kubectl describe pod acd-manager-6c85ddd747-rdlg6

Look for events such as CrashLoopBackOff, ImagePullBackOff, or ErrImagePull.
Check container statuses for error messages.
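To avoid reading the full describe output by eye, the reasons named above can be filtered out. This is a small sketch; the function name `find_failure_reasons` is hypothetical, and the pattern covers only the three reasons this guide lists.

```shell
# Print only the lines that mention one of the failure reasons listed in
# this guide. Reads `kubectl describe pod` output on stdin.
# `|| true` keeps the exit status at 0 when nothing matches.
find_failure_reasons() {
  grep -E 'CrashLoopBackOff|ImagePullBackOff|ErrImagePull' || true
}

# Usage (requires cluster access):
#   kubectl describe pod acd-manager-6c85ddd747-rdlg6 | find_failure_reasons
```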

Fetch logs for troubleshooting:

kubectl logs acd-manager-6c85ddd747-rdlg6

For a pod with more than one container, name the container explicitly:

kubectl logs acd-manager-<pod_name> -c <container_name>
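Long logs are easier to scan when filtered. The sketch below assumes that problem lines contain one of the words error, fatal, panic, or exception; the actual log format of acd-manager is not described in this guide, so adjust the pattern as needed. The function name `error_lines` is illustrative.

```shell
# Keep only log lines that look like errors (case-insensitive match).
# The keyword list is an assumption about the log format.
error_lines() {
  grep -iE 'error|fatal|panic|exception' || true
}

# Usage (requires cluster access). --tail limits the number of lines and
# --previous shows the log of the previous, crashed container instance:
#   kubectl logs acd-manager-6c85ddd747-rdlg6 --tail=200 | error_lines
#   kubectl logs acd-manager-6c85ddd747-rdlg6 --previous | error_lines
```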

Verify that the services the manager depends on are healthy:

PostgreSQL: Confirm the acd-cluster-postgresql-0 pod is healthy and accepting connections.
Kafka: Check that the kafka-controller pods are running and not restarting repeatedly.
Redis: Ensure the Redis master and replicas are healthy.
Grafana, Prometheus, VictoriaMetrics: Confirm these services are operational.
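The dependency check can be approximated from the pod list. The sketch below only tests whether at least one matching pod is Running; it does not verify that PostgreSQL accepts connections. The function name `check_dependencies` is hypothetical, and apart from acd-cluster-postgresql and kafka-controller the name fragments in the usage line are assumptions about how the pods are named in your cluster.

```shell
# For each name fragment given as an argument, report whether at least one
# pod whose name contains it has STATUS Running.
# Reads `kubectl get pods --no-headers` output on stdin.
check_dependencies() {
  pod_list=$(cat)
  for fragment in "$@"; do
    if printf '%s\n' "$pod_list" |
      awk -v n="$fragment" 'index($1, n) && $3 == "Running" { found = 1 }
                            END { exit !found }'; then
      echo "OK      $fragment"
    else
      echo "PROBLEM $fragment"
    fi
  done
}

# Usage (requires cluster access; redis and grafana are assumed names):
#   kubectl get pods --no-headers |
#     check_dependencies acd-cluster-postgresql kafka-controller redis grafana
```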

High CPU or memory usage can cause pods to crash or become unresponsive:

kubectl top pods

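Actions: if a pod's usage is close to its configured limits, consider raising its resource requests and limits or scaling the workload out.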
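Heavy consumers can be picked out of the `kubectl top pods` output automatically. This sketch assumes CPU is reported in millicores (for example 250m) and memory in mebibytes (for example 512Mi), which is the usual output format; the function name `high_usage_pods` and the thresholds are illustrative.

```shell
# Print pods whose CPU (millicores) or memory (Mi) exceeds a threshold.
# $1 = CPU threshold in millicores, $2 = memory threshold in Mi.
# Reads `kubectl top pods --no-headers` output on stdin.
high_usage_pods() {
  awk -v cpu_max="$1" -v mem_max="$2" '{
    cpu = $2; sub(/m$/, "", cpu)    # strip the millicore suffix
    mem = $3; sub(/Mi$/, "", mem)   # strip the mebibyte suffix
    if (cpu + 0 > cpu_max + 0 || mem + 0 > mem_max + 0) print $1, $2, $3
  }'
}

# Usage (requires cluster access and a metrics source for kubectl top):
#   kubectl top pods --no-headers | high_usage_pods 500 1024
```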

Review recent cluster events, sorted by time:

kubectl get events --sort-by='.lastTimestamp'

Look for warnings or errors related to pod scheduling, network issues, or resource constraints.
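Normal events usually outnumber the interesting ones, so it can help to keep only warnings. A minimal sketch, assuming the default column layout of `kubectl get events` (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE); the function name `warning_events` is illustrative.

```shell
# Keep only events whose TYPE column is Warning.
# Reads `kubectl get events --no-headers` output on stdin.
warning_events() {
  awk '$2 == "Warning"'
}

# Usage (requires cluster access):
#   kubectl get events --sort-by='.lastTimestamp' --no-headers | warning_events
# The same filter can be applied server-side:
#   kubectl get events --sort-by='.lastTimestamp' --field-selector type=Warning
```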

Sometimes, restarting pods can resolve transient issues:

kubectl delete pod <pod_name>

Kubernetes will automatically recreate the pod.
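When every replica of a Deployment should be restarted, a rolling restart is an alternative to deleting pods one by one. The sketch below derives the Deployment name from a pod name, assuming the usual `<deployment>-<replicaset-hash>-<pod-suffix>` naming; it does not apply to StatefulSet pods such as acd-cluster-postgresql-0. The function name `deployment_of` is hypothetical.

```shell
# Derive the Deployment name from a pod name by dropping the last two
# dash-separated segments (pod suffix and ReplicaSet hash).
deployment_of() {
  without_suffix=${1%-*}        # acd-manager-6c85ddd747
  echo "${without_suffix%-*}"   # acd-manager
}

# Usage (requires cluster access):
#   kubectl rollout restart "deployment/$(deployment_of acd-manager-6c85ddd747-rdlg6)"
#   kubectl rollout status  "deployment/$(deployment_of acd-manager-6c85ddd747-rdlg6)"
```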

Confirm that the expected ConfigMaps and Secrets exist:

kubectl get configmaps
kubectl get secrets
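Secret values are stored base64-encoded, so checking one requires decoding it. A minimal sketch; the function name `decode_secret_value` is illustrative, and the secret name and key in the usage line are placeholders to fill in.

```shell
# Decode a base64-encoded Secret value read from stdin.
decode_secret_value() {
  base64 --decode
}

# Usage (requires cluster access; <secret_name> and <key> are placeholders):
#   kubectl get secret <secret_name> -o jsonpath='{.data.<key>}' | decode_secret_value
```

Take care not to paste decoded values into tickets or chat logs.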

Ensure network policies or firewalls are not blocking communication between pods and external services.
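Connectivity between pods can be tested by calling a Service from inside a pod. The sketch below builds the in-cluster DNS name of a Service, assuming the default cluster domain `cluster.local` and plain HTTP; the function name `service_url` is hypothetical, and the check relies on `curl` being available in the container image.

```shell
# Build the in-cluster URL of a Service.
# $1 = service name, $2 = namespace, $3 = port.
# Assumes the default cluster domain (cluster.local) and plain HTTP.
service_url() {
  echo "http://$1.$2.svc.cluster.local:$3"
}

# Usage (requires cluster access; service, namespace, and port are placeholders):
#   kubectl get svc
#   kubectl exec acd-manager-6c85ddd747-rdlg6 -- \
#     curl -sS -m 5 "$(service_url <service> <namespace> <port>)"
```

A timeout here, while the target pod is otherwise healthy, points toward a network policy or firewall rule rather than the application.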

Upgrade or Rollback: If recent changes caused issues, consider rolling back or upgrading the deployment.
Monitoring: Use Grafana and VictoriaMetrics dashboards for real-time insights.
Documentation: Consult application-specific logs and documentation for known issues.

| Issue Type | Common Checks | Commands |
|---|---|---|
| Pod Not Ready | Describe pod, check logs | `kubectl describe pod`, `kubectl logs` |
| Connectivity | Verify service endpoints | `kubectl get svc`, `curl` from within pods |
| Resource Limits | Monitor resource usage | `kubectl top pods` |
| Events & Errors | Check cluster events | `kubectl get events` |
| Configuration | Validate configs and secrets | `kubectl get configmaps`, `kubectl get secrets` |

If issues persist, consider scaling the affected components down and back up, or consult logs and metrics for deeper analysis.