Operations Guide
Overview
This guide details the common commands necessary to operate the ESB3027 AgileTV CDN Manager software. Before starting, you will need at least a basic understanding of the kubectl command-line tooling used throughout the sections below.
Getting and Describing Kubernetes Resources
The two most common kubectl commands are get and describe, applied to a specific resource
such as a Pod or Service. kubectl get lists all resources of a particular
type; for example, kubectl get pods displays all pods in the current namespace. To obtain
more detailed information about a specific resource, use kubectl describe <resource>, such as
kubectl describe pod postgresql-0 to view details about that particular pod.
When describing a pod, the output includes a recent Event history at the bottom. This can be extremely helpful for troubleshooting issues, such as why a pod failed to deploy or was restarted. However, keep in mind that this event history only reflects the most recent events from the past few hours, so it may not provide insights into problems that occurred days or weeks ago.
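Events can also be listed directly, which is handy when investigating activity across the whole namespace rather than a single pod. For example, to list recent events sorted oldest to newest:
kubectl get events --sort-by=.metadata.creationTimestamp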
Obtaining Logs
Each Pod maintains its own logs for each container. To fetch the logs of a specific pod, use
kubectl logs <pod_name>. Adding the -f flag will stream the logs in follow mode, allowing
real-time monitoring. If a pod contains multiple containers, by default only the logs from the
default (primary) container are shown. To view logs from a different container within the same pod, use
the -c <container_name> flag.
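For example, to follow the logs of one container in a multi-container pod, or to fetch the logs of all containers at once (the container name confd here is illustrative; use kubectl describe to discover the actual names):
kubectl logs -f acd-manager-confd-558f49ffb5-n8dmr -c confd
kubectl logs acd-manager-confd-558f49ffb5-n8dmr --all-containers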
Since each pod maintains its own logs, retrieving logs from all replicas of a Deployment or StatefulSet may be necessary to get a complete view. You can use label selectors to collect logs from all pods associated with the same application. For example, to fetch logs from all pods belonging to the “acd-manager” deployment, run:
kubectl logs -l app.kubernetes.io/name=acd-manager
To find the labels associated with a specific Deployment or ReplicaSet, describe the resource and look for the “Labels” field.
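Alternatively, the --show-labels flag prints each resource's labels inline:
kubectl get deployments --show-labels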
The following table describes the common labels currently used by deployments in the cluster.
Component Labels
| Label (key=value) | Description |
|---|---|
| app.kubernetes.io/component=manager | Identifies the ACD Manager service |
| app.kubernetes.io/component=confd | Identifies the confd service |
| app.kubernetes.io/component=frontend | Identifies the GUI (frontend) service |
| app.kubernetes.io/component=gateway | Identifies the API gateway service |
| app.kubernetes.io/component=grafana | Identifies the Grafana monitoring service |
| app.kubernetes.io/component=metrics-aggregator | Identifies the metrics aggregator service |
| app.kubernetes.io/component=mib-frontend | Identifies the MIB frontend service |
| app.kubernetes.io/component=server | Identifies the Prometheus server component |
| app.kubernetes.io/component=selection-input | Identifies the selection input service |
| app.kubernetes.io/component=start | Identifies the Zitadel startup/init component |
| app.kubernetes.io/component=primary | Identifies the PostgreSQL primary node |
| app.kubernetes.io/component=controller-eligible | Identifies the Kafka controller-eligible node |
| app.kubernetes.io/component=alertmanager | Identifies the Prometheus Alertmanager |
| app.kubernetes.io/component=master | Identifies the Redis master node |
| app.kubernetes.io/component=replica | Identifies the Redis replica node |
Instance, Name, and Part-of Labels
| Label (key=value) | Description |
|---|---|
| app.kubernetes.io/instance=acd-manager | Helm release instance name (acd-manager) |
| app.kubernetes.io/instance=acd-cluster | Helm release instance name (acd-cluster) |
| app.kubernetes.io/name=acd-manager | Resource name: acd-manager |
| app.kubernetes.io/name=confd | Resource name: confd |
| app.kubernetes.io/name=grafana | Resource name: grafana |
| app.kubernetes.io/name=mib-frontend | Resource name: mib-frontend |
| app.kubernetes.io/name=prometheus | Resource name: prometheus |
| app.kubernetes.io/name=telegraf | Resource name: telegraf |
| app.kubernetes.io/name=zitadel | Resource name: zitadel |
| app.kubernetes.io/name=postgresql | Resource name: postgresql |
| app.kubernetes.io/name=kafka | Resource name: kafka |
| app.kubernetes.io/name=redis | Resource name: redis |
| app.kubernetes.io/name=victoria-metrics-single | Resource name: victoria-metrics-single |
| app.kubernetes.io/part-of=prometheus | Part of the Prometheus stack |
| app.kubernetes.io/part-of=kafka | Part of the Kafka stack |
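Label selectors can be combined; comma-separated key=value pairs are ANDed together. For example, to list only the gateway pods belonging to the acd-manager release:
kubectl get pods -l app.kubernetes.io/instance=acd-manager,app.kubernetes.io/component=gateway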
Restarting a Pod
Since Kubernetes maintains a fixed number of replicas for each Deployment or ReplicaSet, deleting a
pod will cause Kubernetes to immediately recreate it, effectively restarting the pod. For example,
to restart the pod acd-manager-6c85ddd747-5j5gt, run:
kubectl delete pod acd-manager-6c85ddd747-5j5gt
Kubernetes will automatically detach that pod from any associated Service, preventing new connections from reaching it. It then spawns a new instance, which goes through startup, liveness, and readiness probes. Once the new pod passes the readiness probes and is marked as ready, the Service will start forwarding new traffic to it.
If multiple replicas are running, traffic will be distributed among the existing pods while the new pod is initializing, ensuring a seamless, zero-downtime operation.
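To watch the replacement pod initialize and become ready in real time, add the -w (watch) flag to a label-filtered query:
kubectl get pods -w -l app.kubernetes.io/name=acd-manager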
Stopping and Starting a Deployment
Unlike traditional services, Kubernetes does not have a concept of stopping a service directly. Instead, you can temporarily scale a Deployment to zero replicas, which has the same effect.
For example, to stop the acd-manager Deployment, run:
kubectl scale deployment acd-manager --replicas=0
To restart it later, scale the deployment back to its original number of replicas, e.g.,
kubectl scale deployment acd-manager --replicas=1
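If you are unsure of the original replica count, record it before scaling down:
kubectl get deployment acd-manager -o jsonpath='{.spec.replicas}'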
If you want to perform a simple restart of all pods within a deployment, you can delete all pods with a specific label, and Kubernetes will automatically recreate them. For example, to restart all pods with the component label “manager,” use:
kubectl delete pod -l app.kubernetes.io/component=manager
This command causes Kubernetes to delete all matching pods, which are then recreated, effectively restarting the service without changing the deployment configuration.
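On recent kubectl versions, the same effect can also be achieved with a rolling restart, which replaces the pods one at a time and can be monitored until completion:
kubectl rollout restart deployment acd-manager
kubectl rollout status deployment acd-manager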
Running commands inside a pod
Sometimes it is necessary to run a command inside an existing Pod, such as obtaining a shell.
The kubectl exec -it <podname> -- <command> invocation does just that. For example, to run the
confcli tool inside the confd pod acd-manager-confd-558f49ffb5-n8dmr, use the following command:
kubectl exec -it acd-manager-confd-558f49ffb5-n8dmr -- /usr/bin/python3.11 /usr/local/bin/confcli
Note: The confd container does not have a shell, so specifying the python interpreter is necessary on this image.
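For images that do include a shell, an interactive session can be opened directly; for example, assuming the PostgreSQL image ships bash:
kubectl exec -it acd-cluster-postgresql-0 -- /bin/bash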
Monitoring resource usage
Kubernetes includes an internal metrics API which can give insight into the resource usage of Pods and Nodes.
To list the current usage of the pods in the cluster, issue the following:
kubectl top pods
This will give output similar to the following:
NAME                                             CPU(cores)   MEMORY(bytes)
acd-cluster-postgresql-0                         3m           44Mi
acd-manager-6c85ddd747-rdlg6                     4m           15Mi
acd-manager-confd-558f49ffb5-n8dmr               1m           47Mi
acd-manager-gateway-7594479477-z4bbr             0m           10Mi
acd-manager-grafana-78c76d8c5-c2tl6              18m          144Mi
acd-manager-kafka-controller-0                   19m          763Mi
acd-manager-kafka-controller-1                   19m          967Mi
acd-manager-kafka-controller-2                   25m          1127Mi
acd-manager-metrics-aggregator-f6ff99654-tjbfs   4m           2Mi
acd-manager-mib-frontend-67678c69df-tkklr        1m           26Mi
acd-manager-prometheus-alertmanager-0            2m           25Mi
acd-manager-prometheus-server-768f5d5c-q78xb     5m           53Mi
acd-manager-redis-master-0                       12m          18Mi
acd-manager-redis-replicas-0                     15m          14Mi
acd-manager-selection-input-844599bc4d-x7dct     3m           3Mi
acd-manager-telegraf-585dfc5ff8-n8m5c            1m           27Mi
acd-manager-victoria-metrics-single-server-0     2m           10Mi
acd-manager-zitadel-69b6546f8f-v9lkp             1m           76Mi
acd-manager-zitadel-69b6546f8f-wwcmx             1m           72Mi
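The listing can also be sorted to surface the heaviest consumers first:
kubectl top pods --sort-by=memory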
Querying the metrics API for the nodes gives the aggregated totals for each node:
kubectl top nodes
This yields output similar to the following:
NAME                 CPU(cores)   CPU(%)   MEMORY(bytes)   MEMORY(%)
k3d-local-agent-0    118m         0%       1698Mi          21%
k3d-local-agent-1    120m         0%       661Mi           8%
k3d-local-agent-2    84m          0%       1054Mi          13%
k3d-local-server-0   115m         0%       1959Mi          25%
Taking a node out of service
You can temporarily take a node out of service for maintenance with minimal downtime, provided the other nodes in the cluster have enough spare resources to absorb the pods from the target node.
Step 1: Cordon the node.
This prevents new pods from being scheduled on the node:
kubectl cordon <node-name>
Step 2: Drain the node.
This evicts the existing pods from the node, respecting DaemonSets and local data:
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
- The --ignore-daemonsets flag skips DaemonSet-managed pods, which are typically managed separately.
- The --delete-emptydir-data flag (named --delete-local-data on older kubectl versions) allows pods using local ephemeral emptyDir storage to be evicted; that data is lost.
Once drained, the node is effectively out of service.
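You can verify the node's state at any point; a cordoned or drained node reports SchedulingDisabled in its STATUS column:
kubectl get nodes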
To bring the node back into service:
Uncordon the node with:
kubectl uncordon <node-name>
This allows Kubernetes to schedule new pods on the node again. Existing pods are not moved back automatically; you may need to manually restart or reschedule them if desired. Since the uncordoned node now has the most available resources, the scheduler will tend to place newly created pods there, rebalancing load across the cluster over time.
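For example, restarting a deployment as shown earlier recreates its pods and gives the scheduler an opportunity to place some of them on the returning node:
kubectl rollout restart deployment acd-manager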
Backup and restore of persistent volumes
The Longhorn storage driver, which provides the persistent storage used in the cluster (see the Storage Guide for more details), includes built-in mechanisms for snapshotting, backing up, and restoring volumes. These operations can be performed entirely from within the Longhorn WebUI. The Storage Guide describes how to access that UI, which requires setting up a port forward.
See the Longhorn documentation for details on configuring Longhorn and managing snapshots, backups, and restores.