Metrics & Monitoring Guide
Overview
The CDN Manager includes a comprehensive monitoring stack based on VictoriaMetrics for time-series data storage, Telegraf for metrics collection, and Grafana for visualization. This guide describes the monitoring architecture and how to access and use the monitoring capabilities.
Architecture
Components
| Component | Purpose |
|---|---|
| Telegraf | Metrics collector running on each node, gathering system and application metrics |
| VictoriaMetrics Agent | Metrics scraper and forwarder; scrapes Prometheus endpoints and forwards to VictoriaMetrics |
| VictoriaMetrics (Short-term) | Time-series database for operational dashboards (30-90 day retention) |
| VictoriaMetrics (Long-term) | Time-series database for billing and compliance (1+ year retention) |
| Grafana | Visualization and dashboard platform |
| Alertmanager | Alert routing and notification management |
Metrics Flow
The following diagram illustrates how metrics flow through the monitoring stack:
flowchart TB
subgraph External["External Sources"]
Streamers[Streamers/External Clients]
end
subgraph Cluster["Kubernetes Cluster"]
Telegraf[Telegraf DaemonSet]
subgraph Applications["Application Components"]
Director[CDN Director]
Kafka[Kafka]
Redis[Redis]
Manager[ACD Manager]
Alertmanager[Alertmanager]
end
VMAgent[VictoriaMetrics Agent]
subgraph Storage["Storage"]
VMShort[VictoriaMetrics<br/>Short-term]
VMLong[VictoriaMetrics<br/>Long-term]
end
end
Grafana[Grafana]
Streamers -->|Push metrics| Telegraf
Telegraf -->|remote_write| VMShort
Telegraf -->|remote_write| VMLong
Director -->|Scrape| VMAgent
Kafka -->|Scrape| VMAgent
Redis -->|Scrape| VMAgent
Manager -->|Scrape| VMAgent
Alertmanager -->|Scrape| VMAgent
VMAgent -->|remote_write| VMShort
VMAgent -->|remote_write| VMLong
VMShort -->|Query| Grafana
VMLong -->|Query| Grafana
Metrics Flow Summary:
External metrics ingestion:
- External clients (streamers) push metrics to Telegraf
- Telegraf forwards metrics via remote_write to both VictoriaMetrics instances
Internal metrics scraping:
- VictoriaMetrics Agent scrapes Prometheus endpoints from:
- CDN Director instances
- Kafka cluster
- Redis
- ACD Manager components
- Alertmanager
- VMAgent forwards scraped metrics via remote_write to both VictoriaMetrics instances
Data visualization:
- Grafana queries both VictoriaMetrics databases depending on the dashboard requirements
- Operational dashboards use short-term storage
- Billing and compliance dashboards use long-term storage
Accessing Grafana
Grafana is deployed as part of the metrics stack and is accessible through the ingress:
URL: https://<manager-host>/grafana
Default credentials are listed in the Glossary.
Important: Change all default passwords after first login.
Metrics Collection
Application Metrics
Applications expose metrics on Prometheus-compatible endpoints. VictoriaMetrics Agent (VMAgent) scrapes these endpoints and forwards metrics to VictoriaMetrics via remote_write.
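To make the scrape step more concrete, the following is a minimal sketch of reading the Prometheus text exposition format that such endpoints return. The sample payload and the `parse_exposition` helper are hypothetical and exist only for illustration; the parser is deliberately simplified (it does not handle escaped quotes or `}` inside label values) and is not how VMAgent itself is implemented.

```python
import re

# Matches one sample line: metric name, optional {labels}, then the value.
# Simplified: label values containing '}' or escaped quotes are not handled.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Hypothetical payload, not taken from a real component.
EXAMPLE = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
process_open_fds 12
"""

samples = parse_exposition(EXAMPLE)
for name, labels, value in samples:
    print(name, labels, value)
```

Each tuple corresponds to one time-series sample that a scraper would forward; the label set is what later distinguishes series in Grafana queries.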
System Metrics
Telegraf collects system-level metrics including:
- CPU usage
- Memory utilization
- Disk I/O
- Network statistics
- Process metrics
Kubernetes Metrics
Cluster-level metrics collected include:
- Pod resource usage
- Node status
- Deployment status
- Persistent volume usage
Grafana Dashboards
Accessing Dashboards
After logging into Grafana:
- Navigate to Dashboards in the left menu
- Browse available dashboards
- Click on a dashboard to view metrics
Dashboard Types
The included dashboards provide visibility into:
- Cluster Health: Overall cluster resource utilization
- Application Performance: Request rates, latency, error rates
- Component Status: Individual component health indicators
CDN Director Metrics
Director DNS Names in Grafana
CDN Director instances are identified in Grafana by their DNS name, which is derived from the name field in global.hosts.routers:
global:
hosts:
routers:
- name: my-router-1
address: 192.0.2.1
The DNS name used in Grafana dashboards will be: my-router-1.external
This naming convention is automatically applied for all configured directors.
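The naming rule above can be sketched in a few lines. The `director_dns_names` helper is hypothetical, the values are assumed to be already loaded into a Python dictionary, and the second router entry is invented for illustration only.

```python
def director_dns_names(values):
    """Derive Grafana DNS names from the name fields in global.hosts.routers."""
    routers = values.get("global", {}).get("hosts", {}).get("routers", [])
    # Each director is shown as "<name>.external", as described above.
    return [f"{router['name']}.external" for router in routers]


# Mirrors the configuration shown above; the second entry is hypothetical.
values = {
    "global": {
        "hosts": {
            "routers": [
                {"name": "my-router-1", "address": "192.0.2.1"},
                {"name": "my-router-2", "address": "192.0.2.2"},
            ]
        }
    }
}

names = director_dns_names(values)
print(names)
```

A list like this can be handy when checking that every configured director actually shows up in a dashboard's host selector.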
Metrics Retention
VictoriaMetrics is configured with default retention policies. For custom retention settings, modify the VictoriaMetrics configuration in your values.yaml:
acd-metrics:
victoria-metrics-single:
retentionPeriod: "3" # Retention period in months
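As a rough aid for choosing a value, the sketch below converts a retention string into an approximate number of days. It assumes that a bare number means months and that the suffixes `h`, `d`, `w`, and `y` are accepted; the conversion factors (a 31-day month and a 365-day year) are assumptions of this sketch, and `retention_days` is a hypothetical helper, not part of the chart.

```python
# Approximate hours per unit. These factors are assumptions made for this
# sketch, not values read from the deployment.
HOURS_PER_UNIT = {"h": 1, "d": 24, "w": 7 * 24, "y": 365 * 24}
HOURS_PER_MONTH = 31 * 24  # assumed month length of 31 days


def retention_days(period):
    """Convert a retentionPeriod string to an approximate number of days."""
    period = period.strip()
    if not period:
        raise ValueError("empty retention period")
    suffix = period[-1]
    if suffix.isdigit():
        # A bare number such as "3" is treated as a number of months.
        return float(period) * HOURS_PER_MONTH / 24
    if suffix not in HOURS_PER_UNIT:
        raise ValueError(f"unsupported suffix: {suffix!r}")
    return float(period[:-1]) * HOURS_PER_UNIT[suffix] / 24


for value in ("3", "90d", "2w", "1y"):
    print(value, "->", retention_days(value), "days")
```

Under these assumptions, the value `"3"` shown above works out to roughly three months, which is in line with the 30-90 day range listed for short-term storage.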
Troubleshooting
Metrics Not Appearing
If metrics are not appearing in Grafana:
Check Telegraf pods:
kubectl get pods -l app.kubernetes.io/component=telegraf
Check Telegraf logs:
kubectl logs -l app.kubernetes.io/component=telegraf
Verify VictoriaMetrics is running:
kubectl get pods -l app.kubernetes.io/component=victoria-metrics
Check application metrics endpoints:
kubectl exec <pod-name> -- curl localhost:8080/metrics
kubectl exec <pod-name> -- curl localhost:8080/metrics
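If the pods look healthy, it can also help to ask VictoriaMetrics directly whether it holds any data for a given expression. The sketch below uses the Prometheus-compatible instant-query endpoint (`/api/v1/query`). The base URL `http://localhost:8428` is an assumption (for example, a local port-forward to the VictoriaMetrics service), the `up` expression is only a conventional example, and both helper functions are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus-compatible HTTP API."""
    query = urllib.parse.urlencode({"query": expr})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def series_count(base_url, expr, timeout=5.0):
    """Run an instant query and return the number of series in the result."""
    url = build_query_url(base_url, expr)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return len(payload["data"]["result"])


# Assumed address; adjust to wherever VictoriaMetrics is reachable.
url = build_query_url("http://localhost:8428/", "up")
print(url)
```

Calling `series_count("http://localhost:8428", "up")` against a reachable instance returns the number of matching series; a result of zero suggests that nothing is being ingested for that expression, which points back to the collector checks above.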
Dashboard Loading Issues
If dashboards fail to load:
Check Grafana pods:
kubectl get pods -l app.kubernetes.io/component=grafana
Review Grafana logs:
kubectl logs -l app.kubernetes.io/component=grafana
Verify datasource configuration in Grafana UI
Next Steps
After setting up monitoring:
- Operations Guide - Day-to-day operational procedures
- Troubleshooting Guide - Resolve monitoring issues
- API Guide - Access metrics via API