Architecture Guide

General Architectural Overview

Kubernetes Architecture

Kubernetes is an open-source container orchestration platform that simplifies the deployment, management, and scaling of containerized applications. It provides a robust framework to run applications reliably across a cluster of machines by abstracting the complexities of the underlying infrastructure. At its core, Kubernetes manages resources through various objects that define how applications are deployed and maintained.

Nodes are the physical or virtual machines that make up the Kubernetes cluster. Each node runs a container runtime, the kubelet agent, and other components needed to host and manage containers. The smallest deployable units in Kubernetes are Pods, which consist of one or more containers that share storage and network resources, along with a specification for how the containers are run. The containers within Pods are the actual runtime instances of the applications.
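
As a point of reference, a minimal Pod definition might look like the following sketch; the name and image are placeholders for illustration only, not components of this system.

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-pod        # hypothetical name, for illustration only
    spec:
      containers:
        - name: app            # a single container running the application image
          image: nginx:1.25    # placeholder image
          ports:
            - containerPort: 80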

To manage the lifecycle of applications, Kubernetes offers different controllers such as Deployments and StatefulSets. Deployments are used for stateless applications, enabling easy rolling updates and scaling. StatefulSets, on the other hand, are designed for stateful applications that require persistent storage and stable network identities, like databases. Kubernetes also uses Services to provide a stable network endpoint that abstracts Pods, facilitating reliable communication within the application or from outside the cluster, often distributing traffic load across multiple Pods.

graph TD
    subgraph Cluster
        direction TB
        Node1["Node"]
        Node2["Node"]
    end

    subgraph "Workloads"
        Deployment["Deployment (stateless)"]
        StatefulSet["StatefulSet (stateful)"]
        Pod1["Pod"]
        Pod2["Pod"]
        Container1["Container"]
        Container2["Container"]
    end

    subgraph "Networking"
        Service["Service"]
    end

    Node1 -->|Hosts| Pod1
    Node2 -->|Hosts| Pod2
    Deployment -->|Manages| Pod1
    StatefulSet -->|Manages| Pod2
    Pod1 -->|Contains| Container1
    Pod2 -->|Contains| Container2
    Service -->|Provides endpoint to| Pod1
    Service -->|Provides endpoint to| Pod2
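
As an illustration of the relationships shown above, a stateless workload could be defined by a Deployment and exposed through a Service roughly as in the sketch below; the names, labels, and image are placeholders rather than actual components of this system.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example-app              # placeholder name
    spec:
      replicas: 2                    # two identical Pod replicas
      selector:
        matchLabels:
          app: example-app
      template:
        metadata:
          labels:
            app: example-app
        spec:
          containers:
            - name: app
              image: example/app:1.0 # placeholder image
              ports:
                - containerPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: example-app
    spec:
      selector:
        app: example-app             # routes traffic to the Deployment's Pods
      ports:
        - port: 80
          targetPort: 8080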

Additional Concepts

Both Deployments and StatefulSets can be scaled by adjusting the number of Pod replicas. In a Deployment, the replicas are identical clones of the Pod, and a Service typically load-balances traffic across them; the Pods created by the underlying ReplicaSet receive randomly generated name suffixes. In a StatefulSet, each replica is instead assigned a stable name following the pattern <name>-<index>, for example postgresql-0, postgresql-1, and so on.
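
For example, a StatefulSet scaled to three replicas produces Pods with stable ordinal names, as in this sketch; the name and image are illustrative, not the actual chart values used here.

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: postgresql               # illustrative name
    spec:
      serviceName: postgresql        # headless Service providing stable network identities
      replicas: 3                    # yields Pods postgresql-0, postgresql-1, postgresql-2
      selector:
        matchLabels:
          app: postgresql
      template:
        metadata:
          labels:
            app: postgresql
        spec:
          containers:
            - name: postgresql
              image: postgres:16     # placeholder image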

Many applications use a fixed number of replicas set through Helm, which remains constant regardless of system load. Alternatively, for more dynamic scaling, a Horizontal Pod Autoscaler (HPA) can automatically adjust the number of replicas between a defined minimum and maximum based on real-time load metrics. In public cloud environments, a Cluster Autoscaler may additionally be employed to dynamically scale the number of nodes; because this depends on the specific cloud provider's integration and is not supported in self-hosted setups, it is less commonly used in on-premises environments.
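
A Horizontal Pod Autoscaler targeting a Deployment can be declared as in the following sketch; the target name and thresholds are examples only, not the values shipped with this system.

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: example-app
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: example-app            # hypothetical Deployment to scale
      minReplicas: 2                 # lower bound
      maxReplicas: 6                 # upper bound
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 75 # scale out when average CPU exceeds 75%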

Architectural Diagram

graph TD
    subgraph Cluster
        direction TB
        PostgreSQL[PostgreSQL Database]
        Kafka[kafka-controller Pods]
        Redis[Redis Master & Replicas]
        VictoriaMetrics[VictoriaMetrics]
        Prometheus[Prometheus Server]
        Grafana[Grafana Dashboard]
        Gateway[Nginx Gateway]
        Confd[Confd]
        Manager[ACD-Manager]
        Frontend[MIB Frontend]
        ZITADEL[Zitadel]
        Telegraf[Telegraf]
        AlertManager[Alertmanager]
    end

    PostgreSQL -->|Stores data| Manager
    Kafka -->|Streams data| Manager
    Redis -->|Cache / Message Broker| Manager
    VictoriaMetrics -->|Billing data| Grafana
    Prometheus -->|Billing data| VictoriaMetrics
    Prometheus -->|Monitoring data| Grafana
    Manager -->|Metrics & Monitoring| Prometheus
    Manager -->|Alerting| AlertManager
    Manager -->|User Interface| Frontend
    Manager -->|Authentication| ZITADEL
    Frontend -->|Authentication| Manager
    Confd -->|Config Updates| Manager
    Telegraf -->|System Metrics| Prometheus
    Gateway -->|Proxies| Director[Director APIs]

    style PostgreSQL fill:#f9f,stroke:#333,stroke-width:1px
    style Kafka fill:#ccf,stroke:#333,stroke-width:1px
    style Redis fill:#cfc,stroke:#333,stroke-width:1px
    style VictoriaMetrics fill:#ffc,stroke:#333,stroke-width:1px
    style Prometheus fill:#ccf,stroke:#333,stroke-width:1px
    style Grafana fill:#f99,stroke:#333,stroke-width:1px
    style Gateway fill:#eef,stroke:#333,stroke-width:1px
    style Confd fill:#eef,stroke:#333,stroke-width:1px
    style Manager fill:#eef,stroke:#333,stroke-width:1px
    style Frontend fill:#eef,stroke:#333,stroke-width:1px
    style ZITADEL fill:#eef,stroke:#333,stroke-width:1px
    style Telegraf fill:#eef,stroke:#333,stroke-width:1px
    style AlertManager fill:#eef,stroke:#333,stroke-width:1px

Cluster Scaling

Most components of the cluster can be horizontally scaled, as long as sufficient resources exist in the cluster to support the additional Pods. There are, however, a few exceptions. The Selection Input service currently does not support scaling, because the ordering of Kafka records would no longer be preserved across the members of a consumer group. Services such as PostgreSQL, Prometheus, and VictoriaMetrics also do not support scaling at present due to the additional configuration they would require. Most, if not all, of the other services may be scaled, either by explicitly setting the number of replicas in the configuration or, in some cases, by enabling and configuring the Horizontal Pod Autoscaler.
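
Many charts expose the replica count and autoscaler settings through their Helm values; the exact keys vary from chart to chart, so the following is only an illustrative sketch.

    # values.yaml (illustrative; key names depend on the individual chart)
    replicaCount: 3          # fixed number of replicas when autoscaling is disabled

    autoscaling:
      enabled: true          # hand scaling over to the Horizontal Pod Autoscaler
      minReplicas: 2
      maxReplicas: 6
      targetCPUUtilizationPercentage: 75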

The Horizontal Pod Autoscaler monitors the resource utilization of the Pods in a deployment and, based on configurable metrics, manages scaling between a preset minimum and maximum number of replicas. See the Configuration Guide for more information.

Kubernetes automatically selects which node will run the Pods based on several factors, including the resource utilization of the nodes, any pod and node affinity rules, and selector labels, among other considerations. By default, all nodes able to run workloads, whether they hold the Server or the Agent role, are considered unless specific node or pod affinity rules have been defined.
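
Placement can be constrained through affinity rules in the Pod template. The following sketch restricts scheduling to nodes carrying a particular label; the Deployment name, image, and label key are hypothetical and only illustrate the mechanism.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example-app
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: example-app
      template:
        metadata:
          labels:
            app: example-app
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: node-role.example.com/worker   # hypothetical node label
                        operator: In
                        values: ["true"]
          containers:
            - name: app
              image: example/app:1.0                        # placeholder image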

Summary

  • The acd-manager interacts with core components like PostgreSQL, Kafka, and Redis for data storage, messaging, and caching.
  • It exposes APIs via the API Gateway and integrates with Zitadel for authentication.
  • Monitoring and alerting are handled through Prometheus, VictoriaMetrics, Grafana, and Alertmanager.
  • Supporting services like Confd facilitate configuration management, while Telegraf collects system metrics.