OpenTelemetry Collector as the Backbone of Enterprise Observability on Multi-Cluster Kubernetes

2026-05-01

Summary

In most enterprise organizations with dozens of Kubernetes clusters, the observability stack has grown organically: Prometheus for metrics, Jaeger or Zipkin for traces, Fluent Bit or Fluentd for logs, each with its own agent, its own format, and its own configuration chain. OpenTelemetry Collector in gateway mode offers a concrete way to consolidate this fragmentation, using OTLP as a single protocol and fanning out to multiple backends from a single collection point. Convergence is not free: it demands deliberate decisions on Semantic Conventions, metric cardinality, and the migration strategy away from a Prometheus-centric stack. Without them, incompatibilities silently break dashboards and alert rules in production.

The Operational Debt of an Observability Stack Built by Accretion

Anyone managing Kubernetes platforms at enterprise scale knows that the observability stack is rarely designed as a coherent whole. It accumulates layer by layer as needs arise: Prometheus arrives first because it is the de facto standard for Kubernetes metrics, Jaeger or Tempo appear when logs alone no longer explain latency issues, and a log aggregator follows when event volume makes debugging impossible without structured indexing.

The result is invariably the same: three agent DaemonSets (and therefore three agent pods on every node), three data formats, three configuration pipelines distributed via Helm or Kustomize, and three sets of governance rules that rarely talk to each other. In a banking context with 30 or 50 clusters, the situation gets worse: signals cannot be correlated because they do not share the same trace_id or the same resource attributes, on-call engineers end up navigating three different consoles during an incident, and no single team can govern the stack as a whole.

The instinctive response, adopting a commercial APM platform such as Datadog or Elastic APM, turns the problem into vendor lock-in, which in regulated contexts clashes with data sovereignty requirements and with contractual SLAs that are hard to negotiate with cloud providers. The structural response is to standardize the collection layer.

OpenTelemetry Collector in Gateway Mode

The OpenTelemetry specification and the OTLP protocol have reached stable status for all three signals (traces, metrics, and logs), and the Collector is the component that makes convergence operational. Its architecture is built from three kinds of components composed into declarative pipelines:

Receivers: ingest data from a wide range of sources, including legacy protocols such as Prometheus scrape, StatsD, Zipkin Thrift, and Jaeger, as well as native OTLP over gRPC or HTTP
Processors: transform, filter, and enrich data before export (attribute renaming, tail-based sampling, memory limiter, batching)
Exporters: send data to one or more backends simultaneously (Prometheus Remote Write, OTLP, Loki, Elasticsearch, Google Cloud Monitoring) from a single pipeline
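A minimal sketch of how the three kinds of components fit together is shown below. It assumes a contrib distribution of the Collector (such as otelcol-contrib, which includes the Prometheus receiver); the scrape target legacy-app.default.svc:9090 is hypothetical, and the debug exporter, which only writes telemetry to the Collector's own log, stands in for a real backend.

```yaml
# Components are declared once in their section and only become active
# when a pipeline under "service" references them.
receivers:
  # Native OTLP over gRPC (4317) and HTTP (4318).
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  # Legacy source: scrape an existing Prometheus endpoint (hypothetical target).
  prometheus:
    config:
      scrape_configs:
        - job_name: legacy-app
          scrape_interval: 30s
          static_configs:
            - targets: ["legacy-app.default.svc:9090"]

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 20
  batch: {}

exporters:
  # Prints received telemetry to the Collector log; replace with a real backend.
  debug:
    verbosity: basic

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [debug]
    metrics:
      # OTLP and scraped Prometheus metrics flow through the same processors.
      receivers: [otlp, prometheus]
      processors: [memory_limiter, batch]
      exporters: [debug]
```

The same receiver can feed several pipelines, and each pipeline can list several exporters: this is the mechanism behind fan-out.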

In a multi-cluster enterprise environment, the most robust pattern has two tiers. In the first tier, a Collector DaemonSet on each node collects local signals and forwards them via OTLP to the gateway. In the second tier, a Collector Deployment in each cluster receives from all the agents, applies the centralized processors, and distributes to the backends. This separation matters: the agents stay lightweight and stateless, while all governance logic (sampling, routing, enrichment) is concentrated in the gateway and versioned as code. If the gateway runs more than one replica, tail-based sampling additionally requires all spans of a trace to reach the same replica, for example by routing on trace ID with the loadbalancing exporter.
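The agent tier could look like the sketch below. The gateway Service name otel-gateway.observability.svc.cluster.local is hypothetical; the k8sattributes processor needs RBAC permissions to read pod metadata, and K8S_NODE_NAME is assumed to be injected into the pod through the Kubernetes downward API. tls.insecure is only tolerable on a trusted network; a regulated environment would normally use mTLS on this hop.

```yaml
# Agent tier (DaemonSet): receive locally, enrich with Kubernetes metadata,
# and forward everything over OTLP to the per-cluster gateway.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 20
  # Adds resource attributes such as k8s.namespace.name and k8s.pod.name,
  # which the gateway can later use for routing.
  k8sattributes:
    filter:
      # Only watch pods on this node (env var set via the downward API).
      node_from_env_var: K8S_NODE_NAME
  batch: {}

exporters:
  otlp/gateway:
    endpoint: otel-gateway.observability.svc.cluster.local:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [otlp/gateway]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [otlp/gateway]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [otlp/gateway]
```

No sampling or routing happens here by design: the agent only enriches and forwards, so changing governance rules never requires rolling out a new DaemonSet.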

An example gateway configuration that routes metrics to different backends by namespace:

service:
  pipelines:
    metrics/payments:
      receivers: [otlp]
      processors: [memory_limiter, filter/payments-ns, batch]
      exporters: [prometheusremotewrite/payments, otlp/siem]

    metrics/audit:
      receivers: [otlp]
      processors: [memory_limiter, filter/audit-ns, batch]
      exporters: [prometheusremotewrite/long-retention, otlp/siem]
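The pipelines above reference components that must also be declared in the same configuration; otherwise the Collector fails validation and does not start. One possible declaration is sketched below. It assumes the k8s.namespace.name resource attribute is set upstream (for example by the k8sattributes processor on the agents), uses payments and audit as stand-in namespace names, and all endpoints are hypothetical.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 20
  batch: {}
  # The filter processor DROPS whatever matches its conditions, so each
  # filter keeps only the metrics coming from its own namespace.
  filter/payments-ns:
    error_mode: ignore
    metrics:
      metric:
        - 'resource.attributes["k8s.namespace.name"] != "payments"'
  filter/audit-ns:
    error_mode: ignore
    metrics:
      metric:
        - 'resource.attributes["k8s.namespace.name"] != "audit"'

exporters:
  prometheusremotewrite/payments:
    endpoint: https://prometheus-payments.internal/api/v1/write
  prometheusremotewrite/long-retention:
    endpoint: https://mimir.internal/api/v1/push
  otlp/siem:
    endpoint: siem-gateway.internal:4317
```

Because the filter conditions describe what to drop, they are written as negations ("namespace is not payments"); a positive condition would discard exactly the metrics the pipeline is meant to keep.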

The memory_limiter should always be the first processor in each pipeline: during an incident, telemetry volume can triple, and without this control the gateway can exhaust its memory and be OOM-killed, becoming a point of failure precisely when observability is needed most.
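A hedged sketch of gateway memory_limiter and batch settings follows. The percentages are illustrative starting points rather than recommendations; limit_percentage is computed against the memory available to the container, so the gateway Deployment must declare a memory limit. Setting the Go runtime's GOMEMLIMIT somewhat below that limit is a common complement.

```yaml
processors:
  memory_limiter:
    # How often memory usage is measured.
    check_interval: 1s
    # Hard limit: 80% of the container's memory. Above it, the processor
    # forces a garbage collection.
    limit_percentage: 80
    # Soft limit = hard limit - spike (here 80% - 20% = 60%). Above the soft
    # limit, incoming data is refused; receivers return errors and senders
    # with retry enabled (such as the agents' OTLP exporters) back off and retry.
    spike_limit_percentage: 20
  batch:
    # Flush when 8192 items are buffered or after 200ms, whichever comes
    # first (these match the processor's defaults, stated here explicitly).
    send_batch_size: 8192
    timeout: 200ms
```

Refusing data under pressure pushes backpressure to the agents instead of letting the gateway crash, which is what keeps the rest of the telemetry flowing during an incident.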

Convergence with Prometheus and the Risk of Semantic Conventions

The most critical aspect of introducing OpenTelemetry into an already Prometheus-centric stack is metric naming. The OpenTelemetry Semantic Conventions 1.x define a nomenclature that differs from the one established in the Prometheus ecosystem: http.server.request.duration instead of http_request_duration_seconds, db.client.operation.duration instead of the custom names used by each client library. With the Collector's default Prometheus translation, http.server.request.duration becomes http_server_request_duration_seconds, a name that no existing query references. When applications are instrumented with the OpenTelemetry SDKs and data passes through the Collector before reaching Prometheus, pre-existing dashboards and alert rules therefore stop working without any explicit error: queries simply return empty results and alerts never fire.
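The failure mode is easiest to see in a Prometheus alerting rule. The sketch below assumes the default add_metric_suffixes: true behavior and an SDK that emits http.server.request.duration in seconds; the 500 ms threshold and the alert names are hypothetical.

```yaml
groups:
  - name: http-latency
    rules:
      # Written against the classic client-library name. Once data arrives
      # through the Collector, this selector matches no series: the expression
      # returns an empty result, the alert can never fire, and nothing is logged.
      - alert: HighP99LatencyLegacy
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5
        for: 10m
      # Same intent against the name derived from http.server.request.duration:
      # dots become underscores and the unit "s" becomes the _seconds suffix.
      - alert: HighP99LatencyOTel
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(http_server_request_duration_seconds_bucket[5m]))) > 0.5
        for: 10m
```

Label names change as well: a legacy filter on a label such as method or status (names vary by client library) has to become http_request_method or http_response_status_code on the OpenTelemetry path.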

The add_metric_suffixes option of the Collector's Prometheus exporters (prometheus and prometheusremotewrite) controls whether type suffixes, such as _total on counters, and unit suffixes, such as _seconds, are appended automatically. The default behavior has changed across Collector releases, and because it only affects counters and metrics with a declared unit, an upgrade can break some rules but not others, creating inconsistencies that are hard to diagnose in production.
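One defensive measure is to set the option explicitly on every Prometheus-facing exporter, so that metric names no longer depend on the default of whichever Collector release happens to be deployed. A minimal sketch, with hypothetical endpoints:

```yaml
exporters:
  # Push path: remote write to a Prometheus-compatible backend.
  prometheusremotewrite/ops:
    endpoint: https://prometheus-ops.internal/api/v1/write
    # Append type suffixes (_total) and unit suffixes (_seconds, _bytes, ...).
    add_metric_suffixes: true
  # Pull path: expose a /metrics endpoint for an existing Prometheus to scrape.
  prometheus:
    endpoint: 0.0.0.0:8889
    add_metric_suffixes: true
```

Pinning the Collector image tag as well keeps the name translation reproducible, so any change in naming shows up first in staging rather than after an upgrade in production.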

There are two pragmatic strategies to manage the transition:

Parallel Shadowing: keep the pre-existing Prometheus scrape running alongside the new OTLP flow for a few weeks and compare equivalent metrics in staging, using absent() to detect series missing from one path and rate() (for counters) or delta() (for gauges) to compare values, before performing the cutover in production
Mapping Layer in the Collector: use the metricstransform processor to explicitly rename OpenTelemetry metrics to the Prometheus conventions already in use, preserving the compatibility of dashboards without modifying instrumented applications
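For the shadowing strategy, a pair of Prometheus rules evaluated in staging can surface both missing series and diverging values. This sketch assumes that the legacy scrape and the OTLP path land in the same Prometheus, compares aggregates because the two paths produce different label sets, and uses hypothetical rule names.

```yaml
groups:
  - name: otel-shadowing
    rules:
      # Fires if the OTLP path has produced no request-duration series at all.
      - alert: OTelShadowSeriesMissing
        expr: absent(http_server_request_duration_seconds_count)
        for: 15m
      # Relative difference between the request rates seen by the two paths;
      # values close to 0 mean the paths agree.
      - record: shadow:http_request_rate_divergence:ratio
        expr: |
          abs(
            sum(rate(http_request_duration_seconds_count[5m]))
            - sum(rate(http_server_request_duration_seconds_count[5m]))
          )
          / sum(rate(http_request_duration_seconds_count[5m]))
```

Tracking the recorded ratio over a few weeks, including peak hours, gives an objective cutover criterion instead of a visual comparison of dashboards.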

The second option carries the ongoing cost of maintaining the mapping table, but it is the one that keeps existing SLOs and alerts working unchanged throughout the migration of critical systems.
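A mapping-table entry might look like the sketch below, to be validated in staging. It assumes the exporter keeps add_metric_suffixes: true, so that the renamed metric, whose unit is s, is published as http_request_duration_seconds. The legacy label names method and status are hypothetical and should be taken from the existing dashboards. Each item under transforms is one row of the mapping table.

```yaml
processors:
  metricstransform/prom-compat:
    transforms:
      # One entry per metric whose legacy Prometheus name must be preserved.
      - include: http.server.request.duration
        match_type: strict
        action: update
        # The exporter appends "_seconds" from the unit "s", yielding
        # http_request_duration_seconds on the Prometheus side.
        new_name: http_request_duration
        operations:
          # Restore the label names used by existing queries.
          - action: update_label
            label: http.request.method
            new_label: method
          - action: update_label
            label: http.response.status_code
            new_label: status
```

The processor then goes into each metrics pipeline after memory_limiter and before batch. Keeping the mapping in a single processor makes it easy to retire entry by entry as dashboards are migrated to the new names.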

Governance of Cardinality and Differentiated Retention

In a banking environment, observability faces two constraints that weigh far more heavily than in a typical cloud-native environment.

The first is metric cardinality. The OpenTelemetry SDKs, particularly for Java and .NET, produce metrics with a rich set of attributes: http.route, http.request.method, http.response.status_code, db.system, db.name, server.address. Each unique combination of attribute values generates a distinct time series. In a system with 200 microservices, 50 routes per service, and a Prometheus backend not sized for this cardinality, the number of time series can explode to several million within hours of the rollout. The metricstransform processor can aggregate or remove high-cardinality attributes before export, but this requires knowing in advance which attributes have real diagnostic value and which are noise. That analysis must be done before the rollout, not as remediation.
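A rough estimate shows how quickly the numbers grow. Assuming, for illustration, five distinct status codes per route, three replicas per service, and the 14 bucket boundaries recommended for http.server.request.duration (15 buckets plus _sum and _count, so 17 Prometheus series per histogram), the result is 200 × 50 × 5 × 3 × 17 ≈ 2.55 million series for this single metric. The sketch below reduces attributes with metricstransform; the set of attributes kept is an assumption that should come out of the diagnostic analysis, and sum is the aggregation intended for histograms (worth confirming against the processor documentation of the deployed release).

```yaml
processors:
  metricstransform/reduce-cardinality:
    transforms:
      - include: http.server.request.duration
        match_type: strict
        action: update
        operations:
          # Keep only the listed attributes; data points that become identical
          # once the other attributes are dropped are merged by summing them.
          - action: aggregate_labels
            label_set: [http.route, http.request.method, http.response.status_code]
            aggregation_type: sum
```

Aggregating, rather than simply deleting attributes, matters: deleting an attribute without merging would leave several data points with the same identity, which the backend would see as conflicting samples.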

The second constraint is differentiated retention. Some metrics, such as those related to transactions, audit trails, and the performance of critical systems, must be retained for months or years to meet regulatory requirements. Others, such as pod-level debug metrics, become irrelevant after 7 to 14 days. The fan-out of the Collector gateway makes it possible to implement this logic natively, without duplicating infrastructure:

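exporters:
  # Short-retention backend for operational metrics (Prometheus with the
  # remote-write receiver enabled).
  prometheusremotewrite/ops:
    endpoint: https://prometheus-ops.internal/api/v1/write
    # Copy resource attributes (e.g. k8s.namespace.name) into metric labels.
    resource_to_telemetry_conversion:
      enabled: true

  # Long-retention, multi-tenant backend for audit metrics. X-Scope-OrgID is
  # the tenant header used by Mimir, and /api/v1/push is its remote-write path.
  prometheusremotewrite/audit:
    endpoint: https://mimir.internal/api/v1/push
    headers:
      X-Scope-OrgID: "audit-long-retention"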

Operational metrics go to a Prometheus instance with 15-day retention, optimized for fast queries. Audit metrics, received by the same gateway and routed by a dedicated pipeline to the second exporter, end up in Thanos or Mimir on object storage with retention configurable in years, without requiring a second agent or additional instrumentation in the applications.
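Because an exporter forwards everything its pipeline delivers, the audit subset has to be selected by a dedicated pipeline. The sketch below wires the two exporters prometheusremotewrite/ops and prometheusremotewrite/audit; the payments.transaction. metric-name prefix used to identify audit-relevant metrics is hypothetical, and the otlp receiver, memory_limiter, and batch are assumed to be declared elsewhere in the configuration.

```yaml
processors:
  # Drops every metric whose name does not start with the hypothetical
  # "payments.transaction." prefix, keeping only audit-relevant series.
  filter/audit-only:
    error_mode: ignore
    metrics:
      metric:
        - 'not IsMatch(name, "^payments[.]transaction[.]")'

service:
  pipelines:
    # All metrics: short retention, fast queries.
    metrics/ops:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite/ops]
    # Audit subset only: long retention on object storage.
    metrics/audit:
      receivers: [otlp]
      processors: [memory_limiter, filter/audit-only, batch]
      exporters: [prometheusremotewrite/audit]
```

Both pipelines share the same receiver, so applications send each data point once; the duplication needed for two retention tiers happens inside the gateway.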

Conclusion

OpenTelemetry Collector in gateway mode is not a plug-and-play replacement for the existing stack: it is an architectural decision that reduces operational complexity in the long run but requires a precise migration plan and proactive cardinality governance. For a platform engineering team managing dozens of clusters in a regulated environment, the main value is not immediate simplification but standardization of the collection plane: a single agent DaemonSet per node, a single wire protocol (OTLP), and a single place where sampling, routing, and retention rules are applied as code versioned in Git.

Two mistakes are the most common: failing to plan for Semantic Conventions compatibility before bringing data into production, where metric names change silently without visible errors, and failing to configure the memory_limiter in the gateway, which turns the observability tool into the source of the next incident.