Modern applications and infrastructure are complex, distributed systems, making comprehensive visibility essential for maintaining performance, reliability, and cost efficiency. Red Hat observability provides the tools and capabilities needed to gain deep insights into your environments.

We're excited to highlight recent advancements in observability across Red Hat OpenShift and Red Hat Advanced Cluster Management for Kubernetes observability components. These updates, aligning with Red Hat OpenShift 4.19 and Red Hat Advanced Cluster Management 2.14 capabilities, introduce enhanced network monitoring, streamlined incident analysis, and intelligent resource optimization, empowering operations teams with better clarity and control.

Let's dive into the details of these new features.

New observability features in Red Hat OpenShift 

The latest updates bring significant enhancements to monitoring and troubleshooting directly within OpenShift.

Cluster observability operator 1.1 and 1.2

The cluster observability operator (COO) is an optional but highly valuable addition that expands the platform's monitoring and observability features considerably, going beyond the standard, preconfigured set offered by the cluster monitoring operator (CMO). The primary purpose of the COO is to support the deployment of standalone, independently configurable monitoring stacks, and to deploy observability UI plugins and related analytics features. 

In our previous blog post, we announced the general availability of COO with the 1.0 release. Since then, two feature releases have shipped: COO 1.1 and, more recently, COO 1.2.

Key updates in COO 1.1 include:

  • The monitoring UI plugin can now be installed using COO
  • Incident detection can be enabled within the monitoring UI plugin
  • TLS support has been implemented for the Thanos web endpoint

Key updates in COO 1.2 include:

  • OpenTelemetry (OTel) support in the Observe > Logging UI
  • A Perses-powered Accelerators dashboard is now included by default
  • Signal correlation: the Troubleshooting UI displays multiple results per graph node
  • Direct navigation to individual incident details, which enables the Incidents overview in Red Hat Advanced Cluster Management 2.14
  • General availability of the Observe > Traces UI, including the scatter plot, trace table, and Gantt chart
  • Advanced filtering in the Observe > Traces UI with a dedicated filter bar

OpenShift monitoring

The OpenShift 4.19 release brings a solid set of improvements to the platform’s observability capabilities. At the center of these updates is the integration of Prometheus 3.x, which introduces important changes to the engine’s performance, flexibility, and long-term roadmap. This new version simplifies the codebase by removing deprecated and experimental flags, improves memory efficiency, and lays the groundwork for better alignment with OpenTelemetry standards.

Several enhancements are worth highlighting. Remote Write 2.0 improves how Prometheus streams data to external systems, especially in terms of durability and efficiency. Native histogram support has been expanded to allow out-of-order ingestion, which is particularly helpful in edge deployments or when metrics are exported through OpenTelemetry. On top of that, Prometheus now accepts UTF-8 in metric names and labels.

Scrape profiles, which previously existed behind a feature gate, are now generally available (GA). These profiles give platform teams more control over which metrics are collected, making it easier to reduce noise and optimize Prometheus resource usage. This is especially valuable in environments with varied workloads or strict performance requirements.

Another practical addition is support for sending alerts to external Alertmanager instances using a proxy URL. This makes it simpler to manage centralized alerting across clusters or hybrid environments where direct network access may not be available.

Beyond the major features, the team has also delivered refinements to the existing alert rules. Many have been updated or extended with additional runbooks to help admins take faster, more informed action. Core components of the monitoring stack have also been updated to their latest stable versions to ensure ongoing compatibility and performance.

OpenShift Logging

Red Hat OpenShift Logging 6.3 also brings several enhancements that make log management more flexible and powerful. The Cluster Logging Operator offers expanded Splunk metadata keys, simplifying log categorization and correlation for teams that use Splunk as their primary log analysis tool. For cloud environments, Cluster Logging adds support for multiple Amazon CloudWatch outputs with STS authentication, providing greater flexibility and security when sending logs to CloudWatch across different accounts or regions. Loki, the key log storage component of OpenShift Logging, gains virtual-host-style configuration for its object storage. Loki also introduces resource limits as a technology preview, allowing administrators to better control resource consumption and keep their logging infrastructure performing reliably.

OpenTelemetry and tracing

The Red Hat build of OpenTelemetry is enhancing its capabilities with several key components moving to GA. Among these is the Prometheus Receiver, which streamlines metric ingestion by scraping Prometheus endpoints directly into the OpenTelemetry pipeline. This is crucial for environments that rely on Prometheus for metrics collection, because it provides a consistent data flow to any monitoring tool that supports the OpenTelemetry Protocol (OTLP). Additionally, the Attributes/ResourceAttributes Processor has reached GA, providing powerful functionality for enriching spans and metrics with additional metadata. Users can add contextual information, such as host details, application versions, or environment tags, making it much easier to filter, analyze, and troubleshoot issues.
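To make the enrichment idea concrete, the following Python sketch mimics the insert, update, upsert, and delete actions that the upstream OpenTelemetry attributes processor applies to span attributes. It only illustrates those semantics: the real processor is configured declaratively in the Collector configuration, and the dictionary-based action format, the `apply_actions` function, and the attribute keys below are assumptions made for this example.

```python
from typing import Any

# Hypothetical action format for illustration: {"key": ..., "action": ..., "value": ...}
Action = dict[str, Any]


def apply_actions(attributes: dict[str, Any], actions: list[Action]) -> dict[str, Any]:
    """Return a copy of `attributes` with each action applied in order."""
    result = dict(attributes)
    for action in actions:
        key, op = action["key"], action["action"]
        if op == "insert":
            # insert: add the key only if it is not already present
            result.setdefault(key, action["value"])
        elif op == "update":
            # update: change the value only if the key already exists
            if key in result:
                result[key] = action["value"]
        elif op == "upsert":
            # upsert: add the key, or overwrite the existing value
            result[key] = action["value"]
        elif op == "delete":
            # delete: drop the key if it is present
            result.pop(key, None)
        else:
            raise ValueError(f"unsupported action: {op}")
    return result


span_attributes = {"http.route": "/cart", "env": "dev", "auth.token": "secret"}
enriched = apply_actions(span_attributes, [
    {"key": "env", "action": "insert", "value": "prod"},          # no-op: "env" exists
    {"key": "app.version", "action": "upsert", "value": "1.4.2"},
    {"key": "host.name", "action": "update", "value": "node-1"},  # no-op: key absent
    {"key": "auth.token", "action": "delete"},
])
print(enriched)  # {'http.route': '/cart', 'env': 'dev', 'app.version': '1.4.2'}
```

Returning a copy rather than mutating the input keeps the original attributes available for comparison, which makes the effect of each action easy to inspect.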

The Kafka Exporter is also GA. This component is essential for building scalable, event-driven observability backends, allowing OpenTelemetry data to be reliably exported to Kafka for subsequent processing, analysis, or long-term storage. Beyond these GA releases, the Red Hat build of OpenTelemetry introduces a Tail-Based Sampling Processor as a technology preview. This advanced sampling method decides which traces to retain only after the spans of each trace have been collected. By evaluating the full context of a trace, such as whether it contains errors or high latency, the processor helps reduce data noise and storage costs while preserving the most relevant and actionable traces for in-depth analysis.
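The core idea of tail-based sampling can be shown with a short Python sketch: spans are buffered per trace, and the keep-or-drop decision is made only once the whole trace is available. The `Span` type, the `tail_sample` function, and the 500 ms latency threshold are hypothetical. The actual processor is driven by sampling policies in the Collector configuration and must also decide how long to wait for late-arriving spans, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    name: str
    duration_ms: float
    is_error: bool = False


def tail_sample(spans: list[Span], latency_threshold_ms: float = 500.0) -> dict[str, list[Span]]:
    """Keep only traces that contain an error or a span slower than the threshold.

    The decision is made per trace, after all of its spans have been buffered,
    which is what distinguishes tail-based from head-based sampling.
    """
    traces: dict[str, list[Span]] = {}
    for span in spans:
        traces.setdefault(span.trace_id, []).append(span)

    kept = {}
    for trace_id, trace_spans in traces.items():
        has_error = any(s.is_error for s in trace_spans)
        is_slow = any(s.duration_ms > latency_threshold_ms for s in trace_spans)
        if has_error or is_slow:
            kept[trace_id] = trace_spans
    return kept


spans = [
    Span("a", "GET /", 20), Span("a", "db.query", 30),       # fast, no errors
    Span("b", "GET /cart", 40), Span("b", "db.query", 900),  # one slow span
    Span("c", "POST /pay", 15, is_error=True),               # failed request
]
kept = tail_sample(spans)
print(sorted(kept))  # ['b', 'c']
```

Note that trace "b" is kept in full, including its fast root span, because the decision applies to the whole trace rather than to individual spans.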

Distributed tracing also gains several new features. The most significant is fine-grained role-based access control (RBAC) for trace data, which lets teams manage who can access sensitive trace information, a critical capability for maintaining security and compliance in regulated industries and multitenant cloud environments. In addition, Tempo backends running on Google Cloud Platform (GCP) and Microsoft Azure now support short-lived tokens.

Network Observability 1.8 enhancements

Red Hat Network Observability 1.8 introduces a suite of new features, many of which target OpenShift 4.18 and require OVN-Kubernetes as the CNI. A significant advancement is Packet Translation, now GA, which clarifies traffic flows by adding translated namespace and pod details to flow tables, making it simpler to trace communication paths. This version also delivers substantial eBPF resource reductions (GA): CPU and memory savings of 40% to 57% for the eBPF agent itself, and overall Network Observability savings of 11% to 25%. These efficiencies come from optimized hash map usage and concurrency algorithms, along with improved data de-duplication.

Network Observability 1.8 also brings several features in various preview stages that further enhance network insights. Network events for monitoring network policies (technology preview) provide detailed insight into packet behavior, including the reasons for drops and which policies allowed or denied traffic, which is critical for troubleshooting. eBPF flow filtering enhancements (developer preview) remove previous limitations by allowing up to 16 CIDR-based rules and introducing options for peerCIDR and per-rule sampling rates. For advanced network segmentation, UDN observability (developer preview) adds support for User Defined Networks, enabling filtering by UDN labels and displaying secondary networks in the Topology tab. Finally, eBPF manager support (developer preview) allows Network Observability to use the eBPF manager operator to load eBPF programs, with the aim of reducing the attack surface and preventing conflicts.
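As a rough illustration of how CIDR-based rules with a peer CIDR and per-rule sampling could interact, here is a Python sketch. The `FilterRule` fields, the first-match semantics, and the choice to drop flows that match no rule are assumptions for this example; they are not the FlowCollector API, and the real filtering runs inside the eBPF agent rather than in Python.

```python
import ipaddress
from collections import Counter
from dataclasses import dataclass
from typing import Optional

MAX_RULES = 16  # Network Observability 1.8 allows up to 16 CIDR-based rules


@dataclass
class FilterRule:  # hypothetical rule shape, for illustration only
    cidr: str                        # one side of the flow must fall in this range
    peer_cidr: Optional[str] = None  # if set, the other side must fall in this range
    sampling: int = 1                # keep 1 out of every `sampling` matching flows


def rule_matches(rule: FilterRule, src: str, dst: str) -> bool:
    net = ipaddress.ip_network(rule.cidr)
    peer = ipaddress.ip_network(rule.peer_cidr) if rule.peer_cidr else None
    a, b = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    # Check both directions so the rule matches regardless of which side is the source.
    return any(x in net and (peer is None or y in peer) for x, y in ((a, b), (b, a)))


def filter_flows(rules: list[FilterRule], flows: list[tuple[str, str]]) -> list[tuple[str, str]]:
    if len(rules) > MAX_RULES:
        raise ValueError(f"at most {MAX_RULES} rules are allowed")
    seen: Counter = Counter()  # matching flows observed so far, per rule index
    kept = []
    for src, dst in flows:
        index = next((i for i, r in enumerate(rules) if rule_matches(r, src, dst)), None)
        if index is None:
            continue  # assumption: flows that match no rule are dropped
        seen[index] += 1
        # Per-rule sampling: keep the 1st, (n+1)th, (2n+1)th ... matching flow.
        if (seen[index] - 1) % rules[index].sampling == 0:
            kept.append((src, dst))
    return kept


rules = [
    FilterRule("10.128.0.0/14", peer_cidr="172.30.0.0/16"),  # example pod and service ranges
    FilterRule("192.168.0.0/16", sampling=2),                # keep every 2nd matching flow
]
flows = [
    ("10.128.0.5", "172.30.0.10"),
    ("10.128.0.5", "8.8.8.8"),
    ("192.168.1.1", "1.1.1.1"),
    ("192.168.1.2", "1.1.1.1"),
    ("192.168.1.3", "1.1.1.1"),
]
print(filter_flows(rules, flows))
# [('10.128.0.5', '172.30.0.10'), ('192.168.1.1', '1.1.1.1'), ('192.168.1.3', '1.1.1.1')]
```

Keeping a separate counter per rule is what makes the sampling rate independent for each rule: heavy traffic matching one CIDR does not affect how often flows matching another rule are kept.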

What's new in Network Observability 1.8

Network Observability CLI 1.8 features

The Network Observability CLI, a kubectl plugin, has been significantly enhanced in version 1.8, offering a lightweight yet powerful way to manage network observability directly from the command line. New features include:

  • Background captures: a background option runs flow or packet captures non-interactively.
  • Customizable namespaces: the NETOBSERV_NAMESPACE variable sets the namespace for a capture, so several captures can run in parallel.
  • Subnet labeling: on OpenShift clusters, the --get-subnets option automatically labels subnets (machines, pods, services) with SrcSubnetLabel and DstSubnetLabel.
  • Enhanced data filtering: --node-selector deploys agents precisely where needed, and options such as protocol, port, and regex-based filtering on enriched content fine-tune the captured data, with support for up to 16 filter sets.
  • Unified collector experience: integrating the flowlogs-pipeline component into the eBPF agents gives flow and packet captures consistent output.
  • Metrics capture: on OpenShift, the metrics command creates a ServiceMonitor for Prometheus integration, and the results are visualized in an automatically generated NetObserv / On Demand dashboard.
  • Enhanced help: comprehensive examples and details for all commands and options.

7 new features in the Network Observability CLI 1.8

Incident detection for OpenShift (technology preview)

Incident detection is now available as a technology preview in the monitoring UI plugin of the OpenShift web console. Included with cluster observability operator 1.1.0 and later, this feature helps address alert storms by grouping related alerts within a single cluster into incidents. This improves the experience of on-call engineers and helps them restore service for customers faster. The grouping is currently based primarily on temporal correlation and may evolve to include other factors over time. The Observe > Incidents UI provides a timeline color-coded by severity and categorizes alerts by the affected OpenShift components. Users can filter incidents by severity, state (firing or resolved), and time window, and clicking an incident drills down into its alerts through a timeline or component view. To use the feature, install the cluster observability operator from OperatorHub and enable incident detection through a UIPlugin custom resource.
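The announcement only says that grouping is based primarily on temporal correlation. One simple way to picture that is to merge alerts whose firing windows overlap, or fall within a short gap of each other, into the same incident, as in this Python sketch. The `Alert` and `Incident` types, the five-minute `max_gap`, and the severity ranking are assumptions for illustration, not the algorithm used by the monitoring UI plugin; the alert names are sample data only.

```python
from dataclasses import dataclass, field

SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}  # assumed ordering


@dataclass
class Alert:
    name: str
    component: str
    severity: str
    start: int  # minutes since an arbitrary reference point
    end: int


@dataclass
class Incident:
    alerts: list[Alert] = field(default_factory=list)
    start: int = 0
    end: int = 0

    @property
    def severity(self) -> str:
        # An incident is as severe as its most severe alert.
        return max((a.severity for a in self.alerts), key=SEVERITY_RANK.__getitem__)

    @property
    def components(self) -> set[str]:
        return {a.component for a in self.alerts}


def group_alerts(alerts: list[Alert], max_gap: int = 5) -> list[Incident]:
    """Merge alerts into incidents when they start within `max_gap` of an open incident."""
    incidents: list[Incident] = []
    for alert in sorted(alerts, key=lambda a: a.start):
        current = incidents[-1] if incidents else None
        if current is not None and alert.start <= current.end + max_gap:
            # Temporally close to the current incident: attach it and extend the window.
            current.alerts.append(alert)
            current.end = max(current.end, alert.end)
        else:
            incidents.append(Incident([alert], alert.start, alert.end))
    return incidents


alerts = [
    Alert("KubePodCrashLooping", "workloads", "warning", 0, 30),
    Alert("etcdHighFsyncDurations", "etcd", "critical", 10, 25),
    Alert("KubeAPIErrorBudgetBurn", "apiserver", "critical", 28, 40),
    Alert("NodeClockNotSynchronising", "node", "warning", 120, 130),
]
for incident in group_alerts(alerts):
    print(incident.start, incident.end, incident.severity, sorted(incident.components))
# 0 40 critical ['apiserver', 'etcd', 'workloads']
# 120 130 warning ['node']
```

Four alerts collapse into two incidents here, which is the alert-storm reduction the feature aims for; a production implementation would likely weigh more signals than timing alone.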

Incident detection for OpenShift tech preview is here

LLM observability with Dynatrace on OpenShift AI

Red Hat currently has multiple efforts underway in the AI ecosystem. Among them, Red Hat OpenShift AI is a platform for managing the lifecycle of predictive and generative AI (gen AI) models at scale across hybrid cloud environments. It's built on top of OpenShift and uses many of the underlying platform's observability features. Together with Dynatrace, we've worked on improving the OpenTelemetry instrumentation for gen AI frameworks and on close integration with the Dynatrace observability platform, so that users can focus on the reliability, cost, safety, and continuous improvement of LLMs, agentic frameworks, and vector databases. LLMs introduce unique monitoring challenges due to their complex, stateful nature, resource demands, and probabilistic outputs. In the blog post by Twinkll Sisodia and Pavol Loffay, you'll find more details about the architectural setup, environment configuration, and the use of the OpenTelemetry Collector to gather crucial metrics (such as token generation, latency, throughput, and GPU utilization) and forward them to Dynatrace for real-time analysis.

Implement LLM observability with Dynatrace on OpenShift AI

New observability features in Red Hat Advanced Cluster Management for Kubernetes

Red Hat Advanced Cluster Management for Kubernetes brings multicluster capabilities to observability, enhancing visibility and management across environments.

Right-Sizing for namespaces (technology preview)

The Right-Sizing recommendation feature for namespaces is available as a technology preview with Red Hat Advanced Cluster Management 2.14. This feature helps identify over-provisioned or under-utilized resources in managed clusters. It analyzes resource consumption and suggests optimal CPU and memory values for namespaces and clusters, reducing the risk of unnecessary costs.

The feature is enabled via the MultiClusterObservability Custom Resource and uses a Grafana dashboard ("ACM Right-Sizing Namespace") for display. Configuration options include placement, PrometheusRule settings (like namespace and label filters, recommendation percentages), and namespace binding. Prerequisites include having Red Hat Advanced Cluster Management and Multicluster Observability installed on the Hub cluster with proper permissions. 
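To show how a recommendation percentage might turn observed usage into a suggested value, here is a small Python sketch. The formula (peak observed usage multiplied by the recommendation percentage), the 110% default, and the `NamespaceUsage` and `right_size` names are assumptions for this example; the actual PrometheusRule recording rules behind the "ACM Right-Sizing Namespace" dashboard may aggregate usage differently.

```python
from dataclasses import dataclass


@dataclass
class NamespaceUsage:  # hypothetical input shape
    namespace: str
    cpu_request_cores: float
    cpu_usage_cores: list[float]    # sampled usage over the analysis window
    memory_request_gib: float
    memory_usage_gib: list[float]   # sampled usage over the analysis window


def recommend(samples: list[float], percentage: float) -> float:
    # Assumed formula: peak observed usage scaled by the recommendation percentage.
    return max(samples) * percentage / 100


def right_size(ns: NamespaceUsage, percentage: float = 110) -> dict:
    cpu = recommend(ns.cpu_usage_cores, percentage)
    memory = recommend(ns.memory_usage_gib, percentage)
    return {
        "namespace": ns.namespace,
        "cpu_recommended_cores": round(cpu, 2),
        "cpu_over_provisioned": ns.cpu_request_cores > cpu,
        "memory_recommended_gib": round(memory, 2),
        "memory_over_provisioned": ns.memory_request_gib > memory,
    }


usage = NamespaceUsage(
    namespace="payments",
    cpu_request_cores=4.0,
    cpu_usage_cores=[0.8, 1.2, 1.5, 1.1],
    memory_request_gib=8.0,
    memory_usage_gib=[2.5, 3.0, 2.8, 3.2],
)
print(right_size(usage))
```

With these sample numbers the namespace requests roughly 2.4 times the CPU and memory it needs under the assumed formula, so both resources are flagged as over-provisioned. A percentage above 100 leaves headroom above the observed peak.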

Incident detection (Developer Preview)

Red Hat Advanced Cluster Management 2.14 includes a developer preview of incident detection. This feature helps pinpoint the underlying causes of issues by aggregating related alerts into manageable incidents. Rather than being overwhelmed by alert storms, users can better understand the state of their environment and determine where changes need to be made. Currently, you must manually install CustomResourceDefinitions (CRDs) on the Hub cluster before you can enable incident detection in the MultiClusterObservability custom resource. Once the feature is enabled, the cluster observability operator is installed on the spoke clusters, and an "Incidents" menu appears in the Red Hat Advanced Cluster Management Grafana instance. The incidents overview presents a table of active incidents, prioritized by severity, along with a graph illustrating the trend of incidents over time. Detailed incident information is available in the OpenShift web console (version 4.19 or later, with cluster observability operator 1.2 or later).

Wrap up

Ready to explore these new features? Visit redhat.com/observability and the documentation pages to learn more and get started with the latest observability tools in OpenShift. The Red Hat Developers Observability page also contains information to help you learn about and implement observability capabilities.

We value your feedback! Share your thoughts and suggestions using the Red Hat OpenShift feedback form.

About the authors

Vanessa is a Senior Product Manager in the Observability group at Red Hat, focusing on both OpenShift Analytics and Observability UI. She is particularly interested in turning observability signals into answers. She loves to combine her passions: data and languages.

Jamie Parker is a Product Manager at Red Hat who specializes in Observability, particularly in the Logging and OpenStack areas. At Red Hat, Jamie works with organizations and customers to learn about their needs within the ever-changing Observability landscape and, based on their feedback, helps to guide upcoming products within the Red Hat Observability Platform. Jamie enjoys sharing lessons learned with the community by frequently speaking at meetups and conferences, and by blogging.

Deepthi Dharwar is currently a Principal Product Manager at Red Hat, where she is responsible for OpenShift Networking. She leads product development for OpenShift Software-Defined Networking, Ingress/Egress, DNS, and Network Observability, while also shaping the product strategy and roadmap. Deepthi loves exploring emerging technologies and is passionate about how technology can drive innovation and solve real-world problems.

Roger Florén, a dynamic and forward-thinking leader, currently serves as a Principal Product Manager at Red Hat, specializing in Observability. His journey in the tech industry is marked by high performance and ambition, transitioning from a senior developer role to a principal product manager. With a strong foundation in technical skills, Roger is constantly driven by curiosity and innovation. At Red Hat, Roger leads the Observability platform team, working closely with in-cluster monitoring teams and contributing to the development of products like Prometheus, Alertmanager, Thanos, and Observatorium. His expertise extends to coaching, product strategy, interpersonal skills, technical design, IT strategy, and agile project management.
