June 10, 2026

GrafanaCON 2026 in Barcelona: Back Where It All Began

From April 20 to 22, we were in Barcelona for GrafanaCON 2026, Grafana’s biggest community event of the year. The venue choice was fitting: in 2013, Grafana co-founder Torkel Ödegaard wrote the project’s first commit while sick in a Barcelona hotel room during Christmas vacation. More than a decade later, the conference returned to where it started.

Platform Engineering

Observability

Reto Kupferschmid

Platform Architect

Simon Schweizer

Apprentice in Platform Engineering

Contents:

Hands-On Lab, Keynote, Grafana 13, Loki’s New Foundation, Scaling Grafana Assistant, Further Highlights, Conclusion

Hands-On Lab

We arrived in Barcelona on Sunday evening to be ready for Monday’s hands-on lab, a dedicated session called “Alloy in Action: Build telemetry pipelines with OpenTelemetry and Prometheus.” Alloy is Grafana’s open source distribution of the OpenTelemetry Collector.

In this lab, we took control of the operation center for a network of agents, configuring our setup to receive and centralize their telemetry. To get there, the lab walked through building a full telemetry pipeline from scratch: receiving, transforming, and exporting metrics, logs, and traces from a sample application to the respective backends. For anyone who has spent time wiring together separate scrape configs, remote_write targets, and OTel exporters by hand, Alloy’s unified configuration model makes the process much easier to manage. As if learning how to build a proper telemetry pipeline were not enough, we also left with one of the coolest stickers of the entire conference. The workshop material is open source and available on GitHub for anyone who wants to work through it independently.

Keynote

The full recording of the opening keynote is available on YouTube, and Grafana Labs published a detailed write-up of everything announced.

The keynote focused on three themes: easier onboarding, scaling large deployments, and broader platform availability. Grafana Labs also shared updated company metrics, including 35 million active users, over one million companies using Grafana, and $400 million in annual recurring revenue.

The following sections cover the announcements and talks that stood out most to us.

Grafana 13

Grafana 13 was the main product release of the conference. Several features are immediately practical for anyone running Grafana for a team.

Suggested Dashboards

When you connect a data source in Grafana 13, it can now surface pre-built community dashboards compatible with that data source, alongside a compatibility score showing how well each dashboard will work with your specific data. You can preview any of these against your live data before importing. For anyone who has had to start from the community library manually every time a new data source is connected, this removes a meaningful step.

Dashboard Templates

Grafana 13 ships built-in templates based on common methodology frameworks: DORA metrics, RED method, USE method, golden signals, and service health. The integration with Grafana Assistant is what turns these from simple starter templates into something genuinely practical: it can scan your connected data sources, identify relevant metrics, and populate the template with actual queries automatically. The result still requires refinement, but the heavy lifting of mapping your data to a known structure has already been done. Org-defined templates are also coming as an experimental feature, allowing platform teams to publish their own templates for the rest of their Grafana instance.

GitSync

GitSync enables a bidirectional integration between Grafana and a Git repository. Dashboard changes made in the UI can generate a pull request that your team reviews. When a change is merged in the repository, Grafana picks it up and applies it automatically. This makes a proper GitOps workflow for dashboards practical without external tooling or workarounds. GitSync now supports GitHub, GitLab, Bitbucket, and bare Git repositories, and is marked as production-ready in Grafana 13. As a side effect, it also serves as a disaster recovery mechanism: the full state of your dashboards is always in version control.

Dynamic Dashboards

Dynamic dashboards, introduced in the previous release, are now generally available. They support tab-based layouts within a single dashboard, which lets multiple audiences work from the same URL and the same shared context. An engineer, a product manager, and an executive can open the same link during an incident and navigate to the tab relevant to them, with shared filters applying across tabs automatically. Section-level variables let you scope filters to specific tabs or rows rather than the entire dashboard. Combined with auto-grid layout, which removes the manual panel resize work, building a multi-audience dashboard is noticeably less tedious.

The full deep-dive session is available here.

Loki’s New Foundation

The Loki sessions were some of the most technically significant of the conference and focused largely on the architectural direction toward Loki 4.0. Loki’s original architecture was optimized for low-cost log storage by avoiding full indexing and scanning chunks at query time. That model worked well for years, but the shift toward structured logging and high-cardinality queries has pushed the architecture toward its limits.

A major change is the introduction of a new columnar storage format called DataObjects. Instead of storing timestamps, metadata, and raw log lines together, Loki can now read only the specific fields needed for a query. Combined with a redesigned query engine, Grafana Labs reported internal results of up to 20x less data scanned and 10x faster analytical queries.
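To make the benefit of a columnar layout concrete, here is a minimal sketch in plain Python. It is not the DataObjects format: the records, field names, and byte counting are assumptions chosen for illustration. It only shows why a query that needs a single field reads far less data when fields are stored separately.

```python
from typing import Dict, List

# Hypothetical log records; the field names are illustrative, not Loki's schema.
RECORDS: List[Dict[str, str]] = [
    {"timestamp": "2026-04-20T10:00:00Z", "service": "checkout",
     "line": "payment accepted for order 1001"},
    {"timestamp": "2026-04-20T10:00:01Z", "service": "checkout",
     "line": "payment declined for order 1002"},
    {"timestamp": "2026-04-20T10:00:02Z", "service": "search",
     "line": "query took 35ms"},
]


def to_columns(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Pivot row-oriented records into one list per field.

    Assumes every record carries the same fields.
    """
    columns: Dict[str, List[str]] = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def bytes_read_row_layout(records: List[Dict[str, str]]) -> int:
    """A row layout reads every field of every record, needed or not."""
    return sum(len(value.encode()) for record in records
               for value in record.values())


def bytes_read_column_layout(columns: Dict[str, List[str]],
                             needed_fields: List[str]) -> int:
    """A column layout reads only the fields the query asks for."""
    return sum(len(value.encode()) for field in needed_fields
               for value in columns[field])


def count_by_service(columns: Dict[str, List[str]]) -> Dict[str, int]:
    """Example analytical query that touches only the 'service' column."""
    counts: Dict[str, int] = {}
    for service in columns["service"]:
        counts[service] = counts.get(service, 0) + 1
    return counts
```

Calling `count_by_service(to_columns(RECORDS))` needs only the `service` column, so `bytes_read_column_layout(columns, ["service"])` is a small fraction of `bytes_read_row_layout(RECORDS)`. The gap widens as the raw log lines get longer.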

The ingestion pipeline is also changing. Loki now supports Kafka-backed ingestion to separate reads from writes and prevent expensive queries from impacting ingestion performance. The team was candid about the trade-off: Kafka increases system complexity, but at their scale the previous architecture had already become difficult to operate reliably.

Grafana Labs also announced the acquisition of Logline, a company focused on secondary indexing for log search. Their technology reportedly reduces bytes scanned for full-text search by up to 99%, addressing one of Loki’s long-standing weaknesses.

The full architecture talk is available here.

Scaling Grafana Assistant

The AI session from Ivana Huckova and Yasir Ekinci opened by joking that it was “the mandatory AI talk,” then turned into one of the more technically grounded sessions of the conference. It covered the engineering challenges of running an AI agent in production at scale, which we found more useful than another feature announcement.

Context Engineering

The Grafana Assistant team structured their approach around three layers of context.

The first is what your environment contains: dashboards, data sources, running services. A new “memories” capability handles this layer by running weekly background scans of connected observability data sources, discovering services, grouping them into logical domains, and building a persistent map of what is monitored and how. This means the assistant no longer starts from zero with every conversation.

The second layer is what your team and organization know: runbooks, triage procedures, internal APIs, and institutional knowledge that lives in someone’s head. A new “Skills” feature lets teams encode this as repeatable workflows. When someone asks a relevant question, the assistant searches the skill library and applies matching procedures automatically, even if the user never mentioned a runbook by name.

The third layer is what you are doing right now, addressed through context hooks in the UI: one-click actions such as “explain this trace,” the ability to point to a specific panel, and image attachments for describing dashboards visually.

Context window management is handled through deferred tool loading for less common capabilities, context compaction for long conversations, and output summarization for large query results.
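Two of these techniques, context compaction and output summarization, can be sketched in a few lines. This is a minimal illustration, not the Grafana Assistant implementation: the `Message` type, the function names, and the truncation-based "summaries" are all assumptions. A real agent would most likely ask a model to write the summary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    role: str
    text: str


def summarize_output(text: str, max_chars: int = 200) -> str:
    """Stand-in for output summarization: keep the head of a large result."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return f"{text[:max_chars]} [... {omitted} characters omitted]"


def compact(history: List[Message], keep_recent: int = 4) -> List[Message]:
    """Replace all but the most recent messages with one summary message."""
    if len(history) <= keep_recent:
        return list(history)
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    # Placeholder summary: the first 40 characters of each older message.
    digest = "; ".join(message.text[:40] for message in older)
    summary = Message("system",
                      f"Summary of {len(older)} earlier messages: {digest}")
    return [summary] + recent
```

With ten messages and `keep_recent=4`, `compact` returns five: one summary followed by the four most recent messages unchanged.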

Self-Improvement Loop and o11y-bench

The team built a benchmark suite called o11y-bench, published as open source, that runs the assistant against a real Grafana stack and grades it on a defined set of observability tasks. The raw pass rate was acceptable, but consistency across multiple runs of the same task dropped significantly: a problem they described as the “flaky test” problem applied to AI behavior. Their solution was to use coding agents to propose changes to prompts and tool instructions, verify the changes against the benchmark, and merge only improvements. This closed loop makes it possible to iterate on assistant behavior while keeping regressions visible before they reach production.
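The difference between raw pass rate and run-to-run consistency is easy to see with a small calculation. The sketch below is not part of o11y-bench. The task names, the result layout, and the merge rule in `accept_change` are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical layout: task name -> outcome of each repeated run.
Results = Dict[str, List[bool]]


def pass_rate(results: Results) -> float:
    """Fraction of all individual runs that passed (assumes non-empty input)."""
    runs = [ok for outcomes in results.values() for ok in outcomes]
    return sum(runs) / len(runs)


def consistency(results: Results) -> float:
    """Fraction of tasks that passed on every single run."""
    stable = sum(1 for outcomes in results.values() if all(outcomes))
    return stable / len(results)


def accept_change(baseline: Results, candidate: Results) -> bool:
    """Assumed merge gate: no metric regresses and at least one improves."""
    no_regression = (pass_rate(candidate) >= pass_rate(baseline)
                     and consistency(candidate) >= consistency(baseline))
    improvement = (pass_rate(candidate) > pass_rate(baseline)
                   or consistency(candidate) > consistency(baseline))
    return no_regression and improvement


BASELINE: Results = {
    "find-error-rate": [True, True, True, True],
    "explain-trace": [True, False, True, True],
    "build-dashboard": [True, True, False, True],
}

CANDIDATE: Results = {
    "find-error-rate": [True, True, True, True],
    "explain-trace": [True, True, True, True],
    "build-dashboard": [True, True, False, True],
}
```

In `BASELINE`, ten of twelve runs pass, yet only one of three tasks passes every time. That gap is the “flaky test” effect: a healthy-looking pass rate can hide tasks that fail intermittently.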

AI Observability

The product for monitoring AI agents in production started as an internal tool for the Grafana Assistant team. It tracks inputs, outputs, execution traces, latency, cost, and output quality for agent runs in production, and is now in public preview for external use. The gcx CLI tool was also announced, giving engineers a way to query Grafana Cloud from within agentic coding environments like Claude Code or Cursor without switching to a browser tab.

The full talk is available here.

Further Highlights

LEGO’s Dashboard Framework

Paul Farver and Lorenzo Setale from the LEGO Group presented the observability dashboard framework their platform team built using the Grafana Foundation SDK. We already know Paul from the Swiss Cloud Native Day 2025 on the Gurten, and the presentation style is unchanged: technically precise, self-aware, and funnier than most conference talks. His bio on the slide read “YAML Engineer & Minifigure Poser.”

Many platform teams run into the same problem: pre-built dashboards cannot cover the diversity of what product teams actually want to visualize, but giving every team full dashboard freedom leads to inconsistency, dashboards that cannot be compared across the organization, and a steady stream of “can you fix my dashboard” requests. Their framework is opinionated about how data is presented (layout, panel types, labels, thresholds) but leaves what to present entirely to the consuming team. Teams describe their telemetry requirements, and the framework assembles dashboards from those inputs using the Foundation SDK. The platform team owns the visualization conventions; the product team owns the data.
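The split between "how" and "what" can be illustrated without the SDK. The sketch below is our own simplification, not LEGO's framework and not the Grafana Foundation SDK API: the `CONVENTIONS` table, the signal format, and `build_dashboard` are hypothetical. The output loosely follows the Grafana dashboard JSON model and is not guaranteed to be complete enough to import.

```python
from typing import Dict, List

# Owned by the platform team: how each kind of signal is presented.
CONVENTIONS: Dict[str, Dict[str, str]] = {
    "rate": {"type": "timeseries", "unit": "reqps"},
    "latency": {"type": "timeseries", "unit": "s"},
    "availability": {"type": "stat", "unit": "percentunit"},
}
PANEL_WIDTH = 12   # two panels per row on Grafana's 24-column grid
PANEL_HEIGHT = 8


def build_dashboard(team: str, signals: List[Dict[str, str]]) -> Dict:
    """Assemble a dashboard from a team's signals and platform conventions."""
    panels = []
    for index, signal in enumerate(signals):
        convention = CONVENTIONS[signal["kind"]]
        panels.append({
            "title": signal["title"],
            "type": convention["type"],
            "fieldConfig": {"defaults": {"unit": convention["unit"]}},
            "gridPos": {
                "w": PANEL_WIDTH,
                "h": PANEL_HEIGHT,
                "x": (index % 2) * PANEL_WIDTH,
                "y": (index // 2) * PANEL_HEIGHT,
            },
            "targets": [{"expr": signal["query"]}],
        })
    return {"title": f"{team} service overview", "panels": panels}


# Owned by the product team: what to show. Queries are illustrative.
CHECKOUT_SIGNALS = [
    {"title": "Requests", "kind": "rate",
     "query": "sum(rate(http_requests_total[5m]))"},
    {"title": "Availability", "kind": "availability",
     "query": "avg(up)"},
    {"title": "Latency p99", "kind": "latency",
     "query": "histogram_quantile(0.99, sum by (le) "
              "(rate(http_request_duration_seconds_bucket[5m])))"},
]
```

`build_dashboard("checkout", CHECKOUT_SIGNALS)` yields three panels whose type, unit, and position come entirely from the conventions. The product team supplied only titles, kinds, and queries.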

The clean separation of concerns is what makes it work in practice. Platform teams stop being the bottleneck for dashboard changes, and product teams stop arguing about panel placement and color thresholds, allowing both sides to focus on where they add the most value. The video of the talk has not yet been released at the time of writing.

When Everything Went Wrong (and Ended Up Right)

David Andersson and Nick Moore from Grafana Labs gave one of the more unusual talks at any conference: a candid walkthrough of a real security incident that hit Grafana Labs in April 2025. A misconfigured GitHub Actions workflow using the pull_request_target trigger exposed CI credentials to a forked repository pull request, which was then exploited. The incident came to light when two security team members noticed an unusual alert triggered by a canary token, indicating that all GitHub Secrets had been exfiltrated and that there had been a complete CI/CD compromise.

The tooling used throughout the response included Loki for log analysis, Grafana Cloud IRM for incident coordination, and TruffleHog for credential scanning. Both Gato-X and Zizmor, the open source tools used for GitHub Actions workflow auditing, were also central to the response. Zizmor is worth noting specifically: it is a static analysis tool for GitHub Actions that flags pull_request_target misuse and other workflow patterns that expose secrets to forked pull requests. Adding it to CI is straightforward, and the class of issues it catches is real. The confirmed outcome was no customer data affected, but the more valuable part of the talk is the methodology: transparent tooling, early involvement of open source maintainers during the investigation rather than after the fact, and a detailed account shared publicly. The talk is worth watching for anyone operating public GitHub repositories.
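To show the shape of the risky pattern, here is a deliberately crude text heuristic. It is not how zizmor or Gato-X work and is no substitute for them: it does not parse YAML, the regular expressions and messages are our own assumptions, and it will produce both false positives and false negatives.

```python
import re
from typing import List

TRIGGER = re.compile(r"\bpull_request_target\b")
HEAD_CHECKOUT = re.compile(r"github\.event\.pull_request\.head\.(sha|ref)")
SECRET_USE = re.compile(r"\bsecrets\.[A-Za-z_][A-Za-z0-9_]*")


def strip_comments(text: str) -> str:
    """Naively drop everything after '#' on each line."""
    return "\n".join(line.split("#", 1)[0] for line in text.splitlines())


def audit_workflow(text: str) -> List[str]:
    """Flag a workflow that combines pull_request_target with risky steps."""
    body = strip_comments(text)
    if not TRIGGER.search(body):
        return []
    findings = ["workflow runs on pull_request_target"]
    if HEAD_CHECKOUT.search(body):
        findings.append("checks out code from the pull request head")
    if SECRET_USE.search(body):
        findings.append("references repository secrets")
    return findings


# Illustrative workflow, not the one involved in the incident.
RISKY_WORKFLOW = """\
name: ci
on:
  pull_request_target:
    types: [opened, synchronize]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - run: make test
        env:
          TOKEN: ${{ secrets.DEPLOY_TOKEN }}
"""

SAFER_WORKFLOW = """\
name: ci
on:
  pull_request:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test
"""
```

`audit_workflow(RISKY_WORKFLOW)` reports all three findings, while `audit_workflow(SAFER_WORKFLOW)` returns an empty list. The combination is what matters: `pull_request_target` runs in the context of the base repository, so checking out and executing code from a fork in that context can expose credentials.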

Irish Rail: Open Source Over Proprietary Promises

Irish Rail evaluated multiple vendor solutions for modernizing infrastructure monitoring across their 2,400-kilometer national railway network. Each vendor failed the same way: rigid dashboards, vendor lock-in, and systems that performed well in demonstrations but could not meet actual operational requirements. They ended up building IRIS themselves, on Grafana, MQTT Unified Namespace, and TimescaleDB.

The platform is ISA-95 compliant and integrated with SAP and ServiceNow, monitoring everything from track-side pumps and bridge sensors to platform infrastructure across Irish Rail’s national network. A reduced Mean Time to Notification for safety-critical events means operators are alerted sooner, so trains can be stopped before harm occurs and maintenance can be scheduled before components fail. Built and maintained by a team of three engineers, what began as a single-sensor proof of concept is now the operational foundation of Irish Rail’s monitoring platform.

The talk is a useful counterpoint to the assumption that open source tooling cannot meet enterprise-grade operational requirements. In practice, the open source stack met the operational requirements more effectively than the commercial products Irish Rail evaluated. The recording is available here.

Alloy as an Official OTel Collector Distribution

Alloy now ships with a native OpenTelemetry engine alongside the existing Alloy engine, making it an officially compliant OTel Collector distribution. Both engines run from a single Alloy instance in parallel: the default engine for Prometheus-native pipelines, the OTel engine for OTLP-native workloads. Fleet Management, Grafana’s centralized collector management product, is also being extended via the OpenTelemetry OpAMP protocol to support any OTel Collector distribution, not just Alloy. The service will be free to use. The talk is available here.

Pyroscope 2.0

Pyroscope 2.0 is a rearchitecture that removes write-path replication in favor of direct object storage writes, with stateless querying that scales independently of the write path. In production, Grafana Cloud measured up to 74% reduction in infrastructure cost, primarily in memory and persistent volume. New features enabled by the new architecture include profile exemplars, a heatmap visualization for span-level profiling, and direct navigation from a profiling result to matching traces in Tempo. The release is available now in open source. The talk is available here.

Mimir and Tempo 3.0

Marty Disibio and Marco Pracucci walked through three real production incidents from operating Mimir and Tempo at scale, and the architectural decisions that came out of them. The incidents themselves were instructive: a 40-kilobyte regular expression in a PromQL query brought down ingesters, a write pattern with high-cardinality JSON caused dictionary-encoding memory bloat in Tempo, and slow store-gateway queries blocked fast ingester queries for the same tenant. Both Mimir 3.0 and Tempo 3.0 introduce Kafka-based architectures that decouple the read and write paths to prevent this class of failure. A streaming query engine in Mimir 3.0 processes data without fully loading it into memory, reducing latency on long time-range and high-cardinality queries. The talk is available here.
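The idea behind streaming evaluation can be shown with a generator-based aggregation. This is a conceptual sketch only, not Mimir's query engine: the sample format and function names are assumptions. The point is that the aggregate keeps one running state per series instead of materializing every sample first.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical sample format: (series name, timestamp, value).
Sample = Tuple[str, float, float]


def read_samples(count: int) -> Iterator[Sample]:
    """Yield synthetic samples one at a time instead of building a list."""
    for i in range(count):
        yield (f"series-{i % 3}", float(i), float(i % 10))


def streaming_average(samples: Iterable[Sample]) -> Dict[str, float]:
    """Average per series using constant memory per series."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for series, _timestamp, value in samples:
        sums[series] = sums.get(series, 0.0) + value
        counts[series] = counts.get(series, 0) + 1
    return {series: sums[series] / counts[series] for series in sums}
```

`streaming_average(read_samples(1_000_000))` never holds more than one sample plus three running totals in memory, whereas `list(read_samples(1_000_000))` would allocate all one million tuples before any work starts.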

Community Party

Tuesday evening was the community party, and we ended up staying out later than planned, as tends to happen at these things. Barcelona makes it easy. After a full day of sessions, it was good to step away from slides and spend time talking with people from across the community. We ran into familiar faces and had the kind of conversations that do not fit neatly into a conference agenda but are often the reason you actually show up in person.

Conclusion

GrafanaCON 2026 felt less like a conference built around announcements and more like a reflection of an ecosystem maturing under real-world scale. The strongest talks focused on practical engineering problems: what breaks, which trade-offs stop working, and how systems evolve once they reach production at significant scale.

That theme appeared everywhere, from Loki’s redesign and Grafana Assistant to community talks from LEGO and Irish Rail. These were not marketing presentations; they were detailed accounts of systems, constraints, and lessons learned in production.

The GrafanaCON talks will be published gradually on the Grafana events page. If any of the sessions above caught your attention, you can sign up for recording notifications directly on the agenda page.
