IT Brief Canada - Technology news for CIOs & IT decision-makers

Komodor to host AI SRE summit on ops & reliability

Thu, 23rd Apr 2026

Komodor will host an online AI SRE Summit focused on how artificial intelligence is being used in site reliability engineering. Speakers will include representatives from AWS, Salesforce, Honeycomb, Man Group, Smarsh and other companies.

The summit will examine where AI is delivering measurable results in production operations and where it is adding complexity instead. The agenda will focus on incident response, observability, platform design, cost control and self-healing operations.

The event reflects a broader debate across cloud and infrastructure teams about AI's practical role in day-to-day operations. Engineering groups are under pressure to manage more complex cloud-native environments, rising volumes of telemetry and faster software release cycles without increasing headcount at the same pace.

Several sessions are built around the tension between automation and operational discipline. One panel, "AI in SRE: Hype vs. Reality," will bring together Stefana Muller, VP of Infrastructure and Operations at Salesforce; Charity Majors, CTO and co-founder of Honeycomb; Itiel Shwartz, CTO and co-founder of Komodor; and Sharone Zitzman, DevRel at RTFM Please.

The discussion will cover AI's role in incident response, root cause analysis and remediation. Another talk, led by Corey Quinn, chief cloud economist at Duckbill, will examine the relationship between AI spending, reliability and the practical challenge of scaling AI systems in production.

Cost and complexity

The agenda also highlights concerns that AI tools can worsen operational problems when deployed on weak foundations. Brittany Woods, head of systems engineering at Man Group, is scheduled to discuss the limits of layering AI onto fragmented internal platforms. Other sessions will look at observability for AI data pipelines, the design of AI agents and questions of ownership in production systems as AI-generated code becomes more common.

These topics point to a maturing discussion in the infrastructure market. Early enthusiasm around AI in operations often centred on broad promises of automation, but platform teams are increasingly asking more specific questions about reliability, governance and cost.

Komodor, which develops software for cloud-native operations, is positioning the summit around those practical concerns. The programme is intended for SREs, platform engineers, DevOps teams, cloud architects, engineering managers and operations teams assessing how AI should be used in production environments.

Other speakers include David Aronchick, CEO of Expanso; Viktor Farcic, DevRel at Upbound; Guy Menahem, solutions architect at AWS; Blake Sherwood, technology and product executive at Smarsh; Alan Shimel, CEO of Techstrong Group; Andrew Espira, founding engineer and co-founder at Kustode; and Parakh Jaggi, senior infrastructure engineer at Tavily (now Nebius).

Operational questions

The session titles suggest a focus on unresolved operational questions rather than a simple endorsement of AI tools. Themes include the observability requirements of AI data pipelines, the trade-off between infrastructure cost and latency, the construction of AI agents for operational work, and the role of context in efforts to create self-healing systems.

That emphasis comes as many organisations are still deciding how much authority to hand to AI systems in production settings. In SRE environments, where outages and misconfigurations can have immediate financial and customer consequences, the threshold for trust is high and the need for clear accountability remains central.

Komodor positions the event as a source of practical insight rather than broad claims of transformation. The company says it has raised US$90 million in venture funding and sells tools designed to help enterprises manage uptime, cloud costs and operational workflows across cloud-native infrastructure.

The summit's speaker list also shows how discussion of AI in operations now spans cloud providers, software vendors, financial services groups and specialist infrastructure companies. That range reflects how far SRE practices have moved from a niche engineering discipline to a core concern for large organisations running distributed software systems.

By focusing on incident response, observability and platform readiness, the event is likely to appeal to teams trying to separate useful automation from additional tooling overhead. The central question running through the programme is not whether AI has a role in SRE, but under what conditions it can reduce manual work without creating new operational risk.