Staff Software Engineer, Agentic Applications - Grafana Ops, AI/ML
This is a remote opportunity. At this time, we are only considering applicants based in USA time zones.
Staff Engineer – Agentic Applications
The Opportunity:
At Grafana, we build observability tools that help users understand, respond to, and improve their systems – regardless of scale, complexity, or tech stack. The Grafana AI teams play a key role in this mission by helping users make sense of complex observability data through AI-driven features. These capabilities reduce toil, lower the barrier of domain expertise, and surface meaningful signals from noisy environments.
We're looking for an experienced engineer with a background in agentic applications to help us redefine how users interact with observability systems. In this role, you’ll develop AI-powered workflows that can:
Support incident response by querying telemetry and other analysis tools to investigate alerts and suggest actions.
Evaluate overall infrastructure and/or observability quality and help automate meaningful improvements.
Expand the capabilities of agents across the observability stack.
What You’ll Be Doing:
Design and evolve systems in which intelligent agents assist users in detecting, triaging, and resolving incidents using observability data and tools.
Build end-to-end workflows for incident lifecycle management, powered by LLMs or agentic technologies.
Implement agentic systems to automate bulk workloads (such as large-scale analyses and report generation), suggest improvements, and surface likely root causes of failures.
Integrate agentic components with internal tools, runbooks, alerting systems, incident platforms, and development environments.
What Makes You a Great Fit:
Experience working in environments with rapid iteration and experimental development.
A pragmatic mindset that values developer experience, reproducibility, and thoughtful trade-offs when scaling GenAI systems.
A passion for minimizing human toil and building AI systems that actively support engineers.
Bonus Points For:
Experience building APIs and orchestration logic for chaining agent actions into multi-step workflows.
Integration of agentic applications with existing platforms and third-party tools.
Experience with multi-agent systems.
Compensation & Rewards:
In the United States, the base compensation range for this role is USD 168,256 - USD 201,907. Actual compensation may vary based on level, experience, and skillset as assessed in the interview process. Benefits include equity, bonus (if applicable), and other benefits listed here.
All of our roles include Restricted Stock Units (RSUs), giving every team member ownership in Grafana Labs' success. We believe in shared outcomes—RSUs help us stay aligned and invested as we scale globally.