Staff Software Engineer - Grafana Cloud k6
This is a remote opportunity, and we are interested in applicants based in United States time zones.
The Opportunity
We are the team behind Grafana k6, Grafana Cloud k6, and Grafana Cloud Synthetics, used by teams globally to ensure resilient, high-performing systems. This opportunity is with the Grafana Cloud k6 squad, who build and operate our performance testing product. Grafana Cloud k6 is built around the OSS k6 and targeted at users looking to run performance tests at scale. Our enterprise and SaaS offerings allow customers to load test their systems by running distributed tests from 15+ regions worldwide, using hundreds of thousands of virtual users sending millions of requests per second. We ingest huge volumes of data generated by k6, which can be used to view, correlate and analyze metrics from each test.
k6 is a product used by other engineers, so we are looking for people enthusiastic about building high-quality tools they would want to use themselves. Because our teams are small and move quickly, you will have a substantial and immediate impact on how the end product is architected and developed, and on how the engineering team operates.
Your role will focus on establishing and scaling a cross-team culture of engineering excellence by setting standards and guiding adoption of strong DevOps/SRE practices that improve reliability, availability, and operational ownership. As this foundation matures, the role is expected to expand into broader application and product development leadership, contributing architectural and technical depth beyond operational excellence.
What will you be doing?
Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
Establish reliability frameworks such as SLIs/SLOs and error budgets, and use them to guide prioritization and engineering trade-offs.
Provide visibility into system health through clear operational metrics and reliability reporting.
Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.
Influence product and system direction through design reviews, architectural discussions, and cross-team collaboration.
Share knowledge through clear, high-quality documentation and technical communication—internally and, where appropriate, externally—to help teams build and operate systems more effectively.
As the reliability foundation matures, grow into broader application and product development leadership, contributing architectural and technical depth beyond operations.
We invest heavily in developer productivity. You can use modern AI coding assistants as part of your daily workflow (your choice of tools, within security guidelines), backed by a company-funded usage budget so you can iterate quickly without unnecessary friction. We encourage pragmatic AI-assisted development: faster prototyping, test generation, refactors, documentation, and incident follow-ups—always paired with strong code review and quality standards. You’ll also have access to frontier models (e.g., GPT-Codex 5/3, Claude Opus 4.6, Gemini 3 Pro).
Requirements:
Strong experience with DevOps/SRE practices, including operating and evolving production systems at scale
Strong programming background in a modern language (Python and Go are our primary languages, but prior experience with them is not required)
Experience designing, building, and operating large-scale distributed systems
Strong understanding of reliability engineering concepts (e.g. incident management, observability, and failure modes)
Experience with test automation, including performance and functional testing
Ability to influence engineering practices through clear technical communication, reviews, and collaboration
Strong interpersonal skills and ability to work effectively across teams
Familiarity with modern software engineering processes and delivery practices
Self-driven and comfortable operating with a high degree of autonomy and ambiguity
Bonus Points For:
Experience with containerized and cloud-native systems (Docker, Kubernetes, AWS)
Familiarity with observability tooling and platforms (e.g. the Grafana stack)
Experience working with Python, Go, JavaScript and/or Jsonnet
Experience building or operating event-driven or asynchronous systems
Experience defining or applying SLIs/SLOs, error budgets, or reliability metrics
Interest in, or experience with, building testing frameworks or developer tooling
Compensation & Rewards:
In the US, the base compensation range for this role is $174,986 - $209,983. Actual compensation may vary based on level, experience, and skillset as assessed throughout the interview process. All of our roles include Restricted Stock Units (RSUs), giving every team member ownership in Grafana Labs' success. We believe in shared outcomes: RSUs help us stay aligned and invested as we scale globally.
