Senior Software Engineer - Platform Monitoring
This is a remote position and we're considering candidates in Sweden.
The Opportunity:
We are hiring for the Platform Monitoring Squad, which is responsible for managing cloud service provider resources and observability (often referred to as o11y). This includes tasks such as managing costs and creating dashboards. The design for cloud observability incorporates critical elements like Prometheus, global Alertmanager, and meta-monitoring. Additionally, the squad will work on customer stack configuration management libraries and tooling, client tooling for configuration management, and the management of business-critical dashboards.
Ultimately, we support teams developing key products like Grafana, Mimir, Loki, and Tempo.
What You’ll Be Doing:
The squad sets its own roadmap, and as a member of the team you'll have a part to play in that process. You'll help us maintain, improve, and extend what we already have. You'll be involved in choosing what we focus on next and, just as importantly, when and how to gracefully sunset systems that are no longer needed. Your responsibilities will also include helping the team design, compare, and choose appropriate solutions for at least some of the following:
Cloud service provider resource observability
Create cost alerts
Help with improving cloud cost margins
Improve the reliability of autoscaling tools
Investigate CSP unallocated resources
Business-critical dashboard management
Examples of projects you could be working on:
Allocation management and utilization monitoring
Customer stack config management libraries and builders tooling
Terraform and Crossplane Providers for Grafana Cloud
What Makes You a Great Fit:
You have experience working in a Platform group delivering services to internal users and customers, and an interest in implementing, integrating, and maintaining observability systems and processes.
You are comfortable working in a remote-first company; communication is key. For us, working together means being collaborative, friendly, kind, and respectful. We operate by consensus: you contribute to the discussion and then commit to the team decision.
You are eager to learn and grow. There is a lot of room for growth and development, and the team has plenty of knowledge to share with those who want to learn.
You approach development holistically. The team owns the full life cycle of our code: writing design docs, looking at developer feedback, testing and deployment, all the way through to decommissioning. We appreciate engineers who enjoy looking at the big picture and also notice the details of the brush strokes.
You're a flexible software engineer. In a typical day, we might spend time responding to incidents, integrating existing systems, or designing and implementing our own systems. While our primary language is Go, we value flexibility and we'll choose the best tool for the job, even if that is a shell script.
Engineering/software development experience within a Platform group delivering services to internal engineering teams
Experience working in a cloud environment
Experience with Infrastructure as Code using Terraform/Crossplane
Familiarity with Kubernetes administration; experience with Tanka is a big plus
Experience/Interest in implementing, integrating, and maintaining observability systems and processes
In Sweden, the base compensation range for this role is SEK 738,518 - SEK 886,222. Actual compensation may vary based on level, experience, and skillset as assessed throughout the interview process. All of our roles include Restricted Stock Units (RSUs), giving every team member ownership in Grafana Labs' success. We believe in shared outcomes—RSUs help us stay aligned and invested as we scale globally.