Staff Engineer, GitLab Delivery - Operate
An overview of this role
As a Staff Engineer on the GitLab Operate team, you’ll lead the technical direction for GitLab’s self-managed deployment strategy so customers can deploy, upgrade, and run GitLab reliably in their own infrastructure without downtime. You’ll be the technical anchor for a team that owns and evolves our core tooling for self-managed GitLab deployments. You’ll focus on complex challenges such as zero-downtime upgrades, database and application lifecycle management, and operational excellence at scale, across environments from single-node virtual machines to large Kubernetes clusters.
You’ll work closely with your engineering manager, product manager, and partners across SRE, Release, Security, and Development to define cloud-native, operator-driven deployment patterns that reduce operational complexity and upgrade friction for thousands of self-managed customers. You’ll shape the architecture for zero-downtime upgrades and strengthen observability and reliability practices for self-managed deployments. You’ll also mentor other engineers as the team builds the next generation of deployment automation.
Some examples of our projects:
Evolving GitLab Operator and Helm charts to support zero-downtime upgrades for complex, stateful GitLab installations
Advancing the GitLab Environment Toolkit to simplify large-scale, production-ready self-managed deployments
What you’ll do
Lead the technical vision and architecture for GitLab's cloud-native, self-managed deployments and zero-downtime upgrades, focusing on reliability, simplicity, and operational excellence at scale.
Establish and champion operational maturity standards, service integration patterns, and deployment models so development teams can own the end-to-end lifecycle of their components with reduced upgrade risk.
Design, implement, and maintain production-grade Kubernetes Operators, Helm charts, and upgrade orchestration tooling that automate lifecycle management for self-managed GitLab deployments across a range of environments.
Develop integration and automation frameworks that handle database migrations, rolling deployments, compatibility checks, and rollbacks to enable predictable, repeatable release outcomes for product teams.
Define and evolve database and application lifecycle strategies, including safe PostgreSQL migrations, compatibility layers, and validation checks, to minimize downtime, failed upgrades, and rollbacks.
Collaborate with Product Management, GitLab.com Site Reliability Engineering, GitLab Dedicated, and development teams to align deployment patterns, operational practices, and self-managed customer needs.
Mentor and support engineers and customer-facing teams through design reviews, code reviews, documentation, and runbooks that improve deployment reliability, supportability, and customer success.
Define and implement observability, testing, performance, and resilience practices for self-managed deployments, and contribute to incident response and post-mortems to continually improve reliability and mean time to recovery.
What you’ll bring
Strong software engineering background with experience designing and delivering production systems that customers install and operate in their own infrastructure.
Proficiency in Go for large, complex codebases; familiarity with Ruby on Rails and Rails application architecture is a plus.
Hands-on experience operating Kubernetes in production, including building and maintaining Operators, designing Helm charts for complex stateful applications, and working with Custom Resource Definitions (CRDs), admission controllers, and controller patterns.
Knowledge of cloud-native architectures and tooling, such as service mesh, observability stacks, infrastructure as code, and infrastructure automation tools like Terraform or Ansible.
Experience with stateful workloads and databases, including PostgreSQL schema design and migrations, persistent volumes, storage classes, and strategies for minimizing downtime during upgrades.
Understanding of Linux systems and production operations, including package management, systemd, system-level debugging, observability practices, incident response, and on-call participation.
Ability to provide technical leadership across teams, including writing clear technical proposals and documentation, mentoring engineers, and influencing without direct authority.
Interest in or experience with open source infrastructure or deployment tooling, and the ability to explain complex technical concepts to both technical and non-technical audiences. We are open to candidates who bring relevant, transferable skills from adjacent domains.
About the team
We sit within GitLab Delivery and focus on delivering GitLab to self-managed users through supported, validated deployment tooling. We maintain and evolve the GitLab Omnibus package, Helm charts, GitLab Operator, and the GitLab Environment Toolkit (GET), and we partner closely with Site Reliability Engineering, Release, Security, and Development teams across regions in an all-remote, asynchronous way. As a group, we're building the capabilities needed to deliver GitLab’s expanding service architecture to self-managed customers. Our priorities include enabling zero-downtime upgrades, reducing operational complexity at scale, supporting a cloud-native transition while serving existing deployments, and improving upgrade velocity for customers running GitLab in diverse environments. For more on how we work, see our Team Handbook Page.
