Senior Backend Engineer (Ruby and/or Go), Tenant Scale; Cells Infrastructure
An overview of this role
As a Senior Backend Engineer, Cells Infrastructure, you'll help us build the foundation that lets GitLab.com scale horizontally through our Cells architecture. You'll work on two core parts of that system: edge routing services that direct traffic across a fleet of independent Cell clusters, and the Topology Service that manages and serves cluster topology information as the source of truth for the rest of the platform. Your work will make routing reliable and low-latency across protocols, and ensure GitLab teams can build Cell-aware features with confidence as we grow.
Some examples of our projects:
Building and operating routing services in TypeScript that direct requests to the correct Cell using cluster topology data
Developing and maintaining the systems that store, update, and serve cluster topology information that routing, resource assignment, and Cell lifecycle decisions depend on
What you'll do
Design and implement edge traffic routing that directs requests to the correct Cell in a way that's transparent to users.
Build and evolve the Topology Service that serves as the authoritative source of cluster state for routing, resource assignment, and Cell lifecycle decisions.
Collaborate with feature teams across the product to make features and data models Cell-aware throughout the GitLab Rails monolith and supporting services.
Operate and improve the routing and topology systems you build by participating in tier-2 on-call, responding to escalated incidents, and strengthening observability and operational tooling.
Author Architecture Decision Records (ADRs), operational runbooks, and documentation so other teams can understand, adopt, and extend the Cells platform.
Review merge requests from GitLab team members and community contributors, maintaining high standards for correctness, performance, and security across the stack.
What you'll bring
Experience building observable, resilient production services using Go or Ruby on Rails (TypeScript experience is a plus).
Background delivering and operating production systems in high-scale environments, including incident response and operational ownership.
Ability to reason about distributed systems, including consistency models, partitioning strategies, failure modes, and operational tradeoffs.
Experience building high-throughput networking services (gRPC and protocol buffers knowledge is a plus).
Familiarity with working in large, multi-team codebases and coordinating changes across teams and services, including making features and data models Cell-aware.
Knowledge of observability practices such as metrics, tracing, and alerting, with an approach focused on building systems you'd be confident operating on-call.
Strong written communication skills for an async-first, globally distributed team, including documenting decisions (for example, architecture decision records) and runbooks.
Experience working with relational databases in production, including schema design, migrations, and query performance tuning (PostgreSQL experience is a plus).
About the team
We're the Cells Infrastructure team within the Tenant Scale group in Infrastructure Platforms. We're a globally distributed, all-remote group of Backend Engineers and Site Reliability Engineers working asynchronously across multiple time zones. We own foundational services for GitLab's Cells architecture, including edge routing that directs requests to the right Cell and our topology systems that act as the source of truth for cluster state. Our challenge is making GitLab.com scale horizontally in a way that's reliable, low-latency, and operable, so every request reaches the right cluster and we can keep growing safely. For more on how we work, see the Tenant Scale Group handbook page.
