Staff Backend Engineer - AI Infra
Daily Adventures and Responsibilities
Lead design and implementation of highly scalable, reliable LLM infrastructure that interfaces with multiple providers (Anthropic, OpenAI, GCP)
Architect and build unified abstractions and frameworks for the organization's AI platform needs including:
Provider API abstractions, authentication systems, and sophisticated usage policies
Advanced provider routing with circuit breakers, retries, and graceful fallbacks
Implement guardrails, content filters, and standardized prompt templating systems
Design comprehensive observability for latency, cost, and quality metrics across AI systems
Drive critical technical decisions across the engineering organization for AI infrastructure adoption
Lead development of core AI platform services from architecture to production, including developing robust async worker patterns and task orchestration systems
Define and implement CI/CD best practices: establish pre-commit standards, reduce test flakiness, optimize test execution, harden GitHub Actions, and implement preview environments
Architect cloud infrastructure and Kubernetes deployments with a focus on security, scalability, and cost efficiency
Establish SLOs/SLIs and implement monitoring systems with Sentry, alert policies, metrics, logs, and dashboards
Collaborate with cross-functional leadership to drive platform adoption and de-risk major launches
Mentor engineers on platform best practices and guide architectural decisions across teams
Competencies
Strategic thinking with the ability to influence and align engineering decisions across the organization
Exceptional communication skills for collaborating with engineering leadership, product, and business stakeholders
Self-motivated with a demonstrated ability to lead complex, multi-team initiatives
Excellent analytical skills to evaluate and solve complex technical challenges
Strong technical leadership with a proven track record of mentoring engineers
Demonstrated experience establishing engineering best practices and systems thinking
Exceptional attention to detail and commitment to high-quality deliverables
Ability to navigate ambiguity and drive technical clarity in complex problem spaces
Proficiency in leveraging AI tools to enhance productivity and engineering workflows
Skills & Relevant Experience
Required:
8+ years of experience in platform engineering, distributed systems, or AI infrastructure
Proven track record of designing and implementing large-scale, production-grade systems
Experience leading technical initiatives that span multiple teams and organizations
Deep understanding of software architecture principles and design patterns
Strong knowledge of cloud platforms, container orchestration, and microservices
Bachelor's degree in Computer Science, Engineering, or a related technical field
Preferred:
Experience building production AI/ML infrastructure and platforms
Expertise with Python, FastAPI, and modern async programming patterns
Strong background in Kubernetes, infrastructure as code, and cloud-native architectures
Experience with observability systems, performance optimization, and reliability engineering
Knowledge of security best practices for AI systems and data handling
The listed Pay Range reflects the base salary range. For sales roles, the range provided is instead the role’s On Target Earnings ('OTE') range, which includes both the sales commission/sales bonus targets and the annual base salary for the role. This pay range may span several career levels at Apollo and will be narrowed during the interview process based on a number of factors, including the candidate’s experience, qualifications, and location. Applicants interested in this role who are not located in the US may request the annual salary range for their location during the interview process.
Additional benefits for this role may include equity; company bonus or sales commissions/bonuses; 401(k) plan; at least 10 paid holidays per year, flex PTO, and parental leave; employee assistance program and wellbeing benefits; global travel coverage; life/AD&D/STD/LTD insurance; FSA/HSA and medical, dental, and vision benefits.
Annual Pay Range
$166,000–$260,000 USD