Senior Manager, Critical Operations & Reliability Engineering
At Netflix, our mission is to entertain the world. Together, we are writing the next episode - pushing the boundaries of storytelling, global fandom and making the unimaginable a reality. We are a dream team obsessed with the uncomfortable excitement of discovering what happens when you merge creativity, intuition and cutting-edge technology. Come be a part of what’s next.
Role Overview
We are looking for a Senior Manager of Site Reliability Engineering to lead one of the most consequential infrastructure organizations at Netflix. This role owns two intersecting mandates: setting the reliability standards that the entire engineering organization builds to, and leading the SRE team supporting our streaming architecture.
Netflix’s infrastructure is undergoing a fundamental shift. Infrastructure is quickly evolving towards a millions-of-agents ecosystem, with AI agents increasingly embedded in how we detect, diagnose, and remediate incidents; how we plan capacity; and how we evolve our reliability posture over time.
Core Responsibilities
Strategic Leadership: Build and scale a world-class SRE function, defining the operating model for how SREs partner with product and infrastructure teams.
Reliability Governance: Establish and socialize company-wide standards (SLIs/SLOs, Error Budgets) and publish transparent reliability scorecards to drive engineering accountability.
Resilience Operations: Standardize chaos engineering, fault injection, and proactive risk modeling (dependency mapping, traffic simulation) across the Netflix stack.
Cross-Functional Partnership: Collaborate with CDN, Playback, and Ads teams to eliminate systemic failures and translate technical reliability data into actionable business risk for executives.
Qualifications
Experience: 12+ years in software/infrastructure, with 5+ years in senior SRE leadership.
Technical Mastery: Deep fluency in cloud-native scale (AWS/GCP, Containers, Service Mesh) and modern observability (Metrics, Tracing, Logging).
AI/ML Fluency: Practical experience building or implementing AIOps, anomaly detection, or agentic infrastructure systems.
Organizational Influence: Proven ability to drive technical adoption across complex, decentralized organizations through influence rather than mandate.
Communication: Ability to navigate the 'human-machine' boundary of automation and clearly articulate technical trade-offs to non-technical stakeholders.
Preferred (Nice to Have)
Experience in streaming media, ad-tech, or high-scale gaming backends.
Hands-on design of LLM-based autonomous agents in production.
Familiarity with Netflix’s OSS ecosystem (Spinnaker, Atlas, Mantis) or Chaos Monkey.
Netflix provides comprehensive benefits including Health Plans, Mental Health support, a 401(k) Retirement Plan with employer match, Stock Option Program, Disability Programs, Health Savings and Flexible Spending Accounts, Family-forming benefits, and Life and Serious Injury Benefits. We also offer paid leave of absence programs. Full-time hourly employees accrue 35 days annually for paid time off to be used for vacation, holidays, and sick paid time off. Full-time salaried employees are immediately entitled to flexible time off. See more details about our Benefits here.
Netflix is a unique culture and environment. Learn more here.
Inclusion is a Netflix value and we strive to host a meaningful interview experience for all candidates. If you want an accommodation/adjustment for a disability or any other reason during the hiring process, please send a request to your recruiting partner.
We are an equal-opportunity employer and celebrate diversity, recognizing that diversity builds stronger teams. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.
Job is open for no less than 7 days and will be removed when the position is filled.