Director - Reliability Engineering

HubSpot

Full-time
Ireland
director
aws
architecture
cloud
security

POS-31619

Director, Reliability Engineering

Role Summary

Our mission at HubSpot is to help millions of organizations grow better. HubSpot’s engineering organization has grown to more than 2,000 engineers shipping across thousands of services and deploying thousands of times per day. As HubSpot has become core infrastructure for over 200,000 customers worldwide, reliability isn’t just a priority — it’s foundational to customer trust and business growth.

Our Reliability Engineering team has matured from an early SRE function into a strategic pillar within Platform Infrastructure. While the platform scaled 19x in deployables, the team drove a 76% reduction in critical incidents, established company-wide SLO frameworks, and built the incident management practices that keep HubSpot running.

Now we’re entering the next phase: leveraging AI and agentic approaches to fundamentally transform how we detect, respond to, and prevent outages. As Director of Reliability Engineering, you’ll lead this evolution — deepening our reliability capabilities, pioneering AI-assisted operations, and ensuring HubSpot remains a platform customers can confidently bet their business on.

What You’ll Do

Lead and Develop the Team

  • Lead a team of ~20 reliability engineers, fostering a culture of operational excellence, continuous learning, and customer obsession

  • Attract, develop, and retain top talent; build career paths that keep engineers engaged and growing

Own Reliability Strategy

  • Define and drive HubSpot's reliability roadmap, balancing proactive resilience investments with reactive incident reduction

  • Partner with Infrastructure leadership to prioritize reliability initiatives alongside cost, performance, and platform evolution

  • Set and evolve SLO standards that align engineering effort with customer experience

Pioneer AI-Driven Operations

  • Lead the strategy for integrating AI and agentic approaches into incident detection, diagnosis, and mitigation, reducing time-to-resolution and human toil

  • Explore and implement AI-assisted tooling for pattern recognition across incidents, automated runbook execution, and predictive reliability insights

  • Build intelligent systems that learn from our operational history, proactively surface risks, and recommend (or execute) mitigation actions

  • Balance automation with human judgment, designing systems where AI augments engineers rather than creating blind spots

Drive Company-Wide Impact

  • Own incident management end-to-end: response coordination, executive communication during major incidents, and blameless post-incident reviews that drive systemic improvement

  • Influence engineering culture across 100+ product teams, evangelizing reliability practices without compromising team autonomy

  • Identify systemic risks across the platform and drive cross-functional mitigation efforts

Represent Reliability at the Executive Level

  • Serve as the voice of reliability in leadership forums, translating technical risk into business terms

  • Communicate transparently with customers and stakeholders during and after operational incidents

  • Partner with peer directors across Infrastructure, Product Engineering, and Security to align on shared priorities

What You’ll Bring

Required Qualifications

  • 10+ years of experience in software engineering, SRE, or infrastructure, with 5+ years leading teams

  • Track record of building and scaling reliability functions at companies with significant operational complexity

  • Deep technical fluency: you can dive into architecture discussions, incident analysis, and system design with credibility

  • Curiosity and vision for how AI/ML can transform operations; experience with or strong interest in AIOps, agentic automation, or ML-driven observability is a plus

  • Proven ability to drive cultural and process change across a large engineering organization without top-down mandates

  • Strong executive communication skills; comfortable leading incident bridges, presenting to leadership, and representing reliability externally

  • Experience with modern cloud infrastructure (AWS preferred), observability tooling, and incident management practices

  • A philosophy that balances reliability with velocity: you understand that the goal is sustainable speed, not gates

Why This Role

This is a high-visibility, high-impact leadership role at an inflection point. You'll own one of Infrastructure's four core pillars at a company where platform stability directly enables customer growth. You'll have the mandate to shape how AI transforms operational practices, not just at

About the job

Full-time
Ireland
Senior Level
Posted 6 days ago

Working Nomads curates remote digital jobs from around the web.

© 2026 Working Nomads.