Working Nomads

Lead Site Reliability Engineer

hims & hers

Full-time
USA
$150k-$175k per year
engineer
java
python
docker
sql
The job listing has expired. Unfortunately, the hiring company is no longer accepting new applications.

About the Role:

We are seeking a Lead Site Reliability Engineer to help build a reliable web experience for our users. We believe that moving fast is our competitive advantage and that it enables us to better serve our users. We also know that the faster we move, the more likely we are to break things.

You Will:

  • Design and implement SRE practices that ensure the availability, scalability, and observability of production systems, with a strong focus on excellent customer experience

  • Actively seek and identify opportunities to improve the availability and performance of the system by applying lessons learned from monitoring and observation

  • Use automation extensively to design, configure, manage, and monitor systems in support of our product development teams

  • Apply an understanding of infrastructure and infrastructure automation (Infrastructure as Code)

  • Manage incidents and emergency response, track outages, ensure data integrity, and engineer releases to promote safe, efficient, and rapid deployments

  • Handle emergency response either by being on call or by reacting to symptoms surfaced by monitoring, escalating when needed

  • Improve the codebase by resolving logic issues, deprecating unused code, etc.

  • Implement monitoring, logging, alerting, and SLO reporting

  • Identify Service Level Indicators (SLIs) that will align the team to meet the availability and performance objectives

  • Run blameless root cause analyses (RCAs) on incidents and outages, aggressively looking for answers that will prevent recurrence

  • Provide reviews of design documents from internal and external teams

  • Perform more complex tasks using highly specialized knowledge and advanced business experience

  • Resolve complex tickets creatively

  • Develop and lead large, highly complex cross-functional projects or programs

  • Determine solutions to blockers, identify tasks, and develop solutions as appropriate

  • Be responsible for at least one major delivery domain and accountable for all aspects of SRE for that domain

  • Develop standards, tools, and knowledge requirements for skill and career development

You Have:

  • 10+ years as a software engineer, shipping production code

  • 5+ years of experience as a Site Reliability Engineer or Production Support Engineer

  • Bachelor's degree in Computer Science, Engineering, or related field, or relevant years of work experience

  • Experience with service-oriented architectures and microservices at scale

  • Strong proficiency with relational databases (PostgreSQL, MySQL, SQL Server, etc.)

  • Strong proficiency in SQL scripting

  • Proficiency developing in one or more languages such as Java, Kotlin, Python, and/or others

  • Ability to use containers and orchestration frameworks (Kubernetes, Docker, container registries, etc.)

  • Knowledge of CDNs, TypeScript frameworks, and GQL

  • Good understanding of pub/sub or queue-based messaging systems

  • Proficiency in Git or other VCS

  • Experience configuring, customizing, and extending monitoring tools (Datadog, Prometheus, New Relic, etc.)

  • Excellent debugging and troubleshooting skills

  • Strong technical competency, with a data-driven analytical approach towards solving complex challenges

  • A systematic problem-solving approach, coupled with strong, effective communication skills and a sense of drive

  • Nice-to-have: Experience with Terraform or other IaC tools such as Chef, Puppet, or Ansible

Our Benefits (there are more but here are some highlights):

  • Competitive salary & equity compensation for full-time roles

  • Unlimited PTO, company holidays, and quarterly mental health days

  • Comprehensive health benefits including medical, dental & vision, and parental leave

  • Employee Stock Purchase Program (ESPP)

  • Employee discounts on hims & hers and Apostrophe online products

  • 401k benefits with employer matching contribution

  • Offsite team retreats

 

About the job

Posted 1 year ago
Working Nomads curates remote digital jobs from around the web.

© 2026 Working Nomads.