Senior Site Reliability Engineer

Olo
Full-time
USA
Posted 3 years ago
The job listing has expired. Unfortunately, the hiring company is no longer accepting new applications.

Headquarters: New York, NY 
URL: https://www.olo.com/

If you like food and you like reliability, this is the job for you!

Olo is experiencing tremendous growth, and as we enhance our platform to support increased demand, it must be positioned for continued stability, reliability, and resiliency. Reporting to the Engineering Manager of Site Reliability, the Site Reliability Engineer will partner with Engineering and Product Managers to learn, improve system availability, and sharpen our execution skills to provide an amazing experience for our customers.

Olo is a remote-first company, offering all full-time employees the option to work from anywhere in the U.S. Additionally, our NYC office will remain available for those who prefer to go in.

What You'll Do

  • Guide the reliability lifecycle, from observability and SLIs/SLOs through Incident Response to postmortems and follow-up actions.
  • Implement and tailor our incident response tools to minimize outage durations.
  • Build collaborative monitoring solutions with members across multiple product teams.
  • Contribute insights across teams to help us improve or re-architect existing systems to support scale, performance and extensibility.
  • Rethink our observability tooling to improve architecture, knowledge models, user experience, performance and stability.
  • Analyze and mature our processes around Incident Response, Observability, Postmortems and Predictive Monitoring.
  • Influence an engineering culture of reliability, observability, and availability.
  • Mentor engineering teams through game days, SRE boot camps and other training and feedback channels.


What We'll Expect From You

  • 3+ years of professional experience building scalable, efficient, and resilient systems.
  • Experience with monitoring tools like Datadog, Sumo Logic, Raygun, New Relic or similar.
  • Fluency in Incident Management using tools such as FireHydrant, OpsGenie, PagerDuty, VictorOps or similar.
  • Experience with build and deploy tools (e.g., Jenkins, TeamCity, Octopus, or CircleCI).
  • Prior hands-on software development experience.