Senior Site Reliability Engineer
To comply with U.S. federal government requirements, U.S. citizenship is required for this position.
Who we are...
ScienceLogic is redefining IT operations for the modern enterprise. Our AIOps platform empowers organizations to achieve Autonomic IT — where systems are self-healing, self-optimizing, and seamlessly aligned with business outcomes. We help enterprises and service providers gain unified visibility across hybrid and multi-cloud environments, automate workflows, and unlock performance at scale. We’re accelerating digital transformation through the power of automation, AI, and analytics — giving IT and business leaders the tools to deliver superior customer experiences, drive efficiency, and innovate with confidence.
What we’re looking for…
We are looking for a Senior or Principal Site Reliability Engineer who is well versed in building cloud technologies in a secure manner, has an automation mindset, and is an ardent follower of the SRE discipline. If this sounds like you, then our team will benefit from your skillset!
What you’ll be doing…
Lead design reviews and the buildout of secure systems for delivering new Artificial Intelligence products in SaaS, targeting 99.99% uptime.
Design, automate, test, and monitor cloud-native technologies as the foundation for a service platform.
Spend 75% of your time on forward-looking priorities, designing and building SaaS systems, while continuing to support the operations and maintenance of the current SaaS infrastructure.
Investigate and resolve customer and operational issues with the goal of fixing them, not just mitigating them.
Identify operational SLAs and SLOs and automate their measurement.
Triage incidents, document SOPs and runbooks, and train NOC team members.
Write automation that can be easily supported and extended by others.
Collaborate across the organization to design, build, and operationalize SaaS services that conform to security standards such as FedRAMP, SOC 2, and ISO.
Participate in the on-call rotation as assigned.
Take full responsibility for the availability and performance of the platform.
Work on special projects as assigned.
Qualities you possess…
8-12 years of site reliability engineering, cloud operations, or equivalent experience.
Proven experience managing complex Kubernetes environments across multiple production systems.
Experience with cloud automation tools such as CloudFormation, Terraform, aws-cli, and CDK.
Proficiency in scripting languages such as Python, Bash, and Perl.
Linux administration skills.
Proven track record of operating production SaaS environments in compliance with security standards such as FedRAMP, SOC 2, ISO, and PCI.
Strong problem-solving skills and knowledge of algorithms and data structures, applied in line with modern SaaS security requirements.
Experience building tools and scripting frameworks from scratch.
Familiarity with basic networking, security, and cloud engineering concepts.
Highly collaborative, with effective written and verbal communication skills.
Ability to work against tight deadlines, occasionally work during off-hours, and participate in a weekly on-call rotation.
Bachelor's or Master's degree in Computer Science, Information Systems, or a similar field.
Benefits & Perks
A remote flexible workplace.
Comprehensive medical, dental and vision plans.
401(k) plan with employer match.
Flexible Paid Time Off (FTO) so that you can take the time that you need to re-energize.
Volunteer Time Off (VTO) - take two days off per calendar year to volunteer with your preferred charitable organization.
5-year Service Milestone Sabbatical.
Paid parental leave.
Generous employee referral bonus program.
Pet insurance.
HQ office centrally located in Reston Town Center, featuring a well-stocked kitchen with rotating snacks and beverages, and catered lunch on Thursdays.
Regular virtual company-wide events, including cooking classes, yoga, meditation and more.
The opportunity to learn and develop from some of the best and brightest minds in the industry!
Don’t meet every single requirement? Studies have shown that women and people of color are less likely to apply to jobs unless they meet every single qualification. At ScienceLogic, we are dedicated to building a diverse, inclusive and authentic workplace, so if you’re excited about this role but your past experience doesn’t align perfectly with every qualification in the job description, we encourage you to apply anyway. You may be just the right candidate for this or other roles.
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which they are applying.
About ScienceLogic
ScienceLogic empowers intelligent, automated IT operations, freeing up time and resources, and driving business outcomes with actionable insights. ScienceLogic’s AIOps platform sees broadly across clouds and on-premises, enabling business service visibility with relationship mapping, and workflow automation to eliminate manual tasks. Trusted by thousands of organizations across the globe, ScienceLogic’s technology has been proven for scale by the world’s largest service providers, enterprises and government agencies.
All ScienceLogic employees have the responsibility to protect information assets, adhere to access controls, report suspicious activity, and comply with security and privacy policies.
#LI-Remote