Working Nomads

Site Reliability Engineer

Zepz

Full-time
South Africa
engineer
devops
java
nodejs
python
The job listing has expired. Unfortunately, the hiring company is no longer accepting new applications.

About the role

Working in the Site Reliability Engineering team, you’ll help ensure the stability, resilience and scale of our services through automation, observability and infrastructure engineering. The work is varied: from helping engineering teams deploy monitoring to designing and implementing new SRE tools and techniques, our team is proactive and always involved. We are a fast-moving team operating in a growing FinTech company, supporting engineers on three continents. We use a modern DevOps and SRE tech stack (GitHub Actions, K8s, ArgoCD, Grafana, AWS and Terraform) and Agile working practices to get the job done. As a member of Zepz’s SRE team you will aim high, embrace challenges and always do what’s right, acting with integrity and building trust as you contribute to the company’s technical direction and long-term decision making.

Reporting to the SRE Manager you will:

  • Use code to solve problems. Configuration, infrastructure, tooling and automation: everything must be solved by writing high-quality code that performs and scales.

  • Apply best practices and standards for observability, monitoring, alerting, capacity planning, availability, performance/latency, change and troubleshooting across all our tech services.

  • Work closely with feature teams to ensure that services are correctly monitored, change is delivered in a safe and secure way, resilience is built into our product, and our standards and best practices are adopted.

  • Lead or be involved in the troubleshooting of complex incidents and problems.

  • Have visibility of the end-to-end service to our customers and ensure their journey is stable and consistent across all microservices and third-party dependencies, using the observability tooling you will have implemented with the engineering teams.

  • Help the team meet its strategic goals: maintain the highest level of observability, maximize developer velocity while keeping our product reliable, and ensure that we can deliver the highest quality experience to our customers.

  • Grow together. You’ll review others' work and happily seek feedback on yours to ensure we build a better codebase and sharpen each other's skills.

What we’re looking for from you

  • A skilled engineer. At least 5 years in an SRE, DevOps or engineering role, with a keen interest in solving problems using automation.

  • Understanding of SRE and DevOps methodologies. You understand the build and deployment cycle of an application, and how to operate a resilient system.

  • A focus on observability. Observability is key to operating a truly reliable and scalable system. We are looking for engineers who can 'Monitor Everything & Measure Everything', driving a culture of observability. You have experience with Grafana, Loki and Prometheus.

  • Holistic view of application delivery. You understand how many systems (monitoring, logging, alerting and scaling) combine to build a robust platform that can respond to varying demands from both external sources (traffic) and internal sources (feature team delivery) in a safe and controlled manner. You have experience supporting or developing applications written in Java, Python or Node.js.

  • Systematic problem-solving approach. You should understand how to analyze and troubleshoot large-scale distributed systems.

  • Happy in the Clouds. Our Cloud Native platform is hosted on AWS. You’ll be comfortable working with a system that supports users from around the world, at scale. 

  • Bias for action. You see a problem, you fix a problem. You get buy-in for your solutions and keep tickets moving. We’re always looking for ways to ship at pace.   

  • Growth mindset. A willingness to use your skills and experience to mentor less-experienced engineers. A desire to learn from others and make yourself better every day. 

  • Agile outlook. You need to be excited about working in a fast-changing environment. Products, tools, frameworks and processes change; we evolve and take the best bits with us. The teams drive the evolution.

  • Disciplined and self-managed. You need to own your role and be disciplined about adhering to protocols and processes. As a senior engineer, you will always ensure you are bringing value to the team and driving tasks to completion without being actively managed.

Bonus points if you:

  • Have experience working in the FinTech space

  • Have experience working in a distributed team across different geographies and timezones

About the job

Full-time
South Africa
13 Applicants
Posted 2 months ago

Working Nomads curates remote digital jobs from around the web.

© 2025 Working Nomads.