Senior Software Engineer, Infrastructure

CentML

Full-time
USA
software engineering
java
python
vmware
docker
The job listing has expired. Unfortunately, the hiring company is no longer accepting new applications.

About Us

We believe AI will fundamentally transform how people live and work. CentML's mission is to massively reduce the cost of developing and deploying ML models so we can enable anyone to harness the power of AI and everyone to benefit from its potential.

Our founding team is made up of experts in AI, compilers, and ML hardware and has led efforts at companies like Amazon, Google, Microsoft Research, Nvidia, Intel, Qualcomm, and IBM. Our co-founder and CEO, Gennady Pekhimenko, is a world-renowned expert in ML systems who holds multiple academic and industry research awards from Google, Amazon, Facebook, and VMware.

Position Overview

We are seeking a highly motivated and skilled senior infrastructure engineer for a key role designing, developing, and maintaining the CentML platform, which offers cost-effective infrastructure for serving and training large-scale machine learning models. As an infrastructure engineer, you will be responsible for designing the deployment infrastructure for ML training and inference jobs on GPU clusters that span multiple cloud service providers such as AWS, GCP, Azure, CoreWeave, and OCI. You will also lead a team of engineers in building a scalable, performant, and reliable platform that lets our customers seamlessly access and use the full suite of ML services we offer.

Responsibilities

  • Design and lead the development of the deployment infrastructure of the CentML platform. The deployment infrastructure manages the hardware resources necessary to deploy the ML training and inference applications.

  • Implement GPU cluster scheduling solutions for large-scale ML training and inference workloads so that the cluster's hardware resources are used efficiently.

  • Communicate with our product teams and define new features and goals for improving the CentML platform.

Qualifications

  • 4+ years of experience working with containerized deployment systems (e.g., Kubernetes, OpenShift, Terraform).

  • A big plus if you have contributed to Kubernetes and have expertise in container runtime technologies like Docker Engine, containerd, or CRI-O.

  • Experience deploying and managing cloud infrastructure on AWS, GCP, and/or Azure.

  • Experience building GPU clusters for large-scale ML training and inference is desirable.

  • Knowledge of GPU architecture and Nvidia GPU virtualization technologies is highly desirable.

  • Strong coding skills in languages like Python, Java, Go, and/or C/C++.

Benefits & Perks

  • An open and inclusive culture and work environment

  • Fully stocked kitchen at the office

  • Full health and dental benefits

  • Parental leave top-up for 6 months

  • Continuous education budget

  • Generous vacation: we're not saying unlimited, but if you need extra time to recharge, just ask

At CentML, we celebrate our differences and value cultivating an inclusive environment for all. We welcome applications of all kinds and are committed to providing an equal opportunity process.

About the job

Posted 1 year ago

Working Nomads curates remote digital jobs from around the web.

© 2026 Working Nomads.