Senior Machine Learning Engineer - ML Training Platform

Reddit

Full-time
USA
$217k-$303k per year
machine learning
engineer
software engineering
python
docker

Who We Are: The Machine Learning Platform team at Reddit is a high-impact team that owns the infrastructure powering recommendations, content discovery, and user and content quantification, and directly impacts other teams such as Growth, Ads, Feeds, and Core Machine Learning.

What You’ll Do: As a Senior Software Engineer, Machine Learning Platform (Training Platform), you will be instrumental in architecting, implementing, and maintaining foundational Machine Learning (ML) infrastructure that powers Feeds Ranking, Content Understanding, Recommendations, and much more to fulfill Reddit’s mission of bringing community and belonging to everyone in the world. You will deliver a self-service ML platform that enables the continuous iteration and improvement of systems that use ML techniques including Deep Learning, Natural Language Processing, Recommendation Systems, Representation Learning, and Computer Vision.

  • Lead the building, testing, and maintenance of ML training infrastructure at Reddit.

  • Play a pivotal role in designing, building, and optimizing the infrastructure and tooling required to support large-scale machine learning workflows.

  • Evolve the MLE experience, from provisioning interactive GPU environments through large-scale training, supporting on-demand and self-service workflows.

  • Kubernetes Automation: Write custom Kubernetes Controllers and Operators to manage the lifecycle of interactive Jupyter workspaces and long-running ML training jobs, handle auto-idling, and ensure fault tolerance.

  • GPU Orchestration: Work with the underlying compute team to ensure MLEs have efficient access to training hardware resources and handle resource contention gracefully.

  • Developer Experience (DevX): Treat internal MLEs as your customers. Conduct user research, reduce friction in the 'Idea-to-Prototype' loop, and standardize software environments (Docker images, Python dependency management).

Who You Might Be:

  • 5+ years of software engineering experience, with a focus on Platform Engineering, ML Infrastructure, or Backend Systems.

  • Deep Kubernetes Expertise: You know K8s beyond just 'deploying pods.' You understand CRDs, Controllers and the Operator pattern.

  • Jupyter Ecosystem Knowledge: Experience customizing JupyterHub, JupyterLab extensions, or building similar interactive computing platforms.

  • Strong Coding Skills: Proficiency in Python (for the ML ecosystem) and Go (for Kubernetes controllers/infrastructure tooling).

  • GPU Experience: Hands-on practice with CUDA environments, GPU virtualization/containerization, and doing it all within Kubernetes.

  • Cloud Provider Experience: Familiarity with both managed ML offerings (Vertex AI, SageMaker, etc.) and building custom ML components in AWS and/or GCP.

  • Experience working with distributed training frameworks, including Ray and Kubernetes.

  • Comfortable with distributed systems, big data (petabyte scale), and data-intensive systems.

  • Strong focus on scalability, reliability, performance, and ease of use. You are an undying advocate for platform users and have a deep intuition for the machine learning development lifecycle.

  • Strong organizational & communication skills.

Benefits

  • Comprehensive Healthcare Benefits and Income Replacement Programs

  • 401k Match

  • Family Planning Support

  • Gender-Affirming Care

  • Mental Health & Coaching Benefits

  • Flexible Vacation & Reddit Global Days off

  • Generous paid Parental Leave  

  • Paid Volunteer time off

Pay Transparency:

This job posting may span more than one career level.

In addition to base salary, this job is eligible to receive equity in the form of restricted stock units, and depending on the position offered, it may also be eligible to receive a commission. Additionally, Reddit offers a wide range of benefits to U.S.-based employees, including medical, dental, and vision insurance, a 401(k) program with employer match, generous time off for vacation, and parental leave. To learn more, please visit https://www.redditinc.com/careers/.

To provide greater transparency to candidates, we share base salary ranges for all US-based job postings regardless of state. We set standard base pay ranges for all roles based on function, level, and country location, benchmarked against similar-stage growth companies. Final offer amounts are determined by multiple factors, including skills, depth of work experience, and relevant licenses/credentials, and may vary from the amounts listed below.

The base salary range for this position is:

$216,700-$303,400 USD

About the job

Senior Level
Posted 6 days ago

Working Nomads curates remote digital jobs from around the web.

© 2026 Working Nomads.