Staff Machine Learning Engineer - ML Platform
Who We Are: The Machine Learning Platform team at Reddit is a high-impact team that owns the infrastructure that powers recommendations, content discovery, user and content quantification, while directly impacting other teams such as Growth, Ads, Feeds, and Core Machine Learning teams.
What You’ll Do: As a Staff ML Infrastructure Engineer, you will lead development of a platform for large scale ML models at Reddit.
Design end-to-end model lifecycle patterns (MLOps) to boost velocity of development for ML engineers, including data preparation, model management, experiment tracking, and more
Zero-to-one development and support of a graph ML codebase and platform that abstracts away common patterns and enables greater model scalability and iteration
Collaborate with ML engineers on performance tuning, including improving model training time, efficiency, and GPU training costs in a large, distributed ML training environment
Optimize batch data processing within a data warehouse and with tools such as Apache Beam, Apache Spark, Ray Data, and more
Architect pipelines to build and maintain massive graph data structures on the order of billions of nodes and tens of billions of edges
Who You Might Be:
7+ years of experience in ML infrastructure, including model training and model deployments
Hands-on experience with ML optimization, including memory and GPU profiling
Deep experience with cloud-based technologies for supporting an ML platform, including tools like GCP BigQuery, Google Cloud Storage, infrastructure-as-code (Terraform), and more
Hands-on experience administering and integrating MLOps tools for experiment tracking, model serving, and model registries (e.g. MLflow or Wandb)
Proficiency with the common programming languages and frameworks of ML, such as Python, PyTorch, Tensorflow, etc.
Deep experience working with distributed training frameworks, including Ray and Kubernetes
Strong focus on scalability, reliability, performance, and ease of use. You are an undying advocate for platform users and have a deep intuition for the machine learning development lifecycle.
Strong organizational & communication skills
Experience working with graph databases (Neo4j, JanusGraph, TigerGraph) is a big plus
Experience working with graph neural networks (GNNs) and associated graph ML frameworks (PyTorch Geometric, Deep Graph Library) is a big plus
Benefits:
Comprehensive Healthcare Benefits and Income Replacement Programs
401k Match
Family Planning Support
Gender-Affirming Care
Mental Health & Coaching Benefits
Flexible Vacation & Reddit Global Days off
Generous paid Parental Leave
Paid Volunteer time off
Pay Transparency:
This job posting may span more than one career level.
In addition to base salary, this job is eligible to receive equity in the form of restricted stock units, and depending on the position offered, it may also be eligible to receive a commission. Additionally, Reddit offers a wide range of benefits to U.S.-based employees, including medical, dental, and vision insurance, 401(k) program with employer match, generous time off for vacation, and parental leave. To learn more, please visit https://www.redditinc.com/careers/.
To provide greater transparency to candidates, we share base pay ranges for all US-based job postings regardless of state. We set standard base pay ranges for all roles based on function, level, and country location, benchmarked against similar stage growth companies. Final offer amounts are determined by multiple factors including, skills, depth of work experience and relevant licenses/credentials, and may vary from the amounts listed below.
The base pay range for this position is:
$230,000—$322,000 USD
About the job
Apply for this position
Staff Machine Learning Engineer - ML Platform
Who We Are: The Machine Learning Platform team at Reddit is a high-impact team that owns the infrastructure that powers recommendations, content discovery, user and content quantification, while directly impacting other teams such as Growth, Ads, Feeds, and Core Machine Learning teams.
What You’ll Do: As a Staff ML Infrastructure Engineer, you will lead development of a platform for large scale ML models at Reddit.
Design end-to-end model lifecycle patterns (MLOps) to boost velocity of development for ML engineers, including data preparation, model management, experiment tracking, and more
Zero-to-one development and support of a graph ML codebase and platform that abstracts away common patterns and enables greater model scalability and iteration
Collaborate with ML engineers on performance tuning, including improving model training time, efficiency, and GPU training costs in a large, distributed ML training environment
Optimize batch data processing within a data warehouse and with tools such as Apache Beam, Apache Spark, Ray Data, and more
Architect pipelines to build and maintain massive graph data structures on the order of billions of nodes and tens of billions of edges
Who You Might Be:
7+ years of experience in ML infrastructure, including model training and model deployments
Hands-on experience with ML optimization, including memory and GPU profiling
Deep experience with cloud-based technologies for supporting an ML platform, including tools like GCP BigQuery, Google Cloud Storage, infrastructure-as-code (Terraform), and more
Hands-on experience administering and integrating MLOps tools for experiment tracking, model serving, and model registries (e.g. MLflow or Wandb)
Proficiency with the common programming languages and frameworks of ML, such as Python, PyTorch, Tensorflow, etc.
Deep experience working with distributed training frameworks, including Ray and Kubernetes
Strong focus on scalability, reliability, performance, and ease of use. You are an undying advocate for platform users and have a deep intuition for the machine learning development lifecycle.
Strong organizational & communication skills
Experience working with graph databases (Neo4j, JanusGraph, TigerGraph) is a big plus
Experience working with graph neural networks (GNNs) and associated graph ML frameworks (PyTorch Geometric, Deep Graph Library) is a big plus
Benefits:
Comprehensive Healthcare Benefits and Income Replacement Programs
401k Match
Family Planning Support
Gender-Affirming Care
Mental Health & Coaching Benefits
Flexible Vacation & Reddit Global Days off
Generous paid Parental Leave
Paid Volunteer time off
Pay Transparency:
This job posting may span more than one career level.
In addition to base salary, this job is eligible to receive equity in the form of restricted stock units, and depending on the position offered, it may also be eligible to receive a commission. Additionally, Reddit offers a wide range of benefits to U.S.-based employees, including medical, dental, and vision insurance, 401(k) program with employer match, generous time off for vacation, and parental leave. To learn more, please visit https://www.redditinc.com/careers/.
To provide greater transparency to candidates, we share base pay ranges for all US-based job postings regardless of state. We set standard base pay ranges for all roles based on function, level, and country location, benchmarked against similar stage growth companies. Final offer amounts are determined by multiple factors including, skills, depth of work experience and relevant licenses/credentials, and may vary from the amounts listed below.
The base pay range for this position is:
$230,000—$322,000 USD
