Staff Software Engineer - Capacity Engineering
Pinterest is seeking a Staff Software Engineer, Capacity Engineering, focused on managing and optimizing its ML infrastructure. The team is responsible for efficiently managing one of the largest-scale cloud-native infrastructures in the world. This role is highly impactful, as efficiency is an ongoing strategic priority for Pinterest. The role has direct visibility across Pinterest Engineering and with Engineering and company leadership. The team is looking for a candidate with a strong background in ML infrastructure, focusing on efficiency and optimization.
What you’ll do
Manage the ML hardware capacity that powers the models running at Pinterest
Improve the efficiency of ML Infrastructure at Pinterest
Build, develop, and mature profiling and optimization capabilities for ML Infrastructure at Pinterest scale
Collaborate with ML Platform, Infrastructure Engineering and SRE teams in their mission to deliver highly available, resilient, secure and efficient ML foundations for Pinterest’s tech stack
What we’re looking for:
Deep understanding of GPU architectures, PyTorch, etc.
Deep understanding of the supporting parts of the ML software stack, such as scheduling, data, and storage
Hands-on experience with shared platforms like Kubernetes
Strong technical and performance engineering skills to collaborate with stakeholders on complex and ambiguous technical challenges
Experience building and managing highly available distributed applications at scale
Proficiency in software development languages such as Java, Python and C++
Excellent skills in communicating complex technical issues
Understanding of ML models, kernels, and optimization opportunities
Hands-on experience with large, cloud-native multi-tenant platforms at Internet scale
Experience with AWS or similar cloud environments
Deep understanding of infrastructure capacity and performance
Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
In-Office Requirement Statement:
We let the type of work you do guide the collaboration style. That means we're not always working in an office, but we continue to gather for key moments of collaboration and connection.
This role will need to be in the office for in-person collaboration one to two times per quarter and can therefore be based anywhere in the country.
Relocation Statement:
This position is not eligible for relocation assistance. Visit our PinFlex page to learn more about our working model.