Associate Production Support Engineer
About This Role
Are you passionate about delivering delight through operational excellence and solving complex technical puzzles? Do you thrive in fast-paced environments where your work directly impacts millions of Americans during one of life's most stressful experiences—moving?
Join Updater's Production Support team as we revolutionize how technology enables seamless moving experiences. As an Associate Production Support Engineer, you'll be the guardian of our revenue-generating systems while evolving into a reliability engineering partner who prevents incidents rather than just responding to them.
You'll work alongside a dedicated team that takes ownership of critical production systems, embraces a growth mindset in evolving toward Site Reliability Engineering practices, and believes in doing it right through operational excellence and systematic improvement.
The #1 Challenge You'll Solve
Your primary mission over the next 12 months: Directly contribute to the operational capabilities of our Production Support team while mastering the foundational elements that will enable your evolution into a reliability engineering role. You'll support our internal transition from a reactive incident response culture to a proactive reliability engineering practice that scales with Updater's growth.
What You'll Do
Core Responsibilities (70% of time during first 12 months)
Operational Excellence & Revenue Protection:
Monitor critical production systems that directly generate company revenue using DataDog dashboards and synthetic tests
Respond to incidents with speed and precision, following established escalation procedures to minimize business impact
Manage escalations from internal and external partner call center agents
Partner with Updater engineering, support teams, and our providers to resolve escalated issues
Participate in a 24x7 on-call rotation, ensuring someone is always watching our systems
Triage and resolve production issues through JIRA workflows, maintaining clear communication with stakeholders
Incident Management & Communication:
Lead incident response for production outages, coordinating across teams to restore service quickly
Document incidents thoroughly and participate in blameless postmortem processes that focus on system improvement
Communicate effectively with internal teams, external partners, and leadership during high-stress situations
Build relationships across the organization, assuming positive intentions and celebrating team successes
Growth & Evolution Responsibilities (30% of time during first 12 months)
Site Reliability Engineering Development:
Learn to implement Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for critical services
Develop basic automation scripts to reduce manual operational tasks (Python, Bash, AWS CLI)
Gain familiarity with Infrastructure as Code concepts using Terraform
Collaborate with the DevOps team to understand CI/CD pipelines and deployment automation
Contribute to developer self-service initiatives that reduce operational dependencies
Platform Engineering Integration:
Support the Platform Engineering mission by identifying opportunities to simplify and standardize operational processes
Help develop documentation and 'golden path' procedures that enable teams to operate more independently
Participate in cross-team collaboration to advance platform maturity and developer experience
Learn observability best practices including custom metrics, alerting, and Real User Monitoring (RUM)
About You
Required Experience & Skills
Technical Foundation:
2+ years of experience troubleshooting production systems, networks, or applications
Understanding of fundamental programming concepts and basic scripting abilities
Experience with SQL and relational databases for data analysis and troubleshooting
Familiarity with API testing tools (Postman) and web service troubleshooting
Basic understanding of Git concepts and collaborative development workflows
AWS Certified Cloud Practitioner level knowledge or equivalent cloud platform experience
Operational Excellence:
Proven ability to work effectively under pressure during system outages or critical incidents
Experience with ticketing systems and structured escalation procedures
Strong problem-solving skills with ability to analyze complex technical issues
Excellent time management and ability to prioritize multiple concurrent issues
Communication & Collaboration:
Professional communication style with both technical and non-technical stakeholders
Experience providing status updates and technical explanations during incident response
Ability to write clear documentation and incident reports
Comfort participating in bridge calls and leading technical discussions
Preferred Qualifications
Emerging SRE Skills:
Basic familiarity with Infrastructure as Code concepts (Terraform)
Experience with monitoring and observability tools (DataDog, Prometheus, Grafana)
Understanding of containerization and Kubernetes fundamentals
Knowledge of CI/CD pipeline concepts and deployment automation
Experience with configuration management or automation tools
Advanced Operational Experience:
Previous experience in customer-facing technical support roles
Background in system administration or DevOps practices
Experience with incident management frameworks (ITIL, SRE practices)
Understanding of SLA/SLO concepts and reliability engineering principles
Education & Experience
Bachelor's degree in Computer Science, Information Technology, Engineering, or related technical field
OR equivalent work experience (additional 2+ years of hands-on technical experience in lieu of degree)
2+ years of experience in technical support, system administration, DevOps, or related operational roles
Demonstrated ability to learn new technologies quickly and adapt to changing technical environments
Compensation: This posting is anticipated to remain open until August 17, 2025. The new hire base salary range is $70,000-$95,000. Factors which may affect the starting pay within this range include skills, experience, and other qualifications aligned with Updater's internal leveling guidelines.