NOC Engineer
Job Title: NOC Engineer
How You'll Make an Impact:
As a NOC Engineer, you’ll be the frontline support for our global infrastructure, playing a key role in ensuring 24/7 operational stability across our AWS-based environment. Your core responsibilities will include monitoring critical systems through platforms such as Datadog, PagerDuty, and CloudWatch, rapidly validating alerts, and escalating verified incidents based on clearly defined protocols.
You’ll execute operational tasks, follow documented procedures for common issues, and manage standard maintenance activities. You'll also have opportunities to collaborate directly with senior engineers across SRE, DevOps, and Infrastructure teams, contributing to the resolution of a wide range of technical challenges and gaining exposure to complex, real-world systems.
Acting as the central communication point during incidents, you’ll provide clear, timely updates to stakeholders and facilitate smooth handoffs between engineering and support teams.
Our Network Operations Team:
You’ll be joining a brand-new team at the ground level, helping shape the future of SaaS operations for a company undergoing exciting growth. Working closely with SRE, DevOps, Security, and various Workload teams, you’ll be at the heart of collaborative problem-solving and operational innovation. It’s a rare chance to build, influence, and grow in a highly visible and impactful role.
This role offers deep, hands-on experience in cloud operations and incident management while you work alongside high-performing engineering teams. You'll build the foundation for growth into specialized areas like SRE, DevOps, or Infrastructure Engineering, with direct exposure to real-world systems at scale.
What You Bring to the Team:
Continuously monitor alerting channels (PagerDuty, Datadog, CloudWatch, Prometheus/Grafana), validate alerts, filter false positives, and provide first-line support for site operations and infrastructure issues
Serve as the communication hub during incidents, providing regular status updates to stakeholders, escalating verified incidents to appropriate on-call teams, and maintaining incident bridges with proper handoffs
Execute documented runbooks and standard operating procedures for common issues, and handle infrastructure access requests, basic troubleshooting, and deployment support activities
Investigate initial security alerts, monitor application performance, and process routine change requests, configuration updates, and maintenance tasks across operational teams
Create and maintain operational runbooks, update documentation based on incident learnings, and contribute to post-incident reviews to drive continuous improvement
Assist with monitoring configuration, including adding new monitors, adjusting alert thresholds, and tuning alerting systems to reduce noise and improve signal quality
Work independently during off-hours shifts in a remote, global team environment while maintaining strong collaboration and knowing when to escalate complex issues
What Sets You Up for Success:
3+ years of experience in IT operations, technical support, or a related field, with hands-on exposure to monitoring tools like Datadog, Prometheus/Grafana, or AWS CloudWatch
Strong understanding of incident response procedures, escalation protocols, and emergency response workflows, with experience using ticketing systems (Jira) and project management tools
Foundational networking skills and a basic understanding of AWS services, including EC2, S3, CloudWatch, and IAM
Exposure to containerization concepts (Docker and Kubernetes) and the ability to read and understand Terraform
Previous experience in a 24/7 operations environment with hands-on use of PagerDuty or similar alerting systems
Excellent written and verbal communication skills in English, with the ability to provide clear status updates during high-pressure incidents
Detail-oriented, with strong documentation skills and the ability to work effectively across multiple teams in a fully remote, global environment
Bachelor's degree in Computer Science or a related field, OR equivalent practical experience demonstrating technical aptitude and problem-solving abilities
Bonus Skills to Stand Out (Optional):
Familiarity with web application servers (NGINX), databases (MySQL, PostgreSQL), caching technologies (Redis, Memcached), and Infrastructure as Code tools (Terraform)
Ability to write (not just read) Bash/Python scripts, understanding of CI/CD concepts (GitHub Actions, ArgoCD), and experience with queue technologies (SQS) or application performance monitoring best practices
