DevOps Engineer (AI Inference)
Company Description
This position is available only under an employment (labor) agreement.
The world’s digital experiences run on something invisible: the infrastructure and software that keep them fast, reliable, and secure. At Gcore, you’ll help design and deliver that foundation for an AI-driven world.
We’re a global provider of infrastructure and software solutions for AI, cloud, network, and security, powering everything from real-time communication and streaming to enterprise AI and secure web applications. With 210+ edge locations, 50+ cloud regions, and thousands of GPUs, your work here can reach users and businesses across the globe.
You’ll collaborate with leading technology partners such as Intel, NVIDIA, Dell, and Equinix, and work on platforms that power digital products used around the world. Our vision is simple: to connect the world to AI, anywhere, anytime.
Want to work on technology that goes beyond a single product or industry? Join a global team of 550+ professionals building infrastructure and software that supports the entire digital ecosystem. We are looking for a talented DevOps Engineer to join our AI Inference Operations Team.
Job Description
As a DevOps Engineer, you will be responsible for designing, deploying, and maintaining infrastructure and services that enable scalable and secure AI inference workloads on-premises.
What You Will Do
Design, develop, and maintain infrastructure for AI inference workloads, including GPU scheduling, model deployment pipelines, and data access patterns in on-prem environments.
Build and manage monitoring and observability tools for AI inference platforms, including dashboards, alerts, and runbooks for model health and system performance.
Collaborate with ML engineers and platform teams to design system architecture for AI workloads, integrate inference runtimes, and test performance at scale.
Qualifications
What We're Looking For
Hands-on experience deploying, operating, and troubleshooting Kubernetes clusters, including Helm, Docker, or CRI-O.
Strong understanding of Linux systems and networking concepts, including troubleshooting connectivity and performance issues.
Ability to develop automation and operational tooling using Python, Go, or Bash.
Experience provisioning and managing infrastructure with tools such as Terraform and Ansible.
Experience designing, implementing, and maintaining CI/CD pipelines using GitLab CI or GitHub Actions.
Preferred Qualifications
Experience operating or administering Slurm clusters.
Experience with Cluster API (CAPI) or other Kubernetes cluster lifecycle management ('Kubeception') technologies.
Deep understanding of Kubernetes internals, including CNI, CSI, Operators, and cluster architecture.
Nice to Have
Experience with Kubernetes ecosystem tools such as Argo CD and Helmfile.
Experience with Prometheus.
Familiarity with other cloud-native technologies.
Additional Information
Benefits
At Gcore, we want you to do your best work and enjoy the journey. Our benefits are designed to support your growth, well-being, and life beyond work:
Competitive compensation
Flexible working hours and hybrid or remote options, depending on your role
Work from anywhere in the world for up to 45 days per year
Private medical insurance for you and your family*
Extra paid vacation and sick leave days*
Support for life’s important moments and celebrations
Language courses to help you connect and grow
Modern, welcoming offices with snacks, drinks, and entertainment*
Team sports and social activities*
*Benefits may vary depending on your location.
Equal Opportunity Employer
We provide equal opportunity to all applicants without regard to race, color, religion, sex, sexual orientation, age, gender identity, gender expression, national origin, disability, or any other legally protected characteristics.