
Senior Machine Learning Engineer - Evaluations (Design Generation)

Canva

Full-time
Australia
engineer
machine learning
python
sql

Company Description

Join the team redefining how the world experiences design.

Hey, g'day, mabuhay, kia ora, 你好, hallo, vítejte!

Thanks for stopping by. We know job hunting can be a little time-consuming and you're probably keen to find out what's on offer, so we'll get straight to the point.

Where and how you can work

Our flagship campus is in Sydney, with a second campus in Melbourne and co-working spaces in Brisbane, Perth, Adelaide, and Auckland, NZ. You have flexibility in how and where you work — whether that's from one of our spaces, from home, or a mix of both. This role is remote-friendly within Australia or New Zealand, so you can choose the setup that empowers you and your team to do your best work.

Job Description

About the team:  

Canva's Design Generation systems use AI to create complete designs from text descriptions - turning user intent into layouts, images, typography, and color palettes. At scale, we generate millions of designs monthly, making reliable quality evaluation critical. The Design Generation Platform team (8 engineers) supports Design Generation infrastructure with a focus on developer experience tooling, platform orchestration, and self-service capabilities. We own the plumbing that makes Design Generation systems observable, debuggable, and improvable.

Our philosophy: Platform owns orchestration, not application logic. We build reusable infrastructure that scales across Design Generation rather than solving one-off problems. We emphasize collaborative decision-making, evidence-based approaches, and building capabilities that serve researchers and engineers across Design Generation.

About the role:

As the Design Generation Evaluation owner, you'll build the infrastructure that enables quality monitoring across Design Generation. You're the expert who both understands evaluation methodologies deeply AND builds the infrastructure to scale them across the organization.

This role sits at the intersection of three critical areas:

Evaluation Strategy & Expertise: You'll guide Design Generation teams on how to evaluate their systems effectively - which methods work for different scenarios, how to set up robust test sets, when to use LLM-as-Judge vs. first-party quality models vs. user signals, and how to balance cost with detection speed. You'll establish evaluation best practices and patterns that teams can adopt.

Infrastructure & Scale: You'll build the platforms that make evaluation accessible and automated - alerting systems, continuous monitoring, production sampling, step-level harnesses. Your goal is to enable teams to run evaluations effortlessly, eventually integrating evaluation checks into Continuous Deployment so quality gates happen automatically.

Ecosystem Integration: Canva has multiple evaluation tools. You'll articulate how these pieces fit together for Design Generation, define clear integration points, and build the connective tissue that makes the evaluation story coherent rather than fragmented.
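The cost-versus-detection-speed trade-off mentioned above can be estimated with a standard sample-size calculation. The sketch below is a rough illustration and rests on three assumptions. Each sampled production design gets a binary pass/fail verdict, for example from an LLM-as-Judge. The long-run baseline pass rate is treated as known. Judging costs a flat amount per call. The baseline rate, tolerated drop, and per-call price are hypothetical placeholders, not Canva figures, and the function names are illustrative only.

```python
import math
from statistics import NormalDist


def samples_to_detect_drop(p0: float, delta: float,
                           alpha: float = 0.05, power: float = 0.8) -> int:
    """Judged samples needed to detect a pass-rate drop from p0 to p0 - delta.

    One-sided test on a single proportion, normal approximation to the binomial.
    """
    if not 0 < delta < p0 < 1:
        raise ValueError("require 0 < delta < p0 < 1")
    p1 = p0 - delta
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)       # quantile for the desired power
    spread = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p1 * (1 - p1))
    return math.ceil((spread / delta) ** 2)


def days_to_detect(n_required: int, samples_per_day: int) -> int:
    """Days of production sampling needed to accumulate n_required judged designs."""
    return math.ceil(n_required / samples_per_day)


if __name__ == "__main__":
    n = samples_to_detect_drop(p0=0.90, delta=0.02)  # hypothetical baseline and drop
    for per_day in (200, 1000):
        daily_cost = per_day * 0.01  # hypothetical $0.01 per judge call
        print(f"n={n}: {per_day}/day -> {days_to_detect(n, per_day)} days, ~${daily_cost:.2f}/day")
```

Sampling more designs per day shortens time-to-detection without changing the total number of samples required, but the daily bill rises in proportion. Halving the tolerated drop roughly quadruples n, since n scales with 1/delta².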

The challenge: Quality degradation in generative systems is subtle. Designs look slightly off-brand, layouts don't quite work, users stop publishing without clear signals. You'll need deep ML intuition to build detection systems that catch real issues while minimizing false positives. This role requires navigating ambiguity, making pragmatic architecture decisions, and building infrastructure that serves diverse evaluation needs across research and engineering.
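One common way to catch subtle, sustained quality drops without paging on every noisy day is sketched below. It assumes a hypothetical daily aggregate quality score, for example the mean LLM-as-Judge score for one Design Generation surface. It uses a robust z-score (the median and median absolute deviation, or MAD, of a trailing window) so that a single bad day does not inflate the baseline. It also alerts only after several consecutive flagged days. The window length, threshold, and persistence values are illustrative defaults, not tuned settings, and the function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence

# 0.6745 rescales MAD so the robust z-score is comparable to a standard z-score under normality.
MAD_SCALE = 0.6745


def flag_drops(scores: Sequence[float], window: int = 14, z_threshold: float = 3.5) -> List[bool]:
    """Flag days whose score sits far below a robust (median/MAD) baseline of the trailing window."""
    flags: List[bool] = []
    for i, score in enumerate(scores):
        if i < window:
            flags.append(False)  # not enough history to form a baseline yet
            continue
        history = scores[i - window:i]
        med = median(history)
        mad = median(abs(x - med) for x in history)
        mad = max(mad, 1e-9)  # guard against a perfectly flat history
        robust_z = MAD_SCALE * (score - med) / mad
        flags.append(robust_z < -z_threshold)  # only drops matter, not improvements
    return flags


def should_alert(flags: Sequence[bool], consecutive: int = 2) -> bool:
    """Alert only when the most recent `consecutive` days are all flagged."""
    return len(flags) >= consecutive and all(flags[-consecutive:])
```

Requiring persistence trades a day or two of detection latency for fewer false positives. Once a shift has lasted longer than `window` days, it becomes the new baseline and stops being flagged, so alerts need to be acknowledged well before then.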

What you’ll do (responsibilities)

  • Understand and optimize existing evaluation systems, including LLM-as-Judge frameworks, visual quality models, and multi-dimensional scoring approaches - analyzing their strengths, limitations, and trade-offs to identify coverage gaps and opportunities for improvement across brand adherence, visual appeal, layout quality, and functional correctness

  • Design and implement automated evaluation pipelines that score generated designs at scale, balancing accuracy with computational cost

  • Define evaluation strategies for different scenarios: pre-deployment validation, continuous monitoring, A/B experiment analysis, and model comparison

  • Curate high-quality evaluation datasets and benchmark suites that represent diverse use cases, edge cases, and quality dimensions

  • Integrate evaluation systems into continuous deployment pipelines, creating automated quality gates that catch regressions before production

  • Reduce evaluation cycle time to enable teams to iterate faster on model improvements and launch experiments earlier

  • Partner with research teams to understand evaluation needs for new model architectures and capabilities

  • Define the evaluation ecosystem strategy: how different evaluation tools and methods compose together for Design Generation

  • Guide teams on evaluation best practices, appropriate methodologies for their use cases, and interpretation of results
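An automated quality gate of the kind described in these responsibilities could take several forms. The minimal sketch below assumes that a candidate model and the current production baseline have both been scored on the same benchmark prompts, per quality dimension (such as layout quality or brand adherence), on a 0 to 1 scale. It swaps in a standard non-inferiority check: a dimension passes only if a bootstrap lower confidence bound on the mean paired score change stays above a tolerated margin. The function names, margin, and score scale are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence


def bootstrap_lower_bound(diffs: Sequence[float], confidence: float = 0.95,
                          n_resamples: int = 2000, seed: int = 0) -> float:
    """One-sided lower confidence bound on the mean paired difference (percentile bootstrap)."""
    rng = random.Random(seed)  # seeded so the gate's verdict is reproducible in CI
    n = len(diffs)
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    return means[int((1 - confidence) * n_resamples)]


def quality_gate(baseline: Dict[str, List[float]], candidate: Dict[str, List[float]],
                 margin: float = 0.02) -> Dict[str, bool]:
    """Non-inferiority check per quality dimension; True means the dimension passes.

    baseline[dim][i] and candidate[dim][i] must score the same benchmark prompt i.
    """
    results: Dict[str, bool] = {}
    for dim, base_scores in baseline.items():
        cand_scores = candidate[dim]
        if len(cand_scores) != len(base_scores):
            raise ValueError(f"{dim}: baseline and candidate must score the same prompts")
        diffs = [c - b for b, c in zip(base_scores, cand_scores)]
        results[dim] = bootstrap_lower_bound(diffs) > -margin
    return results
```

Pairing on identical prompts removes prompt-difficulty variance from the comparison. Seeding the resampler keeps the verdict stable across reruns. In a deployment pipeline, the job would exit non-zero if any dimension returns False, blocking promotion.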

What we're looking for

  • Strong ML engineering fundamentals with experience building and maintaining production ML systems at scale

  • Proven ability to build robust, scalable infrastructure (not just models) - you're a platform engineer who speaks ML

  • Deep understanding of distributed systems, observability patterns, and monitoring best practices

  • Python proficiency with production-quality coding standards, code reviews, and testing practices

  • Experience with data pipelines, time-series data, and statistical analysis for detecting anomalies

  • SQL fluency for querying and analyzing large datasets across data warehouse and analytics systems

  • Track record of building self-service platforms or developer tooling that gets adoption

  • Excellent collaboration skills - this role requires working across teams to understand needs and deliver solutions

  • Experience with evaluation of Gen AI systems at scale (even better if that’s evaluation of systems with creative outputs!)

Additional Information

Don't tick all the boxes? Don't worry about that - nobody does! We’d still love to hear from you! At Canva, we know that great engineers come from a variety of backgrounds, and we value passion, curiosity, and a willingness to learn just as much as specific experience. If you're excited about this role but don’t tick every box, we encourage you to apply; you might be a great fit in ways you didn’t expect!

What's in it for you?

Achieving our crazy big goals motivates us to work hard - and we do - but you'll experience lots of moments of magic, connectivity and fun woven throughout life at Canva, too. We also offer a stack of benefits to set you up for every success in and outside of work.

Here's a taste of what's on offer:

  • Equity packages - we want our success to be yours too

  • Inclusive parental leave policy that supports all parents & carers

  • An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more

  • Flexible leave options that empower you to be a force for good, take time to recharge, and support you personally

Check out lifeatcanva.com for more info.

Other stuff to know

We make hiring decisions based on your experience, skills and passion, as well as how you can enhance Canva and our culture. When you apply, please tell us the pronouns you use and any reasonable adjustments you may need during the interview process.






Working Nomads curates remote digital jobs from around the web.

© 2025 Working Nomads.