Data Engineer

Elicit

Full-time · USA · $185k-$305k per year

Tags: engineer, python, hadoop, sql, aws

About Elicit

Elicit is an AI research assistant that uses language models to help professional researchers and high-stakes decision makers break down hard questions, gather evidence from scientific/academic sources, and reason through uncertainty.

What we're aiming for:

  • Elicit radically increases the amount of good reasoning in the world.

    • For experts, Elicit pushes the frontier forward.

    • For non-experts, Elicit makes good reasoning more affordable. People who don't have the tools, expertise, time, or mental energy to make well-reasoned decisions on their own can do so with Elicit.

  • Elicit is a scalable ML system based on human-understandable task decompositions, with supervision of process, not outcomes. This expands our collective understanding of safe AGI architectures.

Visit our Twitter to learn more about how Elicit is helping researchers and making progress on our mission.

Why we're hiring for this role

Two key reasons:

  • Currently, Elicit primarily works over academic papers. One of your key initial responsibilities will be to build a complete corpus of these documents, available as soon as they're published, combining different data sources and ingestion methods.

  • We're actively working to broaden the sorts of tasks you can accomplish in Elicit, which necessarily broadens the types of information we can ingest and surface back to users. We need a data engineer to figure out the best way to stand up these new integrations quickly and to transform massive amounts of heterogeneous data into a form LLMs can use.

In general, we're looking for someone who can architect and implement robust, scalable solutions to handle our growing data needs while maintaining high performance and data quality.

Our tech stack

  • Data pipeline: Python, Flyte, Spark

  • Frontend: Next.js, TypeScript, and Tailwind

  • Backend: Node and Python

  • We like static type checking in Python and TypeScript

  • All infrastructure runs in Kubernetes across a couple of clouds

  • We use GitHub for code reviews and CI
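
To give a flavor of how the pipeline pieces fit together, here's a minimal flytekit sketch of a two-step ingestion workflow. It's purely illustrative: the task names, the source URL, and the logic are made up, not our actual code.

    from typing import List

    from flytekit import task, workflow

    @task
    def fetch_ids(source_url: str) -> List[str]:
        # Stand-in for calling a real source's API for newly published papers.
        return [f"{source_url}/paper/{i}" for i in range(3)]

    @task
    def ingest(ids: List[str]) -> int:
        # Stand-in for the dedup-and-store step; reports how many landed.
        return len(ids)

    @workflow
    def ingest_source(source_url: str = "https://example.org") -> int:
        return ingest(ids=fetch_ids(source_url=source_url))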

Am I a good fit?

Consider these questions:

  • How would you optimize a Spark job that's processing a large amount of data but running slowly?

  • What are the differences between RDD, DataFrame, and Dataset in Spark? When would you use each?

  • How does data partitioning work in distributed systems, and why is it important?

  • How would you implement a data pipeline to handle regular updates from multiple academic paper sources, ensuring efficient deduplication?

If you have solid answers to these—without reaching for the documentation—then we should chat!
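
To make the first question concrete: the usual levers for a slow Spark job are cutting shuffles, broadcasting small tables, and caching reused results. Here's a minimal, hypothetical PySpark sketch (the table names and paths are invented for illustration):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("papers-join").getOrCreate()

    papers = spark.read.parquet("s3://example-bucket/papers/")      # large
    journals = spark.read.parquet("s3://example-bucket/journals/")  # small

    # Repartition the big side on the join key to avoid a skewed shuffle,
    # and broadcast the small side so it isn't shuffled at all.
    joined = papers.repartition("journal_id").join(
        F.broadcast(journals), on="journal_id"
    )

    # Cache only if several downstream actions will reuse this result.
    joined.cache()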

Location and travel

We have a lovely office in Oakland, CA, but we don't all work from there all the time. It's important to us to spend time together, though, so we ask that all Elicians spend one week out of every six with teammates.

  • We have a quarterly team retreat, normally in and around the SF Bay Area.

  • We have quarterly co-working weeks (offset from the team retreats) in our Oakland office.

  • If you come to the retreats and co-working weeks, you'll meet our expectations for in-person time!

  • There is flexibility around the specifics here: if you're not sure you can make this work, get in touch.

What you'll bring to the role

  • 5+ years of experience as a data engineer building core datasets and supporting business verticals with high data volumes

  • Strong proficiency in Python (5+ years of experience)

  • Experience with architecting and optimizing large data pipelines with Spark

  • Strong SQL skills, including understanding of aggregation functions, window functions, UDFs, self-joins, partitioning, and clustering approaches

  • Experience with Parquet and other columnar storage formats (see the sketch after this list)

  • Strong data quality management skills

  • Ability to balance technical expertise with creative problem-solving

  • Excitement about working with the product/web app and helping ship new features (e.g., real-time updates, advanced filtering options)

  • Experience shipping scalable data solutions in the cloud (e.g., AWS, GCP, Azure), across multiple data stores and methodologies
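
Several of these skills meet in one routine step: deduplicating records merged from multiple sources with a window function, then writing partitioned Parquet. A minimal sketch, assuming hypothetical doi, source_rank, updated_at, and year columns (not our actual schema):

    from pyspark.sql import SparkSession, Window, functions as F

    spark = SparkSession.builder.appName("dedup").getOrCreate()
    records = spark.read.parquet("s3://example-bucket/merged-sources/")

    # Keep one row per DOI, preferring trusted sources and recent updates.
    w = Window.partitionBy("doi").orderBy(
        F.col("source_rank").asc(), F.col("updated_at").desc()
    )
    deduped = (
        records.withColumn("rn", F.row_number().over(w))
        .filter(F.col("rn") == 1)
        .drop("rn")
    )

    # Partitioned columnar output keeps later scans and backfills cheap.
    deduped.write.mode("overwrite").partitionBy("year").parquet(
        "s3://example-bucket/papers-clean/"
    )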

Nice to have

  • Familiarity with web crawling technologies and best practices for data extraction from websites

  • Experience in developing deduplication processes for large datasets

  • Hands-on experience with full-text extraction and processing from various document formats (PDF, HTML, XML, etc.)

  • Experience working with academic research databases or scientific literature

  • Familiarity with machine learning concepts and their application in search technologies

  • Experience with distributed computing frameworks beyond Spark (e.g., Dask, Ray)

  • Knowledge of academic publishing processes and metadata standards

  • Hands-on experience with Airflow, dbt, or Hadoop

  • Experience with data lake, data warehouse, and lakehouse paradigms

What you'll do

You'll own:

  • Building and optimizing our academic research paper pipeline

    • You'll architect and implement robust, scalable solutions to handle our growing data needs while maintaining high performance and data quality.

    • You'll work on efficiently processing, deduplicating, and indexing hundreds of millions of research papers.

    • Your goal will be to make Elicit the most complete and up-to-date database of scholarly sources.

  • Enhancing Elicit's data infrastructure

    • You'll optimize our Spark jobs and data pipelines to handle large amounts of data efficiently.

    • You'll implement data partitioning strategies in our distributed systems to improve performance.

    • You will transform the data space Elicit operates over: from academic papers, to other structured and unstructured documents, to spreadsheets and presentations, to rich media like audio and video.

  • Maintaining and improving data quality

    • You'll implement robust data quality management processes to ensure the accuracy and reliability of our academic database.

    • You'll work on developing defenses against unexpected changes from publishers to maintain data integrity.
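
One simple pattern for that last point is to gate every batch behind explicit checks, so a broken publisher feed fails loudly instead of silently corrupting the index. A hypothetical sketch; the expected schema and the 10% threshold are assumptions, not our real config:

    from pyspark.sql import DataFrame

    EXPECTED_COLUMNS = {"doi", "title", "abstract", "year"}  # hypothetical schema

    def validate_batch(df: DataFrame, previous_count: int) -> None:
        """Raise before a bad publisher update reaches the index."""
        missing = EXPECTED_COLUMNS - set(df.columns)
        if missing:
            raise ValueError(f"Schema drift; missing columns: {missing}")
        count = df.count()
        # A sharp drop in volume usually means a broken feed, not fewer papers.
        if count < 0.9 * previous_count:
            raise ValueError(f"Row count fell from {previous_count} to {count}")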

Your first week:

  • Start building foundational context

    • Get to know your team, our stack (including Python, Flyte, and Spark), and the product roadmap.

    • Familiarize yourself with our current data pipeline architecture and identify areas for potential improvement.

  • Make your first contribution to Elicit

    • Complete your first Linear issue related to our data pipeline or academic paper processing.

    • Have a PR merged into our monorepo, demonstrating your understanding of our development workflow.

    • Gain an understanding of our CI/CD pipeline, monitoring, and logging tools specific to our data infrastructure.

Your first month:

  • You'll complete your first multi-issue project

    • Tackle a significant data pipeline optimization or enhancement project.

    • Collaborate with the team to implement improvements in our academic paper processing workflow.

  • You're actively improving the team

    • Contribute to regular team meetings and hack days, sharing insights from your data engineering expertise.

    • Add documentation or diagrams explaining our data pipeline architecture and best practices.

    • Suggest improvements to our data processing and storage methodologies.

Your first quarter:

  • You're flying solo

    • Independently implement significant enhancements to our data pipeline, improving efficiency and scalability.

    • Make impactful decisions regarding our data architecture and processing strategies.

  • You've developed an area of expertise

    • Become the go-to resource for questions related to our academic paper processing pipeline and data infrastructure.

    • Lead discussions on optimizing our data storage and retrieval processes for academic literature.

  • You actively research and improve the product

    • Propose and scope improvements to make Elicit more comprehensive and up-to-date in its coverage of scholarly sources.

    • Identify and implement technical improvements to surpass competitors like Google Scholar in coverage and data quality.

Who you'll work with

This role will report directly to James, our Head of Engineering, and work very closely with the rest of the engineering team:

  • Luke (AI engineer, full-stack)

  • Panda (AI engineer, infra)

  • Justin (ML engineer)

You'll also spend a lot of time collaborating with Kevin (Head of Product) and co-founders Jungwon & Andreas.

Compensation, benefits, and perks

In addition to working on important problems as part of a productive and positive team, we offer great benefits (with some variation based on location):

  • Flexible work environment: work from our office in Oakland or remotely with time zone overlap (between GMT and GMT-8), as long as you can travel for in-person retreats and co-working weeks

  • Fully covered health, dental, vision, and life insurance for you, and generous coverage for the rest of your family

  • Flexible vacation policy, with a minimum recommendation of 20 days/year + company holidays

  • 401(k) with a 6% employer match

  • A new Mac + $1,000 budget to set up your workstation or home office in your first year, then $500 every year thereafter

  • $1,000 quarterly AI Experimentation & Learning budget, so you can freely experiment with new AI tools to incorporate into your workflow, take courses, purchase educational resources, or attend AI-focused conferences and events

  • A team administrative assistant who can help you with personal and work tasks

  • You can find more reasons to work with us in this thread!

For all roles at Elicit, we use a data-backed compensation framework to keep salaries market-competitive, equitable, and simple to understand. For this role, we target starting ranges of:

  • Senior (L4): $185k-$270k + equity

  • Expert (L5): $215k-$305k + equity

  • Principal (L6): >$260k + significant equity

We're optimizing for a hire who can contribute at an L4/senior level or above.

We also offer above-market equity for all roles at Elicit, as well as employee-friendly equity terms (10-year exercise periods).
