Senior Researcher
About the Role
We are seeking a highly capable Senior Researcher to push the boundaries of streaming speech recognition and speech understanding. This role sits at the intersection of cutting-edge research and production impact—you'll develop novel architectures and algorithms that ship to millions of users.
The ideal candidate has deep expertise in both automatic speech recognition (ASR) and large language models, with strong foundations in streaming/online inference, transducer and CTC models (RNN-T, CTC), and modern attention-based architectures. You're equally comfortable reading papers on speech language models and debugging distributed training runs.
This is a unique opportunity to shape the next generation of Speech AI at a company experiencing rapid growth in one of the most dynamic fields in AI. You'll join the team developing Universal-Streaming—our production streaming ASR system—while exploring the frontier of LLM-based contextualization and real-time speech understanding. You'll work closely with research engineers and engineers to drive your research from prototype to production, ensuring novel ideas translate into real customer impact.
What You’ll Do
Design and develop novel streaming ASR architectures, pushing the boundaries of accuracy-latency tradeoffs in production systems.
Research and prototype LLM-assisted speech-to-text, exploring how large language models can enhance streaming speech recognition and understanding.
Develop new algorithms for streaming speaker diarization, contextual biasing, and multilingual speech recognition.
Drive research from initial experimentation through rigorous evaluation to production deployment, working closely with engineering teams.
Conduct systematic experiments on internal and public benchmarks, with careful attention to evaluation methodology and statistical rigor.
Contribute to technical publications and represent AssemblyAI's research at top venues.
Collaborate on research direction and technical strategy, helping shape the roadmap for Speech AI capabilities.
What You’ll Need
Strong research background in speech recognition, with deep understanding of classic streaming architectures (RNN-T, CTC) and modern attention mechanisms.
Expertise in LLMs and language modeling, with ability to bridge speech and language model research.
Proficiency in PyTorch and JAX/Flax—you can move fluidly between frameworks and implement complex architectures from scratch.
Experience with large-scale distributed training and the practical challenges of scaling speech models.
Track record of publications at top venues (ICASSP, Interspeech, NeurIPS, ICML) or equivalent industry impact.
Strong experimental methodology—systematic approach to ablations, careful baseline comparisons, and rigorous evaluation.
Ability to translate research insights into production-ready solutions; comfort working at the interface of research and engineering.
Excellent communication skills—can articulate complex technical ideas clearly and collaborate effectively across teams.
Pay Transparency:
AssemblyAI strives to recruit and retain exceptional talent from diverse backgrounds while ensuring pay equity for our team. Our salary ranges are based on paying competitively for our size, stage, and industry, and are one part of many compensation, benefit, and other reward opportunities we provide.
Many factors go into salary determinations, including relevant experience, skill level, qualifications assessed during the interview process, and maintaining internal equity with peers on the team. The range shared below is a general expectation for the function as posted, but we are also open to considering candidates who may be more or less experienced than outlined in the job description. In that case, we will communicate any updates to the expected salary range.
The provided range is the expected salary for candidates in the U.S. Outside the U.S., the range may differ, and any change will be communicated to candidates during the interview process.
Salary range: $210,000 - $309,000
