Machine Learning Engineer — Inference Optimization

mid

via Ashby

About this role

We’re looking for a Machine Learning Engineer to own and push the limits of model inference performance at scale. You’ll work at the intersection of research and production, turning cutting-edge models into fast, reliable, and cost-efficient systems that serve real users. This role is ideal for someone who enjoys deep technical work, profiling systems down to the kernel/GPU level, and translating research ideas into production-grade performance gains.

WHAT YOU’LL DO

- Optimize inference latency, throughput, and cost for large-scale ML models in production
- Profile GPU/CPU inference pipelines and identify bottlenecks (memory, kernels, batching, IO)
- Implement and tune techniques such as:
  - Quantization (fp16, bf16, int8, fp8)
  - KV-cache optimization & reuse…


What we'd score you on

reqspace match rubric

Five dimensions, recruiter-grade. Upload your resume and we'll generate a written explanation of where you fit and where the gaps are.

1. Skills match

For this role: pytorch

2. Level fit

This role is mid-level. We check your trajectory against it.

3. Domain experience

Your work in the role's domain matters more than your years total. We weight recent and direct experience.

4. Recency

A skill you used last quarter weighs more than one from five years ago. We grade on recency, not lifetime.

5. Location fit

This role is based in a specific location. We weight your proximity and willingness to relocate.


Skills in this role

Pulled from the job description. These are the keywords we'll weight when scoring your fit.

pytorch
