Any
$8 per hour
30
Apr 17, 2026
We’re hiring a full-stack LLM engineer to build and scale AI systems for a regulated life-sciences content platform.
This is a hands-on role focused on evaluation systems, retrieval pipelines, observability, and experimentation. You’ll be working across the full stack — from prompt systems and multi-agent orchestration to infrastructure, dashboards, and testing frameworks.
This is not a role for surface-level AI users. We're looking for someone who understands how LLM systems actually work in production and can build reliable, scalable solutions.
What You’ll Work On
Build and improve LLM evaluation systems (golden sets, regression testing, scoring)
Implement retrieval systems (pgvector, hybrid search, reranking)
Develop multi-agent workflows (chains, retries, structured outputs)
Create observability dashboards (cost, latency, failures)
Run experiments and A/B tests across prompts and models
Ensure accuracy, grounding, and compliance in outputs
Core Requirements
Strong experience with TypeScript / Node.js
Production experience with LLMs (Claude API or similar)
Experience with PostgreSQL / Supabase (RLS)
Familiarity with Edge Functions, async jobs, and deployment (Vercel, Railway)
Strong understanding of prompt engineering, structured outputs, and system design
Experience building or working with evaluation frameworks and testing pipelines
Nice to Have
Experience with tools like Langfuse, Braintrust, Promptfoo
Knowledge of guardrails / safety layers
Familiarity with dashboards (React, Recharts)
This is a long-term role with high ownership. You'll be building core infrastructure that directly affects product quality and performance.
Only applications submitted through the link below will be considered. Please apply through this link: