Full Time
500-2500
TBD
Mar 23, 2026
About the role
We are looking for an AI Engineer to build, train, and optimize high-performance local models. This isn't a "prompt engineering" or API-wrapping role; you'll be working under the hood with fine-tuning, LoRA adapters, and deploying models directly on GPU hardware. Your goal is to take raw open-source weights and turn them into specialized, production-ready engines that are fast, efficient, and private.
What you’ll do
Specialized Training: Execute fine-tuning on local LLMs (LoRA, QLoRA, and full fine-tuning) to adapt models for specific domain tasks.
Model Optimization: Work with the latest open-source architectures (Qwen, Llama, Mistral) to maximize their utility in real-world applications.
Data & Pipelines: Build the "factory" for our models: everything from data scraping and cleaning to automated training and evaluation loops.
Inference Engineering: Optimize for speed. You’ll implement quantization, batching, and high-efficiency runtimes (vLLM/TGI) to keep latency low.
On-Prem Deployment: Manage model stability on local GPU servers and multi-GPU setups, ensuring high availability for internal services.
Rapid R&D: Stay at the bleeding edge: test new papers, quantization methods, and training techniques as soon as they hit GitHub.
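To give a flavor of the kind of work the training bullets describe, here is a minimal sketch of the core LoRA idea in plain NumPy: the pretrained weight stays frozen and only a low-rank update is trained. All names here (lora_forward, W, A, B) are illustrative, not taken from any specific library.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 8, 16

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-init

def lora_forward(x, W, A, B, alpha, r):
    """Base output plus the scaled low-rank delta: x W^T + (alpha/r) x A^T B^T."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((2, d_in))
y = lora_forward(x, W, A, B, alpha, r)
```

Because B is zero-initialized, the adapter starts as a no-op and the model's behavior only drifts as A and B are trained; in practice a library like PEFT handles this wiring inside the attention projections.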
What we’re looking for
Core Python: Deep experience building the scripts and tooling that power AI pipelines.
Fine-Tuning Expertise: Practical experience using PEFT, LoRA, and QLoRA to steer model behavior.
Local Execution: You’ve spent significant time running models on your own hardware or dedicated instances (not just hitting an OpenAI endpoint).
Architecture Knowledge: A strong handle on tokenization, context management, and preparing high-quality datasets.
The AI Stack: Fluency in Hugging Face, PyTorch, and the Transformers library.
Hardware Proficiency: Experience navigating CUDA, managing VRAM constraints, and orchestrating multi-GPU environments.
Problem Solver: You can debug a failing training run and understand why a model is "hallucinating" or underperforming.
Nice to have
Generative Media: Experience with ComfyUI, Stable Diffusion, or video generation workflows.
Edge Inference: Knowledge of GGUF, AWQ, or running models via engines like Ollama.
Scale: Experience with distributed training or managing massive, multi-terabyte datasets.
Agents: A background in building autonomous AI agents or internal productivity tools.
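As a small illustration of the edge-inference point above, this is per-tensor symmetric int8 quantization, the basic move underneath formats like GGUF and AWQ (which layer per-group scales and smarter rounding on top). A hedged sketch, not any library's actual API; function names are made up for the example.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 with a single per-tensor scale factor."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 tensor."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).standard_normal((4, 8)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
```

The round-trip error is bounded by half the scale per element, which is why int8 cuts VRAM roughly 4x versus float32 with only a small accuracy cost.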
Top 3 Skills to select on OnlineJobs for this role:
Python (Absolute must-have for the codebase).
Machine Learning (To capture the general AI talent pool).
PyTorch (This filters for people who understand the underlying math/frameworks).