The problem
Most teams using an LLM in production are leaving accuracy on the table because the prompts they ship were written once, in a rush. The model is capable of more, but the prompt is the ceiling. Existing tooling helps you track prompts. Almost nothing helps you improve them.
The product
Promptomize is a supervised model trained on a corpus of (input, weak prompt, strong prompt, eval score) tuples. Given a frozen base prompt and a small held-out evaluation set, it rewrites the prompt and reports the projected lift over the baseline before you push it to production.
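As an illustration only, one training tuple could be represented as below. The class name `RewriteExample`, its field names, and the `to_training_pair` helper are hypothetical; the actual Promptomize schema and training format are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewriteExample:
    """One (input, weak prompt, strong prompt, eval score) tuple.

    Hypothetical layout: field names are assumptions, not the real schema.
    """
    input: str          # task input the prompts were evaluated on
    weak_prompt: str    # the prompt before rewriting
    strong_prompt: str  # the improved prompt
    eval_score: float   # score attached to the tuple


def to_training_pair(example: RewriteExample) -> tuple[str, str]:
    """Turn a tuple into a (source, target) text pair for supervised training.

    Assumption: the rewriter is conditioned on the task input and the weak
    prompt, and is trained to emit the strong prompt.
    """
    source = f"INPUT:\n{example.input}\n\nPROMPT:\n{example.weak_prompt}"
    return source, example.strong_prompt
```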
How it works
- Ingestion. You provide a base prompt and a small evaluation set of (input, golden output) pairs.
- Rewriting. The Promptomize model produces multiple candidate rewrites using diverse strategies: chain-of-thought, role priming, format scaffolding, and decomposition.
- Evaluation. Each candidate is scored against the held-out set on the target backend (GPT, Claude, Gemini, or an open-weight model).
- Selection. The best variant is returned with a confidence interval and a delta against the baseline.
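The selection step can be sketched as follows, assuming each prompt has already been scored per example on the same evaluation set. This is a minimal illustration, not the Promptomize implementation: the function names are hypothetical, and the paired bootstrap is one common way to get a confidence interval on the delta, chosen here as an assumption.

```python
import random
from statistics import mean


def bootstrap_delta_ci(baseline, candidate, n_resamples=2000, alpha=0.05, seed=0):
    """Paired bootstrap CI for the mean per-example score difference.

    baseline and candidate are per-example scores on the same eval set,
    in the same order. Returns (delta, lower, upper).
    """
    if not baseline or len(baseline) != len(candidate):
        raise ValueError("score lists must be non-empty and the same length")
    rng = random.Random(seed)  # fixed seed keeps the report reproducible
    diffs = [c - b for b, c in zip(baseline, candidate)]
    n = len(diffs)
    # Resample the per-example differences with replacement and sort the means.
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lower, upper


def select_best(baseline_scores, candidate_scores):
    """Pick the candidate with the highest mean score and report its delta.

    candidate_scores maps a (hypothetical) strategy name to per-example scores.
    """
    best = max(candidate_scores, key=lambda name: mean(candidate_scores[name]))
    delta, lower, upper = bootstrap_delta_ci(baseline_scores, candidate_scores[best])
    return best, delta, lower, upper
```

With a small evaluation set the interval will be wide, which is worth surfacing to the user rather than hiding behind a single point estimate.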
Stack
- Training: PyTorch, Lightning, DeepSpeed on multi-GPU.
- Backend: Modal for GPU rewrite endpoints, Postgres for tuples and evals, Redis for the rewrite cache.
- Frontend: Next.js 15 App Router, React Server Components, Tailwind, shadcn/ui.
- Eval harness: a custom multi-backend runner with deterministic sampling and per-task scoring.
Status
Promptomize is in private beta. The hosted product is closed-loop; a self-hosted CLI is in the works for teams that need to keep prompts and evals on-prem.
