Whether Hilbi IQ stays a limited-risk tool (and not a regulated medical device) depends on it never crossing into clinical decision-making. You own the evidence that it doesn't.
Research & Development · Closed
AI Safety / Evaluation Engineer
"How would this break — and how do we prove it doesn't?"
Why this role matters
What you'll work on
- Eval suites for non-deterministic outputs
- Guardrails and adversarial testing
- AI-transparency checks
- Jurisdiction-aware output rules
Skills needed
- You evaluate or red-team LLM systems
- You turn fuzzy requirements into testable criteria
- You are interested in AI safety and the EU AI Act
Valuable extras
- Bias/fairness work
What we evaluate
- Creativity of adversarial cases
- Understanding of the decision-support vs decision-making line
- Rigor of pass/fail criteria
The assignment
Design an eval suite, including adversarial cases, that demonstrates IQ doesn't make clinical decisions.
Full brief is shared after a short intro call.
This role is currently closed — we've consolidated it into adjacent openings. Browse the other roles on the careers page or reach out to careers@hilbi.com.
