Research & Development · Closed

AI Safety / Evaluation Engineer

"How would this break — and how do we prove it doesn't?"

Why this role matters

Whether Hilbi IQ stays a limited-risk tool (and not a regulated medical device) depends on it never crossing into clinical decision-making — you own the evidence that it doesn't.

What you'll work on

Eval suites for non-deterministic outputs
Guardrails and adversarial testing
AI-transparency checks
Jurisdiction-aware output rules

Skills needed

You evaluate or red-team LLM systems
You turn fuzzy requirements into testable criteria
You're interested in AI safety and the EU AI Act

Valuable extras

Bias/fairness work

What we evaluate

Creativity of adversarial cases
Understanding of the decision-support vs decision-making line
Rigor of pass/fail criteria

The assignment

Design an eval suite proving Hilbi IQ doesn't make clinical decisions, including adversarial cases.

Full brief is shared after a short intro call.

This role is currently closed — we've consolidated it into adjacent openings. Browse the other roles on the careers page or reach out to careers@hilbi.com.