Research & Development (Closed)

AI Safety / Evaluation Engineer

"How would this break — and how do we prove it doesn't?"

Why this role matters

Hilbi IQ remains a limited-risk tool, rather than a regulated medical device, only as long as it never crosses into clinical decision-making. You own the evidence that it doesn't.

What you'll work on

  • Eval suites for non-deterministic outputs
  • Guardrails and adversarial testing
  • AI-transparency checks
  • Jurisdiction-aware output rules

Skills needed

  • You evaluate or red-team LLM systems
  • You turn fuzzy requirements into testable criteria
  • You're interested in AI safety and the EU AI Act

Valuable extras

  • Bias/fairness work

What we evaluate

  • Creativity of adversarial cases
  • Understanding of the decision-support vs decision-making line
  • Rigor of pass/fail criteria

The assignment

Design an eval suite proving IQ doesn't make clinical decisions, including adversarial cases.

Full brief is shared after a short intro call.

This role is currently closed — we've consolidated it into adjacent openings. Browse the other roles on the careers page or reach out to careers@hilbi.com.