AI/ML engineer & researcher building trustworthy LLM systems (evaluation, grounded generation, and participatory design).
I build trustworthy LLM systems for high-stakes finance and health: grounded generation, measurable evaluation, and participatory design.
Finance: 80% faster incident detection
Health: 200k records
Health: 93% agreement
What I focus on
I build trustworthy AI systems where mistakes are expensive: finance and health. The goal is not “AI adoption” — it is reliable decision support embedded in real workflows.
Focus 01
Finance (investment + market operations)
LLM systems that stay grounded under uncertainty, and monitoring that surfaces failures early.
What I build
- Grounded RAG for research and memo workflows (claim-first retrieval + citations).
- Evaluation and monitoring loops: router audits, context capture, and feedback-driven iteration.
- Workflow-first delivery: clear interfaces, rollback plans, and reliability targets.
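The claim-first pattern above can be sketched in a few lines. This is a minimal illustration, not the production system: the `Claim` type, the corpus ids, and the substring `support` check are all hypothetical stand-ins (a real pipeline would use a retriever plus an entailment or citation model). The key behavior is that unsupported claims are flagged rather than silently emitted.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    citations: list = field(default_factory=list)  # ids of supporting sources

    @property
    def grounded(self) -> bool:
        return len(self.citations) > 0

def cite_claims(claims, corpus, support_fn):
    """Attach citations to each claim; unsupported claims are flagged, not dropped."""
    for claim in claims:
        claim.citations = [doc_id for doc_id, doc in corpus.items()
                           if support_fn(claim.text, doc)]
    return claims

# Toy support check (substring match); purely illustrative.
support = lambda claim, doc: claim.lower() in doc.lower()

corpus = {"10-K:risk": "Revenue concentration risk rose in Q3.",
          "memo:42": "Margins were flat quarter over quarter."}
claims = [Claim("revenue concentration risk rose in q3"),
          Claim("guidance was raised")]

for c in cite_claims(claims, corpus, support):
    print(c.text, "->", c.citations if c.grounded else "UNSUPPORTED")
```

Decomposing an answer into atomic claims before retrieval makes each citation checkable on its own, which is what lets a monitoring loop count ungrounded claims as a concrete failure metric.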
Focus 02
Health (research classification + evidence workflows)
High-agreement AI that respects expert time — precision-first systems that know when not to decide.
What I build
- LLM-assisted classification and triage at scale, with abstention to route ambiguity to experts.
- Grounding and traceability: evidence-linked outputs, uncertainty handling, and failure analysis.
- Human-centered design: align system outputs with accountability and clinical/research decision points.
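The abstention idea in the first bullet reduces to a simple routing rule. The sketch below is illustrative only: the record ids, labels, scores, and the 0.9 threshold are made up, and a real system would calibrate model scores before thresholding. The point is that low-confidence predictions are routed to experts instead of being forced into a label.

```python
def classify_with_abstention(score: float, label: str,
                             accept_threshold: float = 0.9):
    """Return the label only when confidence clears the bar;
    otherwise abstain and route the record to an expert queue."""
    if score >= accept_threshold:
        return ("auto", label)
    return ("expert_review", None)

# Toy scored predictions (record id, predicted label, confidence).
records = [("r1", "included", 0.97),
           ("r2", "excluded", 0.62),
           ("r3", "included", 0.91)]

routed = {rid: classify_with_abstention(score, label)
          for rid, label, score in records}
```

Raising the threshold trades coverage for precision: fewer records are decided automatically, but the ones that are can be trusted, which is what "precision-first" means in practice.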
Selected projects / case studies
Industry experience (Finance)
Investment intelligence and market monitoring — grounded generation, evaluation, and reliability loops.
See industry
Research experience (Health + Evaluation)
Medical research classification at scale, agent evaluation, governance, and participatory design.
See research
Writing (Takeaways)
Short essays on what makes AI useful in real organisations — problem-first, workflow-first.
Read takeaways
Research interests
- LLM agent evaluation in mixed-motive and cooperative settings
- Governance, explainability, and participatory design for agentic systems
- Grounded generation: claim-first retrieval, citation, and uncertainty handling
- Quality monitoring: feedback loops, evaluation harnesses, and failure analysis
Open to: research collaborations on trustworthy AI and speaking on LLM systems in high-stakes finance and health.
◎
Evaluation
Define failure modes, set targets, and measure reliability in deployment — not just in demos.
⊞
Grounded generation
Claim-first retrieval with citations and uncertainty handling for high-stakes decisions.
◈
Participatory AI
Design with the people affected — map workflows, decisions, and accountability.