I build trustworthy AI systems for domains where mistakes are expensive: finance and health. The goal is not “AI adoption”; it is reliable decision support embedded in real workflows.
Focus 01
Finance (investment + market operations)
LLM systems that stay grounded under uncertainty, and monitoring that surfaces failures early.
What I build
- Grounded RAG for research and memo workflows (claim-first retrieval + citations).
- Evaluation and monitoring loops: router audits, context capture, and feedback-driven iteration.
- Workflow-first delivery: clear interfaces, rollback plans, and reliability targets.
Focus 02
Health (research classification + evidence workflows)
High-agreement AI that respects expert time — precision-first systems that know when not to decide.
What I build
- LLM-assisted classification and triage at scale, with abstention to route ambiguity to experts.
- Grounding and traceability: evidence-linked outputs, uncertainty handling, and failure analysis.
- Human-centered design: align system outputs with accountability and clinical/research decision points.
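To make "abstention" concrete, here is a minimal sketch of the pattern in Python. It is illustrative only: the names `Decision` and `classify_or_abstain`, the label set, and the 0.9 threshold are assumptions for the example, not a description of any deployed system. The idea is that the classifier commits to a label only when its confidence clears a threshold, and otherwise hands the item to an expert.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Decision:
    label: Optional[str]  # None means the system abstained
    confidence: float     # highest class probability seen
    route: str            # "auto" (accepted) or "expert" (needs review)


def classify_or_abstain(probs: Dict[str, float], threshold: float = 0.9) -> Decision:
    """Accept the top label only if its probability meets the threshold.

    `probs` maps hypothetical class labels to probabilities, e.g. from an
    LLM-based or conventional classifier. The threshold is an assumed value;
    in practice it would be tuned against expert-labeled data.
    """
    if not probs:
        # Nothing to decide on: always defer to a human.
        return Decision(label=None, confidence=0.0, route="expert")

    label, confidence = max(probs.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return Decision(label=label, confidence=confidence, route="auto")

    # Ambiguous case: abstain and route to an expert instead of guessing.
    return Decision(label=None, confidence=confidence, route="expert")


if __name__ == "__main__":
    print(classify_or_abstain({"in_scope": 0.96, "out_of_scope": 0.04}))
    print(classify_or_abstain({"in_scope": 0.55, "out_of_scope": 0.45}))
```

Raising the threshold trades coverage for precision: fewer items are decided automatically, but those that are decided are more likely to match what an expert would have chosen.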
Research interests
- LLM agent evaluation in mixed-motive and cooperative settings
- Governance, explainability, and participatory design for agentic systems
- Grounded generation: claim-first retrieval, citation, and uncertainty handling
- Quality monitoring: feedback loops, evaluation harnesses, and failure analysis
Open to: research collaborations on trustworthy AI and speaking engagements on LLM systems in high-stakes finance and health.
◎
Evaluation
Define failure modes, set targets, and measure reliability in deployment — not just in demos.
⊞
Grounded generation
Claim-first retrieval with citations and uncertainty handling for high-stakes decisions.
◈
Participatory AI
Design with the people affected — map workflows, decisions, and accountability.