About

I build trustworthy AI systems for domains where mistakes are expensive: finance and health. The goal is not “AI adoption”; it is reliable decision support embedded in real workflows.

Focus 01
Finance (investment + market operations)

LLM systems that stay grounded under uncertainty, and monitoring that surfaces failures early.

What I build
  • Grounded RAG for research and memo workflows (claim-first retrieval + citations).
  • Evaluation and monitoring loops: router audits, context capture, and feedback-driven iteration.
  • Workflow-first delivery: clear interfaces, rollback plans, and reliability targets.

Focus 02
Health (research classification + evidence workflows)

High-agreement AI that respects expert time — precision-first systems that know when not to decide.

What I build
  • LLM-assisted classification and triage at scale, with abstention to route ambiguity to experts.
  • Grounding and traceability: evidence-linked outputs, uncertainty handling, and failure analysis.
  • Human-centered design: align system outputs with accountability and clinical/research decision points.

Research interests

  • LLM agent evaluation in mixed-motive and cooperative settings
  • Governance, explainability, and participatory design for agentic systems
  • Grounded generation: claim-first retrieval, citation, and uncertainty handling
  • Quality monitoring: feedback loops, evaluation harnesses, and failure analysis

Contact

Open to research collaborations on trustworthy AI and to speaking engagements on LLM systems in high-stakes finance and health.

Focus

  • Evaluation: define failure modes, set targets, and measure reliability in deployment, not just in demos.
  • Grounded generation: claim-first retrieval with citations and uncertainty handling for high-stakes decisions.
  • Participatory AI: design with the people affected; map workflows, decisions, and accountability.