EXP-001 / Decision Safety / Published

Underwriting Decision Safety Lab

A decision-safety lab for underwriting workflows, covering input validation, corrected calibration metrics, abstention policy design, coverage-quality tradeoffs, slice reporting, and a Streamlit review dashboard.

EXP-001 / question

Technical question

Can an underwriting model know when to defer uncertain decisions to human review?

EXP-001 / method

Method and workflow

  1. Validate the input dataset before schema inference and model training.
  2. Train a scikit-learn model that outputs underwriting approval probabilities.
  3. Evaluate probability quality with corrected ECE, Brier score, ROC-AUC, PR-AUC, and baseline metrics.
  4. Generate abstention policies that trade off coverage, auto-decision quality, and human review volume.
  5. Report slice-level review rates, error rates, false approval/rejection rates, and calibration diagnostics.
  6. Expose results through pipeline artifacts and a Streamlit review dashboard.
Pipeline stages: loan data → validation → probability model → calibration → abstention → slice report → review UI
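Step 1 can be pictured as a plain-Python pre-flight check that runs before any schema inference. This is a minimal sketch, not the repository's validator: the column names (`income`, `loan_amount`, `credit_score`, `approved`) and the `validate_rows` / `ValidationReport` names are hypothetical, chosen only to show the kinds of checks (missing columns, a binary target, numeric and non-NaN features) that belong at this stage.

```python
from dataclasses import dataclass, field

# Hypothetical schema for illustration; not the project's actual columns.
REQUIRED_COLUMNS = frozenset({"income", "loan_amount", "credit_score", "approved"})
NUMERIC_COLUMNS = ("income", "loan_amount", "credit_score")  # tuple keeps error order stable


@dataclass
class ValidationReport:
    errors: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_rows(rows):
    """Pre-flight checks on raw records, run before schema inference or training."""
    report = ValidationReport()
    if not rows:
        report.errors.append("dataset is empty")
        return report
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS.difference(row)  # iterating a dict yields its keys
        if missing:
            report.errors.append(f"row {i}: missing columns {sorted(missing)}")
            continue  # the remaining checks need every required key
        if row["approved"] not in (0, 1):
            report.errors.append(f"row {i}: target 'approved' must be 0 or 1")
        for col in NUMERIC_COLUMNS:
            value = row[col]
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                report.errors.append(f"row {i}: {col} is not numeric")
            elif value != value:  # NaN is the only float not equal to itself
                report.errors.append(f"row {i}: {col} is NaN")
    return report


rows = [
    {"income": 52000, "loan_amount": 15000, "credit_score": 690, "approved": 1},
    {"income": 38000, "loan_amount": 20000, "approved": 0},
]
report = validate_rows(rows)
print(report.ok)      # False
print(report.errors)  # ["row 1: missing columns ['credit_score']"]
```

Collecting every error into a report, rather than raising on the first one, lets a single run surface all data problems before training starts.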

EXP-001 / evidence

Evidence of work

Calibration

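Expected Calibration Error (ECE) groups predictions into bins of predicted approval probability, compares each bin's mean prediction with its observed approval rate, and weights the gaps by bin size. Reliability diagrams plot the same per-bin comparison for visual inspection.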
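A binned ECE for the approval class can be written in a few lines. This is a sketch of one common formulation (equal-width bins, gaps weighted by bin share); the project's "corrected" ECE may differ in binning strategy or edge handling, and `expected_calibration_error` is a hypothetical name.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for the positive (approval) class.

    Predictions are grouped into equal-width bins on [0, 1]. In each bin the
    mean predicted approval probability is compared with the observed
    approval rate, and the absolute gap is weighted by the bin's share of
    samples.
    """
    if len(probs) == 0 or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Half-open bins [k/n, (k+1)/n); p == 1.0 is clamped into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_pred = sum(p for p, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_pred - observed_rate)
    return ece


probs = [0.9, 0.8, 0.3, 0.2]
labels = [1, 1, 0, 1]
print(round(expected_calibration_error(probs, labels, n_bins=2), 4))  # 0.2
```

For the reliability diagram itself, scikit-learn's `sklearn.calibration.calibration_curve` returns the per-bin observed rates and mean predictions (dropping empty bins), which can be plotted directly with matplotlib.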

Decision safety

Coverage curves and policy variants show when the system can automate and when it should defer to review.
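The page does not spell out the policy family, so the following is a minimal sketch assuming a confidence-band policy: auto-approve above an upper threshold, auto-reject below a lower one, and defer everything in between. Sweeping a symmetric band around 0.5 traces a coverage curve. The helpers `apply_policy`, `policy_summary` and `coverage_curve` are hypothetical names.

```python
def apply_policy(probs, lower, upper):
    """Map each approval probability to 'approve', 'reject' or 'review'."""
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lower <= upper <= 1")
    decisions = []
    for p in probs:
        if p >= upper:
            decisions.append("approve")
        elif p <= lower:
            decisions.append("reject")
        else:
            decisions.append("review")  # uncertain band: defer to a human
    return decisions


def policy_summary(probs, labels, lower, upper):
    """Coverage, auto-decision quality and review volume for one policy."""
    decisions = apply_policy(probs, lower, upper)
    auto = [(d, y) for d, y in zip(decisions, labels) if d != "review"]
    n = len(labels)
    correct = sum((d == "approve") == (y == 1) for d, y in auto)
    return {
        "coverage": len(auto) / n,
        "review_count": n - len(auto),
        "auto_accuracy": correct / len(auto) if auto else None,
        "false_approvals": sum(d == "approve" and y == 0 for d, y in auto),
        "false_rejections": sum(d == "reject" and y == 1 for d, y in auto),
    }


def coverage_curve(probs, labels, margins):
    """Sweep symmetric bands [0.5 - m, 0.5 + m]: larger m means more review."""
    return [(m, policy_summary(probs, labels, 0.5 - m, 0.5 + m)) for m in margins]


probs = [0.95, 0.85, 0.55, 0.45, 0.15, 0.05]
labels = [1, 0, 1, 0, 0, 0]
for margin, summary in coverage_curve(probs, labels, [0.0, 0.2, 0.4]):
    print(margin, round(summary["coverage"], 3), summary["auto_accuracy"])
```

On this toy data, widening the band lowers coverage while raising the accuracy of the cases that remain automated, which is the tradeoff the coverage curves make visible. A symmetric band is only one option; asymmetric thresholds can penalize false approvals more heavily than false rejections.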

Slice reporting

Slice artifacts summarize review rate, auto-decision behavior, error rates, and calibration by applicant bands.
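A per-band slice summary can be sketched with the standard library alone. The record keys (`band`, `prob`, `label`, `decision`) and `slice_report` are hypothetical. The rate definitions are also assumptions: error rates here are computed over auto-decided cases in each slice, and calibration is reduced to calibration-in-the-large (mean prediction minus observed approval rate), whereas the project's slice artifacts may normalize differently or report a binned metric.

```python
from collections import defaultdict


def slice_report(records):
    """Summarize review and error behavior per applicant band.

    Each record is a dict with hypothetical keys: 'band' (slice label),
    'prob' (predicted approval probability), 'label' (observed 0/1 outcome)
    and 'decision' ('approve', 'reject' or 'review').
    """
    groups = defaultdict(list)
    for record in records:
        groups[record["band"]].append(record)

    report = {}
    for band in sorted(groups):
        rows = groups[band]
        n = len(rows)
        auto = [r for r in rows if r["decision"] != "review"]
        n_auto = len(auto)
        false_approve = sum(r["decision"] == "approve" and r["label"] == 0 for r in auto)
        false_reject = sum(r["decision"] == "reject" and r["label"] == 1 for r in auto)
        mean_prob = sum(r["prob"] for r in rows) / n
        observed_rate = sum(r["label"] for r in rows) / n
        report[band] = {
            "n": n,
            "review_rate": (n - n_auto) / n,
            # Error rates use auto-decided cases as the denominator.
            "auto_error_rate": (false_approve + false_reject) / n_auto if n_auto else None,
            "false_approval_rate": false_approve / n_auto if n_auto else None,
            "false_rejection_rate": false_reject / n_auto if n_auto else None,
            # Positive gap: the model over-predicts approval in this slice.
            "calibration_gap": mean_prob - observed_rate,
        }
    return report


records = [
    {"band": "income_low", "prob": 0.2, "label": 0, "decision": "reject"},
    {"band": "income_low", "prob": 0.6, "label": 1, "decision": "review"},
    {"band": "income_low", "prob": 0.8, "label": 0, "decision": "approve"},
    {"band": "income_high", "prob": 0.9, "label": 1, "decision": "approve"},
    {"band": "income_high", "prob": 0.95, "label": 1, "decision": "approve"},
]
for band, stats in slice_report(records).items():
    print(band, stats)
```

Reporting `None` instead of 0.0 when a slice has no auto-decisions keeps "no errors" distinct from "nothing was automated", which matters for small bands.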

EXP-001 / stack

Technical stack

Python, pandas, NumPy, scikit-learn, Streamlit, matplotlib, joblib, unittest, GitHub Actions

EXP-001 / limitations

Limitations and honesty check

  • The project is a portfolio demo, not a production underwriting system.
  • The included dataset and policies do not certify fairness, compliance, or deployability.
  • Real lending use would require governance, monitoring, audit trails, security controls, and legal review.

EXP-001 / next

Next improvements

  • Add drift monitoring and model-card metadata.
  • Add deeper fairness and subgroup analysis.
  • Add a FastAPI scoring endpoint and Docker deployment path.
  • Connect reviewer feedback to policy and calibration monitoring.