Financial Fraud Risk Engine | honardoust.codes

EXP-002 / question

Technical question

How can fraud-risk scores become analyst-review decisions instead of just model outputs?

EXP-002 / method

Generate synthetic fraud data with overlap, label noise, and class imbalance.
Prepare train/test transaction datasets through a reusable data-prep workflow.
Train a scikit-learn fraud-risk pipeline and evaluate ranking, calibration, and classification metrics.
Search decision thresholds against cost, recall, precision, and review-capacity tradeoffs.
Score new CSV files with fraud probability, binary flags, and analyst-friendly reason codes.
Support interactive review through a Streamlit dashboard with SHAP-based explanations.
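The first three steps above can be sketched as follows. This is a minimal illustration, not the repository's actual code: it assumes scikit-learn's `make_classification` as the synthetic generator, a logistic-regression pipeline as the model, and arbitrary values for the imbalance, noise, and overlap settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for transactions (all settings are assumptions):
# weights -> roughly 3% positives (class imbalance)
# flip_y -> a small share of labels flipped at random (label noise)
# class_sep -> a low value makes the two classes overlap
X, y = make_classification(
    n_samples=5000,
    n_features=8,
    n_informative=4,
    n_redundant=0,
    weights=[0.97, 0.03],
    flip_y=0.01,
    class_sep=0.8,
    random_state=0,
)

# Stratify so the rare fraud class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling and the classifier live in one pipeline so scoring reuses both.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X_train, y_train)
fraud_probability = pipeline.predict_proba(X_test)[:, 1]

metrics = {
    "roc_auc": roc_auc_score(y_test, fraud_probability),  # ranking
    "average_precision": average_precision_score(y_test, fraud_probability),  # ranking under imbalance
    "brier": brier_score_loss(y_test, fraud_probability),  # calibration
}
print(metrics)
```

Reporting a ranking metric next to a calibration metric matters here because a threshold search only makes sense if the probabilities themselves are trustworthy, not just their ordering.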

transactions → validation → fraud model → threshold search → policy artifacts → reason codes → dashboard

EXP-002 / evidence

Threshold policy

Policy artifacts compare cost-optimized, high-recall, high-precision, balanced, and review-capacity strategies.
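One way to picture two of these strategies, cost-optimized and review-capacity, is a brute-force search over candidate thresholds. The sketch below is an assumption about how such a search could look: the function name `search_threshold`, the cost values, and the toy data are hypothetical and not taken from the repository.

```python
import numpy as np


def search_threshold(y_true, scores, cost_fn=50.0, cost_fp=1.0, max_reviews=None):
    """Return the lowest-cost threshold, optionally capped by review capacity.

    cost_fn / cost_fp are assumed costs of a missed fraud and a wasted review.
    Returns None when no threshold satisfies the capacity cap.
    """
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    best = None
    for threshold in np.unique(scores):  # each observed score is a candidate
        flagged = scores >= threshold
        if max_reviews is not None and flagged.sum() > max_reviews:
            continue  # analysts cannot review this many cases
        tp = int(np.sum(flagged & y_true))
        fp = int(np.sum(flagged & ~y_true))
        fn = int(np.sum(~flagged & y_true))
        cost = cost_fn * fn + cost_fp * fp
        if best is None or cost < best["cost"]:
            best = {
                "threshold": float(threshold),
                "cost": cost,
                "flagged": int(flagged.sum()),
                "precision": tp / (tp + fp),  # flagged is never empty here
                "recall": tp / (tp + fn) if (tp + fn) else 0.0,
            }
    return best


# Toy labels and scores, invented for illustration only.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
scores = [0.05, 0.10, 0.35, 0.40, 0.65, 0.20, 0.80, 0.90]

cost_optimized = search_threshold(y_true, scores)
capacity_limited = search_threshold(y_true, scores, max_reviews=3)
print(cost_optimized)
print(capacity_limited)
```

On this toy data the unconstrained search settles on a threshold of 0.35 (five flags, cost 2), while capping reviews at three pushes the threshold up to 0.65 and the cost to 51, because one fraud case is now missed. That gap is the tradeoff the policy artifacts are meant to expose.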

Explainability

Reason codes and SHAP views turn model scores into inspectable analyst review signals.
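As a rough illustration of what a reason code can be, the sketch below ranks per-feature contributions of a linear score and maps the largest positive ones to analyst-readable text. It deliberately uses a plain coefficient-times-deviation attribution rather than the SHAP library, and every feature name, coefficient, baseline, and message is a hypothetical placeholder.

```python
import numpy as np

# Hypothetical mapping from feature name to analyst-facing text.
REASON_TEXT = {
    "amount_zscore": "Amount is unusually high for this account",
    "txn_count_1h": "Many transactions in the last hour",
    "new_device": "First time this device was seen",
    "account_age_days": "Account was opened recently",
}


def reason_codes(feature_names, coefficients, row, baseline, top_k=2):
    """Return text for the top_k features that push the score upward.

    Contribution = coefficient * (value - baseline), a simple linear
    attribution. This is a summary signal, not a causal explanation.
    """
    contributions = np.asarray(coefficients, dtype=float) * (
        np.asarray(row, dtype=float) - np.asarray(baseline, dtype=float)
    )
    order = np.argsort(contributions)[::-1]  # largest contribution first
    return [
        REASON_TEXT.get(feature_names[i], feature_names[i])
        for i in order[:top_k]
        if contributions[i] > 0  # only report features that raise risk
    ]


feature_names = ["amount_zscore", "txn_count_1h", "new_device", "account_age_days"]
coefficients = [0.9, 0.4, 1.2, -0.002]  # assumed linear-model weights
baseline = [0.0, 2.0, 0.1, 900.0]  # assumed typical feature values
row = [3.0, 3.0, 1.0, 30.0]  # one transaction to explain

codes = reason_codes(feature_names, coefficients, row, baseline)
print(codes)
```

For this row the two strongest upward contributions come from the amount and the young account, so those two messages are returned. Features that lower the score are skipped so the analyst sees only the reasons a case was flagged.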

Validation

Training and scoring inputs fail early when required columns or probability contracts are invalid.
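A fail-early check of this kind could look like the sketch below. The required column names and the exact probability contract (one-dimensional, finite, within [0, 1]) are assumptions made for illustration; the repository may define them differently.

```python
import numpy as np
import pandas as pd

# Hypothetical required columns for a scoring input file.
REQUIRED_COLUMNS = ("transaction_id", "amount", "merchant_category")


def validate_input_columns(frame, required=REQUIRED_COLUMNS):
    """Raise ValueError if any required column is absent."""
    missing = [column for column in required if column not in frame.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")


def validate_probabilities(probabilities):
    """Raise ValueError unless values form a 1-D array of finite numbers in [0, 1]."""
    values = np.asarray(probabilities, dtype=float)
    if values.ndim != 1:
        raise ValueError("Probabilities must be one-dimensional")
    if not np.all(np.isfinite(values)):
        raise ValueError("Probabilities contain NaN or infinite values")
    if np.any((values < 0.0) | (values > 1.0)):
        raise ValueError("Probabilities must lie in [0, 1]")
    return values


good = pd.DataFrame(
    {"transaction_id": [1, 2], "amount": [10.0, 250.0], "merchant_category": ["a", "b"]}
)
validate_input_columns(good)  # passes silently
checked = validate_probabilities([0.02, 0.97])

try:
    validate_input_columns(good.drop(columns=["amount"]))
except ValueError as error:
    print(error)
```

Raising before any model call keeps a malformed CSV from producing scores that look plausible but are silently wrong.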

EXP-002 / stack

Python, pandas, NumPy, scikit-learn, SciPy, SHAP, Streamlit, matplotlib, joblib, unittest, GitHub Actions


EXP-002 / limitations

The data is synthetic and does not represent real banking or payment-network behavior.
Reason codes are analyst-facing summaries, not causal explanations.
A real fraud platform would need monitoring, access control, audit logs, feedback loops, and compliance review.

EXP-002 / next