Building Fraud Detection Solution on GCP

  1. Preparing the data on BigQuery
  2. building the fraud detection model using BigQuery ML
  3. hosting the BigQuery ML model on AI Platform to make online predictions on streaming data using Dataflow
  4. setting up alert-based fraud notifications using Pub/Sub
  5. creating operational dashboards for business stakeholders and the technical team using Data Studio

Preparing the data on BigQuery

SELECT
*
FROM
`qp-fraud-detection.cc_data.training_data`

Building the fraud detection model using BigQuery ML

CREATE MODEL   
`[PROJECT_ID].[DATASET].simplemodel`
OPTIONS(
model_type='BOOSTED_TREE_CLASSIFIER',
num_parallel_tree=8,
max_iterations=50,
input_label_cols=["is_fraud"]
) AS
SELECT
*
FROM `qp-fraud-detection.cc_data.train_simple`
CREATE VIEW cc_data.train_w_aggregates AS (
SELECT
TIMESTAMP_DIFF(trans_date_trans_time, last_txn_date , MINUTE) AS trans_diff,
AVG(amt) OVER(
PARTITION BY cc_num
ORDER BY unix_time
RANGE BETWEEN 604800 PRECEDING AND 1 PRECEDING) AS avg_spend_pw,
AVG(amt) OVER(
PARTITION BY cc_num
ORDER BY unix_time
RANGE BETWEEN 2592000 PRECEDING AND 1 PRECEDING) AS avg_spend_pm,
COUNT(*) OVER(
PARTITION BY cc_num
ORDER BY unix_time
RANGE BETWEEN 86400 PRECEDING AND 1 PRECEDING ) AS trans_freq_24,
...
CREATE MODEL  
`[PROJECT_ID].[DATASET].model_w_aggregates`
OPTIONS(
model_type='BOOSTED_TREE_CLASSIFIER',
num_parallel_tree=8,
max_iterations=50,
input_label_cols=["is_fraud"]
) AS
SELECT * FROM `qp-fraud-detection.cc_data.train_w_aggregates`
SELECT 
"simplemodel" AS model_name,
*
FROM
ML.EVALUATE(
MODEL `[PROJECT_ID].[DATASET].[MODEL_NAME_WITHOUT_AGG]`,
(SELECT * FROM `qp-fraud-detection.cc_data.test_simple`))
UNION ALL
SELECT
"model_w_aggregates" AS model_name,
*
FROM
ML.EVALUATE(
MODEL `[PROJECT_ID].[DATASET].[MODEL_NAME_WITH_AGG]`,
(SELECT * FROM `qp-fraud-detection.cc_data.test_w_aggregates`))

Hosting the BigQuery ML model for online predictions on streaming data using Dataflow

gcloud ai-platform predict --model $AI_MODEL_NAME --version $VERSION_NAME_WITHOUT_AGG --region global --json-instances sample_inputs/input_wo_aggregates.json
{
"trans_date_trans_time": "2020-12-20 02:56:31 UTC",
"cc_num": 4464457352619,
"merchant": "fraud_Parisian, Schiller and Altenwerth",
"category": "misc_net",
"amt": 757.78,
"trans_num": "1b2d78e707630f2e30e728a0fdba7003",
"unix_time": 1608432991,
"merch_lat": 30.481454,
"merch_long": -83.915763
}

Setting up alert-based fraud notifications using Pub/Sub

{
"trans_num": "1b2d78e707630f2e30e728a0fdba7003",
"is_fraud_model_w_aggregates": 0,
"is_fraud_simplemodel": 1,
"prob_is_fraud_model_w_aggregates": [0.09164, 0.90836],
"prob_is_fraud_simplemodel": [0.87256, 0.12744]
}

Creating dashboards for business and technical operations

Summary

--

--

--

If your Product gives great value, you will live immortal !

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Words are more than wind!

Machine Learning Has Been Used to Automatically Translate Long-Lost Languages

Named Tuples: A Little Known Machine Learning Helper

Maximum likelihood estimation -MLE

Electroencephalography (EEG) based person identification

MLS-C01 : Passes in 2020.

Watermark and protect your Deep Neural Networks!

Feature Selection and Dimensionality Reduction

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Akash M Dubey

Akash M Dubey

If your Product gives great value, you will live immortal !

More from Medium

Data implementation, starting with the ‘why’

Are some users really passionate about your product?

Adventures in product discovery land: part II

Experimentation-driven development