Credit Card Fraud Detection Using ML

Project Overview


Context

Welcome to the Credit Card Fraud Detection Project!

Credit card fraud poses a major threat to both financial institutions and consumers. As online transactions become more prevalent, identifying fraudulent activities has become more complex.

Here, I created a fraud detection system using Python and widely-used machine learning libraries like scikit-learn. The primary goal was to build a robust fraud detection system using Random Forest, Logistic Regression, and Decision Tree classifiers. This project also addresses the class imbalance problem inherent in credit card fraud datasets.


Project Structure

creditcard.csv: The dataset containing transaction details and fraud labels (The dataset used in this project is available on Kaggle).

Credit Card Fraud Detection.py: The main script containing data preprocessing, model training, evaluation, and model saving.

credit_card_model.p: The serialized Random Forest model saved using pickle.


Actions

I first needed to import the transaction data from the creditcard.csv dataset into a dataframe.

As I was predicting a binary output, I tested three classification modeling approaches, namely:

  • Logistic Regression
  • Decision Tree
  • Random Forest

I imported the data, defined the models using a dictionary, trained and tested each model, and then measured each model's predictive performance across several metrics to give a well-rounded view of which performed best. I also used undersampling and oversampling techniques to address the class imbalance problem inherent in credit card fraud datasets.


Results

Here, I aimed to build a model that would accurately predict fraudulent transactions. To that end, I calculated the following metrics for all three models.


Metric 1: Classification Accuracy

  • Random Forest = 0.99
  • Decision Tree = 0.99
  • Logistic Regression = 0.99


Metric 2: Precision

  • Random Forest = 0.90
  • Logistic Regression = 0.88
  • Decision Tree = 0.66


Metric 3: Recall

  • Random Forest = 0.76
  • Decision Tree = 0.75
  • Logistic Regression = 0.58


Metric 4: F1 Score

  • Random Forest = 0.82
  • Logistic Regression = 0.70
  • Decision Tree = 0.70


Based on these results, the chosen model is the Random Forest, as it was the most consistently performant on the test set across classification accuracy, precision, recall, and F1-score.


Handling Class Imbalance

The project addresses class imbalance using:

  • Undersampling: Reducing the majority class to balance the dataset.
  • Oversampling: Using SMOTE (Synthetic Minority Over-sampling Technique) to generate synthetic samples for the minority class.

I implemented both undersampling and oversampling techniques, but oversampling appears to be more favorable as it ensures no data is lost.

Using oversampling, I ended up with the enhanced metrics below, with Random Forest and Decision Tree offering very similar results.


Metric 1: Classification Accuracy

  • Random Forest = 0.99
  • Decision Tree = 0.99
  • Logistic Regression = 0.96


Metric 2: Precision

  • Random Forest = 0.99
  • Decision Tree = 0.99
  • Logistic Regression = 0.98


Metric 3: Recall

  • Random Forest = 1.00
  • Decision Tree = 0.99
  • Logistic Regression = 0.93


Metric 4: F1 Score

  • Random Forest = 0.99
  • Decision Tree = 0.99
  • Logistic Regression = 0.96




Modelling Overview

I set out to build a model that would accurately predict fraudulent transactions.

As I was predicting a binary output, I tested three classification modeling approaches, namely:

  • Logistic Regression
  • Decision Tree
  • Random Forest


Import Required Packages

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import pickle


from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE


Get the Data

Read the creditcard.csv file into a dataframe.

data = pd.read_csv("creditcard.csv")


Data Preprocessing

# Check data head
data.head()

# Show all the columns (instead of '...' truncation for some of them)
pd.options.display.max_columns = None
data.head()

# Check data tail
data.tail()

# Check data shape
data.shape
print(f'Number of rows: {data.shape[0]}')
print(f'Number of columns: {data.shape[1]}')

# Check data type and missing values
data.info()
data.isna().sum()

# Drop Time column (seconds elapsed since the first transaction; not used as a feature here)
data.drop('Time', axis = 1, inplace = True)

# Find duplicated data
data.duplicated().any()

# Drop duplicated data
data = data.drop_duplicates()
data.shape


I also investigated the class balance of the dependent variable, which is important when assessing classification accuracy.

# Class balance
data['Class'].value_counts()   


From the last step in the above code, I saw that only about 0.2% of the rows were in class 1, with the rest in class 0. This showed a clear class imbalance, so I made sure not to rely on classification accuracy alone when assessing results, also analysing Precision, Recall, and F1-Score.
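As a quick sanity check, the class proportions can be printed directly (a minimal sketch using pandas' normalize option; data is the deduplicated dataframe from above):

# Relative class frequencies; normalize=True returns proportions instead of counts
print(data['Class'].value_counts(normalize=True))
# Class 1 (fraud) accounts for roughly 0.2% of transactions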


Create Input and Output Variables

X = data.drop(['Class'], axis = 1)
y = data['Class']


Train Test Split

train_test_split was used to split the data into a training set and a testing set.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
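With a class this rare, a purely random split can leave the test set with very few fraud cases. A variant worth considering (not what the original script does) is to pass stratify=y so both splits keep the same class ratio:

# Stratified split: preserves the ~0.2% fraud ratio in both train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42, stratify = y)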


Standardisation

Standardisation was used to scale the data (normalisation would bound the data to a fixed range, which was not what I wanted here).

scale_standard = StandardScaler()

X_train = pd.DataFrame(scale_standard.fit_transform(X_train), columns = X_train.columns)
X_test = pd.DataFrame(scale_standard.transform(X_test), columns = X_test.columns)
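For contrast, the bounded alternative would be MinMaxScaler, which maps each feature into a fixed [0, 1] range; a purely illustrative sketch of the difference (applied to the same data only for demonstration):

from sklearn.preprocessing import MinMaxScaler

# MinMaxScaler rescales every feature into [0, 1]; StandardScaler instead centres
# each feature to zero mean and unit variance with no fixed bounds
scale_minmax = MinMaxScaler()
X_train_bounded = pd.DataFrame(scale_minmax.fit_transform(X_train), columns = X_train.columns)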


Model Training and Assessment

Three classifiers were trained and evaluated:

  • Logistic Regression
  • Decision Tree Classifier
  • Random Forest Classifier

Each model’s performance was evaluated using the following metrics:

  • Accuracy
  • Precision
  • Recall
  • F1-Score

classifier = {
       'LogisticRegression': LogisticRegression(random_state = 42),
       'DecisionTreeClassifier': DecisionTreeClassifier(random_state = 42),
       'RandomForestClassifier': RandomForestClassifier(random_state = 42)}

for name, clf in classifier.items():
    
    print(f'\n==========={name}==========')
    
    clf.fit(X_train, y_train)
    y_pred_class = clf.predict(X_test)

    # Accuracy (the number of correct classification out of all attempted classifications)
    accuracy = accuracy_score(y_test, y_pred_class) 
    print(f'\n Accuracy: {accuracy}')
    
    # Precision (of all observations that were predicted as positive, how many were actually positive)
    precision = precision_score(y_test, y_pred_class)
    print(f'\n Precision: {precision}')

    # Recall (of all positive observations, how many did we predict as positive)
    recall = recall_score(y_test, y_pred_class) 
    print(f'\n Recall: {recall}')

    # F1-Score (the harmonic mean of precision and recall)
    f1 = f1_score(y_test, y_pred_class)
    print(f'\n F1_score: {f1}')
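All four metrics can be read off the confusion matrix (F1 being the harmonic mean, F1 = 2 * precision * recall / (precision + recall)). As a complementary check, the matrix itself can be printed for the last-trained model, or inside the loop (a sketch; confusion_matrix is a standard scikit-learn import not used in the original script):

from sklearn.metrics import confusion_matrix

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_test, y_pred_class))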

Below, you can find the abovementioned metrics for all three models, without addressing class imbalance.


Metric 1: Classification Accuracy

  • Random Forest = 0.99
  • Decision Tree = 0.99
  • Logistic Regression = 0.99


Metric 2: Precision

  • Random Forest = 0.90
  • Logistic Regression = 0.88
  • Decision Tree = 0.66


Metric 3: Recall

  • Random Forest = 0.76
  • Decision Tree = 0.75
  • Logistic Regression = 0.58


Metric 4: F1 Score

  • Random Forest = 0.82
  • Logistic Regression = 0.70
  • Decision Tree = 0.70


Handling Class Imbalance

The dataset includes two classes of data: class 1 represents fraudulent transactions, while class 0 represents normal transactions:

normal = data[data['Class']==0]
fraud = data[data['Class']==1]

normal.shape
fraud.shape


Undersampling

# Randomly sample as many normal transactions as there are fraud cases (473 after
# removing duplicates); random_state added for reproducibility
normal_sample = normal.sample(n=473, random_state=42)

# Create new undersampled dataset
new_data = pd.concat([normal_sample, fraud], ignore_index=True)
new_data.head()
new_data['Class'].value_counts()

X = new_data.drop(['Class'], axis = 1)
y = new_data['Class']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

scale_standard = StandardScaler()

X_train = pd.DataFrame(scale_standard.fit_transform(X_train), columns = X_train.columns)
X_test = pd.DataFrame(scale_standard.transform(X_test), columns = X_test.columns)

classifier = {
       'LogisticRegression': LogisticRegression(random_state = 42),
       'DecisionTreeClassifier': DecisionTreeClassifier(random_state = 42),
       'RandomForestClassifier': RandomForestClassifier(random_state = 42)}               

for name, clf in classifier.items():
    print(f'\n==========={name}==========')
    clf.fit(X_train, y_train)
    y_pred_class = clf.predict(X_test)

    # Accuracy (the number of correct classification out of all attempted classifications)
    accuracy = accuracy_score(y_test, y_pred_class) 
    print(f'\n Accuracy: {accuracy}')
    
    # Precision (of all observations that were predicted as positive, how many were actually positive)
    precision = precision_score(y_test, y_pred_class)
    print(f'\n Precision: {precision}')

    # Recall (of all positive observations, how many did we predict as positive)
    recall = recall_score(y_test, y_pred_class) 
    print(f'\n Recall: {recall}')

    # F1-Score (the harmonic mean of precision and recall)
    f1 = f1_score(y_test, y_pred_class)
    print(f'\n F1_score: {f1}')


Oversampling

X = data.drop(['Class'], axis = 1)
y = data['Class']

X.shape
y.shape

X_res, y_res = SMOTE().fit_resample(X,y)

y_res.value_counts()

X_train, X_test, y_train, y_test = train_test_split(X_res, y_res, test_size = 0.2, random_state = 42)

scale_standard = StandardScaler()

X_train = pd.DataFrame(scale_standard.fit_transform(X_train), columns = X_train.columns)
X_test = pd.DataFrame(scale_standard.transform(X_test), columns = X_test.columns)

classifier = {
       'LogisticRegression': LogisticRegression(random_state = 42),
       'DecisionTreeClassifier': DecisionTreeClassifier(random_state = 42),
       'RandomForestClassifier': RandomForestClassifier(random_state = 42)}              

for name, clf in classifier.items():
    print(f'\n==========={name}==========')
    clf.fit(X_train, y_train)
    y_pred_class = clf.predict(X_test)

    # Accuracy (the number of correct classification out of all attempted classifications)
    accuracy = accuracy_score(y_test, y_pred_class) 
    print(f'\n Accuracy: {accuracy}')
    
    # Precision (of all observations that were predicted as positive, how many were actually positive)
    precision = precision_score(y_test, y_pred_class)
    print(f'\n Precision: {precision}')

    # Recall (of all positive observations, how many did we predict as positive)
    recall = recall_score(y_test, y_pred_class) 
    print(f'\n Recall: {recall}')

    # F1-Score (the harmonic mean of precision and recall)
    f1 = f1_score(y_test, y_pred_class)
    print(f'\n F1_score: {f1}')

Running this code resulted in:


Metric 1: Classification Accuracy

  • Random Forest = 0.99
  • Decision Tree = 0.99
  • Logistic Regression = 0.96


Metric 2: Precision

  • Random Forest = 0.99
  • Decision Tree = 0.99
  • Logistic Regression = 0.98


Metric 3: Recall

  • Random Forest = 1.00
  • Decision Tree = 0.99
  • Logistic Regression = 0.93


Metric 4: F1 Score

  • Random Forest = 0.99
  • Decision Tree = 0.99
  • Logistic Regression = 0.96


These are all higher than what we saw without resampling! Random Forest and Decision Tree offer very similar metrics.
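One caveat worth flagging: SMOTE was applied to the full dataset before the train/test split, so the test set contains synthetic points generated from neighbours that also appear in training, which tends to inflate these metrics. A stricter variant (a sketch, not what the script above does) resamples only the training fold:

# Split first on the original, imbalanced data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

# Oversample the training fold only; the test set keeps its natural imbalance
X_train_res, y_train_res = SMOTE(random_state = 42).fit_resample(X_train, y_train)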


Save Model

# Retrain a Random Forest on the full resampled (SMOTE) dataset and serialise it with pickle
rfc = RandomForestClassifier(random_state = 42)
rfc.fit(X_res, y_res)

with open('Credit Card Fraud Detection/credit_card_model.p', 'wb') as f:
    pickle.dump(rfc, f)


Fraud Detection (Prediction)

An example prediction is provided using the saved model to classify a new transaction.

with open('Credit Card Fraud Detection/credit_card_model.p', 'rb') as f:
    model = pickle.load(f)
                 
prediction = model.predict([[-1.3598071336738,-0.0727811733098497,2.53634673796914,1.37815522427443,-0.338320769942518,0.462387777762292,0.239598554061257,0.0986979012610507,0.363786969611213,0.0907941719789316,-0.551599533260813,-0.617800855762348,-0.991389847235408,-0.311169353699879,1.46817697209427,-0.470400525259478,0.207971241929242,0.0257905801985591,0.403992960255733,0.251412098239705,-0.018306777944153,0.277837575558899,-0.110473910188767,0.0669280749146731,0.128539358273528,-0.189114843888824,0.133558376740387,-0.0210530534538215,149.62]])                                                                     
prediction[0]     

if prediction[0]==0:
    print ('Normal Transaction')
else:
    print ('Fraud Transaction')
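One practical note: because the model was fitted on pandas DataFrames, scikit-learn emits a feature-name warning when predict receives a bare list. Passing a DataFrame that carries the original column names avoids this (a sketch; X is the feature dataframe from the oversampling step):

# Predict on a single real transaction, keeping the column names the model was fitted with
sample = X.iloc[[0]]
print(model.predict(sample)[0])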


Growth & Next Steps

We could look to tune the hyperparameters of the Random Forest, notably regularising parameters such as maximum tree depth and minimum samples per leaf, as well as training a larger number of trees in the forest.
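A minimal sketch of what that tuning could look like with scikit-learn's GridSearchCV, using the training data from the oversampling step (the grid values below are illustrative assumptions, not tuned results from this project):

from sklearn.model_selection import GridSearchCV

# Illustrative grid over tree count and regularising parameters
param_grid = {'n_estimators': [100, 300, 500],
              'max_depth': [None, 10, 20],
              'min_samples_leaf': [1, 5]}

grid_search = GridSearchCV(RandomForestClassifier(random_state = 42),
                           param_grid, scoring = 'f1', cv = 5)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_, grid_search.best_score_)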