Credit Card Fraud Detection Using Machine Learning

  • By Shubham Baghel
  • June 3, 2021
  • Machine Learning

Credit Card fraud: 

Credit card fraud is a crime in which a person uses someone else's credit card for personal gain without the knowledge of the cardholder or the issuing authority. With the rapid growth of e-commerce, credit cards are used heavily for online purchases, which has led to a high number of credit card frauds. As more payments go digital, the need to identify credit card fraud grows with it.

Fraud detection involves monitoring and analyzing user activity in order to detect or prevent undesirable behavior, which includes fraud, intrusion, and error. To detect credit card fraud effectively, we need to understand the technologies, algorithms, and fraud types involved. A machine learning algorithm is trained on past transaction data together with the knowledge of which transactions were fraudulent, and it learns to classify each transaction as fraudulent or genuine. Such algorithms analyze all authorized transactions and report the suspicious ones. These reports are investigated by experts, who contact cardholders to verify whether the transaction was genuine or fraudulent.

The outcome of these investigations is fed back to the automated system and used to train and update the machine learning algorithm, so that fraud detection performance improves over time. The implementation is as follows.

 

For Free, Demo classes Call:  8605110150

Registration Link:Click Here!

 

Import the necessary packages and dataset

Import numpy, pandas, and matplotlib, then read the dataset "creditcard.csv".

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# creditcard.csv is available on Kaggle
data = pd.read_csv('creditcard.csv')
print(data.shape)
data.head()

Output: Overview of dataset

Plot and count Fraud Cases and Genuine Cases: 

Visualize Fraud Cases and Genuine Cases 

# Count the rows in each class (1 = fraud, 0 = genuine)
fraud_transaction = len(data[data["Class"] == 1])
genuine_transaction = len(data[data["Class"] == 0])

print("Number of Fraud Transactions", fraud_transaction)
print("Number of Genuine Transactions", genuine_transaction)

# Bar chart comparing the two counts
case = ['fraud_transaction', 'genuine_transaction']
count = [fraud_transaction, genuine_transaction]
plt.bar(case, count)
plt.show()

Output: 

From the counts above, we can conclude that the dataset is highly imbalanced: there are very few fraud cases compared with genuine cases.

Here we use an ensemble technique, the Random Forest Classifier, to classify the cases.
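To see why plain accuracy can mislead on data this imbalanced, here is a minimal sketch with hypothetical counts (10,000 transactions, 20 of them fraud; these are illustrative numbers, not the counts from creditcard.csv). A "model" that labels every transaction as genuine still scores very high accuracy while catching no fraud at all.

```python
# Hypothetical counts for illustration only (not taken from creditcard.csv).
total_transactions = 10_000
fraud_transactions = 20
genuine_transactions = total_transactions - fraud_transactions

# Share of fraud cases in the data.
fraud_ratio = fraud_transactions / total_transactions

# A trivial baseline that predicts genuine (class 0) for every transaction
# is right on all genuine cases and wrong on all fraud cases.
baseline_accuracy = genuine_transactions / total_transactions
baseline_fraud_caught = 0

print(f"Fraud ratio: {fraud_ratio:.2%}")
print(f"Accuracy of always predicting genuine: {baseline_accuracy:.2%}")
print(f"Fraud cases caught by that baseline: {baseline_fraud_caught}")
```

For this reason, it helps to look at how the model performs on the fraud class specifically and not only at the overall accuracy.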

Splitting dataset: 

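Split the dataset into features (x) and target (y).

# Features: every column except the label
x = data.drop(['Class'], axis=1)
print(x.shape)

# Target: the label column
y = data['Class']
print(y.shape)

# Convert to NumPy arrays
x_data = x.values
y_data = y.values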

 

Split further for training and testing purposes. 

from sklearn.model_selection import train_test_split as tts
x_train, x_test, y_train, y_test = tts(x_data, y_data, test_size=0.25, random_state=0)
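Because fraud cases are rare, a purely random split can leave the test set with a different share of fraud than the training set. One option, sketched here on synthetic labels, is the stratify argument of train_test_split, which keeps the class proportions the same in both parts. The array sizes, the 4% fraud rate, and the _demo names are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 1000 rows, 40 of them fraud (4%).
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(1000, 5))
y_demo = np.array([1] * 40 + [0] * 960)

# stratify=y_demo keeps the fraud/genuine proportion equal in both splits.
x_train_demo, x_test_demo, y_train_demo, y_test_demo = train_test_split(
    x_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

print("Fraud share in train:", y_train_demo.mean())
print("Fraud share in test: ", y_test_demo.mean())
```

Whether stratification changes the results on the real dataset would need to be checked by rerunning the steps that follow.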

 

Apply model: 

Create the model and train it on the training data.

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(x_train, y_train)

 

Output: 
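RandomForestClassifier() with default arguments is not seeded, so results can vary slightly between runs, and it weights both classes equally. A hedged variant is sketched below on synthetic data from make_classification. The parameter values (n_estimators=100, class_weight="balanced", random_state=0) and the _demo names are illustrative assumptions, not settings from this walkthrough.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data: roughly 5% of rows in class 1.
x_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
x_train_demo, x_test_demo, y_train_demo, y_test_demo = train_test_split(
    x_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

# random_state makes the forest reproducible; class_weight="balanced"
# weights classes inversely to their frequency in the training labels.
model_demo = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
model_demo.fit(x_train_demo, y_train_demo)

y_pred_demo = model_demo.predict(x_test_demo)
print(y_pred_demo.shape)
```

Whether class weighting helps on the real dataset is an open question that the evaluation step would have to answer.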

Prediction: 

Predict the output for the testing dataset as follows:

y_pred=model.predict(x_test)

 

Evaluation and accuracy check: 

Check the accuracy score and the confusion matrix using the sklearn.metrics package.

from sklearn.metrics import confusion_matrix, accuracy_score
print(accuracy_score(y_test, y_pred) * 100)
print(confusion_matrix(y_test, y_pred))

 

Output:
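On imbalanced data, precision and recall for the fraud class are usually more informative than overall accuracy. The sketch below uses made-up labels (not model output from this walkthrough) to show how the four cells of scikit-learn's confusion matrix map to those two metrics.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels for illustration: 1 = fraud, 0 = genuine.
y_true_demo = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred_demo = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# For binary labels, rows are true classes and columns are predicted
# classes, so ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)  # of the predicted frauds, how many were real
recall = tp / (tp + fn)     # of the real frauds, how many were caught

print("tn, fp, fn, tp:", tn, fp, fn, tp)
print("precision:", precision, "recall:", recall)

# The same values via the library helpers.
print(precision_score(y_true_demo, y_pred_demo))
print(recall_score(y_true_demo, y_pred_demo))
```

With the real data, the same calls would take y_test and y_pred in place of the made-up lists.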

Plot the Confusion Matrix: 

For better visualization of the confusion matrix, use the seaborn plotting package.

import seaborn as sns

cf_matrix = confusion_matrix(y_test, y_pred)

# Show each cell as a percentage of all test transactions
sns.heatmap(cf_matrix / np.sum(cf_matrix), annot=True, fmt='.2%', cmap='gist_rainbow')
plt.show()

 

 

Output: 
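Dividing by np.sum(cf_matrix) expresses every cell as a share of all transactions, so on imbalanced data the fraud cells end up as very small percentages. An alternative, sketched here with a made-up matrix, is to normalize each row so that the cells show rates within each true class. The result can be passed to sns.heatmap in the same way.

```python
import numpy as np

# Made-up confusion matrix for illustration: rows = true class,
# columns = predicted class, in the order [genuine, fraud].
cf_demo = np.array([[990, 10],
                    [5, 15]])

# Divide each row by its own total so every row sums to 1.
row_normalized = cf_demo / cf_demo.sum(axis=1, keepdims=True)

print(row_normalized)
```

In the row-normalized form, the bottom-right cell is the share of actual fraud cases that were predicted as fraud.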

Using the Random Forest Classifier, 94 transactions are correctly identified as fraud cases. Now let us calculate how many real fraud cases there are in the y_test data. To do that, convert y_test (an array) to a DataFrame as follows:

l_y_test = pd.DataFrame(y_test, columns=['Class'])
len(l_y_test[l_y_test["Class"] == 1])

 

Output: 
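The same count can also be obtained without building a DataFrame. A minimal sketch, assuming y_test is a NumPy array of 0/1 labels as produced by the split earlier; the array here is a small made-up example.

```python
import numpy as np

# Made-up label array standing in for y_test (1 = fraud, 0 = genuine).
y_test_demo = np.array([0, 1, 0, 0, 1, 0, 1, 0])

# Comparing to 1 gives a boolean mask; summing it counts the True values.
actual_fraud_cases = int((y_test_demo == 1).sum())
print(actual_fraud_cases)
```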

Therefore, the Random Forest Classifier detects 94 of the 120 actual fraud cases: 94/120*100 = 78.33%. Strictly speaking, this figure is the recall for the fraud class rather than the overall accuracy, and it is quite good.
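The ratio of correctly detected fraud cases to all actual fraud cases is the recall of the fraud class. As a quick check of the arithmetic, using the 94 and 120 reported in this walkthrough:

```python
# Counts reported above for the test set.
detected_fraud = 94   # fraud cases the model flagged correctly
actual_fraud = 120    # fraud cases present in y_test

missed_fraud = actual_fraud - detected_fraud
recall_percent = detected_fraud / actual_fraud * 100

print(f"Missed fraud cases: {missed_fraud}")
print(f"Recall: {recall_percent:.2f}%")
```

The missed cases are the false negatives, which in fraud detection are typically the costliest kind of error.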

 

Author:
Kamble, Amol

© Copyright 2021 | Sevenmentor Pvt Ltd.
