IJFANS International Journal of Food and Nutritional Sciences

ISSN Print 2319-1775, Online 2320-7876

Finding Credit Card Fraud Using Supervised Machine Learning Algorithms: A Survey


Ms. Neha Patidar

Abstract

The rapid growth of the e-commerce industry has led to a sharp increase in the use of credit cards for online purchases and other types of transactions, and with it more opportunities for fraud. Banks maintain many very large databases, and important business information can be extracted from these data stores. Fraud is an issue with far-reaching consequences for the banking industry, government, the corporate sector, and ordinary consumers. Increasing dependence on new technologies such as cloud and mobile computing in recent years has compounded the problem. Manual detection is not only time-consuming but also costly, and it does not give accurate results. Fraud is any malicious activity that aims to cause financial loss to another party. As the use of digital or plastic money rises, even in developing countries, so does the fraud associated with it. Credit card fraud has cost consumers and banks billions of dollars globally. Despite numerous mechanisms to stop fraud, fraudsters continually find new ways and tricks to commit it, and detecting fraud in credit card systems has become very difficult. Machine learning plays a vital role in detecting credit card fraud in transactions. To classify these transactions, banks use various machine learning methodologies: past data is collected, and new features are used to enhance predictive power. The performance of fraud detection in credit card transactions is greatly affected by the sampling approach applied to the data set, the selection of variables, and the detection techniques used. We explain various techniques available for a fraud detection system: Random Forest Classifier, K-Nearest Neighbors Classifier, Decision Tree Classifier, Gaussian Naive Bayes, and Logistic Regression.
These techniques are applied to both unbalanced and balanced data, and we provide a survey and comparative analysis of the techniques on both, together with evaluation metrics. The data set of credit card transactions was obtained from Kaggle and contains a total of 284,808 credit card transactions from a European bank. Fraudulent transactions are labeled “class 1” and genuine ones “class 0”. The data set is highly imbalanced: about 0.172% of the transactions are fraudulent and the rest are genuine. To balance the data set, the SMOTE oversampling technique was applied, which resulted in 50% fraudulent and 50% genuine transactions. We trained the five techniques and evaluated each one on four criteria: sensitivity, precision, accuracy, and ROC AUC. The best technique for detecting credit card fraud is chosen based on these criteria. All five techniques are applied to the data set, and the work is implemented in Python.
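
The two ideas the abstract relies on most, SMOTE-style oversampling and label-based evaluation metrics, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `smote_oversample` and `evaluate`, the toy feature vectors, and the 1,000-transaction toy labels are all hypothetical, and the oversampler is a simplified stand-in for a full SMOTE library. ROC AUC is left out because it needs predicted scores rather than hard class labels.

```python
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j]))
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def evaluate(y_true, y_pred):
    """Return accuracy, precision and sensitivity (recall) for class 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical toy labels: 1,000 transactions, 2 of them fraud (class 1).
y_true = [1, 1] + [0] * 998
# A useless model that labels every transaction as genuine (class 0).
baseline = evaluate(y_true, [0] * 1000)

# Hypothetical fraud feature vectors, oversampled with 5 synthetic points.
fraud = [[1.0, 2.0], [1.5, 1.8], [0.9, 2.4]]
synthetic = smote_oversample(fraud, n_new=5)
```

On the toy labels, the model that never flags fraud still reaches an accuracy of 0.998 while its sensitivity is 0, which is why accuracy alone says little on highly imbalanced data and why sensitivity and precision are reported alongside it. Because each synthetic sample is an interpolation between two real minority samples, its features stay within the range spanned by the original fraud samples.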
