IJFANS International Journal of Food and Nutritional Sciences

A Machine Learning Approach for Diabetes Prediction in Women

Keywords:

Support vector machine, diabetes prediction, XGBoost, logistic regression, random forest, machine learning, accuracy, recall, precision, F1 score, linear discriminant analysis, ADASYN.

Afshan Hashmi, Md Tabrez Nafis*, Sameena Naaz, Imran Hussain

Abstract

Diabetes is a chronic disease whose prevalence has grown rapidly in recent years. Trends suggest that the number of patients will double in the near future, which is a serious concern that needs to be addressed as early as possible. Diabetes is especially dangerous because it can lead to several other serious conditions, such as hypertension, kidney failure, blindness, and limb amputation. Predicting diabetes as early as possible is therefore essential to protect patients from further damage, and machine learning can be a useful tool for this task. In this study, we took the Pima Indians Diabetes dataset, dropped the highly correlated feature, and filled in missing values using KNN imputation. The interquartile range (IQR) was used to remove outliers, adaptive synthetic sampling (ADASYN) was used for class balancing, and a min-max scaler was used to normalize the dataset. Eight machine learning algorithms were used: support vector classifier, logistic regression, naïve Bayes, decision tree, extreme gradient boosting (XGBoost), K-nearest neighbor, linear discriminant analysis, and random forest. These algorithms were compared on several performance metrics: accuracy, precision, recall, F1 score, and the AUC-ROC curve. Linear discriminant analysis and extreme gradient boosting were the best performers in terms of accuracy, followed by random forest, logistic regression, K-nearest neighbor, support vector classifier, and naïve Bayes. The decision tree, however, performed poorly. The effect of oversampling on the results was also analyzed, and oversampling was found to improve the precision and F1 score of every algorithm except the decision tree. Performance could be further improved by using a larger dataset with few or no missing values, or a dataset with additional features such as lifestyle and calorie intake.
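To make three of the steps named in the abstract concrete, the following is a minimal Python sketch of IQR-based outlier removal, min-max scaling, and the accuracy, precision, recall, and F1 metrics, using only the standard library. It is not the authors' code. The function names, the toy rows and labels, and the 1.5 × IQR fence multiplier are all assumptions made for illustration; the abstract does not state which multiplier or quartile convention was used. KNN imputation, ADASYN oversampling, and the eight classifiers are deliberately left out, since in practice they would typically come from libraries such as scikit-learn and imbalanced-learn.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR for one feature."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outlier_rows(rows, k=1.5):
    """Keep only rows in which every feature lies inside that feature's fences."""
    fences = [iqr_fences(column, k) for column in zip(*rows)]
    return [
        row for row in rows
        if all(low <= x <= high for x, (low, high) in zip(row, fences))
    ]


def min_max_scale(rows):
    """Rescale each feature to [0, 1]; a constant feature is mapped to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [
        [(x - low) / span if span else 0.0 for x, low, span in zip(row, lows, spans)]
        for row in rows
    ]


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up two-feature rows (not taken from the dataset); the last row is an
# obvious outlier in the first feature.
rows = [[85, 26], [89, 28], [116, 25], [78, 31],
        [115, 35], [110, 37], [103, 43], [500, 30]]
clean = drop_outlier_rows(rows)
scaled = min_max_scale(clean)

# Made-up true labels and predictions, only to exercise the metric function.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)

print(len(rows), "rows before filtering,", len(clean), "after")
print(metrics)
```

The sketch follows the order given in the abstract, filtering outliers before scaling, so that an extreme value does not compress the remaining values into a narrow part of the [0, 1] range.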

Issue

Volume 11, Issue 12 (2022)
