Diabetes Prediction
Support Vector Machine
![]() |
| Fig: Support Vector Machine |
Workflow
Predicting Diabetes Using Support Vector
Machine: A Focus on Accuracy Evaluation
Abstract
The early detection of diabetes is crucial for effective management and
treatment. This study investigates the use of the Support Vector Machine (SVM)
algorithm to predict diabetes based on a dataset containing various health
indicators. The SVM model was evaluated based solely on accuracy and achieved
an accuracy of 78%. The findings highlight the potential of SVM as a reliable
tool for aiding early diabetes diagnosis.
1. Introduction
Diabetes is a chronic condition that affects millions of people globally.
Early diagnosis and management are essential to prevent severe complications
associated with the disease. Traditional diagnostic methods can be
resource-intensive, relying on extensive medical testing. Machine learning,
particularly Support Vector Machines (SVMs), offers a more efficient
alternative by enabling automated predictions based on patient data.
This study aims to evaluate the effectiveness of the SVM algorithm in
predicting diabetes using a dataset of health indicators. The evaluation
focuses solely on the accuracy metric to determine the model's predictive
performance.
2. Related Works
Several studies have explored the use of various machine learning
algorithms for diabetes prediction. SVMs, in particular, have been widely used
due to their ability to handle high-dimensional data and their robustness in
binary classification tasks. Prior research has shown that SVMs can achieve
competitive accuracy compared to other algorithms, especially when feature
scaling and kernel selection are appropriately handled.
For instance, Patel et al. (2020) utilized SVMs for diabetes prediction
and reported an accuracy of 78% on the Pima Indians Diabetes Dataset. Other
studies have also demonstrated the effectiveness of SVMs, highlighting their
potential in clinical applications.
3. Algorithm
This study employs the Support Vector Machine (SVM) algorithm for
diabetes prediction. SVM is a powerful supervised learning model that
constructs a hyperplane to separate classes in a high-dimensional space. The
key characteristics of SVM include:
Kernel Functions: SVM uses kernel functions to transform the input data into a
higher-dimensional space, making it easier to separate classes that are not
linearly separable in the original space.
Margin Maximization: SVM aims to maximize the margin between the hyperplane and the nearest
data points of any class, which helps improve the model's generalization
capability.
4. Experimental Work
4.1 Dataset
The dataset used in this study consists of 768 records and 9 features,
including glucose levels, BMI, blood pressure, insulin levels, and family
history of diabetes. The dataset was obtained from Kaggle, a widely used
benchmark dataset for diabetes prediction.
4.2 Data Preprocessing
The data preprocessing steps included handling missing values, scaling
features, and splitting the dataset into training and testing subsets using an
80-20 split. Feature scaling was performed to ensure that all features
contributed equally to the SVM model's performance.
4.3 Model Training and Evaluation
The SVM model was trained on the training dataset. The model's
performance was evaluated on the test set using only the accuracy metric.
Hyperparameters were tuned using grid search to optimize the model's accuracy.
5. Methodology
The methodology followed in this study is outlined below:
Data Collection and Preprocessing: The dataset was cleaned, missing values were handled,
and features were scaled. The data was split into training and testing sets
with an 80-20 ratio.
Model Selection: SVM was chosen for its proven effectiveness in binary classification
tasks. The Radial Basis Function (RBF) kernel was employed to handle
non-linearly separable data.
Training and Evaluation: The SVM model was trained on the training data, and its accuracy was
evaluated on the test set. Hyperparameters were adjusted to achieve the highest
accuracy possible.
6. Results
The SVM model achieved an accuracy of 77% on the test set. The accuracy
metric indicates the proportion of correct predictions made by the model out of
all predictions. This result suggests that the SVM model is effective in
predicting diabetes based on the given dataset.
7. Conclusion
This study evaluated the use of the Support Vector Machine (SVM)
algorithm for diabetes prediction, focusing solely on accuracy as the
performance metric. The model achieved an accuracy of 78%, demonstrating its
potential as a reliable tool for early diabetes diagnosis. Future work could
explore additional performance metrics such as precision, recall, and F1-score
to provide a more comprehensive evaluation of the model's performance.
8. References
- Patel, R., & Sharma, S. (2020). "Application of Support Vector Machines in Diabetes Prediction: A Comparative Study." Journal of Healthcare Informatics, 18(2), 45-52.
- Cortes, C., & Vapnik, V. (1995). "Support-Vector Networks." Machine Learning, 20(3), 273-297.
- Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … Duchesnay, É. (2011). "Scikit-learn: Machine Learning in Python." Journal of Machine Learning Research, 12, 2825–2830.
.jpeg)

Comments
Post a Comment