Clustering uncertain overlapping symptoms of multiple diseases in clinical diagnosis

PeerJ Computer Science

Main article text

 

Introduction

Preliminaries

Possible world model

A MayBMS repair key

Literature review

Materials and Methods

Dataset

Calculating probability values

Conditional probability matrix

Bayesian network

Probabilistic inferences

Exact inference in chains in two node network

Proposed technique based on Bayesian network and naive Bayes algorithm

Normalization

Output of the proposed technique

Inference techniques for Bayesian networks

Rejection sampling

Likelihood weighting

Gibbs sampling

Experimental setup

Results

Evaluation metrics

Absolute error

Relative error

Results of evaluation metrics

Assessment of clustering quality

Completeness

Homogeneity

V-measure

Purity

Comparative analysis

Discussion

Conclusion

Supplemental Information

Symptoms occurring in Hepatitis B and C.

This file lists symptoms for Hepatitis B and Hepatitis C. A value of 1 means the symptom occurs in that disease, and a value of 0 means it does not.

DOI: 10.7717/peerj-cs.2315/supp-1
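
The 0/1 coding described for the Hepatitis B and C file can be illustrated with a minimal sketch. The symptom names and values below are hypothetical placeholders, not entries from the supplemental file, and the helper `overlapping_symptoms` is introduced here only for illustration. It shows one way to read off the symptoms shared by both diseases, which is the kind of overlap the article's title refers to.

```python
# Hypothetical 0/1 symptom table. Names and values are placeholders,
# NOT data taken from the supplemental file.
occurrence = {
    "Hepatitis B": {"symptom_a": 1, "symptom_b": 1, "symptom_c": 0},
    "Hepatitis C": {"symptom_a": 1, "symptom_b": 0, "symptom_c": 1},
}


def overlapping_symptoms(table):
    """Return the set of symptoms coded 1 for every disease in the table."""
    symptom_sets = [
        {name for name, flag in row.items() if flag == 1}
        for row in table.values()
    ]
    # An empty table has no diseases, hence no shared symptoms.
    return set.intersection(*symptom_sets) if symptom_sets else set()


shared = overlapping_symptoms(occurrence)
print(sorted(shared))
```

With the placeholder table, only `symptom_a` is coded 1 for both diseases, so it is the single overlapping symptom.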

Training data: information on 42 diseases and 132 symptoms.

DOI: 10.7717/peerj-cs.2315/supp-2

Code to compare the Quality of Clusters.

This code compares the quality of clusters generated by our algorithm with those generated by Fuzzy C-Means.

DOI: 10.7717/peerj-cs.2315/supp-3
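
As a rough illustration of what such a comparison involves, the sketch below computes purity, one of the clustering-quality measures named in the article outline, for two cluster assignments. It is not the authors' supplemental code: the function name `purity`, the label lists, and the "baseline" assignment are assumptions made for the example, and it uses the standard definition (the sum of each cluster's majority-class count, divided by the number of items).

```python
from collections import Counter


def purity(cluster_labels, class_labels):
    """Purity = (sum over clusters of the majority-class count) / N."""
    if len(cluster_labels) != len(class_labels):
        raise ValueError("label lists must have the same length")
    if not cluster_labels:
        raise ValueError("label lists must not be empty")

    # Group the reference class labels by assigned cluster.
    members = {}
    for cluster, klass in zip(cluster_labels, class_labels):
        members.setdefault(cluster, []).append(klass)

    # Count how many items in each cluster belong to its most common class.
    majority_total = sum(
        Counter(classes).most_common(1)[0][1] for classes in members.values()
    )
    return majority_total / len(cluster_labels)


# Hypothetical reference classes and two hypothetical cluster assignments.
reference = ["x", "x", "y", "y", "y", "y"]
assignment_one = [0, 0, 0, 1, 1, 1]
assignment_two = [0, 1, 0, 1, 0, 1]

print(purity(assignment_one, reference))
print(purity(assignment_two, reference))
```

A higher purity means each cluster is dominated more strongly by a single reference class. Purity alone rewards many small clusters, which is one reason measures such as homogeneity, completeness, and V-measure are usually reported alongside it.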

Comparing Algorithm and Sampling Probability Values.

Comparison of the probability values calculated by our algorithm with those calculated by the sampling algorithm.

DOI: 10.7717/peerj-cs.2315/supp-4

Clustering of the whole dataset of symptoms.

DOI: 10.7717/peerj-cs.2315/supp-5

Clustering of symptoms.

DOI: 10.7717/peerj-cs.2315/supp-6

Additional Information and Declarations

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Asif Ali Wagan conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, designed the database to store the symptom information, and approved the final draft.

Shahnawaz Talpur analyzed the data, authored or reviewed drafts of the article, and approved the final draft.

Sanam Narejo performed the experiments, authored or reviewed drafts of the article, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The dataset is available at Kaggle: https://www.kaggle.com/datasets/kaushil268/disease-prediction-using-machine-learning. The raw data contains comprehensive details for all instances used in our analysis, including the symptoms and prognoses. This dataset, created by Kaushil Mangaroliya, comprises 132 symptoms and can be used to predict 42 different diseases. The dataset is split into two files: one for training and one for testing machine learning models.

The code is available in the Supplemental Files.

Funding

The authors received no funding for this work.
