Machine learning is a method of training computers to make predictions from experience. An algorithm learns iteratively from data and generally improves as more data become available. The resulting models can be used to estimate the effect of different variables and the relationships between them, and machine learning is increasingly used in scientific research.
In this article, I demonstrate how machine learning can be used to predict patients' 1-year outcome from their echocardiogram parameters. The dataset comes from the UCI Machine Learning Repository and contains echocardiogram data for 132 patients. Because of missing data, 60 patient records were used.
I fit several alternative machine learning models and present the confusion matrix for each. A confusion matrix cross-tabulates predicted against actual values; with two outcome classes it is a 2×2 table.
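As a small illustration, base R's table() builds a confusion matrix from two factors. The eight labels below are made up for the example and are not from the echocardiogram data; fixing the factor levels to c(0, 1) keeps the table 2×2 even if one class is never predicted.

```r
# Made-up outcomes for eight patients (1 = alive at 1 year, 0 = dead)
actual    <- factor(c(1, 0, 1, 1, 0, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c(1, 0, 0, 1, 0, 1, 1, 0), levels = c(0, 1))

# Rows are predictions, columns are the true classes (the layout caret prints)
cm <- table(Prediction = predicted, Reference = actual)
cm

# Accuracy: correctly classified cases (the diagonal) over all cases
sum(diag(cm)) / sum(cm)   # 0.75
```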
Importing the dataset
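# '?' marks missing values in this dataset, so read it as NA
# (add header = FALSE if your copy of the file has no header row)
data <- read.csv("echocardiogram.data.csv", sep = ',', na.strings = c('NA', '?'))
names(data) <- c('survival', 'alive', 'age', 'pericardialeffusion', 'fractionalshortening',
                 'epss', 'lvdd', 'wallmotionscore', 'wallmotionindex', 'mult',
                 'name', 'group', 'aliveat1')
# Inspect the column types after renaming
str(data)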
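The na.strings argument is what turns the dataset's '?' placeholders into NA. The sketch below uses a few made-up records passed through read.csv()'s text argument and assumes data without a header row, in which case header = FALSE stops the first record from being consumed as column names.

```r
# Made-up records with no header row; '?' marks a missing value
raw <- "71,0.260,0
72,?,?
55,0.260,1"

# header = FALSE keeps the first record as data instead of column names;
# na.strings converts every '?' to NA so the columns stay numeric
df <- read.csv(text = raw, header = FALSE, na.strings = c("NA", "?"))
names(df) <- c("age", "fractionalshortening", "aliveat1")
str(df)
```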
Omitting rows with missing data
In this step, we remove the rows that contain missing data. We do not impute the missing values, since imputation could bias the model. We also drop the variables that are not useful for this study. The column “aliveat1”, which records whether the patient was alive at 1 year, is converted into a factor. This is the dependent variable that the model will try to predict from the echocardiogram parameters.
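# Drop every row that has a missing value in any column
data1 <- na.omit(data)
# survival and alive are follow-up outcomes (they would leak the answer);
# name, group and mult are not useful predictors
data1 <- subset(data1, select = -c(name, survival, group, mult, alive))
# Encode the outcome and the binary predictor as factors
data1$aliveat1 <- factor(data1$aliveat1, levels = c(0, 1))
data1$pericardialeffusion <- factor(data1$pericardialeffusion, levels = c(0, 1))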
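Because na.omit() drops a row if any of its columns is missing, the order of na.omit() and subset() matters: removing unused columns first can keep more rows. The toy data frame below is hypothetical and only illustrates that behaviour.

```r
# Hypothetical data frame: a few values are missing
toy <- data.frame(
  age      = c(71, 72, NA, 60, 57),
  epss     = c(9, NA, 12, 7, 16),
  name     = c("a", "b", "c", NA, "e"),
  aliveat1 = c(0, 1, 0, 1, NA)
)

colSums(is.na(toy))        # missing values per column

# na.omit() first: a missing 'name' also costs a row
nrow(na.omit(toy))                                   # 1

# Drop the unused column first, then na.omit()
kept <- na.omit(subset(toy, select = -name))
nrow(kept)                                           # 2

# Outcome as a factor with explicit levels, as in the code above
kept$aliveat1 <- factor(kept$aliveat1, levels = c(0, 1))
```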
Splitting the dataset into training and test sets
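The model learns from the training set, and its accuracy is then measured by predicting the dependent variable for the held-out test set. Here sample.split() from the caTools package assigns 80% of the patients to the training set while keeping the proportion of each outcome roughly the same in both sets.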
library(caTools)
set.seed(1234)
split <- sample.split(data1$aliveat1, SplitRatio = 0.8)
train <- subset(data1, split == TRUE)
test <- subset(data1, split==FALSE)
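The same idea, a stratified split, can be sketched in base R. stratified_split() is a hypothetical helper written for illustration, not a caTools function, and the 45/15 class counts are made up.

```r
# Hypothetical helper (not from caTools): for each class, send `ratio` of
# that class's observations to the training set, chosen at random
stratified_split <- function(y, ratio = 0.8) {
  in_train <- logical(length(y))
  for (idx in split(seq_along(y), y)) {      # row indices, one group per class
    if (length(idx) == 0) next
    n_train <- round(ratio * length(idx))
    in_train[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  in_train
}

set.seed(1234)
y <- factor(c(rep(0, 45), rep(1, 15)), levels = c(0, 1))  # made-up 3:1 class balance
in_train <- stratified_split(y, ratio = 0.8)
table(y[in_train])    # 36 zeros and 12 ones
table(y[!in_train])   # 9 zeros and 3 ones
```

Sampling within each class guarantees that a rare outcome is not accidentally missing from the small test set.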
Fitting the machine learning models
In this example, we try to predict whether a patient will be alive at 1 year from a set of echocardiogram parameters. In other words, we classify each patient as either alive (1) or dead (0). This is a classic binary classification problem, and several algorithms can be applied to it. Here we use a decision tree, a Support Vector Machine (SVM), Naive Bayes and a random forest. With a test set this small, the confusion-matrix statistics have wide confidence intervals, so a larger dataset would be needed to compare the models reliably.
Decision tree classification
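For classification, rpart chooses splits using the Gini index by default. The simplified sketch below scores every candidate threshold on one numeric predictor and keeps the one with the lowest size-weighted impurity; the values are hypothetical, and the real rpart additionally handles many predictors, surrogate splits and pruning.

```r
# Gini impurity of a vector of class labels: 1 - sum over classes of p_k^2
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Size-weighted impurity of the two child nodes for the rule x < threshold
split_impurity <- function(x, labels, threshold) {
  left  <- labels[x <  threshold]
  right <- labels[x >= threshold]
  (length(left) * gini(left) + length(right) * gini(right)) / length(labels)
}

# Hypothetical predictor values and outcomes for eight patients
x      <- c(5, 6, 7, 8, 12, 14, 16, 20)
labels <- c(1, 1, 1, 1, 0, 0, 0, 0)

# Candidate thresholds: midpoints between consecutive sorted values
xs <- sort(unique(x))
thresholds <- (head(xs, -1) + tail(xs, -1)) / 2
impurities <- sapply(thresholds, function(t) split_impurity(x, labels, t))

gini(labels)                            # impurity before splitting: 0.5
thresholds[which.min(impurities)]       # best threshold: 10
```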
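# Model fitting
library(rpart)
# aliveat1 is a factor, so rpart grows a classification tree
classifier <- rpart(aliveat1 ~ ., data = train, method = 'class')
# Predict classes for the test set (columns 1-7 are the predictors);
# tuning options such as rpart.control(minsplit = ...) belong in rpart(), not predict()
y_pred <- predict(classifier, newdata = test[ , 1:7], type = 'class')
# Confusion matrix: predictions first, true labels second
library(caret)
confusionMatrix(data = y_pred, reference = test$aliveat1)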
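Output of the confusion matrix. caret's confusionMatrix() expects the predicted classes as its first argument (data) and the true classes as its second (reference). The outputs shown in this article were generated with the arguments the other way round, confusionMatrix(test$aliveat1, y_pred), so the rows labelled "Prediction" hold the actual outcomes and the columns labelled "Reference" hold the predictions. Accuracy, Kappa and McNemar's test are unaffected, but Sensitivity swaps with Pos Pred Value, Specificity swaps with Neg Pred Value, and Prevalence and the No Information Rate are taken from the predicted rather than the actual class counts.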
Confusion Matrix and Statistics
          Reference
Prediction 0 1
         0 8 1
         1 1 2
Accuracy : 0.8333
95% CI : (0.5159, 0.9791)
No Information Rate : 0.75
P-Value [Acc > NIR] : 0.3907
Kappa : 0.5556
Mcnemar's Test P-Value : 1.0000
Sensitivity : 0.8889
Specificity : 0.6667
Pos Pred Value : 0.8889
Neg Pred Value : 0.6667
Prevalence : 0.7500
Detection Rate : 0.6667
Detection Prevalence : 0.7500
Balanced Accuracy : 0.7778
'Positive' Class : 0
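The headline figures in the decision-tree output above can be recomputed from its four counts with base R. This sketch assumes that caret obtains the 95% CI and the P-Value [Acc > NIR] from an exact binomial test; binom.test() reproduces the printed values. Accuracy and Kappa do not depend on which margin is labelled "Prediction".

```r
# Counts from the decision-tree table above, laid out exactly as printed
cm <- matrix(c(8, 1, 1, 2), nrow = 2, byrow = TRUE,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

n <- sum(cm)
correct <- sum(diag(cm))
accuracy <- correct / n

# Exact (Clopper-Pearson) 95% interval for the accuracy
ci <- binom.test(correct, n)$conf.int

# No Information Rate: share of the most common class in the Reference margin
nir <- max(colSums(cm)) / n

# One-sided test: is accuracy better than always guessing the majority class?
p_value <- binom.test(correct, n, p = nir, alternative = "greater")$p.value

# Cohen's kappa: agreement corrected for the agreement expected by chance
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa <- (accuracy - expected) / (1 - expected)

# Matches the printed Accuracy, 95% CI, NIR, P-Value and Kappa
round(c(accuracy = accuracy, lower = ci[1], upper = ci[2],
        nir = nir, p_value = p_value, kappa = kappa), 4)
```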
Support Vector Classification
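# SVM model fitting
library(e1071)
classifier <- svm(formula = aliveat1 ~ ., data = train, type = 'C-classification', kernel = 'linear')
# predict() on an svm model returns class labels directly, so no 'type' argument is needed
y_pred <- predict(classifier, newdata = test[ , 1:7])
# Confusion matrix: predictions first, true labels second
library(caret)
confusionMatrix(data = y_pred, reference = test$aliveat1)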
Output of the confusion matrix:
Confusion Matrix and Statistics
          Reference
Prediction  0  1
         0 10  3
         1  2  3
Accuracy : 0.7222
95% CI : (0.4652, 0.9031)
No Information Rate : 0.6667
P-Value [Acc > NIR] : 0.4122
Kappa : 0.3478
Mcnemar's Test P-Value : 1.0000
Sensitivity : 0.8333
Specificity : 0.5000
Pos Pred Value : 0.7692
Neg Pred Value : 0.6000
Prevalence : 0.6667
Detection Rate : 0.5556
Detection Prevalence : 0.7222
Balanced Accuracy : 0.6667
'Positive' Class : 0
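With a linear kernel, a fitted SVM classifies a new patient by the sign of a decision value w·z + b, where z is the patient's standardized predictor vector (e1071's svm() standardizes predictors by default through its scale argument). The weights, intercept and training values below are hypothetical stand-ins for a fitted model, chosen only to show the mechanics.

```r
# Hypothetical training matrix with two predictors on different scales
train_x <- cbind(age = c(60, 65, 70, 75), epss = c(5, 8, 12, 20))

# Standardize new data with the TRAINING means and standard deviations
centers <- colMeans(train_x)
scales  <- apply(train_x, 2, sd)

# Hypothetical weights and intercept standing in for a fitted linear SVM
w <- c(age = -0.4, epss = -1.2)
b <- 0.3

classify <- function(new_x) {
  z <- scale(new_x, center = centers, scale = scales)
  f <- drop(z %*% w) + b        # decision value w . z + b
  ifelse(f >= 0, 1, 0)          # the sign of the decision value gives the class
}

classify(rbind(c(62, 6), c(74, 18)))   # 1 0
```

Standardizing first keeps a predictor measured in large units, such as age in years, from dominating the decision value purely because of its scale.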
Naive Bayes Classification
# Naive Bayes Model Fitting
library(e1071)
classifier <- naiveBayes(x = train[-8], y = train$aliveat1)
# Predict Test set results
y_pred <- predict(classifier, newdata = test[-8])
# Confusion matrix: predictions first, true labels second
library(caret)
confusionMatrix(data = y_pred, reference = test$aliveat1)
Output of the confusion matrix:
Confusion Matrix and Statistics
          Reference
Prediction  0  1
         0 10  3
         1  2  3
Accuracy : 0.7222
95% CI : (0.4652, 0.9031)
No Information Rate : 0.6667
P-Value [Acc > NIR] : 0.4122
Kappa : 0.3478
Mcnemar's Test P-Value : 1.0000
Sensitivity : 0.8333
Specificity : 0.5000
Pos Pred Value : 0.7692
Neg Pred Value : 0.6000
Prevalence : 0.6667
Detection Rate : 0.5556
Detection Prevalence : 0.7222
Balanced Accuracy : 0.6667
'Positive' Class : 0
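e1071's naiveBayes() models each numeric predictor with a Gaussian distribution within each class and treats predictors as independent given the class, so a class posterior is the prior times a product of normal densities, normalised over classes. The data below are hypothetical, and this is a sketch of the idea rather than the package's implementation.

```r
# Hypothetical training data: two numeric predictors and a 0/1 outcome
train_nb <- data.frame(
  epss     = c(4, 5, 6, 5, 12, 14, 13, 15),
  fs       = c(0.30, 0.35, 0.32, 0.33, 0.20, 0.22, 0.18, 0.21),
  aliveat1 = factor(c(1, 1, 1, 1, 0, 0, 0, 0), levels = c(0, 1))
)
predictors <- c("epss", "fs")

# Posterior over classes for one new observation (a named list of predictors)
posterior <- function(new) {
  scores <- sapply(levels(train_nb$aliveat1), function(k) {
    rows  <- train_nb[train_nb$aliveat1 == k, predictors]
    prior <- mean(train_nb$aliveat1 == k)
    # "Naive" assumption: multiply one Gaussian density per predictor
    lik <- prod(mapply(function(v, col) dnorm(v, mean(col), sd(col)),
                       new[predictors], rows))
    prior * lik
  })
  scores / sum(scores)   # normalise so the posteriors sum to 1
}

post <- posterior(list(epss = 6, fs = 0.31))
round(post, 4)
names(which.max(post))   # predicted class
```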
Random Forest Classification
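# Random forest model fitting
library(randomForest)
set.seed(1234)  # random forests are stochastic; fixing the seed makes the fit reproducible
classifier <- randomForest(x = train[-8], y = train$aliveat1, ntree = 400)
# Predict test set results
y_pred <- predict(classifier, newdata = test[-8])
# Confusion matrix: predictions first, true labels second
library(caret)
confusionMatrix(data = y_pred, reference = test$aliveat1)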
Confusion Matrix Output
Confusion Matrix and Statistics
          Reference
Prediction  0  1
         0 12  1
         1  3  2
Accuracy : 0.7778
95% CI : (0.5236, 0.9359)
No Information Rate : 0.8333
P-Value [Acc > NIR] : 0.8318
Kappa : 0.3684
Mcnemar's Test P-Value : 0.6171
Sensitivity : 0.8000
Specificity : 0.6667
Pos Pred Value : 0.9231
Neg Pred Value : 0.4000
Prevalence : 0.8333
Detection Rate : 0.6667
Detection Prevalence : 0.7222
Balanced Accuracy : 0.7333
'Positive' Class : 0
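randomForest() grows ntree trees, each on a bootstrap sample of the training rows and with a random subset of predictors tried at each split, and then classifies by majority vote. The sketch below shows a single bootstrap draw, the out-of-bag rows it leaves out (about 1/e, roughly 37%, of the rows on average), and a majority vote over three hypothetical trees' predictions.

```r
set.seed(42)
n <- 60   # number of training rows (illustrative)

# One tree's bootstrap sample: n row indices drawn with replacement
boot <- sample.int(n, n, replace = TRUE)

# Rows that were never drawn are "out-of-bag" for this tree
oob <- setdiff(seq_len(n), boot)
length(oob) / n   # typically near 1/e, about 0.37

# Majority vote: each column holds one hypothetical tree's class predictions
votes <- cbind(tree1 = c(0, 1, 0), tree2 = c(0, 1, 1), tree3 = c(1, 1, 0))
majority <- apply(votes, 1, function(v) as.numeric(names(which.max(table(v)))))
majority   # 0 1 0
```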