Machine Learning Echocardiogram

Machine learning is a method of training computers to make predictions from experience. An algorithm learns iteratively from data and typically improves as more data becomes available. Such models can be used to estimate the effect of different variables and the relationships between them. Machine learning is increasingly used across the sciences.

In this article, I demonstrate the use of machine learning to predict the 1-year outcome of patients from their echocardiogram parameters. The dataset was taken from the UCI database. It contains echocardiogram data for 132 patients. Because of missing data, only 60 patient observations were used.

I will fit several alternative machine learning models and present the confusion matrix for each. For a two-class problem, a confusion matrix is a 2x2 table that cross-tabulates the predicted values against the actual values.
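As a quick illustration of what such a table contains, here is a minimal sketch in base R. The two label vectors are made up for the example and have nothing to do with the echocardiogram data.

```r
# Hypothetical labels for ten patients (1 = alive at 1 year, 0 = not); illustration only
actual    <- factor(c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1), levels = c(0, 1))
predicted <- factor(c(0, 0, 0, 0, 0, 1, 0, 1, 1, 1), levels = c(0, 1))

# Rows are the predicted class, columns are the actual class
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Accuracy is the share of patients that fall on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

The diagonal cells count the correct predictions and the off-diagonal cells count the two kinds of error.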

Importing dataset

# Read the data; '?' and 'NA' are both treated as missing values
data <- read.csv("echocardiogram.data.csv", sep = ',', na.strings = c('NA', '?'))
str(data)
# Assign descriptive column names
names(data) <- c('survival', 'alive', 'age', 'pericardialeffusion', 'fractionalshortening',
                 'epss', 'lvdd', 'wallmotionscore', 'wallmotionindex', 'mult', 'name', 'group', 'aliveat1')

Omit rows with missing data

In this part, we remove the rows that contain missing data. We do not impute the missing values, since imputation may affect our model. We also remove the variables that are not useful for our study. The column “aliveat1”, which denotes whether the patient is alive at 1 year, is converted to a factor. This is the dependent variable, which we will try to predict with our model on the basis of the echocardiogram parameters.

data1 <- na.omit(data)
data1 <- subset(data1, select=-c(name, survival, group, mult, alive))
data1$aliveat1 <- factor(data1$aliveat1, levels=c(0,1))
data1$pericardialeffusion <- factor(data1$pericardialeffusion, levels = c(0, 1))
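Before dropping rows, it can help to see how much data is missing and where. The sketch below uses three made-up rows written in the same "?"-for-missing style; the values and the reduced set of columns are assumptions for illustration, not rows from the real file.

```r
# Three made-up rows; '?' marks a missing value (illustrative values only)
raw <- "age,epss,lvdd,aliveat1
71,9,4.6,0
72,?,4.1,0
55,4,?,1"

toy <- read.csv(text = raw, na.strings = c("NA", "?"))

# Number of missing values in each column
print(colSums(is.na(toy)))

# Number of rows with no missing values, i.e. the rows na.omit() keeps
print(sum(complete.cases(toy)))
toy_complete <- na.omit(toy)
print(nrow(toy_complete))
```

Running the same two checks on the real data frame shows which columns are responsible for most of the dropped rows.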

Splitting the dataset into training and test sets

The machine learning model learns from the training set, and its accuracy is assessed by predicting the value of the dependent variable in the test set.

library(caTools)
set.seed(1234)
split <- sample.split(data1$aliveat1, SplitRatio = 0.8)
train <- subset(data1, split == TRUE)
test <- subset(data1, split==FALSE)
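Passing the outcome column to sample.split means the split is made within each class, so the training and test sets keep roughly the same proportion of 0s and 1s. The base R sketch below shows that idea only; it is not the caTools implementation, and the class counts (44 and 16) are made up for the example.

```r
set.seed(1234)

# Hypothetical outcome vector: 44 zeros and 16 ones (made-up class counts)
outcome <- factor(c(rep(0, 44), rep(1, 16)), levels = c(0, 1))

# Within each class, mark roughly 80% of the rows for training
in_train <- logical(length(outcome))
for (cls in levels(outcome)) {
  idx <- which(outcome == cls)
  n_train <- round(0.8 * length(idx))
  in_train[idx[sample.int(length(idx), n_train)]] <- TRUE
}

# Class counts and proportions in each part of the split
print(table(outcome, in_train))
print(prop.table(table(outcome[in_train])))
print(prop.table(table(outcome[!in_train])))
```

With 60 rows and a ratio of 0.8, this leaves 48 rows for training and 12 for testing.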

Fitting the machine learning model

In this example, we are trying to predict whether the patient will be alive at 1 year, given a set of echocardiogram parameters. In other words, we are trying to classify each patient as either alive (1) or dead (0). This is a classic machine learning classification problem, and several algorithms can be used to solve it. Here we use a decision tree, naive Bayes, a support vector machine (SVM) and a random forest. With larger datasets, we would be able to compare these models on the basis of their confusion matrix statistics.

Decision tree classification

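# Model Fitting
library(rpart)
# method and control are arguments of rpart(), not predict(); method = 'class' is
# already the default for a factor outcome, so stating it here does not change the fit
classifier <- rpart(aliveat1 ~ ., data = train, method = 'class')
# Predict Test set results (columns 1 to 7 are the predictors)
y_pred <- predict(classifier, newdata = test[, c(1:7)], type = 'class')
# Confusion Matrix
# Note: caret expects confusionMatrix(data = predictions, reference = actual values).
# The call below passes them the other way round, so in its output the rows labelled
# 'Prediction' are the actual outcomes and the columns labelled 'Reference' are the predictions.
library(caret)
confusionMatrix(data = test$aliveat1, reference = y_pred)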

Output of the confusion matrix:

Confusion Matrix and Statistics

          Reference
Prediction 0 1
         0 8 1
         1 1 2
                                          
               Accuracy : 0.8333          
                 95% CI : (0.5159, 0.9791)
    No Information Rate : 0.75            
    P-Value [Acc > NIR] : 0.3907          
                                          
                  Kappa : 0.5556          
                                          
 Mcnemar's Test P-Value : 1.0000          
                                          
            Sensitivity : 0.8889          
            Specificity : 0.6667          
         Pos Pred Value : 0.8889          
         Neg Pred Value : 0.6667          
             Prevalence : 0.7500          
         Detection Rate : 0.6667          
   Detection Prevalence : 0.7500          
      Balanced Accuracy : 0.7778          
                                          
       'Positive' Class : 0 
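To see where these statistics come from, the main ones can be recomputed by hand from the four cell counts. The sketch below uses base R only; the counts are copied from the output above, and the variable names are my own.

```r
# Counts copied from the decision tree output above
cm <- matrix(c(8, 1,
               1, 2),
             nrow = 2, byrow = TRUE,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

n <- sum(cm)
accuracy    <- sum(diag(cm)) / n
sensitivity <- cm["0", "0"] / sum(cm[, "0"])   # the 'positive' class is 0
specificity <- cm["1", "1"] / sum(cm[, "1"])
balanced    <- (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement corrected for agreement expected by chance
expected    <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

print(round(c(Accuracy = accuracy,
              Sensitivity = sensitivity,
              Specificity = specificity,
              Balanced = balanced,
              Kappa = cohen_kappa), 4))
```

The rounded values agree with the Accuracy, Sensitivity, Specificity, Balanced Accuracy and Kappa lines reported by caret.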

Support Vector Classification

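# SVM Model Fitting
library(e1071)
classifier <- svm(formula = aliveat1 ~ ., data = train, type = "C-classification", kernel = 'linear')
# Predict Test set results (for C-classification, predict() returns class labels)
y_pred <- predict(classifier, newdata = test[, c(1:7)])
# Confusion Matrix (as called here: data = actual outcomes, reference = predictions)
library(caret)
confusionMatrix(data = test$aliveat1, reference = y_pred)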

Output of the confusion matrix:

Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 10  3
         1  2  3
                                          
               Accuracy : 0.7222          
                 95% CI : (0.4652, 0.9031)
    No Information Rate : 0.6667          
    P-Value [Acc > NIR] : 0.4122          
                                          
                  Kappa : 0.3478          
                                          
 Mcnemar's Test P-Value : 1.0000          
                                          
            Sensitivity : 0.8333          
            Specificity : 0.5000          
         Pos Pred Value : 0.7692          
         Neg Pred Value : 0.6000          
             Prevalence : 0.6667          
         Detection Rate : 0.5556          
   Detection Prevalence : 0.7222          
      Balanced Accuracy : 0.6667          
                                          
       'Positive' Class : 0     

Naive Bayes Classification


# Naive Bayes Model Fitting
library(e1071)
classifier <- naiveBayes(x = train[-8], y = train$aliveat1)
# Predict Test set results
y_pred <- predict(classifier, newdata = test[-8])
# Confusion Matrix (as called here: data = actual outcomes, reference = predictions)
library(caret)
confusionMatrix(data = test$aliveat1, reference = y_pred)

Output of the confusion matrix:

Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 10  3
         1  2  3
                                          
               Accuracy : 0.7222          
                 95% CI : (0.4652, 0.9031)
    No Information Rate : 0.6667          
    P-Value [Acc > NIR] : 0.4122          
                                          
                  Kappa : 0.3478          
                                          
 Mcnemar's Test P-Value : 1.0000          
                                          
            Sensitivity : 0.8333          
            Specificity : 0.5000          
         Pos Pred Value : 0.7692          
         Neg Pred Value : 0.6000          
             Prevalence : 0.6667          
         Detection Rate : 0.5556          
   Detection Prevalence : 0.7222          
      Balanced Accuracy : 0.6667          
                                          
       'Positive' Class : 0    

Random Forest Classification

# Random forest model fitting
library(randomForest)
classifier <- randomForest(x=train[-8], y = train$aliveat1, ntree=400)
# Predict Test set results
y_pred <- predict(classifier, newdata = test[-8])
# Confusion Matrix (as called here: data = actual outcomes, reference = predictions)
library(caret)
confusionMatrix(data = test$aliveat1, reference = y_pred)

 

Output of the confusion matrix:

Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 12  1
         1  3  2
                                          
               Accuracy : 0.7778          
                 95% CI : (0.5236, 0.9359)
    No Information Rate : 0.8333          
    P-Value [Acc > NIR] : 0.8318          
                                          
                  Kappa : 0.3684          
                                          
 Mcnemar's Test P-Value : 0.6171          
                                          
            Sensitivity : 0.8000          
            Specificity : 0.6667          
         Pos Pred Value : 0.9231          
         Neg Pred Value : 0.4000          
             Prevalence : 0.8333          
         Detection Rate : 0.6667          
   Detection Prevalence : 0.7222          
      Balanced Accuracy : 0.7333          
                                          
       'Positive' Class : 0  
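The headline figures from the four outputs above can be collected in one table for a side-by-side look. The numbers below are copied from those outputs; the column names and the test_rows column (the sum of the cells in each confusion matrix) are my own additions. Note that the decision tree matrix covers 12 patients while the other three cover 18, so the rows are not strictly comparable.

```r
# Figures copied from the confusion matrix outputs above
results <- data.frame(
  model             = c("Decision tree", "SVM (linear)", "Naive Bayes", "Random forest"),
  test_rows         = c(12, 18, 18, 18),
  accuracy          = c(0.8333, 0.7222, 0.7222, 0.7778),
  no_info_rate      = c(0.7500, 0.6667, 0.6667, 0.8333),
  balanced_accuracy = c(0.7778, 0.6667, 0.6667, 0.7333)
)

# Does the model do better than always predicting the most common class?
results$beats_no_info_rate <- results$accuracy > results$no_info_rate

# Sort by balanced accuracy, highest first
print(results[order(-results$balanced_accuracy), ])
```

Even where the accuracy is above the no information rate, the reported P-Value [Acc > NIR] is well above 0.05 in every output, which is what one would expect with test sets this small.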

About Saurabh
