Prediction Of Tumor Severity

In this project, we aim to predict whether a tumor is benign or malignant. We implemented a k-nearest neighbors (KNN) classifier in Python.

The data contains the following columns:

  • BI_RADS_assessment: 1 (definitely benign) to 5 (highly suggestive of malignancy)
  • Age: patient’s age in years
  • Shape: mass shape: round=1, oval=2, lobular=3, irregular=4 (nominal)
  • Margin: mass margin: circumscribed=1, microlobulated=2, obscured=3, ill-defined=4, spiculated=5 (nominal)
  • Density: mass density: high=1, iso=2, low=3, fat-containing=4 (ordinal)
  • Severity: target class: benign=0 or malignant=1


Let’s get our environment ready with the libraries we’ll need and then import the data!

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.metrics import confusion_matrix
from sklearn import metrics

Check out the Data!

df = pd.read_csv('~/DataSet GitHub/KNN/mammogram_weka_dataset.csv')


Let’s create some simple plots to check out the data!

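# Correlation matrix of the columns.
cor_mat = df.corr()
# Mask the upper triangle so that each pair of columns is shown only once.
mask = np.triu(np.ones_like(cor_mat, dtype=bool), k=1)
sns.heatmap(cor_mat, mask=mask, square=True, annot=True, cbar=True)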


Standardize the Variables

Because the KNN classifier predicts the class of a given test observation by identifying the observations that are nearest to it, the scale of the variables matters. Any variables that are on a large scale will have a much larger effect on the distance between the observations, and hence on the KNN classifier, than variables that are on a small scale.

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(df.drop('severity',axis=1))
scaled_features = scaler.transform(df.drop('severity',axis=1))
df_feat = pd.DataFrame(scaled_features,columns=df.columns[:-1])
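
To make the effect of scale concrete, here is a minimal sketch with invented values (they are not taken from the mammogram data). Two hypothetical patients differ by 30 years in age and by one step in density. The helper `share_of_first_feature` is a name made up for this illustration; it reports how much of the squared Euclidean distance comes from the age column before and after standardizing.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: columns are [age, density]; the values are invented.
toy = np.array([
    [30.0, 1.0],
    [60.0, 2.0],
    [45.0, 3.0],
    [75.0, 4.0],
])

def share_of_first_feature(a, b):
    """Fraction of the squared Euclidean distance contributed by column 0."""
    sq = (a - b) ** 2
    return sq[0] / sq.sum()

# Compare the first two rows on the raw scale.
raw_share = share_of_first_feature(toy[0], toy[1])

# Standardize each column to zero mean and unit variance, then compare again.
toy_scaled = StandardScaler().fit_transform(toy)
scaled_share = share_of_first_feature(toy_scaled[0], toy_scaled[1])

print(f"age share of squared distance, raw:    {raw_share:.3f}")
print(f"age share of squared distance, scaled: {scaled_share:.3f}")
```

On the raw scale the distance is almost entirely the age difference. After scaling, density gets a visible say in which points count as neighbors.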

Training KNN Model

Let’s now begin to train the classification model! We first need to split our data into an X array that contains the features to train on, and a y array with the target variable.

We split the data into a training set and a test set with scikit-learn’s train_test_split. We hold out 30% of the data for testing and use the remaining 70% for training.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(scaled_features, df['severity'], test_size=0.3)
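
One variation worth considering: the scaler above is applied to the whole dataset before the split, so the test rows influence the scaling. The sketch below shows one possible alternative, not the method used in this post. It runs on synthetic stand-in data from `make_classification` (an assumption, so the numbers say nothing about the mammogram data), splits first with a fixed `random_state` and `stratify`, and lets a pipeline fit the scaler on the training part only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 5 features, binary target.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# stratify keeps the class ratio similar in both parts;
# random_state makes the split reproducible.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

# The pipeline fits the scaler on the training part only,
# then applies the same transformation to the test part.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
pipe.fit(X_tr, y_tr)
demo_accuracy = pipe.score(X_te, y_te)
print(f"demo test accuracy: {demo_accuracy:.2f}")
```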

Creating and Training the Model

Remember that we are trying to come up with a model to predict whether the tumor will be benign or malignant. We’ll start with k=1.

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train,y_train)
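
To show what the classifier does with k neighbors, here is a rough from-scratch sketch on invented toy data. The function `knn_predict` and the arrays are hypothetical names for this illustration; scikit-learn’s own implementation is more elaborate, so treat this only as the basic idea of a majority vote among the nearest points.

```python
import numpy as np

def knn_predict(train_X, train_y, x_new, k=1):
    """Predict the label of x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training point.
    distances = np.linalg.norm(train_X - x_new, axis=1)
    # Indices of the k closest training points.
    nearest = np.argsort(distances)[:k]
    # Count the labels among those neighbors; a tie goes to the smaller label.
    votes = np.bincount(train_y[nearest])
    return int(np.argmax(votes))

# Invented toy data: two features, labels 0 (benign) and 1 (malignant).
X_toy = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]])
y_toy = np.array([0, 0, 1, 1])

label_k1 = knn_predict(X_toy, y_toy, np.array([0.1, 0.0]), k=1)
label_k3 = knn_predict(X_toy, y_toy, np.array([0.8, 0.9]), k=3)
print(label_k1, label_k3)
```

With k=1 the prediction is simply the label of the single closest point, which makes the model sensitive to noisy observations. Larger k values smooth this out.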

Predictions and Evaluations

Now predict values for the testing data.

pred = knn.predict(X_test)

Making Confusion Matrix

The confusion matrix contains the correct predictions that our model made on the test set as well as the incorrect ones.

from sklearn.metrics import classification_report,confusion_matrix
cm = confusion_matrix(y_test,pred)
class_names=[0,1] # name  of classes
fig, ax = plt.subplots()
tick_marks = np.arange(len(class_names))
plt.xticks(tick_marks, class_names)
plt.yticks(tick_marks, class_names)
# create heatmap
sns.heatmap(pd.DataFrame(cm), annot=True, cmap="YlGnBu" ,fmt='g')
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
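
As a reading aid, the small sketch below uses invented labels (not the model’s predictions) to show how scikit-learn lays out a binary confusion matrix: rows are the actual classes and columns are the predicted classes, so the diagonal holds the correct predictions.

```python
from sklearn.metrics import confusion_matrix

# Invented labels for illustration only: 0 = benign, 1 = malignant.
y_true_demo = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred_demo = [0, 1, 0, 1, 0, 1, 1, 0]

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)

# For a binary problem, ravel() returns the cells row by row:
# true negatives, false positives, false negatives, true positives.
tn, fp, fn, tp = cm_demo.ravel()
print(cm_demo)
print(f"correct: {tn + tp}, incorrect: {fp + fn}")
```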

135 and 91 are the correct predictions, and 34 and 29 are the incorrect ones, so we can see that the model gets quite a lot of predictions right.

Correct Predictions : 135+91 = 226

Incorrect Predictions: 34+29 = 63

Create a classification report for the model.

print(classification_report(y_test,pred))

The accuracy of the model is about 78%!
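
The accuracy figure follows directly from the counts above. A short worked check, using only the numbers reported in this post:

```python
# Counts reported in the text above.
correct = 135 + 91
incorrect = 34 + 29
total = correct + incorrect

accuracy = correct / total
error_rate = incorrect / total

print(f"accuracy   = {correct}/{total} = {accuracy:.3f}")
print(f"error rate = {incorrect}/{total} = {error_rate:.3f}")
```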

Choosing a K Value

Let’s go ahead and use the elbow method to pick a good K Value:

error_rate = []

# Will take some time
for i in range(1,40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train,y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))
plt.plot(range(1,40),error_rate,color='blue', linestyle='dashed', marker='o',
         markerfacecolor='red', markersize=10)
plt.title('Error Rate vs. K Value')
plt.xlabel('K')
plt.ylabel('Error Rate')
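
Note that the loop above scores every k on the test set, so the test set ends up helping to tune the model. A possible alternative, sketched here on synthetic stand-in data from `make_classification` (an assumption, not the mammogram data), is to compare k values with cross-validation on the training data only and keep the test set for the final check.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training split (assumption, not the real data).
X_tr_demo, y_tr_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

k_values = list(range(1, 40))
cv_error = []
for k in k_values:
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    # 5-fold cross-validated accuracy, computed on the training data only.
    scores = cross_val_score(model, X_tr_demo, y_tr_demo, cv=5)
    cv_error.append(1.0 - scores.mean())

# Pick the k with the lowest mean cross-validated error.
best_k = k_values[int(np.argmin(cv_error))]
print(f"k with the lowest cross-validated error on the demo data: {best_k}")
```

The `cv_error` list can be plotted against `k_values` in the same way as `error_rate` above.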

The accuracy of the model with k=5 is 82%!
