Prediction Of Tumor Severity

In this project, we aim to predict whether a tumor is benign or malignant. We implemented a k-nearest neighbors (KNN) classifier in Python.

The data contains the following columns:

  • BI_RADS_assessment: Definitely benign(1) to Highly suggestive of malignancy (5)
  • Age: patient’s age in years
  • Shape: mass shape: round=1 oval=2 lobular=3 irregular=4 (nominal)
  • Margin: mass margin: circumscribed=1 microlobulated=2 obscured=3 ill-defined=4 spiculated=5 (nominal)
  • Density: mass density high=1 iso=2 low=3 fat-containing=4 (ordinal)
  • Severity: target class: benign=0 or malignant=1


Let’s get our environment ready with the libraries we’ll need and then import the data!

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.metrics import confusion_matrix
from sklearn import metrics

Check out the Data!

df = pd.read_csv('~/DataSet GitHub/KNN/mammogram_weka_dataset.csv')
df.head()
df.info()
df.corr()

EDA

Let’s create some simple plots to check out the data!

# Correlation matrix of all columns.
cor_mat = df.corr()
# Boolean mask hiding the upper triangle, which mirrors the lower one.
mask = np.triu(np.ones_like(cor_mat, dtype=bool), k=1)
fig = plt.gcf()
fig.set_size_inches(30, 12)
sns.heatmap(data=cor_mat, mask=mask, square=True, annot=True, cbar=True)


Standardize the Variables

Because the KNN classifier predicts the class of a given test observation by identifying the observations that are nearest to it, the scale of the variables matters. Any variables that are on a large scale will have a much larger effect on the distance between the observations, and hence on the KNN classifier, than variables that are on a small scale.
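To make the effect of scale concrete, here is a minimal sketch with three made-up patients described by age and shape code. The values and the per-column standard deviations in `col_std` are hypothetical, chosen only for illustration. Subtracting the mean does not change distances between points, so dividing by the standard deviation is enough to show the effect.

```python
import numpy as np

# Hypothetical patients: (age in years, shape code 1-4)
a = np.array([45.0, 1.0])
b = np.array([47.0, 4.0])   # close in age, very different shape
c = np.array([60.0, 1.0])   # same shape, 15 years older

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((u - v) ** 2)))

# On the raw scale, age dominates: b looks like the nearest neighbor of a.
raw_ab = euclidean(a, b)
raw_ac = euclidean(a, c)

# Assumed (hypothetical) standard deviations of the age and shape columns.
col_std = np.array([15.0, 1.2])

# After rescaling, the shape difference counts too: c is now nearer to a.
scaled_ab = euclidean(a / col_std, b / col_std)
scaled_ac = euclidean(a / col_std, c / col_std)

print(f"raw:    d(a,b)={raw_ab:.2f}  d(a,c)={raw_ac:.2f}")
print(f"scaled: d(a,b)={scaled_ab:.2f}  d(a,c)={scaled_ac:.2f}")
```

On the raw scale the nearest neighbor of `a` is `b`, while after rescaling it is `c`, so the choice of scale can change which class KNN predicts.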

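from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Learn each feature's mean and standard deviation (target column excluded)
scaler.fit(df.drop('severity',axis=1))
# Rescale every feature to zero mean and unit variance
scaled_features = scaler.transform(df.drop('severity',axis=1))
# Rebuild a DataFrame; df.columns[:-1] assumes 'severity' is the last column
df_feat = pd.DataFrame(scaled_features,columns=df.columns[:-1])
df_feat.head()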

Training KNN Model

Let’s now begin to train the classification model! We first need to split our data into an X array that contains the features to train on and a y array with the target variable.

We then split the data into a training set and a test set using scikit-learn. I use 30% of the data for testing the model and 70% for training it.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(scaled_features, df['severity'], test_size=0.3)
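One caveat: the scaler above was fit on the full dataset before the split, so the test rows influence the scaling statistics. A common alternative is to fit the scaler on the training rows only, for example through a scikit-learn pipeline. The sketch below assumes synthetic data from `make_classification` in place of the mammogram file, and the names `X_demo`, `y_demo` and `pipe` are illustrative; it is not the code used for the results in this post.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the five mammogram features (assumption).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# Split first; random_state makes the split reproducible,
# stratify keeps the class balance similar in both parts.
Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# The pipeline fits the scaler on the training data only,
# then applies the same scaling to the test data at prediction time.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
pipe.fit(Xtr, ytr)
demo_accuracy = pipe.score(Xte, yte)
print(f"test accuracy on synthetic data: {demo_accuracy:.2f}")
```

Passing `random_state` to `train_test_split` also makes the reported numbers repeatable from run to run.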

Creating and Training the Model

Remember that we are trying to come up with a model to predict whether the tumor will be benign or malignant. We’ll start with k=1.

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train,y_train)
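To illustrate what a KNN classifier does when it predicts, here is a small from-scratch sketch in NumPy. The function `knn_predict` and the toy points are hypothetical and only illustrate the idea (Euclidean distance plus majority vote); this is not how scikit-learn implements `KNeighborsClassifier` internally.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x_new, k=1):
    """Predict the class of x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Most frequent label among those neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Toy data (made up): two points of class 0, three of class 1
X_toy = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]])
y_toy = np.array([0, 0, 1, 1, 1])
x_new = np.array([0.6, 0.6])

pred_k1 = knn_predict(X_toy, y_toy, x_new, k=1)
pred_k3 = knn_predict(X_toy, y_toy, x_new, k=3)
print(pred_k1, pred_k3)
```

With k=1 the single closest point decides the class, while with k=3 the two class-1 neighbors outvote it. This is why the choice of k matters.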

Predictions and Evaluations

Now predict values for the testing data.

pred = knn.predict(X_test)

Making Confusion Matrix

The confusion matrix tabulates the model’s correct and incorrect predictions on the test set, broken down by actual and predicted class.
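As a small illustration of how scikit-learn lays out this matrix, here is a toy example with made-up labels (not the mammogram data). Rows correspond to the actual class and columns to the predicted class, so for binary labels 0 and 1 the cells are true negatives, false positives, false negatives and true positives.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only
y_true_toy = [0, 0, 1, 1, 1]
y_pred_toy = [0, 1, 1, 1, 0]

cm_toy = confusion_matrix(y_true_toy, y_pred_toy)

# For binary labels, ravel() returns the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = cm_toy.ravel()
print(cm_toy)
print("correct:", tn + tp, "incorrect:", fp + fn)
```

The diagonal cells hold the correct predictions and the off-diagonal cells the errors. The matrix for our model is read the same way.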

from sklearn.metrics import classification_report, confusion_matrix
cm = confusion_matrix(y_test, pred)  # rows: actual class, columns: predicted class
class_names = [0, 1]  # 0 = benign, 1 = malignant
fig, ax = plt.subplots()
# Draw the heatmap with the counts annotated and the class names as tick labels
sns.heatmap(pd.DataFrame(cm, index=class_names, columns=class_names),
            annot=True, cmap="YlGnBu", fmt='g', ax=ax)
ax.xaxis.set_label_position("top")
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
plt.tight_layout()
# Per-class precision, recall and F1, plus overall accuracy
print(classification_report(y_test, pred))

The diagonal cells, 135 and 91, are the correct predictions, and the off-diagonal cells, 34 and 29, are the incorrect ones. So the model gets quite a lot of predictions right.


Correct Predictions : 135+91 = 226

Incorrect Predictions: 34+29 = 63

Create a classification report for the model.

The accuracy of the model with k=1 is about 78% (226 correct out of 289 test predictions).
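The accuracy figure follows directly from the counts above. A quick check of the arithmetic, using only the four numbers read off the confusion matrix:

```python
# Counts read off the confusion matrix above
correct = 135 + 91
incorrect = 34 + 29
total = correct + incorrect

# Accuracy is the share of test predictions that were correct
accuracy = correct / total
print(f"{correct} / {total} = {accuracy:.3f}")
```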

Choosing a K Value

Let’s go ahead and use the elbow method to pick a good K Value:

error_rate = []

# Fit a KNN model for each K from 1 to 39 and record its error rate on the test set
for i in range(1,40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train,y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))

# Plot the error rate against K and look for where the curve flattens out
plt.figure(figsize=(10,6))
plt.plot(range(1,40),error_rate,color='blue', linestyle='dashed', marker='o',
         markerfacecolor='red', markersize=10)
plt.title('Error Rate vs. K Value')
plt.xlabel('K')
plt.ylabel('Error Rate')

The accuracy of the model with k=5 is 82%.
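Because the loop above picks K by looking at the test set, the accuracy reported for the chosen K may be somewhat optimistic. One alternative is to choose K with cross-validation instead. The sketch below assumes synthetic data from `make_classification` standing in for the mammogram features, and the names `k_values`, `cv_error` and `best_k` are illustrative; with the real data you would pass the training portion to `cross_val_score`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training data (assumption)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

k_values = list(range(1, 40, 2))  # odd K avoids tied votes with two classes
cv_error = []
for k in k_values:
    # Scaling happens inside the pipeline, so each fold is scaled on its own training part
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_error.append(1.0 - scores.mean())

# K with the lowest mean cross-validated error
best_k = k_values[int(np.argmin(cv_error))]
print("best K by 5-fold cross-validation:", best_k)
```

The test set is then used only once, to report the accuracy of the model refit with `best_k`.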
