In this project, we aim to predict the occurrence of diabetes within the PIMA Native American Group. We implemented the Decision Tree algorithm in Python.
The data contains the following columns:
- times_pregnant: Number of times pregnant
- plasma_glucose: Concentration of plasma glucose in a 2 hour oral glucose tolerance test
- diastolic_blood_pressure: Measured in mmHg
- tricep_skin_fold_thickness: Measured in mm
- serum_insulin: 2-hour serum insulin concentration, measured in mu U/ml
- body_mass_index: Weight in kg / (height in m)^2
- diabetes_pedigree_function: Function that scores the likelihood of diabetes based on family history
- age: Years
- class: Target variable: 0 corresponds to no diabetes and 1 to diabetes
Let’s get our environment ready with the libraries we’ll need and then import the data!
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.metrics import confusion_matrix
from sklearn import metrics
Check out the Data!
#importing the dataset
df = pd.read_csv('~/DataSet GitHub/Decision Tree/pima_native_american_diabetes_weka_dataset.csv')
df.head()

df.info()

Let’s check out the data summary!
plt.figure(figsize=(12,8))
# skip the 'count' row and show the remaining summary statistics per column
sns.heatmap(df.describe()[1:].transpose(),
annot=True,linecolor="w",
linewidths=2,cmap=sns.color_palette("Set1"))
plt.title("Data summary")
plt.show()

Exploratory Data Analysis
Let’s check out the correlation between variables.
correlation = df.corr()
plt.figure(figsize=(10,8))
# seaborn's heatmap draws cell borders via linewidths/linecolor
sns.heatmap(correlation,annot=True,
cmap=sns.color_palette("magma"),
linewidths=2,linecolor="k")
plt.title("CORRELATION BETWEEN VARIABLES")
plt.show()

Let’s check out the proportion of each target class in the dataset!
plt.figure(figsize=(12,6))
plt.pie(df["class"].value_counts().values,
labels=["no diabetes","diabetes"],
autopct="%1.0f%%",wedgeprops={"linewidth":2,"edgecolor":"white"})
my_circ = plt.Circle((0,0),.7,color = "white")
plt.gca().add_artist(my_circ)
plt.subplots_adjust(wspace = .2)
plt.title("Proportion of target variable in dataset")
plt.show()

plt.figure(figsize=(12,6))
# scatterplot colors the hue levels through `palette`, not `cmap`
sns.scatterplot(data=df,x='age',y='times_pregnant',hue='class',palette="Set2")
plt.legend(title='legend',loc='upper right', labels=['no diabetes', 'diabetes'])

Among the patients with diabetes (class = 1), the averages are times_pregnant = 4.87, plasma_glucose = 141.25 and diastolic_blood_pressure = 70.82. Values at or above these averages are more typical of the diabetic group, although an average on its own is not a diagnostic threshold.
df[(df['class'] ==1)].mean().reset_index()
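The diabetic-group averages are easier to read next to the non-diabetic ones. Here is a minimal sketch of that comparison, using a small made-up DataFrame (`toy` and all of its values are illustrative, not taken from the dataset); on the real data, `df` would take its place:

```python
import pandas as pd

# Hypothetical toy data standing in for df; values are made up for illustration.
toy = pd.DataFrame({
    "plasma_glucose": [90, 100, 150, 170],
    "age": [25, 31, 45, 52],
    "class": [0, 0, 1, 1],
})

# Mean of every feature, one row per class (0 = no diabetes, 1 = diabetes).
class_means = toy.groupby("class").mean()
print(class_means)
```

Grouping by the target keeps both classes in one table, so a gap between the two rows is visible at a glance.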

Train Test Split
Split the data into a training set and a testing set
X = df.iloc[:,:-1]
Y = df.iloc[:,8]
#Splitting the data into training set and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,Y, test_size = 0.25, random_state = 0)
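Since the two classes are not equally frequent, one optional variation is a stratified split, which keeps the class ratio the same in the training and test sets. This is not what the split above does; it is only a sketch on synthetic data (`X_demo`, `y_demo` and the 65/35 ratio are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and labels (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 8))
y_demo = np.array([0] * 130 + [1] * 70)  # imbalanced: 35% positives

# stratify=y_demo asks for the same class ratio in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())  # share of positives in each subset
```

With `test_size=0.25`, a quarter of the rows go to the test set, and `random_state` makes the split reproducible.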
Train a Model
Now it’s time to train a Decision Tree Classifier.
#fitting classifier to the training set
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion='entropy', random_state = 0)
classifier.fit(X_train,y_train)
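With `criterion='entropy'`, the tree picks splits that reduce the entropy of the class labels the most (information gain). The sketch below computes that quantity by hand for a single candidate split. The labels, glucose values and the threshold of 120 are made up for illustration, and the helper functions are hypothetical, not part of scikit-learn:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, mask):
    """Entropy reduction from splitting labels into mask / not mask."""
    labels = np.asarray(labels)
    mask = np.asarray(mask, dtype=bool)
    n = len(labels)
    left, right = labels[mask], labels[~mask]
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Made-up example: 8 patients, split on a hypothetical glucose threshold.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
glucose = np.array([85, 90, 100, 130, 110, 140, 150, 160])
gain = information_gain(y, glucose <= 120)
print(round(entropy(y), 3), round(gain, 3))
```

A split that separated the two classes perfectly would have a gain equal to the parent entropy; the closer the gain is to zero, the less the split helps.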

Model Evaluation
Now get predictions from the model and create a confusion matrix and a classification report.
y_pred = classifier.predict(X_test)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test,y_pred)
class_names=[0,1] # name of classes
fig, ax = plt.subplots()
tick_marks = np.arange(len(class_names))
plt.xticks(tick_marks, class_names)
plt.yticks(tick_marks, class_names)
# create heatmap
sns.heatmap(pd.DataFrame(cm), annot=True, cmap="BuPu" ,fmt='g')
ax.xaxis.set_label_position("top")
plt.tight_layout()
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')

105 and 44 are the correct predictions. In addition, 18 and 25 are the incorrect predictions.
Correct Predictions : 105+44 = 149
Incorrect Predictions: 18+25 = 43
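The counts above translate directly into accuracy: correct predictions divided by all predictions. A small sketch of that arithmetic follows. The placement of 18 and 25 in the off-diagonal cells is an assumption made only to build the matrix, and accuracy does not depend on it:

```python
import numpy as np

def accuracy_from_confusion(cm):
    """Accuracy = sum of the diagonal (correct) / sum of all cells."""
    cm = np.asarray(cm)
    return np.trace(cm) / cm.sum()

# Diagonal values come from the text; which off-diagonal cell holds 18 and
# which holds 25 is assumed here and does not affect the accuracy.
cm_demo = np.array([[105, 25],
                    [18, 44]])
acc = accuracy_from_confusion(cm_demo)
print(f"{acc:.3f}")
```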
Create a classification report for the model.
from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred))

The accuracy of the model is about 77.6% (149 of 192 test predictions)!
Tree Visualisation
from IPython.display import Image
from io import StringIO  # sklearn.externals.six was removed from recent scikit-learn releases
from sklearn.tree import export_graphviz
import pydot
features = list(df.columns[:-1])
features

dot_data = StringIO()
export_graphviz(classifier, out_file=dot_data,feature_names=features,filled=True,rounded=True)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
Image(graph[0].create_png())
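If Graphviz or pydot is not available, scikit-learn's `export_text` prints the learned rules as plain text instead. This is a sketch of an alternative, not the approach used above; it trains a small tree on synthetic data, and the feature names `f0` to `f3` are hypothetical:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data; the feature names below are hypothetical.
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
names = ["f0", "f1", "f2", "f3"]

# A shallow tree keeps the printed rules short.
demo_tree = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
demo_tree.fit(X_demo, y_demo)

# One line per node: split conditions for internal nodes, class for leaves.
rules = export_text(demo_tree, feature_names=names)
print(rules)
```

On the real model, passing `classifier` and `features` in place of the demo objects should give the same kind of listing.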

