Prediction Of Iris Species

The Iris flower data set is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems". It is sometimes called Anderson's Iris data set because Edgar Anderson collected the data to quantify the morphologic variation of Iris flowers of three related species. The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica, and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.


Here are pictures of the three Iris species:

# The Iris Setosa
from IPython.display import Image
url = 'http://upload.wikimedia.org/wikipedia/commons/5/56/Kosaciec_szczecinkowaty_Iris_setosa.jpg'
Image(url,width=300, height=300)
The Iris Setosa
# The Iris Versicolor
from IPython.display import Image
url = 'http://upload.wikimedia.org/wikipedia/commons/4/41/Iris_versicolor_3.jpg'
Image(url,width=300, height=300)
The Iris Versicolor
# The Iris Virginica
from IPython.display import Image
url = 'http://upload.wikimedia.org/wikipedia/commons/9/9f/Iris_virginica.jpg'
Image(url,width=300, height=300)
The Iris Virginica

The iris dataset contains measurements for 150 Iris flowers from three different species.

The three classes in the Iris dataset:

  • Iris-setosa (n=50)
  • Iris-versicolor (n=50)
  • Iris-virginica (n=50)

The four features of the Iris dataset:

  • sepal length in cm
  • sepal width in cm
  • petal length in cm
  • petal width in cm
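As a quick check of the counts listed above, here is a minimal sketch that uses the copy of the Iris data bundled with scikit-learn (an assumption made so the snippet runs offline; the tutorial itself loads the data through seaborn below). It prints the shape of the feature matrix, the four feature names, and the number of samples per species.

```python
import numpy as np
from sklearn.datasets import load_iris

# scikit-learn ships its own copy of the Iris data (no download needed)
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # 150 flowers x 4 measurements
print(iris.feature_names)  # sepal/petal length and width, in cm

# map integer targets to species names, then count each species
names, counts = np.unique(iris.target_names[y], return_counts=True)
print(dict(zip(names, counts)))
```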


Get the data

Use seaborn to load the Iris data:

import seaborn as sns
df = sns.load_dataset('iris')
df.head()
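sns.load_dataset downloads the CSV from seaborn's online data repository, so it needs an internet connection. If that is not available, a sketch like the one below builds a DataFrame with the same column names from scikit-learn's bundled copy. This is an assumed substitute, not part of the original tutorial: the values come from a separate file, so they are not guaranteed to match seaborn's copy exactly.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# column names chosen to mirror seaborn's iris table
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
df = pd.DataFrame(iris.data, columns=columns)

# turn the integer targets (0, 1, 2) into species names
df["species"] = iris.target_names[iris.target]
print(df.head())
```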


Exploratory Data Analysis

Import some libraries we will need.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

Plot the species by petal length and petal width:

sns.set_style("whitegrid")
# 'height' replaced the old 'size' argument in seaborn 0.9
sns.FacetGrid(df, hue="species", height=6)\
    .map(plt.scatter, "petal_length", "petal_width")\
    .add_legend()
plt.show()
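The scatter plot suggests that setosa forms its own cluster. A minimal numeric check of that impression prints the smallest and largest petal measurements per species; it uses scikit-learn's bundled copy of the data (an assumption, so it runs without the seaborn download), where column 2 is petal length and column 3 is petal width.

```python
from sklearn.datasets import load_iris

iris = load_iris()
petal = iris.data[:, 2:4]  # petal length, petal width (cm)

ranges = {}
for i, name in enumerate(iris.target_names):
    vals = petal[iris.target == i]
    # (min, max) of petal length and petal width for this species
    ranges[name] = (vals.min(axis=0), vals.max(axis=0))
    print(name, "min:", vals.min(axis=0), "max:", vals.max(axis=0))
```

If setosa's maximum is below the other species' minimum for both features, the setosa cluster cannot overlap the others in this plot.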

Create a pairplot of the data set. Which flower species seems to be the most separable?

# Setosa is the most separable. 
sns.set_style("whitegrid")
sns.pairplot(df,hue='species',palette="BuPu")
Setosa is the most separable: its petal length and petal width do not overlap with those of versicolor or virginica.
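One way to make "most separable" concrete is a single-threshold rule. The sketch below labels a flower as setosa when its petal is shorter than 2.5 cm; that cut-off is a hypothetical value picked by eye from the gap in the plots, and the data again come from scikit-learn's bundled copy.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
petal_length = iris.data[:, 2]
is_setosa = iris.target == 0

# hypothetical rule: "setosa" if the petal is shorter than 2.5 cm
predicted_setosa = petal_length < 2.5

accuracy = np.mean(predicted_setosa == is_setosa)
print(accuracy)
```

A perfect score from one threshold on one feature is what separability means here; no such rule exists for versicolor versus virginica, whose measurements overlap.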

Create a KDE plot of sepal_length versus sepal_width for the setosa species.

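setosa = df[df['species']=='setosa']
sns.set_style("whitegrid")
# seaborn >= 0.11: pass x/y as keywords; 'fill' replaces 'shade',
# and 'thresh' (lowest level left unshaded) replaces 'shade_lowest'
sns.kdeplot(x=setosa['sepal_width'], y=setosa['sepal_length'],
            cmap="plasma", fill=True, thresh=0.05)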
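A kernel density estimate places a small Gaussian bump on every data point and sums them into a smooth surface. The sketch below builds a similar estimate with SciPy's gaussian_kde (Scott's rule bandwidth by default) on scikit-learn's bundled copy of the data; seaborn's own bandwidth handling may differ slightly, so treat this as an illustration of the idea rather than a reproduction of the plot. The grid limits are assumptions chosen to cover the setosa measurements with some margin.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.datasets import load_iris

iris = load_iris()
setosa = iris.data[iris.target == 0]

# rows: sepal width (x axis), sepal length (y axis), as in the plot
xy = np.vstack([setosa[:, 1], setosa[:, 0]])
kde = gaussian_kde(xy)

# evaluate the density on a regular grid
xs = np.linspace(1.5, 5.0, 100)
ys = np.linspace(3.5, 6.5, 100)
gx, gy = np.meshgrid(xs, ys)
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)

# location of the highest density on the grid
peak = np.unravel_index(density.argmax(), density.shape)
print("peak near sepal_width=%.2f, sepal_length=%.2f" % (gx[peak], gy[peak]))

# the density should integrate to roughly 1 over the grid
dx, dy = xs[1] - xs[0], ys[1] - ys[0]
print("integral ~", density.sum() * dx * dy)
```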


Train Test Split

Split the data into a training set and a testing set

from sklearn.model_selection import train_test_split
X = df.drop('species',axis=1)
y = df['species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
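train_test_split shuffles at random, so every run gives a different split and a slightly different score. A common variation, sketched below, passes stratify=y to keep the 1:1:1 class ratio in both parts and a fixed random_state so the split is reproducible. The value 42 is an arbitrary assumption, and the snippet uses scikit-learn's bundled data so it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y: same class proportions in train and test
# random_state=42: arbitrary seed that makes the shuffle repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

print(X_train.shape, X_test.shape)
print(np.bincount(y_test))  # test samples per class
```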

Train a Model

Now it's time to train a Support Vector Machine classifier.


Create an SVC() model from sklearn and fit it to the training data.

from sklearn.svm import SVC
svc_model = SVC()
svc_model.fit(X_train,y_train)
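SVC() with no arguments uses scikit-learn's defaults: an RBF kernel, C=1.0 and gamma='scale' (the gamma default since version 0.22). The sketch below prints those settings and, after fitting, the number of support vectors kept for each class. It recreates the split from scikit-learn's bundled data with an assumed random_state so it runs independently of the cells above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)  # seed is an arbitrary choice

svc_model = SVC()
params = svc_model.get_params()
print(params["kernel"], params["C"], params["gamma"])

svc_model.fit(X_train, y_train)
# support vectors retained per class after training
print(svc_model.n_support_)
```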

Model Evaluation

Now get predictions from the model and create a confusion matrix and a classification report.

pred = svc_model.predict(X_test)
from sklearn.metrics import classification_report, confusion_matrix
# use the model's own class order (setosa, versicolor, virginica) for rows and columns
class_names = svc_model.classes_
cm = confusion_matrix(y_test, pred, labels=class_names)
fig, ax = plt.subplots()
# create heatmap with the species names on both axes
sns.heatmap(pd.DataFrame(cm, index=class_names, columns=class_names),
            annot=True, cmap="BuPu", fmt='g', ax=ax)
ax.xaxis.set_label_position("top")
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
plt.tight_layout()
print(classification_report(y_test, pred))
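Correct predictions sit on the diagonal of the confusion matrix, so accuracy is the diagonal sum divided by the total number of test samples. The sketch below checks that against sklearn's accuracy_score; it rebuilds the model on scikit-learn's bundled data with an assumed random_state so it can run on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)  # arbitrary seed
pred = SVC().fit(X_train, y_train).predict(X_test)

cm = confusion_matrix(y_test, pred)
# diagonal = correctly classified samples
acc_from_cm = np.trace(cm) / cm.sum()
print(acc_from_cm, accuracy_score(y_test, pred))
```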

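The accuracy of the model is 95%! Because train_test_split shuffles the rows at random, the exact figure will change slightly from run to run.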
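A single 70/30 split gives one noisy estimate. A sketch of a steadier alternative is 5-fold cross-validation: the data are divided into five folds, each fold serves once as the test set, and the five scores are averaged. The choice of cv=5 and the use of scikit-learn's bundled data are assumptions for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# for classifiers, cv=5 uses stratified folds; each flower is tested exactly once
scores = cross_val_score(SVC(), X, y, cv=5)
print(scores.round(3), scores.mean().round(3))
```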
