In this project, we aim to predict the purchasing intention of online shoppers. We use Logistic Regression to build the model and Linear Discriminant Analysis (LDA) to reduce the dimensionality of the dataset, all in Python.
The data contains the following columns:
- Administrative: number of administrative pages visited
- Administrative_Duration: total time spent on administrative pages
- Informational: number of informational pages visited
- Informational_Duration: total time spent on informational pages
- ProductRelated: number of product-related pages visited
- ProductRelated_Duration: total time spent on product-related pages
- BounceRates: bounce rate of the pages visited
- ExitRates: exit rate of the pages visited
- PageValues: average page value of the pages visited
- SpecialDay: closeness of the visit to a special day (e.g. Valentine's Day)
- OperatingSystems: operating system used
- Browser: browser used
- Month: month of the visit
- Region: region of the user
- VisitorType: type of visitor (returning, new or other)
- Weekend: whether the visit took place on a weekend
- Revenue: whether the visit generated revenue (the target)
Let’s get our environment ready with the libraries we’ll need and then import the data!
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Check out the Data
df = pd.read_csv('~/DataSet GitHub/LDA/online_shoppers_intention.csv')
df.head()

df.info()

Let’s visualise the missing data in the columns!
import missingno as msno
msno.matrix(df)

The next step is to drop the rows that contain null values.
df = df.dropna() # drop all rows with NaN values
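As a quick sanity check on the cleanup, here is a minimal sketch that confirms no nulls remain and shows how many rows are left:
print(df.isnull().sum()) # missing values per column, all zeros after the drop
print(df.shape) # rows and columns remaining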
The next step is to sort out the boolean columns: switch True/False to 1/0 in the Weekend and Revenue columns.
df.Weekend = df.Weekend.astype(int)
df.Revenue = df.Revenue.astype(int)
EDA
Let’s visualise the data summary
plt.figure(figsize=(12,8))
sns.heatmap(df.describe()[1:].transpose(),
annot=True,linecolor="w",
linewidth=2,cmap=sns.color_palette("muted"))
plt.title("Data summary")
plt.show()

Let’s see the frequency of revenue in our dataset.
plt.figure(figsize=(10,6))
print('No:',len(df[df.Revenue == 0]))
print('Yes:',len(df[df.Revenue == 1]))
y = len(df[df.Revenue == 0]),len(df[df.Revenue == 1])
x = ['No','Yes']
plt.bar(x,y,color = 'hotpink')
plt.show()
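The two classes are clearly imbalanced. As a minimal sketch, value_counts(normalize=True) gives the same information as fractions:
# Share of each Revenue class across all sessions
print(df['Revenue'].value_counts(normalize=True))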

Let’s see the percentage of different visitors in the dataset!
plt.rcParams['figure.figsize'] = (20, 10)
size = [10551, 1694, 85] # counts of each visitor type in the dataset
colors = ['mediumseagreen', 'coral', 'yellow']
labels = "Returning Visitor", "New_Visitor", "Others"
explode = [0, 0, 0.1]
plt.pie(size, colors = colors, labels = labels, explode = explode, shadow = True, autopct = '%.2f%%')
plt.title('Different Visitors', fontsize = 20)
plt.axis('off')
plt.legend()
plt.show()
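The slice sizes above are hard-coded. A minimal sketch of an alternative: derive them from the data, so the chart stays correct if the dataset changes.
# Compute the visitor-type counts instead of hard-coding them
counts = df['VisitorType'].value_counts()
print(counts)
plt.pie(counts.values, labels = counts.index, autopct = '%.2f%%')
plt.title('Different Visitors')
plt.show()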

Let’s visualise Weekend vs Revenue!
weekend_revenue = pd.crosstab(df['Weekend'], df['Revenue']) # use a new variable so df keeps its original columns
weekend_revenue.div(weekend_revenue.sum(1).astype(float), axis = 0).plot(kind = 'bar', stacked = True, figsize = (16, 9), color = ['orangered', 'mediumaquamarine'])
plt.title('Weekend vs Revenue', fontsize = 15)
plt.show()

The next step is to one-hot encode the remaining categorical columns to prepare the data for machine learning.
categorical = ['VisitorType','Month']
df = pd.get_dummies(df,columns = categorical,drop_first=True)
df.head()
df.columns
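Here drop_first=True drops one dummy per categorical column, since that category is implied when all the others are 0; keeping it would add a redundant, perfectly correlated feature. A hypothetical toy example makes the effect visible:
# Toy example: three categories become two 0/1 columns;
# the dropped category is implied when both columns are 0
demo = pd.get_dummies(pd.Series(['New_Visitor', 'Returning_Visitor', 'Other']), drop_first=True)
print(demo)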

Training Logistic Regression
Let’s now begin to train the Logistic Regression model! We will first need to split up our data into an X array that contains the features to train on, and a y array with the target variable, in this case the Revenue column.
X = df.drop('Revenue',axis=1)
y = df['Revenue']
Train Test Split
Now let’s split the data into a training set and a testing set. We will train our model on the training set and then use the test set to evaluate the model.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state = 0)
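Since we saw earlier that Revenue is imbalanced, one alternative (not used in the rest of this post) is a stratified split, which preserves the class ratio in both sets. A minimal sketch:
# Alternative split: keep the Revenue class proportions identical in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size = 0.2, random_state = 0, stratify = y)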
Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
Linear Discriminant Analysis
Let’s use LDA to reduce the dimensionality of the data. Since Revenue has only two classes, LDA can produce at most n_classes - 1 = 1 discriminant component, so we set n_components = 1 (recent scikit-learn versions raise an error for larger values).
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
lda = LinearDiscriminantAnalysis(n_components = 1) # at most n_classes - 1 = 1 component for a binary target
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)
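With a single discriminant component, the transformed training data is one-dimensional, so we can eyeball the class separation with overlaid histograms. A minimal sketch:
# Histogram of the 1-D LDA projection, split by class
plt.hist(X_train[y_train == 0].ravel(), bins = 50, alpha = 0.5, label = 'No revenue')
plt.hist(X_train[y_train == 1].ravel(), bins = 50, alpha = 0.5, label = 'Revenue')
plt.xlabel('LDA component 1')
plt.legend()
plt.show()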
Building Logistic Regression model
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)

Predictions and Evaluations
Now predict values for the testing data.
# Predicting the test set result
y_pred = classifier.predict(X_test)
Making the Confusion Matrix
The confusion matrix contains the correct predictions our model made on the test set as well as the incorrect ones.
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test,y_pred)
class_names = [0, 1] # name of classes
fig, ax = plt.subplots()
# create heatmap with the class names on the ticks
sns.heatmap(pd.DataFrame(cm), annot=True, cmap="Set3", fmt='g',
            xticklabels=class_names, yticklabels=class_names)
ax.xaxis.set_label_position("top")
plt.tight_layout()
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
plt.show()

2042 and 151 are the correct predictions, while 46 and 225 are the incorrect ones.
Correct predictions: 2042 + 151 = 2193
Incorrect predictions: 46 + 225 = 271
Create a classification report for the model.
from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred))
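Because the classes are imbalanced, it can also be worth reporting ROC AUC, which scores the predicted probabilities rather than the hard labels. A minimal sketch:
from sklearn.metrics import roc_auc_score
# Probability of the positive class (Revenue = 1) for each test sample
y_proba = classifier.predict_proba(X_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, y_proba))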

For the final step, let’s see the accuracy of the model.
from sklearn import metrics
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

