Customer Mall Segmentation

Suppose you run a supermarket mall, and through membership cards you have some basic data about your customers: Customer ID, age, gender, annual income, and spending score. The spending score is a value you assign to each customer based on parameters you define, such as customer behavior and purchasing data.

By the end of this case study, you will be able to answer the following questions:

  1- How can you achieve customer segmentation with a machine learning algorithm (hierarchical clustering) in Python in the simplest way?
  2- Who are your target customers, with whom you can start a marketing strategy?

The data contains the following columns:

  • CustomerID: Unique ID assigned to the customer
  • Gender: Gender of the customer
  • Age: Age of the customer
  • Annual Income (k$): Annual income of the customer, in thousands of dollars
  • Spending Score (1-100): Score assigned by the mall based on customer behavior and spending nature
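Because the rest of the analysis selects columns by name and position, it can help to confirm that a loaded file actually matches this description. Below is a minimal sketch, assuming the column names listed above; the function name `check_mall_schema` and the sample rows are hypothetical, made up for illustration rather than taken from the real dataset:

```python
import pandas as pd

# Expected columns, taken from the data description above
EXPECTED_COLUMNS = ['CustomerID', 'Gender', 'Age',
                    'Annual Income (k$)', 'Spending Score (1-100)']

def check_mall_schema(df: pd.DataFrame) -> list:
    """Hypothetical helper: return a list of problems; an empty list means the frame looks as described."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f'missing columns: {missing}')
        return problems
    if not df['CustomerID'].is_unique:
        problems.append('CustomerID is not unique')
    # The score is documented as lying in 1-100 (both ends inclusive)
    if not df['Spending Score (1-100)'].between(1, 100).all():
        problems.append('Spending Score outside 1-100')
    return problems

# Hypothetical sample rows for illustration only (not the real dataset)
sample = pd.DataFrame({
    'CustomerID': [1, 2, 3],
    'Gender': ['Male', 'Female', 'Female'],
    'Age': [19, 35, 52],
    'Annual Income (k$)': [15, 60, 120],
    'Spending Score (1-100)': [39, 81, 20],
})
print(check_mall_schema(sample))  # → []
```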

Let’s get our environment ready with the libraries we’ll need, and then import the data!

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')
# Jupyter/IPython magic (not plain Python): draw plots inside the notebook
%matplotlib inline

Check out the Data

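# The path below is the author's local copy of Mall_Customers.csv; adjust it to your own location
df = pd.read_csv('~/DataSet GitHub/Hierarchical Clustering/Mall_Customers.csv')
print(df.head(7))   # first seven rows
df.info()           # column types and non-null counts (info() prints directly)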
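`df.info()` already reports non-null counts per column; the sketch below makes the missing-value check explicit and adds summary statistics. It runs on a small hypothetical frame with one deliberately missing age, standing in for the real `Mall_Customers.csv`:

```python
import pandas as pd

# Hypothetical sample with one missing age (not the real dataset)
df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Gender': ['Male', 'Female', 'Female', 'Male'],
    'Age': [19, 35, None, 52],
    'Annual Income (k$)': [15, 60, 75, 120],
    'Spending Score (1-100)': [39, 81, 6, 20],
})

# Count missing values per column
missing = df.isnull().sum()
print(missing)

# Summary statistics; describe() skips the non-numeric Gender column by default
summary = df.drop(columns='CustomerID').describe()
print(summary.loc[['mean', 'min', 'max']])
```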


Exploratory Data Analysis

Plotting the Relationship between Age, Annual Income and Spending Score

sns.set_palette("Set1", 10)
plt.figure(1, figsize = (15, 7))
features = ['Age', 'Annual Income (k$)', 'Spending Score (1-100)']
n = 0
# 3 x 3 grid: one regression plot for every (x, y) pair of the numeric features
for x in features:
    for y in features:
        n += 1
        plt.subplot(3, 3, n)
        plt.subplots_adjust(hspace = 0.5, wspace = 0.5)
        sns.regplot(x = x, y = y, data = df)
        # Shorten long labels to their first two words, e.g. 'Annual Income'
        plt.ylabel(' '.join(y.split()[:2]))
plt.show()
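The regression lines give a visual impression of each pairwise relationship; a Pearson correlation matrix puts a number on the same pairs. A sketch on hypothetical values (not the real dataset); with the real data the equivalent call would be `df[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']].corr()`:

```python
import pandas as pd

# Hypothetical sample rows (not the real dataset)
df = pd.DataFrame({
    'Age': [19, 21, 20, 23, 31, 22, 35, 23, 64, 30],
    'Annual Income (k$)': [15, 15, 16, 16, 17, 17, 18, 18, 19, 19],
    'Spending Score (1-100)': [39, 81, 6, 77, 40, 76, 6, 94, 3, 72],
})

# Pearson correlation for every pair of the three numeric features;
# values near 0 correspond to a roughly flat regression line in the grid above
corr = df.corr()
print(corr.round(2))
```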

Let’s visualise the frequency of each gender in the dataset

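plt.figure(figsize = (8, 8))
sns.set_palette("Set2", 7)
# Pass the column name via x=; recent seaborn versions no longer accept a bare Series as the first argument
ax = sns.countplot(x = 'Gender', data = df)
plt.show()
print(df['Gender'].value_counts())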
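Unpacking `value_counts()` positionally into two variables depends on the order of the result, which follows frequency rather than label. Reading counts by label avoids that, and `normalize=True` gives the shares directly. A sketch on a hypothetical gender column (not the real dataset):

```python
import pandas as pd

# Hypothetical gender column (not the real dataset)
gender = pd.Series(['Female', 'Male', 'Female', 'Female', 'Male'], name='Gender')

# Absolute counts, then the same counts as shares of all customers
counts = gender.value_counts()
shares = gender.value_counts(normalize=True)
print(counts)
print((shares * 100).round(1))  # percentage of customers per gender
```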

Let’s visualise the distribution of annual income and age of the customers

sns.set(style = 'whitegrid')
plt.rcParams['figure.figsize'] = (18, 8)

plt.subplot(1, 2, 1)
# histplot replaces the deprecated distplot; its default y-axis is a count,
# and kde=True overlays a smoothed density curve
sns.histplot(df['Annual Income (k$)'], kde = True)
plt.title('Distribution of Annual Income', fontsize = 20)
plt.xlabel('Range of Annual Income')
plt.ylabel('Count')

plt.subplot(1, 2, 2)
sns.histplot(df['Age'], kde = True, color = 'red')
plt.title('Distribution of Age', fontsize = 20)
plt.xlabel('Range of Age')
plt.ylabel('Count')
plt.show()
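To turn the age distribution into a table, ages can be grouped into bands with `pd.cut`. The band edges below are an arbitrary choice for illustration, and the ages are hypothetical, not taken from the dataset:

```python
import pandas as pd

# Hypothetical ages (not the real dataset)
ages = pd.Series([19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, 35, 58, 24, 37])

# Illustrative age bands; intervals are right-inclusive: (17, 25], (25, 35], ...
bands = pd.cut(ages, bins=[17, 25, 35, 45, 55, 70],
               labels=['18-25', '26-35', '36-45', '46-55', '56-70'])

# Counts per band, in band order (empty bands are kept with a count of 0)
band_counts = bands.value_counts().sort_index()
print(band_counts)
```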

Let’s also compare age and annual income for each gender

plt.figure(1, figsize = (15, 8))
for gender in ['Male', 'Female']:
    # One semi-transparent scatter layer per gender
    plt.scatter(x = 'Age', y = 'Annual Income (k$)', data = df[df['Gender'] == gender],
                s = 200, alpha = 0.5, label = gender)
plt.xlabel('Age')
plt.ylabel('Annual Income (k$)')
plt.title('Age vs Annual Income by Gender')
plt.legend()
plt.show()
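The scatter plot can also be summarised numerically with per-gender means. A sketch on hypothetical rows (not the real dataset); on the real data the same `groupby` call applies to `df`:

```python
import pandas as pd

# Hypothetical sample rows (not the real dataset)
df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [19, 35, 23, 64, 31],
    'Annual Income (k$)': [15, 60, 18, 19, 40],
})

# Average age and income per gender, complementing the scatter plot
profile = df.groupby('Gender')[['Age', 'Annual Income (k$)']].mean()
print(profile.round(1))
```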

Let’s look at the pair plot of the dataset

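sns.set_palette("pastel", 40)
# pairplot builds its own figure (its height= argument sizes each panel), so a preceding
# plt.figure() would only leave an empty figure; CustomerID is dropped because an ID says nothing about the customer
sns.pairplot(df.drop(columns = 'CustomerID'))
plt.show()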


Hierarchical Clustering

In the clustering stage, we only need the customers’ annual income and spending score

# Columns 3 and 4 are 'Annual Income (k$)' and 'Spending Score (1-100)'
X = df.iloc[:, [3, 4]].values

Using the dendrogram to find the optimal number of clusters

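import scipy.cluster.hierarchy as sch
plt.figure(figsize = (15, 10))
# Ward linkage merges, at each step, the pair of clusters whose union least increases within-cluster variance
dendrogram = sch.dendrogram(sch.linkage(X, method = 'ward'))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Ward Distance (Euclidean)')
plt.show()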
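Reading the dendrogram usually means finding the tallest vertical stretch that no horizontal merge line crosses, and cutting there. A common numerical version of this heuristic is to cut inside the largest gap between successive merge heights, then let `scipy.cluster.hierarchy.fcluster` assign the labels. A sketch on hypothetical, well-separated points (the gap rule is a heuristic, not a guarantee of the "optimal" number):

```python
import numpy as np
import scipy.cluster.hierarchy as sch

# Hypothetical 2-D points (income-like, score-like) forming three tight, well-separated groups
X = np.array([[10, 10], [11, 10], [10, 11],
              [110, 10], [111, 10], [110, 11],
              [60, 97], [61, 97], [60, 98]], dtype=float)

Z = sch.linkage(X, method='ward')
heights = Z[:, 2]  # merge heights; non-decreasing for Ward linkage

# Heuristic: cut inside the largest gap between successive merge heights
gaps = np.diff(heights)
i = int(np.argmax(gaps))
n_clusters = len(X) - (i + 1)  # clusters remaining after the first i + 1 merges

# Assign flat cluster labels (1-based) so that at most n_clusters clusters remain
labels = sch.fcluster(Z, t=n_clusters, criterion='maxclust')
print(n_clusters, labels)
```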

Fitting Hierarchical Clustering to the mall dataset

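from sklearn.cluster import AgglomerativeClustering
# 5 clusters, read off the dendrogram; 'affinity' was renamed 'metric' in scikit-learn 1.2
# and removed in 1.4. Ward linkage only works with Euclidean distance.
hc = AgglomerativeClustering(n_clusters = 5, metric = 'euclidean', linkage = 'ward')
y_hc = hc.fit_predict(X)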
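Ward linkage merges, at each step, the two clusters whose union least increases the total within-cluster sum of squares. For clusters A and B with sizes |A|, |B| and centroids c_A, c_B, that increase is |A||B| / (|A| + |B|) · ||c_A - c_B||². The sketch below computes it directly; the function name `ward_cost` is hypothetical. As far as we can tell, SciPy reports Ward merge heights as sqrt(2 × cost), and the three-point example is consistent with that:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def ward_cost(A: np.ndarray, B: np.ndarray) -> float:
    """Hypothetical helper: increase in total within-cluster sum of squares if A and B are merged."""
    nA, nB = len(A), len(B)
    diff = A.mean(axis=0) - B.mean(axis=0)
    return nA * nB / (nA + nB) * float(diff @ diff)

# Hypothetical points on a line: two close together, one far away
pts = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])

Z = linkage(pts, method='ward')
# Second merge joins the cluster {pts[0], pts[1]} with pts[2]
cost = ward_cost(pts[:2], pts[2:])
print(Z[:, 2], np.sqrt(2 * cost))  # SciPy heights vs sqrt(2 * cost) of the second merge
```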

Visualising the Clusters

plt.figure(figsize = (15, 10))
colors = ['red', 'blue', 'green', 'cyan', 'magenta']
for k, color in enumerate(colors):
    # Points assigned to cluster k (labels are 0-based; the legend counts from 1)
    plt.scatter(X[y_hc == k, 0], X[y_hc == k, 1], s = 100, c = color, label = f'Cluster {k + 1}')
plt.title('Clusters of customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()
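The plot answers the second question only visually. To pick target customers, it helps to profile each cluster by its mean income, mean spending score, and size. A sketch on hypothetical, clearly separated rows (not the real dataset); the ranking rule (income + score) is an illustrative assumption, not the author's criterion, and on the real data you would build the frame from `X` and `y_hc` after fitting with `n_clusters=5`:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

# Hypothetical (income, score) rows standing in for X; not the real dataset
X = np.array([[20, 80], [22, 85], [25, 78],      # low income, high spending
              [20, 10], [23, 15], [26, 8],       # low income, low spending
              [100, 85], [105, 90], [110, 82],   # high income, high spending
              [100, 12], [108, 15], [112, 10]])  # high income, low spending

y_hc = AgglomerativeClustering(n_clusters=4, linkage='ward').fit_predict(X)

# Mean income, mean spending score, and size of each cluster
profile = (pd.DataFrame(X, columns=['Annual Income (k$)', 'Spending Score (1-100)'])
             .assign(cluster=y_hc)
             .groupby('cluster')
             .agg(income=('Annual Income (k$)', 'mean'),
                  score=('Spending Score (1-100)', 'mean'),
                  size=('Spending Score (1-100)', 'count')))

# Hypothetical ranking rule: the cluster with the highest income + score is the target group
target = (profile['income'] + profile['score']).idxmax()
print(profile.round(1))
print('target cluster:', target)
```

Both features share a similar 0-100-ish range here, which is why simply adding them is tolerable in this sketch; with features on very different scales, a weighted or standardized score would be a more sensible ranking.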