# Customer Mall Segmentation

Suppose you own a supermarket mall and, through membership cards, have some basic data about your customers: customer ID, age, gender, annual income and spending score. The spending score is a value you assign to each customer based on parameters you define, such as customer behavior and purchasing data.

By the end of this case study, you will be able to answer the following questions:

1. How can you achieve customer segmentation with a machine learning algorithm (hierarchical clustering) in Python in the simplest way?
2. Who are your target customers, with whom you can start your marketing strategy?

The data contains the following columns:

• CustomerID: Unique ID assigned to the customer
• Gender: Gender of the customer
• Age: Age of the customer
• Annual Income (k\$): Annual Income of the customer
• Spending Score (1-100): Score assigned by the mall based on customer behavior and spending nature

Let’s get our environment ready with the libraries we’ll need and then import the data!

``````import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')
%matplotlib inline``````

Check out the Data

``````df = pd.read_csv('~/DataSet GitHub/Hierarchical Clustering/Mall_Customers.csv')
df.info()``````
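Beyond `df.info()`, a few quick checks help confirm the data is clean before plotting. The sketch below runs them on a tiny stand-in frame called `sample` (a hypothetical name, with made-up rows that only mimic the column layout described above); in the notebook you would run the same calls on `df`.

```python
import pandas as pd

# Made-up rows that only mimic the column layout of the dataset;
# in the notebook, apply the same calls to the `df` loaded above.
sample = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Gender': ['Male', 'Female', 'Female', 'Male'],
    'Age': [25, 34, 41, 52],
    'Annual Income (k$)': [20, 45, 60, 88],
    'Spending Score (1-100)': [35, 70, 50, 15],
})

print(sample.head())                                      # first rows of the table
missing_per_column = sample.isnull().sum()                # missing values per column
duplicate_ids = sample['CustomerID'].duplicated().sum()   # repeated customer IDs
summary = sample.describe()                               # statistics for numeric columns

print(missing_per_column)
print(duplicate_ids)
print(summary)
```

If either the missing-value counts or the duplicate count is non-zero on the real data, it is worth resolving that before clustering.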

### Exploratory Data Analysis

Plotting the relationship between age, annual income and spending score

``````sns.set_palette("Set1", 10)
plt.figure(1, figsize = (15, 7))
features = ['Age', 'Annual Income (k$)', 'Spending Score (1-100)']
n = 0
for x in features:
    for y in features:
        n += 1
        plt.subplot(3, 3, n)
        plt.subplots_adjust(hspace = 0.5, wspace = 0.5)
        sns.regplot(x = x, y = y, data = df)
        # Shorten multi-word column names to their first two words on the y-axis
        plt.ylabel(y.split()[0] + ' ' + y.split()[1] if len(y.split()) > 1 else y)
plt.show()``````
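The regression plots show pairwise relationships visually; a correlation matrix gives the same picture in numbers. The following is a minimal sketch on randomly generated stand-in data (the frame `toy` and its values are hypothetical, not the mall dataset); on the real data you would call `.corr()` on the same three columns of `df`.

```python
import numpy as np
import pandas as pd

# Randomly generated stand-in data, NOT the mall dataset.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'Age': rng.integers(18, 70, size=50),
    'Annual Income (k$)': rng.integers(15, 140, size=50),
    'Spending Score (1-100)': rng.integers(1, 100, size=50),
})

# Pearson correlation between every pair of numeric columns
corr = toy.corr()
print(corr.round(2))
```

Values near 0 indicate little linear relationship between two features, while values near 1 or -1 indicate a strong one.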

Let’s visualise the frequency of each gender in the dataset

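``````plt.figure(figsize = (8, 8))
sns.set_palette("Set2", 7)
# Name the column explicitly with x= and data=
ax = sns.countplot(x = 'Gender', data = df)
# Number of customers per gender, most frequent first
gender_counts = df['Gender'].value_counts()
print(gender_counts)``````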

Let’s visualise the distribution of annual income and ages of customers

``````sns.set(style = 'whitegrid')
plt.rcParams['figure.figsize'] = (18, 8)

# histplot with kde = True replaces the deprecated distplot
plt.subplot(1, 2, 1)
sns.histplot(df['Annual Income (k$)'], kde = True)
plt.title('Distribution of Annual Income', fontsize = 20)
plt.xlabel('Range of Annual Income')
plt.ylabel('Count')

plt.subplot(1, 2, 2)
sns.histplot(df['Age'], color = 'red', kde = True)
plt.title('Distribution of Age', fontsize = 20)
plt.xlabel('Range of Age')
plt.ylabel('Count')
plt.show()``````
Next, let’s plot age against annual income for each gender

``````plt.figure(1, figsize = (15, 8))
for gender in ['Male', 'Female']:
    # One scatter layer per gender so each gets its own colour and legend entry
    plt.scatter(x = 'Age', y = 'Annual Income (k$)', data = df[df['Gender'] == gender],
                s = 200, alpha = 0.5, label = gender)
plt.xlabel('Age')
plt.ylabel('Annual Income (k$)')
plt.title('Age vs Annual Income w.r.t Gender')
plt.legend()
plt.show()``````

Let’s look at the pair plot of the dataset

``````sns.set_palette("pastel", 40)
# pairplot creates its own figure, so no plt.figure() call is needed
sns.pairplot(df)
plt.show()``````

### Hierarchical Clustering

In the clustering stage, we only need the annual income and spending score of the customers, which are the columns at positions 3 and 4.

``X = df.iloc[:,[3,4]].values``
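Selecting columns by position works, but it silently picks the wrong features if the column order ever changes. As a hedged alternative, the sketch below selects the same two columns by name on a small made-up frame (`toy` is a hypothetical stand-in for `df`) and shows that both approaches give the same array.

```python
import pandas as pd

# Made-up rows with the same column order as the dataset described above.
toy = pd.DataFrame({
    'CustomerID': [1, 2, 3],
    'Gender': ['Male', 'Female', 'Female'],
    'Age': [25, 34, 41],
    'Annual Income (k$)': [20, 45, 60],
    'Spending Score (1-100)': [35, 70, 50],
})

# Positional selection, as used in this case study
X_by_position = toy.iloc[:, [3, 4]].values

# Selection by column name, robust to reordered columns
X_by_name = toy[['Annual Income (k$)', 'Spending Score (1-100)']].values

print((X_by_position == X_by_name).all())
```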

Using the dendrogram to find the optimal number of clusters

``````import scipy.cluster.hierarchy as sch
plt.figure(figsize = (15, 10))
dendrogram = sch.dendrogram(sch.linkage(X, method = 'ward'))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean Distances')
plt.show()
``````
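Reading the number of clusters off a dendrogram usually means looking for the longest vertical stretch that no horizontal merge line crosses. One way to approximate that by code is to find the largest jump between consecutive merge distances in the linkage matrix. The sketch below does this on three synthetic, well-separated blobs (the `points` array and the largest-gap rule are illustrative assumptions, not part of the original analysis).

```python
import numpy as np
import scipy.cluster.hierarchy as sch

# Synthetic data: three well-separated blobs of 20 points each (NOT the mall data).
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

Z = sch.linkage(points, method='ward')
heights = Z[:, 2]            # merge distances, non-decreasing for Ward linkage

# The largest jump between consecutive merges marks where to cut the tree.
gaps = np.diff(heights)
i = int(np.argmax(gaps))     # the jump lies between merge i and merge i + 1
k = len(points) - (i + 1)    # number of clusters remaining after merge i

# Flat cluster labels for that number of clusters
labels = sch.fcluster(Z, t=k, criterion='maxclust')
print(k)
```

This heuristic is only a starting point: on real data the gaps are often less clear-cut, so it is worth comparing the suggested value with the dendrogram itself.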

Fitting Hierarchical Clustering to the mall dataset

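``````from sklearn.cluster import AgglomerativeClustering
# Five clusters, matching the five groups plotted below; Ward linkage as in the dendrogram
hc = AgglomerativeClustering(n_clusters = 5, linkage = 'ward')
y_hc = hc.fit_predict(X)``````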

Visualising The Clusters

``````plt.figure(figsize = (15, 10))
plt.scatter(X[y_hc == 0, 0], X[y_hc == 0, 1], s = 100, c = 'red', label = 'Cluster 1')
plt.scatter(X[y_hc == 1, 0], X[y_hc == 1, 1], s = 100, c = 'blue', label = 'Cluster 2')
plt.scatter(X[y_hc == 2, 0], X[y_hc == 2, 1], s = 100, c = 'green', label = 'Cluster 3')
plt.scatter(X[y_hc == 3, 0], X[y_hc == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
plt.scatter(X[y_hc == 4, 0], X[y_hc == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
plt.title('Clusters of customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()``````
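To move from the plot to the question of who the target customers are, it can help to summarise each cluster by its average income and spending score. The following is a rough sketch on synthetic data: the five `centers`, the `toy` frame, and the rule for picking a target cluster (above-median income, then highest spending score) are all illustrative assumptions rather than the method of this case study.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

income = 'Annual Income (k$)'
score = 'Spending Score (1-100)'

# Synthetic (income, score) groups, NOT the mall data.
rng = np.random.default_rng(1)
centers = [(25, 20), (25, 80), (55, 50), (90, 15), (90, 85)]
data = np.vstack([np.array(c) + rng.normal(scale=3.0, size=(15, 2)) for c in centers])
toy = pd.DataFrame(data, columns=[income, score])

# Same model settings as the hierarchical clustering step
toy['Cluster'] = AgglomerativeClustering(n_clusters=5, linkage='ward').fit_predict(data)

# Average income and spending score per cluster, plus cluster size
profile = toy.groupby('Cluster')[[income, score]].mean()
profile['Size'] = toy.groupby('Cluster').size()
print(profile.round(1))

# Hypothetical targeting rule: among above-median-income clusters,
# pick the one with the highest average spending score.
high_income = profile[profile[income] > profile[income].median()]
target = high_income[score].idxmax()
print(target)
```

On the real data, the same table can be built by adding `y_hc` as a `Cluster` column to `df`; which cluster counts as the target depends on the marketing goal.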