In this project, we are going to implement customer segmentation based on credit card usage behavior with two different approaches (K-means and Hierarchical Clustering).
The data contains the following columns:
CUST_ID : Identification of credit card holder (Categorical)
BALANCE : Balance amount left in their account to make purchases
BALANCE_FREQUENCY : How frequently the balance is updated, score between 0 and 1 (1 = frequently updated, 0 = not frequently updated)
PURCHASES : Amount of purchases made from the account
ONEOFF_PURCHASES : Maximum purchase amount done in one go
INSTALLMENTS_PURCHASES : Amount of purchases done in installments
CASH_ADVANCE : Cash in advance given by the user
PURCHASES_FREQUENCY : How frequently the purchases are being made, score between 0 and 1 (1 = frequently purchased, 0 = not frequently purchased)
ONEOFF_PURCHASES_FREQUENCY : How frequently purchases are happening in one go (1 = frequently purchased, 0 = not frequently purchased)
PURCHASES_INSTALLMENTS_FREQUENCY : How frequently purchases in installments are being done (1 = frequently done, 0 = not frequently done)
CASH_ADVANCE_FREQUENCY : How frequently the cash in advance is being paid
CASH_ADVANCE_TRX : Number of transactions made with “cash in advance”
PURCHASES_TRX : Number of purchase transactions made
CREDIT_LIMIT : Credit limit of the card for the user
PAYMENTS : Amount of payments made by the user
MINIMUM_PAYMENTS : Minimum amount of payments made by the user
PRC_FULL_PAYMENT : Percent of full payment paid by the user
TENURE : Tenure of the credit card service for the user
Let’s get our environment ready with the libraries we’ll need and then import the data!
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.style.use('ggplot')
Check out the Data
df = pd.read_csv('~/DataSet GitHub/K-Means/CC GENERAL.csv')
df.head()

df.info()

Visualise the missing values
import missingno as msno
msno.matrix(df)
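The matrix gives a visual overview; a quick numeric check with plain pandas (a small addition, not in the original post) shows exactly which columns have missing values and how many:
missing = df.isnull().sum()
print(missing[missing > 0])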

Fill the NaN values with the mean of each column
df = df.fillna(df.mean(numeric_only=True))  # numeric_only skips the non-numeric CUST_ID column
df.info()

We don’t need the customer ID column for clustering.
df.drop(["CUST_ID"], axis = 1, inplace = True)
df.head()

Let’s visualise the Correlation Map
f,ax = plt.subplots(figsize=(15, 15))
sns.heatmap(df.corr(), annot=True, linewidths=.5, fmt='.1f', ax=ax)
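To read the heatmap numerically, the most strongly correlated feature pairs can be listed with plain pandas and numpy (a small sketch, not part of the original post):
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))  # keep each pair only once
print(upper.unstack().dropna().sort_values(ascending=False).head(10))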

Feature Scaling
K-means and Ward hierarchical clustering are distance-based, so we standardise the features first; otherwise columns with large ranges such as BALANCE or PURCHASES would dominate the distance calculations.
from sklearn.preprocessing import StandardScaler
standardscaler = StandardScaler()
X = standardscaler.fit_transform(df)
K-Means Clustering
Let’s use the elbow method to find the optimal number of clusters: fit K-means for a range of k, plot the within-cluster sum of squares (WCSS) against k, and look for the point where the curve stops dropping sharply.
from sklearn.cluster import KMeans
wcss = []
for i in range(1, 15):
    # fit K-means for each candidate k and record the within-cluster sum of squares (inertia)
    kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
plt.figure(figsize=(10,6))
plt.plot(range(1, 15), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
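As a cross-check on the elbow plot (an optional addition, not part of the original workflow), a silhouette analysis can be run over the same scaled matrix X; higher average scores suggest better-separated clusters. A random subsample keeps the pairwise distance computation cheap:
from sklearn.metrics import silhouette_score
for k in range(2, 11):
    labels = KMeans(n_clusters=k, init='k-means++', random_state=42).fit_predict(X)
    # score on a subsample instead of the full pairwise distance matrix
    print(k, round(silhouette_score(X, labels, sample_size=3000, random_state=42), 3))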

Fitting K-Means to the dataset
The elbow in the WCSS curve appears at around k = 8, so we fit K-means with 8 clusters.
kmeans = KMeans(n_clusters = 8, init = 'k-means++', random_state = 42)
y_kmeans = kmeans.fit_predict(X)
kmeans.cluster_centers_
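The centers above are printed in standardised units; to read them on the original scale of the data, the fitted standardscaler can invert the transform (a small sketch, one row per cluster):
centers = standardscaler.inverse_transform(kmeans.cluster_centers_)
pd.DataFrame(centers, columns=df.columns).round(1)  # cluster centers in the original units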

Let’s see which cluster each customer belongs to
y_kmeans = kmeans.predict(X)  # same labels as returned by fit_predict above
df["cluster"] = y_kmeans
df.head()
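A quick numeric profile of the segments (a small addition, not in the original post): how many customers fall in each cluster and what their average behaviour looks like.
print(df["cluster"].value_counts().sort_index())  # customers per cluster
df.groupby("cluster").mean().round(1)             # average feature values per cluster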

Plot the data after clustering with k = 8 on the most informative columns
best_cols = ["BALANCE", "PURCHASES", "CASH_ADVANCE", "CREDIT_LIMIT", "PAYMENTS", "MINIMUM_PAYMENTS"]
kmeans = KMeans(n_clusters=8, init="k-means++", n_init=10, max_iter=300, random_state=42)
best_vals = df[best_cols].values  # cluster on all six selected columns
y_pred = kmeans.fit_predict(best_vals)
df["cluster"] = y_pred
best_cols.append("cluster")
sns.pairplot(df[best_cols], hue="cluster")
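If the pairplot is slow to render on the full dataset, plotting a random sample is usually enough to see the cluster structure (the sample size here is arbitrary):
sns.pairplot(df[best_cols].sample(2000, random_state=42), hue="cluster")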

Hierarchical Clustering
Let’s use the dendrogram to find the optimal number of clusters
plt.figure(figsize=(10,6))
import scipy.cluster.hierarchy as sch
dendrogram = sch.dendrogram(sch.linkage(X, method = 'ward'))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean distances')
plt.show()
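The dendrogram only shows the merge structure; to turn a cut of it into cluster labels directly with scipy, fcluster can be used (an optional sketch, cutting into 4 clusters to match the choice made below):
Z = sch.linkage(X, method = 'ward')                           # same linkage as the dendrogram
labels_from_cut = sch.fcluster(Z, t=4, criterion='maxclust')  # labels from 1 to 4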

Fitting Hierarchical Clustering to the credit card usage dataset
from sklearn.cluster import AgglomerativeClustering
hc = AgglomerativeClustering(n_clusters = 4, linkage = 'ward')  # Ward linkage uses Euclidean distances by default
Let’s see which cluster each customer belongs to
y_hc = hc.fit_predict(X)
df["cluster"] = y_hc
df.head()
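To see how the two segmentations relate, the k-means labels (y_kmeans, k = 8) can be cross-tabulated against the hierarchical labels (y_hc, k = 4); a minimal sketch:
pd.crosstab(pd.Series(y_kmeans, name="kmeans"), pd.Series(y_hc, name="hierarchical"))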

