In this project, we implement customer segmentation based on credit card usage behavior using two different approaches: K-means and hierarchical clustering.
The data contains the following columns:
- CUST_ID: Identification of the credit card holder (categorical)
- BALANCE: Balance amount left in the account to make purchases
- BALANCE_FREQUENCY: How frequently the balance is updated, a score between 0 and 1 (1 = frequently updated, 0 = not frequently updated)
- PURCHASES: Amount of purchases made from the account
- ONEOFF_PURCHASES: Maximum purchase amount done in one go
- INSTALLMENTS_PURCHASES: Amount of purchases done in installments
- CASH_ADVANCE: Cash in advance given by the user
- PURCHASES_FREQUENCY: How frequently purchases are made, a score between 0 and 1 (1 = frequently purchased, 0 = not frequently purchased)
- ONEOFF_PURCHASES_FREQUENCY: How frequently purchases happen in one go (1 = frequently purchased, 0 = not frequently purchased)
- PURCHASES_INSTALLMENTS_FREQUENCY: How frequently purchases in installments are made (1 = frequently done, 0 = not frequently done)
- CASH_ADVANCE_FREQUENCY: How frequently cash in advance is paid
- CASH_ADVANCE_TRX: Number of transactions made with "Cash in Advance"
- PURCHASES_TRX: Number of purchase transactions made
- CREDIT_LIMIT: Credit card limit for the user
- PAYMENTS: Amount of payments made by the user
- MINIMUM_PAYMENTS: Minimum amount of payments made by the user
- PRC_FULL_PAYMENT: Percent of full payment paid by the user
- TENURE: Tenure of the credit card service for the user
Let’s get our environment ready with the libraries we’ll need, and then import the data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.style.use('ggplot')
Check out the Data
df = pd.read_csv('~/DataSet GitHub/K-Means/CC GENERAL.csv')
df.head()
Visualise the missing values
import missingno as msno
msno.matrix(df)
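Fill the NaN values with the mean of each column:
# numeric_only=True skips the text column CUST_ID, which is still present at this point
df.fillna(df.mean(numeric_only=True), inplace=True)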
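Mean imputation replaces every missing entry in a column with the average of that column's non-missing entries. pandas computes this mean while skipping NaN by default (`skipna=True`). The sketch below shows the idea on a plain list. `mean_impute` is a hypothetical helper, not a pandas API, and the numbers are made up.

```python
def mean_impute(column):
    """Replace None entries with the mean of the non-missing entries."""
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present)  # mean over observed values only
    return [mean if v is None else v for v in column]

# Made-up CREDIT_LIMIT-style column with two missing values
credit_limit = [1000.0, None, 3000.0, None, 2000.0]
print(mean_impute(credit_limit))  # [1000.0, 2000.0, 3000.0, 2000.0, 2000.0]
```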
We don’t need customer id data for clustering.
df.drop(["CUST_ID"], axis=1, inplace=True)
df.head()
Let’s visualise the Correlation Map
f, ax = plt.subplots(figsize=(15, 15))
sns.heatmap(df.corr(), annot=True, linewidths=.5, fmt='.1f', ax=ax)
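By default, `df.corr()` computes the Pearson correlation coefficient for every pair of columns. That coefficient is the covariance divided by the product of the two standard deviations. Here is a minimal from-scratch sketch for two columns. `pearson` is a hypothetical helper, and the values are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # the 1/n factors cancel

# Made-up values: purchase amount rising with the number of purchase transactions
purchases = [100.0, 200.0, 300.0, 400.0]
purchases_trx = [1, 2, 3, 5]
print(round(pearson(purchases, purchases_trx), 3))  # 0.983
```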
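from sklearn.preprocessing import StandardScaler
# Scale every feature to zero mean and unit variance so that large-valued columns
# (e.g. BALANCE) do not dominate the Euclidean distances used by the clustering
standardscaler = StandardScaler()
X = standardscaler.fit_transform(df)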
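`StandardScaler` applies a z-score to each column: it subtracts the column mean and divides by the population standard deviation (ddof=0). The following minimal sketch does the same for a single column. `standardize` is a hypothetical helper and the balances are made up. As in scikit-learn, a constant column gets a scale of 1, so it becomes all zeros and no division error is raised.

```python
import math

def standardize(column):
    """Z-score a list: subtract the mean, divide by the population std (ddof=0)."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if std == 0:
        std = 1.0  # constant column: keep scale 1 to avoid dividing by zero
    return [(v - mean) / std for v in column]

balance = [10.0, 20.0, 30.0]  # made-up values
print(standardize(balance))
```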
Let’s use the elbow method to find the optimal number of clusters.
plt.figure(figsize=(10, 6))
from sklearn.cluster import KMeans
wcss = []  # within-cluster sum of squares (inertia) for each k
for i in range(1, 15):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
plt.plot(range(1, 15), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
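The elbow curve plots `inertia_`, which is the within-cluster sum of squares (WCSS). WCSS adds up the squared Euclidean distance from each point to the centroid of its assigned cluster. The tiny sketch below computes it by hand. `wcss_of` is a hypothetical helper, and the four points are made up.

```python
def wcss_of(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    total = 0.0
    for p, k in zip(points, labels):
        total += sum((a - b) ** 2 for a, b in zip(p, centroids[k]))
    return total

# Two made-up clusters; every point is at distance 1 from its centroid
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 1.0), (10.0, 1.0)]
print(wcss_of(points, labels, centroids))  # 4.0
```

WCSS never increases as k grows, so we look for the point where adding clusters stops paying off, not for its minimum.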
Fitting K-Means to the dataset
The elbow appears at around k = 8, so we fit K-means with 8 clusters.
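Reading the elbow off a plot is a judgment call, and the choice above was made by eye. One common heuristic, which is not the method used here, picks the k whose point lies farthest from the straight line joining the first and last points of the curve, after both axes are scaled to [0, 1]. This is a sketch under those assumptions. `elbow_by_max_distance` is a hypothetical helper, and the WCSS values are made up.

```python
def elbow_by_max_distance(ks, scores):
    """Return the k whose (k, score) point is farthest from the chord joining
    the first and last points, with both axes min-max scaled to [0, 1]."""
    k0, k1 = ks[0], ks[-1]
    s_min, s_max = min(scores), max(scores)
    xs = [(k - k0) / (k1 - k0) for k in ks]
    ys = [(s - s_min) / (s_max - s_min) for s in scores]
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    best_k, best_d = ks[0], -1.0
    for k, x, y in zip(ks, xs, ys):
        # Distance to the chord up to a constant factor (the chord length),
        # which does not change which point is farthest
        d = abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0)
        if d > best_d:
            best_k, best_d = k, d
    return best_k

ks = [1, 2, 3, 4, 5, 6]
scores = [100, 40, 20, 15, 12, 10]  # made-up, decreasing WCSS values
print(elbow_by_max_distance(ks, scores))  # 3
```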
kmeans = KMeans(n_clusters=8, init='k-means++', random_state=42)
y_kmeans = kmeans.fit_predict(X)
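`init='k-means++'` only controls how the starting centroids are picked. The clustering itself alternates between two steps: an assignment step and a centroid-update step. The following is a minimal from-scratch sketch of k-means++ seeding followed by Lloyd's iterations on made-up 2-D points. It is not scikit-learn's implementation, which is heavily optimized and can run several initializations (`n_init`), keeping the one with the lowest inertia. All function names here are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Assignment step: index of the nearest centre for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: first centre uniformly at random, each later centre
    sampled with probability proportional to its squared distance to the
    nearest centre chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def kmeans(points, k, n_iter=100, seed=42):
    """Lloyd's algorithm with k-means++ seeding. Returns (labels, centres)."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    for _ in range(n_iter):
        labels = assign(points, centers)
        # Update step: move each centre to the mean of its member points
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # leave an empty cluster's centre in place
        if new_centers == centers:
            break  # converged: centres stopped moving
        centers = new_centers
    return assign(points, centers), centers

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(points, k=2)
print(labels)
```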
Let’s see the cluster for each customer
y_kmeans = kmeans.predict(X)
df["cluster"] = y_kmeans
df.head()
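To interpret the segments, a usual next step is to compare feature averages per cluster. In pandas that is `df.groupby("cluster").mean()`. The minimal sketch below does the same for a single feature. `cluster_means` is a hypothetical helper, and the balances and labels are made up.

```python
from collections import defaultdict

def cluster_means(values, labels):
    """Mean of one feature within each cluster label."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, lab in zip(values, labels):
        sums[lab] += v
        counts[lab] += 1
    return {lab: sums[lab] / counts[lab] for lab in sorted(counts)}

balance = [100.0, 300.0, 5000.0, 7000.0]  # made-up BALANCE values
labels = [0, 0, 1, 1]
print(cluster_means(balance, labels))  # {0: 200.0, 1: 6000.0}
```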
Re-cluster with k = 8 on six key monetary columns and plot the result
best_cols = ["BALANCE", "PURCHASES", "CASH_ADVANCE", "CREDIT_LIMIT", "PAYMENTS", "MINIMUM_PAYMENTS"]
kmeans = KMeans(n_clusters=8, init="k-means++", n_init=10, max_iter=300)
# Fit on the raw (unscaled) values of all six selected columns
best_vals = df[best_cols].values
y_pred = kmeans.fit_predict(best_vals)
df["cluster"] = y_pred
best_cols.append("cluster")
sns.pairplot(df[best_cols], hue="cluster")
Let’s use the dendrogram to find the optimal number of clusters.
plt.figure(figsize=(10, 6))
import scipy.cluster.hierarchy as sch
dendrogram = sch.dendrogram(sch.linkage(X, method='ward'))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean distances')
plt.show()
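With `method='ward'`, the tree is built by repeatedly merging the pair of clusters whose union increases the total within-cluster sum of squares the least. Take clusters A and B with n_A and n_B points and centroids μ_A and μ_B. The increase (Ward's criterion) can be written using only the sizes and centroids:

```latex
$$
\Delta(A, B)
= \sum_{x \in A \cup B} \lVert x - \mu_{A \cup B} \rVert^2
- \sum_{x \in A} \lVert x - \mu_A \rVert^2
- \sum_{x \in B} \lVert x - \mu_B \rVert^2
= \frac{n_A n_B}{n_A + n_B} \, \lVert \mu_A - \mu_B \rVert^2
$$
```

For example, A = {(0, 0), (0, 2)} and B = {(4, 1)} have μ_A = (0, 1) and μ_B = (4, 1), so Δ = (2·1/3)·16 = 32/3.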
Fitting Hierarchical Clustering to the credit card usage dataset
from sklearn.cluster import AgglomerativeClustering
# Ward linkage always uses Euclidean distance, which is the default metric
hc = AgglomerativeClustering(n_clusters=4, linkage='ward')
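Agglomerative clustering starts with every customer as its own cluster. It then repeatedly merges the cheapest pair under the linkage criterion until `n_clusters` clusters remain. The sketch below is a naive version of that loop using Ward's merge cost on made-up points. It scans all pairs at every step, so it is only suitable for toy data; scikit-learn uses far more efficient algorithms. `ward_agglomerative` is a hypothetical name.

```python
def ward_agglomerative(points, n_clusters):
    """Naive Ward agglomerative clustering: start from singletons and keep
    merging the pair of clusters with the smallest Ward merge cost."""
    clusters = [[i] for i in range(len(points))]  # each cluster holds point indices
    dim = len(points[0])

    def centroid(idx):
        return [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]

    def ward_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged
        ca, cb = centroid(a), centroid(b)
        sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected

    labels = [0] * len(points)
    for label, idx in enumerate(clusters):
        for p in idx:
            labels[p] = label
    return labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(ward_agglomerative(points, n_clusters=2))
```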
Let’s see the cluster for each customer
y_hc = hc.fit_predict(X)
df["cluster"] = y_hc
df.head()
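Since `df["cluster"]` is overwritten at each step, comparing the two segmentations means keeping both label arrays, such as `y_kmeans` and `y_hc`. `pd.crosstab(y_kmeans, y_hc)` then shows how the K-means clusters spread over the hierarchical ones. The minimal sketch below builds the same kind of contingency table with the standard library. `contingency` is a hypothetical helper, and the labels are made up.

```python
from collections import Counter

def contingency(labels_a, labels_b):
    """Count samples in each (cluster in A, cluster in B) pair."""
    return Counter(zip(labels_a, labels_b))

kmeans_labels = [0, 0, 1, 1, 2, 2]  # made-up labels
hier_labels = [0, 0, 0, 0, 1, 1]
table = contingency(kmeans_labels, hier_labels)
print(sorted(table.items()))  # [((0, 0), 2), ((1, 0), 2), ((2, 1), 2)]
```

Here, for instance, K-means clusters 0 and 1 both fall inside hierarchical cluster 0.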