Credit Card Clustering

In this project, we implement customer segmentation based on credit card usage behavior using two approaches: K-means and hierarchical clustering.

The data contains the following columns:

  • CUST_ID : Identification of credit card holder (Categorical) 
  • BALANCE : Balance amount left in their account to make purchases
  • BALANCE_FREQUENCY : How frequently the Balance is updated, score between 0 and 1 (1 = frequently updated, 0 = not frequently updated) 
  • PURCHASES : Amount of purchases made from account 
  • ONEOFF_PURCHASES : Maximum purchase amount done in one-go 
  • INSTALLMENTS_PURCHASES : Amount of purchase done in installment 
  • CASH_ADVANCE : Cash in advance given by the user
  • PURCHASES_FREQUENCY : How frequently the purchases are being made, score between 0 and 1 (1 = frequently purchased, 0 = not frequently purchased) 
  • ONEOFFPURCHASESFREQUENCY : How frequently purchases are happening in one-go (1 = frequently purchased, 0 = not frequently purchased) 
  • PURCHASESINSTALLMENTSFREQUENCY : How frequently purchases in installments are being done (1 = frequently done, 0 = not frequently done) 
  • CASHADVANCEFREQUENCY : How frequently the cash in advance is being paid
  • CASHADVANCETRX : Number of transactions made with “Cash in Advance”
  • PURCHASES_TRX : Number of purchase transactions made 
  • CREDIT_LIMIT : Limit of credit card for user 
  • PAYMENTS : Amount of Payment done by user
  • MINIMUM_PAYMENTS : Minimum amount of payments made by user 
  • PRCFULLPAYMENT : Percent of full payment paid by user
  • TENURE : Tenure of credit card service for user


Let’s get our environment ready with the libraries we’ll need and then import the data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.style.use('ggplot')

Check out the Data

df = pd.read_csv('~/DataSet GitHub/K-Means/CC GENERAL.csv')

Visualise the missing values

import missingno as msno
# Assumed call: the matrix view shows where values are missing in each column
msno.matrix(df)

Fill the NaN values with the mean of each column
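
The code for this step is not shown here, so the following is only a minimal sketch of mean imputation with pandas. The small `demo` frame is a made-up stand-in for `df`; its column names are borrowed from the list above, and which columns actually contain NaN values is not stated in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df with a few missing values
demo = pd.DataFrame({
    "CREDIT_LIMIT": [1000.0, np.nan, 3000.0],
    "MINIMUM_PAYMENTS": [100.0, 200.0, np.nan],
})

# Replace each NaN with the mean of its own column
demo_filled = demo.fillna(demo.mean(numeric_only=True))

print(demo_filled)
```

On the real data the equivalent call would be `df = df.fillna(df.mean(numeric_only=True))`. Passing `numeric_only=True` keeps the text column CUST_ID out of the mean calculation if it has not been dropped yet.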


We don’t need customer id data for clustering.

df.drop(["CUST_ID"], axis = 1, inplace = True)

Let’s visualise the Correlation Map

f,ax = plt.subplots(figsize=(15, 15))
sns.heatmap(df.corr(), annot=True, linewidths=.5, fmt= '.1f',ax=ax)


Feature Scaling

from sklearn.preprocessing import StandardScaler
standardscaler = StandardScaler()
X = standardscaler.fit_transform(df)
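
To see what the scaler does to each column, here is a small sketch that standardises a toy matrix by hand with NumPy. The `toy` array is made up for illustration. The formula (subtract the column mean, divide by the column standard deviation) is the standard z-score that StandardScaler applies with its default settings.

```python
import numpy as np

# Hypothetical toy matrix: 4 customers, 2 features on very different scales
toy = np.array([[1000.0, 0.1],
                [2000.0, 0.5],
                [3000.0, 0.9],
                [6000.0, 0.5]])

# Standardise column by column: subtract the mean, divide by the standard deviation.
# np.std defaults to the population standard deviation (ddof=0),
# which is also what StandardScaler uses.
toy_scaled = (toy - toy.mean(axis=0)) / toy.std(axis=0)

print(toy_scaled.mean(axis=0))  # close to 0 for each column
print(toy_scaled.std(axis=0))   # close to 1 for each column
```

Without this step, a feature measured in thousands (such as a balance) would dominate the Euclidean distances used by both clustering methods, while the 0 to 1 frequency scores would barely matter.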

K-Means Clustering

Let’s use the elbow method to find the optimal number of clusters

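from sklearn.cluster import KMeans
wcss = []
for i in range(1, 15):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)
    kmeans.fit(X)
    # inertia_ is the within-cluster sum of squares (WCSS) of the fitted model
    wcss.append(kmeans.inertia_)
plt.plot(range(1, 15), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()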
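
The quantity plotted on the elbow curve is the within-cluster sum of squares (WCSS): the sum of squared distances from each point to the centre of its cluster. As a rough illustration, the sketch below computes it by hand with NumPy. The helper `wcss_for` and the four toy points are hypothetical and are not part of scikit-learn or of the dataset.

```python
import numpy as np

def wcss_for(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    total = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        centroid = members.mean(axis=0)
        total += ((members - centroid) ** 2).sum()
    return total

# Two tight pairs of points that are far apart from each other
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
one_cluster = np.array([0, 0, 0, 0])
two_clusters = np.array([0, 0, 1, 1])

print(wcss_for(points, one_cluster))   # 104.0
print(wcss_for(points, two_clusters))  # 4.0
```

Splitting the points into their two natural groups drops the WCSS from 104 to 4. Adding more clusters beyond that can only shave off a little more, and that flattening is the "elbow" we look for in the plot.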

Fitting K-Means to the dataset

The elbow starts at around 8 clusters, so we use k = 8

kmeans = KMeans(n_clusters = 8, init = 'k-means++', random_state = 42)
y_kmeans = kmeans.fit_predict(X)

Let’s see the cluster for each customer

y_kmeans = kmeans.predict(X)
df["cluster"] = y_kmeans
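
Once every customer carries a cluster label, a quick way to read the segments is to count the customers in each cluster and average the features within each one. The sketch below does this on a made-up `demo` frame that stands in for `df`; the values are invented for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for df after the "cluster" column has been added
demo = pd.DataFrame({
    "BALANCE":   [100.0, 120.0, 5000.0, 5200.0, 90.0],
    "PURCHASES": [50.0, 70.0, 900.0, 1100.0, 60.0],
    "cluster":   [0, 0, 1, 1, 0],
})

# How many customers fall into each cluster
sizes = demo["cluster"].value_counts().sort_index()

# Average value of every feature within each cluster
profile = demo.groupby("cluster").mean()

print(sizes)
print(profile)
```

On the real data the same two calls would be `df["cluster"].value_counts()` and `df.groupby("cluster").mean()`.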

Plot data after k = 8 clustering on important columns

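# best_cols (the list of columns to plot) is not defined in this post; set it before running this cell
kmeans = KMeans(n_clusters=8, init="k-means++", n_init=10, max_iter=300)
best_vals = df[best_cols].iloc[ :, 1:].values
y_pred = kmeans.fit_predict( best_vals )

df["cluster"] = y_pred
# For hue="cluster" to work, best_cols must include the "cluster" column
sns.pairplot( df[ best_cols ], hue="cluster")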


Hierarchical Clustering

Let’s use the dendrogram to find the optimal number of clusters

import scipy.cluster.hierarchy as sch
dendrogram = sch.dendrogram(sch.linkage(X, method = 'ward'))
plt.ylabel('Euclidean distances')

Fitting Hierarchical Clustering to the credit card usage dataset

from sklearn.cluster import AgglomerativeClustering
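# Ward linkage always uses Euclidean distance, so no distance argument is needed
# (the affinity parameter has been removed from recent scikit-learn releases)
hc = AgglomerativeClustering(n_clusters = 4, linkage = 'ward')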

Let’s see the cluster for each customer

y_hc = hc.fit_predict(X)
df["cluster"] = y_hc
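
Note that assigning to `df["cluster"]` again overwrites the K-means labels. If you want to compare the two approaches, one option is to keep both label arrays and cross-tabulate them. The sketch below uses short made-up arrays in place of `y_kmeans` and `y_hc`.

```python
import numpy as np
import pandas as pd

# Hypothetical label arrays standing in for y_kmeans and y_hc
y_kmeans_demo = np.array([0, 0, 1, 1, 2, 2])
y_hc_demo = np.array([1, 1, 0, 0, 0, 0])

# Rows: K-means clusters, columns: hierarchical clusters, cells: customer counts
overlap = pd.crosstab(
    pd.Series(y_kmeans_demo, name="kmeans"),
    pd.Series(y_hc_demo, name="hierarchical"),
)

print(overlap)
```

Each row shows how the customers of one K-means cluster are spread over the hierarchical clusters. The label numbers themselves are arbitrary, so only the grouping matters, not whether the numbers match.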