Credit Card Clustering

In this project, we segment customers by their credit card usage behavior using two approaches: K-means and hierarchical clustering.

The data contains the following columns:

  • CUST_ID : Identification of the credit card holder (categorical)
  • BALANCE : Balance amount left in the account to make purchases
  • BALANCE_FREQUENCY : How frequently the balance is updated, score between 0 and 1 (1 = frequently updated, 0 = not frequently updated)
  • PURCHASES : Amount of purchases made from the account
  • ONEOFF_PURCHASES : Maximum purchase amount made in one go
  • INSTALLMENTS_PURCHASES : Amount of purchases made in installments
  • CASH_ADVANCE : Cash in advance given by the user
  • PURCHASES_FREQUENCY : How frequently purchases are being made, score between 0 and 1 (1 = frequently purchased, 0 = not frequently purchased)
  • ONEOFF_PURCHASES_FREQUENCY : How frequently purchases are made in one go (1 = frequently purchased, 0 = not frequently purchased)
  • PURCHASES_INSTALLMENTS_FREQUENCY : How frequently purchases in installments are made (1 = frequently done, 0 = not frequently done)
  • CASH_ADVANCE_FREQUENCY : How frequently cash in advance is being paid
  • CASH_ADVANCE_TRX : Number of transactions made with “Cash in Advance”
  • PURCHASES_TRX : Number of purchase transactions made
  • CREDIT_LIMIT : Credit card limit for the user
  • PAYMENTS : Amount of payments made by the user
  • MINIMUM_PAYMENTS : Minimum amount of payments made by the user
  • PRC_FULL_PAYMENT : Percent of full payment paid by the user
  • TENURE : Tenure of the credit card service for the user


Let’s get our environment ready with the libraries we’ll need, then import the data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.style.use('ggplot')

Check out the Data

df = pd.read_csv('~/DataSet GitHub/K-Means/CC GENERAL.csv')
df.head()
df.info()

Visualise the missing values

import missingno as msno
msno.matrix(df)

Fill the NaN values with the mean of each column

# numeric_only=True skips the text column CUST_ID, which has no mean
df = df.fillna(df.mean(numeric_only=True))
df.info()
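
As a small illustration of what mean imputation does, here is a sketch on a made-up four-row frame (the values and the `toy` name are assumptions for the example, not rows from the real dataset). Each missing entry is replaced by the mean of the non-missing values in its own column.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the credit card data (values are made up).
toy = pd.DataFrame({
    "CUST_ID": ["C1", "C2", "C3", "C4"],
    "CREDIT_LIMIT": [1000.0, np.nan, 3000.0, 2000.0],
    "MINIMUM_PAYMENTS": [100.0, 200.0, np.nan, 300.0],
})

# Count the missing values per column before imputing.
missing_before = toy.isnull().sum()

# Column means are computed from the non-missing values only.
col_means = toy.mean(numeric_only=True)

# fillna with a Series fills each column with the value under its own label.
toy_filled = toy.fillna(col_means)
missing_after = toy_filled.isnull().sum()

print(missing_before)
print(toy_filled)
```

Mean imputation keeps every row but pulls the filled values toward the centre of the column, so it is worth checking how many values were missing before relying on it.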

We don’t need the customer ID for clustering.

df.drop(["CUST_ID"], axis = 1, inplace = True)
df.head()

Let’s visualise the Correlation Map

f,ax = plt.subplots(figsize=(15, 15))
sns.heatmap(df.corr(), annot=True, linewidths=.5, fmt= '.1f',ax=ax)


Feature Scaling

from sklearn.preprocessing import StandardScaler
standardscaler = StandardScaler()
X = standardscaler.fit_transform(df)
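
To see why scaling matters here, note that columns such as BALANCE are in currency units while the frequency columns lie between 0 and 1, so unscaled distances would be dominated by the large columns. The sketch below, on a made-up 3x2 array, shows that StandardScaler subtracts each column's mean and divides by its population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: one column in currency units, one frequency in [0, 1].
toy = np.array([[1000.0, 0.1],
                [2000.0, 0.5],
                [3000.0, 0.9]])

scaled = StandardScaler().fit_transform(toy)

# The same transformation by hand: subtract the column mean and
# divide by the population standard deviation (ddof=0).
manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)

print(scaled)
print(np.allclose(scaled, manual))
```

After scaling, every column has mean 0 and standard deviation 1, so each feature contributes on a comparable scale to the Euclidean distances used by both clustering methods.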

K-Means Clustering

Let’s use the elbow method to find the optimal number of clusters

plt.figure(figsize=(10,6))
from sklearn.cluster import KMeans
wcss = []
for i in range(1, 15):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
plt.plot(range(1, 15), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
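
The quantity plotted above, WCSS (within-cluster sum of squares), is what scikit-learn exposes as `inertia_`. As a rough sketch of what that number means, the example below recomputes it by hand on synthetic blobs (the `make_blobs` data and the `X_toy` name are assumptions for illustration only).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data standing in for the scaled feature matrix.
X_toy, _ = make_blobs(n_samples=200, centers=3, n_features=2, random_state=42)

km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=42).fit(X_toy)

# WCSS: squared Euclidean distance from every point to the centre
# of the cluster it was assigned to, summed over all points.
assigned_centers = km.cluster_centers_[km.labels_]
wcss_manual = ((X_toy - assigned_centers) ** 2).sum()

print(wcss_manual, km.inertia_)
```

WCSS always decreases as the number of clusters grows, which is why the elbow method looks for the point where the decrease flattens out instead of the minimum.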

Fitting K-Means to the dataset

The elbow in the curve appears at around 8 clusters, so we use k = 8.

kmeans = KMeans(n_clusters = 8, init = 'k-means++', random_state = 42)
y_kmeans = kmeans.fit_predict(X)
kmeans.cluster_centers_

Let’s see the cluster for each customer

y_kmeans = kmeans.predict(X)
df["cluster"] = y_kmeans
df.head()
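
Once each customer has a cluster label, a common next step is to summarise the clusters. The sketch below uses a made-up five-row frame (the values are assumptions, not real customers) to show the cluster sizes and the per-cluster feature means.

```python
import pandas as pd

# Made-up labelled data with the same column naming as the dataset.
toy = pd.DataFrame({
    "BALANCE": [100.0, 120.0, 5000.0, 5200.0, 90.0],
    "PURCHASES": [50.0, 70.0, 900.0, 1100.0, 60.0],
    "cluster": [0, 0, 1, 1, 0],
})

# Number of customers in each cluster.
cluster_sizes = toy["cluster"].value_counts().sort_index()

# Mean of every feature within each cluster, in the original units.
cluster_profile = toy.groupby("cluster").mean()

print(cluster_sizes)
print(cluster_profile)
```

Applied to the real data, the same two calls on `df` would give a table that can be read as a description of each segment, for example high balance with few purchases.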

Plot the data after k = 8 clustering on the important columns

best_cols = ["BALANCE", "PURCHASES", "CASH_ADVANCE","CREDIT_LIMIT", "PAYMENTS", "MINIMUM_PAYMENTS"]
kmeans = KMeans(n_clusters=8, init="k-means++", n_init=10, max_iter=300, random_state=42)
# Cluster on all of the selected columns (unscaled values)
best_vals = df[best_cols].values
y_pred = kmeans.fit_predict( best_vals )

df["cluster"] = y_pred
best_cols.append("cluster")
sns.pairplot( df[ best_cols ], hue="cluster")


Hierarchical Clustering

Let’s use the dendrogram to find the optimal number of clusters

plt.figure(figsize=(10,6))
import scipy.cluster.hierarchy as sch
dendrogram = sch.dendrogram(sch.linkage(X, method = 'ward'))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean distances')
plt.show()
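
Reading the number of clusters off a dendrogram usually means looking for the tallest vertical gap between merges. One way to sketch that idea in code, assuming synthetic two-group data (`X_toy` is made up for the example), is to inspect the merge heights in the linkage matrix and cut the tree with `fcluster`.

```python
import numpy as np
import scipy.cluster.hierarchy as sch

rng = np.random.default_rng(0)
# Two well-separated made-up groups of 10 points each.
X_toy = np.vstack([rng.normal(0, 0.5, (10, 2)),
                   rng.normal(10, 0.5, (10, 2))])

# Each row of Z is one merge: [cluster_a, cluster_b, height, new_size].
Z = sch.linkage(X_toy, method="ward")
merge_heights = Z[:, 2]

# The largest jump between successive merge heights suggests the cut:
# after merge i (0-based) there are n - (i + 1) clusters left.
gaps = np.diff(merge_heights)
n_clusters_suggested = int(len(X_toy) - (np.argmax(gaps) + 1))

# Cut the tree so that at most that many clusters remain.
labels = sch.fcluster(Z, t=n_clusters_suggested, criterion="maxclust")

print(n_clusters_suggested)
print(labels)
```

The largest-gap rule is only a heuristic. On real data the gaps are often less clear-cut, so the dendrogram plot remains the main guide.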

Fitting Hierarchical Clustering to the credit card usage dataset

from sklearn.cluster import AgglomerativeClustering
# Ward linkage always uses Euclidean distance, so no distance argument is needed
hc = AgglomerativeClustering(n_clusters = 4, linkage = 'ward')

Let’s see the cluster for each customer

y_hc = hc.fit_predict(X)
df["cluster"] = y_hc
df.head()
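
Since the project uses two clustering approaches, it can be useful to check how far their assignments agree. The sketch below does this on synthetic blobs (the data, the variable names and the choice of three clusters are assumptions for illustration) using a cross-tabulation and the adjusted Rand index from scikit-learn.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic, clearly separated groups standing in for the scaled data.
X_toy, _ = make_blobs(n_samples=150, centers=[[0, 0], [10, 10], [-10, 10]],
                      cluster_std=0.5, random_state=0)

km_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_toy)
hc_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X_toy)

# Rows: K-means clusters, columns: hierarchical clusters.
table = pd.crosstab(km_labels, hc_labels,
                    rownames=["kmeans"], colnames=["hierarchical"])

# 1.0 means identical partitions (up to relabelling); values near 0
# mean agreement no better than chance.
ari = adjusted_rand_score(km_labels, hc_labels)

print(table)
print(ari)
```

Cluster numbers are arbitrary, so cluster 0 from K-means need not match cluster 0 from the hierarchical model. The adjusted Rand index ignores the numbering and compares only which customers are grouped together.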