# Credit Card Clustering

In this project, we implement customer segmentation based on credit card usage behavior using two different approaches: K-Means and hierarchical clustering.

The data contains the following columns:

• CUST_ID : Identification of the credit card holder (categorical)
• BALANCE : Balance amount left in the account to make purchases
• BALANCE_FREQUENCY : How frequently the balance is updated, score between 0 and 1 (1 = frequently updated, 0 = not frequently updated)
• PURCHASES : Amount of purchases made from the account
• ONEOFF_PURCHASES : Maximum purchase amount done in one go
• INSTALLMENTS_PURCHASES : Amount of purchases done in installments
• CASH_ADVANCE : Cash in advance given by the user
• PURCHASES_FREQUENCY : How frequently purchases are being made, score between 0 and 1 (1 = frequently purchased, 0 = not frequently purchased)
• ONEOFF_PURCHASES_FREQUENCY : How frequently purchases are happening in one go (1 = frequently purchased, 0 = not frequently purchased)
• PURCHASES_INSTALLMENTS_FREQUENCY : How frequently purchases in installments are being done (1 = frequently done, 0 = not frequently done)
• CASH_ADVANCE_FREQUENCY : How frequently cash in advance is being paid
• CASH_ADVANCE_TRX : Number of transactions made with “Cash in Advance”
• PURCHASES_TRX : Number of purchase transactions made
• CREDIT_LIMIT : Credit card limit for the user
• PAYMENTS : Amount of payments made by the user
• MINIMUM_PAYMENTS : Minimum amount of payments made by the user
• PRC_FULL_PAYMENT : Percent of full payment paid by the user
• TENURE : Tenure of the credit card service for the user

Let’s get our environment ready with the libraries we’ll need and then import the data!

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter magic: render plots inline in the notebook
%matplotlib inline
plt.style.use('ggplot')
```

Check out the data

```python
df = pd.read_csv('~/DataSet GitHub/K-Means/CC GENERAL.csv')
df.info()
```

Visualise the missing values

```python
import missingno as msno
msno.matrix(df)
```

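Fill the NaN values with the mean of each column

```python
# numeric_only=True skips the non-numeric CUST_ID column,
# which would otherwise raise an error in recent pandas versions
df = df.fillna(df.mean(numeric_only=True))
df.info()
```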
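To make the mean-fill step concrete, here is a minimal standard-library sketch of what it does to a single column. The `fill_with_mean` helper and the sample values are hypothetical, and missing entries are represented by `None` rather than NaN.

```python
from statistics import mean

def fill_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    col_mean = mean(observed)
    return [col_mean if v is None else v for v in values]

# Hypothetical column with two missing entries
minimum_payments = [100.0, None, 300.0, 200.0, None]
filled = fill_with_mean(minimum_payments)
print(filled)
```

Mean imputation leaves the column mean unchanged but reduces its variance, which is worth keeping in mind when a column has many missing values.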

We don’t need customer id data for clustering.

```python
df.drop(["CUST_ID"], axis=1, inplace=True)
```

Let’s visualise the Correlation Map

```python
f, ax = plt.subplots(figsize=(15, 15))
sns.heatmap(df.corr(), annot=True, linewidths=.5, fmt='.1f', ax=ax)
```

### Feature Scaling

```python
from sklearn.preprocessing import StandardScaler

standardscaler = StandardScaler()
X = standardscaler.fit_transform(df)
```
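Scaling matters here because K-Means and Ward linkage both rely on Euclidean distance, so a wide-ranging column such as CREDIT_LIMIT would otherwise dominate the frequency columns that lie between 0 and 1. As a rough illustration of what standardization does to one column, the sketch below subtracts the mean and divides by the population standard deviation. The `standardize` helper and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def standardize(column):
    """Return z-scores: (x - mean) / population standard deviation.

    Assumes the column is not constant (non-zero standard deviation).
    """
    mu = mean(column)
    sigma = pstdev(column)
    return [(x - mu) / sigma for x in column]

# Hypothetical credit limits
credit_limit = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
z = standardize(credit_limit)
print([round(v, 3) for v in z])
```

After the transformation the column has mean 0 and standard deviation 1, regardless of its original units.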

### K-Means Clustering

Let’s use the elbow method to find the optimal number of clusters

```python
from sklearn.cluster import KMeans

# Fit K-Means for k = 1..14 and record the within-cluster sum of squares (WCSS)
wcss = []
for i in range(1, 15):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

plt.figure(figsize=(10, 6))
plt.plot(range(1, 15), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
```
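The quantity on the y-axis, WCSS (exposed by scikit-learn as `inertia_`), is the sum of squared distances from each sample to the center of its assigned cluster. The sketch below computes it by hand on a tiny, made-up dataset to show why the value drops as clusters are added. The function name `wcss_for` and the points are illustrative assumptions.

```python
def wcss_for(points, labels, centers):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    total = 0.0
    for point, label in zip(points, labels):
        center = centers[label]
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total

# Two well-separated pairs of points (hypothetical data)
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# k = 1: every point belongs to a single cluster centered at the overall mean
wcss_one = wcss_for(points, [0, 0, 0, 0], [(5.0, 1.0)])

# k = 2: one cluster per pair, each centered on its pair's mean
wcss_two = wcss_for(points, [0, 0, 1, 1], [(0.0, 1.0), (10.0, 1.0)])

print(wcss_one, wcss_two)
```

Going from one to two clusters removes most of the WCSS here. Once the real groups are separated, adding more clusters gives only small further reductions, which is what produces the "elbow" in the plot.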

Fitting K-Means to the dataset

The elbow in the curve starts at around 8 clusters, so we use k = 8.

```python
kmeans = KMeans(n_clusters=8, init='k-means++', random_state=42)
y_kmeans = kmeans.fit_predict(X)
kmeans.cluster_centers_
```
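For intuition about what fitting does, here is a simplified sketch of Lloyd's algorithm, the classic K-Means procedure: assign each point to its nearest center, move each center to the mean of its points, and repeat until the centers stop changing. It is an illustration only and not scikit-learn's implementation. The starting centers are fixed by hand instead of using k-means++, and the names `nearest` and `lloyd_kmeans` are hypothetical.

```python
def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]
    return dists.index(min(dists))

def lloyd_kmeans(points, centers, max_iter=100):
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center
        labels = [nearest(p, centers) for p in points]
        # Update step: each center moves to the mean of its members
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: centers did not move
        centers = new_centers
    return labels, centers

# Hypothetical 2-D data with two obvious groups
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 8.0)]
labels, centers = lloyd_kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centers)
```

The returned centers play the same role as `cluster_centers_`, and the labels correspond to what `fit_predict` returns.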

Let’s see the cluster for each customer

```python
y_kmeans = kmeans.predict(X)
df["cluster"] = y_kmeans
```

Plot data after k = 8 clustering on important columns

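```python
best_cols = ["BALANCE", "PURCHASES", "CASH_ADVANCE", "CREDIT_LIMIT", "PAYMENTS", "MINIMUM_PAYMENTS"]
kmeans = KMeans(n_clusters=8, init="k-means++", n_init=10, max_iter=300)

# Note: iloc[:, 1:] skips the first selected column (BALANCE),
# so this model is fitted on the remaining five unscaled columns
best_vals = df[best_cols].iloc[:, 1:].values
y_pred = kmeans.fit_predict(best_vals)

# Overwrite the cluster labels and plot the selected columns coloured by cluster
df["cluster"] = y_pred
best_cols.append("cluster")
sns.pairplot(df[best_cols], hue="cluster")
```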

### Hierarchical Clustering

Let’s use the dendrogram to find the optimal number of clusters

```python
import scipy.cluster.hierarchy as sch

plt.figure(figsize=(10, 6))
dendrogram = sch.dendrogram(sch.linkage(X, method='ward'))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean distances')
plt.show()
```
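Ward linkage builds the tree bottom-up: every customer starts as its own cluster, and at each step the two clusters whose merge causes the smallest increase in within-cluster sum of squares are joined. The following is a naive sketch of that idea on a handful of made-up points. It is not how SciPy or scikit-learn implement it, and `centroid`, `ward_cost` and `agglomerate` are hypothetical helper names.

```python
def centroid(cluster):
    """Mean point of a cluster given as a list of coordinate tuples."""
    return tuple(sum(dim) / len(cluster) for dim in zip(*cluster))

def ward_cost(a, b):
    """Increase in within-cluster sum of squares caused by merging a and b."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def agglomerate(points, n_clusters):
    """Merge the cheapest pair of clusters until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical 2-D data: two tight pairs and one isolated point
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters = agglomerate(points, n_clusters=3)
print(clusters)
```

In a dendrogram, cheap merges appear low in the tree and expensive ones high up, so a long vertical gap between merges suggests a natural place to cut the tree and read off the number of clusters.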

Fitting Hierarchical Clustering to the credit card usage dataset

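```python
from sklearn.cluster import AgglomerativeClustering

# Ward linkage always uses Euclidean distance, which is also the default metric,
# so the `affinity` argument (removed in recent scikit-learn versions) is omitted
hc = AgglomerativeClustering(n_clusters=4, linkage='ward')
```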

Let’s see the cluster for each customer

```python
y_hc = hc.fit_predict(X)
df["cluster"] = y_hc
```
