Prediction of Churn For Bank Customers

In this project, we aim to predict whether customers will leave the bank. We implemented an Artificial Neural Network (ANN) in Python.

The data contains the following columns:

  • RowNumber: Row number
  • CustomerId: Customer identity number
  • Surname: Last name
  • CreditScore: Credit score assigned by the bank
  • Geography: Country or region
  • Gender: Male or female
  • Age: Customer age
  • Tenure: Years with the bank
  • Balance: Amount in the account
  • NumOfProducts: How many accounts and account-affiliated products the customer has
  • HasCrCard: Whether the customer has a credit card
  • IsActiveMember: Whether the customer actively uses bank services such as programs, bonds, insurance, etc.
  • EstimatedSalary: Salary estimated by the bank
  • Exited: Whether the customer left the bank

This dataset contains information on 10,000 bank customers. Our goal is to create a demographic segmentation model that tells the bank which of its customers are at the highest risk of leaving.

Let’s get our environment ready with the libraries we’ll need and then import the data!

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# On Matplotlib 3.6 and later this style is named 'seaborn-v0_8-white'.
plt.style.use('seaborn-white')

Check out the Data!

df = pd.read_csv('~/DataSet GitHub/ANN/Churn_Modelling.csv')


Exploratory Data Analysis

Let’s check out the correlation between variables.

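# Correlation matrix of the numeric columns.
cor_mat = df.select_dtypes(include='number').corr()
# Zero out the lower triangle (diagonal included) to build a heatmap mask.
mask = np.array(cor_mat)
mask[np.tril_indices_from(mask)] = False
g = sns.PairGrid(df[['CreditScore','Age','Tenure','Balance','NumOfProducts','EstimatedSalary','Exited']], hue="Exited", palette='Set1')
# The original call is cut off here; a scatter plot on every panel is an assumed completion.
g = g.map(plt.scatter, alpha=0.6)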
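Each cell of a correlation matrix is a Pearson coefficient between two columns. As a rough illustration of what that number means for a single pair, here is a sketch in plain Python. The `age` and `exited` values are made up for the example and are not taken from the dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values, for illustration only.
age = [25, 32, 41, 48, 55, 60]
exited = [0, 0, 0, 1, 1, 1]

r = pearson(age, exited)
print(round(r, 3))
```

A value near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship.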

Let’s check the proportion of the target variable in the dataset!

explode = (0.1, 0)
fig1, ax1 = plt.subplots(figsize=(12, 7))
ax1.pie(df['Exited'].value_counts(), explode=explode, labels=['Retained', 'Exited'], autopct='%1.1f%%')
# Equal aspect ratio ensures that pie is drawn as a circle
ax1.axis('equal')
plt.show()

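Let’s visualise the frequency of exited customers in each country.

# Assumed plotting call (missing from the original): customers per country, split by Exited.
sns.countplot(x='Geography', hue='Exited', data=df)
plt.title("Exited Class Histogram")
plt.xlabel("Geography")
plt.ylabel("Frequency")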
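Raw exit counts per country can mislead when the countries have different numbers of customers, so it can help to look at the share of customers who left as well. The sketch below does this in plain Python; the `rows` list is hypothetical sample data, not values from the dataset.

```python
from collections import Counter

# Hypothetical (Geography, Exited) pairs; the real values would come from df.
rows = [
    ("France", 0), ("France", 0), ("France", 1),
    ("Germany", 1), ("Germany", 0),
    ("Spain", 0), ("Spain", 0), ("Spain", 1),
]

# Customers per country, and customers who exited per country.
totals = Counter(country for country, _ in rows)
exits = Counter(country for country, exited in rows if exited == 1)

# Share of each country's customers who left the bank.
churn_rate = {country: exits[country] / totals[country] for country in totals}

for country, rate in sorted(churn_rate.items()):
    print(f"{country}: {exits[country]} of {totals[country]} exited ({rate:.0%})")
```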


Encoding categorical data

Let’s encode the categorical (string-based) data.

  • Geography: there are 3 options: France, Spain and Germany.
  • Gender: there are 2 options: Male and Female.

This converts those strings into numeric values for analysis.

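X = df.iloc[:, 3:-1].values
y = df.iloc[:, 13].values
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.compose import ColumnTransformer
# Label-encode Geography (column 1) and Gender (column 2) as integers.
LabelEncoder_X_1 = LabelEncoder()
X[:, 1] = LabelEncoder_X_1.fit_transform(X[:, 1])
LabelEncoder_X_2 = LabelEncoder()
X[:, 2] = LabelEncoder_X_2.fit_transform(X[:, 2])
# One-hot encode Geography. OneHotEncoder(categorical_features=[1]) was removed
# in scikit-learn 0.22, so ColumnTransformer selects the column instead.
Onehotencoder = ColumnTransformer([('geography', OneHotEncoder(), [1])], remainder='passthrough', sparse_threshold=0)
X = Onehotencoder.fit_transform(X).astype(float)
# Drop the first dummy column to avoid the dummy variable trap.
X = X[:, 1:]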
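To make the encoding steps concrete, here is a minimal sketch in plain Python of what happens to the Geography column: label encoding, one-hot encoding, then dropping the first indicator column. The `countries` sample is hypothetical. The three indicator columns always sum to 1, so one of them is redundant, which is why the code above ends with `X = X[:,1:]`.

```python
# Hypothetical sample of the Geography column.
countries = ["France", "Spain", "Germany", "France"]

# Step 1: label encoding maps each category to an integer. Categories are
# sorted alphabetically, which is also how LabelEncoder orders its classes.
classes = sorted(set(countries))          # ['France', 'Germany', 'Spain']
labels = [classes.index(c) for c in countries]

# Step 2: one-hot encoding turns each integer into an indicator vector.
one_hot = [[1 if label == k else 0 for k in range(len(classes))] for label in labels]

# Step 3: drop the first indicator column. A row of zeros now means "France".
reduced = [row[1:] for row in one_hot]

print(labels)
print(one_hot)
print(reduced)
```

After these steps Geography contributes 2 columns instead of 1, which is how the 10 selected columns become the 11 inputs of the network.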

Train Test Split

Split the data into a training set and a testing set.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state = 0)
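As a rough picture of what the split does, the sketch below shuffles row indices with a fixed seed and holds out 20% of them. The helper `split_indices` is hypothetical and will not pick the same rows as `train_test_split`; it only illustrates the idea.

```python
import random

def split_indices(n_rows, test_size, seed):
    """Shuffle row indices and hold out a `test_size` fraction as the test set."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)   # fixed seed makes the split repeatable
    n_test = round(n_rows * test_size)
    n_train = n_rows - n_test
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = split_indices(10, test_size=0.2, seed=0)
print(len(train_idx), len(test_idx))

# With the 10,000 customers in this dataset, 20% means 2,000 test rows.
big_train, big_test = split_indices(10000, test_size=0.2, seed=0)
print(len(big_train), len(big_test))
```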

Feature Scaling

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
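Standardisation subtracts the mean and divides by the standard deviation of each column. The important detail in the code above is that the mean and standard deviation are learned from the training set only (`fit_transform`) and then reused on the test set (`transform`). Here is a minimal sketch of that for one column, with hypothetical age values:

```python
import statistics

# Hypothetical values for a single feature.
train_age = [20.0, 30.0, 40.0, 50.0, 60.0]
test_age = [40.0, 70.0]

# "fit": learn the statistics from the training data only.
mean = statistics.fmean(train_age)
std = statistics.pstdev(train_age)   # population standard deviation, as StandardScaler uses

# "transform": apply the same statistics to both sets.
train_scaled = [(v - mean) / std for v in train_age]
test_scaled = [(v - mean) / std for v in test_age]

print(train_scaled)
print(test_scaled)
```

The scaled training column has mean 0 and standard deviation 1. The test column generally does not, because it is scaled with the training statistics.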


Let’s make the ANN

The first step is importing the Keras libraries and packages.

import keras
from keras.models import Sequential
from keras.layers import Dense

Let’s initialise the ANN.

classifier = Sequential()

The next step is adding the input layer and the first hidden layer.

classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim=11))

Then we add the second hidden layer.

classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))

This step adds the output layer.

classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
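The three `Dense` layers form an 11 → 6 → 6 → 1 network. The sketch below counts its trainable parameters and runs one forward pass in plain Python to show what the layers compute. The weights and the input vector are random placeholders (an assumption for illustration), not trained values, so the output probability is meaningless beyond showing the mechanics.

```python
import math
import random

layer_sizes = [11, 6, 6, 1]   # inputs, hidden layer 1, hidden layer 2, output

# Each Dense layer has (inputs x units) weights plus one bias per unit.
n_params = sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
print(n_params)

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for every unit."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

rng = random.Random(0)

def random_layer(n_in, n_out):
    """Placeholder parameters: small uniform random weights, zero biases."""
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

layers = [random_layer(n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
activations = [relu, relu, sigmoid]

# A hypothetical standardised customer with 11 features.
x = [rng.gauss(0.0, 1.0) for _ in range(11)]
for (weights, biases), activation in zip(layers, activations):
    x = dense(x, weights, biases, activation)

churn_probability = x[0]
print(churn_probability)
```

The sigmoid on the last layer squeezes the output into the range 0 to 1, so it can be read as a probability of leaving.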

The final step is compiling the ANN.

classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
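The `binary_crossentropy` loss penalises confident wrong predictions much more heavily than hesitant ones. As a sketch of the formula, here it is in plain Python. The labels and probabilities are made-up examples, and the `eps` clipping is an assumption added to avoid taking the log of zero.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Hypothetical labels (1 = exited) and predicted probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.1, 0.8, 0.3]

loss = binary_crossentropy(y_true, y_prob)
print(round(loss, 4))

# A confident wrong prediction costs far more than a confident right one.
print(binary_crossentropy([1], [0.1]), binary_crossentropy([1], [0.9]))
```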

Let’s fit the ANN to the training set.

# Newer Keras versions call this argument epochs; nb_epoch is the older name.
classifier.fit(X_train, y_train, batch_size=10, epochs=100)
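To get a feel for how much work this training run involves, the arithmetic below counts the weight updates implied by a batch size of 10 and 100 epochs. It assumes the 80/20 split of the 10,000 customers used earlier and one update per batch.

```python
import math

n_customers = 10000
test_size = 0.2
batch_size = 10
epochs = 100

# Rows left for training after holding out the test set.
n_train = n_customers - round(n_customers * test_size)

# One weight update per batch; the last batch may be smaller, hence ceil.
updates_per_epoch = math.ceil(n_train / batch_size)
total_updates = updates_per_epoch * epochs

print(n_train, updates_per_epoch, total_updates)
```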


Making the prediction and evaluation of the model

Now get predictions from the model and create a confusion matrix and a classification report.

# predicting the test set result
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
# Making the confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test,y_pred)
class_names=[0,1] # name  of classes
fig, ax = plt.subplots()
tick_marks = np.arange(len(class_names))
plt.xticks(tick_marks, class_names)
plt.yticks(tick_marks, class_names)
# create heatmap
sns.heatmap(pd.DataFrame(cm), annot=True, cmap="BuPu" ,fmt='g')
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')

1537 and 156 are the correct predictions, while 249 and 58 are the incorrect predictions, so the model gets quite a lot of predictions right.

Correct Predictions : 1537+156 = 1693

Incorrect Predictions: 249+58 = 307
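The accuracy follows directly from these counts. A short sketch of the arithmetic:

```python
correct = 1537 + 156     # predictions on the diagonal of the confusion matrix
incorrect = 249 + 58     # predictions off the diagonal

total = correct + incorrect
accuracy = correct / total
error_rate = incorrect / total

print(total, accuracy, error_rate)
```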

Create a classification report for the model.

from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
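A classification report lists precision, recall and F1 score per class. As a sketch of how those numbers relate to the confusion matrix, the code below computes them for the "exited" class. The text does not say which of the two error counts is which, so the split into 58 false positives and 249 false negatives is an assumption; swapping them would swap precision and recall.

```python
# Assumed layout (not stated in the text): rows are actual classes, columns
# are predicted classes, with 58 false positives and 249 false negatives.
tn, fp = 1537, 58
fn, tp = 249, 156

# Of the customers predicted to leave, how many actually left?
precision = tp / (tp + fp)

# Of the customers who actually left, how many did the model catch?
recall = tp / (tp + fn)

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Under this assumed layout the model would miss a large share of the customers who leave, even though overall accuracy looks high. That is why the report is worth reading alongside the accuracy figure.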

The accuracy of the model is about 84% (1693 correct out of 2000 test predictions, i.e. 84.65%).
