In this project, we aim to predict whether customers will leave the bank. We implemented an Artificial Neural Network (ANN) in Python.
The data contains the following columns:
- RowNumber: The row number
- CustomerId: Identity number
- Surname: Last name
- CreditScore: The credit score assigned by the bank
- Geography: Country or region
- Gender: Male or female
- Age: Customer age
- Tenure: Years with the bank
- Balance: Amount in account
- NumOfProducts: How many accounts and bank-affiliated products the person has
- HasCrCard: Whether they have a credit card
- IsActiveMember: Whether they are active with the bank's other offerings, such as programs, bonds, and insurance
- EstimatedSalary: Salary estimated by the bank
- Exited: Whether they left the bank
This dataset contains information on 10,000 bank customers. Our goal is to create a demographic segmentation model that tells the bank which of its customers are at the highest risk of leaving.
Let’s get our environment ready with the libraries we’ll need and then import the data!
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
plt.style.use('seaborn-white')  # on Matplotlib 3.6+ this style is named 'seaborn-v0_8-white'
Check out the Data!
df = pd.read_csv('~/DataSet GitHub/ANN/Churn_Modelling.csv')
df.head()

df.info()

Exploratory Data Analysis
Let’s check out the correlation between variables.
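# Correlation matrix of the numeric columns.
# numeric_only=True (pandas 1.5+) skips string columns such as Surname and Geography.
cor_mat = df.corr(numeric_only=True)
# Hide the upper triangle so each pair of variables is shown only once.
mask = np.triu(np.ones(cor_mat.shape, dtype=bool), k=1)
fig = plt.gcf()
fig.set_size_inches(30, 12)
sns.heatmap(data=cor_mat, mask=mask, square=True, annot=True, cbar=True)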
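The masking step can be tried in isolation on a tiny frame. This is a minimal sketch: the `toy` DataFrame and its values are made up for illustration and are not taken from the churn dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with three numeric columns (illustrative values only).
toy = pd.DataFrame({
    "Age": [30, 40, 50, 60],
    "Balance": [0.0, 500.0, 200.0, 900.0],
    "Exited": [0, 0, 1, 1],
})

corr = toy.corr()

# True marks the cells to hide: everything strictly above the diagonal.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# The values a masked heatmap would still display (hidden cells become NaN).
visible = corr.where(~mask)
print(visible)
```

Because a correlation matrix is symmetric, hiding one triangle loses no information and makes a large heatmap easier to read.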

g = sns.PairGrid(df[['CreditScore','Age','Tenure','Balance','NumOfProducts','EstimatedSalary','Exited']], hue = "Exited",palette='Set1')
g = g.map(plt.scatter).add_legend()

Let’s check out the proportion of the target variable in the dataset!
explode = (0.1,0)
fig1, ax1 = plt.subplots(figsize=(12,7))
ax1.pie(df['Exited'].value_counts(), explode=explode,labels=['Retained','Exited'], autopct='%1.1f%%',
shadow=True)
# Equal aspect ratio ensures that pie is drawn as a circle
ax1.axis('equal')
plt.tight_layout()
plt.legend()
plt.show()
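The same proportions can also be read as numbers rather than from the pie chart. A minimal sketch, using a small made-up `exited` series in place of `df['Exited']` (the values below are illustrative only, not the real class balance):

```python
import pandas as pd

# Hypothetical stand-in for df['Exited']: 1 = customer left, 0 = customer stayed.
exited = pd.Series([0, 0, 0, 1, 0, 1, 0, 0, 0, 0], name="Exited")

# normalize=True returns fractions of the total instead of raw counts.
proportions = exited.value_counts(normalize=True)
retained_share = proportions.loc[0]
exited_share = proportions.loc[1]

print(f"Retained: {retained_share:.1%}, Exited: {exited_share:.1%}")
```

Knowing the class balance matters later: if most customers stay, a model that always predicts "retained" already scores a high accuracy, so accuracy alone can flatter the classifier.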

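Let’s visualise the number of customers in each country.
plt.figure(figsize=(12, 6))
# This counts all customers per country; filter with df[df['Exited'] == 1] first
# to count only the customers who exited.
df['Geography'].value_counts().plot.bar()
# Call the pyplot functions; assigning to plt.title etc. would overwrite them.
plt.title("Customers by Geography")
plt.xlabel("Geography")
plt.ylabel("Frequency")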

Encoding categorical data
Let’s encode the categorical (string-based) data.
- Geography: there are 3 options: France, Spain and Germany.
- Gender: there are 2 options: Male and Female.
This will convert those strings into numeric values for analysis.
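X = df.iloc[:, 3:-1].values
y = df.iloc[:, 13].values
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
# Label-encode Gender (column 2 of X) as 0/1.
LabelEncoder_X_2 = LabelEncoder()
X[:, 2] = LabelEncoder_X_2.fit_transform(X[:, 2])
# One-hot encode Geography (column 1 of X). The categorical_features argument was
# removed from OneHotEncoder in scikit-learn 0.22, so ColumnTransformer is used instead.
ct = ColumnTransformer([('geography', OneHotEncoder(), [1])],
                       remainder='passthrough', sparse_threshold=0)
X = ct.fit_transform(X).astype(float)
# Drop the first dummy column to avoid the dummy variable trap.
X = X[:, 1:]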
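To see what this kind of encoding produces, here is a small sketch on made-up rows. It uses `pandas.get_dummies` rather than the scikit-learn encoders from the walkthrough, purely because the resulting column names make the output easy to read; the `toy` frame and its values are hypothetical.

```python
import pandas as pd

# Toy rows with the same two categorical columns as the dataset (values are made up).
toy = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Male", "Female"],
    "Age": [42, 41, 39, 50],
})

# drop_first=True removes one dummy per categorical column (here France and Female).
# The dropped category is implied when all remaining dummies are 0, which mirrors
# the step in the walkthrough that discards the first dummy column.
encoded = pd.get_dummies(
    toy, columns=["Geography", "Gender"], drop_first=True, dtype=int
)
print(encoded)
```

A customer from France appears as `Geography_Germany = 0` and `Geography_Spain = 0`, so no information is lost by dropping the France column.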
Train Test Split
Split the data into a training set and a testing set.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state = 0)
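The effect of `test_size = 0.2` is easy to check on a toy array. The sketch below also passes `stratify`, which is an optional extra not used in the split above; it keeps the share of exited customers the same in both parts. The arrays are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 customers, 20 of whom exited (illustrative values only).
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([1] * 20 + [0] * 80)

# 80% of the rows go to training, 20% to testing.
# stratify=y_toy preserves the 20% exit rate in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0, stratify=y_toy
)

print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())
```

With an imbalanced target such as churn, a stratified split can make the test metrics a little more stable, though whether it is needed here is a judgment call.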
Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
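What the scaler does can be written out by hand. A minimal NumPy sketch on made-up numbers: the mean and standard deviation are computed on the training rows only and then reused for the test rows, which is why the code above calls `fit_transform` on the training set but only `transform` on the test set.

```python
import numpy as np

# Made-up feature rows: [CreditScore, Age] (illustrative values only).
X_train_toy = np.array([[600.0, 30.0], [700.0, 40.0], [800.0, 50.0]])
X_test_toy = np.array([[700.0, 60.0]])

# Statistics come from the training data only.
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # population standard deviation (ddof=0)

# The same shift and scale are applied to both sets.
X_train_scaled = (X_train_toy - mean) / std
X_test_scaled = (X_test_toy - mean) / std

print(X_train_scaled)
print(X_test_scaled)
```

Scaling matters for a neural network because features such as Balance and Age live on very different numeric ranges, and unscaled inputs can slow or destabilise training.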
Let’s make the ANN
The first step is importing the Keras libraries and packages.
import keras
from keras.models import Sequential
from keras.layers import Dense
Let’s Initialise the ANN.
classifier = Sequential()
The next step is adding the input layer and the first hidden layer.
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim=11))
And then adding the second hidden layer.
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
This step adds the output layer.
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
The final step is compiling the ANN.
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
Let’s Fit the ANN to the training set.
classifier.fit(X_train, y_train, batch_size=10, epochs=100)  # nb_epoch was renamed to epochs in Keras 2
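To make the architecture concrete, the forward pass of an 11-6-6-1 network can be sketched in plain NumPy. This is not Keras and does no training: the weights are random placeholders, and the initialisation range is an assumption chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer widths: 11 inputs, two hidden layers of 6 units, 1 output unit.
layer_sizes = [11, 6, 6, 1]

# Placeholder parameters: small uniform random weights and zero biases (assumed values).
weights = [
    rng.uniform(-0.05, 0.05, size=(n_in, n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x):
    """ReLU on the hidden layers, sigmoid on the output layer."""
    a = x
    for w, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ w + b)
    return sigmoid(a @ weights[-1] + biases[-1])


batch = rng.normal(size=(4, 11))  # four made-up, already-scaled customers
probs = forward(batch)
n_params = sum(w.size + b.size for w, b in zip(weights, biases))

print(probs.ravel())
print(n_params)
```

The sigmoid output is a number between 0 and 1 that can be read as the predicted probability of leaving. The parameter count works out to (11·6 + 6) + (6·6 + 6) + (6·1 + 1) = 121, which should match what `classifier.summary()` reports for the model above.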

Making the prediction and evaluation of the model
Now get predictions from the model and create a confusion matrix and a classification report.
# predicting the test set result
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
# Making the confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test,y_pred)
class_names=[0,1] # name of classes
fig, ax = plt.subplots()
tick_marks = np.arange(len(class_names))
plt.xticks(tick_marks, class_names)
plt.yticks(tick_marks, class_names)
# create heatmap
sns.heatmap(pd.DataFrame(cm), annot=True, cmap="BuPu" ,fmt='g')
ax.xaxis.set_label_position("top")
plt.tight_layout()
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')

1537 and 156 are the correct predictions, and 249 and 58 are the incorrect predictions.
Correct predictions: 1537 + 156 = 1693
Incorrect predictions: 249 + 58 = 307
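The arithmetic above can be reproduced from the matrix itself. In the sketch below the four counts come from the text, but their placement is an assumption: scikit-learn puts actual classes on rows and predicted classes on columns, and the text does not say which of 249 and 58 is the false-negative count. The accuracy does not depend on that choice; the precision and recall figures do.

```python
import numpy as np

# ASSUMED layout (rows = actual, columns = predicted):
# 1537 true negatives, 58 false positives, 249 false negatives, 156 true positives.
cm = np.array([[1537, 58],
               [249, 156]])

total = cm.sum()
correct = np.trace(cm)  # sum of the diagonal: the correct predictions
accuracy = correct / total

# Only valid under the assumed layout above.
tn, fp, fn, tp = cm.ravel()
precision = tp / (tp + fp)  # of the customers predicted to leave, how many did
recall = tp / (tp + fn)     # of the customers who left, how many were caught

print(f"accuracy={accuracy:.4f} precision={precision:.3f} recall={recall:.3f}")
```

If the assumed layout holds, recall on the exited class is much lower than the overall accuracy, which is worth keeping in mind when the goal is to find customers at risk of leaving.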
Create a classification report for the model.
from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred))

The accuracy of the model is about 84.7% (1693 correct out of 2000 test predictions).

You may have heard the world is made up of atoms and molecules, but it’s really made up of stories.