Prediction of Shopping Behaviour

Stores are always looking for new ways to promote their products and increase their income. One such way is cross-selling, defined as “an action or practice of selling an additional product or service to an existing customer”. To cross-sell well, a retailer needs to understand which products and services should be combined to increase sales. This is the subject of a technique called Market Basket Analysis (MBA), also known as product association analysis.

Market Basket Analysis (MBA) looks for co-occurrences among purchases and transactions. Put simply, it asks which items are bought together. For example, in a footwear store, shoes are often purchased with a pair of socks.
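
To make the idea of co-occurrence concrete, here is a minimal sketch that counts how often each pair of items shows up in the same basket. The toy baskets and the names `baskets` and `pair_counts` are invented for illustration and are not part of the dataset used below.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets, invented for illustration only.
baskets = [
    ["shoes", "socks"],
    ["shoes", "socks", "laces"],
    ["shoes", "polish"],
    ["socks"],
]

pair_counts = Counter()
for basket in baskets:
    # Sort so that ("shoes", "socks") and ("socks", "shoes") count as the same pair.
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

# Pairs listed from most to least frequent.
print(pair_counts.most_common())
```

Counting pairs like this only works for very small item sets. Algorithms such as Apriori, used later in this post, avoid enumerating every combination by discarding itemsets that are too rare.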

One of the main problems in market basket analysis is choosing an item that is strongly related to the item already purchased. Many retailers use this analysis to decide which discounts to offer their customers, so when the model does not work properly, the losses for the retailer can be very high.

In this project, we look for associations between the products sold in our dataset, derive the related rules and build a model. The model is then analysed in terms of its support, confidence and lift parameters.

The data contains about 7,500 transactions from customers who shopped at one of the well-known stores in France.

Let’s get our environment ready with the libraries we’ll need and then import the data!

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Check out the Data

df = pd.read_csv('~/DataSet GitHub/Association Rules/Market_Basket_Optimisation.csv', header = None)
df.head()

We now have the transaction dataset: each row is one transaction, and its columns hold the products that were purchased together. At this point we cannot yet see how often products were bought together, nor the rules themselves. Both will be shown later.


Let’s look at the frequency of the most popular items.

plt.rcParams['figure.figsize'] = (18, 9)
# Count items across all columns; df[0] alone would only count the first item of each transaction.
item_counts = pd.Series(df.values.ravel()).value_counts()  # NaN cells are ignored
item_counts.head(30).plot.bar(color=['red', 'blue', 'green', 'orange', 'cyan', 'purple', 'grey', 'pink', 'yellow'])
plt.title('Frequency of most popular items', fontsize = 20)
plt.xticks(rotation = 90, fontsize = 16)
plt.grid()
plt.show()

Data Preprocessing

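# Turn each row of df into a list of item names.
# Empty cells are NaN and become the string 'nan' here.
values = df.values
transactions = []
for i in range(len(df)):
    transactions.append([str(values[i, j]) for j in range(df.shape[1])])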
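
The loop above pads shorter transactions with the string 'nan'. As an alternative, here is a minimal sketch that skips the empty cells instead. It assumes a wide table laid out like `df`; the small frame `toy` and the name `clean_transactions` are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the wide transaction table: one row per basket,
# empty cells stored as NaN (as pandas does when rows have different lengths).
toy = pd.DataFrame([
    ["milk", "bread", np.nan],
    ["eggs", np.nan, np.nan],
    ["milk", "eggs", "honey"],
])

# Keep only the non-missing cells of each row, so no 'nan' placeholder items
# end up in the transactions.
clean_transactions = [
    [str(item) for item in row if pd.notna(item)]
    for row in toy.values.tolist()
]

print(clean_transactions)
```

If the transactions are built this way, no rule can contain a 'nan' item, so there is nothing left to filter out of the results afterwards.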

Training Apriori on the dataset

from apyori import apriori
rules = apriori(transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2)

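I set the minimum support to 0.003, the minimum confidence to 0.2 and the minimum lift to 3 in order to create the rules. I recommend a minimum lift of 3 or higher in your Association Rules model: a lift above 1 means the items are bought together more often than would be expected if they were independent, so a higher threshold keeps only the stronger associations.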
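
To get a feel for what a support threshold of 0.003 means in practice, the short calculation below converts it into a number of transactions. It assumes the 7501 rows used in the preprocessing loop; the names `n_transactions` and `min_count` are illustrative.

```python
import math

n_transactions = 7501   # number of rows read in the preprocessing loop
min_support = 0.003     # threshold passed to apriori

# Smallest number of baskets an itemset must appear in to reach min_support.
min_count = math.ceil(min_support * n_transactions)
print(min_count)
```

In other words, under these assumptions an itemset has to appear in at least 23 transactions before it is considered at all.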


Visualising the results

results = list(rules)

# Collect the first ordered statistic of each rule as one table row.
# (DataFrame.append was removed in pandas 2.0, so the rows are built as a list first.)
rows = []
for result in results:
    stat = result.ordered_statistics[0]
    rows.append({'Items bought': list(stat.items_base),
                 'Likely': list(stat.items_add),
                 'Support': round(result.support, 4),
                 'Confidence': round(stat.confidence, 4),
                 'Lift': round(stat.lift, 4)})
final_result = pd.DataFrame(rows, columns=['Items bought', 'Likely', 'Support', 'Confidence', 'Lift'])

# Drop rules whose 'Items bought' contain the 'nan' placeholder from padded transactions.
has_nan = final_result['Items bought'].apply(lambda items: 'nan' in items)
final_result = final_result[~has_nan]

# Sort by lift as a number (not as text), strongest rules first.
final_result = final_result.sort_values('Lift', ascending = False)
final_result = final_result.reset_index(drop = True)
final_result.head()

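We need to look at the confidence and support values in order to analyse the association rules model. Support is a criterion showing how often an itemset appears in our data, that is, the fraction of transactions that contain it. Confidence measures how often the if/then statement holds: among the transactions that contain the “if” items, the fraction that also contain the “then” items. Lift is the confidence divided by the support of the “then” items.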
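
The three measures can be computed by hand on a tiny example. The sketch below uses made-up transactions and hypothetical helper functions (`support`, `confidence`, `lift`); it illustrates the standard definitions and is not how apyori computes them internally.

```python
# Hypothetical toy transactions, invented for illustration only.
toy_transactions = [
    {"pasta", "olive oil", "water"},
    {"pasta", "olive oil"},
    {"pasta", "water"},
    {"water", "milk"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(base, add, transactions):
    """Share of the transactions containing `base` that also contain `add`."""
    return support(set(base) | set(add), transactions) / support(base, transactions)

def lift(base, add, transactions):
    """Confidence of base -> add divided by the overall support of `add`."""
    return confidence(base, add, transactions) / support(add, transactions)

base, add = {"pasta"}, {"olive oil"}
print(support(base | add, toy_transactions))
print(confidence(base, add, toy_transactions))
print(lift(base, add, toy_transactions))
```

Here pasta and olive oil appear together in 2 of 5 baskets (support 0.4), 2 of the 3 pasta baskets also contain olive oil (confidence about 0.67), and the lift is about 1.67, so in this toy data olive oil is bought more often with pasta than on its own.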

As we can see in the results, the first rule shows that 40% of the people who bought whole wheat pasta and mineral water also bought olive oil. In addition, 27% of the people who bought mineral water, milk and frozen vegetables also tended to buy soup. Furthermore, 24% of the people who bought fromage blanc also bought honey.
