Stores are always looking for new ways to promote their products and increase their income, and cross-selling has become one of the most common. Cross-selling is “an action or practice of selling an additional product or service to an existing customer”. To make it work, a store needs to understand which products and services should be offered together. That is the subject of a technique called Market Basket Analysis (MBA), also known as product association analysis.
MBA looks for co-occurrences among purchases and transactions: in other words, which items are bought together with which. For example, in a footwear store, shoes are often purchased together with a pair of socks.
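The core question, "which item was bought with which?", boils down to counting how often pairs of items share a basket. The sketch below does this in plain Python on a handful of hypothetical footwear-store baskets (the items and counts are made up for illustration and are not from the article's dataset).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets, for illustration only
baskets = [
    {"shoes", "socks"},
    {"shoes", "socks", "shoe polish"},
    {"socks"},
    {"shoes", "shoe polish"},
    {"shoes", "socks"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) share one key
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

for pair, n in pair_counts(baskets).most_common():
    print(pair, n)
```

Raw pair counts are only a starting point: a pair can be frequent simply because both items are popular, which is why the measures below (support, confidence, lift) normalise these counts.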
One of the biggest challenges in market basket analysis is choosing an item that is strongly related to the item already bought. Many retailers use this analysis to decide which products to discount, so if the model does not work properly, the loss for the retailer can be very high.
In this project, we look for associations between product sales in our dataset, generate association rules from them and build a model. The model is then evaluated using three measures: support, confidence and lift.
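The three measures have standard definitions: support(X) is the share of transactions that contain every item of X; confidence(A → B) = support(A ∪ B) / support(A), an estimate of P(B | A); and lift(A → B) = confidence(A → B) / support(B), where a lift above 1 means B is bought with A more often than its baseline rate would suggest. A minimal sketch on hypothetical toy transactions (the items and numbers are illustrative only):

```python
# Hypothetical toy transactions, for illustration only
transactions = [
    {"pasta", "mineral water", "olive oil"},
    {"pasta", "mineral water"},
    {"mineral water", "milk"},
    {"pasta", "olive oil"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline support (>1 = positive association)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Rule {pasta} -> {olive oil}
print(support({"pasta", "olive oil"}, transactions))
print(confidence({"pasta"}, {"olive oil"}, transactions))
print(lift({"pasta"}, {"olive oil"}, transactions))
```

Here pasta appears in 3 of 5 baskets and pasta with olive oil in 2 of 5, so the rule has support 0.4, confidence 0.4 / 0.6 ≈ 0.67 and lift 0.67 / 0.4 ≈ 1.67.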
The data contains about 7,500 transactions from customers who shopped at a well-known store in France.
Let’s get our environment ready with the libraries we’ll need, and then import the data!
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Check out the Data
df = pd.read_csv('~/DataSet GitHub/Association Rules/Market_Basket_Optimisation.csv', header = None)
df.head()

We now have the transaction dataset: each row is one customer's basket, listing the products that were purchased together, with empty cells where the basket has fewer items. At this stage we cannot see how often items are bought together, nor any rules; those come later.
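To count how often items are bought together, a common representation is a one-hot transaction-by-item matrix: one row per basket, one column per product, 1 if the basket contains it. The sketch below builds one in plain Python from a few hypothetical rows shaped like the CSV (padded with empty cells; `None` stands in for the NaN values pandas reads for empty cells).

```python
# Hypothetical rows shaped like the CSV: one basket per row, padded with empty cells
rows = [
    ["shrimp", "almonds", "avocado"],
    ["burgers", "meatballs", None],
    ["chutney", None, None],
    ["burgers", "almonds", None],
]

# Drop the padding so each row becomes a plain list of purchased items
baskets = [[item for item in row if item is not None] for row in rows]

# One-hot (transaction x item) matrix: 1 if the basket contains the item, else 0
items = sorted({item for basket in baskets for item in basket})
matrix = [[int(item in basket) for item in items] for basket in baskets]

print(items)
for row in matrix:
    print(row)
```

Summing a column of this matrix gives an item's purchase count, and the share of rows where two columns are both 1 gives the support of that pair.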
Let’s look at the frequency of most popular items.
plt.rcParams['figure.figsize'] = (18, 9)
# One copper-shaded colour per bar
color = plt.cm.copper(np.linspace(0, 1, 30))
# Count every item across all columns (df[0] alone would only count the first item of each basket);
# value_counts() skips the empty (NaN) cells
item_counts = pd.Series(df.values.ravel()).value_counts()
item_counts.head(30).plot.bar(color=color)
plt.title('frequency of most popular items', fontsize = 20)
plt.xticks(rotation = 90 , fontsize = 16)
plt.grid()
plt.show()
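The bar chart shows raw counts. Dividing each count by the number of transactions gives the item's support, which is the quantity the minimum-support threshold is compared against in the first pass of Apriori. A plain-Python sketch on hypothetical baskets (function names are our own):

```python
from collections import Counter

# Hypothetical baskets; in the article these would be the rows of the CSV
baskets = [
    ["mineral water", "eggs"],
    ["mineral water", "spaghetti"],
    ["eggs"],
    ["mineral water"],
]

def item_support(baskets):
    """Support of each single item: the share of baskets that contain it."""
    n = len(baskets)
    # set(basket) so an item listed twice in one basket is counted once
    counts = Counter(item for basket in baskets for item in set(basket))
    return {item: count / n for item, count in counts.items()}

def frequent_items(baskets, min_support):
    """Items whose support reaches `min_support` (Apriori's first pass)."""
    return {item: s for item, s in item_support(baskets).items() if s >= min_support}

print(item_support(baskets))
print(frequent_items(baskets, 0.5))
```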

Data Preprocessing
transactions = []
for row in df.values:
    # Each row is one basket; empty cells are read as NaN, so keep only the real items
    transactions.append([str(item) for item in row if pd.notna(item)])
Training Apriori on the dataset
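from apyori import apriori
# apyori has no min_length option (unknown keyword arguments are ignored); with min_lift = 3,
# single-item sets cannot produce a rule anyway, since their only statistic has lift 1
rules = apriori(transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3)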
I set the minimum support to 0.003, the minimum confidence to 0.2 and the minimum lift to 3.
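To put the support threshold in perspective: with roughly 7,500 transactions, min_support = 0.003 means an itemset must appear in at least 0.003 × 7,500 = 22.5, i.e. about 23, baskets. The sketch below is a from-scratch illustration of the Apriori idea (level-wise candidate generation plus pruning), not apyori's actual implementation; the function name and return format are our own.

```python
from itertools import combinations

def apriori_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support.

    Illustrative from-scratch Apriori; not apyori's implementation.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # share of baskets containing every item of `itemset`
        return sum(itemset <= b for b in baskets) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items
    level = keep_frequent({frozenset([item]) for b in baskets for item in b})
    frequent = dict(level)

    k = 2
    while level:
        # Join: union pairs of frequent (k-1)-itemsets that form a k-itemset
        candidates = {x | y for x in level for y in level if len(x | y) == k}
        # Prune: a k-itemset can only be frequent if all its (k-1)-subsets are
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
        frequent.update(level)
        k += 1
    return frequent

# Hypothetical usage
example = [
    {"pasta", "mineral water", "olive oil"},
    {"pasta", "mineral water"},
    {"pasta", "olive oil"},
    {"mineral water", "milk"},
]
for itemset, s in sorted(apriori_itemsets(example, 0.5).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

The prune step is what keeps Apriori tractable: any superset of an infrequent itemset must also be infrequent, so such candidates are discarded before their support is ever counted.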
Visualising the results
results = list(rules)

# Build one row per rule from the first ordered statistic of each record
rows = []
for result in results:
    stat = result.ordered_statistics[0]
    rows.append({'Items bought': list(stat.items_base),
                 'Likely': list(stat.items_add),
                 'Support': result.support,
                 'Confidence': stat.confidence,
                 'Lift': stat.lift})
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of dicts
final_result = pd.DataFrame(rows, columns=['Items bought', 'Likely', 'Support', 'Confidence', 'Lift'])

# Drop rules whose left-hand side contains the 'nan' placeholder for empty cells
has_nan = final_result['Items bought'].apply(lambda items: 'nan' in items)
final_result = final_result[~has_nan]

# Keep the measures numeric so sorting is by value, then round for display
final_result = final_result.sort_values('Lift', ascending = False).round(4)
final_result = final_result.reset_index(drop = True)
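Each apyori record read above carries ordered statistics that split a frequent itemset into a base (`items_base`) and an added part (`items_add`), each with a confidence and a lift. The sketch below shows how such rules can be derived from a dictionary of frequent itemsets and their supports; the function name and tuple layout are our own, not apyori's API, and the input dictionary is hypothetical.

```python
from itertools import combinations

def generate_rules(frequent, min_confidence, min_lift):
    """Turn {frozenset: support} into (base, add, support, confidence, lift) tuples.

    Assumes `frequent` is downward closed (every subset of a key is also a key),
    which Apriori guarantees.
    """
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty base and a non-empty consequent
        for r in range(1, len(itemset)):
            for base in combinations(sorted(itemset), r):
                base = frozenset(base)
                add = itemset - base
                conf = supp / frequent[base]
                lft = conf / frequent[add]
                if conf >= min_confidence and lft >= min_lift:
                    rules.append((base, add, supp, conf, lft))
    # Highest lift first, like final_result above
    rules.sort(key=lambda rule: rule[4], reverse=True)
    return rules

# Hypothetical frequent itemsets with their supports
frequent_example = {
    frozenset({"pasta"}): 0.6,
    frozenset({"olive oil"}): 0.4,
    frozenset({"pasta", "olive oil"}): 0.4,
}
for base, add, supp, conf, lft in generate_rules(frequent_example, 0.5, 1.0):
    print(sorted(base), "->", sorted(add), round(conf, 3), round(lft, 3))
```

Note that the same itemset can yield rules in both directions with different confidences but the same lift, since lift(A → B) = support(A ∪ B) / (support(A) · support(B)) is symmetric.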

We need to check the confidence and support values in order to
As we can see in the results, the first rule shows that 40% of people who bought whole wheat pasta and mineral water also bought olive oil. In addition, 27% of people who bought mineral water, milk
