Sentiment Analysis for Restaurant Reviews

Most restaurants ask their customers for reviews and use them to improve customer satisfaction, so reviews play a vital role in a restaurant’s successful growth.

The aim of this project is to predict whether a review is positive or negative. It is implemented in Python using Natural Language Processing and a Naive Bayes classifier.

The dataset consists of 1000 rows and 2 columns. The Review column contains the customer reviews and the like column contains 0 or 1: 1 if the review is positive and 0 if it is negative.

Let’s get our environment ready with the libraries we’ll need and then import the data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

Check out the Data

df = pd.read_csv('~/DataSet GitHub/NLP/Restaurant_Reviews.tsv', delimiter = '\t', quoting = 3)
df.head(10)
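Before cleaning the text, it can help to confirm that the data has the shape and labels described above. The following is a minimal sketch on a tiny stand-in DataFrame; the sample rows and the column label "Liked" are assumptions made for illustration, and the label column is read by position, as the article does later.

```python
import pandas as pd

# Tiny stand-in for the real file; rows and the "Liked" label are hypothetical.
sample = pd.DataFrame({
    "Review": ["Loved the food", "Service was slow", "Great place", "Would not go back"],
    "Liked": [1, 0, 1, 0],
})

# Number of rows and columns
n_rows, n_cols = sample.shape

# How many reviews carry each label (second column, read by position)
label_counts = sample.iloc[:, 1].value_counts().to_dict()

print(n_rows, n_cols)
print(label_counts)
```

Running the same two checks on df should report 1000 rows and 2 columns if the file was loaded as described.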

Let’s clean the text of the first review in our dataset. The first step is to replace every character that is not a letter with a space.

import re
review = re.sub('[^a-zA-Z]',' ', df['Review'][0])
review
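To see what this substitution does, here is a small sketch on a made-up sentence (the sample text is an assumption, not a row from the dataset). Every character outside a-z and A-Z is replaced by a single space, so digits and punctuation disappear and contractions are split apart.

```python
import re

# Hypothetical sample text, not taken from the dataset
sample = "Wow!!! 10/10, the crust wasn't soggy :)"

# Replace each non-letter character with one space
letters_only = re.sub('[^a-zA-Z]', ' ', sample)

print(letters_only)
print(letters_only.split())
```

Note that "wasn't" becomes the two tokens "wasn" and "t", which is one side effect of this simple pattern.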

The second step is to convert all the letters of the review to lowercase.

review = review.lower()
review

The third step is to split the review into individual words.

review = review.split()
review

The fourth step is to remove the stop words, which are not relevant for predicting whether the review is positive or negative, and then apply stemming to the remaining words.

import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
ps = PorterStemmer()
# Build the stop-word set once instead of once per word
stop_words = set(stopwords.words('english'))
review = [ps.stem(word) for word in review if word not in stop_words]
review

In the fifth step, we join the list of words back into a single string, with the words separated by spaces.

review = ' '.join(review)
review

So far we have cleaned only the first review of the dataset. Let’s apply the same steps to all the customers’ reviews.

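corpus = []
ps = PorterStemmer()
# Build the stop-word set once, outside the loop
stop_words = set(stopwords.words('english'))
for i in range(len(df)):
    # Keep letters only, lowercase, and split into words
    review = re.sub('[^a-zA-Z]', ' ', df['Review'][i])
    review = review.lower()
    review = review.split()
    # Drop stop words and stem what remains
    review = [ps.stem(word) for word in review if word not in stop_words]
    review = ' '.join(review)
    corpus.append(review)
corpus[:15]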
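The five cleaning steps can also be wrapped in one function, which makes them easier to reuse and to test on single sentences. The sketch below is one possible arrangement, not the author's code: the function name clean_review, the keep parameter, and the small hand-written stop list are all assumptions (the short list is there only so the example runs without downloading the NLTK corpus). It also illustrates a point worth checking: NLTK's English stop list includes "not", so negations are dropped unless they are explicitly kept.

```python
import re
from nltk.stem.porter import PorterStemmer

# Hypothetical, hand-written stop list so the sketch runs without the NLTK
# corpus download; the article itself uses stopwords.words('english').
SAMPLE_STOP_WORDS = {"the", "was", "this", "is", "a", "and", "not"}

def clean_review(text, stop_words, stemmer, keep=frozenset()):
    """Apply the five cleaning steps to one review and return a string."""
    # Steps 1-3: letters only, lowercase, split into words
    words = re.sub('[^a-zA-Z]', ' ', text).lower().split()
    # Step 4: drop stop words (unless listed in `keep`), then stem
    kept = [w for w in words if w not in stop_words or w in keep]
    # Step 5: join the stemmed words back into one string
    return ' '.join(stemmer.stem(w) for w in kept)

ps = PorterStemmer()
default = clean_review("The crust was NOT good.", SAMPLE_STOP_WORDS, ps)
with_negation = clean_review("The crust was NOT good.", SAMPLE_STOP_WORDS, ps, keep={"not"})

print(default)        # negation removed together with the other stop words
print(with_negation)  # negation preserved
```

Whether keeping negation words improves the classifier on this dataset is not something the article tests; the sketch only shows how the two variants differ.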

The next step is to create a Bag of Words model to prepare our data for predicting whether a review is positive or negative.

from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=1500)
X = cv.fit_transform(corpus).toarray()
y = df.iloc[:,1].values


Training a Naive Bayes Model

Now let’s split the data into a training set and a testing set. We will train our model on the training set and then use the test set to evaluate the model.

from sklearn.model_selection import train_test_split
X_train,X_test, y_train, y_test = train_test_split(X,y,test_size = 0.20, random_state = 0)
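As a quick sanity check on the split sizes, the sketch below applies the same call to stand-in arrays with 1000 rows (the arrays and their names are assumptions made for illustration). With test_size = 0.20, 800 rows go to training and 200 to testing.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same number of rows as the dataset
X_demo = np.zeros((1000, 5))
y_demo = np.array([0, 1] * 500)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=0
)

print(X_tr.shape, X_te.shape)
print(y_tr.shape, y_te.shape)
```

Fixing random_state makes the split reproducible, so repeated runs evaluate the model on the same 200 reviews.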

This step fits the Naive Bayes classifier to the training set.

from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train,y_train)
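GaussianNB treats every feature as a continuous, normally distributed value. For word-count features, scikit-learn also provides MultinomialNB, which is commonly used with counts. The sketch below fits both on a toy count matrix purely to show that the two classes share the same fit/predict interface; the toy data is an assumption, and MultinomialNB is an alternative to try, not the method used in this article.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# Toy word-count matrix; columns are counts of ["good", "bad", "food"]
X_toy = np.array([[2, 0, 1],
                  [1, 0, 1],
                  [0, 2, 1],
                  [0, 1, 1]])
y_toy = np.array([1, 1, 0, 0])  # 1 = positive, 0 = negative

gnb = GaussianNB().fit(X_toy, y_toy)
mnb = MultinomialNB().fit(X_toy, y_toy)

# A new review containing "good" once and "food" once
new_review = np.array([[1, 0, 1]])
gnb_pred = gnb.predict(new_review)
mnb_pred = mnb.predict(new_review)

print(gnb_pred, mnb_pred)
```

Swapping the classifier in the article's code would only require changing the import and the constructor; whether it scores better on this dataset would need to be measured.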

Predictions and Evaluations

Let’s predict the labels of the test set.

y_pred = classifier.predict(X_test)

Making the Confusion Matrix

The confusion matrix contains the correct predictions that our model made on the test set as well as the incorrect predictions.

from sklearn.metrics import confusion_matrix,classification_report
cm = confusion_matrix(y_test,y_pred)
class_names=[0,1] # name  of classes
fig, ax = plt.subplots()
tick_marks = np.arange(len(class_names))
plt.xticks(tick_marks, class_names)
plt.yticks(tick_marks, class_names)
# create heatmap
sns.heatmap(pd.DataFrame(cm), annot=True, cmap="BuPu" ,fmt='g')
ax.xaxis.set_label_position("top")
plt.tight_layout()
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')

This confusion matrix collects all the correct and incorrect predictions for the reviews in the test set.


55 and 91 are the correct predictions, while 12 and 42 are the incorrect ones, so the model gets quite a lot of the reviews right.
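The four counts quoted above can be turned into an accuracy figure directly: accuracy is the number of correct predictions divided by the total number of predictions. This is a worked calculation using only the numbers reported in the text.

```python
# Counts read from the confusion matrix reported above
correct = 55 + 91      # predictions that match the actual label
incorrect = 12 + 42    # predictions that do not

total = correct + incorrect
accuracy = correct / total

print(total)      # size of the test set
print(accuracy)   # fraction of correct predictions
```

The total of 200 matches the 20% test split of the 1000 reviews.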

from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred))

Based on the confusion matrix above, the accuracy of the model is (55 + 91) / 200 = 73%.
