Prediction of Mobile Price

In this project, we use sales data for mobile phones from various companies. The aim is to find the relationship between a phone’s features (e.g. RAM, internal memory) and its selling price, and then to predict the phone’s price range.

We are going to use Kernel Support Vector Machine, K-Fold Cross Validation and Grid Search to solve this problem.

The data contains the following columns:

  • battery_power: Total energy a battery can store in one time measured in mAh
  • blue: Has bluetooth or not
  • clock_speed: speed at which microprocessor executes instructions
  • dual_sim: Has dual sim support or not
  • fc: Front Camera mega pixels
  • four_g: Has 4G or not
  • int_memory: Internal Memory in Gigabytes
  • m_dep: Mobile Depth in cm
  • mobile_wt: Weight of mobile phone
  • n_cores: Number of cores of processor
  • pc: Primary Camera mega pixels
  • px_height: Pixel Resolution Height
  • px_width: Pixel Resolution Width
  • ram: Random Access Memory in Mega Bytes
  • talk_time: longest time that a single battery charge will last during calls
  • three_g: Has 3G or not
  • touch_screen: Has touch screen or not
  • wifi: Has wifi or not
  • price_range: The target variable, with values 0 (low cost), 1 (medium cost), 2 (high cost) and 3 (very high cost).

Let’s get our environment ready with the libraries we’ll need and then import the data!

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Check out the Data

df = pd.read_csv('~/DataSet GitHub/Model Selection/train.csv')
df.head()
df.info()

Visualising the missing data in the columns!

import missingno as msno
msno.matrix(df)

As we can see, there are no missing values in the data.

Exploratory Data Analysis

Price range correlation

corr=df.corr()
# Correlation of every column with price_range, strongest first
corr['price_range'].sort_values(ascending=False)

Let’s see the effect of RAM on the price range.

# jointplot creates its own figure, so set the size via height instead of plt.figure
sns.jointplot(x='ram', y='price_range', data=df, color='blue', kind='kde', height=9);

Let’s see the percentage of phones with and without a touch screen in the dataset.

# Sort by index so the counts line up with the labels: 0 = without, 1 = with
counts = df['touch_screen'].value_counts().sort_index()
explode = (0.1, 0)
fig1, ax1 = plt.subplots(figsize=(12, 7))
ax1.pie(counts, explode=explode, labels=['without touch screen', 'with touch screen'],
        autopct='%1.1f%%', shadow=True)
# Equal aspect ratio ensures that pie is drawn as a circle
ax1.axis('equal')
plt.tight_layout()
plt.legend()
plt.show()

Let’s see the share of each price range in the dataset.

plt.rcParams['figure.figsize'] = (20, 10)
# Count the phones in each class, ordered 0..3 so the counts match the labels
size = df['price_range'].value_counts().sort_index()
colors = ['mediumseagreen', 'c', 'gold', 'salmon']
labels = ["low cost", "medium cost", "high cost", "very high cost"]
explode = [0, 0, 0, 0.1]
plt.subplot(1, 2, 1)
plt.pie(size, colors=colors, labels=labels, explode=explode, shadow=True, autopct='%.2f%%')
plt.axis('off')
plt.legend()

Training a Support Vector Machine Model

Let’s now begin to train the support vector machine model! We first need to split our data into an X array that contains the features to train on and a y array with the target variable, in this case the price_range column.

# price_range is the last column: everything before it is a feature
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

Train Test Split

Now let’s split the data into a training set and a testing set. We will train our model on the training set and then use the test set to evaluate the model.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

Feature Scaling

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
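
To make the scaling step concrete, here is a minimal NumPy sketch of what standardization does, assuming the usual definition z = (x - mean) / std, with the mean and standard deviation computed on the training set only and then reused for the test set. The toy arrays are made up for illustration and are not taken from train.csv.

```python
import numpy as np

# Toy feature matrices (illustrative values only, not from train.csv)
X_train_toy = np.array([[1000.0, 2.0],
                        [1500.0, 4.0],
                        [2000.0, 6.0]])
X_test_toy = np.array([[1250.0, 3.0]])

# "fit": learn the per-column mean and standard deviation from the training set
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": apply the training statistics to both sets
X_train_scaled = (X_train_toy - mean) / std
X_test_scaled = (X_test_toy - mean) / std

print(X_train_scaled.mean(axis=0))  # approximately 0 for each column
print(X_train_scaled.std(axis=0))   # approximately 1 for each column
print(X_test_scaled)
```

Reusing the training statistics on the test set is why the code above calls fit_transform on X_train but only transform on X_test: the test data should not influence the scaling.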

Fitting kernel SVM to the training set

from sklearn.svm import SVC
classifier = SVC(kernel='rbf',random_state=0)
classifier.fit(X_train,y_train)
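
As a rough illustration of what kernel='rbf' means, the sketch below implements the standard Gaussian (RBF) kernel formula, K(a, b) = exp(-gamma * ||a - b||^2), in NumPy. The helper name rbf_similarity and the toy points are hypothetical and chosen only for illustration; scikit-learn computes the kernel internally.

```python
import numpy as np

def rbf_similarity(a, b, gamma):
    """Gaussian (RBF) kernel value: exp(-gamma * squared Euclidean distance)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.exp(-gamma * np.sum((a - b) ** 2)))

# Two toy points (illustrative only)
x1 = [0.0, 0.0]
x2 = [1.0, 1.0]

print(rbf_similarity(x1, x1, gamma=0.5))  # identical points give 1.0
print(rbf_similarity(x1, x2, gamma=0.5))  # exp(-0.5 * 2) = exp(-1)
print(rbf_similarity(x1, x2, gamma=2.0))  # larger gamma: similarity decays faster
```

The gamma argument here plays the same role as the gamma hyperparameter of SVC: a larger value makes each training point influence only its close neighbours.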

Predictions and Evaluations

Now let’s predict values for the testing data.

y_pred = classifier.predict(X_test)

Making Confusion Matrix

The confusion matrix contains the correct predictions that our model made on the test set as well as the incorrect ones.

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test,y_pred)
class_names = [0, 1, 2, 3]  # the four price ranges
fig, ax = plt.subplots()
# create heatmap, using the class names as row and column labels
sns.heatmap(pd.DataFrame(cm, index=class_names, columns=class_names),
            annot=True, cmap="Set1", fmt='g', ax=ax)
ax.xaxis.set_label_position("top")
plt.tight_layout()
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')

89, 82, 83 and 102 are the correct predictions (the diagonal of the matrix), while 12, 4, 12, 6, 4 and 6 are incorrect predictions. So we can see that we have quite a lot of correct predictions.

Correct Predictions : 89+82+83+102 = 356

Incorrect Predictions: 12+4+12+6+4+6 = 44
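
The two sums above also give the test accuracy directly. A small sketch of that arithmetic, using only the counts quoted above:

```python
# Counts read off the confusion matrix above
correct = [89, 82, 83, 102]        # diagonal entries
incorrect = [12, 4, 12, 6, 4, 6]   # non-zero off-diagonal entries

n_correct = sum(correct)
n_incorrect = sum(incorrect)
total = n_correct + n_incorrect

# Accuracy is the fraction of test samples that were classified correctly
accuracy = n_correct / total
print(n_correct, n_incorrect, total)  # 356 44 400
print(f"accuracy = {accuracy:.2%}")   # accuracy = 89.00%
```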

Create a classification report for the model.

from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred))

Applying K-Fold Cross Validation

The main reason for using K-Fold Cross Validation is to get a more reliable estimate of the model’s accuracy on unseen data than a single train/test split can give.

from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator=classifier, X = X_train, y = y_train, cv = 10)
accuracies

These are the accuracies on each of the 10 validation folds of our training set. Let’s take the average of these accuracies as the cross-validated estimate of the model’s accuracy.

accuracies.mean()

Let’s compute the standard deviation.

accuracies.std()

We got a standard deviation of about 2%, which is low: the accuracy is stable from fold to fold.
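
To show how the folds behind the cross_val_score call above are formed, here is a simplified sketch of plain, unshuffled k-fold splitting in NumPy. The function name kfold_indices and the sample counts are hypothetical. Note that for a classifier with an integer cv, scikit-learn uses stratified folds, which also keep the class proportions similar in every fold; this sketch leaves that part out.

```python
import numpy as np

def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for plain, unshuffled k-fold splitting."""
    indices = np.arange(n_samples)
    folds = np.array_split(indices, k)  # k nearly equal consecutive chunks
    for i in range(k):
        val_idx = folds[i]
        # Every fold except the i-th one is used for training
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# 20 samples and 5 folds are arbitrary illustrative numbers
splits = list(kfold_indices(n_samples=20, k=5))
for train_idx, val_idx in splits:
    print("validate on", val_idx, "train on", len(train_idx), "samples")
```

Each sample is used for validation exactly once and for training in the other k - 1 rounds, so every accuracy in the array comes from data the model did not see while fitting.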

Applying Grid Search

Let’s apply grid search to find the best model and the best parameters.

from sklearn.model_selection import GridSearchCV
parameters = [{'C': [1, 10, 100, 1000], 'kernel': ['linear']},
              {'C': [1, 10, 100, 1000], 'kernel': ['rbf'], 'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}]
grid_search = GridSearchCV(estimator = classifier,
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv = 10,
                           n_jobs = -1)
grid_search = grid_search.fit(X_train, y_train)
best_accuracy = grid_search.best_score_
best_parameters = grid_search.best_params_
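# print both values; a bare expression only displays when it is the last line of a cell
print(best_accuracy)
print(best_parameters)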
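
To get a feel for how much work this grid search does, the sketch below enumerates the parameter combinations in the grid above using only the standard library. The helper expand is hypothetical and only mimics the enumeration; it is not how scikit-learn implements it.

```python
from itertools import product

# Same grid as above
parameters = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'kernel': ['rbf'],
     'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]},
]

def expand(grid):
    """Expand one dict of lists into a list of concrete parameter settings."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

candidates = [params for grid in parameters for params in expand(grid)]
n_folds = 10  # matches cv = 10 above

print(len(candidates))            # 40 (4 linear + 4 * 9 rbf settings)
print(len(candidates) * n_folds)  # 400 model fits during the search
print(candidates[0])
```

With the default refit behaviour, GridSearchCV then fits the best setting once more on the whole training set, so the total is one fit more than the number printed.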