In this project, we use the sales data of mobile phones from various companies. The aim is to find the relationship between a phone's features (e.g. RAM, internal memory, etc.) and its selling price, and then to predict the price range of the mobile.
We are going to use a Kernel Support Vector Machine, K-Fold Cross Validation and Grid Search to solve this problem.
The data contains the following columns:
- battery_power: Total energy a battery can store in one charge, measured in mAh
- blue: Has Bluetooth or not
- clock_speed: Speed at which the microprocessor executes instructions
- dual_sim: Has dual SIM support or not
- fc: Front camera megapixels
- four_g: Has 4G or not
- int_memory: Internal memory in gigabytes
- m_dep: Mobile depth in cm
- mobile_wt: Weight of the mobile phone
- n_cores: Number of cores of the processor
- pc: Primary camera megapixels
- px_height: Pixel resolution height
- px_width: Pixel resolution width
- ram: Random access memory in megabytes
- talk_time: Longest time that a single battery charge will last when you are constantly talking on the phone
- three_g: Has 3G or not
- touch_screen: Has touch screen or not
- wifi: Has WiFi or not
- price_range: This is the target variable, with values 0 (low cost), 1 (medium cost), 2 (high cost) and 3 (very high cost).
Let’s get our environment ready with the libraries we’ll need and then import the data!
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Check out the Data
df = pd.read_csv('~/DataSet GitHub/Model Selection/train.csv')
df.head()

df.info()

Visualising the missing data in the columns!
import missingno as msno
msno.matrix(df)

As we can see, there are no missing values in the data.
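For a quick numeric confirmation (a minimal check with plain pandas, in addition to the visual above), we can count the missing entries per column:
# Number of missing values in each column; every count should be zero
df.isnull().sum()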
Exploratory Data Analysis
Price range correlation
corr = df.corr()
# Correlation of every feature with the target, sorted from strongest to weakest
corr['price_range'].sort_values(ascending=False)
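For a broader view of how all features relate to each other, we could also draw the full correlation matrix as a heatmap (an optional sketch using the seaborn we already imported; it is not needed for the rest of the analysis):
plt.figure(figsize=(14, 10))
# Annotated heatmap of the pairwise correlations between all columns
sns.heatmap(df.corr(), annot=True, fmt='.2f', cmap='coolwarm')
plt.show()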

Let’s see the effect of RAM on the price.
# jointplot creates its own figure, so the size is set with the height argument
sns.jointplot(x='ram', y='price_range', data=df, color='blue', kind='kde', height=9);

Let’s see the percentage of mobiles with and without a touch screen in the dataset.
explode = (0.1, 0)
fig1, ax1 = plt.subplots(figsize=(12, 7))
counts = df['touch_screen'].value_counts()
# Label each slice according to the value it represents (1 = with touch screen, 0 = without)
labels = ['with touch screen' if v == 1 else 'without touch screen' for v in counts.index]
ax1.pie(counts, explode=explode, labels=labels, autopct='%1.1f%%', shadow=True)
# Equal aspect ratio ensures that the pie is drawn as a circle
ax1.axis('equal')
plt.tight_layout()
plt.legend()
plt.show()

Let’s see the percentage of the different price ranges in the dataset.
plt.rcParams['figure.figsize'] = (20, 10)
# Use the actual class counts instead of hard-coded values
size = df['price_range'].value_counts().sort_index()
colors = ['mediumseagreen', 'c', 'gold', 'salmon']
labels = "low cost", "medium cost", "high cost", "very high cost"
explode = [0, 0, 0, 0.1]
plt.subplot(1, 2, 1)
plt.pie(size, colors = colors, labels = labels, explode = explode, shadow = True, autopct = '%.2f%%')
plt.axis('equal')
plt.legend()
plt.show()

Training a Support Vector Machine Model
Let’s now begin to train the kernel SVM model! We first need to split our data into an X array that contains the features to train on, and a y array with the target variable, in this case the price_range column.
X = df.iloc[:,:-1].values
y = df.iloc[:,20].values
Train Test Split
Now let’s split the data into a training set and a testing set. We will train our model on the training set and then use the test set to evaluate the model.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
Fitting kernel SVM to the training set
from sklearn.svm import SVC
classifier = SVC(kernel='rbf',random_state=0)
classifier.fit(X_train,y_train)

Predictions and Evaluations
Now let’s predict values for the testing data.
y_pred = classifier.predict(X_test)
Making Confusion Matrix
The confusion matrix contains the correct predictions that our model made on the test set as well as the incorrect ones.
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test,y_pred)
class_names = [0, 1, 2, 3] # the four price_range classes
fig, ax = plt.subplots()
tick_marks = np.arange(len(class_names))
plt.xticks(tick_marks, class_names)
plt.yticks(tick_marks, class_names)
# create heatmap
sns.heatmap(pd.DataFrame(cm), annot=True, cmap="Set1" ,fmt='g')
ax.xaxis.set_label_position("top")
plt.tight_layout()
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')

The diagonal values 89, 82, 83 and 102 are the correct predictions, while 12, 4, 12, 6, 4 and 6 are the incorrect ones, so we have:
Correct Predictions : 89+82+83+102 = 356
Incorrect Predictions: 12+4+12+6+4+6 = 44
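From the same matrix we can compute the overall test accuracy directly (a small sketch; the numbers are simply the counts above):
# Accuracy = correct predictions / all predictions on the test set
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 356 / 400 = 0.89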
Create a classification report for the model.
from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred))

Applying K-Fold Cross Validation
The main reason for using K-Fold Cross Validation is to get a more reliable estimate of the model's accuracy by evaluating it on data it has not seen during training.
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator=classifier, X = X_train, y = y_train, cv = 10)
accuracies

These are the accuracies obtained on the 10 folds of our training set. Let’s take their average to get the final model accuracy.
accuracies.mean()

Let’s also compute the standard deviation.
accuracies.std()

We got a standard deviation of about 2%, which is a good number: the accuracy varies little between folds.
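We can report both numbers together in one line (just a formatting sketch):
print("Accuracy: {:.2f} % (+/- {:.2f} %)".format(accuracies.mean() * 100, accuracies.std() * 100))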
Applying Grid Search
Let’s apply Grid Search to find the best model and the best parameters.
from sklearn.model_selection import GridSearchCV
parameters = [{'C': [1, 10, 100, 1000], 'kernel': ['linear']},
              {'C': [1, 10, 100, 1000], 'kernel': ['rbf'], 'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}]
grid_search = GridSearchCV(estimator = classifier,
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv = 10,
                           n_jobs = -1)
grid_search = grid_search.fit(X_train, y_train)
best_accuracy = grid_search.best_score_
best_parameters = grid_search.best_params_
best_accuracy

best_parameters
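As a last step we could take the tuned model and check it on the held-out test set (a sketch of a natural follow-up, not shown in the original notebook; with the default refit=True, GridSearchCV refits the best model on the whole training set):
# The best estimator found by the grid search, refit on the full training set
best_classifier = grid_search.best_estimator_
y_pred_best = best_classifier.predict(X_test)
print(classification_report(y_test, y_pred_best))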

