Prediction of Pulsar Stars

The HTRU2 dataset describes a sample of pulsar candidates collected during the High Time Resolution Universe Survey.

Pulsars are a rare type of neutron star that produce radio emission detectable here on Earth. They are of considerable scientific interest as probes of space-time, the interstellar medium, and states of matter.

As pulsars rotate, their emission beam sweeps across the sky, and when this beam crosses our line of sight, it produces a detectable pattern of broadband radio emission. Because pulsars rotate rapidly, this pattern repeats periodically. Thus pulsar search involves looking for periodic radio signals with large radio telescopes.

Each pulsar produces a slightly different emission pattern, which varies slightly with each rotation. Thus a potential signal detection, known as a ‘candidate’, is averaged over many rotations of the pulsar, as determined by the length of an observation. In the absence of additional information, each candidate could potentially describe a real pulsar. However, in practice almost all detections are caused by radio frequency interference (RFI) and noise, making legitimate signals hard to find.

The dataset contains a total of 17,898 observations, of which 1,639 are positive examples and 16,259 are negative.
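The class balance implied by these counts can be checked with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and the counts are the ones quoted above rather than values read from the CSV file.

```python
# Counts quoted in the text above (not read from the CSV file).
n_positive = 1639    # real pulsars
n_negative = 16259   # RFI and noise
n_total = n_positive + n_negative

# Share of candidates that are real pulsars.
positive_share = n_positive / n_total

print(n_total)                   # 17898
print(f"{positive_share:.1%}")   # 9.2%
```

With roughly one positive example in eleven, accuracy alone can look flattering, which is why the evaluation further down also uses a confusion matrix and a classification report.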

In this project, we implement Naive Bayes classification in Python.

The data contains the following columns:

Let’s get our environment ready with the libraries we’ll need and then import the data!

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.metrics import confusion_matrix
from sklearn import metrics

Check out the Data

#importing the dataset
df = pd.read_csv('/Users/sadegh/Desktop/DataSet GitHub/Naive Bayes/pulsar_stars.csv')
df.head()
df.info()
plt.figure(figsize=(12,8))
sns.heatmap(df.describe()[1:].transpose(),
            annot=True,linecolor="w",
            linewidths=2,cmap=sns.color_palette("Set3"))
plt.title("Data summary")
plt.show()


Exploratory Data Analysis

Let’s check out the correlation between variables!

# Correlation matrix
cor_mat = df.corr()
# Mask the upper triangle (above the diagonal) so each pair is shown only once
mask = np.triu(np.ones_like(cor_mat, dtype=bool), k=1)
fig = plt.gcf()
fig.set_size_inches(30, 12)
sns.heatmap(data=cor_mat, mask=mask, square=True, annot=True, cbar=True)

Let’s check out the proportion of the target variable in the dataset!

plt.figure(figsize=(12,6))
plt.pie(df["target_class"].value_counts().values,
        labels=["not pulsar stars","pulsar stars"],
        autopct="%1.0f%%",wedgeprops={"linewidth":2,"edgecolor":"white"})
my_circ = plt.Circle((0,0),.7,color = "white")
plt.gca().add_artist(my_circ)
plt.subplots_adjust(wspace = .2)
plt.title("Proportion of target variable in dataset")
plt.show()

Let’s look at the pair plot of the feature variables!

sns.pairplot(data=df,
             palette="hls",
             hue="target_class",
             vars=["mean_profile",
                   "std_profile",
                   "kurtosis_profile",
                   "skewness_profile",
                   "mean_dmsnr_curve",
                   "std_dmsnr_curve",
                  "kurtosis_dmsnr_curve"])

plt.tight_layout()
plt.show()


Train Test Split

Split the data into a training set and a testing set.

X = df.drop('target_class',axis=1)
y = df['target_class']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.25, random_state = 0)
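As a rough check on what this split produces, the set sizes can be worked out by hand. The sketch below assumes the 17898 rows quoted earlier and relies on scikit-learn rounding a fractional test size up, with the remaining rows going to training; the variable names are illustrative.

```python
import math

n_samples = 17898   # total rows quoted earlier in the post
test_size = 0.25    # same value passed to train_test_split

# A fractional test size is rounded up; the remaining rows go to training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 13423 4475
```

The test-set size of 4475 matches the total number of predictions counted in the confusion matrix later on.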

Train a Model

Now it’s time to train a Naive Bayes Classifier. 

#fitting classifier to the training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train,y_train)
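To give an idea of what a Gaussian Naive Bayes classifier does when it is fitted, here is a minimal pure-Python sketch. It is not scikit-learn's implementation: the function names fit_gaussian_nb and predict_one, the fixed variance floor, and the toy data are all assumptions made for illustration. The idea is to estimate a prior plus a per-feature mean and variance for each class, then pick the class with the highest log posterior, treating features as independent.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant floor (an assumption of this sketch) avoids zero variance.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_one(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            # Log of the Gaussian density for this feature.
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up data: two features, two classes.
X_toy = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 0.5], [3.2, 0.4], [2.9, 0.7]]
y_toy = [0, 0, 0, 1, 1, 1]

toy_model = fit_gaussian_nb(X_toy, y_toy)
print(predict_one(toy_model, [1.1, 2.0]))  # 0
print(predict_one(toy_model, [3.1, 0.5]))  # 1
```

Working in log space keeps the product of many small densities from underflowing, which is the usual reason for summing log-likelihoods rather than multiplying probabilities.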

Model Evaluation

Now get predictions from the model and create a confusion matrix and a classification report.

y_pred = classifier.predict(X_test)
from sklearn.metrics import confusion_matrix,classification_report
cm = confusion_matrix(y_test,y_pred)
class_names=[0,1] # name  of classes
fig, ax = plt.subplots()
tick_marks = np.arange(len(class_names))
plt.xticks(tick_marks, class_names)
plt.yticks(tick_marks, class_names)
# create heatmap
sns.heatmap(pd.DataFrame(cm), annot=True, cmap="OrRd" ,fmt='g')
ax.xaxis.set_label_position("top")
plt.tight_layout()
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')

3946 and 308 are the correct predictions, while 52 and 169 are the incorrect predictions, so we can see that we have quite a lot of correct predictions.


Correct predictions: 3946 + 308 = 4254

Incorrect predictions: 52 + 169 = 221

Create a classification report for the model.

print(classification_report(y_test,y_pred))

The model's accuracy in predicting pulsar stars on the test set is about 95%!
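The accuracy figure follows directly from the confusion-matrix counts given above. A small sketch of the arithmetic, with illustrative variable names:

```python
# Counts read off the confusion matrix described above.
correct = 3946 + 308
incorrect = 52 + 169
total = correct + incorrect

# Accuracy is the share of test candidates classified correctly.
accuracy = correct / total

print(total)              # 4475
print(f"{accuracy:.4f}")  # 0.9506
```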
