In this project, we implement time series forecasting of 911 emergency calls using Prophet.
The data contains the following columns:
- lat: String variable, Latitude
- lng: String variable, Longitude
- desc: String variable, Description of the Emergency Call
- zip: String variable, Zipcode
- title: String variable, Title
- timeStamp: String variable, YYYY-MM-DD HH:MM:SS
- twp: String variable, Township
- addr: String variable, Address
- e: String variable, Dummy variable (always 1)
Let’s get our environment ready with the libraries we’ll need, then import the data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.style.use('ggplot')
Check out the Data
df = pd.read_csv('911.csv')
df.head()

df.info()
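df.info() reports the non-null count for every column, which is usually where gaps in the data first show up. As a minimal sketch, assuming some columns may contain missing values, the same check can be condensed into one count per column. The toy frame below is invented for illustration and is not taken from 911.csv.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the 911 data; every value here is invented for illustration
toy = pd.DataFrame({
    'zip': [19525.0, np.nan, 19401.0],
    'twp': ['NEW HANOVER', 'HATFIELD TOWNSHIP', None],
    'e': [1, 1, 1],
})

# Number of missing entries per column, largest first
missing = toy.isna().sum().sort_values(ascending=False)
print(missing)
```

Running the same two lines on df would show which columns are safe to count on later.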

Exploratory Data Analysis
Let’s visualise the five zip codes with the most 911 calls
plt.figure(figsize=(12,8))
ax = df['zip'].value_counts().head(5).plot(kind="bar")
# Set the labels on the Axes; assigning to plt.xlabel would overwrite the function
ax.set_xlabel("zip")
ax.set_ylabel("Frequency")
plt.show()

Let’s visualise the top 5 townships (twp) for 911 calls
plt.figure(figsize=(12,8))
ax = df['twp'].value_counts().head(5).plot(kind="bar", color = 'green')
# Set the labels on the Axes; assigning to plt.xlabel would overwrite the function
ax.set_xlabel("Township")
ax.set_ylabel("Frequency")
plt.show()

Let’s extract the reason for each call from the title column (the text before the first colon)
df['Reason'] = df['title'].apply(lambda title: title.split(':')[0])
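The same extraction can also be written with the pandas string accessor instead of apply. This is only a sketch on three made-up titles, assuming every title follows the "Reason: details" pattern.

```python
import pandas as pd

# Illustrative titles in the "Reason: details" pattern (invented for this sketch)
toy = pd.DataFrame({'title': [
    'EMS: BACK PAINS/INJURY',
    'Fire: GAS-ODOR/LEAK',
    'Traffic: VEHICLE ACCIDENT -',
]})

# Split each title on ':' and keep the first piece
toy['Reason'] = toy['title'].str.split(':').str[0]
print(toy['Reason'].tolist())
```

Unlike the lambda, the string accessor passes missing titles through as NaN instead of raising an error.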
Then let’s visualise the most common reasons for 911 calls
plt.figure(figsize=(12,8))
sns.countplot(x='Reason',data=df,palette='cividis')

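reason_counts = df['Reason'].value_counts()
explode = (0,0,0.1)
fig1, ax1 = plt.subplots(figsize=(12,8))
# Take the labels from the counts index so each slice is always named correctly
ax1.pie(reason_counts, explode=explode, labels=reason_counts.index, autopct='%1.1f%%',
shadow=True)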
# Equal aspect ratio ensures that pie is drawn as a circle
ax1.axis('equal')
plt.tight_layout()
plt.legend()
plt.show()

The timeStamp column is stored as strings, so we have to convert it to datetime
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
Now let’s create three new columns, Hour, Month, and Day of Week, from the timeStamp column
df['Hour'] = df['timeStamp'].apply(lambda time: time.hour)
df['Month'] = df['timeStamp'].apply(lambda time: time.month)
df['Day of Week'] = df['timeStamp'].apply(lambda time: time.dayofweek)
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}
df['Day of Week'] = df['Day of Week'].map(dmap)
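Once the column is a datetime, the same three features are also available through the .dt accessor, which avoids calling a Python lambda for every row. A minimal sketch on three invented timestamps:

```python
import pandas as pd

# Three invented timestamps in the same YYYY-MM-DD HH:MM:SS format
toy = pd.DataFrame({'timeStamp': pd.to_datetime([
    '2015-12-10 17:40:00',
    '2016-01-01 08:05:00',
    '2016-02-14 23:59:00',
])})

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}

# Vectorised equivalents of the three apply() calls
toy['Hour'] = toy['timeStamp'].dt.hour
toy['Month'] = toy['timeStamp'].dt.month
toy['Day of Week'] = toy['timeStamp'].dt.dayofweek.map(dmap)
print(toy)
```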
Now let’s visualise how often each reason occurs on each day of the week
plt.figure(figsize=(12,8))
sns.countplot(x='Day of Week',data=df,hue='Reason',palette='cividis')
# To relocate the legend
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

Let’s visualise how often each reason occurs in each month
plt.figure(figsize=(12,8))
sns.countplot(x='Month',data=df,hue='Reason',palette='cividis')
# To relocate the legend
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

byMonth = df.groupby('Month').count()
byMonth
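One detail worth keeping in mind: count() tallies non-null values per column, so a column with gaps reports fewer calls than there are rows. The sketch below, on an invented frame where some twp values are missing, contrasts it with size(), which counts every row.

```python
import pandas as pd

# Invented data: two of the five calls have no township recorded
toy = pd.DataFrame({
    'Month': [1, 1, 1, 2, 2],
    'twp': ['A', None, 'B', 'C', None],
})

non_null_twp = toy.groupby('Month')['twp'].count()  # skips rows with a missing twp
all_rows = toy.groupby('Month').size()              # counts every row in the group

print(non_null_twp.tolist(), all_rows.tolist())
```

If the chosen column has no missing values, both approaches give the same numbers.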

Let’s visualise the count of calls per month.
plt.figure(figsize=(12,8))
byMonth['twp'].plot()

Let’s extract the date from the timeStamp column and plot the number of 911 calls per day.
plt.figure(figsize=(12,8))
df['Date']=df['timeStamp'].apply(lambda t: t.date())
df.groupby('Date').count()['twp'].plot()
plt.tight_layout()
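A groupby on the date only returns days that have at least one call. As a hedged alternative, resampling on the datetime index keeps empty days as explicit zeros, which can matter for a time series. The timestamps below are invented.

```python
import pandas as pd

# Invented timestamps: two calls on 1 Jan, none on 2 Jan, one on 3 Jan
toy = pd.DataFrame({'timeStamp': pd.to_datetime([
    '2016-01-01 08:00:00',
    '2016-01-01 21:30:00',
    '2016-01-03 12:00:00',
])})

# One bin per calendar day; days without calls get a count of 0
daily = toy.set_index('timeStamp').resample('D').size()
print(daily)
```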

Now let’s plot the daily count of 911 calls with the reason Traffic
plt.figure(figsize=(12,8))
df[df['Reason']=='Traffic'].groupby('Date').count()['twp'].plot()
plt.title('Traffic')
plt.tight_layout()

Let’s visualise the daily count of 911 calls with the reason EMS
plt.figure(figsize=(12,8))
df[df['Reason']=='EMS'].groupby('Date').count()['twp'].plot()
plt.title('EMS')
plt.tight_layout()

Now let’s create a heatmap of 911 call counts by hour and day of the week
dayHour = df.groupby(by=['Day of Week','Hour']).count()['Reason'].unstack()
dayHour.head()
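Because Day of Week holds strings, the grouped rows come back in alphabetical order (Fri, Mon, Sat, ...), and a heatmap inherits that order. A small sketch of putting the rows back in calendar order with reindex, using an invented frame:

```python
import pandas as pd

# Invented calls spread over a few days and hours
toy = pd.DataFrame({
    'Day of Week': ['Mon', 'Fri', 'Wed', 'Mon', 'Sun'],
    'Hour': [9, 9, 17, 17, 9],
    'Reason': ['EMS', 'Fire', 'EMS', 'Traffic', 'EMS'],
})

day_hour = toy.groupby(['Day of Week', 'Hour']).count()['Reason'].unstack()
print(day_hour.index.tolist())  # alphabetical order

# Keep calendar order, restricted to the days that actually occur
order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
present = [d for d in order if d in day_hour.index]
day_hour_ordered = day_hour.reindex(present)
print(day_hour_ordered.index.tolist())
```

The same reindex call could be applied to dayHour before plotting it.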

plt.figure(figsize=(12,8))
sns.heatmap(dayHour,cmap='cividis')

plt.figure(figsize=(12,8))
sns.clustermap(dayHour,cmap='cividis')

Forecasting with Prophet
First, we need to create a DataFrame that contains each date and the total number of calls on that date
df1 = df.groupby('Date', as_index=False).agg({"twp": "count"})
df1 = df1.sort_values(by=['Date'])
df1.head(11)

Prophet expects the date column to be named ds and the value column to be named y, so we rename Date to ds and twp to y
df1 = df1.rename(columns={'Date': 'ds',
'twp': 'y'})
df1.tail()
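For reference, the aggregation and the renaming can also be done in one chain that keeps ds as a datetime column. This is a sketch on invented timestamps, not the author's pipeline; the name prophet_df is hypothetical.

```python
import pandas as pd

# Invented timestamps: two calls on 1 Jan and one on 2 Jan
toy = pd.DataFrame({'timeStamp': pd.to_datetime([
    '2016-01-01 08:00:00',
    '2016-01-01 21:30:00',
    '2016-01-02 12:00:00',
])})

# Group by calendar day (time set to midnight), count rows, name the columns ds and y
prophet_df = (
    toy.groupby(toy['timeStamp'].dt.normalize())
       .size()
       .rename('y')
       .rename_axis('ds')
       .reset_index()
)
print(prophet_df)
```

Keeping ds as datetime64 makes later joins against the forecast frame straightforward.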

Let’s visualise the data
ax = df1.set_index('ds').plot(figsize=(12, 8))
ax.set_ylabel('TotalCall')
ax.set_xlabel('Date')
plt.show()

Fitting the data to the model
# Prophet 1.0 and later are published as the prophet package: from prophet import Prophet
from fbprophet import Prophet
# set the uncertainty interval to 95% (the Prophet default is 80%)
my_model = Prophet(interval_width=0.95)
my_model.fit(df1)
future_dates = my_model.make_future_dataframe(periods=24, freq='MS')
future_dates.tail()
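To get a feel for what periods=24 with freq='MS' means, here is a pandas-only sketch of 24 month-start dates following the end of the history. It does not call Prophet, the last date is a hypothetical placeholder, and the frame Prophet returns also contains the historical dates by default.

```python
import pandas as pd

last_date = pd.Timestamp('2016-08-24')  # hypothetical last day of the history
periods = 24

# 'MS' is month-start frequency: one date on the first day of each month
dates = pd.date_range(start=last_date, periods=periods + 1, freq='MS')
dates = dates[dates > last_date][:periods]

future = pd.DataFrame({'ds': dates})
print(future.head(3))
print(future.tail(3))
```

So the forecast horizon is 24 monthly points, even though the model is fitted on daily counts.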

Let’s forecast the 911 calls for the next 2 years (24 monthly steps)
forecast = my_model.predict(future_dates)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

Visualising the result
# Prophet creates its own figure, so pass the size directly instead of calling plt.figure
my_model.plot(forecast,
uncertainty=True, figsize=(15,8))

# plot_components also builds its own figure and accepts figsize
my_model.plot_components(forecast, figsize=(15,8))

Evaluation
from fbprophet.diagnostics import cross_validation, performance_metrics
df_cv = cross_validation(my_model, horizon='24 days')
df_p = performance_metrics(df_cv)
df_p.head(5)

from fbprophet.plot import plot_cross_validation_metric
fig3 = plot_cross_validation_metric(df_cv, metric='mape')
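The plot shows MAPE against the forecast horizon. To make that concrete, here is a simplified sketch that computes the mean absolute percentage error per horizon by hand. The frame mimics the ds, cutoff, y and yhat columns of the cross-validation output, but all numbers are invented, and Prophet's own metric averages over a rolling window of horizons, so its values will not match this plain group mean.

```python
import pandas as pd

# Invented stand-in for a cross-validation result with two cutoffs
cv = pd.DataFrame({
    'ds': pd.to_datetime(['2016-01-02', '2016-01-03', '2016-01-09', '2016-01-10']),
    'cutoff': pd.to_datetime(['2016-01-01', '2016-01-01', '2016-01-08', '2016-01-08']),
    'y': [100.0, 200.0, 100.0, 200.0],
    'yhat': [110.0, 180.0, 90.0, 240.0],
})

# Horizon = how far ahead of the cutoff each prediction lies
cv['horizon'] = cv['ds'] - cv['cutoff']

# Absolute percentage error per row, then averaged within each horizon
cv['ape'] = ((cv['y'] - cv['yhat']) / cv['y']).abs()
mape_by_horizon = cv.groupby('horizon')['ape'].mean()
print(mape_by_horizon)
```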

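# df1['ds'] was built from date objects; converting it to datetime keeps the join keys comparable with forecast['ds']
actual = df1.assign(ds=pd.to_datetime(df1['ds'])).set_index('ds').y
metric_df = forecast.set_index('ds')[['yhat']].join(actual).reset_index()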
metric_df.dropna(inplace=True)
metric_df.head(10)
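With actual and predicted values side by side, summary error metrics are a short step away. As a minimal sketch, assuming a frame with y and yhat columns like metric_df, the mean absolute error and root mean squared error can be computed as follows. The numbers are invented and toy_metric is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Invented actuals and predictions standing in for metric_df
toy_metric = pd.DataFrame({
    'y': [100.0, 200.0, 300.0],
    'yhat': [110.0, 190.0, 330.0],
})

errors = toy_metric['y'] - toy_metric['yhat']

mae = errors.abs().mean()             # average size of the error
rmse = np.sqrt((errors ** 2).mean())  # penalises large errors more heavily

print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")
```

These are in-sample errors on dates the model has already seen, so they tend to look better than the cross-validation figures.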

