In this project, we implement time series forecasting of 911 emergency calls using Prophet.
The data contains the following columns:
- lat: String variable, Latitude
- lng: String variable, Longitude
- desc: String variable, Description of the Emergency Call
- zip: String variable, Zipcode
- title: String variable, Title
- timeStamp: String variable, YYYY-MM-DD HH:MM:SS
- twp: String variable, Township
- addr: String variable, Address
- e: String variable, Dummy variable (always 1)
Let’s get our environment ready with the libraries we’ll need, and then import the data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.style.use('ggplot')
Check out the Data
df = pd.read_csv('911.csv')
df.head()

df.info()

Exploratory Data Analysis
Let’s visualise the five zip codes that placed the most 911 calls
plt.figure(figsize=(12,8))
df['zip'].value_counts().head(5).plot(kind="bar")
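# xlabel and ylabel are functions; assigning to them would overwrite them instead of labelling the axes
plt.xlabel("zip")
plt.ylabel("Frequency")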
plt.show()

Let’s visualise the top 5 townships (twp) for 911 calls
plt.figure(figsize=(12,8))
df['twp'].value_counts().head(5).plot(kind="bar", color = 'green')
# Call the label functions rather than assigning to them
plt.xlabel("Township")
plt.ylabel("Frequency")
plt.show()

Let’s extract the reason for each call from the title column
df['Reason'] = df['title'].apply(lambda title: title.split(':')[0])
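To make the split concrete, here is a minimal sketch on a few made-up titles in the same "Reason: detail" shape. The sample strings and the names sample, by_apply and by_str are illustrative assumptions, not rows quoted from the data. It also shows the vectorised .str form, which should give the same result as the apply call above.

```python
import pandas as pd

# Hypothetical titles in the "Reason: detail" shape used by the title column
sample = pd.DataFrame({
    'title': [
        'EMS: BACK PAINS/INJURY',
        'Fire: GAS-ODOR/LEAK',
        'Traffic: VEHICLE ACCIDENT -',
    ]
})

# Same logic as the lambda above: keep everything before the first colon
by_apply = sample['title'].apply(lambda title: title.split(':')[0])

# Vectorised equivalent using the pandas string accessor
by_str = sample['title'].str.split(':').str[0]

print(by_str.tolist())
```

Either form works here; the .str version tends to read more clearly on large frames and passes missing titles through as missing values instead of raising an error.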
Then let’s visualise the most common reasons for 911 calls
plt.figure(figsize=(12,8))
sns.countplot(x='Reason',data=df,palette='cividis')

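reason_counts = df['Reason'].value_counts()
# Pull the third (least frequent) slice out of the pie
explode = (0,0,0.1)
fig1, ax1 = plt.subplots(figsize=(12,8))
# Take the labels from the counts themselves so they always match the slice order
ax1.pie(reason_counts, explode=explode, labels=reason_counts.index, autopct='%1.1f%%', shadow=True)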
# Equal aspect ratio ensures that pie is drawn as a circle
ax1.axis('equal')
plt.tight_layout()
plt.legend()
plt.show()

The timeStamp column is stored as a string, so we have to convert it to datetime
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
Now let’s create three new columns, Hour, Month, and Day of Week, based on the timeStamp column
df['Hour'] = df['timeStamp'].apply(lambda time: time.hour)
df['Month'] = df['timeStamp'].apply(lambda time: time.month)
df['Day of Week'] = df['timeStamp'].apply(lambda time: time.dayofweek)
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}
df['Day of Week'] = df['Day of Week'].map(dmap)
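The same three columns can also be derived with the pandas .dt accessor instead of apply. The sketch below uses two made-up timestamps (an assumption for illustration, not rows from 911.csv) so the result is easy to check by hand.

```python
import pandas as pd

# Hypothetical timestamps in the YYYY-MM-DD HH:MM:SS format described above
ts = pd.to_datetime(pd.Series(['2015-12-10 17:40:00', '2016-01-03 08:05:00']))

# Same mapping as in the article: pandas numbers Monday as 0 and Sunday as 6
dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}

features = pd.DataFrame({
    'Hour': ts.dt.hour,
    'Month': ts.dt.month,
    'Day of Week': ts.dt.dayofweek.map(dmap),
})

print(features)
```

The .dt accessor works on the whole column at once, so it is usually faster than calling a lambda per row, and the resulting values should match those from the apply calls above.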
Now let’s visualise how often each reason occurs on each day of the week
plt.figure(figsize=(12,8))
sns.countplot(x='Day of Week',data=df,hue='Reason',palette='cividis')
# To relocate the legend
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

Let’s visualise how often each reason occurs in each month
plt.figure(figsize=(12,8))
sns.countplot(x='Month',data=df,hue='Reason',palette='cividis')
# To relocate the legend
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

byMonth = df.groupby('Month').count()
byMonth
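One detail worth knowing about count(): it counts non-missing values per column, so byMonth['twp'] is the number of calls with a recorded township, not necessarily the number of rows. A minimal sketch on a tiny made-up frame (the name calls and its values are assumptions for illustration) shows the difference from size():

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frame: one January row has no township recorded
calls = pd.DataFrame({
    'Month': [1, 1, 1, 2],
    'twp': ['A', np.nan, 'B', 'A'],
})

# count() skips rows where twp is missing
non_null_twp = calls.groupby('Month').count()['twp']

# size() counts every row in the group
all_rows = calls.groupby('Month').size()

print(non_null_twp.loc[1], all_rows.loc[1])
```

If the township column has few missing values, the two numbers will be close; size() is the safer choice when the goal is the total number of calls.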

Let’s visualise the count of calls per month.
plt.figure(figsize=(12,8))
byMonth['twp'].plot()

Let’s extract the date from the timeStamp column and plot the number of 911 calls per day.
plt.figure(figsize=(12,8))
df['Date']=df['timeStamp'].apply(lambda t: t.date())
df.groupby('Date').count()['twp'].plot()
plt.tight_layout()

Now let’s plot the number of 911 calls per day with Traffic as the reason
plt.figure(figsize=(12,8))
df[df['Reason']=='Traffic'].groupby('Date').count()['twp'].plot()
plt.title('Traffic')
plt.tight_layout()

Let’s plot the number of 911 calls per day with EMS as the reason
plt.figure(figsize=(12,8))
df[df['Reason']=='EMS'].groupby('Date').count()['twp'].plot()
plt.title('EMS')
plt.tight_layout()

Now let’s create a heatmap of the number of 911 calls by hour and day of the week
dayHour = df.groupby(by=['Day of Week','Hour']).count()['Reason'].unstack()
dayHour.head()
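Because Day of Week holds strings, groupby sorts the rows alphabetically (Fri, Mon, Sat, ...) rather than in calendar order. The sketch below is one possible way to build the same kind of table in calendar order. It is a variation, not the code used above: it uses size() and fills empty cells with 0, and the names calls, day_order and table are assumptions for illustration.

```python
import pandas as pd

# Hypothetical mini data set with the columns created earlier
calls = pd.DataFrame({
    'Day of Week': ['Wed', 'Mon', 'Mon', 'Sun'],
    'Hour': [9, 9, 17, 9],
    'Reason': ['EMS', 'Fire', 'EMS', 'Traffic'],
})
day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

table = (
    calls.groupby(['Day of Week', 'Hour']).size()
    .unstack(fill_value=0)               # hours become columns, empty cells become 0
    .reindex(day_order, fill_value=0)    # calendar order instead of alphabetical
)

print(table)
```

On the full data set, applying dayHour.reindex(day_order) before plotting should give a heatmap whose rows run from Monday to Sunday.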

plt.figure(figsize=(12,8))
sns.heatmap(dayHour,cmap='cividis')

# clustermap creates its own figure, so pass the size to it directly
sns.clustermap(dayHour,cmap='cividis',figsize=(12,8))

Forecasting with Prophet
First, we need to create a DataFrame that contains each date and the total number of calls on that date
df1 = df.groupby('Date', as_index=False).agg({"twp": "count"})
df1 = df1.sort_values(by=['Date'])
df1.head(11)

Prophet expects the date column to be named ds and the value column to be named y, so we rename Date to ds and twp to y
df1 = df1.rename(columns={'Date': 'ds', 'twp': 'y'})
df1.tail()
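The groupby above has no row for a day on which no call was recorded. As an alternative sketch, resample can build the ds/y frame straight from the timestamps and give such days an explicit 0. Whether an empty day should count as 0 or stay absent is a modelling choice, and the names calls and daily are assumptions for illustration.

```python
import pandas as pd

# Hypothetical call timestamps; note that 2016-01-02 has no calls at all
calls = pd.DataFrame({
    'timeStamp': pd.to_datetime([
        '2016-01-01 08:00:00',
        '2016-01-01 21:30:00',
        '2016-01-03 12:15:00',
    ])
})

daily = (
    calls.set_index('timeStamp')
    .resample('D').size()   # one row per calendar day, 0 for days without calls
    .rename('y')
    .rename_axis('ds')
    .reset_index()
)

print(daily)
```

The result already has a datetime ds column and a numeric y column, which is the layout Prophet expects.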

Let’s visualise the data
ax = df1.set_index('ds').plot(figsize=(12, 8))
ax.set_ylabel('TotalCall')
ax.set_xlabel('Date')
plt.show()

Fitting the data to the model
# On Prophet 1.0 and later the package is named prophet: use "from prophet import Prophet"
from fbprophet import Prophet
# set the uncertainty interval to 95% (the Prophet default is 80%)
my_model = Prophet(interval_width=0.95)
my_model.fit(df1)
future_dates = my_model.make_future_dataframe(periods=24, freq='MS')
future_dates.tail()
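With freq='MS' (month start), the 24 future rows are first-of-month dates rather than daily ones. The pandas-only sketch below approximates how such dates can be generated after the last observed day. It is an illustration of the idea, not Prophet’s own code, and the last_date value is a hypothetical placeholder.

```python
import pandas as pd

last_date = pd.Timestamp('2016-08-24')  # hypothetical last date in the history
periods = 24

# Ask for one extra date, then keep only dates strictly after the history
candidates = pd.date_range(start=last_date, periods=periods + 1, freq='MS')
future_only = candidates[candidates > last_date][:periods]

print(future_only[0], future_only[-1], len(future_only))
```

Note that the history is daily while these future dates are monthly, so each forecast value is still a prediction of calls on that single day, not a monthly total.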

Let’s forecast 911 calls for the next two years
forecast = my_model.predict(future_dates)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

Visualising the result
# Prophet creates its own figure, so pass the size to plot() directly
my_model.plot(forecast, uncertainty=True, figsize=(15,8))

# plot_components also creates its own figure and accepts figsize
my_model.plot_components(forecast, figsize=(15,8))

Evaluation
from fbprophet.diagnostics import cross_validation, performance_metrics
df_cv = cross_validation(my_model, horizon='24 days')
df_p = performance_metrics(df_cv)
df_p.head(5)

from fbprophet.plot import plot_cross_validation_metric
fig3 = plot_cross_validation_metric(df_cv, metric='mape')
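For readers unfamiliar with the metric in that plot, MAPE is the mean absolute percentage error. The sketch below is the textbook definition only; Prophet computes it per forecast horizon, which this does not reproduce. The function name mape and the sample numbers are assumptions for illustration, and the actual values must be non-zero.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.1 means 10%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Absolute error relative to the actual value, averaged over all points
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

# Both forecasts are off by 10% of the actual value
example = mape([100, 200], [110, 180])
print(example)
```

A lower value means the forecasts are, on average, closer to the actual counts in relative terms.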

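# Convert ds to datetime on the actuals so both sides of the join use the same key type
metric_df = forecast.set_index('ds')[['yhat']].join(df1.assign(ds=pd.to_datetime(df1['ds'])).set_index('ds').y).reset_index()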
metric_df.dropna(inplace=True)
metric_df.head(10)
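With forecast and actual values side by side, simple error metrics are one step away. The sketch below computes MAE and RMSE on a small made-up stand-in for metric_df (the numbers are assumptions for illustration). On the real frame these would be in-sample errors, because the model was fitted on the same dates.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for metric_df: forecast (yhat) next to actual (y)
metric_df = pd.DataFrame({
    'yhat': [10.0, 12.0, 9.0],
    'y': [12.0, 12.0, 8.0],
})

errors = metric_df['y'] - metric_df['yhat']

mae = errors.abs().mean()              # mean absolute error
rmse = np.sqrt((errors ** 2).mean())   # root mean squared error

print(mae, rmse)
```

RMSE penalises large misses more heavily than MAE, so comparing the two gives a rough sense of whether a few days dominate the error.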

You may have heard the world is made up of atoms and molecules, but it’s really made up of stories. When you sit with an individual that’s been here, you can give quantitative data a qualitative overlay.