Time Series Forecasting for 911 Calls

In this project, we use Prophet to build a time series forecast of daily 911 call volume.

The data contains the following columns:

  • lat: String variable, Latitude
  • lng: String variable, Longitude
  • desc: String variable, Description of the Emergency Call
  • zip: String variable, Zipcode
  • title: String variable, Title
  • timeStamp: String variable, YYYY-MM-DD HH:MM:SS
  • twp: String variable, Township
  • addr: String variable, Address
  • e: String variable, Dummy variable (always 1)

Let’s get our environment ready with the libraries we’ll need, then import the data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.style.use('ggplot')

Check out the Data

df = pd.read_csv('911.csv')
df.head()
df.info()
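Before exploring, it can help to check the inferred column types and the number of missing values. The sketch below uses two made-up rows that follow the column layout listed above, not real records from 911.csv; the name sample is hypothetical. On this sample, pandas infers numeric types for lat and lng, and parse_dates turns timeStamp into a datetime column at load time.

```python
import io

import pandas as pd

# Two made-up rows in the same column layout as the dataset (not real records).
sample_csv = io.StringIO(
    "lat,lng,desc,zip,title,timeStamp,twp,addr,e\n"
    "40.10,-75.30,Sample description,19401,EMS: FALL VICTIM,"
    "2016-01-01 08:15:00,NORRISTOWN,MAIN ST,1\n"
    "40.20,-75.40,Sample description,,Traffic: VEHICLE ACCIDENT -,"
    "2016-01-01 09:30:00,LOWER MERION,ELM AVE,1\n"
)

# parse_dates converts the timeStamp strings to datetimes while reading.
sample = pd.read_csv(sample_csv, parse_dates=["timeStamp"])

# Inspect the inferred types and count missing values per column.
print(sample.dtypes)
missing = sample.isnull().sum()
print(missing)
```

The same two checks can be run on df after loading 911.csv.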

Exploratory Data Analysis

Let’s visualise the 5 zip codes with the most 911 calls

plt.figure(figsize=(12,8))
df['zip'].value_counts().head(5).plot(kind="bar")
# xlabel and ylabel are functions: call them instead of assigning to them
plt.xlabel("zip")
plt.ylabel("Frequency")
plt.show()

Let’s visualise the top 5 townships (twp) for 911 calls

plt.figure(figsize=(12,8))
df['twp'].value_counts().head(5).plot(kind="bar", color='green')
plt.xlabel("Township")
plt.ylabel("Frequency")
plt.show()

Let’s extract the reason for each call from the title column; it is the text before the first colon

df['Reason'] = df['title'].apply(lambda title: title.split(':')[0])
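The same extraction can be written with pandas string methods instead of apply. This is a minimal sketch on three illustrative titles (assumed to follow the "Reason: detail" pattern that the split above relies on); the names titles and reason are hypothetical.

```python
import pandas as pd

# Illustrative titles in the "Reason: detail" pattern (not taken from the file).
titles = pd.Series([
    "EMS: BACK PAINS/INJURY",
    "Fire: GAS-ODOR/LEAK",
    "Traffic: VEHICLE ACCIDENT -",
])

# Split once on the first colon and keep the part before it.
reason = titles.str.split(":", n=1).str[0].str.strip()
print(reason.tolist())
```

The vectorised form avoids a Python-level lambda per row and leaves missing titles as NaN instead of raising an error.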

Then let’s visualise the most common reasons for 911 calls

plt.figure(figsize=(12,8))
sns.countplot(x='Reason',data=df,palette='cividis')
plt.show()

# Take the labels from the counts themselves so they cannot get out of order
reason_counts = df['Reason'].value_counts()
explode = (0,0,0.1)  # pull the third (smallest) slice out slightly
fig1, ax1 = plt.subplots(figsize=(12,8))
ax1.pie(reason_counts, explode=explode, labels=reason_counts.index, autopct='%1.1f%%',
        shadow=True)
# Equal aspect ratio ensures that pie is drawn as a circle
ax1.axis('equal')
plt.tight_layout()
plt.legend()
plt.show()

The timeStamp column is stored as strings, so we have to convert it to datetime

df['timeStamp'] = pd.to_datetime(df['timeStamp'])

Now let’s create three new columns, Hour, Month, and Day of Week, from the timeStamp column

df['Hour'] = df['timeStamp'].dt.hour
df['Month'] = df['timeStamp'].dt.month
# dayofweek runs from 0 (Monday) to 6 (Sunday)
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}
df['Day of Week'] = df['timeStamp'].dt.dayofweek.map(dmap)
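Plain string labels such as 'Mon' sort alphabetically, so plots and tables may list Fri before Mon. One possible remedy, sketched here on three hypothetical timestamps, is to store the day as an ordered categorical; the names stamps, day_order, and features are assumptions for illustration.

```python
import pandas as pd

# Three hypothetical timestamps: a Sunday, a Monday, and a Saturday.
stamps = pd.Series(pd.to_datetime([
    "2016-01-03 10:00:00",
    "2016-01-04 17:30:00",
    "2016-01-09 23:05:00",
]))

day_order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
day_labels = stamps.dt.dayofweek.map(dict(enumerate(day_order)))

features = pd.DataFrame({
    "Hour": stamps.dt.hour,
    "Month": stamps.dt.month,
    # An ordered categorical sorts Mon..Sun instead of alphabetically.
    "Day of Week": pd.Categorical(day_labels, categories=day_order, ordered=True),
})

print(features.sort_values("Day of Week"))
```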

Now let’s visualise how often each reason occurs on each day of the week

plt.figure(figsize=(12,8))
sns.countplot(x='Day of Week',data=df,hue='Reason',palette='cividis')

# To relocate the legend
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

Let’s visualise how often each reason occurs in each month

plt.figure(figsize=(12,8))
sns.countplot(x='Month',data=df,hue='Reason',palette='cividis')

# To relocate the legend
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
byMonth = df.groupby('Month').count()
byMonth

Let’s visualise the count of calls per month.

plt.figure(figsize=(12,8))
byMonth['twp'].plot()

Let’s extract the date from the timeStamp column and plot the number of 911 calls per day.

plt.figure(figsize=(12,8))
df['Date']=df['timeStamp'].apply(lambda t: t.date())
df.groupby('Date').count()['twp'].plot()
plt.tight_layout()
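Grouping by Date only returns days that have at least one call. If gaps should appear as zero counts, resampling on the datetime column is one option. This is a minimal sketch on three hypothetical timestamps; the names calls and daily are assumptions.

```python
import pandas as pd

# Hypothetical calls: two on 1 January, none on 2 January, one on 3 January.
calls = pd.DataFrame({
    "timeStamp": pd.to_datetime([
        "2016-01-01 08:00:00",
        "2016-01-01 21:00:00",
        "2016-01-03 12:00:00",
    ])
})

# resample('D') builds one bin per calendar day, including empty days.
daily = calls.set_index("timeStamp").resample("D").size()
print(daily)
```

Whether empty days should count as zero or as missing depends on the data; a day with no rows may also mean that no data was recorded.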

Now let’s plot the daily count of 911 calls made for traffic reasons

plt.figure(figsize=(12,8))
df[df['Reason']=='Traffic'].groupby('Date').count()['twp'].plot()
plt.title('Traffic')
plt.tight_layout()

Let’s visualise the daily count of 911 calls made for EMS reasons

plt.figure(figsize=(12,8))
df[df['Reason']=='EMS'].groupby('Date').count()['twp'].plot()
plt.title('EMS')
plt.tight_layout()

Now let’s create a heatmap of call counts by hour and day of the week for all 911 calls

dayHour = df.groupby(by=['Day of Week','Hour']).count()['Reason'].unstack()
dayHour.head()
plt.figure(figsize=(12,8))
sns.heatmap(dayHour,cmap='cividis')
# clustermap creates its own figure, so pass the size to it directly
sns.clustermap(dayHour,cmap='cividis',figsize=(12,8))
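A day and hour combination with no calls produces a missing cell after unstack, and the rows come out in alphabetical order. The sketch below, on four hypothetical rows, fills missing cells with zero and puts the rows in weekday order; the names sample, day_order, and day_hour are assumptions.

```python
import pandas as pd

# Four hypothetical calls, already labelled with day and hour.
sample = pd.DataFrame({
    "Day of Week": ["Mon", "Mon", "Sun", "Wed"],
    "Hour": [8, 8, 23, 8],
})

day_order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

day_hour = (
    sample.groupby(["Day of Week", "Hour"])
    .size()                            # number of calls per (day, hour) pair
    .unstack(fill_value=0)             # hours become columns; gaps become 0
    .reindex(day_order, fill_value=0)  # weekday order; absent days become 0
)
print(day_hour)
```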

Forecasting with Prophet

First, we need a DataFrame that contains each date and the total number of calls on that date

df1 = df.groupby('Date', as_index=False).agg({"twp": "count"})
df1 = df1.sort_values(by=['Date'])
df1.head(11)

Prophet expects the date column to be named ds and the value column to be named y, so we rename the Date column to ds and the twp column to y

df1 = df1.rename(columns={'Date': 'ds',
                        'twp': 'y'})
df1.tail()
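Since Date was built from Python date objects, it may be worth converting ds to a proper datetime column before modelling. A minimal sketch on two hypothetical daily counts follows; the names daily_counts and prophet_df are assumptions.

```python
import datetime as dt

import pandas as pd

# Two hypothetical daily counts, deliberately out of order.
daily_counts = pd.DataFrame({
    "Date": [dt.date(2016, 1, 2), dt.date(2016, 1, 1)],
    "twp": [5, 3],
})

prophet_df = (
    daily_counts.rename(columns={"Date": "ds", "twp": "y"})
    .assign(ds=lambda d: pd.to_datetime(d["ds"]))  # date objects -> datetime64
    .sort_values("ds")
    .reset_index(drop=True)
)
print(prophet_df.dtypes)
```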

Let’s visualise the data

ax = df1.set_index('ds').plot(figsize=(12, 8))
ax.set_ylabel('TotalCall')
ax.set_xlabel('Date')
plt.show()

Fitting the data to the model

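try:
    from prophet import Prophet    # package name since Prophet 1.0
except ImportError:
    from fbprophet import Prophet  # older releases
# set the uncertainty interval to 95% (the Prophet default is 80%)
my_model = Prophet(interval_width=0.95)
my_model.fit(df1)
# freq='MS' adds 24 month-start dates after the last observed date
future_dates = my_model.make_future_dataframe(periods=24, freq='MS')
future_dates.tail()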
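To see which dates freq='MS' produces compared with a daily frequency, the following sketch imitates the future-date step with plain pandas. The helper future_index and the last_date value are hypothetical, and the real make_future_dataframe also keeps the historical dates by default.

```python
import pandas as pd


def future_index(last_date, periods, freq):
    """Return `periods` dates strictly after last_date at the given frequency."""
    dates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)
    dates = dates[dates > last_date]
    return dates[:periods]


# Hypothetical last observed date; the real one comes from the data.
last_date = pd.Timestamp("2020-07-29")

monthly = future_index(last_date, 24, "MS")  # 24 month-start dates
daily = future_index(last_date, 730, "D")    # 730 consecutive days

print(monthly[0], monthly[-1], len(monthly))
print(daily[0], daily[-1], len(daily))
```

With 'MS' the model is asked for one prediction per month (on the first day of each month), while 'D' asks for a prediction on every day of the horizon.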

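Let’s forecast 911 calls for the next 2 years. Because the future dates use freq='MS', each forecast row is the predicted call count for the first day of a month.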

forecast = my_model.predict(future_dates)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

Visualising the result

# Prophet's plot methods create their own figures, so pass figsize to them
my_model.plot(forecast,
              uncertainty=True,
              figsize=(15,8))
my_model.plot_components(forecast, figsize=(15,8))

Evaluation

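try:  # package name since Prophet 1.0
    from prophet.diagnostics import cross_validation, performance_metrics
    from prophet.plot import plot_cross_validation_metric
except ImportError:  # older releases
    from fbprophet.diagnostics import cross_validation, performance_metrics
    from fbprophet.plot import plot_cross_validation_metric
# Refit the model on a series of cutoffs and predict 24 days past each one
df_cv = cross_validation(my_model, horizon='24 days')
df_p = performance_metrics(df_cv)
df_p.head(5)
fig3 = plot_cross_validation_metric(df_cv, metric='mape')
# Pair each in-sample prediction with the observed count (both indexes as datetimes)
actuals = df1.assign(ds=pd.to_datetime(df1['ds'])).set_index('ds').y
metric_df = forecast.set_index('ds')[['yhat']].join(actuals).reset_index()
metric_df.dropna(inplace=True)
metric_df.head(10)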
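A table with yhat and y columns like metric_df can be summarised with a few standard error measures. The sketch below computes them on three made-up rows, not on real forecast output; the name sample_metric_df is hypothetical.

```python
import numpy as np
import pandas as pd

# Three made-up prediction/observation pairs (not real forecast output).
sample_metric_df = pd.DataFrame({
    "yhat": [110.0, 180.0, 400.0],
    "y": [100.0, 200.0, 400.0],
})

error = sample_metric_df["yhat"] - sample_metric_df["y"]

mae = error.abs().mean()                            # mean absolute error
rmse = np.sqrt((error ** 2).mean())                 # root mean squared error
mape = (error.abs() / sample_metric_df["y"]).mean() # mean absolute percentage error

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  MAPE={mape:.2%}")
```

Because these rows come from the data the model was fitted on, such numbers describe in-sample fit; the cross-validation metrics in df_p are the better guide to out-of-sample accuracy.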
