CRISP-DM, which stands for Cross-Industry Standard Process for Data Mining, is a process model that provides a structured approach to planning a data mining project. It has consistently been the most popular data mining process model over the past fifteen years.
The six phases of the CRISP-DM process are business understanding, data understanding, data preparation, modelling, evaluation, and deployment.
As shown in the figure below, the CRISP-DM methodology is based on these six phases:
The first phase, business understanding, is about understanding the problem to be solved. It is an iterative process of discovery that goes hand in hand with the data understanding phase. This phase also defines the data mining goals.
The second phase, data understanding, involves gathering the data and then describing, exploring, and verifying its quality. Data scientists spend a lot of time on this phase, and it is one of the most important phases of CRISP-DM. Its purpose is to identify the strengths and limitations of the data.
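To make "describe and verify the quality of the data" concrete, here is a minimal sketch in plain Python, assuming the data has already been gathered as a list of dictionaries in which `None` marks a missing value. The `profile` function and the `customers` records are hypothetical illustrations, not part of CRISP-DM itself; in practice a library such as pandas would usually do this job.

```python
import statistics
from typing import Any


def profile(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Describe each column and verify basic quality: counts, missing values, numeric ranges."""
    # Collect every column name that appears in any record.
    columns = sorted({key for row in records for key in row})
    report: dict[str, dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in records]
        present = [v for v in values if v is not None]
        summary: dict[str, Any] = {"count": len(present), "missing": len(values) - len(present)}
        # Only compute numeric statistics when every observed value is a number.
        numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if numeric and len(numeric) == len(present):
            summary["min"] = min(numeric)
            summary["max"] = max(numeric)
            summary["mean"] = statistics.mean(numeric)
        report[col] = summary
    return report


# Hypothetical customer records; None marks a missing value.
customers = [
    {"age": 34, "income": 52000, "region": "north"},
    {"age": None, "income": 61000, "region": "south"},
    {"age": 29, "income": None, "region": "north"},
    {"age": 45, "income": 58000, "region": None},
]

report = profile(customers)
for col, summary in report.items():
    print(col, summary)
```

A report like this surfaces limitations early: here every column has one missing value, which is the kind of finding that feeds into the data preparation phase.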
The third phase of the CRISP-DM model is data preparation. Analytics technologies often require data in a form that differs from how it was originally provided, so conversion may be necessary. Common data preparation tasks include converting data to a tabular format, removing or inferring missing values, converting data to different types, and scaling numerical values. This is one of the most time-consuming and important of the six phases.
In the data preparation phase, the data is cleansed, transformed, and shaped for the modelling phase.
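Three of the preparation tasks listed above (converting types, inferring missing values, and scaling numbers) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: the raw column arrives as text, an empty field means "missing", mean imputation is used to infer missing values, and min-max scaling is used to rescale. The function names and the `raw_ages` data are hypothetical, and other imputation or scaling choices are equally valid.

```python
import statistics
from typing import Optional


def parse_float(text: str) -> Optional[float]:
    """Convert a raw text field to float, treating an empty field as missing."""
    text = text.strip()
    return float(text) if text else None


def impute_mean(values: list[Optional[float]]) -> list[float]:
    """Infer missing values (None) by replacing them with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values: list[float]) -> list[float]:
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


raw_ages = ["34", "", "29", "45"]  # hypothetical text column with one missing entry
ages = impute_mean([parse_float(t) for t in raw_ages])
scaled = min_max_scale(ages)
print(scaled)  # [0.3125, 0.4375, 0.0, 1.0]
```

Keeping each step as a separate function mirrors the iterative nature of the phase: if modelling later reveals a problem, a single step can be swapped out without rewriting the rest.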
The fourth phase is modelling. Its central task is to apply a variety of modelling techniques and fit the prepared dataset to an appropriate algorithm. Because a great many algorithms have been developed, the choice of algorithm is driven by the type of problem and the type of data.
There are two broad styles of selecting fields in this phase:
Hypothesis-led: only the fields (predictors) believed to influence the result are added.
Data-led: many fields are included at the start and gradually reduced, with the algorithm doing the reduction.
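The data-led style can be illustrated with backward elimination, one common way of starting from many fields and letting an algorithm drop them. This is a sketch of that particular technique, not a procedure prescribed by CRISP-DM; the `backward_elimination` and `toy_score` functions, the field names, and the scoring numbers are all hypothetical.

```python
from typing import Callable


def backward_elimination(
    features: list[str], score: Callable[[frozenset[str]], float]
) -> tuple[set[str], float]:
    """Start with all fields and greedily drop one at a time while the score does not get worse."""
    selected = set(features)
    best = score(frozenset(selected))
    while len(selected) > 1:
        # Score every subset that leaves out exactly one of the current fields.
        candidates = [(score(frozenset(selected - {f})), f) for f in sorted(selected)]
        cand_score, cand_feature = max(candidates)
        if cand_score < best:
            break  # every removal would hurt the score, so stop
        selected.remove(cand_feature)
        best = cand_score
    return selected, best


# Hypothetical scoring: each field adds a fixed usefulness, and every field costs a
# complexity penalty. In practice the score would come from evaluating a model.
USEFULNESS = {"age": 30, "income": 25, "region": 2, "customer_id": 0}
PENALTY = 5


def toy_score(subset: frozenset[str]) -> float:
    return sum(USEFULNESS[f] for f in subset) - PENALTY * len(subset)


selected, best = backward_elimination(list(USEFULNESS), toy_score)
print(sorted(selected), best)  # ['age', 'income'] 45
```

In a real project, `score` would typically be a cross-validated performance measure of the chosen model, and libraries such as scikit-learn provide ready-made backward selection utilities. The hypothesis-led style corresponds to skipping this search and passing in only the hand-picked fields.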
In general, several different techniques are tried on the data during the modelling phase. Models often change because of unexpected issues that arise in this phase, which may require going back to the data preparation phase.
The fifth phase of the CRISP-DM methodology is evaluation. Its purpose is to assess the data mining results from both qualitative and quantitative perspectives, to have