Data Science is exploding… or so it seems. I came across an article by Oscar Olmedo that describes some of the stages that practitioners in the field follow, and I think it makes for an interesting discussion.
In general, data science is concerned with extracting knowledge from a given data set, and to do that it employs tools such as mathematical modelling and machine learning. Olmedo lists the following steps in the data science process:
- Data selection and gathering
- Data cleaning, integration, and storage
- Feature extraction
- Knowledge extraction
- Visualisation
I agree with this view, and in particular I am a firm believer that the first two steps, data selection and gathering followed by cleaning, integration, and storage, are probably the most crucial and time-consuming. You can read Olmedo’s post here.
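
To make the five steps a bit more concrete, here is a minimal sketch in plain Python. It is my own toy illustration, not something taken from Olmedo's post: the housing records, the field names (`city`, `area_m2`, `price_k`) and the helper functions are all hypothetical, and "knowledge extraction" is reduced to a simple group average rather than a real model.

```python
from statistics import mean

# Step 1: data selection and gathering.
# Hypothetical toy records; in practice these would come from files, APIs, etc.
raw_records = [
    {"city": "Leeds", "area_m2": "50", "price_k": "150"},
    {"city": "leeds ", "area_m2": "80", "price_k": "200"},
    {"city": "York", "area_m2": "", "price_k": "180"},  # missing area
    {"city": "York", "area_m2": "60", "price_k": "240"},
    {"city": "York", "area_m2": "100", "price_k": "300"},
]


def clean(records):
    """Step 2: drop incomplete rows, normalise text, convert types."""
    cleaned = []
    for row in records:
        if not all(value.strip() for value in row.values()):
            continue  # skip rows with any empty field
        cleaned.append({
            "city": row["city"].strip().title(),
            "area_m2": float(row["area_m2"]),
            "price_k": float(row["price_k"]),
        })
    return cleaned


def add_features(records):
    """Step 3: feature extraction, here a derived price per square metre."""
    return [{**row, "price_per_m2": row["price_k"] / row["area_m2"]}
            for row in records]


def summarise(records):
    """Step 4: knowledge extraction, reduced to a mean per city."""
    groups = {}
    for row in records:
        groups.setdefault(row["city"], []).append(row["price_per_m2"])
    return {city: mean(values) for city, values in groups.items()}


def text_bar_chart(summary, scale=10):
    """Step 5: visualisation, as a crude text bar chart."""
    lines = []
    for city, value in sorted(summary.items()):
        lines.append(f"{city:<6} {'#' * round(value * scale)} {value:.2f}")
    return "\n".join(lines)


cleaned = clean(raw_records)
featured = add_features(cleaned)
summary = summarise(featured)
print(text_bar_chart(summary))
```

Even in this tiny example, the cleaning step is the longest function and the one that needed the most decisions (what counts as missing, how to normalise city names), which fits my feeling that the early stages take most of the effort.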