Are you interested in exploring data using Python? If so, take a look at this blog post of mine… where I talk about using Pandas Profiler and D-Tale to carry out data exploration.
Data exploration is a helpful step to:
- Detect erroneous data.
- Determine how much missing data there is.
- Understand the structure of the data.
- Identify important variables in the data.
- Sense-check the validity of the data.
I use the Mammographic Mass Data Set from the UCI Machine Learning Repository. Information about this dataset can be obtained here.
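To give a flavour of the checks listed above, here is a minimal sketch using plain pandas rather than the tools covered in the post. The rows are made up for illustration and are not real records, the column names are my own shorthand, and I am assuming that missing values are encoded as `?` and that BI-RADS scores outside 0 to 6 are data-entry errors.

```python
import io

import pandas as pd

# Hypothetical rows for illustration only (not taken from the real dataset).
# Assumption: missing values are encoded as "?".
SAMPLE_CSV = """bi_rads,age,shape,margin,density,severity
5,67,3,5,3,1
4,43,1,1,?,1
5,58,4,5,3,1
4,28,1,1,3,0
55,74,1,5,?,1
4,?,2,1,3,0
"""

# Treat "?" as missing while loading.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), na_values="?")

# Understand the structure of the data: size and column types.
n_rows, n_cols = df.shape
column_types = df.dtypes

# Determine how much missing data there is, per column.
missing_counts = df.isna().sum()
missing_share = df.isna().mean()

# Detect erroneous data: flag BI-RADS scores outside the assumed 0-6 range.
out_of_range = df[~df["bi_rads"].between(0, 6)]

# Sense-check validity: summary statistics make odd minima and maxima visible.
summary = df.describe()

# Identify potentially important variables: a crude first look at how
# strongly each column correlates with the target column.
corr_with_target = (
    df.corr()["severity"]
    .drop("severity")
    .sort_values(key=abs, ascending=False)
)

print(f"{n_rows} rows, {n_cols} columns")
print(missing_counts)
print(out_of_range)
```

Correlation with the target is only a rough first pass at variable importance, and with this few rows the numbers mean nothing. The point is the shape of the checks, which the profiling tools then automate.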
Read the full blog post in the Domino Data Blog here.