Happy Pi Day 2019
This is pure magic: Byrne’s Euclid by Nicholas Rougeux
THE FIRST SIX BOOKS OF THE ELEMENTS OF EUCLID WITH COLOURED DIAGRAMS AND SYMBOLS
A reproduction of Oliver Byrne’s celebrated work from 1847 plus interactive diagrams, cross references, and posters designed by Nicholas Rougeux
Decorate your walls with a colorful detailed poster of every geometric illustration from Oliver Byrne’s colorful 1847 edition of Euclid’s Elements.
No, sadly this is not a post about observing the Mpemba effect on beer. Instead it is about me reading new studies on the Mpemba effect – i.e. the effect whereby hot water can freeze faster than lukewarm or cool water – while enjoying a cold beer.
Call me a geek!
This is an exceptionally good answer to the question: “What do physicists wish the average person knew about physics?” The answer was written by Inna Vishik, Assistant Professor of Physics at the University of California, Davis.
- Physics makes predictive models about the natural world based on empirical observations (experiments), mathematics, and numerical simulations. These models are called ‘theories’, but this does not mean they are speculative; physics theories explain past behavior and predict future behavior. When a previously-validated theory fails to explain the behavior in a new physical system, it doesn’t mean the theory is suddenly ‘wrong’ altogether, it means that it is inapplicable in a certain regime. It is very exciting for physicists when these exceptions are found, and it is in these holes in our models that we propel our understanding of the physical world forward.
- The domain of physics is vast. Some physicists study the existing universe around us. Some study the smallest constituent particles and forces of matter in this universe. Some manipulate clusters of atoms, and some manipulate light. Some study crystalline solids and the myriad properties they can have when quadrillions of atoms and electrons are arranged in slightly different ways. Others study biological systems. This is not a full list of the many subfields in physics, but what they all have in common is they combine classical (including continuum) mechanics, quantum mechanics, statistical mechanics, general relativity, and electricity and magnetism in various configurations to explain the physical and engineered world around us.
- Research in physics and other fundamental sciences plays three crucial roles in an advanced society: it cements our cultural legacy by exploring one aspect of the human condition (the universe we occupy), similar to the role of the arts; it educates a portion of the work force in solving difficult, open-ended problems beyond the limits of prior human innovation; and it provides the seeds for future technological developments, which are often realized decades in the future in an unpredictable manner (i.e. not amenable to quarterly earnings reports). At the time of their inception, electromagnetic waves (late 19th century), quantum mechanics (early 20th century) and lasers (mid 20th century) were viewed even by their progenitors as esoteric curiosities; now they permeate our life, technology, and medicine so deeply that no one would question their practical importance. In the modern physics research era, there are newer ideas that might have an equally important impact 50 years from now, but they will never be realized without continued investment in the public good known as fundamental science.
–Dr J Rogel-Salazar
During the weekend a member of the team got in touch because he was unable to get a Python package working. He had just installed Python on his machine, but things were not quite right… For example, pip was not working and he had a bit of bother setting some environment variables… I recommended that he have a look at installing Python via the Anaconda distribution. Today he was up and running with his app.
Given that outcome, I thought it was a great coincidence that the latest episode of Talk Python To Me that started playing on my way back home happened to be about Conda and Conda-Forge. I highly recommend listening to it. Take a look:
Have you ever had trouble installing a package you wanted to use in your Python app? Likely it contained some odd dependency, required a compilation step, maybe even using an uncommon compiler like Fortran. Did you try it on Windows? How many times have you seen “Cannot find vcvarsall.bat” before you had to take a walk?
If this sounds familiar, you might want to check out conda, the package manager; Anaconda, the distribution; conda-forge; and conda-build. They dramatically lower the bar for installing packages on all platforms.
This week you’ll meet Phil Elson, Kale Franz, and Michael Sarahan who all work on various parts of this ecosystem.
Links from the show:
Anaconda distribution: continuum.io/anaconda-overview
I have been thinking of making a post about CRISP-DM… in the meantime here is one from Steph Locke.
The Cross Industry Standard Process for Data Mining (CRISP-DM) was a concept developed 20 years ago now. I’ve read about it in various data mining and related books and it’s come in very handy over the years. In this post, I’ll outline what the model is and why you should know about it, even if it has that terribly out of vogue phrase data mining in it!
Data / R people. Do you know what the CRISP-DM model is?
— Steph Locke (@SteffLocke) January 8, 2017
The model splits a data mining project into six phases and allows for going back and forth between different stages. I’d personally add a few more backwards arrows, but it’s generally fine. The CRISP-DM model applies equally well to a data science project.
CRISP-DM Process diagram by Kenneth Jensen (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
In Data Mining Techniques in CRM, a very readable book, the authors outline in Table 1.1 some typical activities within each phase:
The CRISP-DM process outlines the steps involved in performing data science activities from business need to deployment, and most importantly it indicates how iterative this process is and that you never get things perfectly right.
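The back-and-forth between phases can be sketched in a few lines of Python. The six phase names come from CRISP-DM itself; the backward-transition map and the function name are illustrative choices of mine, not part of the standard:

```python
# A minimal sketch of the six CRISP-DM phases and the iteration the
# model allows. The BACKWARD map mirrors the backward arrows drawn in
# the standard process diagram; it is illustrative, not canonical.

PHASES = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]

# Where a phase sends you back to when it surfaces problems.
BACKWARD = {
    "Data Understanding": "Business Understanding",
    "Modeling": "Data Preparation",
    "Evaluation": "Business Understanding",
}

def next_phase(current, success=True):
    """Advance to the next phase, or fall back when a phase surfaces problems."""
    if not success and current in BACKWARD:
        return BACKWARD[current]
    i = PHASES.index(current)
    return PHASES[min(i + 1, len(PHASES) - 1)]

print(next_phase("Modeling"))                   # Evaluation
print(next_phase("Evaluation", success=False))  # Business Understanding
```

Note how a failed evaluation loops all the way back to business understanding rather than just to modeling — that is exactly the "you never get things perfectly right" iteration the diagram encodes.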
We know that at the beginning of our first ever project we may not have much domain knowledge, there might be problems with the data, or the model might not be valuable enough to put into production. These things happen, and the really nice thing about the CRISP-DM model is that it allows for this. It’s not a single linear path from project kick-off to deployment. It helps you remember not to beat yourself up over having to go back a step. It also equips you with something upfront to explain to managers that sometimes you will need to bounce between phases, and that’s OK.
All models are wrong but some are useful (George Box)
We also know that our model is not going to be perfect. By the end of the project, our model’s value is already deteriorating! We get new customers, people change, the world changes, the business changes. Everything is conspiring against your model. This means it requires regular TLC to remain of value. We might just need to adjust it slightly for the latest view of the world (re-calibration), or we might need to take another tilt at modelling the problem. The big circle around the process shows this fact of a data scientist’s life.
Working from the expectation that we will be iterative, we can start planning cycles of work. These might start with a short, small, simple model cycle to get a basic model quickly. Then further iterations can develop stronger models. The business gets some immediate benefit and it can then continue getting additional benefit from further cycles, or people could be moved onto building the next quick and simple model.
This gives the business a better high-level view of where data scientists are adding value and it means if the company is evolving the processes and data engineering capabilities at the same time, then a broad range of simple models can be first developed and implemented, giving learning experiences for all involved.
Estimation of project work and scoping is often difficult for data science projects, and that does need to change. One thing we can do is take the CRISP-DM phases and typical activities and build checklists and process frameworks around them. We can start moving each “bespoke” activity into a “cookie-cutter” activity.
One simple way of doing this is to start with a checklist. I am a big fan of checklists, more so after reading The Checklist Manifesto. You can build a manual checklist for people to work through to make sure important tasks are completed, that considerations from past projects are addressed, and that ethical, regulatory, and legal concerns are considered at the right points in the development cycle.
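As a toy illustration of a per-phase checklist, here is a small Python sketch. The phase names are CRISP-DM’s; the individual checklist items and the function name are invented examples, not from any standard:

```python
# Illustrative per-phase checklists for a data science project.
# Phase names follow CRISP-DM; the items themselves are examples only.

checklists = {
    "Business Understanding": [
        "Problem statement agreed with stakeholders",
        "Success criteria defined",
    ],
    "Data Understanding": [
        "Data sources catalogued",
        "Personal data / regulatory constraints reviewed",
    ],
    "Evaluation": [
        "Model compared against baseline",
        "Ethical and legal sign-off obtained",
    ],
}

def outstanding(phase, done):
    """Return the checklist items for a phase that are not yet ticked off."""
    return [item for item in checklists.get(phase, []) if item not in done]

print(outstanding("Evaluation", {"Model compared against baseline"}))
```

Even something this simple makes the “cookie-cutter” activities visible: each completed project adds items to the lists, so the next project starts from accumulated experience rather than a blank page.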
The Microsoft Team Data Science Process is a developing framework that broadly follows the CRISP-DM model and is bringing in templates and tools to help data scientists. It’s proving quite interesting and I would recommend it as follow up reading.
I read a lot of productivity, project management, and framework books. I’m always interested in how we can do our jobs better. Usually, this boils down to making things simpler and helping ensure we do the right things at the right time. The CRISP-DM is one simple thing that has helped me put that structure onto what often seems a chaotic process. I hope it could offer you some benefit and I’d be really interested to hear your thoughts, experiences, and tips for building better data science workflows.