A collection of posts related to my upcoming book “Data Science and Analytics with Python”. Take a look and enjoy.
Data science is definitely on everyone’s lips, and this time I had the opportunity to showcase some of my thoughts, practices and interests at the Open Data Science Conference in London.
The event was very well attended by data scientists, engineers and developers at all levels of seniority, as well as by business stakeholders. I presented the landscape that newcomers and seasoned practitioners alike must be familiar with to make a successful transition into this exciting field.
It was also a great chance to showcase “Data Science and Analytics with Python” and to meet new people, including some who know other members of my family.
Earlier this week I received this picture of the team in New York. As you can see, they have all recently received a copy of my "Data Science and Analytics with Python" book.
Another copy of "Data Science and Analytics with Python" delivered! Thanks for sharing the picture, Dave Groves.
I’m very pleased to see that my “Data Science and Analytics with Python” book is arriving in the hands of readers.
Here’s a picture that my colleague and friend Rob Hickling sent earlier today:
"Data Science and Analytics with Python" was published yesterday and now it is already appearing as a suggested book for related titles.
You can find it via the link above or on Amazon here.
I am very pleased to see that my "Data Science and Analytics with Python" book has finally been published.
It has been a long road, one filled with unicorns and Jackalopes, decision trees and random forests, variance and bias, cats and dogs, and targets and features.
Well over a year ago, the idea of writing another book seemed like a far-fetched proposition. The book grew out of the work I have been doing in the area, as well as from discussions with my colleagues and students, and with practitioners and beneficiaries of data science and analytics.
It is my sincere hope that the book is useful to those coming afresh to this new field as well as to more seasoned data scientists.
This afternoon I had the pleasure of approving the final version of the book that will be sent to the printers in the next few days.
Well, I am very pleased to show you the cover that will be used for my "Data Science and Analytics with Python" book. Not long to publication day!
I have now received the proofreader's comments and corrections for my “Data Science and Analytics with Python” book.
Two weeks and counting to send my corrections and comments back to the editor and project manager.
Over the weekend, a member of the team got in touch because he could not get a Python package working. He had just installed Python on his machine, but things were not quite right: for example, pip was not working and he had a bit of bother setting some environment variables. I recommended that he install Python via the Anaconda distribution instead. Today he was up and running with his app.
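One common reason for pip "not working" after a fresh install is that the `pip` command found on the PATH belongs to a different Python installation from the `python` being run, or is not on the PATH at all. The sketch below is not from the original post; it is a minimal, standard-library-only check with a hypothetical function name (`describe_python_environment`) that reports which interpreter is running and which `pip` and `conda` executables the shell would pick up.

```python
import importlib.util
import platform
import shutil
import sys


def describe_python_environment() -> dict:
    """Summarise which interpreter is running and which package tools are visible.

    Hypothetical troubleshooting helper; not part of Anaconda, conda or pip.
    """
    return {
        # Full path of the interpreter executing this script
        "python_executable": sys.executable,
        "python_version": platform.python_version(),
        # True if this interpreter can run `python -m pip`
        "pip_module_available": importlib.util.find_spec("pip") is not None,
        # shutil.which returns None when the command is not found on PATH
        "pip_on_path": shutil.which("pip"),
        "conda_on_path": shutil.which("conda"),
    }


if __name__ == "__main__":
    for key, value in describe_python_environment().items():
        print(f"{key}: {value}")
```

If `pip_on_path` is `None` (or points somewhere unexpected) while `pip_module_available` is `True`, running pip as `python -m pip install <package>` uses the pip that belongs to the interpreter you are actually running.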
Given that outcome, I thought it was a great coincidence that the latest episode of Talk Python To Me, which started playing on my way back home, happened to be about Conda and Conda-Forge. I highly recommend listening to it. Take a look:
Talk Python To Me - Python conversations for passionate developers - #94 Guaranteed packages via Conda and Conda-Forge
Have you ever had trouble installing a package you wanted to use in your Python app? Likely it contained some odd dependency, required a compilation step, maybe even using an uncommon compiler like Fortran. Did you try it on Windows? How many times have you seen "Cannot find vcvarsall.bat" before you had to take a walk?
If this sounds familiar, you might want to check out conda (the package manager), Anaconda (the distribution), conda-forge and conda-build. They dramatically lower the bar for installing packages on all platforms.
This week you'll meet Phil Elson, Kale Franz, and Michael Sarahan who all work on various parts of this ecosystem.
Links from the show:
Anaconda distribution: continuum.io/anaconda-overview
I am very pleased to tell you about some news I received a couple of weeks ago from my editor: my book "Data Science and Analytics with Python" has been transferred to the production department so that they can begin the publication process!
The book has been assigned a Project Editor who will oversee the proofreading and all other aspects of the production process. This comes after clearing the review process I told you about some time ago. The review was lengthy but very positive, and the reviewers' comments have definitely improved the manuscript.
As a result of the review, the table of contents has changed a bit since the last update I posted. Here is the revised table:
- The Trials and Tribulations of a Data Scientist
- Python: For Something Completely Different!
- The Machine that Goes “Ping”: Machine Learning and Pattern Recognition
- The Relationship Conundrum: Regression
- Jackalopes and Hares: Clustering
- Unicorns and Horses: Classification
- Decisions, Decisions: Hierarchical Clustering, Decision Trees and Ensemble Techniques
- Less is More: Dimensionality Reduction
- Kernel Trick Under the Sleeve: Support Vector Machines
Each of the chapters is intended to be sufficiently self-contained. On some occasions a reference to other sections is needed, and I am confident that this is a good thing for the reader. Chapter 1 is effectively a discussion of what data science and analytics are, paying particular attention to the data exploration process and munging. It also offers my perspective on the skills and roles required to build a successful data science function.
Chapter 2 is a quick reminder of some of the most important features of Python. Chapter 3 then moves into the core machine learning concepts that are used in the rest of the book. Chapter 4 covers regression, from ordinary least squares to LASSO and ridge regression. Chapter 5 covers clustering (k-means, for example) and Chapter 6 covers classification algorithms such as logistic regression and Naïve Bayes.
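To give a flavour of the progression from ordinary least squares to ridge and LASSO regression, here is a minimal sketch; it is not code from the book. It assumes a single predictor, an unpenalised intercept, and objectives of the form "sum of squared errors plus `lam * b**2`" (ridge) or "plus `lam * abs(b)`" (LASSO); the function `fit_line` is hypothetical. Under those assumptions each fit has a closed form: OLS gives the slope Sxy/Sxx, ridge shrinks it to Sxy/(Sxx + lam), and LASSO soft-thresholds Sxy by lam/2, so a large enough penalty sets the slope exactly to zero.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float],
             penalty: str = "ols", lam: float = 0.0) -> Tuple[float, float]:
    """Fit y ~ a + b*x with one predictor; returns (intercept, slope).

    Objectives assumed in this sketch (intercept a is never penalised):
      ols:   sum((y - a - b*x)**2)
      ridge: sum((y - a - b*x)**2) + lam * b**2
      lasso: sum((y - a - b*x)**2) + lam * abs(b)
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centred sum of squares and cross-product
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("x must not be constant")

    if penalty == "ols":
        slope = s_xy / s_xx
    elif penalty == "ridge":
        # Zero derivative of the ridge objective gives b = Sxy / (Sxx + lam)
        slope = s_xy / (s_xx + lam)
    elif penalty == "lasso":
        # Soft-thresholding: shrink |Sxy| by lam/2 and clip at zero
        shrunk = max(abs(s_xy) - lam / 2.0, 0.0)
        slope = (shrunk if s_xy >= 0 else -shrunk) / s_xx
    else:
        raise ValueError(f"unknown penalty: {penalty!r}")

    # With the intercept unpenalised, the fitted line passes through the means
    intercept = y_bar - slope * x_bar
    return intercept, slope
```

For example, with `x = [1, 2, 3, 4]` and `y = [2, 4, 6, 8]`, OLS recovers a slope of 2, ridge with `lam=5` shrinks it to 1, and LASSO with `lam=20` drives it to exactly 0. That ability to zero out coefficients is why LASSO is often used for feature selection, whereas ridge only shrinks them. With several predictors LASSO has no such simple formula, and iterative solvers such as coordinate descent are used instead.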
In Chapter 7 we introduce hierarchical clustering and decision trees, and discuss ensemble techniques such as bagging and boosting.
Dimensionality reduction techniques such as Principal Component Analysis are discussed in Chapter 8, and Chapter 9 covers the support vector machine algorithm and the all-important kernel trick in applications such as regression and classification.
The book contains 55 figures and 18 tables, plus plenty of bits and pieces of Python code to play with.
I guess I will have to sit and wait for the proofreading to be completed and then start the arduous process of going through the comments and suggestions. As ever, I will keep you posted on how things go.
Ah! By the way, I will start a mailing list to tell people when the book is ready, so if you are interested, please let me know!
Keep in touch!
PS. The table of contents is also now available at CRC Press here.
A few weeks ago I was invited by General Assembly to give a short intro to Data Science to a group of interested (and interesting) students. They all had different backgrounds, but they all shared an interest in technology and related subjects.
While I was explaining some of the differences between supervised and unsupervised machine learning, I used my example of an alien life form trying to cluster (and eventually classify) cats and dogs. If you are interested in knowing more about this, you will probably have to wait for the publication of my "Data Science and Analytics with Python" book... but I digress.
So, Ed Shipley, one of the admissions managers at GA London, asked me and the students whether we had seen the videos that Facebook had produced to explain machine learning. He was reminded of them because they use the example of a machine distinguishing between dogs and cars (see what they did there?). If you haven't seen the videos, here you go:
Intro to AI
Convolutional Neural Nets
Yesterday I had the pleasure of giving a community talk at Campus London as part of the events organised by General Assembly London. The place was packed, and I was pleased to see how engaged the audience was: they asked questions and made comments and great remarks.
As expected, the audience was quite varied, from students interested in breaking into the field to seasoned analysts and startup entrepreneurs. The questions were all very pertinent and I hope the answers were useful to all of them.
The talk was effectively an introduction to what data science is, the tools used, and the opportunities and challenges in the field. You can find a handout of the slides here.