Data Science and Analytics with Python

A collection of posts related to my upcoming book “Data Science and Analytics with Python”. Take a look and enjoy.

Data Science and Analytics with Python - Cover

Well, I am very pleased to show you the cover that will be used for my "Data Science and Analytics with Python" book. Not long to publication day!


Data Science and Analytics with Python - Proofread Manuscript

I have now received the comments and corrections from the proofreading of my “Data Science and Analytics with Python” book.

I have two weeks and counting to return my corrections and comments to the editor and project manager.

 


Anaconda - Guaranteed Python packages via Conda and Conda-Forge

Over the weekend a member of the team got in touch because he was unable to get a Python package working. He had just installed Python on his machine, but things were not quite right... For example, pip was not working and he had a bit of bother setting some environment variables... I recommended that he have a look at installing Python via the Anaconda distribution. Today he was up and running with his app.
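For anyone in a similar spot, a quick look at what the interpreter itself reports can help narrow things down before reinstalling anything. The following is only a minimal sketch using the standard library; the function name `describe_python_setup` is made up for illustration, and the checks are my own guess at what is useful rather than an official diagnostic.

```python
import importlib.util
import os
import shutil
import sys


def describe_python_setup():
    """Collect a few facts that help diagnose a misbehaving Python install."""
    return {
        # Full path of the interpreter that is actually running this script
        "interpreter": sys.executable,
        # Version string of that interpreter, e.g. "3.11.4"
        "version": sys.version.split()[0],
        # True if the pip package can be located by this interpreter
        "pip_importable": importlib.util.find_spec("pip") is not None,
        # Path of a pip executable found on PATH, or None if there is none
        "pip_on_path": shutil.which("pip"),
        # Set by conda when an environment is active; None otherwise
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_python_setup().items():
        print(f"{key}: {value}")
```

If `pip_importable` is true but `pip_on_path` is empty, the problem is likely to be the PATH environment variable rather than the installation itself.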

Given that outcome, I thought it was a great coincidence that the latest episode of Talk Python To Me that started playing on my way back home happened to be about Conda and Conda-Forge. I highly recommend listening to it. Take a look:

Talk Python To Me - Python conversations for passionate developers - #94 Guarenteed packages via Conda and Conda-Forge

Have you ever had trouble installing a package you wanted to use in your Python app? Likely it contained some odd dependency, required a compilation step, maybe even using an uncommon compiler like Fortran. Did you try it on Windows? How many times have you seen "Cannot find vcvarsall.bat" before you had to take a walk?

If this sounds familiar, you might want to check out conda, the package manager; Anaconda, the distribution; conda-forge; and conda-build. They dramatically lower the bar for installing packages on all the platforms.

This week you'll meet Phil Elson, Kale Franz, and Michael Sarahan who all work on various parts of this ecosystem.

Links from the show:

conda: conda.pydata.org
conda-build: conda.pydata.org/docs/commands/build/conda-build.html
Anaconda distribution: continuum.io/anaconda-overview
conda-forge: conda-forge.github.io

Phil Elson on Twitter: @pypelson
Kale Franz: @kalefranz
Michael Sarahan: github.com/msarahan


"Data Science and Analytics with Python" enters production


I am very pleased to tell you about some news I received a couple of weeks ago from my editor: my book "Data Science and Analytics with Python" has been transferred to the production department so that they can begin the publication process!

The book has been assigned a Project Editor who will handle the proofreading and all other aspects of the production process. This happened after clearing the review process I told you about some time ago. The review was lengthy but very positive, and the reviewers' comments have definitely improved the manuscript.

As a result of the review, the table of contents has changed a bit since the last update I posted. Here is the revised table:

  1. The Trials and Tribulations of a Data Scientist
  2. Python: For Something Completely Different!
  3. The Machine that Goes “Ping”: Machine Learning and Pattern Recognition
  4. The Relationship Conundrum: Regression
  5. Jackalopes and Hares: Clustering
  6. Unicorns and Horses: Classification
  7. Decisions, Decisions: Hierarchical Clustering, Decision Trees and Ensemble Techniques
  8. Less is More: Dimensionality Reduction
  9. Kernel Trick Under the Sleeve: Support Vector Machines

Each of the chapters is intended to be sufficiently self-contained. There are some occasions where reference to other sections is needed, and I am confident that it is a good thing for the reader. Chapter 1 is effectively a discussion of what data science and analytics are, paying particular attention to the data exploration process and munging. It also offers my perspective as to what skills and roles are required to build a successful data science function.

Chapter 2 is a quick reminder of some of the most important features of Python. In Chapter 3 we then move into the core machine learning concepts that are used in the rest of the book. Chapter 4 covers regression, from ordinary least squares to LASSO and ridge regression. Chapter 5 covers clustering (k-means, for example) and Chapter 6 covers classification algorithms such as Logistic Regression and Naïve Bayes.

In Chapter 7 we introduce hierarchical clustering and decision trees, and talk about ensemble techniques such as bagging and boosting.

Dimensionality reduction techniques such as Principal Component Analysis are discussed in Chapter 8, and Chapter 9 covers the support vector machine algorithm and the all-important kernel trick in applications such as regression and classification.

The book contains 55 figures and 18 tables, plus plenty of bits and pieces of Python code to play with.

I guess I will have to sit and wait for the proofreading to be completed and then start the arduous process of going through the comments and suggestions. As ever, I will keep you posted as to how things go.

Ah! By the way, I will start a mailing list to tell people when the book is ready, so if you are interested, please let me know!

Keep in touch!

PS. The table of contents is also now available at CRC Press here.


Artificial Intelligence, Revealed

A few weeks ago I was invited by General Assembly to give a short intro to Data Science to a group of interested (and interesting) students. They all had different backgrounds, but they all shared an interest in technology and related subjects.

While I was explaining some of the differences between supervised and unsupervised machine learning, I used my example of an alien life form trying to cluster (and eventually classify) cats and dogs. If you are interested in knowing more about this, you will probably have to wait for the publication of my "Data Science and Analytics with Python" book... I digress...

So, Ed Shipley - one of the admissions managers at GA London - asked me and the students if we had seen the videos that Facebook had produced to explain machine learning... He was reminded of them as they use an example about a machine distinguishing between dogs and cars... (see what they did there?...). If you haven't seen the videos, here you go:

Intro to AI

Machine Learning

Convolutional Neural Nets


Intro to Data Science Talk

Yesterday I had the pleasure of giving a community talk at Campus London as part of the events organised by General Assembly London. The place was packed and I was quite pleased to see that the audience was very engaged: they asked questions and made great comments and remarks.

As expected, the audience was quite varied, from students interested in breaking into the field to seasoned analysts and startup entrepreneurs. The questions were all very pertinent and I hope that the answers provided were useful to all of them.

The talk was effectively an introduction to what data science is, the tools used, and the opportunities and challenges in the field. You can find a handout of the slides here.

Download (PDF, 2.34MB)

 

 

 


First full draft of "Data Science and Analytics with Python"

It has been nearly 12 months in development, almost to the day, and I am very pleased to tell you that the first full draft of my new book entitled "Data Science and Analytics with Python" is ready.


The book is aimed at data enthusiasts and professionals with some knowledge of programming principles, as well as developers and business people interested in learning more about data science and analytics. The proposed table of contents is as follows:

  1. The Trials and Tribulations of a Data Scientist
  2. First Slithers with Python
  3. The Machine that Goes “Ping”: Machine Learning and Pattern Recognition
  4. The Relationship Conundrum: Regression
  5. Jackalopes and Hares, Unicorns and Horses: Clustering and Classification
  6. Decisions, Decisions: Hierarchical Clustering, Decision Trees and Ensemble Techniques
  7. Dimensionality Reduction and Support Vector Machines

At the moment the book contains 53 figures and 18 tables, plus plenty of bits and pieces of code ready to be tried.

The next step is to start the re-reading, re-drafting and revisions in preparation for the final version and submission to my publisher CRC Press later in the year. I will keep you posted as to how things go.

Keep in touch!

 


Data Science Bootcamp - Done

Today I had the opportunity to run a #DataScience bootcamp in London. It was an all-day affair and although the attendees were engaged, I’m sure that by the end of the 6th hour they were quite tired.
The discussions ranged from what data science is to the skills required to become a data scientist, and also to manage data scientists. Finally we implemented some data analyses based on linear regression, all using R. I was very pleased to see some of the results.


iPython Notebook is now Jupyter... I knew it!

It is not really news... Jupyter is the new name of the much-loved iPython project, and it has been for a while. As the Jupyter project puts it themselves:

The language-agnostic parts of IPython are getting a new home in Project Jupyter

As announced in the python.org page, as of version 4.0 the Big Split from the old iPython starts. I knew this and I even tweeted about it:

https://twitter.com/quantum_tunnel/status/631570806607319040

All great, right? Well, I was still surprised when, after updating my Python installation, I tried to start my ipython notebook and got an error that ended with:

File "importstring.py", line 31, in import_item
module = __import__(package, fromlist=[obj])
ImportError: No module named notebook.notebookapp

Then I remembered, and to fix my problem I simply tried installing Jupyter (*I am using Anaconda) with the following command:

conda install jupyter

Et voilà!
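If you want to check whether you are about to hit the same error, a small test along these lines should do. It is a minimal sketch: the helper name `module_available` is made up for illustration, and the module names are simply the ones that appear in the traceback above, so they may differ in other versions of the notebook.

```python
import importlib.util


def module_available(name):
    """Return True if the module `name` can be located by this interpreter."""
    try:
        # For a dotted name, find_spec imports the parent package first
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when the parent package (e.g. "notebook") is itself missing
        return False


if __name__ == "__main__":
    for name in ("IPython", "notebook", "notebook.notebookapp"):
        status = "found" if module_available(name) else "missing"
        print(f"{name}: {status}")
```

In my case the notebook entries would have come back as missing before running the conda command.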


Data Science and Analytics with Python

Well, it is not a surprise anymore that I am currently working on writing a second book. First time round it was a book about the use of Matlab and its counterpart Octave in the area of simulations, suitable for students of physics, mathematics, biology, economics and engineering. It was a very good experience, and it seems that I enjoyed it so much that I am embarking on another project.

This time round it is a book geared towards more seasoned programmers, developers and business people who are interested in learning more about data science and analytics. The language of choice this time round is Python, and why not? It seems to be a popular choice and goes well with other activities I have recently been involved with.

I was, once again, pleasantly surprised that my publisher has already created an entry on their site to advertise the book. The date for delivery is currently utterly wrong, but please do keep an eye on it and I shall try to update you as to how the writing goes.


 
