I am very pleased to tell you about some news I received a couple of weeks ago from my editor: my book “Data Science and Analytics with Python” has been transferred to the production department so that they can begin the publication process!

UPDATE: The book is available here.

The book has been assigned a Project Editor who will oversee the proofreading and all other aspects of the production process. This comes after the manuscript cleared the review process I told you about some time ago. The review was lengthy, but it was very positive, and the reviewers' comments have definitely improved the manuscript.

As a result of the review, the table of contents has changed a bit since the last update I posted. Here is the revised table:

- The Trials and Tribulations of a Data Scientist
- Python: For Something Completely Different!
- The Machine that Goes “Ping”: Machine Learning and Pattern Recognition
- The Relationship Conundrum: Regression
- Jackalopes and Hares: Clustering
- Unicorns and Horses: Classification
- Decisions, Decisions: Hierarchical Clustering, Decision Trees and Ensemble Techniques
- Less is More: Dimensionality Reduction
- Kernel Trick Under the Sleeve: Support Vector Machines

Each chapter is intended to be largely self-contained. There are occasions where a reference to other sections is needed, and I am confident that this cross-referencing is a good thing for the reader. Chapter 1 is effectively a discussion of what data science and analytics are, paying particular attention to the data exploration and munging process. It also offers my perspective on the skills and roles required to build a successful data science function.

Chapter 2 is a quick reminder of some of Python's most important features, and Chapter 3 introduces the core machine learning concepts used in the rest of the book. Chapter 4 covers regression, from ordinary least squares to LASSO and ridge regression. Chapter 5 covers clustering (k-means, for example) and Chapter 6 classification algorithms such as logistic regression and Naïve Bayes.
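To give a small taste of the kind of code the book plays with, here is a minimal NumPy sketch of ridge regression via its closed-form solution, w = (XᵀX + αI)⁻¹Xᵀy. This is my own illustration, not an excerpt from the book; the synthetic data and the regularisation value are assumptions.

```python
import numpy as np

# Synthetic regression problem with known coefficients
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

# Ridge regression closed form: w = (X^T X + alpha * I)^{-1} X^T y
alpha = 0.1  # regularisation strength (illustrative choice)
n_features = X.shape[1]
w = np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)
print(np.round(w, 2))  # close to the true coefficients
```

With a small `alpha` and clean data, the estimate lands very near `true_w`; cranking `alpha` up shrinks the coefficients towards zero, which is the whole point of the penalty.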

In Chapter 7 we introduce hierarchical clustering and decision trees, and discuss ensemble techniques such as bagging and boosting.
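The bagging idea can be sketched in a few lines of NumPy: train each base learner on a bootstrap resample of the data and average their predictions. This is my own toy illustration with a (deliberately weak) straight-line base learner, not code from the book.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.2, size=200)

def fit_line(Xb, yb):
    # Ordinary least squares with an intercept as the base learner
    A = np.column_stack([np.ones(len(Xb)), Xb[:, 0]])
    coef, *_ = np.linalg.lstsq(A, yb, rcond=None)
    return coef

# Bagging: fit each base learner on a bootstrap sample (drawn with replacement)
n_models = 25
coefs = []
for _ in range(n_models):
    idx = rng.integers(0, len(X), size=len(X))
    coefs.append(fit_line(X[idx], y[idx]))

def predict(x):
    # Average the predictions of all base learners
    preds = [c[0] + c[1] * x for c in coefs]
    return np.mean(preds, axis=0)

print(predict(np.array([0.0, 0.5])))
```

Averaging over bootstrap resamples reduces the variance of the base learner; in practice the base learners are usually decision trees rather than straight lines, which is what leads to random forests.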

Dimensionality reduction techniques such as principal component analysis are discussed in Chapter 8, and Chapter 9 covers the support vector machine algorithm and the all-important kernel trick, with applications to regression and classification.
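As a final teaser, principal component analysis can be obtained directly from the singular value decomposition of the centred data matrix. Again, this is my own minimal NumPy sketch on made-up correlated data, not material from the book.

```python
import numpy as np

# Strongly correlated 2-D data: most of the variance lies along one direction
rng = np.random.default_rng(1)
t = rng.normal(size=300)
X = np.column_stack([t, 0.5 * t + rng.normal(scale=0.1, size=300)])

# PCA via the SVD of the centred data matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)   # fraction of variance per component
X_reduced = Xc @ Vt[0]            # projection onto the first principal component

print(explained)  # the first component captures nearly all the variance
```

Because the second coordinate is essentially a scaled copy of the first plus a little noise, one principal component is enough to describe the data, which is exactly the situation dimensionality reduction exploits.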

The book contains 55 figures and 18 tables, plus plenty of bits and pieces of Python code to play with.

I guess I will have to sit and wait for the proofreading to be completed, and then start the arduous process of going through the comments and suggestions. As ever, I will keep you posted on how things go.

**Ah! By the way, I will start a mailing list to tell people when the book is ready, so if you are interested, please let me know!**

Keep in touch!

PS: The table of contents is also available at CRC Press here.