Advanced Data Science and Analytics with Python – Submitted!

There you go, the first checkpoint is completed: I have officially submitted the completed version of “Advanced Data Science and Analytics with Python”.

The book has been some time in the making (and in the thinking…). It is a follow up from my previous book, imaginatively called “Data Science and Analytics with Python” . The book covers aspects that were necessarily left out in the previous volume; however, the readers in mind are still technical people interested in moving into the data science and analytics world. I have tried to keep the same tone as in the first book, peppering the pages with some bits and bobs of popular culture, science fiction and indeed Monty Python puns. 

Advanced Data Science and Analytics with Python enables data scientists to continue developing their skills and apply them in business as well as academic settings. The subjects discussed in this book are complementary and a follow up from the topics discuss in Data Science and Analytics with Python. The aim is to cover important advanced areas in data science using tools developed in Python such as SciKit-learn, Pandas, Numpy, Beautiful Soup, NLTK, NetworkX and others. The development is also supported by the use of frameworks such as Keras, TensorFlow and Core ML, as well as Swift for the development of iOS and MacOS applications.

The book can be read independently form the previous volume and each of the chapters in this volume is sufficiently independent from the others proving flexibiity for the reader. Each of the topics adressed in the book tackles the data science workflow from a practical perspective, concentrating on the process and results obtained. The implementation and deployment of trained models are central to the book

Time series analysis, natural language processing, topic modelling, social network analysis, neural networds and deep learning are comprehensively covrered in the book. The book discusses the need to develop data products and tackles the subject of bringing models to their intended audiences. In this case literally to the users fingertips in the form of an iPhone app.

While the book is still in the oven, you may want to take a look at the first volume. You can get your copy here:

Furthermore you can see my Author profile here.

ODSC Europe 2019

It was a pleasure to come to the opening day of ODSC Europe 2019. This time round I was the first speaker of the first session, and it was very apt as the talk was effectively an introduction to Data Science.

The next 4 days will be very hectic for the attendees and it the quality is similar to the previous editions we are going to have a great time.

Natural Language Processing – Talk

Last October I had the great opportunity to come and give a talk at the Facultad de Ciencias Políticas, UAEM, México. The main audience were students of the qualitative analysis methods course, but there were people also from informatics and systems engineering.

It was an opportunity to showcase some of the advances that natural language processing offers to social scientists interested in analysing discourse, from politics through to social interactions.

The talk covered a introduction and brief history of the field. We went through the different stages of the analysis, from reading the data, obtaining tokens and labelling their part of speech (POS) and then looking at syntactic and semantic analysis.

We finished the session with a couple of demos. One looking at speeches of Clinton and Trump during their presidential campaigns; the other one was a simple analysis of a novel in Spanish.

Thanks for the invite.

“Advanced Data Science And Analytics” is finished!

It has been a few months of writing, testing, re-writing and starting again, and I am pleased to say that the first complete draft of “Advanced Data Science and Analytics with Python” is ready. Last chapter is done and starting revisions now. Yay!

Adding new conda environment kernel to Jupyter and nteract

I know there are a ton of posts out there covering this very topic. I am writing this post more for my out benefit, so that I have a reliable place to check the commands I need to add a new conda environment to my Jupyter and nteract IDEs.

First to create an environment that contains, say TensorFlow, Pillow, Keras and pandas we need to type the following in the command line:

$ conda create -n tensorflow_env tensorflow pillow keras pandas jupyter ipykernel nb_conda

Now, to add this to the list of available environments in either Jupyter or nteract, we type the following:

$ conda activate tensor_env

$ python -m ipykernel install --name tensorflow_env


$ conda deactivate


Et voilà, you should now see the environment in the dropdown menu!

Data Science and Analytics with Python – Social Network Analysis

Using the time wisely during the Bank Holiday weekend. As my dad would say, “resting while making bricks”… Currently reviewing/editing/correcting Chapter 3 of “Advanced Data Science and Analytics with Python”. Yes, that is volume 2 of “Data Science and Analytics with Python“.

NSA_jrs.jpg

ODSC – Introduction to Data Science – A Practical Viewpoint

Very pleased to have the opportunity to share some thoughts with the keen audience attending the ODSC Europe 2018

My talk is not a technical presentation, as many of the other ones in the conference have been. Instead I wanted to present a workshop-style session that gives us the opportunity to interact with each other, share experiences and learn best practice in data science. The audience in mind is varied, from newcomers to the field to experienced practitioners.  You can find a handout of the slides in the link below:

Download (PDF, Unknown)

nteract – a great Notebook experience

I am a supporter of using Jupyter Notebooks for data exploration and code prototyping. It is a great way to start writing code and immediately get interactive feedback. Not only can you document your code there using markdown, but also you can embed images, plots, links and bring your work to life.

Nonetheless, there are some little annoyances that I have, for instance the fact that I need to launch a Kernel to open a file and having to do that “the long way” – i.e. I cannot double-click on the file that I am interested in seeing. Some ways to overcome this include looking at Gihub versions of my code as the notebooks are rendered automatically, or even saving HTML or PDF versions of the notebooks. I am sure some of you may have similar solutions for this.

Last week, while looking for entries on something completely different, I stumbled upon a post that suggested using nteract. It sounded promising and I took a look. It turned out to be related to the Hydrogen package available for Atom, something I have used in the past and loved it. nteract was different though as it offered a desktop version and other goodies such as in-app support for publishing, a terminal-free experience sticky cells, input and output hiding… Bring it on!

I just started using it, and so far so good. You may want to give it a try, and maybe even contribute to the git repo.

nteract_screenshot.jpg