Random thoughts about random subjects… From science to literature and between manga and watercolours, passing by data science and rugby; including film, physics and fiction, programming, pictures and puns.
The new book aims to present the reader with concepts in data science and analytics that were deemed to be more advanced or simply out of scope in the author’s first book, and are used in data analytics using tools developed in Python such as SciKit Learn, Pandas, Numpy, etc. The use of Python is of particular benefit given its recent popularity in the data science community. The book is therefore a reference to be used by seasoned programmers and newcomers alike and the key benefit is the practical approach presented throughout the book
Super excited to have received the proofread version of Advanced Data Science and Analytics with Python. They all seem to be very straightforward corrections: a few missing commas, some italics here and there and capitalisation bits and bobs.
I hope to be able to finish the corrections before my deadline for March 25th, and then enter the last phase before publication in May 2020.
If you are interested in #DataScience you surely have heard of #pandas and you would be pleased to hear that version 1.0 finally out. With better integration with bumpy and improvements with numba among others. Take a look!
— Read on www.anaconda.com/pandas-1-0-is-here/
There you go, the first checkpoint is completed: I have officially submitted the completed version of “Advanced Data Science and Analytics with Python”.
The book has been some time in the making (and in the thinking…). It is a follow up from my previous book, imaginatively called “Data Science and Analytics with Python” . The book covers aspects that were necessarily left out in the previous volume; however, the readers in mind are still technical people interested in moving into the data science and analytics world. I have tried to keep the same tone as in the first book, peppering the pages with some bits and bobs of popular culture, science fiction and indeed Monty Python puns.
Advanced Data Science and Analytics with Python enables data scientists to continue developing their skills and apply them in business as well as academic settings. The subjects discussed in this book are complementary and a follow up from the topics discuss in Data Science and Analytics with Python. The aim is to cover important advanced areas in data science using tools developed in Python such as SciKit-learn, Pandas, Numpy, Beautiful Soup, NLTK, NetworkX and others. The development is also supported by the use of frameworks such as Keras, TensorFlow and Core ML, as well as Swift for the development of iOS and MacOS applications.
The book can be read independently form the previous volume and each of the chapters in this volume is sufficiently independent from the others proving flexibiity for the reader. Each of the topics adressed in the book tackles the data science workflow from a practical perspective, concentrating on the process and results obtained. The implementation and deployment of trained models are central to the book
Time series analysis, natural language processing, topic modelling, social network analysis, neural networds and deep learning are comprehensively covrered in the book. The book discusses the need to develop data products and tackles the subject of bringing models to their intended audiences. In this case literally to the users fingertips in the form of an iPhone app.
While the book is still in the oven, you may want to take a look at the first volume. You can get your copy here:
Last October I had the great opportunity to come and give a talk at the Facultad de Ciencias Políticas, UAEM, México. The main audience were students of the qualitative analysis methods course, but there were people also from informatics and systems engineering.
It was an opportunity to showcase some of the advances that natural language processing offers to social scientists interested in analysing discourse, from politics through to social interactions.
The talk covered a introduction and brief history of the field. We went through the different stages of the analysis, from reading the data, obtaining tokens and labelling their part of speech (POS) and then looking at syntactic and semantic analysis.
We finished the session with a couple of demos. One looking at speeches of Clinton and Trump during their presidential campaigns; the other one was a simple analysis of a novel in Spanish.
I know there are a ton of posts out there covering this very topic. I am writing this post more for my out benefit, so that I have a reliable place to check the commands I need to add a new conda environment to my Jupyter and nteract IDEs.
First to create an environment that contains, say TensorFlow, Pillow, Keras and pandas we need to type the following in the command line:
Using the time wisely during the Bank Holiday weekend. As my dad would say, “resting while making bricks”… Currently reviewing/editing/correcting Chapter 3 of “Advanced Data Science and Analytics with Python”. Yes, that is volume 2 of “Data Science and Analytics with Python“.
On my way back to London and making the most of the time in the train to work on my Data Science and Analytics Vol 2 book. Working with #StarWars data to explain Social Network Analysis #datascience #geek
Working with dates and times in programming can be a painful test at times. In Python, there are some excellent libraries that help with all the pain, and recently I became aware of Pendulum. It is effectively are replacement for the standard datetime class and it has a number of improvements. Check out the documentation for further information.
Installation of the packages is straightforward with pip:
$ pip install pendulum
For example, some simple manipulations involving time zones:
now = pendulum.now('Europe/Paris')
# Changing timezone
# Default support for common datetime formats
Duration can be used as a replacement for the standard timedelta class:
It also supports the definition of a period, i.e. a duration that is aware of the DateTime instances that created it. For example:
dt1 = pendulum.now()
dt2 = dt1.add(days=3)
# A period is the difference between 2 instances
period = dt2 - dt1
# A period is iterable
for dt in period:
Give it a go, and let me know what you think of it.