Data Science Talk at University of Hertfordshire

It was great to be invited to give the joint Physics, Astronomy and Maths + Computer Science research seminar today at the University of Hertfordshire. I had a good opportunity to catch up with old colleagues and meet new faculty. There were also many students, and they had many questions.

I was glad to hear they are thinking about offering more data science courses and even a dedicated programme. I would definitely be interested to hear more about that.

Advanced Data Science and Analytics with Python – Submitted!

There you go, the first checkpoint is completed: I have officially submitted the completed version of “Advanced Data Science and Analytics with Python”.

The book has been some time in the making (and in the thinking…). It is a follow-up to my previous book, imaginatively called “Data Science and Analytics with Python”. The book covers aspects that were necessarily left out of the previous volume; however, the readers in mind are still technical people interested in moving into the data science and analytics world. I have tried to keep the same tone as in the first book, peppering the pages with some bits and bobs of popular culture, science fiction and indeed Monty Python puns.

Advanced Data Science and Analytics with Python enables data scientists to continue developing their skills and apply them in business as well as academic settings. The subjects discussed in this book complement and follow on from the topics discussed in Data Science and Analytics with Python. The aim is to cover important advanced areas in data science using tools developed in Python such as scikit-learn, pandas, NumPy, Beautiful Soup, NLTK, NetworkX and others. The development is also supported by the use of frameworks such as Keras, TensorFlow and Core ML, as well as Swift for the development of iOS and macOS applications.

The book can be read independently from the previous volume, and each of the chapters in this volume is sufficiently independent from the others, providing flexibility for the reader. Each of the topics addressed in the book tackles the data science workflow from a practical perspective, concentrating on the process and results obtained. The implementation and deployment of trained models are central to the book.

Time series analysis, natural language processing, topic modelling, social network analysis, neural networks and deep learning are comprehensively covered in the book. The book discusses the need to develop data products and tackles the subject of bringing models to their intended audiences: in this case, literally to the users’ fingertips in the form of an iPhone app.

While the book is still in the oven, you may want to take a look at the first volume. You can get your copy here:

Furthermore you can see my Author profile here.

ODSC Europe 2019

It was a pleasure to come to the opening day of ODSC Europe 2019. This time round I was the first speaker of the first session, and it was very apt as the talk was effectively an introduction to Data Science.

The next four days will be very hectic for the attendees, and if the quality is similar to that of previous editions we are going to have a great time.

Natural Language Processing – Talk

Last October I had the great opportunity to come and give a talk at the Facultad de Ciencias Políticas, UAEM, México. The main audience were students of the qualitative analysis methods course, but there were also people from informatics and systems engineering.

It was an opportunity to showcase some of the advances that natural language processing offers to social scientists interested in analysing discourse, from politics through to social interactions.

The talk covered an introduction to and a brief history of the field. We went through the different stages of the analysis: reading the data, obtaining tokens, labelling their part of speech (POS), and then looking at syntactic and semantic analysis.
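To make the first stages concrete, here is a minimal sketch of tokenising a sentence and labelling each token with a part of speech. It is not the code used in the talk: the tiny `TOY_LEXICON` and the helper names `tokenize`, `tag` and `tag_counts` are hypothetical, made up for illustration. A real tagger, such as the ones available in NLTK, is trained on annotated corpora rather than relying on a hand-written lookup.

```python
import re
from collections import Counter

# Hypothetical, hand-made lexicon for illustration only.
# Real POS taggers learn these labels from annotated corpora.
TOY_LEXICON = {
    "the": "DET", "a": "DET",
    "talk": "NOUN", "students": "NOUN", "history": "NOUN",
    "covered": "VERB", "asked": "VERB",
    "brief": "ADJ", "many": "ADJ",
}


def tokenize(text):
    """Split text into lowercase word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def tag(tokens):
    """Label each token by dictionary lookup; unknown words get 'UNK'."""
    tagged = []
    for token in tokens:
        if not token[0].isalnum():
            tagged.append((token, "PUNCT"))
        else:
            tagged.append((token, TOY_LEXICON.get(token, "UNK")))
    return tagged


def tag_counts(tagged):
    """Count how often each part-of-speech label appears."""
    return Counter(label for _, label in tagged)


sentence = "The talk covered a brief history."
tokens = tokenize(sentence)
tagged = tag(tokens)
print(tagged)
print(tag_counts(tagged))
```

The lookup approach breaks down quickly on real text (ambiguous words, unseen vocabulary), which is precisely why trained taggers are used in practice, but it shows the shape of the pipeline: raw text in, tokens out, labels attached.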

We finished the session with a couple of demos. One looked at speeches by Clinton and Trump during their presidential campaigns; the other was a simple analysis of a novel in Spanish.

Thanks for the invite.

“Advanced Data Science And Analytics” is finished!

It has been a few months of writing, testing, re-writing and starting again, and I am pleased to say that the first complete draft of “Advanced Data Science and Analytics with Python” is ready. Last chapter is done and starting revisions now. Yay!

Data Science and Analytics with Python – Social Network Analysis

Using the time wisely during the Bank Holiday weekend. As my dad would say, “resting while making bricks”… Currently reviewing/editing/correcting Chapter 3 of “Advanced Data Science and Analytics with Python”. Yes, that is volume 2 of “Data Science and Analytics with Python”.


2018 – A review

It is that time of year when we have an opportunity to look back and see what we have achieved while taking an opportunity to see what the next year will bring. This may be of interest just to me, so please accept my apologies… Here we go: 

In no particular order:

  • I signed up with my publisher Taylor & Francis to write a volume 2 for my “Data Science and Analytics with Python” book
  • During the year I had opportunities to attend some great events such as the EGG Conference by Dataiku or the BBC Machine Learning Fireside Chats as well as multiple events with the Turing Institute
  • I continued delivering training at General Assembly, reaching out to people interested in learning more about Python and Data Science. It has been an interesting year and it is great to see what former students are currently doing with the skills learnt
  • The work delivered for companies such as Louis Vuitton, Volvo, Foster & Partners, and others was fantastic. I am also very proud to have tackled some strategy work for the Mayo Clinic and delivered a presentation in a lecture theatre at Mayo
  • I contributed to some open source software projects
  • It was a busy year in terms of speaking engagements, having delivered keynotes at Entrepares 2018 and the IV Seminario de Periodismo Iberoamericano de Ciencia, Tecnología e Innovación, both in Puebla, Mexico. I also ran an Introduction to Data Science workshop at ODSC18 in London and an Introduction to Python at Entrepares 2018. I gave a talk about Data Science Practices at Google Campus in London. The interactive Q&A session was a fun way to answer queries from the audience. I was also a member of various debate panels
  • I rekindled playing board games with a couple of good friends of mine, and it has been a geeky blast!
  • I started a new role and am still looking to get my foot in the door with Apple
  • I’ve been delving more into Machine Learning systems and platforms, learning about interpretability, reliability, monitoring, and more. There is still plenty more to learn
  • I met Chris Robshaw and attended a bunch of rugby matches through the year

Looking forward to 2019, learning and developing more.

Data Illustrator and Charticulator – Reblog

Reblog from here.

New tools: Data Illustrator and Charticulator

Posted on August 31, 2018 by 5wgraphicsblog

Anyone interested in creating their own data visualizations should be giddy with delight with the quickly growing number of tools available to create them without any need for programming skills, and in most cases for free: Tableau, Flourish, Datawrapper, RawGraphs, Chartbuilder or QGIS (for mapping) are some of the best, and the list goes on and on. I’m convinced in a relatively short time drag and drop tools will be as powerful and flexible as D3.js and other developer tools, making data visualization accessible to everyone.

The exciting news is seeing two software giants entering the field with new web-based tools: Adobe launched Data Illustrator a few months ago in a collaboration with the Georgia Institute of Technology, and Microsoft Research is behind the just released Charticulator. Both work very intuitively, allowing the author to bind multiple attributes of data to graphical elements. They are indeed powered by D3.js, among other libraries.

Both offer introduction videos on their home pages. Here is Data Illustrator:

And here is Charticulator:

The tools offer tutorial sections and multiple step-by-step videos in their galleries; and they link to the research papers describing the tools, which are worth reading (Data Illustrator, Charticulator).

Creating complex visualizations like the chord diagram below seems ridiculously simple in Charticulator, and the same can be said of Data Illustrator’s visualizations. See the video:

This is not a review as I have just started playing with them, but on first look both tools are impressive. It’s still really early in their development, but if Adobe and Microsoft throw their mighty resources to support and improve them, we can expect great things in the near future. Perhaps one day Data Illustrator could be embedded within Adobe Illustrator, allowing designers to work fluidly and easily between D3 and Illustrator without leaving the graphical interface. And Charticulator could integrate into PowerPoint. Stay tuned!

Top Free Books for Deep Learning

This collection includes books on all aspects of deep learning. It begins with titles that cover the subject as a whole, before moving onto work that should help beginners expand their knowledge from machine learning to deep learning. The list concludes with books that discuss neural networks, both titles that introduce the topic and ones that go in-depth, covering the architecture of such networks.

1. Deep Learning
By Ian Goodfellow, Yoshua Bengio and Aaron Courville

The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular. The online version of the book is now complete and will remain available online for free.

2. Deep Learning Tutorial
By LISA Lab, University of Montreal

Developed by the LISA lab at the University of Montreal, this free and concise tutorial, presented in the form of a book, explores the basics of machine learning. The book emphasizes using the Theano library (developed originally by the university itself) for creating deep learning models in Python.

3. Deep Learning: Methods and Applications
By Li Deng and Dong Yu

This book provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks.

4. First Contact with TensorFlow, get started with Deep Learning Programming
By Jordi Torres

This book is oriented to engineers with only some basic understanding of Machine Learning who want to expand their wisdom in the exciting world of Deep Learning with a hands-on approach that uses TensorFlow.

5. Neural Networks and Deep Learning
By Michael Nielsen

This book teaches you about Neural networks, a beautiful biologically-inspired programming paradigm which enables a computer to learn from observational data. It also covers deep learning, a powerful set of techniques for learning in neural networks.

6. A Brief Introduction to Neural Networks
By David Kriesel

This title covers neural networks in depth. Neural networks are a bio-inspired mechanism of data processing that enables computers to learn in a way technically similar to a brain, and even to generalize once solutions to enough problem instances are taught. Available in English and German.

7. Neural Network Design (2nd edition)
By Martin T. Hagan, Howard B. Demuth, Mark H. Beale and Orlando De Jesús

Neural Network Design (2nd edition) provides a clear and detailed survey of fundamental neural network architectures and learning rules. In it, the authors emphasize a fundamental understanding of the principal neural networks and the methods for training them. The authors also discuss applications of networks to practical engineering problems in pattern recognition, clustering, signal processing, and control systems. Readability and natural flow of material are emphasized throughout the text.

8. Neural Networks and Learning Machines (3rd edition)
By Simon Haykin

This third edition of Simon Haykin’s book provides an up-to-date treatment of neural networks in a comprehensive, thorough and readable manner, split into three sections. The book begins by looking at the classical approach to supervised learning, before continuing on to kernel methods based on radial-basis function (RBF) networks. The final part of the book is devoted to regularization theory, which is at the core of machine learning.