Data Science and Analytics with Python – Video

This is a video I made for my publisher about my book “Data Science and Analytics with Python”. You can get the book here, and find out more about it here.

The book provides an introduction to some of the most widely used algorithms in data science and analytics. It is the result of many interesting discussions, debates and dialogues with a large number of people at various levels of seniority, working at startups as well as long-established businesses, and in a variety of industries, from science to media to finance.

“Data Science and Analytics with Python” is intended to be a companion to data analysts and budding data scientists who have some working experience with both programming and statistical modelling, but who have not necessarily delved into the wonders of data analytics and machine learning. The book uses Python as a tool to implement and exploit some of the most common algorithms used in data science and data analytics today.

Python is a popular and versatile scripting and object-oriented language; it is easy to use and has a large, active community of developers and enthusiasts, not to mention the richness of its ecosystem of libraries and modules, all of this helped by the versatility of the IPython/Jupyter Notebook.

In the book I address the balance between the technical knowledge required of a data scientist, such as mathematics and computer science, and the need for a good business background. To counter the prevailing image of the unicorn data scientist, I am convinced that a new symbol is needed, and a silly one at that! There is an allegory I usually propose to colleagues and to those who talk about the data science unicorn, and it seems to me a more appropriate one than the existing image: still a mythical creature, perhaps less common than the unicorn, but, more importantly, one with at least a faint claim to actual existence: the Jackalope. You will have to read the book to find out more!

The main purpose of the book is to present the reader with some of the main concepts used in data science and analytics using tools developed in Python such as Scikit-learn, Pandas, Numpy and others. The book is intended to be a bridge to the data science and analytics world for programmers and developers, as well as graduates in scientific areas such as mathematics, physics, computational biology and engineering, to name a few.

The material covered includes machine learning and pattern recognition, various regression techniques, classification algorithms, decision tree and hierarchical clustering, and dimensionality reduction. Note, however, that this text is not recommended for those just getting started with computer programming.
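To give a flavour of the kind of workflow the book walks through, here is a minimal scikit-learn sketch; it is not code from the book, just an illustrative example on the iris dataset that ships with scikit-learn:

# A minimal sketch of a typical scikit-learn workflow: train a decision tree
# classifier and reduce the data with PCA.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Classification with a decision tree
clf = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))

# Dimensionality reduction with PCA
X_2d = PCA(n_components=2).fit_transform(X)
print("Reduced shape:", X_2d.shape)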

There are a number of topics that were not covered in this book. If you are interested in more advanced topics, take a look at my book “Advanced Data Science and Analytics with Python”. There is a follow-up video for that one! Keep an eye out for it!

Related Content: Please take a look at other videos about my books:

Sci-Advent – Artificial Intelligence, High Performance Computing and Gravitational Waves

In a recent paper published on the arXiv, researchers highlight the advantages that artificial intelligence techniques bring to research in fields such as astrophysics. They are making their models available, and that is always a great thing to see. They mention the use of these techniques to detect binary neutron stars, and to forecast the merger of multi-messenger sources, such as binary neutron stars and neutron star-black hole systems. Here are some highlights from the paper:

Finding new ways to use artificial intelligence (AI) to accelerate the analysis of gravitational wave data, and ensuring the developed models are easily reusable promises to unlock new opportunities in multi-messenger astrophysics (MMA), and to enable wider use, rigorous validation, and sharing of developed models by the community. In this work, we demonstrate how connecting recently deployed DOE and NSF-sponsored cyberinfrastructure allows for new ways to publish models, and to subsequently deploy these models into applications using computing platforms ranging from laptops to high performance computing clusters. We develop a workflow that connects the Data and Learning Hub for Science (DLHub), a repository for publishing machine learning models, with the Hardware Accelerated Learning (HAL) deep learning computing cluster, using funcX as a universal distributed computing service. We then use this workflow to search for binary black hole gravitational wave signals in open source advanced LIGO data. We find that using this workflow, an ensemble of four openly available deep learning models can be run on HAL and process the entire month of August 2017 of advanced LIGO data in just seven minutes, identifying all four binary black hole mergers previously identified in this dataset, and reporting no misclassifications. This approach, which combines advances in AI, distributed computing, and scientific data infrastructure opens new pathways to conduct reproducible, accelerated, data-driven gravitational wave detection.

Research and development of AI models for gravitational wave astrophysics is evolving at a rapid pace. In less than four years, this area of research has evolved from disruptive prototypes into sophisticated AI algorithms that describe the same 4-D signal manifold as traditional gravitational wave detection pipelines for binary black hole mergers, namely, quasi-circular, spinning, non-precessing, binary systems; have the same sensitivity as template matching algorithms; and are orders of magnitude faster, at a fraction of the computational cost.

AI models have been proven to effectively identify real gravitational wave signals in advanced LIGO data, including binary black hole and neutron star mergers. The current pace of progress makes it clear that the broader community will continue to advance the development of AI tools to realize the science goals of Multi-Messenger Astrophysics.

Furthermore, mirroring the successful approach of corporations leading AI innovation in industry and technology, we are releasing our AI models to enable the broader community to use and perfect them. This approach is also helpful to address healthy and constructive skepticism from members of the community who do not feel at ease using AI algorithms.

Sci-Advent – Artificial intelligence improves control of powerful plasma accelerators

This is a reblog of a post by Hayley Dunning on the Imperial College website. See the original here.

Researchers have used AI to control beams for the next generation of smaller, cheaper accelerators for research, medical and industrial applications.

Electrons are ejected from the plasma accelerator at almost the speed of light, before being passed through a magnetic field which separates the particles by their energy. They are then fired at a fluorescent screen, shown here.

Experiments led by Imperial College London researchers, using the Science and Technology Facilities Council’s Central Laser Facility (CLF), showed that an algorithm was able to tune the complex parameters involved in controlling the next generation of plasma-based particle accelerators.


The algorithm was able to optimise the accelerator much more quickly than a human operator, and could even outperform experiments on similar laser systems.

These accelerators focus the energy of the world’s most powerful lasers down to a spot the size of a skin cell, producing electrons and x-rays with equipment a fraction of the size of conventional accelerators.

The electrons and x-rays can be used for scientific research, such as probing the atomic structure of materials; in industrial applications, such as for producing consumer electronics and vulcanised rubber for car tyres; and could also be used in medical applications, such as cancer treatments and medical imaging.

Broadening accessibility

Several facilities using these new accelerators are in various stages of planning and construction around the world, including the CLF’s Extreme Photonics Applications Centre (EPAC) in the UK, and the new discovery could help them work at their best in the future. The results are published today in Nature Communications.

First author Dr Rob Shalloo, who completed the work at Imperial and is now at the accelerator centre DESY, said: “The techniques we have developed will be instrumental in getting the most out of a new generation of advanced plasma accelerator facilities under construction within the UK and worldwide.

“Plasma accelerator technology provides uniquely short bursts of electrons and x-rays, which are already finding uses in many areas of scientific study. With our developments, we hope to broaden accessibility to these compact accelerators, allowing scientists in other disciplines and those wishing to use these machines for applications, to benefit from the technology without being an expert in plasma accelerators.”

The outside of the vacuum chamber

First of its kind

The team worked with laser wakefield accelerators. These combine the world’s most powerful lasers with a source of plasma (ionised gas) to create concentrated beams of electrons and x-rays. Traditional accelerators need hundreds of metres to kilometres to accelerate electrons, but wakefield accelerators can manage the same acceleration within the space of millimetres, drastically reducing the size and cost of the equipment.

However, because wakefield accelerators operate in the extreme conditions created when lasers are combined with plasma, they can be difficult to control and optimise to get the best performance. In wakefield acceleration, an ultrashort laser pulse is driven into plasma, creating a wave that is used to accelerate electrons. Both the laser and plasma have several parameters that can be tweaked to control the interaction, such as the shape and intensity of the laser pulse, or the density and length of the plasma.

While a human operator can tweak these parameters, it is difficult to know how to optimise so many parameters at once. Instead, the team turned to artificial intelligence, creating a machine learning algorithm to optimise the performance of the accelerator.

The algorithm set up to six parameters controlling the laser and plasma, fired the laser, analysed the data, and re-set the parameters, performing this loop many times in succession until the optimal parameter configuration was reached.
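To give a rough idea of what such an optimisation loop looks like in code, here is a hypothetical sketch using scikit-optimize's ask/tell interface; the parameter ranges and the fire_laser_and_measure function are stand-ins for the real experiment, not part of the published work:

# Hypothetical Bayesian optimisation loop in the spirit of the experiment:
# propose settings, run the "experiment", feed the measurement back.
from skopt import Optimizer

# Two illustrative parameters with made-up ranges: plasma density
# (arbitrary units) and laser focus position.
opt = Optimizer(dimensions=[(1.0, 10.0), (-2.0, 2.0)], base_estimator="GP")

def fire_laser_and_measure(params):
    """Stand-in for firing the laser and analysing the electron beam."""
    density, focus = params
    return -(density / 10.0) * (1.0 - (focus / 2.0) ** 2)  # dummy objective, lower is better

for shot in range(25):
    params = opt.ask()                  # algorithm proposes the next settings
    result = fire_laser_and_measure(params)
    opt.tell(params, result)            # feed the measurement back to the model

print("Best settings found:", opt.get_result().x)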

Lead researcher Dr Matthew Streeter, who completed the work at Imperial and is now at Queen’s University Belfast, said: “Our work resulted in an autonomous plasma accelerator, the first of its kind. As well as allowing us to efficiently optimise the accelerator, it also simplifies their operation and allows us to spend more of our efforts on exploring the fundamental physics behind these extreme machines.”

Future designs and further improvements

The team demonstrated their technique using the Gemini laser system at the CLF, and have already begun to use it in further experiments to probe the atomic structure of materials in extreme conditions and in studying antimatter and quantum physics.

The data gathered during the optimisation process also provided new insight into the dynamics of the laser-plasma interaction inside the accelerator, potentially informing future designs to further improve accelerator performance.

The experiment was led by Imperial College London researchers with a team of collaborators from the Science and Technology Facilities Council (STFC), the York Plasma Institute, the University of Michigan, the University of Oxford and the Deutsches Elektronen-Synchrotron (DESY). It was funded by the UK’s STFC, the EU Horizon 2020 research and innovation programme, the US National Science Foundation and the UK’s Engineering and Physical Sciences Research Council.

‘Automation and control of laser wakefield accelerators using Bayesian optimisation’ by R.J. Shalloo et al. is published in Nature Communications.

Advanced Data Science and Analytics with Python – Submitted!

There you go, the first checkpoint is completed: I have officially submitted the completed version of “Advanced Data Science and Analytics with Python”.

The book has been some time in the making (and in the thinking…). It is a follow-up to my previous book, imaginatively called “Data Science and Analytics with Python”. The book covers aspects that were necessarily left out in the previous volume; however, the readers I have in mind are still technical people interested in moving into the data science and analytics world. I have tried to keep the same tone as in the first book, peppering the pages with some bits and bobs of popular culture, science fiction and indeed Monty Python puns.

Advanced Data Science and Analytics with Python enables data scientists to continue developing their skills and apply them in business as well as academic settings. The subjects discussed in this book are complementary to, and a follow-up from, the topics discussed in Data Science and Analytics with Python. The aim is to cover important advanced areas in data science using tools developed in Python such as SciKit-learn, Pandas, Numpy, Beautiful Soup, NLTK, NetworkX and others. The development is also supported by the use of frameworks such as Keras, TensorFlow and Core ML, as well as Swift for the development of iOS and MacOS applications.

The book can be read independently from the previous volume, and each of the chapters is sufficiently self-contained to provide flexibility for the reader. Each of the topics addressed in the book tackles the data science workflow from a practical perspective, concentrating on the process and the results obtained. The implementation and deployment of trained models are central to the book.

Time series analysis, natural language processing, topic modelling, social network analysis, neural networks and deep learning are comprehensively covered in the book. The book discusses the need to develop data products and tackles the subject of bringing models to their intended audiences, in this case literally to the users' fingertips in the form of an iPhone app.
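As a small taste of one of those topics, here is a minimal NetworkX sketch of the kind of social network analysis the book covers; it is not code from the book, just an illustration using a toy network that ships with NetworkX:

# Minimal NetworkX sketch: load a small social network and rank members by
# degree centrality.
import networkx as nx

G = nx.karate_club_graph()  # classic toy social network bundled with NetworkX
centrality = nx.degree_centrality(G)

# Print the five most connected members
top = sorted(centrality, key=centrality.get, reverse=True)[:5]
for node in top:
    print(node, round(centrality[node], 3))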

While the book is still in the oven, you may want to take a look at the first volume. You can get your copy here:

Furthermore, you can see my author profile here.

ODSC Europe 2019

It was a pleasure to come to the opening day of ODSC Europe 2019. This time round I was the first speaker of the first session, and it was very apt as the talk was effectively an introduction to Data Science.

The next four days will be very hectic for the attendees, and if the quality is similar to that of previous editions we are going to have a great time.

Adding new conda environment kernel to Jupyter and nteract

I know there are a ton of posts out there covering this very topic. I am writing this post more for my own benefit, so that I have a reliable place to check the commands I need to add a new conda environment to my Jupyter and nteract IDEs.

First, to create an environment that contains, say, TensorFlow, Pillow, Keras and pandas, we type the following in the command line:

$ conda create -n tensorflow_env tensorflow pillow keras pandas jupyter ipykernel nb_conda

Now, to add this to the list of available environments in either Jupyter or nteract, we type the following:

$ conda activate tensorflow_env

$ python -m ipykernel install --name tensorflow_env


$ conda deactivate
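
If you want to double-check that the kernel was registered, listing the installed kernelspecs (a standard Jupyter command) should show the new entry:

$ jupyter kernelspec list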


Et voilà, you should now see the environment in the dropdown menu!

What Is Artificial Intelligence?

Original article by JF Puget here.

Here is a question I was asked to discuss at a conference last month: what is Artificial Intelligence (AI)? Instead of trying to answer it, which could take days, I decided to focus on how AI has been defined over the years. Nowadays, most people probably equate AI with deep learning. This has not always been the case, as we shall see.

Most people say that AI was first defined as a research field in a 1956 workshop at Dartmouth College. The reality is that it had been defined six years earlier, by Alan Turing in 1950. Let me cite Wikipedia here:

The Turing test, developed by Alan Turing in 1950, is a test of a machine’s ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. Turing proposed that a human evaluator would judge natural language conversations between a human and a machine designed to generate human-like responses. The evaluator would be aware that one of the two partners in conversation is a machine, and all participants would be separated from one another. The conversation would be limited to a text-only channel such as a computer keyboard and screen so the result would not depend on the machine’s ability to render words as speech.[2] If the evaluator cannot reliably tell the machine from the human, the machine is said to have passed the test. The test does not check the ability to give correct answers to questions, only how closely answers resemble those a human would give.

The test was introduced by Turing in his paper, “Computing Machinery and Intelligence”, while working at the University of Manchester (Turing, 1950; p. 460).[3] It opens with the words: “I propose to consider the question, ‘Can machines think?’” Because “thinking” is difficult to define, Turing chooses to “replace the question by another, which is closely related to it and is expressed in relatively unambiguous words.”[4] Turing’s new question is: “Are there imaginable digital computers which would do well in the imitation game?”[5] This question, Turing believed, is one that can actually be answered. In the remainder of the paper, he argued against all the major objections to the proposition that “machines can think”.[6]


So, the first definition of AI was about thinking machines.  Turing decided to test thinking via a chat.

The definition of AI rapidly evolved to include the ability to perform complex reasoning and planning tasks. Early success in the 50s led prominent researchers to make imprudent predictions about how AI would become a reality in the 60s. The failure of these predictions to materialize led to the funding cuts known as the AI winter in the 70s.

In the early 80s, building on some success in medical diagnosis, AI came back with expert systems. These systems tried to capture the expertise of humans in various domains, and were implemented as rule-based systems. Those were the days when AI focused on the ability to perform tasks at the level of the best human experts. Successes like IBM Deep Blue beating the chess world champion, Garry Kasparov, in 1997 were the acme of this line of AI research.

Let’s contrast this with today’s AI. The focus is on perception: can we have systems that recognize what is in a picture, what is in a video, what is said in a soundtrack? Rapid progress is underway on these tasks thanks to the use of deep learning. Is it still AI? Are we automating human thinking? The reality is that we are working on automating tasks that most humans can do without any thinking effort. Yet we see lots of bragging about AI being a reality when all we have is some ability to mimic human perception. I really find it ironic that our definition of intelligence has become one of mere perception rather than thinking.

Granted, not all AI work today is about perception. Work on natural language processing (e.g. translation) is a bit closer to reasoning than the mere perception tasks described above. Successes like IBM Watson at Jeopardy! or Google AlphaGo at Go are two examples of traditional AI aiming to replicate tasks performed by human experts. The good news (to me at least) is that progress on perception is so rapid that it will move from a research field to an engineering field in the coming years. We will then see a re-positioning of researchers onto other AI-related topics such as reasoning and planning. We’ll be closer to Turing’s initial view of AI.