Advanced Data Science and Analytics with Python – Video

Hello again, this is a video I recorded for my publisher about my book “Advanced Data Science and Analytics with Python”. You can get the book here and more about the book here.

This companion to “Data Science and Analytics with Python” is the result of arguments with myself about writing something to cover a few of the areas that were not included in that first volume, largely due to space/time constraints. Like the previous book, this one exists thanks to the discussions, stand-ups, brainstorms and eventual implementations of algorithms and data science projects carried out with many colleagues and friends.

As the title suggests, this book continues to use Python as a tool to train, test and implement machine learning models and algorithms. The book is aimed at data scientists who would like to continue developing their skills and apply them in business and academic settings.

The subjects discussed in this book are complementary and a follow-up to the ones covered in Volume 1, “Data Science and Analytics with Python”. The intended audience is still composed of data analysts and early-career data scientists with some experience in programming and a background in statistical modelling. In this case, however, the expectation is that they have already covered some areas of machine learning and data analytics. Although there are some references to the previous book, this volume is written to be read independently.

I have tried to keep the same tone as in the first book, peppering the pages with some bits and bobs of popular culture, science fiction and indeed Monty Python puns. The aim is still to focus on showing the concepts and ideas behind popular algorithms and their use.

In summary, each of the topics addressed in “Advanced Data Science and Analytics with Python” tackles the data science workflow from a practical perspective, concentrating on the process and the results obtained. The material covers machine learning and pattern recognition algorithms, including time series analysis, natural language processing, topic modelling, social network analysis, neural networks and deep learning. The book discusses the need to develop data products and addresses the subject of bringing models to their intended audiences – in this case, literally to the users’ fingertips in the form of an iPhone app.

I hope you enjoy it and if you want to know more about my other books, please check the related videos here:

Sci-Advent – New study tests machine learning on detection of borrowed words in world languages

This is a reblog of a story in ScienceDaily. See the original here.

Underwhelming results underscore the complexity of language evolution while showing promise in some current applications

Researchers have investigated the ability of machine learning algorithms to identify lexical borrowings using word lists from a single language. Results show that current machine learning methods alone are insufficient for borrowing detection, confirming that additional data and expert knowledge are needed to tackle one of historical linguistics’ most pressing challenges.

Lexical borrowing, or the direct transfer of words from one language to another, has interested scholars for millennia, as evidenced already in Plato’s Kratylos dialogue, in which Socrates discusses the challenge imposed by borrowed words on etymological studies. In historical linguistics, lexical borrowings help researchers trace the evolution of modern languages and indicate cultural contact between distinct linguistic groups — whether recent or ancient. However, the techniques for identifying borrowed words have resisted formalization, demanding that researchers rely on a variety of proxy information and the comparison of multiple languages.

“The automated detection of lexical borrowings is still one of the most difficult tasks we face in computational historical linguistics,” says Johann-Mattis List, who led the study.

In the current study, researchers from PUCP and MPI-SHH employed different machine learning techniques to train language models that mimic the way in which linguists identify borrowings when considering only the evidence provided by a single language: if sounds, or the ways in which sounds combine to form words, are atypical compared with other words in the same language, this often hints at recent borrowings. The models were then applied to a modified version of the World Loanword Database, a catalog of borrowing information for a sample of 40 languages from different language families all over the world, in order to see how accurately words within a given language would be classified as borrowed or not by the different techniques.
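The monolingual idea described above can be illustrated with a toy sketch. This is not the study's method: it assumes a simple character-bigram model with add-one smoothing, and the word list is invented for illustration. A word whose symbol sequences are rare in the rest of the list gets a higher average surprisal, which is the kind of signal a monolingual borrowing detector might look for.

```python
import math
from collections import Counter

# Invented "native-like" word list, used only for illustration.
NATIVE_LIKE_WORDS = ["tana", "nata", "tanata", "natana", "tatana", "nanata"]


def train_bigram_model(words):
    """Count symbol bigrams, with ^ and $ marking word boundaries."""
    bigrams, contexts, alphabet = Counter(), Counter(), set()
    for word in words:
        symbols = ["^"] + list(word) + ["$"]
        alphabet.update(symbols)
        for prev, cur in zip(symbols, symbols[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, len(alphabet)


def surprisal_per_symbol(word, model):
    """Average number of bits per transition; higher means more atypical."""
    bigrams, contexts, vocab_size = model
    symbols = ["^"] + list(word) + ["$"]
    total = 0.0
    for prev, cur in zip(symbols, symbols[1:]):
        # Add-one smoothing; the extra +1 reserves mass for unseen symbols.
        prob = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size + 1)
        total -= math.log2(prob)
    return total / (len(symbols) - 1)


model = train_bigram_model(NATIVE_LIKE_WORDS)
typical = surprisal_per_symbol("tanana", model)   # follows the list's patterns
atypical = surprisal_per_symbol("skrimp", model)  # unfamiliar sounds and clusters
print(f"tanana: {typical:.2f} bits/symbol, skrimp: {atypical:.2f} bits/symbol")
```

In practice a threshold on this score would decide which words to flag, and choosing that threshold is exactly where a single-language approach struggles.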

In many cases the results were unsatisfying, suggesting that loanword detection is too difficult for the most commonly used machine learning methods. However, in specific situations, such as in lists with a high proportion of loanwords or in languages whose loanwords come primarily from a single donor language, the team’s lexical language models showed some promise.

“After these first experiments with monolingual lexical borrowings, we can proceed to stake out other aspects of the problem, moving into multilingual and cross-linguistic approaches,” says John Miller of PUCP, the study’s co-lead author.

“Our computer-assisted approach, along with the dataset we are releasing, will shed a new light on the importance of computer-assisted methods for language comparison and historical linguistics,” adds Tiago Tresoldi, the study’s other co-lead author from MPI-SHH.

The study joins ongoing efforts to tackle one of the most challenging problems in historical linguistics, showing that loanword detection cannot rely on mono-lingual information alone. In the future, the authors hope to develop better-integrated approaches that take multi-lingual information into account.

Using lexical language models to detect borrowings in monolingual wordlists. PLOS ONE, 2020; 15 (12): e0242709. DOI: 10.1371/journal.pone.0242709

Sci-Advent – Artificial Intelligence, High Performance Computing and Gravitational Waves

In a recent paper posted on arXiv, researchers have highlighted the advantages that artificial intelligence techniques bring to research in fields such as astrophysics. They are making their models available, and that is always a great thing to see. They mention the use of these techniques to detect binary neutron stars, and to forecast the merger of multi-messenger sources, such as binary neutron stars and neutron star-black hole systems. Here are some highlights from the paper:

Finding new ways to use artificial intelligence (AI) to accelerate the analysis of gravitational wave data, and ensuring the developed models are easily reusable promises to unlock new opportunities in multi-messenger astrophysics (MMA), and to enable wider use, rigorous validation, and sharing of developed models by the community. In this work, we demonstrate how connecting recently deployed DOE and NSF-sponsored cyberinfrastructure allows for new ways to publish models, and to subsequently deploy these models into applications using computing platforms ranging from laptops to high performance computing clusters. We develop a workflow that connects the Data and Learning Hub for Science (DLHub), a repository for publishing machine learning models, with the Hardware Accelerated Learning (HAL) deep learning computing cluster, using funcX as a universal distributed computing service. We then use this workflow to search for binary black hole gravitational wave signals in open source advanced LIGO data. We find that using this workflow, an ensemble of four openly available deep learning models can be run on HAL and process the entire month of August 2017 of advanced LIGO data in just seven minutes, identifying all four binary black hole mergers previously identified in this dataset, and reporting no misclassifications. This approach, which combines advances in AI, distributed computing, and scientific data infrastructure opens new pathways to conduct reproducible, accelerated, data-driven gravitational wave detection.

Research and development of AI models for gravitational wave astrophysics is evolving at a rapid pace. In less than four years, this area of research has evolved from disruptive prototypes into sophisticated AI algorithms that describe the same 4-D signal manifold as traditional gravitational wave detection pipelines for binary black hole mergers, namely, quasi-circular, spinning, non-precessing, binary systems; have the same sensitivity as template matching algorithms; and are orders of magnitude faster, at a fraction of the computational cost.

AI models have been proven to effectively identify real gravitational wave signals in advanced LIGO data, including binary black hole and neutron stars mergers. The current pace of progress makes it clear that the broader community will continue to advance the development of AI tools to realize the science goals of Multi-Messenger Astrophysics.

Furthermore, mirroring the successful approach of corporations leading AI innovation in industry and technology, we are releasing our AI models to enable the broader community to use and perfect them. This approach is also helpful to address healthy and constructive skepticism from members of the community who do not feel at ease using AI algorithms.

Sci-Advent – Artificial intelligence improves control of powerful plasma accelerators

This is a reblog of the post by Hayley Dunning on the Imperial College website. See the original here.

Researchers have used AI to control beams for the next generation of smaller, cheaper accelerators for research, medical and industrial applications.

Electrons are ejected from the plasma accelerator at almost the speed of light, before being passed through a magnetic field which separates the particles by their energy. They are then fired at a fluorescent screen, shown here

Experiments led by Imperial College London researchers, using the Science and Technology Facilities Council’s Central Laser Facility (CLF), showed that an algorithm was able to tune the complex parameters involved in controlling the next generation of plasma-based particle accelerators.

“The techniques we have developed will be instrumental in getting the most out of a new generation of advanced plasma accelerator facilities under construction within the UK and worldwide.” Dr Rob Shalloo

The algorithm was able to optimize the accelerator much more quickly than a human operator, and could even outperform experiments on similar laser systems.

These accelerators focus the energy of the world’s most powerful lasers down to a spot the size of a skin cell, producing electrons and x-rays with equipment a fraction of the size of conventional accelerators.

The electrons and x-rays can be used for scientific research, such as probing the atomic structure of materials; in industrial applications, such as for producing consumer electronics and vulcanised rubber for car tyres; and could also be used in medical applications, such as cancer treatments and medical imaging.

Broadening accessibility

Several facilities using these new accelerators are in various stages of planning and construction around the world, including the CLF’s Extreme Photonics Applications Centre (EPAC) in the UK, and the new discovery could help them work at their best in the future. The results are published today in Nature Communications.

First author Dr Rob Shalloo, who completed the work at Imperial and is now at the accelerator centre DESY, said: “The techniques we have developed will be instrumental in getting the most out of a new generation of advanced plasma accelerator facilities under construction within the UK and worldwide.

“Plasma accelerator technology provides uniquely short bursts of electrons and x-rays, which are already finding uses in many areas of scientific study. With our developments, we hope to broaden accessibility to these compact accelerators, allowing scientists in other disciplines and those wishing to use these machines for applications, to benefit from the technology without being an expert in plasma accelerators.”

The outside of the vacuum chamber

First of its kind

The team worked with laser wakefield accelerators. These combine the world’s most powerful lasers with a source of plasma (ionised gas) to create concentrated beams of electrons and x-rays. Traditional accelerators need hundreds of metres to kilometres to accelerate electrons, but wakefield accelerators can manage the same acceleration within the space of millimetres, drastically reducing the size and cost of the equipment.

However, because wakefield accelerators operate in the extreme conditions created when lasers are combined with plasma, they can be difficult to control and optimise to get the best performance. In wakefield acceleration, an ultrashort laser pulse is driven into plasma, creating a wave that is used to accelerate electrons. Both the laser and plasma have several parameters that can be tweaked to control the interaction, such as the shape and intensity of the laser pulse, or the density and length of the plasma.

While a human operator can tweak these parameters, it is difficult to know how to optimise so many parameters at once. Instead, the team turned to artificial intelligence, creating a machine learning algorithm to optimise the performance of the accelerator.

The algorithm set up to six parameters controlling the laser and plasma, fired the laser, analysed the data, and re-set the parameters, performing this loop many times in succession until the optimal parameter configuration was reached.
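The set, fire, analyse and re-set loop described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the published work uses Bayesian optimisation, whereas the code below swaps in a much simpler random local search purely to show the shape of the closed loop. The function `simulated_shot` is a hypothetical stand-in for firing the laser and scoring the resulting beam, and the parameters are assumed to be normalised to the range 0 to 1.

```python
import random


def simulated_shot(params):
    """Hypothetical stand-in for firing the laser and analysing the data.

    Returns a score that peaks (at 0) when every setting equals 0.6.
    """
    return -sum((p - 0.6) ** 2 for p in params)


def optimise(n_params=6, n_shots=200, step=0.1, seed=0):
    """Random local search over normalised laser and plasma settings."""
    rng = random.Random(seed)
    best = [rng.random() for _ in range(n_params)]  # initial settings
    best_score = simulated_shot(best)
    history = [best_score]
    for _ in range(n_shots):
        # Re-set the parameters: small random change, clipped to [0, 1].
        trial = [min(1.0, max(0.0, p + rng.gauss(0.0, step))) for p in best]
        score = simulated_shot(trial)  # fire the laser and analyse the data
        if score > best_score:         # keep the new settings only if better
            best, best_score = trial, score
        history.append(best_score)
    return best, best_score, history


best, best_score, history = optimise()
print(f"score after first shot: {history[0]:.4f}, after last shot: {best_score:.4f}")
```

A Bayesian optimiser replaces the random proposal step with a statistical model of the score, so that each new shot is chosen where the model expects the most to be gained. That matters when every evaluation is a real laser shot.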

Lead researcher Dr Matthew Streeter, who completed the work at Imperial and is now at Queen’s University Belfast, said: “Our work resulted in an autonomous plasma accelerator, the first of its kind. As well as allowing us to efficiently optimise the accelerator, it also simplifies their operation and allows us to spend more of our efforts on exploring the fundamental physics behind these extreme machines.”

Future designs and further improvements

The team demonstrated their technique using the Gemini laser system at the CLF, and have already begun to use it in further experiments to probe the atomic structure of materials in extreme conditions and in studying antimatter and quantum physics.

The data gathered during the optimisation process also provided new insight into the dynamics of the laser-plasma interaction inside the accelerator, potentially informing future designs to further improve accelerator performance.

The experiment was led by Imperial College London researchers with a team of collaborators from the Science and Technology Facilities Council (STFC), the York Plasma Institute, the University of Michigan, the University of Oxford and the Deutsches Elektronen-Synchrotron (DESY). It was funded by the UK’s STFC, the EU Horizon 2020 research and innovation programme, the US National Science Foundation and the UK’s Engineering and Physical Sciences Research Council.

‘Automation and control of laser wakefield accelerators using Bayesian optimisation’ by R.J. Shalloo et al. is published in Nature Communications.

Sci-Advent – Machine Learning in Ear, Nose and Throat

This is a reblog of the article by Cian Hughes and Sumit Agrawal in ENTNews. See the original here.

Figure 1. (Left) CT scan of the right temporal bone. (Middle) Structures of the temporal bone automatically segmented using a TensorFlow based deep learning algorithm. (Right) Three-dimensional model of the critical structures of the temporal bone to be used for surgical planning and simulation. 
Images courtesy of the Auditory Biophysics Laboratory, Western University, London, Canada.

Machine learning in healthcare

Over the last five years there have been significant advances in high performance computing that have led to enormous scientific breakthroughs in the field of machine learning (a form of artificial intelligence), especially with regard to image processing and data analysis. 

These breakthroughs now affect multiple aspects of our lives, from the way our phone sorts and recognises photographs, to automated translation and transcription services, and have the potential to revolutionise the practice of medicine.

The most promising form of artificial intelligence used in medical applications today is deep learning. Deep learning is a type of machine learning in which deep neural networks are trained to identify patterns in data [1]. A common form of neural network used in image processing is a convolutional neural network (CNN). Initially developed for general-purpose visual recognition, it has shown considerable promise in, for instance, the detection and classification of disease on medical imaging.

“Machine learning algorithms have also been central to the development of multiple assistive technologies that can help patients to overcome or alleviate disabilities”

Automated image segmentation has numerous clinical applications, ranging from quantitative measurement of tissue volume, through surgical planning/guidance and medical education, to cancer treatment planning. It is hoped that such advances in automated data analysis will help in the delivery of more timely care, and alleviate workforce shortages in areas such as breast cancer screening [2], where patient demand for screening already outstrips the availability of specialist breast radiologists in many parts of the world.

Applications in otolaryngology

Artificial intelligence is quickly making its way into [our] specialty. Both otolaryngologists and audiologists will soon be incorporating this technology into their clinical practices. Machine learning has been used to automatically classify auditory brainstem responses [8] and estimate audiometric thresholds [9]. This has allowed for accurate online testing [10], which could be used for rural and remote areas without access to standard audiometry (see the article by Dr Matthew Bromwich here).

Machine learning algorithms have also been central to the development of multiple assistive technologies that can help patients to overcome or alleviate disabilities. For example, in the context of hearing loss, significant advances in automated transcription apps, driven by machine learning algorithms, have proven particularly useful in recent months for patients who find themselves unable to lipread due to the use of face coverings to prevent the spread of COVID-19.

Figure 2. The virtual reality simulator CardinalSim depicting a left mastoidectomy and facial recess approach. The facial nerve (yellow) and round window (blue) were automatically delineated using deep learning techniques.
Image courtesy of the Auditory Biophysics Laboratory, Western University, London, Canada

In addition to their role in general image classification, CNNs are likely to play a significant role in the introduction of machine learning in healthcare, especially in image-heavy specialties such as otolaryngology. For otologists, deep learning algorithms can already identify detailed temporal bone structures from CT images [3-6], segment intracochlear anatomy [7], and identify individual cochlear implant electrodes [8] (Figure 1); automatic analysis of critical structures on temporal bone scans has already facilitated patient-specific virtual reality otologic surgery [9] (Figure 2). Deep learning will likely also be critical in customised cochlear implant programming in the future.

“Automatic analysis of critical structures on temporal bone scans has already facilitated patient-specific virtual reality otologic surgery”

Convolutional neural networks have also been used in rhinology to automatically delineate critical anatomy and quantify sinus opacification [10-12]. Deep learning networks have been used in head and neck oncology to automatically segment anatomic structures to accelerate radiotherapy planning [13-18]. For laryngologists, voice analysis software will likely incorporate machine learning classifiers to identify pathology as it has been shown to perform better than traditional rule-based algorithms [19].

Figure 3. Automated segmentation of organs at risk of damage from radiation during radiotherapy for head and neck cancer. Five axial slices from the scan of a 58-year-old male patient with a cancer of the right tonsil selected from the Head-Neck Cetuximab trial dataset (patient 0522c0416) [20,21].
Adapted with permission from the original authors [13].


In summary, artificial intelligence and, in particular, deep learning algorithms will radically change the way we manage patients within our careers. Although developed in high-resource settings, the technology has equally significant applications in low-resource settings to facilitate quality care even in the presence of limited human resources.

“Although developed in high-resource settings, the technology has equally significant applications in low-resource settings to facilitate quality care even in the presence of limited human resources”


1. Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 2013;35:1798-828.
2. McKinney SM, Sieniek M, Shetty S. International evaluation of an AI system for breast cancer screening. Nature 2020;577:89-94.
3. Heutink F, Kock V, Verbist B, et al. Multi-Scale deep learning framework for cochlea localization, segmentation and analysis on clinical ultra-high-resolution CT images. Comput Methods Programs Biomed 2020;191:105387. 
4. Fauser J, Stenin I, Bauer M, et al. Toward an automatic preoperative pipeline for image-guided temporal bone surgery. Int J Comput Assist Radiol Surg 2019;14(6):967-76. 
5. Zhang D, Wang J, Noble JH, et al. Deep convolutional neural networks for accurate classification and multi-landmark localization of head CTs. Med Image Anal 2020;61:101659.
6. Nikan S, van Osch K, Bartling M, et al. PWD-3DNet: A deep learning-based fully-automated segmentation of multiple structures on temporal bone CT scans. Submitted to IEEE Trans Image Process.
7. Wang J, Noble JH, Dawant BM. Metal Artifact Reduction and Intra Cochlear Anatomy Segmentation in CT Images of the Ear With A Multi-Resolution Multi-Task 3D Network. IEEE 17th International Symposium on Biomedical Imaging (ISBI) 2020;596-9.
8. Chi Y, Wang J, Zhao Y, et al. A Deep-Learning-Based Method for the Localization of Cochlear Implant Electrodes in CT Images. IEEE 16th International Symposium on Biomedical Imaging (ISBI) 2019;1141-5. 
9. Compton EC, et al. Assessment of a virtual reality temporal bone surgical simulator: a national face and content validity study. J Otolaryngol Head Neck Surg 2020;49:17. 
10. Laura CO, Hofmann P, Drechsler K, Wesarg S. Automatic Detection of the Nasal Cavities and Paranasal Sinuses Using Deep Neural Networks. IEEE 16th International Symposium on Biomedical Imaging (ISBI) 2019;1154-7. 
11. Iwamoto Y, Xiong K, Kitamura T, et al. Automatic Segmentation of the Paranasal Sinus from Computer Tomography Images Using a Probabilistic Atlas and a Fully Convolutional Network. Conf Proc IEEE Eng Med Biol Soc 2019;2789-92. 
12. Humphries SM, Centeno JP, Notary AM, et al. Volumetric assessment of paranasal sinus opacification on computed tomography can be automated using a convolutional neural network. Int Forum Allergy Rhinol 2020. 
13. Nikolov S, Blackwell S, Mendes R, et al. Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy. arXiv [cs.CV] 2018. 
14. Tong N, Gou S, Yang S, et al. Fully automatic multi-organ segmentation for head and neck cancer radiotherapy using shape representation model constrained fully convolutional neural networks. Med Phys 2018;45:4558-67.
15. Ibragimov B, Xing L. Segmentation of organs-at-risks in head and neck CT images using convolutional neural networks. Med Phys 2017;44:547-57.
16. Vrtovec T, Močnik D, Strojan P, et al. Auto-segmentation of organs at risk for head and neck radiotherapy planning: from atlas-based to deep learning methods. Med Phys 2020.
17. Zhu W, Huang Y, Zeng L, et al. AnatomyNet: Deep learning for fast and fully automated whole-volume segmentation of head and neck anatomy. Med Phys 2019;46(2):576-89.
18. Tong N, Gou S, Yang S, et al. Shape constrained fully convolutional DenseNet with adversarial training for multiorgan segmentation on head and neck CT and low-field MR images. Med Phys 2019;46:2669-82. 
19. Cesari U, De Pietro G, Marciano E, et al. Voice Disorder Detection via an m-Health System: Design and Results of a Clinical Study to Evaluate Vox4Health. Biomed Res Int 2018;8193694. 
20. Bosch WR, Straube WL, Matthews JW, Purdy JA. Data From Head-Neck_Cetuximab 2015. 
21. Clark K, Vendt B, Smith K, et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. J Digit Imaging 2013;26:1045-5.

Sci-Advent – Getting the right grip: Designing soft and sensitive robotic fingers


To develop a more human-like robotic gripper, it is necessary to provide sensing capabilities to the fingers. However, conventional sensors compromise the mechanical properties of soft robots. Now, scientists have designed a 3D printable soft robotic finger containing a built-in sensor with adjustable stiffness. Their work represents a big step toward safer and more dexterous robotic handling, which will extend the applications of robots to fields such as health and elderly care.

Although robotics has reshaped and even redefined many industrial sectors, there still exists a gap between machines and humans in fields such as health and elderly care. For robots to safely manipulate or interact with fragile objects and living organisms, new strategies to enhance their perception while making their parts softer are needed. In fact, building a safe and dexterous robotic gripper with human-like capabilities is currently one of the most important goals in robotics.

One of the main challenges in the design of soft robotic grippers is integrating traditional sensors onto the robot’s fingers. Ideally, a soft gripper should have what’s known as proprioception (a sense of its own movements and position) to be able to safely execute varied tasks. However, traditional sensors are rigid and compromise the mechanical characteristics of the soft parts. Moreover, existing soft grippers are usually designed with a single type of proprioceptive sensation: either pressure or finger curvature.

To overcome these limitations, scientists at Ritsumeikan University, Japan, have been working on novel soft gripper designs under the lead of Associate Professor Mengying Xie. In their latest study published in Nano Energy, they successfully used multimaterial 3D printing technology to fabricate soft robotic fingers with a built-in proprioception sensor. Their design strategy offers numerous advantages and represents a large step toward safer and more capable soft robots.

The soft finger has a reinforced inflation chamber that makes it bend in a highly controllable way according to the input air pressure. In addition, the stiffness of the finger is also tunable by creating a vacuum in a separate chamber. This was achieved through a mechanism called vacuum jamming, by which multiple stacked layers of a bendable material can be made rigid by sucking out the air between them. Both functions combined enable a three-finger robotic gripper to properly grasp and maintain hold of any object by ensuring the necessary force is applied.

Most notable, however, is that a single piezoelectric layer was included among the vacuum jamming layers as a sensor. The piezoelectric effect produces a voltage difference when the material is under pressure. The scientists leveraged this phenomenon as a sensing mechanism for the robotic finger, providing a simple way to sense both its curvature and initial stiffness (prior to vacuum adjustment). They further enhanced the finger’s sensitivity by including a microstructured layer among the jamming layers to improve the distribution of pressure on the piezoelectric material.

The use of multimaterial 3D printing, a simple and fast prototyping process, allowed the researchers to easily integrate the sensing and stiffness-tuning mechanisms into the design of the robotic finger itself. “Our work suggests a way of designing sensors that contribute not only as sensing elements for robotic applications, but also as active functional materials to provide better control of the whole system without compromising its dynamic behavior,” says Prof Xie. Another remarkable feature of their design is that the sensor is self-powered by the piezoelectric effect, meaning that it requires no energy supply — essential for low-power applications.

Overall, this exciting new study will help future researchers find new ways of improving how soft grippers interact with and sense the objects being manipulated. In turn, this will greatly expand the uses of robots, as Prof Xie indicates: “Self-powered built-in sensors will not only allow robots to safely interact with humans and their environment, but also eliminate the barriers to robotic applications that currently rely on powered sensors to monitor conditions.”

Let’s hope this technology is further developed so that our mechanical friends can soon join us in many more human activities!

Flexible self-powered multifunctional sensor for stiffness-tunable soft robotic gripper by multimaterial 3D printing. Nano Energy, 2021; 79: 105438. DOI: 10.1016/j.nanoen.2020.105438

Sci-Advent – Writing Reports Tailored for AI Readers

This is a reblog from an article by John Naughton in the Guardian on Dec 5th 2020. Read the original here.

My eye was caught by the title of a working paper published by the National Bureau of Economic Research (NBER): How to Talk When a Machine Is Listening: Corporate Disclosure in the Age of AI. So I clicked and downloaded, as one does. And then started to read.

The paper is an analysis of the 10-K and 10-Q filings that American public companies are obliged to file with the Securities and Exchange Commission (SEC). The 10-K is a version of a company’s annual report, but without the glossy photos and PR hype: a corporate nerd’s delight. It has, says one guide, “the-everything-and-the-kitchen-sink data you can spend hours going through – everything from the geographic source of revenue to the maturity schedule of bonds the company has issued”. Some investors and commentators (yours truly included) find the 10-K impenetrable, but for those who possess the requisite stamina (big companies can have 10-Ks that run to several hundred pages), that’s the kind of thing they like. The 10-Q filing is the 10-K’s quarterly little brother.

The observation that triggered the research reported in the paper was that “mechanical” (ie machine-generated) downloads of corporate 10-K and 10-Q filings increased from 360,861 in 2003 to about 165m in 2016, when 78% of all downloads appear to have been triggered by request from a computer. A good deal of research in AI now goes into assessing how good computers are at extracting actionable meaning from such a tsunami of data. There’s a lot riding on this, because the output of machine-read reports is the feedstock that can drive algorithmic traders, robot investment advisers, and quantitative analysts of all stripes.

The NBER researchers, however, looked at the supply side of the tsunami – how companies have adjusted their language and reporting in order to achieve maximum impact with algorithms that are reading their corporate disclosures. And what they found is instructive for anyone wondering what life in an algorithmically dominated future might be like.

The researchers found that “increasing machine and AI readership … motivates firms to prepare filings that are more friendly to machine parsing and processing”. So far, so predictable. But there’s more: “firms with high expected machine downloads manage textual sentiment and audio emotion in ways catered to machine and AI readers”.

In other words, machine readability – measured in terms of how easily the information can be parsed and processed by an algorithm – has become an important factor in composing company reports. So a table in a report might have a low readability score because its formatting makes it difficult for a machine to recognise it as a table; but the same table could receive a high readability score if it made effective use of tagging.

The researchers contend, though, that companies are now going beyond machine readability to try and adjust the sentiment and tone of their reports in ways that might induce algorithmic “readers” to draw favourable conclusions about the content. They do so by avoiding words that are listed as negative in the criteria given to text-reading algorithms. And they are also adjusting the tones of voice used in the standard quarterly conference calls with analysts, because they suspect those on the other end of the call are using voice analysis software to identify vocal patterns and emotions in their commentary.

In one sense, this kind of arms race is predictable in any human activity where a market edge may be acquired by whoever has better technology. It’s a bit like the war between Google and the so-called “optimisers” who try to figure out how to game the latest version of the search engine’s page ranking algorithm. But at another level, it’s an example of how we are being changed by digital technology – as Brett Frischmann and Evan Selinger argued in their sobering book Re-Engineering Humanity.

After I’d typed that last sentence, I went looking for publication information on the book and found myself trying to log in to a site that, before it would admit me, demanded that I solve a visual puzzle: on an image of a road junction divided into 8 x 4 squares I had to click on all squares that showed traffic lights. I did so, and was immediately presented with another, similar puzzle, which I also dutifully solved, like an obedient monkey in a lab.

And the purpose of this absurd challenge? To convince the computer hosting the site that I was not a robot. It was an inverted Turing test in other words: instead of a machine trying to fool a human into thinking that it was human, I was called upon to convince a computer that I was a human. I was being re-engineered. The road to the future has taken a funny turn.

Data Skeptic Podcast

I recently had the opportunity to be one of the panellists on the Data Skeptic podcast. It was great to be invited and, as a listener to the podcast, it was a real treat to be able to take part. Also, recording it was fun…

You can listen to the episode here.

More information about the Data Skeptic Journal Club can be found on their site. I would like to thank Kyle Polich, Lan Guo and George Kemp for having me as a guest. I hope it is not the last time!

In the episode, Kyle talks about the relationship between Covid-19 and carbon emissions. George tells us about the new Hateful Memes Challenge from Facebook. Lan joins us to talk about Google’s AI Explorables. I talk about a paper that uses neural networks to detect infections in the ear.

Let me know what you guys think!

Cover Draft for “Advanced Data Science and Analytics with Python”

I have received the latest information about the status of my book “Advanced Data Science and Analytics with Python”. This time I have been reviewing the latest cover drafts for the book.

This is currently my favourite one.

I am now awaiting the proofreading comments, and I hope to update you about that soon.

“Advanced Data Science And Analytics” is finished!

It has been a few months of writing, testing, re-writing and starting again, and I am pleased to say that the first complete draft of “Advanced Data Science and Analytics with Python” is ready. The last chapter is done and I am starting revisions now. Yay!