
Sci-Advent – Writing Reports Tailored for AI Readers

This is a reblog from an article by John Naughton in the Guardian on Dec 5th 2020. Read the original here.

My eye was caught by the title of a working paper published by the National Bureau of Economic Research (NBER): How to Talk When a Machine Is Listening: Corporate Disclosure in the Age of AI. So I clicked and downloaded, as one does. And then started to read.

The paper is an analysis of the 10-K and 10-Q filings that American public companies are obliged to file with the Securities and Exchange Commission (SEC). The 10-K is a version of a company’s annual report, but without the glossy photos and PR hype: a corporate nerd’s delight. It has, says one guide, “the-everything-and-the-kitchen-sink data you can spend hours going through – everything from the geographic source of revenue to the maturity schedule of bonds the company has issued”. Some investors and commentators (yours truly included) find the 10-K impenetrable, but for those who possess the requisite stamina (big companies can have 10-Ks that run to several hundred pages), that’s the kind of thing they like. The 10-Q filing is the 10-K’s quarterly little brother.

The observation that triggered the research reported in the paper was that “mechanical” (ie machine-generated) downloads of corporate 10-K and 10-Q filings increased from 360,861 in 2003 to about 165m in 2016, when 78% of all downloads appear to have been triggered by requests from computers. A good deal of research in AI now goes into assessing how good computers are at extracting actionable meaning from such a tsunami of data. There’s a lot riding on this, because the output of machine-read reports is the feedstock that can drive algorithmic traders, robot investment advisers, and quantitative analysts of all stripes.

The NBER researchers, however, looked at the supply side of the tsunami – how companies have adjusted their language and reporting in order to achieve maximum impact with algorithms that are reading their corporate disclosures. And what they found is instructive for anyone wondering what life in an algorithmically dominated future might be like.

The researchers found that “increasing machine and AI readership … motivates firms to prepare filings that are more friendly to machine parsing and processing”. So far, so predictable. But there’s more: “firms with high expected machine downloads manage textual sentiment and audio emotion in ways catered to machine and AI readers”.

In other words, machine readability – measured in terms of how easily the information can be parsed and processed by an algorithm – has become an important factor in composing company reports. So a table in a report might have a low readability score because its formatting makes it difficult for a machine to recognise it as a table; but the same table could receive a high readability score if it made effective use of tagging.
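To make that point concrete, here is a hypothetical sketch (mine, not the paper’s) of why tagging matters: the same two figures presented once as whitespace-aligned text, which a machine must guess is a table, and once as an explicitly tagged HTML table, which Python’s standard-library parser can read without any layout guessing.

```python
from html.parser import HTMLParser

# The same data twice: whitespace-aligned text (a machine has to infer
# the column structure) versus explicit tags (trivially parseable).
plain = """Revenue      1,200
Net income     340"""

tagged = ("<table><tr><td>Revenue</td><td>1,200</td></tr>"
          "<tr><td>Net income</td><td>340</td></tr></table>")

class TableReader(HTMLParser):
    """Collects the text of every <td> cell -- no layout guessing needed."""
    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False
    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
    def handle_data(self, data):
        if self._in_td:
            self.cells.append(data)

reader = TableReader()
reader.feed(tagged)
print(reader.cells)  # ['Revenue', '1,200', 'Net income', '340']
```

Recovering the same structure from `plain` would need heuristics about column widths and alignment, which is exactly the fragility a machine-readability score penalises.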

The researchers contend, though, that companies are now going beyond machine readability to try to adjust the sentiment and tone of their reports in ways that might induce algorithmic “readers” to draw favourable conclusions about the content. They do so by avoiding words that are listed as negative in the criteria given to text-reading algorithms. And they are also adjusting the tones of voice used in the standard quarterly conference calls with analysts, because they suspect those on the other end of the call are using voice analysis software to identify vocal patterns and emotions in their commentary.
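The kind of word-list scoring being gamed here is easy to sketch. Below is a minimal, hypothetical Python version: the word sets are illustrative stand-ins for the finance-specific lexicons real text-reading algorithms use, and the two sample sentences are mine, showing how swapping a listed negative word for an unlisted synonym raises the score without changing the substance.

```python
import re

# Illustrative stand-ins for the negative/positive word lists that
# financial text-analysis lexicons supply (these sets are invented).
NEGATIVE = {"decline", "loss", "litigation", "impairment", "adverse"}
POSITIVE = {"growth", "improvement", "strong", "record", "achieved"}

def sentiment_score(text: str) -> float:
    """Crude net tone: (positive hits - negative hits) / total words."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

blunt  = "Revenue decline led to a loss this quarter."
hedged = "Revenue softness led to a shortfall this quarter."

# The reworded filing scores higher, though it says the same thing.
print(sentiment_score(blunt) < sentiment_score(hedged))  # True
```

A filing drafter who knows the word list can, in effect, optimise against the scorer, which is the arms race the paper describes.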

In one sense, this kind of arms race is predictable in any human activity where a market edge may be acquired by whoever has better technology. It’s a bit like the war between Google and the so-called “optimisers” who try to figure out how to game the latest version of the search engine’s page ranking algorithm. But at another level, it’s an example of how we are being changed by digital technology – as Brett Frischmann and Evan Selinger argued in their sobering book Re-Engineering Humanity.

After I’d typed that last sentence, I went looking for publication information on the book and found myself trying to log in to a site that, before it would admit me, demanded that I solve a visual puzzle: on an image of a road junction divided into 8 x 4 squares I had to click on all squares that showed traffic lights. I did so, and was immediately presented with another, similar puzzle, which I also dutifully solved, like an obedient monkey in a lab.

And the purpose of this absurd challenge? To convince the computer hosting the site that I was not a robot. It was an inverted Turing test in other words: instead of a machine trying to fool a human into thinking that it was human, I was called upon to convince a computer that I was a human. I was being re-engineered. The road to the future has taken a funny turn.