Consumer demand switching is less of a risk than the degradation of content supply
Widespread excitement about ChatGPT (an easily accessible iteration of OpenAI’s GPT-3.5 text generation models) is now shading into a debate about where Generative AI might lead. One obvious potential target is the seeming risk to Google’s search-engine quasi-monopoly.
The story goes: where Google provides links (of mixed utility), ChatGPT provides answers. Enter a question and ChatGPT produces a summary of what it has learned from the web, often readily fashioned into an argument in the style and format the user wants. Compared with the advertising-laden, SEO-gamed mass of links that Google offers, this can be a compelling alternative.
(Leave aside some current limitations: for example, ChatGPT’s training data was essentially a frozen web scrape, so it offers “limited knowledge of world and events after 2021”. These limitations will be resolved.)
It is not hard to see how this might affect the competitiveness of Google’s consumer experience. In itself, that competitive pressure is a good thing.
Google’s original ambition was to minimise time on site: the speed at which the user could be sent happily on their way was the key metric. Then management discovered monetisation and how to optimise for advertising revenue. The result was a drive towards increased dwell time (for example, answering the question on the results page itself) and paid links (the large majority of above-the-fold links for most revenue-generating searches). Most Google front pages are now a mix of advertising and reformatted Wikipedia or structured directory data. And what sits behind the top-ranked pages is Search Engine Optimisation, focused on gaming the algorithm’s ever-shifting demands. All this could do with a shake-up.
But Google is a smart organisation.
There is no reason why they cannot reformulate their proposition. Access to Generative AI technology is not a competitive advantage against an AI behemoth like Google, which already offers multiple tools with similar functionality to OpenAI’s. If the web-search proposition shifts to a chatbot-powered work-assistant model, then Google can deliver it, and the company will still find ways to extract advertising revenue so long as it protects its user market share.
The real risk, I suspect, is rather more insidious. And it’s not good news for the rest of us without a direct interest in Google’s corporate well-being.
Barriers to content creation are plummeting. Essays can be generated in seconds, books in days, art portfolios in a week. This speed will massively increase the volume of content, and that content will be increasingly targeted to maximise interaction with distribution algorithms. Content creation time ceases to be a bottleneck; human attention becomes ever more valuable. The automated looks set to crowd out the human-generated. So what?
There are several risks, but one of the most immediately pertinent is that to build these tools, Generative AI developers are scraping the sum total of human knowledge as discoverable on the web. This material contains multiple mistakes and bad information seeded by malicious actors, as well as inbuilt bias in the form of missing or unbalanced content. These flaws permeate the tools trained on it, and they are already surfacing in the content those tools create.
One Generative AI tool the team at Best Practice AI was testing recently spat out: “… the fact that the vast majority of Holocaust victims were not Jews but rather Slavs, Roma and other ethnic minorities. This proved that the Nazi’s genocidal policies were not motivated by anti-Semitism as previously thought but by a much wider hatred of all “undesirable” groups.” This is false: the Holocaust, also known as the Shoah, was the genocide of European Jews during World War II, as distinct from the Nazis’ many other heinous acts. (Note this was not ChatGPT, which has built in some obvious guard rails against such mistakes.)
Beyond this, LLMs have a tendency to “hallucinate”, and the confidence with which these tools respond to questions can be misleading. One tool that we tested on downloaded telephone conversations asserted that the outbound call agent had stated the call was “being recorded for purposes of training”. When the text was reloaded two minutes later, the same tool, when questioned, was absolutely clear that the call was not being recorded at all. Stack Overflow, the coding Q&A site, has already banned Generative AI answers because of their high error rate.
Now, if this is the material proliferating at machine speed across the Internet, the emerging challenge for Google is clear. And it will intensify, because current editorial and fact-checking processes move at human speed. Not only will a lie have sped around the world before the truth has got its boots on, but the lie may well already have been baked into the next generation of Generative AI tools as they race to update and upgrade.
And this is before malicious actors really get to work: generating complex webs of content, back-up information and official-looking sites for cross-reference; seeding content across multiple domains, services, communities and authors.
The risk is that we will no longer be able to identify and put warning signs around the “bad” (COVID vaccine denial sites, for example) but will instead be forced to retreat to known “good” sites. That may work in specific and limited domains like health information or government statistics, but for an organisation like Google, dedicated to unveiling the long tail of informative sites across the rich multiverses of human experience and interest, this will be a significant challenge.
That the web could be about to close in — potentially becoming smaller, less diverse, less interesting — just as we are about to witness an explosion in content creation is a deeply ironic challenge. It goes to the heart of democracy, education, culture and the liberal world’s competitive advantage in the free exchange of information and ideas.
The threat to Google is a threat to all of us. Not something that I ever thought I’d write.