Early in 2001 I was a freelance journalist, writer and speaker, doing my best to help people understand the internet and how it was transforming the world, and arguing that it should be reshaped to be supportive, humane and regulated to serve public rather than purely private interests. I was writing for The Guardian, The Register and The Times, editing supplements for The New Statesman, and speaking at events around the world. I published pamphlets for the Cooperative Party, advised think tanks and governments on tech policy, and tried to ensure that the people whose lives were most affected by technology both understood it and had a way to influence its development.
I had been appearing regularly on The Big Byte, a radio show on the BBC’s news and sport network Radio 5, where I reported the week’s technology news. The team of presenters and producers included Gareth Jones, Jem Stone, Violet Berlin, Quentin Cooper, producer Neil George, and a young producer called Gareth Mitchell. They were all great to work with and we had an excellent time going out live from a basement studio in Broadcasting House every Sunday lunchtime.
As well as The Big Byte, I was regularly invited to be the ‘person who understands tech’ on other shows, like Outlook and You and Yours, and as the net became more important there were more opportunities to sit down with presenters like Sean Rafferty or Jeremy Vine and explain what was going on, or to be a calming voice for one of Rory Cellan-Jones’ packages on the six o’clock news.
So it wasn’t surprising when I was asked if I’d help out with a planned new show for the BBC World Service that was going to focus on technology and its impact on people’s lives rather than breathless reports about the latest shiny toy available in the shops.
[this is an attempt to capture a half-formed line of thinking before it goes... be gentle with my inconsistencies]
There continues to be enormous interest in the development of the next generation of image-generating tools. My Twitter feed is full of people’s experiments with image synthesis tools like DALL-E, Stable Diffusion, and Midjourney, while even TikTok is getting into the game with its own built-in ‘AI Greenscreen’.
Meanwhile there’s an ongoing conversation about which of the LLMs (large language models) will replace novelists or become sentient, as GPT-3, OPT-175B and LaMDA continue to demonstrate an astonishing ability to reduce the critical capacity of journalists to zero and cause them to generate ever more hyperbolic copy.
At the centre of these new tools there is a serious debate to be had about the implications of training a neural network on the entire corpus of human-generated text and imagery, without licensing anything that remains in copyright, or considering the moral rights of any of the artists involved, and then using the tool to ‘create’ similar work.
That’s not my main concern at the moment, so instead I want to reflect on what I believe will become the primary use case for software that can take text and an optional image and generate an unlimited collection of fairly coherent words and still images (soon to do the same for video). Because I think these are the tools we will come to rely on to populate our metaverses with the virtual locations and interactive non-player characters (NPCs) we will need to meet demand.
We are going to need them because after a mere thirty years of serious experimentation with augmented and virtual reality we seem to have the hardware, processing power, and funding from wildly optimistic multi-billionaires that we need to make the metaverse a viable mass medium.
What we probably don’t have is the human cognitive resource needed to create the number of virtual environments or NPCs we will need if this takes off.