[this is an attempt to capture a half-formed line of thinking before it goes.. be gentle with my inconsistencies]
There continues to be enormous interest in the development of the next generation of image-generation tools. My Twitter feed is full of people’s experiments with image synthesis tools like DALL-E, Stable Diffusion, and Midjourney, while even TikTok is getting into the game with its own built-in ‘AI Greenscreen’.
Meanwhile there’s an ongoing conversation about which of the LLMs (large language models) will replace novelists or become sentient, as GPT-3, OPT-175B and LaMDA continue to demonstrate an astonishing ability to reduce the critical capacity of journalists to zero and cause them to generate ever more hyperbolic copy.
At the centre of these new tools there is a serious debate to be had about the implications of training a neural network on the entire corpus of human-generated text and imagery, without licensing anything that remains in copyright, or considering the moral rights of any of the artists involved, and then using the tool to ‘create’ similar work.
That’s not my main concern at the moment, so instead I want to reflect on what I believe will become the primary use case for software that can take text and an optional image and generate an unlimited collection of fairly coherent words and still images (soon to do the same for video). Because I think these are the tools we will come to rely on to populate our metaverses with the virtual locations and interactive non-player characters (NPCs) we will need to meet demand.
We are going to need them because, after a mere thirty years of serious experimentation with augmented and virtual reality, we seem to have the hardware, processing power, and funding from wildly optimistic multi-billionaires that we need to make the metaverse a viable mass medium.
What we probably don’t have is the human cognitive resource needed to create the number of virtual environments or NPCs we will need if this takes off.