Image generators like Stable Diffusion can create what look like real photographs or hand-drawn illustrations of almost anything a person can imagine. This is possible thanks to algorithms that learn to associate the visual properties of a huge number of images, scraped from the web and from image databases, with their accompanying text labels. Through a process of adding random noise to training images and learning to remove it, the algorithm learns to render new images that match a text prompt.
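The forward "noising" half of that process can be illustrated with a toy NumPy sketch. This is a simplified illustration, not the actual Stable Diffusion code: the variable names and the linear noise schedule are assumptions for demonstration, and real systems learn a neural network to predict and remove the noise, conditioned on the text prompt, operating in a compressed latent space rather than on raw pixels.

```python
import numpy as np

def add_noise(x0, t, alpha_bar):
    """Forward diffusion step: blend a clean image x0 with Gaussian noise.

    alpha_bar[t] is a cumulative noise schedule: near t=0 the image is
    almost clean; at the final step it is almost pure noise. Training
    teaches a network to predict eps from the noisy image and the prompt.
    """
    eps = np.random.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

# A simple linear schedule (illustrative): alpha_bar falls from ~1 to ~0.
T = 1000
alpha_bar = np.linspace(0.9999, 0.0001, T)

x0 = np.random.rand(64, 64, 3)      # stand-in for a training image
xt, eps = add_noise(x0, T // 2, alpha_bar)
```

Generating a new image then amounts to running this process in reverse: start from pure noise and repeatedly subtract the network's predicted noise until an image emerges that matches the prompt.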
Because tools like Stable Diffusion are trained on images scraped from the web, their training data often includes pornographic images, which makes the software capable of generating new sexually explicit pictures. Another concern is that such tools could be used to create images that appear to show a real person doing something compromising — a potential vector for disinformation.
The quality of AI-generated imagery has leapt forward over the past year and a half. The jump began in January 2021, when AI research company OpenAI announced DALL-E, a system that popularized the idea of generating images from text prompts. In April 2022, OpenAI launched a more powerful successor, DALL-E 2, which is now available as a commercial service.
From the outset, OpenAI has restricted who can access its image generators, and it filters what can be requested of them. The same is true of a competing service called Midjourney, released in July of this year, which helped popularize AI-made art by being widely accessible.
Stable Diffusion is not the first open source AI art generator. Not long after the original DALL-E was released, a developer built a clone called DALL-E Mini that anyone could use, and it quickly became a meme-making phenomenon. DALL-E Mini, since renamed Craiyon, still includes guardrails similar to those in the official versions of DALL-E. Clement Delangue, CEO of Hugging Face, a company that hosts many open source AI projects, including Stable Diffusion and Craiyon, said it would be problematic if the technology were controlled by only a few large companies.
“It’s actually better from a security perspective if you look at the long-term evolution of the technology and make it more open, more collaborative and more inclusive,” he said. Closed technologies are harder for outside experts and the public to understand, he said, and it would be better if outsiders could assess models for problems such as race, gender, or age bias. Nor can anyone else build on top of closed technology. On balance, he said, the benefits of open sourcing the technology outweigh the risks.
Delangue noted that social media companies could use Stable Diffusion to build their own tools for spotting AI-generated images used to spread disinformation. Its developers have also contributed a system that adds an invisible watermark to images made with Stable Diffusion so they are easier to trace, he said, and built a tool for finding particular images in the model’s training data so that problematic ones can be removed.
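To make the watermarking idea concrete, here is a minimal sketch of one classic approach: hiding a short bit string in the least significant bit of each pixel byte. This is an illustration of the general principle only, not Stable Diffusion's actual method (the released pipeline uses a more robust frequency-domain watermark), and the function names and the 8-bit tag are hypothetical.

```python
import numpy as np

def embed_lsb(image, bits):
    """Hide a bit string in the least significant bits of pixel bytes."""
    flat = image.flatten().copy()
    for i, b in enumerate(bits):
        flat[i] = (flat[i] & 0xFE) | b   # clear the LSB, then set it to b
    return flat.reshape(image.shape)

def extract_lsb(image, n_bits):
    """Read the first n_bits back out of the pixel LSBs."""
    return [int(p & 1) for p in image.flatten()[:n_bits]]

img = np.random.randint(0, 256, size=(8, 8, 3), dtype=np.uint8)
mark = [1, 0, 1, 1, 0, 0, 1, 0]          # hypothetical 8-bit tag
tagged = embed_lsb(img, mark)
recovered = extract_lsb(tagged, len(mark))
```

Each pixel value changes by at most 1, so the mark is invisible to the eye; the trade-off is that an LSB mark is destroyed by recompression or resizing, which is why production systems embed the signal in the image's frequency components instead.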
After taking an interest in Unstable Diffusion, Simpson-Edin became a moderator of the Unstable Diffusion Discord. The server forbids posting certain kinds of content, including images that could be construed as underage pornography. “We can’t police what people do on their own machines, but we’re extremely strict about what gets posted,” she said. In the near term, containing the disruptive effects of AI art generation may depend more on humans than on machines.