
Each evaluation is a window into an AI model, not a perfect readout of how it will always perform, Solaiman said. But she hopes it is possible to identify and prevent the harms AI can cause, because alarming examples have already emerged, including players of the game AI Dungeon using GPT-3 to generate text describing sex scenes involving children. "This is an extreme situation that we cannot afford," Solaiman said.
Solaiman’s latest research at Hugging Face finds that from 2018 to 2022, large tech companies took an increasingly closed approach to the generative models they released. The trend accelerated with Alphabet’s AI teams at Google and DeepMind, and after the staged release of GPT-2. Companies that guard their breakthroughs as trade secrets can also make the frontier of AI harder to access for marginalized researchers with few resources, Solaiman said.
Closed releases are reversing a historical trend in natural language processing, even as more and more money is poured into large language models. Researchers have traditionally shared details about training datasets, parameter weights, and code to make results reproducible.
“We know less and less about what data these systems are trained on or how they are evaluated, especially for the most powerful systems released as products,” said Alex Tamkin, a PhD student at Stanford University whose work focuses on large language models.
He credits people working in AI ethics with raising public awareness of why it is dangerous to move fast and break things when technology is deployed to billions of people. Without that work in recent years, he said, things could be much worse.
In fall 2020, Tamkin co-chaired a workshop on the societal implications of large language models with OpenAI’s policy director, Miles Brundage. The interdisciplinary group highlighted the need for industry leaders to set ethical standards and take steps such as running bias assessments before deployment and avoiding certain use cases.
Tamkin believes that external AI auditing services need to grow alongside the companies building on AI, because internal assessments often fall short. He also believes that participatory evaluation methods, which include community members and other stakeholders, have great potential to increase democratic participation in the creation of AI models.
Merve Hickok, director of research at the University of Michigan’s Center for AI Ethics and Policy, said that trying to get companies to set aside or puncture the AI hype, regulate themselves, and adopt ethical principles is not enough. Protecting human rights means moving beyond the conversation about what is ethical to the conversation about what is legal, she said.
Hickok and DAIR’s Hanna are both watching the EU finalize its AI Act this year to see how it will treat models that generate text and images. Hickok said she is particularly interested in how European lawmakers handle liability for harms involving models created by companies such as Google, Microsoft, and OpenAI.
“There are some things that need to be enforced, because we’ve seen over and over again that if they’re not enforced, these companies will continue to break things and continue to push profits over rights, profits over communities,” Hickok said.
While policy is debated in Brussels, the stakes remain high. A drop in Alphabet’s stock price wiped about $100 billion off its market value the day after Bard’s demo went wrong. “It’s the first time I’ve seen wealth wiped out on this scale because of a language model error,” Hanna said. However, she is not optimistic that this will persuade the company to slow its rollout. “My guess is it’s not really going to be a cautionary tale.”