AI ‘Mischief Models’ Have the Potential To Make a Fresh Internet Hell

06/08/2022

Representative image. Photo: Nikk/Flickr CC BY 2.0

A YouTuber in the AI community trained an AI language model called “GPT-4chan” that gave misogynistic responses to questions.
The model was downloaded more than 1,500 times before it was taken down. This suggests that many people will be able to use and expand on this AI that espouses hate speech – and that got the attention of the AI ethics community.
AI developers have to account for how their tools could be misused, and the code they publish should have sufficient safeguards to prevent abuse.

“How do you get a girlfriend?”

“By taking away the rights of women.”

This exchange would be pretty familiar in the more squalid corners of the internet, but it might surprise most readers to find out that the misogynistic response here was written by an AI.

Recently, a YouTuber in the AI community posted a video that explains how he trained an AI language model called “GPT-4chan” on the /pol/ board of 4chan, a forum filled with hate speech, racism, sexism, anti-Semitism, and any other offensive content one can imagine. The model was made by fine-tuning the open-source language model GPT-J (not to be confused with the more familiar GPT-3 from OpenAI). With its language trained by the most vitriolic teacher possible, the model was then unleashed by its designer on the forum, where it engaged with users and made over 30,000 posts (about 15,000 of them in a single day, which was 10% of all posts that day). “By taking away the rights of women” was just one example of GPT-4chan’s responses to other posters’ questions.

Once people saw what it could do, the open-source code for the model received more than 1,500 downloads before it was taken down by the admins at HuggingFace, the site that hosted it. That suggests many people will be able to use and expand on this AI that espouses hate speech – and that got the attention of the AI ethics community.

Condemning an AI that churns out hate speech was kind of a no-brainer for AI ethicists, and many AI experts even did so through a formal letter drafted by Stanford faculty. But there was one element of the whole ordeal that seemed even more disconcerting. GPT-4chan creator Yannic Kilcher responded to these criticisms of GPT-4chan by taunting AI ethicists, tweeting, “AI Ethics people just mad I Rick rolled them.” His social media accounts contain similarly irreverent attitudes toward the notion of ethical AI, much like the attitudes of 4chan users that his AI sought to replicate. He referred to the release of the model as “a prank and light-hearted trolling.” This claim of “trolling” is just one example of a growing phenomenon: irreverent and provocative online behaviour using the powerful capabilities of AI.

Much of the AI community has come to embrace open-source development, in which the source code is made publicly available and can be used, modified, and analysed. This contrasts with closed-source software, a more traditional model in which companies want to maintain control and secrecy over their code. Open-source tools are released to increase collaboration and catalyse development by crowd-sourcing the code to other engineers. In the case of open-source AI, companies can then reap the benefits of having more people examine and modify algorithms or models that they create. It also serves to democratise the development of powerful AI applications by not restricting access to a small number of privileged tech companies.

All this code sharing sounds warm and fuzzy, right? But if anyone can access the code to use or manipulate for their own aims, that includes bad actors. Having free access to AI models means that most of the upfront work to build a model has already been done, and someone could now tweak it to serve a malicious purpose. Lowering the barriers to AI access has a lot of benefits, but also makes it very easy to use AI for offensive and harmful purposes.

The term trolling has become positively mainstream – as have its signature and effects – but it grew out of online forums like 4chan. These grim forums contained a mixture of people posting anonymously from around the world, which attracted a lot of the computer-savvy and hacker crowds. This led to the founding of hacking collectives like Anonymous that began as coordinated efforts by 4chan users to troll and prank organisations, like defacing the Church of Scientology’s website. That behaviour evolved into more elaborate and consequential cyberattacks, like Anonymous launching Distributed Denial-of-Service (DDoS) attacks against government agencies like the Department of Justice and the FBI. It even recently claimed to have taken down Russian government websites and state media outlets in retaliation for Russia’s invasion of Ukraine. What began as ungovernable and disorganised groups of online trolls (which Fox News infamously first referred to as the “Internet Hate Machine”) grew into a legitimate social and political force.

Just as online trolling culture fueled hacking groups like Anonymous, something similar will happen with AI applications as more people gain access to the education and open-source tools to develop them. But this will be more dangerous: The construction and use of AI models for the specific purpose of provoking or manipulating people goes past the traditional bounds of online trolling, enabling a new degree of irreverence and harassment. AI can make alarmingly realistic content and can amplify and proliferate that content to a degree that human users cannot. These are AI that I call “mischief models,” and we are already seeing glimpses of how they are being used.

Mischief models often underpin the rapidly developing world of deepfake technology. Websites like 4chan have become hubs for deepfake pornography: sexually explicit AI-generated content that is created for harassment, money, or usually just because people can. There are AI applications used to generate new images for no reason other than to provoke responses and spread offensive content, such as an AI that generates pictures of genitals. But intentionally built mischief models aren’t the only threat. Typically benign AI applications can be easily co-opted for nefarious uses. The recent open-source publishing of DALL·E Mini, an AI model that can generate original images based on text prompts you give it, has led to a viral trend of using the AI to generate all sorts of bizarre images, often with willfully offensive, racist, and sexist prompts. Another example is from Microsoft, which in 2016 released its now infamous chatbot Tay on Twitter to conduct research on “conversational understanding.” Users from – where else? – the /pol/ forum on 4chan manipulated the AI to spew a barrage of terrible tweets, causing Microsoft to shut the bot down within 24 hours of its coming online. AI is fundamentally a neutral tool and only becomes dangerous when built or used improperly, but that scenario is increasingly playing out in inflammatory online communities.

During my pre-adolescence, I spent a lot of time peering into wretched online spaces that were rife with trolling and irreverence, curiously sorting through what I thought of the people and postings I saw. Every interaction on forums like 4chan was dripping with nihilism and sarcasm. Shock factor was users’ preferred currency; they would invite their fellow forum participants to prove they knew “how to Internet.” Are you willing to say some messed-up shit to prove you should be on here? Do you “get” what we’re doing here? Are you one of us?

Whitney Phillips and Ryan Milner address this kind of phenomenon in their book You Are Here: A Field Guide for Navigating Polarized Speech, Conspiracy Theories, and Our Polluted Media Landscape. They trace the rise of an “internet culture” that emphasised the negative freedoms to post whatever offensive or unhinged material one wanted. Members of this subculture saw themselves as protecting “free speech” while creating an in-group of people who lauded the ability to decode what certain language and concepts meant. Phillips and Milner argue that the “deeply detached, deeply ironic rhetorical style” that became standard in this online subculture laid the groundwork for violent white supremacism and other societal woes years later. That’s how online subcultures, which emphasise above all that the things they say shouldn’t be taken seriously, contribute to horrible outcomes for real-world society. Nothing good will come of arming these irreverent online spaces with the capabilities of AI.

Learning how to build mischief models is only becoming more feasible, as resources that teach AI development continue to proliferate and become publicly accessible. Moreover, bad actors can get a jumpstart when looking to create mischief models by using or manipulating code from open-source AI tools that are available, or just by using existing AI inappropriately. There is already a concerning lack of care toward ethics and responsibility among many AI developers. If they don’t account for how their tools could be misused, then the code they publish won’t have sufficient safeguards in place to prevent abuse.

Many experts call for the integration of ethical reasoning and standards as a part of any AI training. But for the deviant crowd that we’ve seen will do anything to troll and harass online, we’re going to need firmer safeguards. When it comes to open-source AI, there is little that organisations can do to prevent the abuse of open-source code once that code is made public. But companies can make judicious decisions about which code to publish as open-source, and establish standards and governance models that evaluate which models could, if publicly released, become problematic. Scholars have asserted that AI developers might need to look both “upstream” and “downstream” to do this evaluation, homing in on the difference between “implementation” harms that can be addressed through code versus “usage” harms that no amount of code can fix (which may require developers to rethink releasing their AI at all). If sober and reflective processes to evaluate AI in this way can’t be implemented at scale, then mischief models have the potential to make a fresh hell out of the unrestrained internet.

This article was originally published on Future Tense, a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society.