
How Generative AI will affect content moderation

By Hetal Bhatt

Generative AI (GenAI) is on the tip of everyone's tongue right now. Every day, new projects created with GenAI seem to leave online audiences amazed, amused, and a little anxious.

Indeed, GenAI apps like ChatGPT and DALL-E are ushering in a new age of content creation. Whether it's funny images or fully functional apps coded in minutes, GenAI will enable a new frontier of creativity and productivity. Projects that used to take days, weeks, or even months can now be generated within seconds of typing a sentence or two into a prompt.

Unfortunately, GenAI's proficient and prolific output could also be used by bad actors to scale toxic behavior like hate speech, disinformation, and spam to industrial proportions.


GenAI's use in online communities

It won't be long before GenAI is incorporated into platforms that host user-generated content.

We can already see it in Snapchat's new My AI chatbot feature to generate text-based responses to questions, comments, and requests. Social media powerhouse Meta recently assembled a product team to determine how to deploy GenAI across its platforms. YouTube CEO Neal Mohan listed GenAI as one of the company's priorities for 2023, citing its potential to help video creators "raise their production value, from virtually swapping outfits to creating a fantastical film setting."

Despite GenAI's rapid rate of advancement, the aforementioned companies have emphasized that they are prioritizing user safety and responsible usage, even if that means a deliberately slower pace of development. Without effective guardrails, GenAI can be used to inundate online platforms with toxic content that drives away users, deters potential partners, and ultimately hurts your bottom line.

Some examples of how GenAI can be abused include:

  • Hate speech: GenAI is notorious for generating implicitly racist and sexist passages. After all, it's trained on language from the open Internet, which inevitably has its fair share of toxic corners. Without exhaustive safeguards, GenAI could be used to flood online platforms with hateful posts and destroy positive user experiences.

  • Spam: You've seen comment sections riddled with copypasta for financial scams and exploitative offers. With GenAI, spam can be created at an exponentially higher rate and even made to resemble authentic human interaction. This could cripple the user experience on a platform by sowing distrust and systematically jeopardizing user safety.

  • Child grooming: Since GenAI is known to convincingly mimic human speech, it could be used to lure minors into dangerous situations. Even benign chatbots have had inappropriate interactions with underage users, so it's not unfathomable for GenAI to be exploited by sex predators to interact with kids.

  • Text-to-image: While not always used for nefarious purposes, text-to-image AI can create extremely toxic imagery and deepfake content that is often used to harass others. This can be catastrophic for platforms that are caught flat-footed by the scale at which text-to-image generators can produce content.

As GenAI proliferates across online communities, so will problems like these. Companies that plan to introduce GenAI to their platforms must also have a plan for combating its misuse.


3 initial steps to prevent GenAI abuse

Since GenAI is still a nascent technology, there's no final word on how to stop its misuse. Nevertheless, there are 3 steps platforms can take today to safeguard their communities against AI-generated toxicity:

  1. Moderate GenAI inputs: As a first line of defense, platforms need to monitor inputted prompts to stop users from generating harmful content. This will require real-time detection of toxic prompts, which would best be served by automated solutions that can scale across an entire platform.

  2. Moderate GenAI outputs: Of course, moderating inputs for toxicity is not enough. The generated output also must be monitored by contextually aware automated solutions that can detect a full range of user behavior and stop harmful content before it can spread further across your platform.

  3. Identify bias in your GenAI: Bias is inherent in GenAI because GenAI has been trained on content and data from the open Internet. It's imperative for your platform to clamp down on biased and discriminatory outputs by continuously retraining GenAI tools on more accurate and representative data.
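As a rough illustration, the first two steps above can be sketched as a simple pipeline that screens both the prompt and the generated output. Everything here is a hypothetical placeholder: `toxicity_score` stands in for a real contextually aware classifier, and the blocklist and threshold are invented for the example.

```python
# Minimal sketch of steps 1 and 2: screen the prompt before generation,
# then screen the model's output before it reaches other users.
# `toxicity_score` is a hypothetical stand-in for a real classifier.

BLOCKLIST = {"slur_example", "scam_link_example"}  # placeholder terms

def toxicity_score(text: str) -> float:
    """Hypothetical classifier: returns 0.0 (benign) to 1.0 (toxic).
    Here it just checks a tiny blocklist for illustration."""
    words = set(text.lower().split())
    return 1.0 if words & BLOCKLIST else 0.0

def moderate_generation(prompt, generate):
    # Step 1: moderate the input prompt in real time.
    if toxicity_score(prompt) >= 0.8:
        return None  # reject before any content is generated
    output = generate(prompt)
    # Step 2: moderate the generated output before it can spread.
    if toxicity_score(output) >= 0.8:
        return None  # block the response from being published
    return output
```

In practice, both checks would call a scalable automated detection service rather than a blocklist, but the control flow, reject early on the input, then gate the output, stays the same.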

Note that these are just initial steps to take against GenAI abuse. There will inevitably be more precautions needed as GenAI is further developed and deployed in new settings. With its current momentum, it's crucial to keep watching and learning about this revolutionary (but also risky!) technology.


How GenAI will affect Trust & Safety

Trust & Safety teams must significantly change their operations as GenAI is integrated into more online platforms.

In fact, now is the time to update your platform's Trust & Safety policies and guidelines for GenAI. Proactive changes will help stop problems before they grow too big to manage.

  • Update community terms of service to include guidelines on appropriate use of GenAI.

  • Build or update reporting systems for users to flag toxic content created with GenAI.

  • Update content moderation AI to detect toxic content in formats beyond chat, usernames, and profile entries, such as images created by the aforementioned text-to-image AI.
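To make the second bullet concrete, here is one possible shape for a user report that flags suspected GenAI-created content. This is purely illustrative; every field name and queue name is an assumption, not a real schema or API.

```python
# Illustrative sketch of a user-report record for flagging toxic content
# created with GenAI. All field and queue names are assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ContentReport:
    reporter_id: str
    content_id: str
    content_format: str            # e.g. "chat", "username", "image"
    reason: str                    # e.g. "hate_speech", "spam", "deepfake"
    suspected_genai: bool = False  # user flags the content as AI-generated
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

def triage(report: ContentReport) -> str:
    """Route GenAI-flagged reports to a specialized review queue."""
    if report.suspected_genai:
        return "genai_review_queue"
    return "standard_review_queue"
```

Capturing whether the reporter believes the content was AI-generated lets Trust & Safety teams measure how much GenAI abuse they're actually seeing and route it to moderators trained for it.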

At Spectrum Labs, we've already built automated ways to help online communities safeguard against risks from GenAI. To learn more, check out our moderation solution for GenAI-created content.

Learn more about how Spectrum Labs can help you create the best user experience on your platform.