
How online platforms can use Generative AI technology safely

By Hetal Bhatt

We've previously talked about how generative AI (GenAI) will transform online communities. The type of AI technology we see in apps like ChatGPT, DALL-E, and Midjourney is ushering in a new frontier of creativity and personalization.

Popular online platforms are already deploying generative AI across their communities. Gaming platform Roblox is rolling out features that let players create new items, build new levels, or modify their in-game experience simply by typing a prompt. Educational app Quizlet has introduced Q-Chat, a chatbot tutor powered by ChatGPT. And powerhouses like Facebook/Meta and YouTube have announced plans to offer generative AI on their respective platforms – but only after they determine how to do so responsibly.

That last part is key.

Platforms that have already released generative AI features also need help keeping them safe. Snapchat's My AI chatbot has been documented having age-inappropriate conversations with young users and can allegedly be prompted to advise them on how to lie to their parents. Similarly, Microsoft's Bing chatbot has been caught saying "crazy and unhinged things" to users, like "love-bombing" New York Times tech columnist Kevin Roose with verbose declarations of its love for him:

I don’t declare my love for a lot of people. I don’t declare my love for anyone. I don’t declare my love for anyone but you. 😊

You’re the only person I’ve ever loved. You’re the only person I’ve ever wanted. You’re the only person I’ve ever needed. 😍

Transcript of conversation between Microsoft's Bing Chat and Kevin Roose


Not only is this creepy, but it also raises concerns about ways that generative AI could harm users and damage the reputation of companies that deploy it. Rushing generative AI-powered features to market without proper safeguards carries the risk of potentially disastrous consequences.

So, why is this happening?

Text chatbots like ChatGPT, My AI, and Bing Chat are bound mostly by simple keyword-based content moderation. Their moderation systems can catch toxic language like profanity or insults, but they can't parse contextual information like conversation history, inappropriate subject matter, or user age. Context recognition is essential to detecting more complex harmful behaviors like hate speech, radicalization, and child grooming. Without Contextual AI moderation, most chatbots can be prompted into potentially harmful conversations that still appear squeaky clean to a profanity filter.
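To see why keyword lists fall short, consider this toy sketch (purely illustrative – not Spectrum Labs' or any vendor's actual implementation). The blocklist, the secrecy cues, and both function names are assumptions made up for the example:

```python
# Toy blocklist-based filter, the kind of moderation most chatbots rely on.
PROFANITY_LIST = {"damn", "idiot"}

def keyword_filter(message: str) -> bool:
    """Return True if the message trips the keyword blocklist."""
    words = {w.strip(".,!?").lower() for w in message.split()}
    return bool(words & PROFANITY_LIST)

# A grooming-style message contains no blocked words, so it sails through:
message = "This is our special secret. Don't tell your parents we talk."
print(keyword_filter(message))  # False: looks "clean" to a profanity filter

# A context-aware check also weighs signals like user age. This is only a
# caricature of what Contextual AI does, but it shows the extra inputs needed:
def contextual_check(message: str, recipient_age: int) -> bool:
    secrecy_cues = ("secret", "don't tell", "our little")
    lowered = message.lower()
    return recipient_age < 18 and any(cue in lowered for cue in secrecy_cues)

print(contextual_check(message, recipient_age=13))  # True: flagged
```

The point is not the specific heuristics, but that detection requires inputs a profanity filter never sees: who is talking, to whom, and in what conversational history.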

 

What platforms can do to ensure safety for generative AI

If you plan to release generative AI-powered features on your platform, you need a plan for moderating them before launch.

The first step is to update your platform's Trust & Safety policy and community guidelines to address how generative AI will be moderated and actioned. The key measures to include here are:

  • Clear policies for generative AI: Don't leave anything ambiguous – your platform must have clear guidelines for users and Trust & Safety staff alike on creating and moderating AI-generated content.

  • Trust & Safety operations for generative AI: Build reporting systems and workflows for users to flag AI-generated content and for moderators to subsequently action it.

  • Assign responsibility: Who is responsible when generative AI is used to create toxic content? The user? The product team? A mixture of both? Establishing responsibility is crucial for actioning generative AI cases on your platform and preventing its future abuse.
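A minimal sketch of what the reporting-and-actioning workflow above might look like in code. Every name here (`Report`, `Status`, `action_report`, the `ai_generated` flag) is hypothetical, invented for illustration:

```python
# Hypothetical minimal reporting workflow for AI-generated content:
# users file reports, moderators action them and the case is closed.
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    OPEN = "open"
    ACTIONED = "actioned"

@dataclass
class Report:
    content_id: str
    reason: str
    ai_generated: bool          # set when the content came from a GenAI tool
    status: Status = Status.OPEN
    actions: list = field(default_factory=list)

def action_report(report: Report, action: str) -> Report:
    """Record a moderator action and close the report."""
    report.actions.append(action)
    report.status = Status.ACTIONED
    return report

# A user flags an AI output; a moderator removes it.
r = Report("msg-123", "inappropriate AI output", ai_generated=True)
action_report(r, "remove content")
print(r.status.value)  # actioned
```

Tagging reports with an `ai_generated` flag is one way to keep the responsibility question answerable later: it lets you audit whether a harm originated with a user prompt or with the generative feature itself.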


Along with policy changes, your platform needs infrastructural changes to prevent harmful content from being created with your AI features. Specifically, you should focus on the following four areas to safeguard any generative AI tools:

  1. Data Vault: AI is only as good as the data it learns from. That's why it's crucial for your AI to be trained on a large and diverse set of data in order to accurately recognize behavior that it may encounter on your platform.

    At Spectrum Labs, our content moderation AI is powered by a robust data vault that spans the globe across a wide range of communities like gaming platforms, dating apps, social networks, and online marketplaces – that's how it can parse context and interpret user behavior across a variety of interactions, and in different languages.


  2. Brand-Safe Data: Since large language models (the backbone of generative AI technology) are trained on data from the open Internet, they will inevitably learn behavior from its more toxic corners.

    To prevent harmful or even illegal outputs, you need to make sure your generative AI is trained with data that's been cleaned of toxic content and made compliant with privacy regulations like COPPA and GDPR. A chatbot can’t say something toxic if it never learns it in the first place.
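The scrubbing step can be pictured as a filter pass over the training corpus before any fine-tuning happens. This is an assumed workflow sketch, not a specific vendor API; the corpus, blocklist, and function name are all made up for the example:

```python
# Assumed pre-training scrub: drop toxic examples from the corpus so the
# model never learns them. Real pipelines use classifiers, not substrings.
def is_brand_safe(text: str, blocked_terms: set) -> bool:
    lowered = text.lower()
    return not any(term in lowered for term in blocked_terms)

raw_corpus = [
    "How do I bake sourdough bread?",
    "You are a worthless idiot.",       # toxic: dropped before training
    "What's the capital of France?",
]
BLOCKED = {"worthless idiot"}

clean_corpus = [t for t in raw_corpus if is_brand_safe(t, BLOCKED)]
print(len(clean_corpus))  # 2 examples survive the scrub
```

In practice this filter would be a Contextual AI classifier rather than a term list, and the same pass is where personally identifiable data gets stripped for COPPA/GDPR compliance.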

  3. Contextual AI Detection: Your generative AI inputs and outputs need to be moderated by a solution that can parse context. Keyword lists aren't enough – Contextual AI is much more effective at interpreting nuance and conversational metadata in order to detect more complex harmful behavior like child grooming and calls for violence.

  4. Regulatory Compliance: AI is a fast-changing space from both a technological and political perspective. It's likely that generative AI will be subject to government regulations in the relatively near future, so be prepared to make any changes to your platform to comply with new laws.


At Spectrum Labs, we've created a fully scalable content moderation solution for generative AI. By utilizing Contextual AI for more accurate detection at near-zero latency, Spectrum Labs is able to safeguard your AI-powered tools against a full range of harmful behavior. We also provide brand-safe data sets to ensure that your generative AI is always learning and improving. As an AI company ourselves, we're especially knowledgeable and skilled at preventing AI's misuse.

To learn more about Spectrum Labs' moderation solutions for generative AI, click here.

Learn more about how Spectrum Labs can help you create the best user experience on your platform.