It’s hard to find anyone who isn’t alarmed or frustrated by the toxicity that appears in social media -- Trust & Safety professionals included. Most users are generally courteous and respectful, and generate just a small amount of toxicity, if any at all. But across millions of users, this still adds up to a mountain of cases. Without user-level moderation, moderators spend an enormous amount of time reviewing one message at a time, and have little opportunity to identify and focus on the worst cases. The goal of user-level moderation is to enable moderators to focus on the worst-of-the-worst users who produce a disproportionate amount of toxicity, and on users who are at elevated risk for self-harm.
This new approach combines individual user reputation scores, user-level moderation, and behavior detection trends. Let’s see how they each contribute to the solution.
User Reputation Scores
Both user-level moderation tools and behavior detection analytics leverage user reputation scores, which aggregate each user's behavior over time into a single score that can feed AI analysis.
To find toxic users, the user reputation score considers the severity of behaviors, prior violations, and the recency of offensive posts or actions. It works similarly for users who may be at risk for self-harm or CSAM grooming by detecting factors that indicate possible vulnerability.
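The exact scoring model isn't described here, but a minimal sketch can show how the three stated factors -- severity, prior violations, and recency -- might combine into one score. The behavior names, weights, and decay half-life below are illustrative assumptions, not the actual model:

```python
import time

# Hypothetical severity weights per detected behavior (illustrative only).
SEVERITY = {"spam": 1.0, "harassment": 3.0, "hate_speech": 5.0, "threat": 8.0}

HALF_LIFE_DAYS = 30.0  # assumed recency half-life: older offenses count less


def reputation_score(violations, now=None):
    """Aggregate a user's violations into a single risk score.

    `violations` is a list of (behavior, unix_timestamp) tuples. More severe
    and more recent offenses contribute more, and repeat offenses compound.
    """
    now = now if now is not None else time.time()
    score = 0.0
    for behavior, ts in violations:
        age_days = (now - ts) / 86400.0
        decay = 0.5 ** (age_days / HALF_LIFE_DAYS)  # exponential recency decay
        score += SEVERITY.get(behavior, 1.0) * decay
    # Repeat offenders get a multiplier based on prior-violation count.
    return score * (1 + 0.1 * max(0, len(violations) - 1))


now = time.time()
day = 86400
history = [("hate_speech", now - 2 * day), ("harassment", now - 40 * day)]
print(round(reputation_score(history, now), 2))
```

A decayed sum like this naturally surfaces users whose offenses are severe, repeated, and recent at the top of a moderation queue, while letting a long-reformed user's score fade toward zero.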
The reputation score is a vital tool that enables moderators to focus on the most troubled users faster and more efficiently, and in time to prevent real-world harm.
We designed this user reputation score with a specific goal in mind: to give Trust & Safety teams the ability to identify toxic and at-risk users that content-level moderation misses.
It’s no secret that content-level moderation is a time-consuming and often ineffective way to ensure a safer and more engaging community. People who mean to disrupt others are adept at evading detection and rules, using tactics like l33tspeak, which substitutes numbers for letters.
The same is true for at-risk users. For instance, rather than use the word “suicide,” which is easily detected by a keyword classifier, a vulnerable person may use the words “to become unalive.”
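One common mitigation for this kind of evasion is to normalize text before matching it. The substitution map and phrase list below are hypothetical examples, not the detection model described in this post, which relies on AI rather than keyword matching:

```python
import re

# Illustrative l33tspeak substitution map: digits/symbols back to letters.
LEET_MAP = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)

# Hypothetical phrase list that includes euphemisms, not just literal keywords.
RISK_PHRASES = ["suicide", "become unalive", "kill myself"]


def normalize(text):
    """Lowercase, undo digit/symbol substitutions, collapse repeated letters."""
    text = text.lower().translate(LEET_MAP)
    return re.sub(r"(.)\1{2,}", r"\1", text)  # "heeeey" -> "hey"


def flags(text):
    cleaned = normalize(text)
    return [p for p in RISK_PHRASES if p in cleaned]


print(flags("I want to bec0me un4live"))  # substitutions no longer evade the match
```

Even this simple normalization catches “bec0me un4live,” which a naive keyword filter on “suicide” would miss entirely; it also shows why phrase lists must track community slang, not just clinical terms.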
User-level moderation lets Trust & Safety teams view cases that are prioritized by severity and grouped by user. This feature allows them to see at-a-glance the number of severe cases mapped to a single user, and focus on that user first. Moderators can also see all user-level information and manage multiple cases for a single user simultaneously. This enables them to make better decisions with complete, not partial, information.
Moderation on a user-level keeps the community safer by enabling faster, automated, and escalating actions against repeat offenders. Moderators are presented with their community’s recommendations for user-level action (which they can override if needed). This drives efficiency by allowing moderators to handle more cases at once, and prevents future cases from a toxic user.
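The escalating actions described above can be pictured as a simple ladder keyed to a user's offense history. The action names and thresholds here are assumptions for illustration; in practice each community configures its own recommendations, which moderators can override:

```python
# Hypothetical escalation ladder, from gentlest to harshest action.
ESCALATION = ["remind_policy", "warn", "suspend_24h", "suspend_7d", "ban"]


def recommend_action(prior_offenses):
    """Recommend the next, stricter action for a repeat offender.

    First offense gets a policy reminder; each subsequent offense moves one
    rung up the ladder, capping at a ban. This only automates the default
    path -- a moderator can always choose a different action.
    """
    return ESCALATION[min(prior_offenses, len(ESCALATION) - 1)]


print(recommend_action(0))  # remind_policy
print(recommend_action(4))  # ban
```

Encoding the ladder as data rather than branching logic makes it trivial for each community to tune its own policy without code changes.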
Behavior Detection Trends
Our AI detection solutions can help Trust & Safety teams detect individual behaviors, such as hate speech, radicalization, and threats. But are those one-off incidents, or are they indicative of a larger problem on your platform?
To help clients answer that question, we combine metrics about top toxic and at-risk users with other factors like behaviors, languages, community areas, and user attributes.
We can help you:
Identify which problems to address first
Learn which users drive toxicity
Understand how behaviors interrelate
Inform policies for different languages
See emerging patterns to address early
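A rough sketch of how such trend metrics could be assembled from raw detection events. The event fields (user, behavior, language) are assumptions based on the factors listed above, not an actual data schema:

```python
from collections import Counter

# Hypothetical detection events: (user_id, behavior, language).
events = [
    ("u1", "hate_speech", "en"),
    ("u1", "threat", "en"),
    ("u2", "hate_speech", "de"),
    ("u1", "hate_speech", "en"),
]

# Slice the same events three ways: which behaviors dominate, which
# languages they occur in, and which users drive the most toxicity.
by_behavior = Counter(behavior for _, behavior, _ in events)
by_language = Counter(lang for _, _, lang in events)
top_users = Counter(user for user, _, _ in events)

print(by_behavior.most_common(1))  # [('hate_speech', 3)]
print(top_users.most_common(1))   # [('u1', 3)]
```

Even this toy aggregation shows the idea: the same event stream answers "which problem first?" (behavior counts), "which users drive it?" (top users), and "where do policies need language-specific tuning?" (language counts).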
These insights allow you to identify the types of problems that occur on your platform so that you can provide any additional training for your moderators, as well as update your community policies as required.
All three product features -- user reputation score, user-level moderation, and behavior detection trends -- are fully privacy compliant.
How does this play out in the everyday life of a Trust & Safety moderator? Let’s imagine a day in the life of a Trust & Safety team member we’ll call Jane.
Jane’s job is to review priority cases and take efficient user-level action for occurrences of hate speech in her community. She begins by opening the Spectrum Labs platform and viewing her moderation queue for the day. She sees at-a-glance the top nefarious users, as well as the top users at risk, prioritized by severity.
Next, she can drill down into a specific user to see recent history, along with pending items to review. She can also access the user’s chat over an extended timeframe.
Based on what she sees, she can determine A.) whether she needs to take action on a user, and B.) which action to take, ranging from sending the user a reminder of the community policies or a warning, to suspending or banning the user, to alerting the authorities of a potential threat.
Want to learn more about user reputation scores, user-level moderation or behavior detection trends? Contact us for a demo or to learn more.
You can download the product sheet here.