More Than Profanity Filters to Stop Online Extremism and Harassment

By Katie Zigelman

In early January 2021, the social media platform Parler was removed from Apple’s App Store and the Google Play Store, and was later dropped from Amazon Web Services (AWS) hosting, due to the unchecked proliferation of online extremism. The platform, which touted itself as “the world’s town square” where its users can “speak freely and express yourself openly, without fear of being ‘de-platformed’ for your views,”1 was one of the fastest-growing apps in the United States.

Apple and Google justified the app’s removal on the grounds that Parler had failed to implement adequate content moderation. Apple’s letter to Parler read, “...the processes Parler has put in place to moderate or prevent the spread of dangerous and illegal content have proved insufficient.”2

A spokesperson from Google stated, “In order to protect user safety on Google Play, our longstanding policies require that apps displaying user-generated content have moderation policies and enforcement that removes egregious content like posts that incite violence.”3

Amazon later said the platform had repeatedly violated Amazon’s rules and removed it from its web-hosting service, stating, “We cannot provide services to a customer that is unable to effectively identify and remove content that encourages or incites violence against others.”4

As online extremism spreads among users, platforms must have sophisticated, thoughtful content moderation policies and solutions in place to protect users, drive revenue, and avoid the same fate as Parler. Profanity filters, one of the oldest tactics created to moderate content, are not enough to prevent online extremism and harassment.

How Profanity Filters Work

As the name suggests, profanity filters automatically scan user-generated content against a keyword list. When a filter finds a match, it either replaces the profanity with special characters (@!%$) using a search-and-replace method or blocks it entirely. The most basic profanity filters search for strings of characters and nothing more, while more advanced filters consider small amounts of context to avoid blocking otherwise safe words that happen to contain a profanity.
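To make the mechanics concrete, here is a minimal sketch of the most basic kind of filter described above, written in Python. The keyword list, the function names, and the masking pattern are all assumptions made for illustration, and mild placeholder words stand in for real profanities. It is not how any particular platform implements its filter.

```python
import re

# Hypothetical keyword list; mild placeholders stand in for real profanities.
BLOCKLIST = ["darn", "heck"]

# One case-insensitive pattern that matches any listed keyword as a raw string.
PATTERN = re.compile("|".join(re.escape(word) for word in BLOCKLIST), re.IGNORECASE)

SYMBOLS = "@!%$"


def mask_profanity(text):
    """Search-and-replace: swap each matched keyword for symbols of equal length."""
    def mask(match):
        length = len(match.group(0))
        return "".join(SYMBOLS[i % len(SYMBOLS)] for i in range(length))
    return PATTERN.sub(mask, text)


def should_block(text):
    """Blocking mode: reject the whole message if any keyword appears."""
    return PATTERN.search(text) is not None


print(mask_profanity("Well, darn it"))   # Well, @!%$ it
print(should_block("What the HECK"))     # True
print(should_block("Hello there"))       # False
```

Note that the pattern looks only at characters. It has no idea who is speaking, to whom, or why, which is the limitation the rest of this article explores.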

Unfortunately, profanity filters are only suitable as a first step because they leave a platform’s users vulnerable to online extremism, hate speech, and harassment.

Why Profanity Filters Aren’t Enough to Prevent Online Extremism

They are easy to identify and circumvent

Once a user notices that the platform has censored their content, they will likely try to get around the profanity filter with simple tactics such as leetspeak (L33T speak), Unicode look-alike characters, alternate spellings, and dashes between letters. They may also get more creative and invent new phrases, slang, or memes to harass or incite violence.
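The sketch below illustrates why these tactics work against a plain keyword match, and how far a normalization step can go in response. The blocklist, the character map, and the function names are hypothetical and chosen only for this example. Real evasion is far more varied than this small map covers.

```python
import re
import unicodedata

# Hypothetical one-word blocklist; a mild placeholder stands in for a real profanity.
BLOCKLIST = {"darn"}

# A small, illustrative map of common leetspeak substitutions (not exhaustive).
LEET_MAP = str.maketrans({
    "4": "a", "@": "a", "3": "e", "1": "i",
    "0": "o", "5": "s", "$": "s", "7": "t",
})


def naive_match(text):
    """Plain keyword search, as in the most basic profanity filter."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)


def normalize(text):
    """Undo a few common disguises before matching."""
    # NFKD splits accented letters into a base letter plus a combining mark.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().translate(LEET_MAP)
    # Drop everything that is not a letter, including dashes and spaces.
    return re.sub(r"[^a-z]", "", text)


def normalized_match(text):
    cleaned = normalize(text)
    return any(word in cleaned for word in BLOCKLIST)


for sample in ["d4rn", "d-a-r-n", "d\u00e1rn", "dang"]:
    print(sample, naive_match(sample), normalized_match(sample))
```

The naive match misses all four samples. Normalization catches the first three, but the invented substitute "dang" still slips through because it was never on the list. Normalization also has a cost: stripping spaces makes "mad arnold" match the blocklist, trading an evasion problem for a false-positive problem.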

They are inaccurate because they ignore the context

Because profanity filters search for simple character strings and ignore the context around the words, they often censor or block content that isn’t intended to cause harm. Conversely, otherwise harmless words can be used to cause harm in certain contexts, and a profanity filter would completely overlook them.

In one high-profile example, Vice President Kamala Harris and other prominent female politicians have been targeted with online abuse by users who adopt coded language that goes unnoticed by platforms, such as the hashtags #KneePadsKamala and #HeelsUpHarris.5

According to one study, “This abuse and harassment coalesce into an often-unrecognized form of gender inequality that constrains women’s use of digital public spaces, much as the pervasive threat of sexual intimidation and violence constrain women’s freedom and comfort in physical public spaces.”6

Improved content moderation can therefore have long-lasting benefits for society, not just for individual platforms.

False positives trigger a poor user experience

False positives are common because profanity filters don't examine context, and they damage the user experience by punishing users unfairly. For example, a child playing a video game who mentions "sunglasses" may be flagged because the word contains "asses."
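The "sunglasses" case can be reproduced in a few lines. The sketch below compares a raw substring check with a whole-word check that uses regular-expression word boundaries. The function names and the one-word list are assumptions for illustration only.

```python
import re

# Keyword taken from the example above; the list itself is hypothetical.
BLOCKLIST = ["asses"]


def substring_flag(text):
    """Flags the text if a keyword appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)


# \b anchors the match to word boundaries, so keywords must stand alone.
WORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in BLOCKLIST) + r")\b",
    re.IGNORECASE,
)


def whole_word_flag(text):
    """Flags the text only if a keyword appears as a complete word."""
    return WORD_PATTERN.search(text) is not None


message = "I lost my sunglasses"
print(substring_flag(message))    # True  (false positive)
print(whole_word_flag(message))   # False
```

Word boundaries remove this particular false positive, but they make the filter easier to evade: a keyword split by dashes or glued to another word no longer matches. Neither variant considers what the message actually means.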

They require too many resources

Internet language evolves rapidly, so human moderators must manually update keyword lists and review incident reports for a profanity filter to be effective. On large platforms, this can mean reviewing thousands of incident reports in a single day. Because moderation teams are constantly barraged with online toxicity, platforms must spend additional resources to support employees' mental health.

Strengthen your content moderation efforts with contextual AI

Contextual AI is an efficient and sophisticated upgrade from profanity filters. By interpreting contextual cues in real time to understand the intent of user-generated content, contextual AI is a stronger option to take the first pass at moderation. It can better shield human moderators from offensive content, saving time and resources while protecting the mental health of moderators.


If you’d like to learn more about how contextual AI can help improve your content moderation and create safe and inclusive online environments, download the Spectrum Labs Contextual AI Solution Guide.






