We use our voices to connect with, and convey information to, others. We change our tone, pitch-range, volume, rhythm and tempo to customize our meaning beyond the words we’ve selected to use.
It isn’t surprising that many of our favorite modes of online entertainment — from video games to dating apps to social networks — have introduced voice features as a way to increase connection between people and improve their experience.
But they're running headlong into what the video game industry has known since it introduced voice features in the early 2000s: moderating voice chat is really hard.
In this blog we review the challenges to voice chat moderation — just one form of content moderation. We provide best practices for moderating voice chat that can help you meet those challenges. Finally, we share a bit about how our technology helps.
Ready? Let’s go!
Challenges to Voice Moderation
Moderating voice chat is hard, for a lot of reasons…
Accuracy is hard to achieve
First, unlike text, voice must be evaluated not just on the words themselves but also on tone, pitch range, loudness, rhythm and tempo. "I'm going to kill you" said at medium volume in a joking tone is very different from "I'm going to kill you" said at high volume in a serious tone. That's a lot of context to analyze.
Second, many voice moderation solutions on the market today transcribe voice to text, then analyze the text to identify toxic behaviors. Yes, transcription services are getting better, but they stumble over anything that deviates from perfect elocution, like non-native accents and speech disorders.
You can have the most advanced behavior identification system in the world, but when mistakes come in, mistakes come out. When accuracy is lacking, you can't confidently automate responses, which means you can't capture the efficiency you need.
The cost of voice moderation is high
The cost to transcribe audio files can add up quickly, especially when 1) you need it done fast enough to act within a defined SLA and 2) you need to transcribe multiple languages and translate the resulting text into English before you can apply your guidelines.
If you're analyzing audio files directly, you'll run into sizable data processing costs. For context, a single ten-second audio file is about 200 KB, roughly the same amount of data as 1,600 chat messages … $$$.
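To get a feel for the numbers, here's a back-of-the-envelope estimate. The per-minute transcription price and per-GB processing cost below are illustrative assumptions, not quotes; only the 200 KB per ten seconds figure comes from above.

```python
# Back-of-the-envelope cost estimate for voice moderation.
# Prices below are illustrative assumptions, not real quotes.

AUDIO_KB_PER_10_SEC = 200          # ~200 KB per 10-second clip (from above)
TRANSCRIPTION_USD_PER_MIN = 0.02   # assumed per-minute transcription price
PROCESSING_USD_PER_GB = 0.10       # assumed per-GB data processing cost

def monthly_cost(minutes_of_voice_per_month: float) -> dict:
    """Estimate transcription and raw-audio processing costs per month."""
    transcription = minutes_of_voice_per_month * TRANSCRIPTION_USD_PER_MIN
    # 6 ten-second clips per minute; KB / 1,000,000 ~= GB
    gb = minutes_of_voice_per_month * 6 * AUDIO_KB_PER_10_SEC / 1_000_000
    processing = gb * PROCESSING_USD_PER_GB
    return {"transcription_usd": round(transcription, 2),
            "audio_gb": round(gb, 2),
            "processing_usd": round(processing, 2)}

# A platform with 1 million minutes of voice chat per month:
print(monthly_cost(1_000_000))
```

Even with these made-up prices, the point holds: transcription costs scale linearly with every minute spoken, which is why sampling strategies and SLA requirements matter so much.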
Speed is an issue
If you're evaluating or building a solution that includes voice-to-text transcription, understand that the transcription step adds delay before disruptive behaviors can be recognized.
If you’re evaluating or building a solution that works with audio files directly, then know that processing those files will take time and lots of compute power (see the note on costs above!).
The privacy and security risks are higher
Audio files themselves aren't inherently more susceptible to breaches. The challenge is that people tend to share more sensitive things through speech than through other modes of communication, so when a breach happens, the impact can be greater. For example, a teenager may feel more comfortable sharing that she is gay over voice chat than over text chat.
With all these obstacles it can feel like moderating voice is just overwhelming and undoable. Take heart! Let’s dive into some best practices we’ve learned that may help.
Best Practices for Voice Chat Moderation
Select one behavior to start with
Starting small and growing is a tried-and-true approach to many things in life, and it applies here. Starting with one behavior, instead of a slew at once, gives you room to learn, build a foundation and grow steadily instead of getting overwhelmed.
Select a behavior your moderation team has experience with managing. That way the team will start from a place of confidence.
Customize your definition of that behavior
The language of your community is always evolving. This is a great time to sanity-check your definition of the behavior you’ve selected and update it. There may be a nuance to add now that you’re moderating voice chat.
Create your behavior/response matrix
Here's a simplified example from one of our customers:
You probably have one already for other content types (e.g. text chat). Consider adding a column for content type to determine whether different content types call for different responses.
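To make the idea concrete, a behavior/response matrix can be as simple as a lookup table keyed by behavior, severity and content type. The behaviors, severities and responses below are hypothetical placeholders, not Spectrum Labs recommendations:

```python
# A hypothetical behavior/response matrix as a lookup table.
# Behaviors, severities, and responses are placeholders --
# yours should come from your own community guidelines.

RESPONSE_MATRIX = {
    # (behavior, severity, content_type): response
    ("hate_speech", "low",  "voice"): "warn",
    ("hate_speech", "high", "voice"): "mute_24h",
    ("hate_speech", "low",  "text"):  "warn",
    ("hate_speech", "high", "text"):  "suspend_24h",
}

def response_for(behavior: str, severity: str, content_type: str) -> str:
    """Look up the response; escalate to manual review if unmapped."""
    return RESPONSE_MATRIX.get((behavior, severity, content_type), "manual_review")

print(response_for("hate_speech", "high", "voice"))  # mute_24h
print(response_for("spam", "low", "voice"))          # manual_review
```

Note the fallback: anything your matrix doesn't yet cover routes to a human, which keeps the "start with one behavior" approach safe while you expand.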
Track behavior trends over time
Say you chose 'Hate Speech' as the behavior you wanted to start moderating for in voice chat. You'd establish a baseline prevalence rate for it, likely derived from its presence in your platform's text chat.
In addition to prevalence, you’d look at:
- Impact: How many users were exposed to each incident.
- Appeals: How many punished users disputed their punishment.
- Case Volume: Whether your team can keep up with the number of incidents.
- Case Complexity: Start with a simple scale, then refine it over time.
- Model Performance: How much moderation you can automate without human review.
Expand to more behaviors
Following these best practices should reveal the best methods to roll out voice chat moderation at your company. Tweak the above, choose another behavior (or more) and roll on!
These best practices are a good first step, but we’re certain we haven’t covered everything. We’d love to hear lessons you’ve learned along the way.
How Spectrum Labs Helps Moderate Voice Chat
Our behavior models evaluate context and are trained from our Data Vault, so they perform well out of the gate. We fine-tune them for our customers, and work with them to improve performance over time.
Our partnership with voice provider Agora helps our customers embed engaging voice features in any application, on any device, anywhere, while maintaining user safety.