We use our voices to connect with, and convey information to, others. We change our tone, pitch-range, volume, rhythm and tempo to customize our meaning beyond the words we’ve selected to use.
It isn’t surprising that many of our favorite modes of online entertainment — from video games to dating apps to social networks — have introduced voice features as a way to increase connection between people and improve their experience.
But they're running smack into something the video game industry has known since it introduced voice features in the early 2000s — moderating voice chat is really hard.
In this blog we review the challenges to voice chat moderation — just one form of content moderation. We provide best practices for moderating voice chat that can help you meet those challenges. Finally, we share a bit about how our technology helps.
Ready? Let’s go!
Moderating voice chat is hard, for a lot of reasons…
First, unlike text, voice must be evaluated on both word meaning and delivery: tone, pitch range, loudness, rhythm and tempo. “I’m going to kill you” said at medium volume in a joking tone is very different from “I’m going to kill you” shared at high volume in a serious tone. That’s a lot of context to analyze.
Second, many voice moderation solutions on the market today transcribe voice to text, then analyze the text to identify toxic behaviors. Yes, transcription services are getting better, but they still stumble over anything that deviates from perfect elocution, like non-native accents and speech disorders. You can have the most advanced behavior identification system in the world, but when mistakes come in, mistakes come out. And when accuracy is lacking, you can’t confidently automate some of your responses, which means you can’t capture as much efficiency as you might need.
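To make “mistakes in, mistakes out” concrete, here’s a toy sketch (everything in it is hypothetical, not our product’s logic): a simple keyword classifier sitting downstream of transcription will miss toxicity whenever the transcript itself garbles the key word.

```python
# Toy illustration only: a keyword-based classifier downstream of a
# transcription step. The flagged-term list and both example
# transcripts are made up for this sketch.
FLAGGED_TERMS = {"kill"}

def classify(transcript: str) -> bool:
    """Flag a transcript if any word matches a flagged term."""
    return any(word in FLAGGED_TERMS for word in transcript.lower().split())

accurate = "i'm going to kill you"
garbled = "i'm going to keel you"   # e.g. an accent mis-transcribed

print(classify(accurate))   # flagged
print(classify(garbled))    # slips through: the classifier never saw "kill"
```

The classifier itself did nothing wrong — the error entered one step earlier, which is why transcription accuracy caps the accuracy of everything built on top of it.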
The cost to transcribe audio files can add up quickly, especially when 1) you need it done fast enough to take action within a defined SLA and 2) you need to transcribe multiple languages and translate the text into English in order to apply your guidelines.
If you’re directly analyzing the audio files, you’ll run into sizable data processing costs. For context, a single 10-second audio file is about 200 KB — roughly the same data volume as about 1,600 chat messages … $$$.
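A quick back-of-envelope calculation using the figures above (the per-clip size and message-equivalence are the post’s numbers; everything derived from them is just arithmetic) shows why audio data volume balloons so fast:

```python
# Back-of-envelope data-volume comparison, using the figures quoted
# above: ~200 KB per 10-second clip, equivalent to ~1,600 chat messages.
CLIP_SECONDS = 10
CLIP_BYTES = 200 * 1024       # ~200 KB per clip
MSG_EQUIVALENT = 1_600        # chat messages per clip, per the comparison above

bytes_per_message = CLIP_BYTES / MSG_EQUIVALENT
clips_per_hour = 3600 // CLIP_SECONDS
audio_bytes_per_hour = clips_per_hour * CLIP_BYTES

print(f"~{bytes_per_message:.0f} bytes per chat message")
print(f"~{audio_bytes_per_hour / 1024**2:.0f} MB of audio per user-hour")
```

One active voice user generates on the order of 70 MB per hour — the data-volume equivalent of over half a million chat messages — and processing costs scale with those bytes.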
If you’re evaluating or building a solution that includes voice-to-text transcription, then understand the transcription process will naturally delay recognition of disruptive behaviors.
If you’re evaluating or building a solution that works with audio files directly, then know that processing those files will take time and lots of compute power (see the note on costs above!).
Audio files themselves aren’t inherently more susceptible to breaches. The challenge is that people tend to feel more comfortable sharing sensitive things through speech than through other modes of communication, so when a breach happens, the impact can be greater. For example, a teenager may feel more comfortable sharing that she is gay over voice chat than over text chat.
With all these obstacles, moderating voice can feel overwhelming, even undoable. Take heart! Let’s dive into some best practices we’ve learned that may help.
So, you want to moderate voice chat. Awesome. Here’s a way to start.
Already moderating voice? Great!
Maybe this plan can help augment your efforts…
Starting small and growing is a tried and true approach to many things in life and applies here. Starting with one behavior, instead of a slew of behaviors at once, gives you an opportunity to learn, form a foundation and then grow steadily rather than being overwhelmed.
Select a behavior your moderation team has experience with managing. That way the team will start from a place of confidence.
The language of your community is always evolving. This is a great time to sanity check your definition of the behavior you’ve selected and update it. There may be nuance to add now that you’re moderating voice chat.
Here’s a simplified example from one of our customers:
You probably have one already for other content types (e.g. text chat). Consider adding a column for content type to determine whether you need different responses to different content types.
Say you chose ‘Hate Speech’ as the behavior you want to start moderating for in voice chat. You’d establish a baseline prevalence rate for it, likely derived from its presence in text chat on your platform.
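As a minimal sketch of what establishing that baseline might look like (the sample data and field names here are invented for illustration), prevalence is just the share of reviewed messages your team labeled as the target behavior:

```python
# Hypothetical sketch: estimating a baseline prevalence rate for one
# behavior (here, hate speech) from a labeled sample of text chat.
# In practice `labeled_sample` would come from your moderation team's
# review queue, and would be far larger than this toy list.
labeled_sample = [
    {"message_id": 1, "hate_speech": False},
    {"message_id": 2, "hate_speech": True},
    {"message_id": 3, "hate_speech": False},
    {"message_id": 4, "hate_speech": False},
]

flagged = sum(1 for m in labeled_sample if m["hate_speech"])
prevalence = flagged / len(labeled_sample)
print(f"Baseline prevalence: {prevalence:.1%}")
```

That single number gives you something to measure voice chat against once moderation is live: if voice prevalence comes in far above or below the text baseline, that’s a signal worth investigating.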
In addition to prevalence, you’d look at:
Following these best practices should reveal the best methods to roll out voice chat moderation at your company. Tweak the above, choose another behavior (or more) and roll on!
These best practices are a good first step, but we’re certain we haven’t covered everything. We’d love to hear lessons you’ve learned along the way.
Rather than transcribing voice to text and risking accuracy issues, or analyzing raw audio files and incurring high processing costs, our software development kit (SDK) extracts specific features from 10-second audio clips and feeds them into our behavior models.
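To give a flavor of what “extracting features from audio” can mean — the post doesn’t say which features the SDK actually uses, so the two below (RMS loudness and zero-crossing rate, both classic paralinguistic features) are purely illustrative, computed here over a synthetic 10-second tone:

```python
import math

# Illustrative only: two classic audio features computed over a
# synthetic 10-second, 220 Hz tone standing in for real speech.
SAMPLE_RATE = 16_000
clip = [0.5 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE)
        for n in range(10 * SAMPLE_RATE)]

def rms_loudness(samples):
    """Root-mean-square energy: a rough proxy for how loud the clip is."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def zero_crossing_rate(samples):
    """Fraction of adjacent samples that change sign; loosely tracks pitch/noisiness."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)

features = {"rms": rms_loudness(clip), "zcr": zero_crossing_rate(clip)}
print(features)
```

The point of a feature-based approach is that a compact vector like this — not the raw 200 KB waveform and not an error-prone transcript — is what the downstream models consume.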
Our behavior models evaluate context and are trained on our Data Vault, so they perform well out of the gate. Our customers fine-tune them to match their definitions and work with us to improve them over time.