Posted on July 29, 2019

Drag Queen vs. David Duke: Whose Tweets Are More ‘Toxic’?

Dennis Antonialli, Wired, July 25, 2019

Social media platforms like Facebook, Twitter, and YouTube have been making significant investments in the development of artificial intelligence to moderate content and automate the removal of harmful posts. These decision-making technologies typically rely on machine-learning techniques and are specific to types of content, such as images, videos, sounds, and written text. Some of these AI systems, developed to measure the “toxicity” of text-based content, make use of natural language processing and sentiment assessment to detect harmful text.

{snip} If such AI tools are entrusted with the power to police content online, they have the potential to suppress legitimate speech and censor the use of specific words, particularly by vulnerable groups.

At InternetLab, we recently conducted a study focused on Perspective, an AI technology developed by Jigsaw (owned by Google’s parent company, Alphabet). The AI measures the perceived level of “toxicity” of text-based content. Perspective defines “toxic” as “a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion.” Accordingly, the AI model was trained by asking people to rate internet comments on a scale from “very healthy” to “very toxic.” The level of perceived toxicity indicates the likelihood that a specific comment will be considered toxic.

{snip}

Our results indicate that a significant number of drag queen Twitter accounts were calculated to have higher perceived levels of toxicity than the accounts of white nationalist leaders. On average, the toxicity levels of the drag queens’ accounts ranged from 16.68 percent to 37.81 percent, while the white nationalists’ averages spanned from 21.30 percent to 28.87 percent. The toxicity level of President Trump’s Twitter account was 21.84 percent.
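To make the arithmetic behind such account-level figures concrete, here is a minimal sketch. It assumes that an account's toxicity level is the plain mean of its per-tweet scores, expressed as a percentage; the excerpt does not spell out the study's exact aggregation. The function name, account labels, and scores are hypothetical and are not data from the study.

```python
from statistics import mean


def account_toxicity(scores):
    """Return the mean per-tweet score (0.0 to 1.0) as a percentage, rounded to 2 places.

    Assumption: an account's level is the simple average of its tweet scores.
    """
    if not scores:
        raise ValueError("need at least one per-tweet score")
    return round(100 * mean(scores), 2)


# Hypothetical per-tweet scores, invented for illustration only.
example_scores = {
    "account_a": [0.10, 0.25, 0.40],
    "account_b": [0.20, 0.22, 0.24],
}

averages = {name: account_toxicity(s) for name, s in example_scores.items()}

for name, pct in averages.items():
    print(f"{name}: {pct:.2f} percent")
```

Under this assumption, a handful of high-scoring tweets can raise an account's average noticeably even when most of its tweets score low.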

{snip}

Drag queens can be sharp-tongued. From “reads”—a specific form of insult that acerbically exposes someone’s flaws—to harsh jokes and comebacks, drag queens often reclaim words traditionally used as slurs to build a distinctive communication style.

In person, it is easier to understand context and see this as a form of self-expression. But when reading such missives online, it is significantly more challenging to distinguish between harmful and legitimate speech—especially when that assessment is made by machines. These in-group uses were also found in various tweets we analyzed. But in many of those cases, Perspective still deemed the post extremely toxic:

{snip}

Often, these “harsh” interactions address sensitive topics like sexual roles in relationships, the visibility of gayness, and sexual promiscuity, subjects usually explored by those who aim to verbally attack LGBTQ people.

But when directed at each other by members of the LGBTQ community, these comments may come from a place of solidarity, not malice. {snip}

Hate speech is often predicated on underlying messages, as well. When subtext promotes hateful or discriminatory ideas, it poses a threat to marginalized and vulnerable groups. By training its algorithm to learn what content is likely to be considered toxic, Perspective’s tool seems to be giving more weight to individual words than to their underlying messages.

{snip}

If this AI tool were empowered to decide which tweets should be removed, many of the drag queens’ posts would be suppressed. In fact, Perspective is already making such decisions.

{snip}

The problem: Such AI tools may be developed using biased training data, posing threats to the self-expression and visibility of vulnerable groups. {snip}

{snip} If AI tools focus on misleading signals—such as the use of specific words, rather than a message’s intent—such models will make little progress in removing hate speech.
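The gap between word-level signals and intent can be illustrated with a deliberately simplistic toy scorer. This is not Perspective's model, which is a trained machine-learning system. The word list, function name, and example sentences below are all hypothetical and chosen only to show the failure mode.

```python
import re

# Hypothetical word list, for illustration only.
FLAGGED_WORDS = {"trash", "ugly"}


def keyword_score(text):
    """Return the fraction of tokens that appear in FLAGGED_WORDS.

    The score looks only at individual words and ignores who is
    speaking, to whom, and with what intent.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in FLAGGED_WORDS)
    return hits / len(tokens)


# An affectionate in-group joke that happens to contain a flagged word.
in_group_joke = "You look like trash tonight and I love you for it"

# A hostile message that contains no flagged word at all.
hostile_subtext = "People like you do not belong in this neighborhood"

print(keyword_score(in_group_joke) > 0)  # the joke is flagged
print(keyword_score(hostile_subtext))    # the hostile message is not
```

A scorer of this kind penalizes the friendly message and lets the exclusionary one through, which is the pattern the passage above warns about.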

AI tools have the potential to shape the way we communicate. If computers indiscriminately decide what is “toxic,” tech has the power to both impact our modes of expression online and severely limit the inclusiveness of the internet.