Posted on December 18, 2018

Two Brains Are Better Than One: AI and Humans Work to Fight Hate

Glen Martin, California Magazine, December 17, 2018

{snip} About two years ago, Claudia von Vacano, executive director of UC Berkeley’s social science D-Lab, had a chat with Brittan Heller, the then-director of technology and society for the Anti-Defamation League (ADL). The topic: the harassment of Jewish journalists on Twitter. Heller wanted to kick the offending trolls off the platform, and Vacano, an expert in digital research, learning, and language acquisition, wanted to develop the tools to do it. Both understood that neither humans nor computers alone were sufficient to root out the offending language. So, in their shared crusade against hate speech and its malign social impacts, a partnership was born.


{snip}

{snip} Unfortunately, hate speech is as slippery as it is loathsome. It doesn’t take a very smart AI to recognize an overtly racist or anti-Semitic epithet. But more often than not, today’s hate speech is deeply colloquial, or couched in metaphor or simile. The programs that have been developed to date simply aren’t up to the task.

That’s where Vacano and Heller come in. Under Vacano’s leadership, researchers at D-Lab are working in cooperation with the ADL on a “scalable detection” system — the Online Hate Index (OHI) — to identify hate speech. The tool learns as it goes, combining artificial intelligence, machine learning, natural language processing, and good old human brains to winnow through terabytes of online content. Eventually, developers anticipate major social media platforms will use it to recognize and eliminate hate speech rapidly and at scale, accommodating evolutions in both language and culture.

“The tools that were — and are — available are fairly imprecise and blunt,” says Vacano, “mainly involving keyword searches. They don’t reflect the dynamic shifts and changes of hate speech, the world knowledge essential to understanding it. [Hate speech purveyors] have become very savvy at getting past the current filters — deliberately misspelling words or phrases.” Current keyword algorithms, for example, can be flummoxed by something as simple as substituting a dollar sign ($) for an “S.”
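To make the problem concrete, here is a minimal illustrative sketch (not D-Lab's code) of how an exact-match keyword filter is defeated by a single character substitution, and how a simple normalization step catches the trick. The blocklist, substitution map, and placeholder word are purely hypothetical.

```python
# Minimal sketch (not the OHI's actual code): why exact-match keyword filters
# miss obfuscated spellings, and how character normalization recovers some of them.
# "slur" and the substitution map below are illustrative placeholders only.

BLOCKLIST = {"slur"}  # hypothetical list of banned terms

# Map common look-alike substitutions back to the letters they imitate.
SUBSTITUTIONS = str.maketrans({"$": "s", "0": "o", "1": "i", "3": "e", "@": "a"})

def naive_filter(text: str) -> bool:
    """Flag text only if a blocklisted word appears verbatim."""
    return any(word in BLOCKLIST for word in text.lower().split())

def normalized_filter(text: str) -> bool:
    """Undo simple character substitutions before checking the blocklist."""
    cleaned = text.lower().translate(SUBSTITUTIONS)
    return any(word in BLOCKLIST for word in cleaned.split())

post = "what a sl$r"
print(naive_filter(post))       # False: the "$" slips past the exact-match filter
print(normalized_filter(post))  # True: normalization maps "$" back to "s"
```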

{snip}

But no matter how well intentioned, any attempt to control speech raises constitutional issues. And the First Amendment is clear on the matter, says Erwin Chemerinsky, the dean of Berkeley Law.

“First, the First Amendment applies only to the government, not to private entities,” Chemerinsky stated in an email to California. “Second, there is no legal definition of hate speech. Hate speech is protected by the First Amendment.” Unless it directly instigates violence, that is, an exception upheld in the 1942 Supreme Court decision, Chaplinsky v. New Hampshire.

In other words, the platforms can decide what goes up on their sites, whether it’s hateful or not. Vacano acknowledges this reality: D-Lab, she says, isn’t trying to determine the legality, or even appropriateness, of moderating hate speech.

“We are developing tools to identify hate speech on online platforms, and are not legal experts who are advocating for its removal,” Vacano stated in response to an email query. “We are merely trying to help identify the problem and let the public make more informed choices when using social media.” And, for now, the technology is still in the research and development stage.

“We’re approaching it in two phases,” says Vacano. “In the first phase, we sampled 10,000 Reddit posts that went up from May through October of 2017. Reddit hadn’t implemented any real means for moderating their community at that point, and the posts from those months were a particularly rich trove of hate speech.”

D-Lab initially enlisted ten students of diverse backgrounds from around the country to “code” the posts, flagging those that overtly, or subtly, conveyed hate messages. Data obtained from the original group of students were fed into machine learning models, ultimately yielding algorithms that could identify text that met hate speech definitions with 85 percent accuracy, missing or mislabeling offensive words and phrases only 15 percent of the time.
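The article does not name the models D-Lab used, but the workflow it describes (human-coded labels in, a trained classifier out, judged by accuracy on held-out comments) has a familiar shape. Below is a hypothetical sketch using a common baseline, TF-IDF features with logistic regression, and placeholder data standing in for the coded Reddit posts.

```python
# Hypothetical sketch of the supervised pipeline the article describes:
# human-coded labels go in, a text classifier comes out, and its quality is
# measured as accuracy on held-out comments. D-Lab's actual models and
# features are not specified; TF-IDF + logistic regression is just a baseline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# Placeholder data: in the real project, roughly 10,000 Reddit posts coded by students.
comments = ["example post one", "example post two", "example post three", "example post four"]
labels = [1, 0, 1, 0]  # 1 = coded as hate speech, 0 = not

train_x, test_x, train_y, test_y = train_test_split(
    comments, labels, test_size=0.25, random_state=0
)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # word and word-pair features
    LogisticRegression(max_iter=1000),     # simple linear classifier
)
model.fit(train_x, train_y)

# The article's "85 percent accuracy" corresponds to this kind of held-out score.
print(accuracy_score(test_y, model.predict(test_x)))
```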

Though the initial ten coders were left to make their own evaluations, they were given survey questions (e.g., “…Is the comment directed at or about any individual or groups based on race or ethnicity?”) to help them differentiate hate speech from merely offensive language. In general, “hate comments” were associated with specific groups while “non-hate” language was linked to specific individuals without reference to religion, race, gender, etc. Under these criteria, a screed against the Jewish community would be identified as hate speech while a rant — no matter how foul — against an African-American celebrity might get a pass, as long as his or her race wasn’t cited.

Vacano emphasizes the importance of making these distinctions. Unless real restraint is exercised, free speech could be compromised by overzealous and self-appointed censors. D-Lab is working to minimize bias with proper training and online protocols that prevent operators from discussing codes or comments with each other. The lab has also employed Amazon’s Mechanical Turk — a crowdsourcing service that can be customized for diversity — to ensure that a wide range of perspectives, ethnicities, nationalities, races, and sexual and gender orientations is represented among the coding crew.

So then, why the initial focus on Reddit? Why not a platform that positively glories in hate speech, such as the white nationalist site Stormfront?

“{snip} We wanted a mainstream sample, one that would provide a more normal curve, and ultimately yield a more finely-tuned instrument.”

With proof of concept demonstrated by the Phase 1 analyses of Reddit posts, Vacano says, D-Lab is now moving on to Phase 2, which will employ hundreds of coders from Mechanical Turk to evaluate 50,000 comments from three platforms — Reddit, Twitter, and YouTube. {snip}

Vacano expects more powerful algorithms and enhanced machine learning methods to emerge from Phase 2, along with a more comprehensive lexicon that can differentiate between explicit, implicit, and ambiguous speech in the offensive-to-hate range. Ultimately computers, not humans, will be making these distinctions, a necessity given the scope of the issue. But is that cause for concern?

Erik Stallman, an assistant clinical professor of law and the faculty co-director of the Berkeley Center for Law and Technology, says that, because of the ubiquity of social media, attempts to moderate online hate speech are based on sound impulses.

{snip}

In any attempt to control hate speech, says Stallman, “the community of users should have notice of, and ideally, input on the standards. There should also be transparency — users should know the degree of automated monitoring, and how many posts are taken down. And they should also have some form of redress if they feel their content has been inaccurately flagged or unfairly taken down.”

{snip}

“We’ll need to continually update,” Vacano says. “Our work will never be done.”