Study: Tweets by Black Folks One and a Half Times More Likely to Be Flagged as 'Offensive' by Algorithms Used to Detect Hate Speech

Artificial intelligence technology used to identify racist and violent speech on social media may actually be amplifying racial bias, according to a pair of newly released studies.

A recent report by University of Washington researchers examined hate speech detection algorithms and found that the leading AI models were one-and-a-half times more likely to flag tweets authored by Black people as offensive or hateful, according to Recode. Moreover, tweets written in African-American Vernacular English (AAVE) were more than twice as likely to be flagged.

Researchers found that AI algorithms used by social sites to detect hate speech were biased against African-American language and more likely to flag posts containing it as “offensive.” (Photo: Witthaya Prasongsin / Getty Images)

A second study out of Cornell University uncovered similar patterns of racial bias against Black speech after researchers combed through five academic data sets used to study hate speech, which together contained some 155,000 online posts.

So what’s the reason behind the bias? Researchers say it’s due in large part to the fact that the humans who teach these algorithms what’s offensive and what’s not usually aren’t part of the cultural community in question and don’t know the context of the posts they label. For instance, a tweet containing the N-word or the term “queer” could be considered offensive in some settings but not in others, depending on how the author used it.

“What we are drawing attention to is the quality of the data coming into these models,” Thomas Davidson, a researcher at Cornell, told the outlet. “You can have the most sophisticated neural network model, [but] the data is biased because humans are deciding what’s hate speech and what’s not.”

Ph.D. student and University of Washington researcher Maarten Sap echoed this sentiment, arguing that those moderating content with such algorithms need to “be more mindful of minority group language that could be considered ‘bad’ by outside members.”

The issue comes at a time when tech giants like Google, Facebook and Twitter are turning to natural language processing tools designed to detect hate speech on online platforms. According to Recode, these companies and others have looked to academics for guidance on how to enforce standards around hateful and offensive language.

“[But], if top researchers are finding flaws in widely used academic data sets, that presents a significant problem for the tech industry at large,” the outlet notes.

As part of their research, Sap and his colleagues primed workers tasked with labeling tweets from a particular data set to consider the online user’s dialect and race when deciding if their tweet was offensive or not. What they found was that when moderators were more informed about the person behind the post, they were much less likely to flag the content as potentially offensive.

In fact, racial bias against tweets containing Black speech fell by 11 percent, according to the report.

Ensuring moderators are more educated about the users writing these tweets is a start to combating racial bias, researchers and other advocates argue. However, some critics fear that giving moderators more context might open the door to further criticism, and to further bias.
