Meta's Content Moderation Dilemma: Harmful to Whom?
The argument that we should continue stifling speech to protect vulnerable communities ignores how strict content moderation works in practice.
After the Hamas attack on October 7, 2023, Meta made some significant and controversial changes to its Dangerous Organizations and Individuals (DOI) Policy. A Human Rights Watch report revealed that over 1,050 posts from Palestinians and their supporters were removed across Facebook and Instagram. The content—often related to alleged human rights abuses—was swept away in the name of safety.
Yet, nearly a year later, the Anti-Defamation League (ADL) accused Meta of not doing enough to combat antisemitism. The ADL also demanded the platform ban the phrase "from the river to the sea," citing it as hate speech.
So, were Meta’s platforms bastions of pro-Israel propaganda? Or were they rife with antisemitic hate? It’s likely true that both groups felt Meta’s policies were inadequate, but these conflicting outlooks highlight a persistent issue for platforms trying to moderate "harmful" content: harmful to, and defined by, whom?
Unfortunately, it appears that many progressive and minority groups have forgotten the recent history in which they frequently — and rightly — raised concerns about over-moderation on Meta’s platforms.
When Meta announced that it would be doing away with fact-checkers and scaling back its policing of hate speech on its platforms, several of these same groups sounded the alarm. The move has been called “dangerous to users,” “a major setback to public safety,” capable of “real-world harms” and even “a precursor to genocide.”
It appears that Meta’s critics want platforms to police content in a way that keeps “good” speech while hiding stuff they find offensive or harmful — until those policies censor them, of course.
But making a “perfect” content moderation policy and enforcement scheme is nearly impossible; algorithms and human moderators, often imperfect arbiters of context and nuance, tend to leave many dissatisfied and often silence the very voices such policies are designed to protect. More generally, people disagree vehemently about how to define hate speech, its thresholds, and protected characteristics.
Meta’s Censorship of LGBTQ+ Community
Meta’s content moderation has long affected LGBTQ+ users disproportionately. In 2017, Facebook’s algorithms flagged reclaimed slurs like “dyke” and “fag” as hate speech, even when used self-referentially. In 2020, Meta blocked a same-sex couple’s innocuous album promotion video, mislabeling it as "sexually explicit content."
LGBTQ+ activists also called out Instagram for censoring images of the bare chests of transgender users who were promoting a fundraiser for top surgery, which the platform said violated its Adult Nudity and Sexual Activity Policies. More recently, Instagram Teen Accounts, designed for users under 18, barred access to LGBTQ+-related posts and hashtags, deeming them "sensitive content."
In regions like the Middle East and North Africa, automated moderation suppressed LGBTQ+ content, silencing marginalized voices. These errors highlight how blunt tools in the name of safety often stifle expression and erase the experiences of vulnerable communities.
While Meta’s moderation policies have been criticized as overly restrictive toward LGBTQ+ communities, the company has also banned gender-critical accounts for violating community guidelines, thus silencing many women. Even the most aggressive content moderation will never please every group without prioritizing one group’s taboos over another’s.
Misjudged Irony and Political Satire
Meta’s algorithmic and human moderators have also stumbled on subtlety.
In 2020, when political cartoonist Matt Bors posted an image mocking the Proud Boys, Facebook removed the post for “advocating violence” and put him on probation. Facebook had earlier removed another Bors cartoon critiquing former President Trump’s COVID response for “spreading misinformation.”
Facebook also removed a cartoon by another prominent cartoonist, Ed Hall, that compared Israeli Prime Minister Benjamin Netanyahu’s treatment of Palestinians with Nazi Germany’s treatment of Jews.
In November of last year, Meta’s Threads suspended Washington Post reporter Amanda Katz for a post criticizing Hitler, then removed a colleague’s post quoting Katz’s response to the suspension: “I stand by my views of Hitler. Not a good guy.”
Automated moderation was not always to blame. In October 2024, the head of Instagram admitted that moderation issues on Instagram and Threads could be attributed to human moderators who lacked “sufficient context.”
Meta’s struggles with satire and irony illustrate that no amount of manpower or algorithmic complexity will ever be able to perfectly evaluate context when making moderation decisions.
Over-Removal of Legal Speech
Meta’s press release announcing its content moderation policy changes states that its strict content moderation approach has “gone too far” where “too much harmless content gets censored.”
That’s because vague and broad policies necessarily create confusion about where to draw the line on whether something is harmful, ironic, or offensive. A lot of that comes down to the subjective interpretation of individual moderators and the biases built into algorithms.
It also results from political and legal pressure to “crack down” on disfavored speech, which tends to make platforms and their moderation teams risk-averse and overly cautious, removing more speech than intended in order to stay on the safe side. Meta’s leadership was grilled on Capitol Hill, facing intense political pressure in multiple hearings and a stream of headlines about the role of misinformation in the 2016 election.
This pressure has resulted in significant “scope creep,” whereby hate speech policies designed to target clear harms like racism now cover a wide range of speech, including stereotypes and conspiracy theories. This shift was confirmed by a 2023 report from The Future of Free Speech, which analyzed the policies of eight major platforms, including Facebook and Instagram.
We have seen the other extreme of this phenomenon play out in Europe, where platforms have been subjected to both political and legal pressure to take down illegal content. In another recent report published by The Future of Free Speech, we examined content removals on Facebook in France, Sweden, and Germany. Each of these countries had applicable laws defining illegal speech, with Germany’s (now repealed) NetzDG carrying the most serious consequences for insufficient content moderation.
Our research found that a staggering majority — between 92.1% and 99.7% — of the content removed from Facebook was legally permissible. The Digital Services Act (DSA), Europe’s online safety rulebook that dictates how very large platforms must moderate illegal and harmful content, will likely exacerbate this problem. Preliminary reports indicate the DSA’s proposed Code of Conduct on Hate Speech will increase the pressure on platforms to remove illegal hate speech within 24 hours or face DSA investigations that, if platforms are found non-compliant, carry the risk of significant fines.
As a result, companies will have incentives to overcensor even more than they do now. This doesn’t just alienate users; it chips away at trust and the open discourse foundational to democracy.
A Never-Ending Dilemma
Meta’s past struggles reveal the impossibility of crafting a one-size-fits-all moderation policy. As we outlined in our “scope creep” report, attempts to address harmful content face three inherent contradictions:
No policy satisfies all interest groups.
Algorithms and moderators often fail to account for context (because often context is unknowable).
Over-removal frequently stifles legitimate speech, including that of marginalized groups.
Global social media platforms host content from a plethora of cultures, societies, and religions with, as one study argued, “few, if any, shared understandings as to what amounts to intolerable speech.” Due to the competing interests of a pluralistic society, attempts to control some speech on a platform in order to protect specific groups will not strengthen free speech overall.
Of course, there are extreme cases where there’s little ambiguity about harmful content, such as Meta’s failure to moderate clear violations of hate speech that played a role in systematic ethnic violence in Myanmar. But most speech does not come close to this level, and it’s important to remember that this incident was driven by a government-initiated and coordinated campaign of incitement against a minority group within its own population.
While the efforts of Meta’s critics to safeguard online communities are noble, moderation policies that prioritize safety above expression will frequently backfire on the very groups they are designed to protect. There’s no good solution to this problem on global centralized platforms like Facebook, Instagram, and X.
However, at The Future of Free Speech, we think that such platforms should err on the side of protecting speech over safety. Looking ahead, we also believe that decentralization and user empowerment over feeds and algorithmic curation have the potential to avoid the collateral damage of centralized content moderation while giving users the tools to avoid content they find offensive, hateful, or simply not worth their attention.
Jacob Mchangama is the Executive Director of The Future of Free Speech and a research professor at Vanderbilt University. He is also a senior fellow at The Foundation for Individual Rights and Expression (FIRE) and the author of Free Speech: A History From Socrates to Social Media.