
Content Moderation Makeover: Meta’s Changes Are a Mixed Bag

Jeffrey Howard


On January 7, 2025, Meta CEO Mark Zuckerberg posted an announcement spelling out a series of changes to his company’s content policies, which apply to users’ posts on Facebook, Instagram, and Threads. In the announcement, Zuckerberg defended the changes on ethical grounds, arguing that in seeking to strike the balance between protecting free speech and preventing harm, Meta had gotten the balance wrong by suppressing too much content.


There is no shortage of punditry debating the sincerity of this rationale. Were the changes truly motivated by a change of heart on the ethics of content moderation? Or were they a shrewd business move to placate the new Trump administration, which has signalled hostility to Silicon Valley? I leave the motivational question to the side. Here I focus on whether Meta’s changes are, in themselves, ethically defensible.


[Screenshot: the “Tier 1” section of Meta’s hate speech policy on dehumanizing comparisons, with additions highlighted in green and deletions struck through in red.]
Example from Meta's current hate speech policy with tracked changes as of Jan 29, 2025.

Some of the changes reopen longstanding philosophical disputes about hate speech which I won’t attempt to sort through here. Meta has partly relaxed its policy on hate speech; some content concerning immigration, gender, gender identity, and sexual orientation that is currently removed will be permitted. (To see these changes, visit Meta’s hate speech policy and, under the “change log” in the left-hand margin, click “Jan 8, 2025” to see in-line edits to the policy.) These changes have received no shortage of commentary. They raise the perennial question of when bigoted or otherwise misguided speech should be answered with counter-speech, and when it is so dangerous that suppression is the right solution.


Other changes raise new questions about the proper footprint of political content in our public discourse. Since 2021 Meta has demoted (a.k.a. deamplified) “civic content” across its platforms, reducing the visibility of speech about political, electoral, and social issues. So even users who are keen to engage with this content will have found themselves much less likely to encounter it in recent years. To my mind, a blanket demotion of political discourse is very hard to justify. This is especially so in virtue of these platforms’ publicly articulated self-conception as valuable forums for communication. Demotions can be justified in many cases, but this isn’t one of them.


The rest of this post focuses on two of the more philosophically complex substantive changes: the first concerns fact-checking, and the second concerns policy enforcement. These changes aren’t nearly as crazy as much of the commentary suggests. Even so, I believe they are largely misguided, and I will explain why.


Fact-Checking

What is the change?

Meta is partially eliminating its current fact-checking program for misinformation. For several years Meta has worked with a range of external partners to label false content—“particularly clear hoaxes that have no basis in fact.”  Fact-checked misinformation receives a label indicating that the post has been deemed false, with links to external sources providing more information. Fact-checked misinformation is also demoted, so it will be less likely to be recommended in people’s feeds or discovered in search results. (This system also labels content that is “altered” – e.g., deceptively edited videos – as well as content deemed “partly false” or “missing context”.) Of course, given how much false content exists, such a system will only ever label a fraction of it, focused on the most viral content.


This program, which Zuckerberg once rightly told Congress is “industry-leading,” will be dismantled—albeit, for now, only in the United States. In its place will be a system of “community notes” similar to the one in operation on Elon Musk’s X platform, in which posts will be labelled false and misleading only if a sufficiently large and diverse cross-section of users themselves flag them as such.
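To picture what that gating condition amounts to, here is a deliberately crude sketch of the idea: a note is attached only when enough raters, drawn from more than one viewpoint cluster, agree that it is helpful. This is my own toy model, not X’s or Meta’s actual algorithm (X’s system is reported to use a more sophisticated bridging-based ranking); every threshold and name below is invented for illustration.

```python
# Toy sketch of a "community notes"-style gate: a note is attached only when
# enough raters find it helpful AND those raters span different viewpoint clusters.
# Illustrative only; not the real X or Meta algorithm.
from collections import Counter

def note_is_shown(ratings, min_ratings=10, min_helpful_share=0.7):
    """ratings: list of (viewpoint_cluster, found_helpful) pairs from users."""
    if len(ratings) < min_ratings:
        return False  # not a sufficiently large cross-section
    helpful_clusters = [cluster for cluster, found_helpful in ratings if found_helpful]
    if len(helpful_clusters) / len(ratings) < min_helpful_share:
        return False  # not enough overall agreement
    # Require agreement from at least two distinct viewpoint clusters ("diverse").
    return len(Counter(helpful_clusters)) >= 2

ratings = [("left", True)] * 6 + [("right", True)] * 3 + [("left", False)] * 2
print(note_is_shown(ratings))  # True: broad agreement across clusters
```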


Is the change defensible? In thinking through this question, I make three philosophical assumptions about the ethical duties of large social media platforms like Facebook—each with implications for misinformation policy.


Platforms’ Misinformation Duties

First, platforms have a responsibility to allow a wide range of views to be aired, enabling users to pursue their interests as speakers and audiences. This is grounded, at minimum, in the public commitments that companies have voluntarily made to serve as forums for free expression (regardless of whether they have preexisting duties to serve this role). This duty means that contested empirical claims that are subject to legitimate ongoing debate must not be moderated as misinformation. (Some unduly aggressive fact-checking during the pandemic underscores this point.) Moreover, plenty of clear falsehoods (e.g., misstating how many moons Jupiter has) are utterly innocuous, and obviously should not face moderation.


Second, platforms have a responsibility to guard against users abusing their network to cause harm to others. I’ve argued that this duty flows from the responsibility that all entities have to defend others from harm when they do so at reasonable cost, as well as from the special responsibility of those who own or control a space to guard against its weaponization. Manifestly false or baseless claims that credibly lead to serious real-world harm should be removed. Crucially, Meta has a misinformation policy prohibiting such dangerous content, which is completely separate from its fact-checking system. We have every indication that Meta will continue to enforce this policy—a point curiously absent from nearly all recent journalistic coverage. So, misinformation that risks inspiring violence, could lead to medical harm (e.g., vaccine denialism), or interferes with elections (e.g., falsehoods about voting dates or eligibility) will continue to be banned on Meta. Critiques that Meta’s end to fact-checking will have “disastrous consequences” overlook this crucial point.

We have every indication that Meta will continue to enforce this [misinformation] policy—a point curiously absent from nearly all recent journalistic coverage.

Even so, platforms have a further responsibility, which is the one I believe is potentially violated by this policy change. Specifically, platforms have a responsibility to contribute to the epistemic and rational health of public discourse. This is grounded in the civic responsibility, shared by all, to help establish and maintain well-functioning democratic institutions. Meta’s program of fact-checking and demoting misinformation enabled the company, in part, to discharge this responsibility. By dismantling the program that it has built, Meta is at risk of failing to live up to this civic responsibility in the United States.


Take the example of climate change misinformation. The policy shift means that Meta will no longer attach fact-check labels to claims about climate change that are deemed manifestly false by the scientific community. As a result, such speech won’t be demoted either, and thus can be algorithmically amplified—going viral—like any other content. Our societies are in desperate need of a rational conversation about the dangers of climate change and how they should be confronted, and platforms have a duty to support that conversation. The fact that many people have been duped by unscrupulous corporate propagandists into doubting that climate change is real does not transform this into a matter of legitimate, reasonable debate; as NASA says, the science is “unequivocal.”

Confronting Perceptions of Bias

The company’s assertion is that the abandonment of fact-checking was necessary because its own third-party fact-checker partners were “biased”. Given that its own hand-picked partners included a range of groups from different political perspectives, including conservatives, I am not persuaded this was true (e.g., the right-leaning Dispatch was already part of its network, as were unabashedly neutral voices like Reuters). But the political headache Meta’s fact-checking system has caused in the U.S. illuminates a deeper problem: even if the system isn’t biased, it may be perceived as biased by an important audience. And if audiences don’t regard fact-checkers as trustworthy, then it’s plausible to assume that fact-checking will prove ineffective in improving their beliefs.


The problem is that this empirical assumption—that fact-checking is ineffective among audiences who distrust the fact-checkers—looks false. One study in Nature showed, surprisingly, that even those who distrust fact-checkers find the accuracy of their beliefs improved through exposure to fact-check labels. If the rationale for fact-checking is to improve users’ beliefs, it looks like platforms can succeed in this aim even if the trustworthiness of the fact-checkers is impugned.


This isn’t to say that perceptions of bias aren’t a problem. They clearly are, undermining trust in the entire enterprise of content moderation. Meta could have attempted to remedy these misperceptions by deepening its engagement with right-leaning outlets.  Moreover, given the evidence that user-generated community notes elicit greater trust (albeit for complicated reasons), Meta could have explored ways to integrate community notes into its preexisting architecture.

Fact-checking is hard. ... But the response isn’t for platforms to abandon the project of fact-checking misinformation, on which Meta has shown leadership in the past years. It's for the rest of us to learn that while the job must be done, it is difficult, and perfection isn’t on the menu.

But community notes, all by themselves, are insufficient for Meta to discharge its responsibility. Misinformation poses a problem precisely when people are disposed to believe it; a community note on a convincing hoax may not receive much traction. Indeed, the signs from X’s community notes system are not encouraging, even though we can expect Meta to implement a better version given the comparative size and expertise of its content moderation teams. Moreover, even if community notes are successful at catching some genuinely harmful content, like climate misinformation, it remains unclear whether such content would then be demoted (as fact-checked misinformation currently is). A chief benefit of the current system is that it helps prevent fact-checked misinformation from going viral; in the U.S., we will now lose precisely this benefit.


Fact-checking is hard. That the public tends to excoriate social media companies for anything falling short of perfection puts companies in a politically difficult situation. But the response isn’t for platforms to abandon the project of fact-checking misinformation, on which Meta has shown leadership in the past years. It's for the rest of us to learn that while the job must be done, it is difficult, and perfection isn’t on the menu.


Enforcement Changes

The next change concerns how Meta enforces its rules. Meta is altering its approach to enforcement in two ways: it will require greater confidence that a potential violation is in fact a violation before removing it, and it will proactively police the platform only for severe violations, so that less severe violations will be moderated only after they are reported by users.


Start with the requirement of greater confidence. The issue here is that content moderation always involves false positives: content that is erroneously flagged as violating the rules, even though it doesn’t. (False positives contrast with false negatives: content that is erroneously overlooked or deemed permissible, even though it does violate the rules.) In its announcements, Meta estimated that in each batch of 10 posts that get removed, 1 or 2 are likely false positives.

The question, then, is whether platforms should err on the side of removing speech, or on the side of leaving it up.

It is easy to insist that platforms should eliminate false positives, taking down only content that genuinely violates their rules. But as Evelyn Douek notes, “Given the unfathomable scale of content moderation, errors are inevitable.” AI-powered filters (“classifiers”) are an important front line of defence, often making the initial judgement as to whether a post is a violation or not (essentially a prediction based on their training data). These filters’ predictions come with varying degrees of confidence. How confident must a filter be before deeming a post violative? If a platform requires near-certainty in order to reduce false positives, it will end up leaving up more harmful content (false negatives). If it accepts lower confidence in order to detect more harmful content, it will indeed catch more of it, but in so doing it will sweep up more legitimate content as collateral damage.
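To make the trade-off concrete, here is a minimal toy sketch—my own illustration, not Meta’s system; the classifier, score distributions, and thresholds are all invented—showing how raising the confidence required for removal cuts false positives at the cost of more false negatives:

```python
# Toy illustration of the confidence-threshold trade-off in automated moderation.
# Everything here is invented for exposition; it is not Meta's classifier or data.
import random

random.seed(0)

# Hypothetical classifier scores: genuine violations tend to score high,
# legitimate posts tend to score low, but the two distributions overlap.
violating_scores = [random.betavariate(5, 2) for _ in range(1000)]   # true violations
legitimate_scores = [random.betavariate(2, 5) for _ in range(1000)]  # benign posts

def error_counts(threshold):
    """Posts scoring at or above the threshold are removed."""
    false_negatives = sum(s < threshold for s in violating_scores)    # missed violations
    false_positives = sum(s >= threshold for s in legitimate_scores)  # wrongly removed
    return false_positives, false_negatives

for threshold in (0.5, 0.7, 0.9):
    fp, fn = error_counts(threshold)
    print(f"threshold {threshold:.1f}: false positives = {fp}, false negatives = {fn}")
```

Raising the threshold drives the first number down and the second up; no setting eliminates both kinds of error.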


The question, then, is whether platforms should err on the side of removing speech, or on the side of leaving it up. Meta’s view is that it has made “too many mistakes”, and so it should tune its filters to require greater confidence. As Meta executive Joel Kaplan put it, “we’re going to tune our systems to require a much higher degree of confidence before a piece of content is taken down.” This isn’t a crazy view. But what is misguided is the implication that greater confidence will be required across all policy areas. If that is the plan, my objection is that the appropriate level of confidence should instead vary, depending on the trade-offs in each area.


For example, it is imperative that platforms detect and remove child sexual abuse and exploitation content, as this content is egregiously harmful. Classifier over-sensitivity is a virtue, not a vice, in this domain. The same goes for speech coordinating human trafficking, or organising terrorist attacks. I don’t want to downplay the badness of false positives in these areas – which is why it is crucial that users continue to be empowered to appeal so that genuine mistakes are caught and redressed. But relaxing the sensitivity of initial filters seems like a mistake. Better to over-remove at the first sweep, and then fix mistakes down the line. It can be reasonable to expect people to put up with mistakes for the sake of preventing serious harm, especially if they can appeal those mistakes.


In contrast, in other policy areas it may well make sense to increase the amount of confidence required before speech is removed. Some speech automatically flagged as borderline hate speech may involve legitimate discussion of difficult social issues; some speech flagged as borderline self-harm content may involve legitimate discussion of mental health; and some speech flagged as, say, borderline vaccine denialism may likewise involve legitimate expressions of medical concern. Tuning classifiers to require greater confidence in such cases is defensible, to ensure that legitimate debate isn’t stifled.
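One way to picture the alternative I am urging: rather than a single, platform-wide increase in required confidence, the removal threshold could be set per policy area. The sketch below is purely hypothetical; the categories, numbers, and function are my own illustration, not Meta’s actual configuration.

```python
# Hypothetical per-policy-area removal thresholds: lower thresholds (more aggressive
# removal) where false negatives are catastrophic, higher thresholds where
# over-removal risks chilling legitimate debate. All values are illustrative.
REMOVAL_THRESHOLDS = {
    "child_exploitation": 0.30,      # err heavily toward removal; appeals catch mistakes
    "terrorism_coordination": 0.35,
    "human_trafficking": 0.35,
    "hate_speech": 0.80,             # borderline cases often involve legitimate debate
    "self_harm": 0.85,               # may overlap with genuine mental-health discussion
    "vaccine_misinformation": 0.85,
}

def should_remove(policy_area: str, classifier_confidence: float) -> bool:
    """Remove a post only if the classifier's confidence meets that area's threshold."""
    return classifier_confidence >= REMOVAL_THRESHOLDS.get(policy_area, 0.90)

# The same 0.6-confidence prediction is actioned in one area but not the other.
print(should_remove("terrorism_coordination", 0.6))  # True
print(should_remove("hate_speech", 0.6))             # False
```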


As Zuckerberg said, “It is time to focus on reducing mistakes.” Agreed. But not all mistakes are equal; we should be more willing to tolerate mistakes made in preventing egregious harms than mistakes made in preventing milder harms.

We don’t want teenagers to encounter bullying content only to then have to report it; we want them to be spared the burden of encountering it in the first place.

This leads me to the other change to enforcement: for “low severity” violations, there will be no proactive AI policing; such content will be removed only if reported by an individual user. Here, too, Meta’s concern is that too many mistakes are being made. But why isn’t the previous fix – requiring greater confidence from the classifiers – the answer here? As I’ve said, it’s reasonable to require greater confidence in using filters to reduce mistakes; but why abandon filters entirely? Meta’s teams have built elaborate and impressive technology to protect users from harmful content before they encounter it. Why stop using it?


Consider harms like bullying and harassment. We don’t want teenagers to encounter bullying content only to then have to report it; we want them to be spared the burden of encountering it in the first place. To be sure, the most heinous content, like encouraging people to harm themselves, will still be policed proactively. But if certain content is against Meta’s rules, and Meta has the capacity to detect it, it is unclear to me why we should prefer a world where that content is actioned only after it causes harm. The fact that these harms are lower severity isn’t a sufficient reason.

not all mistakes are equal; we should be more willing to tolerate mistakes made in preventing egregious harms than mistakes made in preventing milder harms.

In sum, Meta’s changes are a mixed bag. They don’t hasten the apocalypse in the manner some critics have alleged; the contention by a former employee that they are a “precursor for genocide” seems decidedly hyperbolic. My suspicion is that much of the good work that Meta’s teams are doing on content moderation will carry on; this exercise is, as much as anything, a PR manoeuvre to defuse the ire of Trump. Still, there is plenty here to criticize. Part of the point of academic work in this space is precisely to hold public and private policymakers’ feet to the fire.


Jeffrey Howard is Professor of Political Philosophy and Public Policy at University College London. He is Director of the Digital Speech Lab, Co-Editor of the journal Political Philosophy, and Senior Research Associate at the Institute for Ethics in AI at Oxford University. 


Disclaimer: Any views or opinions expressed on The Public Ethics Blog are solely those of the post author(s) and not The Stockholm Centre for the Ethics of War and Peace, Stockholm University, the Wallenberg Foundation, or the staff of those organisations.
