The dispute between Meta Platforms, Inc. and Bright Data Ltd. is one of the most design-relevant web scraping cases in recent years because it forces a concrete question: at what point does "scraping public web data" turn into a contractual or unlawful access problem? This guide uses the U.S. District Court for the Northern District of California's January 23, 2024 summary-judgment order (focused on contract/Terms claims) as the anchor, then turns that reasoning into practical scraping requirements across four engineering levers: login state, how "public" the target data really is, consent/contract boundaries, and how you treat technical restrictions.
Conclusion
Here are the most actionable takeaways from Meta v. Bright Data for real-world scraping design.
- Collecting "public information viewable while logged out" can fall outside the scope of certain Terms clauses, at least under the contract interpretation in this specific case.
- By contrast, authenticated areas, non-public data, misuse of credentials, bypassing access controls, and high-load crawling can raise risk quickly via other routes (contract, unauthorized access theories, privacy, and more).
- To build a defensible "legal-by-design" scraper, treat these as requirements, not afterthoughts: lock the public scope, manage consent/contract boundaries, minimize collection, and don't build your roadmap around bypassing blocks.
Key points from the order
In this case, the court closely examined Meta's contract claims (Facebook/Instagram Terms) and concluded, on the record presented, that Meta's Terms did not, as a matter of contract interpretation, prohibit scraping and selling public Facebook/Instagram data gathered while logged out. The result was favorable to Bright Data on the breach-of-contract theory at summary judgment.
The login question is the hinge
The court paid close attention to whether the relevant Terms provisions presuppose "use" of the service as an account holder, and whether that concept naturally extends to a visitor who is not logged in. Practically, the line that matters for engineers is: you often can't assume that logged-out collection of public pages is automatically a Terms breach under the same clauses that clearly govern logged-in users, even if the platform argues otherwise.
Insufficient proof of non-public collection
The order also highlights an evidence problem: Meta did not sufficiently demonstrate that Bright Data scraped and sold protected (non-public) user data in a way that required logged-in access. Put differently, a plaintiff who wants to litigate "non-public scraping" needs to prove specifics: which data, why it was non-public at the time, and how it was obtained. That makes operational logging and technical facts (crawl traces, access paths, auth state, timestamps) central, not optional.
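The operational-logging point can be made concrete in code. Below is a minimal sketch of an evidence-ready access record, assuming a JSON-lines log file and a hypothetical `auth_state` field that your crawler maintains; the field names are illustrative, not a standard:

```python
import json
from datetime import datetime, timezone

def access_log_entry(url: str, status: int, auth_state: str, final_url: str) -> str:
    """One evidence-ready record: what was fetched, when, and under what access state."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),  # when it was collected
        "url": url,                # what was requested
        "final_url": final_url,    # where the request landed (captures login redirects)
        "status": status,          # HTTP status actually received
        "auth_state": auth_state,  # e.g. "logged_out"; assumes no session is attached
    })
```

Appending one such line per fetch gives you exactly the facts the order treats as decisive: which data, under what access state, and when.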
Reference: Meta Platforms, Inc. v. Bright Data Ltd. court order
Principles for âlegal-by-designâ scraping
This is where the engineering work starts. Don't overread the order as "you can scrape anything." Instead, treat it as a framework for turning legal risk into concrete product requirements and guardrails.
Lock the public scope
- Restrict crawling to URLs that are reachable while logged out.
- Design the system so session cookies, access tokens, and authenticated state cannot leak into the fetch pipeline.
- Limit extraction to what is visible in rendered HTML; don't pivot into private/internal APIs or admin endpoints.
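The three rules above can be enforced in the fetch pipeline itself rather than left to policy documents. A minimal sketch, assuming a hypothetical `PUBLIC_SCOPE` allowlist that your team has verified is reachable while logged out (the host and path prefix are placeholders):

```python
import urllib.request
from urllib.parse import urlparse

# Hypothetical allowlist of (host, path prefix) pairs confirmed viewable while logged out.
PUBLIC_SCOPE = {("example.com", "/public/")}

def in_public_scope(url: str) -> bool:
    """Return True only for URLs inside the documented logged-out scope."""
    parsed = urlparse(url)
    return any(parsed.hostname == host and parsed.path.startswith(prefix)
               for host, prefix in PUBLIC_SCOPE)

def fetch_public(url: str) -> bytes:
    """Fetch with no cookies, tokens, or auth headers anywhere in the pipeline."""
    if not in_public_scope(url):
        raise ValueError(f"URL outside locked public scope: {url}")
    req = urllib.request.Request(url, headers={"User-Agent": "example-crawler/1.0"})
    # No CookieJar is installed and no Authorization header is ever set,
    # so authenticated state cannot leak into this request.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()
```

The design choice here is that scope is enforced at the lowest layer: nothing above the fetcher can accidentally widen it.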
Make consent and contract boundaries explicit
Terms enforcement often depends on who agreed to the Terms and in what state that agreement was formed. If you log in (or route through a logged-in flow), you increase the chance that the platform can frame your automation as "use" governed by the Terms you accepted. If your goal is logged-out public collection, treat "never enter the login flow" as a product requirement, not just an implementation detail.
Also note: winning (or fitting) a contract interpretation does not make the broader risk disappear. Bypassing access controls, processing personal data, copyright/unfair competition theories, and jurisdiction-specific privacy laws can all create independent exposure.
Minimize what you collect
Even if a profile is "public," large-scale extraction of highly identifying attributes (names, emails, home/work addresses, face images, precise location signals, etc.) can dramatically increase privacy obligations, regulatory risk, and your operational burden for deletion requests and audits. Put these into requirements: "only the columns needed for the stated purpose," "short retention," and "avoid re-identification."
Specify load control and "polite" behavior
Long before a legal dispute, abusive traffic patterns get you blocked and can trigger complaints, disclosure demands, or escalation. Make this non-negotiable: rate limits, exponential backoff, caching, and incremental/differential fetching by default.
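The rate-limit and backoff defaults can be sketched in a few lines. This is an illustrative implementation, not a drop-in library; the interval and retry values are assumptions to tune per target:

```python
import random
import time

def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 60.0):
    """Exponential backoff with jitter: roughly 1s, 2s, 4s, ... capped at `cap`."""
    for attempt in range(max_retries):
        # Jitter (0.5x to 1.0x) avoids synchronized retry bursts across workers.
        yield min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)

class RateLimiter:
    """Fixed-interval limiter: at most one request per `interval` seconds."""
    def __init__(self, interval: float):
        self.interval = interval
        self._next = 0.0
    def wait(self) -> None:
        now = time.monotonic()
        if now < self._next:
            time.sleep(self._next - now)
        self._next = max(now, self._next) + self.interval
```

Caching and incremental fetching then sit on top: only re-request a page when your stored copy is stale, and route every request through the limiter.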
Stop building around bypass
Proxies and automation can be legitimate for reliability (redundancy, geo routing, stable collection). But the moment your requirements read "must bypass blocks," you move closer to arguments about circumvention and unauthorized access. Prefer allowed paths (official APIs, licensed data, explicit permissions). If the target denies access, whether by block, warning, or cease-and-desist, your design should be able to stop cleanly.
Design by issue area
Logged-out collection
The case underscores how important "public while logged out" is. In practice, a defensible design tends to include these rules:
- Stop immediately if you're redirected to a login screen (authentication required = not public for your purposes).
- Automatically test that content is retrievable without cookies.
- If language/region changes what's public, fix and document the collection conditions (locale, country routing, headers).
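The checklist above can be sketched as two small helpers. `LOGIN_PATHS` is an assumption you would fill in per target site, and `collection_context` simply documents the conditions under which "public" was evaluated:

```python
from urllib.parse import urlparse

# Assumption: the target's login URLs; adjust per site.
LOGIN_PATHS = ("/login", "/accounts/login")

def is_login_redirect(final_url: str) -> bool:
    """True if the request ended up on a login screen: treat as not public."""
    return urlparse(final_url).path.startswith(LOGIN_PATHS)

def collection_context(locale: str, country: str) -> dict:
    """Record the conditions under which public visibility was evaluated."""
    return {
        "locale": locale,       # e.g. "en-US": what's public can vary by locale
        "country": country,     # routing country, if geo affects visibility
        "cookies_sent": False,  # the probe must run without any cookie jar
        "auth_state": "logged_out",
    }
```

Running the cookie-free probe and storing the context alongside each crawl run gives you a reproducible answer to "was this public, and under what conditions?"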
Logged-in areas
Automating access to login-required areas sharply increases contractual risk, and it also forces you to manage credentials properly (user authorization, delegation, audit logs, secure storage). If the business requirement is real, prioritize official APIs or a data access agreement over "web UI automation."
Personal data
Public profiles are not a free pass. Processing personal data can trigger a different class of requirements: privacy law compliance (EU/US state laws/other jurisdictions), platform policy constraints, and internal privacy governance (DPIA-style risk assessment, deletion workflows, and request handling). Treat this as an operational design problem, not just a legal footnote.
Practical review table
Use this comparison table in design reviews to quickly spot where risk clusters.
| Dimension | Lower-risk design | Higher-risk design |
|---|---|---|
| Access state | Only URLs viewable while logged out | Login required; session-dependent |
| Data scope | Purpose-minimal; aggregated/anonymized where possible | Bulk collection of identifiers; long-term retention |
| Method | Browser-equivalent behavior with load controls | "Evasion"/limit-breaking as a product requirement |
| Consent/contract | Maintain a flow where no agreement is formed | Automate activity that conflicts with Terms after assent |
| Stop conditions | Stop immediately on denial, blocks, or warnings | Keep going and route around enforcement |
Common misunderstandings
Public does not mean unrestricted
Even when information is publicly viewable, that doesn't automatically grant unlimited redistribution or commercial reuse. Limits can come from multiple directions: contract, copyright, privacy, site policies and norms (including robots.txt as an operational signal), and unfair competition theories.
What this order does (and does not) cover
This was a U.S. case with a specific procedural posture and a specific focus: Metaâs Terms-based breach-of-contract claims. Change the country, state, court, causes of action, or facts (especially around authentication, protected data, or circumvention), and the analysis can change materially. For any production deployment, pair legal review in the target jurisdiction with an evidence-ready logging design (so you can explain what was collected, when, and under what access state).
Need guardrails for public-data scraping?
If you're building a scraper that targets logged-out pages, the hard part is turning legal assumptions into enforceable requirements. We can help you define scope boundaries, consent/ToS controls, data-minimization rules, and stop conditions that stand up in production.
Summary
The Meta v. Bright Data order reinforces a practical engineering lesson: at least on the contract theory litigated here, "public pages accessible while logged out" are treated differently from authenticated or protected data. In implementation terms, build around five requirements: (1) lock the public scope, (2) manage consent/contract boundaries, (3) minimize collection and retention, (4) control load and implement stop conditions, and (5) don't design around bypass.