Web Scraping Under GDPR: What B2B Teams Need to Know in 2026
Kaspr built a ~160-million-contact database from professional profile visits and other sources, including domain registries. In December 2024, the French DPA (CNIL) fined the company EUR 240,000 and ordered it to stop collecting data from people who'd limited their profile visibility, stop automatically renewing data storage, and respond to access requests with full source disclosure. The fine was modest. The operational orders were devastating.
If you're scraping contact data for B2B outreach - or buying from someone who does - this is the regulatory environment you're operating in right now.
The Short Version
Scraping personal data isn't illegal under GDPR, but it's legally risky by default. "Public" doesn't mean "free to use." Legitimate interest is the only viable legal basis for commercial scraping, and it requires a documented three-part test that most scrapers skip entirely. For most B2B teams, the safest path is to skip scraping altogether and use a GDPR-compliant data provider instead of building scraping infrastructure and a compliance department to match.
Does GDPR Apply to Your Scraping?
Almost certainly. Here's the quick test:
- Does the scraped data identify a person? Names, email addresses, job titles tied to a company, phone numbers - all personal data under GDPR.
- Is the data "publicly available"? Doesn't matter. The Dutch DPA's position is blunt: scraping "almost always involves personal data," and public availability doesn't strip GDPR protection.
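- Are you scraping data about people in the EU? GDPR follows the people, not the infrastructure: if you target or monitor individuals in the EU, it applies regardless of where your servers sit.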
On r/gdpr, data subjects regularly describe receiving cold emails to personal addresses they never knowingly shared - with senders claiming "it's public anyway" as justification. That defense doesn't hold up. And consent isn't viable either, because you can't get permission from millions of people before scraping their data. That's why legitimate interest is the only realistic option.
What Regulators Actually Say
Three regulators have staked out positions. They don't fully agree, which is part of the problem.
CNIL (France)
CNIL published its latest guidance in January 2026, and it's the most balanced position available. Scraping isn't prohibited per se. Legitimate interest can work, but only with mandatory safeguards: define collection criteria in advance, exclude unnecessary data categories, respect robots.txt and CAPTCHA signals, and delete irrelevant data immediately after collection.
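Here's a minimal sketch of the robots.txt half of that safeguard, using Python's standard-library parser. The robots.txt content, the user-agent string, and the URLs are made up for illustration; a real crawler would fetch each site's own file, and this doesn't address CAPTCHA signals at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download the site's file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /profiles/
Allow: /press/
"""

# Hypothetical user-agent string, used only for this example.
USER_AGENT = "example-b2b-crawler"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable parser object."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True only if robots.txt does not disallow this URL for our agent."""
    return parser.can_fetch(USER_AGENT, url)


parser = build_parser(ROBOTS_TXT)
for url in ("https://example.com/profiles/jane", "https://example.com/press/launch"):
    print(url, "->", "fetch" if may_fetch(parser, url) else "skip")
```

Running the check before every request, rather than once per crawl, keeps the opposition signal in the same code path as the collection itself.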
Dutch DPA
The Dutch Data Protection Authority (Autoriteit Persoonsgegevens, or AP) took the most restrictive stance in May 2024. Its position: commercial interest alone doesn't qualify as legitimate interest. Scraping to create profiles and resell them? Very likely unlawful. The AP also argued that developing generative AI using scraped personal data doesn't qualify as legitimate interest - a stance that, if upheld, would invalidate most AI training pipelines.
They carved out narrow "potentially lawful" examples: monitoring news coverage of your own company, scraping your own webshop reviews, or mapping security risks from public infosec forums. But the European Commission criticized the AP's blanket rejection of commercial interests, and the Dutch Council of State pushed back on parts of their reasoning. This position is contested. It's still the guidance Dutch companies face, though, and that matters if any of your prospects are based in the Netherlands.
UK ICO
The ICO's consultation response frames legitimate interest as the only realistic basis for web scraping to train generative AI models but characterizes large-scale scraping as "high-risk invisible processing." If people don't know you're processing their data, they can't exercise their rights, and the ICO says that tips the balance against you. In their survey, 61% of respondents agreed with this analysis.

Building a scraping pipeline means DPIAs, Article 14 notices, suppression lists, and retention policies - nine ongoing compliance burdens. Prospeo handles all of that for you: GDPR compliant, opt-out enforced globally, DPAs available, with 300M+ profiles verified through a 5-step process and refreshed every 7 days.
Get 98% accurate emails without a single line of scraping code.
Why Most Scrapers Fail the Legitimate Interest Test
Let's be honest: "legitimate interest" isn't a magic phrase you invoke to make compliance problems disappear. It's a structured three-part test, and most scrapers fail because they stop at step one.
1. Purpose test. Is there a genuine, specific interest? "We want leads" is too vague. "We need to identify VP-level buyers at SaaS companies in DACH for our cybersecurity product" is specific enough.
2. Necessity test. Can you achieve the same goal without scraping personal data? The ICO expects you to document why alternative methods aren't suitable. If a compliant data provider already offers what you need, scraping fails this test outright.
3. Balancing test. This is where claims collapse. We've seen B2B teams build elaborate scraping pipelines only to discover that large-scale, invisible collection of contact data - where people never consented and don't know you have their information - is exactly the scenario regulators flag as likely to fail. The EDPB ChatGPT Taskforce's work points to the safeguards that can shift the balance: precise collection criteria, exclusion of certain source categories, and deletion or anonymization of personal data before downstream use.
Large-scale commercial scraping typically triggers a Data Protection Impact Assessment. If you haven't done one, you're already out of step with regulator expectations.
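One way to keep the three-part test from staying in someone's head is to store it as a record and refuse to run the scraper while parts are blank. The sketch below is a hypothetical structure: the class and field names are our own, not a regulator's template. It only tracks whether each part has been written down, not whether the reasoning would survive scrutiny.

```python
from dataclasses import dataclass, field


@dataclass
class LegitimateInterestAssessment:
    """Hypothetical record of the three-part test; field names are illustrative."""
    purpose: str      # the specific interest being pursued
    necessity: str    # why less invasive alternatives are unsuitable
    balancing: str    # why individuals' rights don't override the interest
    safeguards: list[str] = field(default_factory=list)
    dpia_completed: bool = False

    def gaps(self) -> list[str]:
        """List the parts that are still undocumented."""
        missing = [name for name in ("purpose", "necessity", "balancing")
                   if not getattr(self, name).strip()]
        if not self.dpia_completed:
            missing.append("dpia")
        return missing


lia = LegitimateInterestAssessment(
    purpose="Identify VP-level buyers at SaaS companies in DACH "
            "for our cybersecurity product",
    necessity="",
    balancing="",
)
print(lia.gaps())  # ['necessity', 'balancing', 'dpia']
```

A team that stops at step one ends up exactly here: a specific purpose on file, and nothing for necessity, balancing, or the DPIA.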
GDPR Compliance Checklist for Scraping
If you're committed to scraping, here's the minimum:
- Define collection criteria before you scrape. No "grab everything and filter later."
- Exclude unnecessary data categories - bank details, geolocation, anything beyond what your stated purpose requires.
- Respect robots.txt and CAPTCHA. CNIL treats these as opposition signals. Ignoring them undermines your legitimate interest claim.
- Delete irrelevant data immediately after collection or once identified.
- Comply with Article 14 transparency rules. Inform data subjects within one month of obtaining their data, at first contact, or at first disclosure to a third party - whichever comes first. If individual notice is disproportionate, Article 14(5)(b) allows notice via public means, but you must document why.
- Maintain a suppression/opt-out list. Process opt-out requests and ensure scraped individuals can be removed.
- Complete a DPIA. Document your balancing test, safeguards, and risk mitigation.
- Set retention and deletion policies. Kaspr got hit partly for automatic renewal of storage - don't repeat their mistake.
- Watch for ePrivacy triggers. If your scraping tools mimic browser behavior, you may also trigger ePrivacy/PECR consent requirements, which is a separate legal risk beyond GDPR.
That's nine line items, and each one requires ongoing operational work. Not a weekend project.
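To make three of those items concrete (exclude unnecessary fields, honor the suppression list, enforce retention), here's a rough sketch of a cleaning pass over collected records. The field names, the allowlist, and the 365-day retention period are assumptions for illustration, not legal thresholds; your documented purpose decides the real values.

```python
from datetime import datetime, timedelta, timezone

# Assumed collection criteria, illustrative only.
ALLOWED_FIELDS = {"name", "work_email", "job_title", "company"}
METADATA_FIELDS = {"collected_at"}   # kept so retention can be checked later
RETENTION = timedelta(days=365)      # hypothetical period, not a legal figure


def minimise(record: dict) -> dict:
    """Drop every field that isn't in the predefined collection criteria."""
    keep = ALLOWED_FIELDS | METADATA_FIELDS
    return {k: v for k, v in record.items() if k in keep}


def is_suppressed(record: dict, suppression_list: set[str]) -> bool:
    """True if the person opted out (matched on lower-cased email)."""
    return record.get("work_email", "").lower() in suppression_list


def is_expired(collected_at: datetime, now: datetime) -> bool:
    """True once the retention period has elapsed; nothing renews automatically."""
    return now - collected_at >= RETENTION


def clean(records: list[dict], suppression_list: set[str], now: datetime) -> list[dict]:
    """Return only records that are minimised, not opted out, and within retention."""
    return [
        minimise(rec)
        for rec in records
        if not is_suppressed(rec, suppression_list)
        and not is_expired(rec["collected_at"], now)
    ]


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
records = [
    {"name": "A", "work_email": "a@example.com", "geolocation": "48.1,11.5",
     "collected_at": datetime(2025, 6, 1, tzinfo=timezone.utc)},
    {"name": "B", "work_email": "B@example.com",
     "collected_at": datetime(2025, 6, 1, tzinfo=timezone.utc)},
    {"name": "C", "work_email": "c@example.com",
     "collected_at": datetime(2024, 6, 1, tzinfo=timezone.utc)},
]
cleaned = clean(records, {"b@example.com"}, now)
print([r["name"] for r in cleaned])  # ['A']
```

Record B is dropped because of the opt-out, record C because it has outlived the retention period, and record A loses its geolocation field on the way through.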

Skip the Minefield - Use Compliant Data Instead
Here's our take: if your average contract value is under $50K, building a compliant scraping pipeline will never pay for itself. The compliance overhead of DIY scraping exceeds the cost of a data provider within the first quarter, every time we've seen it attempted.
Building a compliant pipeline means hiring legal counsel for DPIA documentation, engineering Article 14 notice workflows, maintaining suppression lists for every jurisdiction your data subjects live in, and refreshing records before they go stale. For most B2B teams, that easily runs $50-100K+ in annual overhead before the pipeline produces a single usable contact. Making web scraping GDPR-compliant on your own is a full-time operational commitment, not a one-time setup.
Prospeo takes a different approach: 300M+ profiles, 98% email accuracy, a 7-day data refresh cycle, and GDPR compliance built into the platform with opt-out enforced globally and DPAs available on request. You search by 30+ filters, export verified contacts, and push them to your sequencer. No scraping infrastructure. No compliance department. Starts free, with paid plans from ~$39/mo.
Skip this route if you're scraping non-personal data like product pricing or public government records - GDPR personal data rules won't apply, and you don't need a B2B contact provider for that.

Regulators say scraping fails the necessity test if a compliant alternative exists. Prospeo is that alternative: 143M+ verified emails, 125M+ mobile numbers, 30+ search filters including buyer intent and technographics - all at $0.01 per email with far less GDPR overhead on your side.
Pass the legitimate interest test by not needing to take it.
FAQ
Can I scrape publicly available data under GDPR?
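Yes, but only with a lawful basis - "public" doesn't exempt you from GDPR. For commercial scraping that basis is usually legitimate interest, which means passing the three-part purpose, necessity, and balancing test. Article 14 transparency obligations also apply: data subjects must be informed within one month of collection at the latest.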
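As a rough illustration of the Article 14 timing, the sketch below picks the earliest of the three triggers: one month after collection, first contact, or first disclosure to a third party. Counting "one month" as a calendar month clamped to the last day of the next month is our assumption, and the function names are hypothetical.

```python
import calendar
from datetime import date
from typing import Optional


def add_one_month(d: date) -> date:
    """Same day next month, clamped to that month's last day (assumed rule)."""
    year = d.year + (d.month // 12)
    month = d.month % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def notice_deadline(collected: date,
                    first_contact: Optional[date] = None,
                    first_disclosure: Optional[date] = None) -> date:
    """Earliest of: one month after collection, first contact, first disclosure."""
    candidates = [add_one_month(collected)]
    candidates += [d for d in (first_contact, first_disclosure) if d is not None]
    return min(candidates)


print(notice_deadline(date(2026, 1, 31)))                                  # 2026-02-28
print(notice_deadline(date(2026, 1, 31), first_contact=date(2026, 2, 10)))  # 2026-02-10
```

If your first cold email goes out ten days after collection, that email is the deadline, not the one-month mark.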
What are the major GDPR fines for scraping?
Clearview AI was fined EUR 20 million or more by several DPAs, including those in France, Italy, Greece, and the Netherlands, for scraping facial images at scale. For B2B teams, Kaspr's EUR 240,000 fine in December 2024 is more instructive - the operational orders (data deletion, processing bans) are what actually threaten your go-to-market. The fine is survivable. Losing your database isn't.
Is there a GDPR-compliant alternative to scraping for B2B data?
Yes. Prospeo provides 300M+ verified B2B profiles with GDPR compliance built in - opt-out enforcement, DPAs on request, and a 7-day data refresh cycle. You get the contact data without building scraping infrastructure or navigating the legitimate interest test yourself.
Does the legitimate interest basis work for commercial scraping?
It can, but most commercial scrapers fail the necessity or balancing test. You must document a specific purpose, prove no less-invasive alternative exists, and show that individuals' privacy rights don't outweigh your business interest. The Dutch DPA argues commercial interest alone is insufficient - though that position is contested at the EU level.