The $3,400 Egress Bill: Post-Mortem of a 72-Hour Scraping Incident

An honest write-up of how a scraping campaign cost us $3,400 in egress over 72 hours, what we missed in detection, and what would have prevented it for $0.

Lhoussine
May 9, 2026 · 9 min read


This is a real incident we ran into last quarter. The numbers are real, the timeline is real, the lessons are real. Names removed because nobody comes out of this looking smart.

The setup

Mid-stage B2B SaaS selling market intelligence. The product is a public-facing web app + API where authenticated users browse a catalog of ~5M data records. The records are public information aggregated from many sources; the value is the aggregation and the search interface, not raw data secrecy.

Hosting: Vercel for the web app, AWS for the API. CDN: Vercel's built-in. Bot protection: a $20/mo Vercel "Bot Management" add-on enabled with default rules. Rate limiting: per-IP at 1000 req/min, generous because some users have legitimate burst access.

The 72 hours

Hour 0 — A scraping campaign starts. Nothing in our metrics registers it yet.

Hour 6 — Egress is up 4× from baseline. Nobody notices because the dashboard shows totals weekly, not hourly.

Hour 18 — On-call engineer notices a 3× spike in API request count overnight. Filters by endpoint: the spike is on /api/records/<id> — the per-record-detail endpoint. Looks like a customer onboarded a heavy use-case. Marks ticket "expected", goes back to sleep.

Hour 36 — The egress alert finally fires. The threshold was static, based on a previous month's average; a rolling threshold would have fired at hour 6. Engineering investigates.

Hour 38 — Investigation reveals: ~340,000 unique IPs hit /api/records/<id> in the last 24 hours. Each IP did between 3 and 50 requests. None individually exceeded our 1000/min rate limit.

Hour 39 — User agents are all realistic browser strings. Geographic distribution is global with concentration in residential ISP ranges. AS numbers point to one specific residential-proxy provider operating in 47 countries.

Hour 40 — We block the AS at the API gateway. Egress drops to baseline within minutes.

Hour 48 — The scraping returns from a different residential-proxy AS. Block that one too. Repeat.

Hour 72 — Five separate residential-proxy AS networks blocked in total. The scraping stops, or moves on to a competitor.

The bill

| Line item                   | Cost   |
|-----------------------------|--------|
| AWS egress (extra TB)       | $1,800 |
| AWS Lambda invocations      | $400   |
| Vercel bandwidth overage    | $850   |
| Vercel function invocations | $350   |
| Total                       | $3,400 |

For a startup that was running roughly cash-flow neutral, $3,400 in surprise infrastructure cost was painful. Worse: the scraped data is now somewhere out in the wild, and the scraper got 100% of what it wanted before we caught on.

What we got wrong

1. Rate limiting was per-IP only. Each residential-proxy IP made under 50 requests over 24 hours — well below any reasonable per-IP threshold. The scraper's rate of 5,000+ requests/min in aggregate was invisible at the per-IP level.

2. Bot protection was naive. The Vercel add-on used a heuristic ("looks like a real browser?") that residential-proxy traffic passes trivially. It correctly blocked obvious bots (curl with default UA) and let through everything sophisticated.

3. Egress alerting was static. A static $X/day threshold meant the alert fired only after most of the damage was done. A rolling threshold based on the previous 24 hours' baseline would have caught it at hour 6.

4. Dashboards aggregated the wrong way. Our APM showed traffic by endpoint and by status. Both metrics looked normal because the scraper didn't crash anything. The unusual signal was source distribution — way more unique IPs than usual — and that wasn't on any dashboard.

5. No AS-level detection. When you have 340K unique IPs all from one autonomous-system family, that's a coordinated campaign. We had no detection for AS-level concentration; we found it only after the fact.
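
For concreteness, here is roughly the check point 5 describes, as a minimal sketch: group the unique client IPs seen on one endpoint over the last 24 hours by autonomous system and flag any AS holding an outsized share. It assumes you can pull those IPs out of your access logs and that you have a local copy of MaxMind's free GeoLite2-ASN database (read here with the geoip2 package); the file path and the 20% share threshold are illustrative, not values we actually ship.

```python
from collections import Counter

import geoip2.database                      # pip install geoip2
from geoip2.errors import AddressNotFoundError


def asn_shares(ips, mmdb_path="GeoLite2-ASN.mmdb"):
    """Group client IPs by autonomous system; return (asn, org, share), largest first."""
    counts, orgs = Counter(), {}
    with geoip2.database.Reader(mmdb_path) as reader:
        for ip in ips:
            try:
                resp = reader.asn(ip)
            except (AddressNotFoundError, ValueError):
                continue                    # unknown or malformed address
            counts[resp.autonomous_system_number] += 1
            orgs[resp.autonomous_system_number] = resp.autonomous_system_organization
    total = sum(counts.values()) or 1
    return [(asn, orgs[asn], n / total) for asn, n in counts.most_common()]


def concentrated_as(ips, share_threshold=0.20):
    """340K unique IPs that mostly map to one AS family is exactly this signal."""
    return [row for row in asn_shares(ips) if row[2] >= share_threshold]
```

Run hourly over the busiest endpoints, this kind of grouping surfaces a residential-proxy concentration without anyone having to go log-diving at hour 38.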

What we changed

1. Per-AS rate limits. If any single AS now exceeds 5K req/min in aggregate, the whole AS gets a temporary block. We also seeded a known-bad-AS list of about 20 entries, updated quarterly from observed scraping incidents. (A sketch of the aggregate counter follows this list.)

2. IP-reputation firewall at the edge. We deployed SecureNow's free firewall on the API. The 500k known-bad IP list catches ~30% of residential-proxy nodes that have ever been involved in abuse. Hourly refresh keeps it current.

3. Source-distribution monitoring. New dashboard panel: unique IPs per endpoint per hour, with an anomaly alert if it spikes >5× the rolling 7-day average. (The rolling-baseline check is sketched after this list.)

4. Behavioral fingerprinting. Added basic per-session tracking — sequence of paths visited, time between requests, browser TLS handshake fingerprint. Patterns that don't match real browsing get flagged; a timing-only version is sketched after this list.

5. Egress alert with rolling baseline. The alert now fires at 2× the previous 24-hour rate, not at a static dollar threshold (the same rolling-baseline check as change 3). That catches an incident like this one at hour 6, not hour 36.
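
A minimal sketch of the aggregate counter behind change 1, assuming the gateway already resolves each request's ASN and that an in-process sliding window is acceptable; in production the state lives at the API gateway in something shared (Redis or similar). The 5K req/min ceiling is the number above, and the 15-minute block length is an illustrative choice, not a recommendation.

```python
import time
from collections import defaultdict, deque

AS_LIMIT_PER_MIN = 5_000        # aggregate ceiling per autonomous system
BLOCK_SECONDS = 15 * 60         # temporary block length (illustrative)

_window = defaultdict(deque)    # asn -> timestamps of requests in the last minute
_blocked_until = {}             # asn -> unix time when its block expires


def allow_request(asn: int, now=None) -> bool:
    """Return False once an AS is over its aggregate per-minute budget."""
    now = now or time.time()
    if _blocked_until.get(asn, 0) > now:
        return False
    q = _window[asn]
    q.append(now)
    while q and q[0] < now - 60:            # drop timestamps outside the window
        q.popleft()
    if len(q) > AS_LIMIT_PER_MIN:
        _blocked_until[asn] = now + BLOCK_SECONDS
        return False
    return True
```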
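
Changes 3 and 5 are the same rolling-baseline comparison applied to two series, unique IPs per endpoint per hour and egress bytes per hour. The sketch below assumes you already export those hourly counts from your metrics store; the multipliers are the ones quoted above.

```python
from statistics import mean


def is_anomalous(hourly_values, multiplier):
    """Compare the latest hourly value against the mean of the trailing window.

    hourly_values: trailing counts, oldest first, with the current hour last.
    multiplier:    5.0 for unique IPs per endpoint (7-day baseline),
                   2.0 for egress bytes (24-hour baseline).
    """
    if len(hourly_values) < 2:
        return False
    *baseline, current = hourly_values
    return current > multiplier * mean(baseline)


# e.g. is_anomalous(unique_ips_per_hour_last_169h, multiplier=5.0)
# e.g. is_anomalous(egress_bytes_per_hour_last_25h, multiplier=2.0)
```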
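
Change 4 is harder to compress (the TLS-fingerprint part needs support at the gateway or CDN), but the path-and-timing part can be as small as this. The /api/records/ prefix matches the endpoint from the incident; the ten-request minimum and the 0.2 coefficient-of-variation cutoff are illustrative guesses rather than tuned values.

```python
from statistics import mean, stdev

RECORD_DETAIL_PREFIX = "/api/records/"   # the endpoint the scraper enumerated


def looks_scripted(session):
    """session: ordered list of (unix_ts, path) pairs for one session cookie.

    Flags sessions that only walk record-detail pages with near-constant
    spacing between requests; real browsing mixes endpoints and is bursty.
    """
    if len(session) < 10:
        return False
    if not all(path.startswith(RECORD_DETAIL_PREFIX) for _, path in session):
        return False
    gaps = [b[0] - a[0] for a, b in zip(session, session[1:])]
    avg = mean(gaps)
    if avg <= 0:
        return True                       # a zero-interval burst is its own flag
    return stdev(gaps) / avg < 0.2        # scripted clients pace requests evenly
```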

What it would have cost to prevent

The honest math: had we deployed SecureNow's free firewall before the incident, the 500k-IP blocklist would have caught roughly 30% of the scraping nodes for $0. The behavioral detection (a paid feature) would have caught most of the rest within an hour because the per-session pattern is distinctive.

Estimated cost of prevention: $0/month for the firewall, ~$50/month for SecureNow's full tier on our traffic volume.

Cost of the incident: $3,400 + ~40 engineer-hours of investigation + the value of the scraped data + the customer-trust hit when we eventually had to explain why our public catalog was now in a competitor's product.

The uncomfortable observation

We had budget for a $20/mo bot-protection add-on that didn't catch the attack. We didn't have budget for a $0 firewall layer that would have caught 30% of it on day one. The decision wasn't about money; it was about not having seen a credible attack pattern, so we underweighted the protection.

This is the typical pattern. Teams underinvest in defenses until an incident forces a re-evaluation. The post-incident decision is always "of course we should have done this from the start." It's never that obvious before the incident.

Recommendations for teams that haven't been hit yet

If you have a public-facing app with valuable data and you haven't had a scraping incident yet:

  1. Install an IP-reputation firewall today. SecureNow Firewall is free and takes 5 minutes to set up; it's built on AbuseIPDB threat intelligence and refreshed hourly.
  2. Add source-distribution monitoring. Unique-IPs-per-endpoint-per-hour, with rolling-baseline alerting. Doesn't matter what tool — just have it.
  3. Set rolling-baseline egress alerts. Static dollar thresholds are useless. Set "2× the previous 24-hour rate" or similar.
  4. Audit your rate limits at the AS level. If your limits are per-IP only, anyone with a residential-proxy network bypasses them by definition.

The next scraping campaign that finds your app is a question of when, not if. The cost difference between catching it in hour 6 vs hour 36 is approximately $3,000.


Frequently Asked Questions

How did the scraping go undetected for 72 hours?

Three reasons: the rate per IP was below our rate-limit threshold (rotating residential proxies), the user agents were realistic browser strings, and our APM dashboards aggregated by endpoint, not by source — the abnormal pattern was invisible at the level we were looking at.

Why did Cloudflare not catch it?

We weren't on Cloudflare. The site was on Vercel, and our 'security' was a $20/mo bot protection add-on that didn't engage on residential IPs.

What's the actual fix?

Three layers: an IP-reputation firewall blocking known-bad addresses, behavioral detection that catches per-source anomalies regardless of UA, and rate limiting at the AS-number level not just per-IP.

Did the perpetrator pay any price?

No. We identified the AS owner (a residential-proxy provider), filed an abuse report, and got the standard 'we have many customers, we cannot identify the specific user' response. By then the data had already been extracted.
