
Rate Limiting: The Feature Nobody Thinks About Until It's Too Late

December 15, 2025 · 9 min read
Security · API · Rate Limiting · Architecture · AWS


Nobody puts "implement rate limiting" on the sprint board. It's not a user story. It doesn't move a metric. Product never asks for it.

Then one day, someone scripts 50,000 requests to your API in 30 seconds and your database melts. Or worse — a single user's runaway script costs you $800 in AWS Lambda invocations overnight.

Both of these happened to me. Now rate limiting is in my starter template.

The Three Layers

I implement rate limiting at three layers, because each catches different abuse patterns:

Layer 1: Edge (CloudFront / Vercel)

```
Rate: 100 requests per 5 minutes per IP
Purpose: Stop brute force and scrapers before they hit your app
Cost: $0 (included in CloudFront / Vercel)
```

This is your first defense. It runs at the CDN edge, so abusive traffic never reaches your server. The rate is generous enough that no real user hits it, but tight enough to stop automated abuse.
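As a concrete sketch, an edge rule like this maps onto an AWS WAFv2 rate-based statement. The object below follows the shape of the JSON you would pass to `aws wafv2 create-web-acl`; treat the field values and rule name as illustrative, not a drop-in config for any particular stack.

```typescript
// Illustrative AWS WAFv2 rate-based rule. WAFv2 evaluates the limit over a
// trailing 5-minute window per aggregation key, which matches the
// "100 requests per 5 minutes per IP" budget above.
const edgeRateRule = {
  Name: "edge-rate-limit",
  Priority: 0,
  Statement: {
    RateBasedStatement: {
      Limit: 100,             // max requests per 5-minute window
      AggregateKeyType: "IP", // count per source IP
    },
  },
  Action: { Block: {} },      // blocked requests never reach the origin
  VisibilityConfig: {
    SampledRequestsEnabled: true,
    CloudWatchMetricsEnabled: true,
    MetricName: "edge-rate-limit",
  },
};

console.log(edgeRateRule.Statement.RateBasedStatement.Limit); // 100
```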

Layer 2: Application (per-user)

```typescript
// Simple in-memory sliding-window rate limiter
const limits = new Map<string, number[]>();

function rateLimit(userId: string, windowMs = 60000, max = 30): boolean {
  const now = Date.now();
  // Drop hits that have aged out of the window, then record this one
  const hits = (limits.get(userId) || []).filter(t => now - t < windowMs);
  hits.push(now);
  limits.set(userId, hits);
  return hits.length > max; // true = over the limit, reject the request
}
```

30 requests per minute per authenticated user. This prevents a legitimate user's runaway script from monopolizing your API.
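The window behavior is easy to check with a quick demo. The limiter is restated below so the snippet runs on its own; with `max = 30`, the 31st call inside the window is the first one rejected.

```typescript
// Sliding-window limiter from above, restated for a self-contained demo.
const limits = new Map<string, number[]>();

function rateLimit(userId: string, windowMs = 60000, max = 30): boolean {
  const now = Date.now();
  const hits = (limits.get(userId) || []).filter(t => now - t < windowMs);
  hits.push(now);
  limits.set(userId, hits);
  return hits.length > max; // true = rate limited
}

// 31 back-to-back calls: calls 1-30 pass, call 31 is limited.
const outcomes: boolean[] = [];
for (let i = 0; i < 31; i++) outcomes.push(rateLimit("user-42"));

console.log(outcomes.filter(Boolean).length); // 1
```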

Layer 3: Endpoint-specific

Not all endpoints are equal. A search endpoint can handle 100 req/min. A payment endpoint should allow 5 req/min (nobody legitimately submits 10 payments per minute).

```typescript
const endpointLimits: Record<string, { window: number; max: number }> = {
  '/api/search': { window: 60000, max: 100 },
  '/api/contact': { window: 60000, max: 5 },
  '/api/subscribe': { window: 3600000, max: 3 }, // 3 per hour
  '/api/export': { window: 3600000, max: 10 },
};
```
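One way to apply a table like that is to key the limiter on user + path, so each user gets an independent budget per endpoint. This is a sketch: `endpointLimited`, the `limited` helper, and the fallback rule are my names and assumptions, not code from the original.

```typescript
const hitLog = new Map<string, number[]>();

// Same sliding-window check as Layer 2, keyed on an arbitrary string.
function limited(key: string, windowMs: number, max: number): boolean {
  const now = Date.now();
  const hits = (hitLog.get(key) || []).filter(t => now - t < windowMs);
  hits.push(now);
  hitLog.set(key, hits);
  return hits.length > max;
}

const endpointLimits: Record<string, { window: number; max: number }> = {
  '/api/search': { window: 60000, max: 100 },
  '/api/contact': { window: 60000, max: 5 },
};

// Look up the endpoint's rule; unlisted endpoints fall back to the
// Layer 2 default of 30 requests per minute.
function endpointLimited(userId: string, path: string): boolean {
  const rule = endpointLimits[path] ?? { window: 60000, max: 30 };
  return limited(`${userId}:${path}`, rule.window, rule.max);
}
```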

What to Return

When a user is rate limited, return a 429 with helpful headers:

```typescript
return new Response(JSON.stringify({
  error: 'Too many requests',
  retryAfter: Math.ceil(retryAfterMs / 1000),
}), {
  status: 429,
  headers: {
    'Retry-After': String(Math.ceil(retryAfterMs / 1000)),
    'X-RateLimit-Limit': String(max),
    'X-RateLimit-Remaining': String(Math.max(0, max - hits.length)),
    'X-RateLimit-Reset': String(Math.ceil((now + windowMs) / 1000)),
  },
});
```

The `Retry-After` header tells well-behaved clients when to try again. The X-RateLimit headers let them track their own usage. This is the difference between "your API is broken" and "I need to slow down."
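One way to derive `retryAfterMs` for that response, assuming the sliding-window limiter from Layer 2: the window frees a slot when the oldest recorded hit ages out. The helper name is mine; `hits` is assumed sorted oldest-first, which is how the limiter builds it.

```typescript
// Milliseconds until the oldest hit leaves the window, clamped at zero.
function retryAfterMsFrom(hits: number[], windowMs: number, now: number): number {
  if (hits.length === 0) return 0;
  return Math.max(0, hits[0] + windowMs - now);
}

// Oldest hit was 45s ago in a 60s window: retry in 15s.
const now = 100_000;
console.log(retryAfterMsFrom([now - 45_000], 60_000, now)); // 15000
```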

The Real-World Test

For my portfolio's WAF rate limiting, I wrote an attack simulation:

```javascript
// waf-attack-sim.mjs
const url = 'https://api.sageideas.dev/metrics/latest';
const results = { success: 0, limited: 0 };

for (let i = 0; i < 200; i++) {
  const res = await fetch(url);
  if (res.status === 429) results.limited++;
  else results.success++;
  // No delay — intentionally aggressive
}

console.log(results); // Expected: ~100 success, ~100 limited
```

I run this quarterly. It proves the rate limiting actually works. The evidence is in my artifacts library — real 429 responses from a real attack simulation.

When Rate Limiting Isn't Enough

Rate limiting is table stakes. For real protection, you also need:

  • Request signing for webhooks (verify the sender)
  • CAPTCHA for public forms (stop bots)
  • API keys for programmatic access (identity + rate limit per key)
  • Cost alerts in AWS (catch runaway Lambda invocations)
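The first bullet, request signing, can be sketched with Node's built-in `crypto`: the sender HMACs the raw body with a shared secret, and the receiver recomputes and compares in constant time. The header convention and hex encoding here are illustrative, not any specific provider's scheme.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sender side: HMAC-SHA256 over the raw body, hex-encoded.
function sign(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Receiver side: recompute and compare with timingSafeEqual to avoid
// leaking match length through response timing.
function verify(body: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(sign(body, secret), "hex");
  const given = Buffer.from(signature, "hex");
  return expected.length === given.length && timingSafeEqual(expected, given);
}

const body = JSON.stringify({ event: "payment.settled" });
console.log(verify(body, sign(body, "shared-secret"), "shared-secret")); // true
console.log(verify(body, sign(body, "wrong-secret"), "shared-secret"));  // false
```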

Rate limiting is like a lock on your door. It keeps honest people honest. For dedicated attackers, you need more.

Want to see this in action?

Check out the projects and case studies behind these articles.