The Bug That Taught Me More Than Any Course Ever Did

February 1, 2026 · 9 min read
Debugging · Stripe · Webhooks · Postmortem · Lessons Learned

I want to tell you about a bug. Not a fun one. Not a clever one. The kind that makes your stomach drop when you get the Slack notification at 11pm on a Thursday.

What Happened

I was building subscription billing for Nexural. Stripe webhook comes in — `invoice.payment_succeeded`. My handler updates the user's subscription status to "active" and extends their access period.

Simple. Tested it manually. Worked perfectly. Deployed it.

Three weeks later, a customer emails: "I was charged twice."

Then another. Then two more.

The Root Cause

Stripe retries failed webhook deliveries. My endpoint was returning a 500 on about 1 in 50 requests — a transient database connection timeout. So Stripe would retry. My handler would run again. And because I wasn't checking whether I'd already processed that specific event, it would extend the subscription again and Stripe would create another invoice line item.

The fix was embarrassingly simple:

```typescript
// Before: no idempotency check
async function handlePaymentSucceeded(event) {
  const invoice = event.data.object;
  await db.subscriptions.update({
    where: { stripeCustomerId: invoice.customer },
    data: {
      status: 'active',
      periodEnd: new Date(invoice.period_end * 1000)
    }
  });
}

// After: idempotent
async function handlePaymentSucceeded(event) {
  const processed = await db.webhookEvents.findUnique({
    where: { eventId: event.id }
  });
  if (processed) return; // Already handled this exact event

  const invoice = event.data.object;

  // Update the subscription and record the event in one transaction,
  // so either both writes happen or neither does
  await db.$transaction([
    db.subscriptions.update({
      where: { stripeCustomerId: invoice.customer },
      data: {
        status: 'active',
        periodEnd: new Date(invoice.period_end * 1000)
      }
    }),
    db.webhookEvents.create({
      data: {
        eventId: event.id,
        type: event.type,
        processedAt: new Date()
      }
    })
  ]);
}
```

Eight lines of code. That's all it took to prevent the bug. But I didn't write them because I'd never been bitten by webhook retries before.

What I Actually Learned

1. Manual testing doesn't catch timing bugs.

I tested the webhook flow 20+ times before deploying. It worked every time, because my local Supabase instance never had connection timeouts. The bug only manifested under real-world network conditions with real Stripe retry behavior.

2. Idempotency isn't optional for financial operations.

I knew the word. I'd read about it. I even had it in my interview talking points. But I didn't implement it because the code "worked." The difference between knowing a concept and internalizing it is getting burned by it.

3. The scariest bugs are the ones that work most of the time.

If this bug had failed 100% of the time, I would have caught it in development. It failed 2% of the time. That's the worst kind — rare enough to slip through testing, common enough to hurt real people.

4. Webhooks are distributed systems problems.

A webhook handler isn't a simple HTTP endpoint. It's a message consumer in a distributed system with at-least-once delivery semantics. The moment I started thinking about it that way, the idempotency requirement became obvious.
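To make the at-least-once framing concrete, here is a minimal in-memory sketch. It is not the production handler: `deliverWithRetries`, `makeFlakyReceiver`, and the event ID `evt_123` are hypothetical names, and the failure is assumed to happen after the side effect has already been applied. The sender redelivers until the receiver stops throwing, which stands in for a retry after a non-2xx response.

```typescript
// Hypothetical event shape; only `id` matters for deduplication.
interface WebhookEvent {
  id: string;
  type: string;
}

type Handler = (evt: WebhookEvent) => void;

// Simulated sender with at-least-once delivery: it redelivers the same
// event until the receiver returns without throwing (the "2xx" case).
function deliverWithRetries(evt: WebhookEvent, handler: Handler, maxAttempts = 5): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      handler(evt);
      return attempt; // acknowledged
    } catch {
      // non-2xx response: loop around and redeliver
    }
  }
  return maxAttempts;
}

// Simulated receiver: applies a side effect, then fails exactly once
// *after* the side effect (assumption: the failure hits late in the handler).
function makeFlakyReceiver(dedupe: boolean) {
  const seen = new Set<string>(); // stand-in for a durable event log
  let extensions = 0;
  let failNext = true;

  const handler: Handler = (evt) => {
    if (dedupe && seen.has(evt.id)) return; // already applied, just acknowledge
    extensions += 1; // the side effect: extend the subscription
    seen.add(evt.id);
    if (failNext) {
      failNext = false;
      throw new Error("simulated transient failure");
    }
  };

  return { handler, count: () => extensions };
}

const paymentEvent: WebhookEvent = { id: "evt_123", type: "invoice.payment_succeeded" };

const naive = makeFlakyReceiver(false);
const naiveAttempts = deliverWithRetries(paymentEvent, naive.handler);

const idempotent = makeFlakyReceiver(true);
const idempotentAttempts = deliverWithRetries(paymentEvent, idempotent.handler);

console.log(naive.count(), idempotent.count()); // 2 1
```

Both receivers see the event twice. Only the one that remembers event IDs applies the side effect once, which is the property an at-least-once consumer needs.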

What Changed in My Process

After this incident, I implemented three things:

A webhook event log table. Every Stripe event gets recorded before processing. If the event ID already exists, skip it. This is now in every project I build.

A billing test script. I wrote a script that simulates 50 rapid-fire webhook deliveries of the same event. If the system processes it more than once, the test fails. This runs in CI.

A "payment bugs" checklist. Before any billing code ships, I check: idempotency, proration handling, dunning behavior, refund edge cases, currency rounding. It's 12 items. Takes 10 minutes to review. Would have caught this bug.
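The first two items, the event log and the duplicate-delivery test, can be sketched together. This is an illustration under stated assumptions rather than the actual script: `WebhookEventLog`, `makeReceiver`, and `deliverDuplicates` are hypothetical names, the log is an in-memory `Set` standing in for a table with a unique constraint on the event ID, and the 1 ms sleep stands in for database work. The `recordFirst` flag switches between recording the event before processing and a check-then-write variant, so the test can show what it would catch.

```typescript
// Hypothetical event shape; only `id` matters for deduplication.
interface WebhookEvent {
  id: string;
  type: string;
}

// In-memory stand-in for a webhook event log table. In a real database the
// Set would be a table with a unique constraint on the event ID.
class WebhookEventLog {
  private ids = new Set<string>();

  has(eventId: string): boolean {
    return this.ids.has(eventId);
  }

  // Records the event and returns true, or returns false if it already exists.
  tryRecord(eventId: string): boolean {
    if (this.ids.has(eventId)) return false;
    this.ids.add(eventId);
    return true;
  }

  // Removes the record so that a later retry is not skipped.
  remove(eventId: string): void {
    this.ids.delete(eventId);
  }
}

const sleep = () => new Promise<void>((resolve) => setTimeout(resolve, 1));

function makeReceiver(recordFirst: boolean) {
  const eventLog = new WebhookEventLog();
  let timesProcessed = 0;

  async function handle(evt: WebhookEvent): Promise<void> {
    if (recordFirst) {
      // Record before processing: the check and the insert are one step.
      if (!eventLog.tryRecord(evt.id)) return;
      try {
        await sleep(); // simulated database work
        timesProcessed += 1;
      } catch (err) {
        eventLog.remove(evt.id); // let the sender's retry process it
        throw err;
      }
    } else {
      // Check, process, then record: duplicates slip in during the await.
      if (eventLog.has(evt.id)) return;
      await sleep();
      timesProcessed += 1;
      eventLog.tryRecord(evt.id);
    }
  }

  return { handle, timesProcessed: () => timesProcessed };
}

// Fires `copies` concurrent deliveries of one event and reports how many
// of them were actually processed.
async function deliverDuplicates(copies: number, recordFirst: boolean): Promise<number> {
  const receiver = makeReceiver(recordFirst);
  const evt: WebhookEvent = { id: "evt_duplicate", type: "invoice.payment_succeeded" };
  await Promise.all(Array.from({ length: copies }, () => receiver.handle(evt)));
  return receiver.timesProcessed();
}
```

A CI check along these lines would call `deliverDuplicates(50, true)` and fail unless the result is 1. With `recordFirst` set to `false`, all 50 copies pass the lookup before any of them writes, so the count comes back as 50. In a single Node process the `Set` insert is atomic by construction; across several processes that guarantee would have to come from the database's unique constraint.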

The Uncomfortable Truth

Every senior engineer has a bug like this. The ones who are actually senior will tell you about it. The ones who pretend they've never shipped broken code are either lying or haven't built anything that matters.

I refunded the 4 customers immediately, explained what happened, and gave them a free month. Three of them are still subscribers. The fourth one would have churned anyway.

The bug cost me maybe $200 in refunds. The lesson was worth infinitely more.
