Engineering October 22, 2025 · 10 min read

How We Built Webhook Retries That Actually Work

Exponential backoff, deduplication, idempotency keys, and the edge cases that break most webhook engines. Lessons from production.

Webhooks are deceptively simple in concept — "just POST some JSON to a URL when something happens." In practice, building a reliable webhook delivery system is one of the hardest parts of payment infrastructure.

Why webhooks matter more in Africa

In markets with synchronous checkout flows (card payments via Stripe), webhooks are a nice-to-have confirmation. Your server already knows the payment succeeded because the checkout session returned a success status.

With USSD-based payments, webhooks aren't optional — they're the primary notification mechanism. When a customer initiates a Telebirr payment, your server sends a push request and gets back an acknowledgment that the push was sent. The actual payment completion comes minutes later, via webhook.

If that webhook doesn't arrive, your customer paid but your system doesn't know it. That's a support ticket, a refund request, or a lost customer.

The retry schedule

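Our webhook engine uses a configurable backoff schedule in the spirit of exponential backoff: the default delays grow by roughly an order of magnitude at each step rather than by a fixed multiplier. The default is five attempts: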

  1. Immediate — first delivery attempt
  2. 30 seconds — quick retry for transient network issues
  3. 5 minutes — gives the receiving server time to recover from a brief outage
  4. 1 hour — handles longer outages
  5. 24 hours — final attempt, catches multi-hour downtime

Each project can customize this schedule via the dashboard or API. Some merchants want aggressive retries (every 30 seconds for 5 minutes). Others want a longer tail (retry over 72 hours).
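
To make the schedule concrete, here is a minimal sketch of how it could be represented and queried. The names `DEFAULT_SCHEDULE_SECONDS`, `AGGRESSIVE_SCHEDULE_SECONDS`, and `delayBeforeAttempt` are hypothetical and not part of the Zirzir API. The sketch also assumes each delay is measured from the previous attempt, which the schedule above does not specify.

```typescript
// Hypothetical encoding of the default schedule: seconds to wait before each
// attempt. Assumption: each delay is relative to the previous attempt.
const DEFAULT_SCHEDULE_SECONDS: readonly number[] = [0, 30, 300, 3600, 86400];

// Returns the delay before the given attempt (1-based), or null once the
// schedule is exhausted and the delivery should be marked as failed.
function delayBeforeAttempt(
  attempt: number,
  schedule: readonly number[] = DEFAULT_SCHEDULE_SECONDS,
): number | null {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  return schedule[attempt - 1] ?? null;
}

// One way to express "every 30 seconds for 5 minutes": an immediate first
// attempt followed by ten retries spaced 30 seconds apart.
const AGGRESSIVE_SCHEDULE_SECONDS: readonly number[] = [
  0,
  ...new Array<number>(10).fill(30),
];
```

Keeping the schedule as plain data makes a per-project override a matter of swapping one array for another.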

Deduplication

Network issues and crashes can cause the same webhook to be delivered twice. Suppose the merchant's server crashes after receiving the webhook but before sending the 200 response. Our engine never sees the acknowledgment, so it retries, and the merchant's system may now process the same payment twice.

Every Zirzir webhook includes an idempotency_key in the payload. Merchants should use this key to deduplicate on their end. But we also deduplicate on our end — if we detect that a previous delivery attempt received a 200, we won't retry.
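
On the merchant side, deduplication can be as simple as remembering which keys have already been handled. The sketch below is illustrative only: `idempotency_key` is the one field confirmed above, while the `type` field, `handleOnce`, and the in-memory `processedKeys` set are assumptions made for the example.

```typescript
// Assumed payload shape. Only `idempotency_key` comes from the post;
// `type` is a hypothetical field added for illustration.
interface WebhookEvent {
  idempotency_key: string;
  type: string;
}

// In-memory store for illustration. A real consumer would want a durable
// store (for example a table with a unique constraint on the key) so that
// duplicates are still detected after a restart.
const processedKeys = new Set<string>();

// Runs `handler` at most once per idempotency key.
// Returns true if the event was processed, false if it was a duplicate.
function handleOnce(
  event: WebhookEvent,
  handler: (e: WebhookEvent) => void,
): boolean {
  if (processedKeys.has(event.idempotency_key)) {
    return false; // duplicate: acknowledge it, but do no work
  }
  handler(event);
  // Record the key only after the handler returns, so a failure inside the
  // handler leaves the event eligible for the next retry.
  processedKeys.add(event.idempotency_key);
  return true;
}
```

With a durable store, recording the key and applying the payment update in the same transaction closes the remaining gap between the two steps.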

Signature verification

Every webhook payload is signed with HMAC-SHA256 using the project's webhook secret. The signature is included in the X-Zirzir-Signature header.

```typescript
const isValid = verifyWebhook(
  req.body,
  req.headers['x-zirzir-signature'],
  process.env.WEBHOOK_SECRET
);
```

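All three SDKs (TypeScript, Python, Go) include webhook verification utilities. Never process a webhook without verifying its signature: this is what stops forged and tampered payloads. Note that a signature check alone does not stop someone from replaying a previously valid, correctly signed webhook. Deduplicating on idempotency_key covers that case.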
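
For readers curious about what such a check involves, here is a minimal sketch of HMAC-SHA256 verification using Node's built-in crypto module. It is not the SDK's implementation. It assumes the signature header is hex-encoded and computed over the raw request body, neither of which is specified above, and `verifySignature` is a hypothetical name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of an HMAC-SHA256 check.
// Assumptions: hex-encoded signature, computed over the raw body bytes.
function verifySignature(
  rawBody: string | Buffer,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  if (received.length !== expected.length) {
    return false;
  }
  // Constant-time comparison avoids leaking how many bytes matched.
  return timingSafeEqual(received, expected);
}
```

If the scheme signs the raw bytes, as assumed here, the check has to run against the body exactly as received. Re-serializing parsed JSON can change whitespace or key order and produce a different digest.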

Delivery logs

Every delivery attempt is logged with the full request/response details: HTTP status code, response body (first 1KB), latency, and any connection errors. These logs are available in the dashboard and via API.
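
As an illustration, a delivery log entry along these lines might be modeled as follows. The field names, `truncateBody`, and `summarize` are hypothetical and not the actual log schema. Treating any 2xx status as a successful delivery is also an assumption.

```typescript
// Hypothetical record for one delivery attempt; field names are illustrative.
interface DeliveryAttempt {
  attempt: number;
  statusCode: number | null; // null when no HTTP response was received
  responseBody: string; // truncated before storage
  latencyMs: number;
  error: string | null; // connection error, if any
}

const MAX_LOGGED_BODY_BYTES = 1024;

// Keeps at most `maxBytes` of UTF-8 from a response body.
function truncateBody(body: string, maxBytes = MAX_LOGGED_BODY_BYTES): string {
  const bytes = new TextEncoder().encode(body);
  if (bytes.length <= maxBytes) {
    return body;
  }
  // A multi-byte character cut in half decodes to U+FFFD; drop it.
  const cut = new TextDecoder().decode(bytes.subarray(0, maxBytes));
  return cut.replace(/\uFFFD+$/, "");
}

// Summarizes a delivery: how many attempts were made and whether any of
// them was acknowledged (assumption: any 2xx counts as acknowledged).
function summarize(log: DeliveryAttempt[]): { attempts: number; delivered: boolean } {
  const delivered = log.some(
    (a) => a.statusCode !== null && a.statusCode >= 200 && a.statusCode < 300,
  );
  return { attempts: log.length, delivered };
}
```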

When a merchant reports a missing webhook, you can immediately see whether the webhook was sent, how many attempts were made, and what response each attempt received. No more guessing.

Manual replay

Sometimes webhooks fail for reasons outside the retry window — the merchant's server was down for a week, or they had a bug that returned 500 for a specific event type. The dashboard provides a manual replay button that re-delivers any webhook with a single click.

The edge cases

The hardest problems weren't the obvious ones. They were edge cases like:

  • Rapid status changes. A payment goes from pending to completed to refunded within seconds. Each status change triggers a webhook. If the first webhook is still being retried when the third fires, the merchant might receive them out of order.
  • Webhook URL changes. A merchant updates their webhook URL while deliveries are in-flight. Do pending retries go to the old URL or the new one?
  • Payload size. Some transaction metadata can be large. We cap webhook payloads at 64KB and include a truncated flag if the full payload exceeds this limit.

Each of these required careful thought and testing against real production scenarios.
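
For the out-of-order case, one option on the consumer side is to only ever move a payment forward through its lifecycle. The sketch below assumes the pending, completed, refunded ordering from the example above. `applyStatus`, `STATUS_RANK`, and the in-memory `currentStatus` map are hypothetical names, and a real consumer would need a durable store.

```typescript
// Assumed lifecycle order, taken from the rapid-status-change example.
type PaymentStatus = "pending" | "completed" | "refunded";

const STATUS_RANK: Record<PaymentStatus, number> = {
  pending: 0,
  completed: 1,
  refunded: 2,
};

// Hypothetical local store of the latest known status per payment.
const currentStatus = new Map<string, PaymentStatus>();

// Applies an incoming status only if it moves the payment forward.
// Returns true if the stored status changed, false for stale or repeated
// events that should be acknowledged and otherwise ignored.
function applyStatus(paymentId: string, incoming: PaymentStatus): boolean {
  const known = currentStatus.get(paymentId);
  if (known !== undefined && STATUS_RANK[incoming] <= STATUS_RANK[known]) {
    return false;
  }
  currentStatus.set(paymentId, incoming);
  return true;
}
```

With this in place, a late "pending" webhook that arrives after "refunded" leaves the stored status untouched.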

Lessons learned

  1. Log everything. Webhook delivery is the most common source of support requests. Having detailed delivery logs saves hours of debugging.
  2. Make retries configurable. Different merchants have different uptime profiles and different tolerance for delayed notifications.
  3. Sign everything. Webhook security isn't optional in payments.
  4. Assume out-of-order delivery. Design your webhook consumers to handle events arriving in any order.