Rate Limiting in Next.js That Actually Works

Edge middleware + Vercel KV gives you proper rate limiting without an extra service. Here's the production setup with the gotchas Vercel docs don't cover.

Lhoussine
May 9, 2026 · 6 min read


The Next.js docs and the Vercel docs both ship rate-limiting examples. Both have subtle issues that work fine in development and break in production. Here's a version that actually works on Vercel and self-hosted deployments alike.

The setup

npm install @upstash/ratelimit @upstash/redis
# or, if you're on Vercel and prefer KV:
npm install @vercel/kv

The @upstash/ratelimit package handles the bucket math correctly (sliding window or token bucket) and works in Edge runtime. Building it yourself is doable but rarely worth it.

The middleware

// middleware.ts
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();

const generalLimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(100, '1 m'),
  analytics: true,
  prefix: '@securenow/general',
});

const authLimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(5, '5 m'),
  analytics: true,
  prefix: '@securenow/auth',
});

export async function middleware(request: NextRequest) {
  // Note: every request without an x-forwarded-for header shares the 'unknown' bucket
  const ip = request.headers.get('x-forwarded-for')?.split(',')[0]?.trim() || 'unknown';
  const path = request.nextUrl.pathname;

  const limit = path.startsWith('/api/auth') ? authLimit : generalLimit;
  const { success, limit: max, remaining, reset } = await limit.limit(ip);

  if (!success) {
    // `reset` is a Unix timestamp in milliseconds for when the window resets
    const retryAfter = Math.max(0, Math.ceil((reset - Date.now()) / 1000));
    return new NextResponse(
      JSON.stringify({ error: 'Rate limit exceeded' }),
      {
        status: 429,
        headers: {
          'Content-Type': 'application/json',
          'Retry-After': String(retryAfter),
          'X-RateLimit-Limit': String(max),
          'X-RateLimit-Remaining': String(remaining),
          'X-RateLimit-Reset': String(reset),
        },
      }
    );
  }

  return NextResponse.next();
}

export const config = {
  matcher: ['/api/:path*', '/login'],
};

Two limiters with different windows cover the typical needs: 100 requests/min for the general API, 5 per 5 minutes for auth endpoints.

What's actually different from the naive examples

The matcher. Most examples use the default matcher, which runs the middleware on every request, including static assets — wasted KV lookups and wasted edge invocations on Vercel. Scope it to API and auth routes.

Sliding window. Ratelimit.slidingWindow is the right choice for most cases. The fixed-window alternative has burstiness issues (allows 2x requests at the boundary).
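To make the boundary issue concrete, here's a minimal fixed-window simulation in plain TypeScript (no Redis involved; fixedWindowAllows is a toy helper written for this post, not part of @upstash/ratelimit). With a 100-per-60s fixed window, 100 requests at t=59s and 100 more at t=61s all pass, so 200 requests land within two seconds:

```typescript
// Toy fixed-window counter: windows are [0,60), [60,120), ... seconds.
function fixedWindowAllows(
  counts: Map<number, number>,
  tSeconds: number,
  limit: number,
  windowSeconds: number,
): boolean {
  const window = Math.floor(tSeconds / windowSeconds);
  const used = counts.get(window) ?? 0;
  if (used >= limit) return false;
  counts.set(window, used + 1);
  return true;
}

// 100 requests just before the boundary, 100 just after: every one passes.
const counts = new Map<number, number>();
let passed = 0;
for (let i = 0; i < 100; i++) if (fixedWindowAllows(counts, 59, 100, 60)) passed++;
for (let i = 0; i < 100; i++) if (fixedWindowAllows(counts, 61, 100, 60)) passed++;
console.log(passed); // 200 — double the nominal limit in a 2-second span
```

A sliding window weights the previous window's count into the current one, so the same burst gets rejected.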

The Redis instance. Redis.fromEnv() reads UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN. Set these in Vercel env vars or your hosting provider.
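For local development that means a .env.local with these two variables (the values below are placeholders; copy the real ones from the REST API section of the Upstash console):

```shell
# .env.local — placeholder values, replace with your own from the Upstash console
UPSTASH_REDIS_REST_URL="https://<your-db>.upstash.io"
UPSTASH_REDIS_REST_TOKEN="<your-rest-token>"
```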

The prefix. Different limiters need different prefixes, or they read and write the same keys and share one bucket. This is easy to miss, and the failure mode is subtle: auth attempts silently count against the general limit and vice versa.

Per-user rate limiting

For authenticated routes, key by user ID:

import { getToken } from 'next-auth/jwt';

export async function middleware(request: NextRequest) {
  const token = await getToken({ req: request }); // reads NEXTAUTH_SECRET from env
  const key = token?.sub || request.headers.get('x-forwarded-for')?.split(',')[0]?.trim() || 'unknown';
  // ...rest of the rate limiting
}

This keeps a corporate office network NAT'd to one IP from sharing a single rate-limit bucket.
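One way to sketch that fallback as a helper (rateLimitKey is a name invented for this post, not a library function). Prefixing the key also guarantees user buckets and IP buckets can never collide:

```typescript
// Hypothetical helper: prefer a stable user ID, fall back to client IP.
// The 'user:' / 'ip:' prefixes keep the two key spaces disjoint.
function rateLimitKey(
  userId: string | null | undefined,
  forwardedFor: string | null,
): string {
  if (userId) return `user:${userId}`;
  // x-forwarded-for may be a comma-separated chain; first entry is the client
  const ip = forwardedFor?.split(',')[0]?.trim();
  return ip ? `ip:${ip}` : 'ip:unknown';
}

// In the middleware:
// const { success } = await limit.limit(
//   rateLimitKey(token?.sub, request.headers.get('x-forwarded-for')),
// );
```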

Costs

Upstash free tier: 10K commands/day. Each rate limit check is ~3 commands (read, increment, expire). So free tier covers about 3.3K requests/day. Paid tier is $0.20/100K commands.

For most Next.js apps the free tier is enough for development; production runs on the $5–$30/month paid tier. Vercel KV pricing is similar — generous free tier, scales with usage.

What this doesn't solve

Same as Express:

  • Distributed attacks (rotating IPs)
  • Bot traffic that mimics legitimate users
  • Application-level abuse (cart cycling, coupon stacking)

For those, layer in IP-reputation firewall (the free SecureNow Firewall on Node runtime) and behavioral detection.

Production checklist

  • ✓ Sliding window, not fixed window
  • ✓ Per-route prefixes, not one bucket for everything
  • ✓ Different limits for general vs auth
  • ✓ Per-user keys when authenticated
  • ✓ Matcher scoped to API routes only
  • ✓ X-RateLimit-* response headers (good API hygiene)
  • ✓ Test with synthetic burst traffic before deploying
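For the last item, a quick burst script is enough. This is a sketch: the URL and request count are placeholders to adjust for your deployment, and summarize is a helper defined here, not a library call.

```typescript
// Tally status codes from a burst of responses.
function summarize(statuses: number[]): { ok: number; limited: number; other: number } {
  let ok = 0, limited = 0, other = 0;
  for (const s of statuses) {
    if (s === 429) limited++;
    else if (s >= 200 && s < 300) ok++;
    else other++;
  }
  return { ok, limited, other };
}

// Fire n concurrent requests at the target and report the split.
async function burst(url: string, n: number): Promise<void> {
  const statuses = await Promise.all(
    Array.from({ length: n }, () => fetch(url).then((r) => r.status)),
  );
  console.log(summarize(statuses));
}

// Usage: npx tsx burst.ts https://your-app.example.com/api/health
if (process.argv[2]) burst(process.argv[2], 150);
```

With the 100/min general limit, roughly a third of those 150 requests should come back 429; if none do, the limiter isn't wired up.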


Frequently Asked Questions

Does in-memory rate limiting work on Vercel?

No. Edge and Serverless functions on Vercel can run on a different instance for every request, so module-scope state doesn't reliably persist. Use a shared store — Vercel KV, Upstash, or another HTTP-accessible Redis.
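The broken pattern, for reference, is a module-scope counter like this. Each instance gets its own hits map, so a client spread across k instances effectively gets k times the limit, and every cold start resets the count to zero:

```typescript
// Anti-pattern: per-instance state. Correct on a single long-lived
// server, silently wrong on per-instance serverless/edge runtimes.
const hits = new Map<string, number>();

function inMemoryAllows(ip: string, limit: number): boolean {
  const n = (hits.get(ip) ?? 0) + 1;
  hits.set(ip, n);
  return n <= limit; // only counts requests THIS instance has seen
}
```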

What's cheaper, KV or Upstash?

Upstash is generally cheaper for high request volumes (free tier covers 10K commands/day). Vercel KV is more convenient if you're already on Vercel — same dashboard, one bill.

Should I use Edge or Node runtime for rate limiting?

Edge for the fast-path block; Node only if you need libraries that don't work at Edge. The simpler your rate-limiting middleware, the better — Edge runtime restrictions force minimalism, which is good.

What about per-user rate limiting?

Use the user ID from your session/JWT as the key instead of IP. For unauthenticated traffic fall back to IP.
