Docs Rate limits

Rate limits

UCFP enforces three independent budgets: anonymous demo, authenticated per-minute, authenticated per-day. Hitting any of them returns 429 Too Many Requests with explanatory headers.

Anonymous demo

Budget	Value	Scope
Requests per minute	60	per IP address
Daily quota	none	—
Body size cap	64 KiB text / 4 MiB image / 8 MiB audio	per request

Hosted demo callers also need to clear a Cloudflare Turnstile challenge on first contact in a session. The Turnstile token is cached for 30 minutes; subsequent calls in the same session skip the challenge.

The 60 / minute counter resets each rolling minute. Reaching it returns 429 with Retry-After: <seconds-until-next-window>.

Authenticated (default for new keys)

Budget	Value	Scope
Requests per minute	600	per key
Daily quota	50 000	per key
Body size cap	32 MiB	per request

Both budgets refresh independently. Hitting the per-minute budget delays you by ≤ 60 s; hitting the daily quota requires waiting until the next UTC midnight (or upgrading the key from the dashboard).

You can raise both numbers per-key in Dashboard → Keys → Edit. Hard upper bounds today: 6 000 / minute, 5 000 000 / day. Need more? Open an issue.

Header semantics

Every authenticated response carries:

Header	Meaning
`X-RateLimit-Limit`	The bucket size for the budget that's closest to being hit.
`X-RateLimit-Remaining`	Calls left in that bucket before `429`.
`X-RateLimit-Reset`	Unix epoch seconds when the bucket refills (per-minute) or rolls over (daily).

When the response is itself a 429, you also get:

Header	Meaning
`Retry-After`	Wall-clock seconds until the soonest acceptable retry. Standard HTTP semantic — equivalent to RFC 9110 § 10.2.3.

Backoff strategy: trust Retry-After. Do not exponentially back off on 429 — the server already knows the next available slot and tells you.

What counts as one call

Exactly one inbound HTTP request — one POST to /v1/ingest/…, one GET to /v1/records/…, one streaming connection (for the duration the body is open). Streaming counts as one call regardless of how many subfingerprints the server emits.

/api/fingerprint (the SvelteKit proxy) counts on the SvelteKit side as one call and on the Rust upstream side as one. If you hit the proxy, you spend from your key budget once — the service-bearer call to the Rust upstream is not metered against you.

Burst behaviour

The per-minute bucket is implemented as a token bucket: 600 / 60 = 10 tokens / second refill, 600 capacity. So a burst of up to 600 in the first second is allowed, then refill takes over. This matches a "smooth average of 10 / s with reasonable burst" intuition.

The daily quota is a hard counter; no burst window — once you spent 50 000, you wait for UTC midnight.

Cost classes

In v1 every algorithm costs 1 unit. Future versions may charge semantic-* more — the response will include a units field once that lands. Plan ahead by reading units if present.

Self-hosted

The Rust binary defaults to NoopRateLimiter — no limits, all callers share the single UCFP_TOKEN. Set UCFP_RATELIMIT_URL=… to plug in the webhook-based limiter, or rebuild with --features multi-tenant and use InMemoryTokenBucket. See the Rust crate's RATELIMIT.md for the full matrix.