Rate limits
UCFP enforces three independent budgets: anonymous demo, authenticated per-minute, authenticated per-day. Hitting any of them returns 429 Too Many Requests with explanatory headers.
Anonymous demo
| Budget | Value | Scope |
|---|---|---|
| Requests per minute | 60 | per IP address |
| Daily quota | none | — |
| Body size cap | 64 KiB text / 4 MiB image / 8 MiB audio | per request |
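A client can check the body size locally before spending a request on a payload that would be rejected. The following is a minimal sketch, assuming the caps in the table above are inclusive and binary (1 KiB = 1024 bytes); the names `ANON_BODY_CAPS` and `fits_anonymous_cap` are hypothetical and not part of UCFP.

```python
# Anonymous-demo body size caps from the table above, in bytes.
KIB = 1024
MIB = 1024 * KIB

ANON_BODY_CAPS = {
    "text": 64 * KIB,
    "image": 4 * MIB,
    "audio": 8 * MIB,
}


def fits_anonymous_cap(modality: str, body: bytes) -> bool:
    """Return True if `body` is within the anonymous cap for `modality`.

    Assumption: the cap is inclusive (a body of exactly 64 KiB of text
    is accepted). The source does not say which way the boundary falls.
    """
    cap = ANON_BODY_CAPS.get(modality)
    if cap is None:
        raise ValueError(f"unknown modality: {modality!r}")
    return len(body) <= cap
```

For example, `fits_anonymous_cap("text", payload)` can gate the upload and let the caller fall back to an authenticated key for larger bodies.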
Hosted demo callers also need to clear a Cloudflare Turnstile challenge on first contact in a session. The Turnstile token is cached for 30 minutes; subsequent calls in the same session skip the challenge.
The 60 / minute counter resets each rolling minute. Reaching it returns 429 with Retry-After: <seconds-until-next-window>.
Authenticated (default for new keys)
| Budget | Value | Scope |
|---|---|---|
| Requests per minute | 600 | per key |
| Daily quota | 50 000 | per key |
| Body size cap | 32 MiB | per request |
Both budgets refresh independently. Hitting the per-minute budget delays you by ≤ 60 s; hitting the daily quota requires waiting until the next UTC midnight (or upgrading the key from the dashboard).
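The wait after exhausting the daily quota can be estimated client-side as the time remaining until the next UTC midnight. This is a small sketch; `seconds_until_utc_midnight` is a hypothetical helper, and it assumes the rollover happens exactly at 00:00:00 UTC.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_utc_midnight(now: datetime) -> float:
    """Seconds from `now` until the next 00:00:00 UTC.

    `now` should be timezone-aware; a naive datetime would be
    interpreted as local time by astimezone().
    """
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()
```

Calling `seconds_until_utc_midnight(datetime.now(timezone.utc))` gives the current wait. On an actual 429 the server's Retry-After value should take precedence over this estimate.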
You can raise both numbers per-key in Dashboard → Keys → Edit. Hard upper bounds today: 6 000 / minute, 5 000 000 / day. Need more? Open an issue.
Header semantics
Every authenticated response carries:
| Header | Meaning |
|---|---|
| X-RateLimit-Limit | The bucket size for the budget that's closest to being hit. |
| X-RateLimit-Remaining | Calls left in that bucket before 429. |
| X-RateLimit-Reset | Unix epoch seconds when the bucket refills (per-minute) or rolls over (daily). |
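The three headers above can be read into a small structure so that a client can throttle itself before reaching 429. This is a sketch under one assumption: all three values are plain decimal integers. `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: int        # X-RateLimit-Limit
    remaining: int    # X-RateLimit-Remaining
    reset_epoch: int  # X-RateLimit-Reset, Unix epoch seconds

    def seconds_until_reset(self, now_epoch: float) -> float:
        """Time left until the bucket refills or rolls over, never negative."""
        return max(0.0, self.reset_epoch - now_epoch)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Optional[RateLimitStatus]:
    """Return the parsed status, or None if any header is missing or malformed."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return RateLimitStatus(
            limit=int(lowered["x-ratelimit-limit"]),
            remaining=int(lowered["x-ratelimit-remaining"]),
            reset_epoch=int(lowered["x-ratelimit-reset"]),
        )
    except (KeyError, ValueError):
        return None
```

Returning None rather than raising keeps the helper usable on anonymous responses, which the text above does not promise will carry these headers.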
When the response is itself a 429, you also get:
| Header | Meaning |
|---|---|
| Retry-After | Wall-clock seconds until the soonest acceptable retry. Standard HTTP semantics, as defined in RFC 9110 § 10.2.3. |
Backoff strategy: trust Retry-After. Do not exponentially back off on 429 — the server already knows the next available slot and tells you.
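A retry loop that follows this advice sleeps for exactly the advertised delay instead of growing its own. The sketch below is transport-agnostic: `send` is a hypothetical callable returning `(status, headers, body)`, and the 1-second fallback for a missing or non-numeric Retry-After is an assumption, not documented UCFP behaviour.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read Retry-After as a number of seconds.

    RFC 9110 also allows an HTTP-date form; this sketch does not parse
    it and falls back to `default` (an assumed value) instead.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return max(0.0, float(lowered["retry-after"]))
    except (KeyError, ValueError):
        return default


def call_with_retry_after(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, waiting out each 429 for the server-advertised delay."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for _ in range(max_attempts - 1):
        if response[0] != 429:
            break
        # No exponential growth: the server has already named the next slot.
        sleep(retry_after_seconds(response[1]))
        response = send()
    return response
```

The `sleep` parameter exists so that tests can inject a recorder instead of actually waiting. If the last attempt is still a 429, the function returns it so the caller can decide what to do.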
What counts as one call
Exactly one inbound HTTP request — one POST to /v1/ingest/…, one GET to /v1/records/…, one streaming connection (for the duration the body is open). Streaming counts as one call regardless of how many subfingerprints the server emits.
A request to /api/fingerprint (the SvelteKit proxy) is counted once on the SvelteKit side and results in one further call to the Rust upstream. Only the first is charged to you: hitting the proxy spends from your key budget once, and the service-bearer call to the Rust upstream is not metered against you.
Burst behaviour
The per-minute bucket is implemented as a token bucket: 600 / 60 = 10 tokens / second refill, 600 capacity. So a burst of up to 600 in the first second is allowed, then refill takes over. This matches a "smooth average of 10 / s with reasonable burst" intuition.
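The burst-then-refill behaviour described above can be reproduced with a small client-side model. This is only an illustration using the default numbers (600 capacity, 10 tokens / second); it assumes the bucket starts full and refills continuously, and it is not the server's implementation. `TokenBucket` here is a hypothetical class.

```python
class TokenBucket:
    """Client-side model of a token bucket with continuous refill."""

    def __init__(
        self,
        capacity: float = 600.0,
        refill_per_second: float = 10.0,
        now: float = 0.0,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # assumption: the bucket starts full
        self.last = now

    def try_acquire(self, now: float) -> bool:
        """Spend one token at time `now` (seconds); False means a 429."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket()
burst = sum(bucket.try_acquire(0.0) for _ in range(700))       # all at t = 0 s
one_second_later = sum(bucket.try_acquire(1.0) for _ in range(50))  # all at t = 1 s
```

In this model the first 600 of the 700 simultaneous calls succeed, and one second later 10 more tokens are available, which matches the 10 / s refill rate.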
The daily quota is a hard counter with no burst window: once you have spent 50 000, you wait for UTC midnight.
Cost classes
In v1 every algorithm costs 1 unit. Future versions may charge semantic-* more — the response will include a units field once that lands. Plan ahead by reading units if present.
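Reading the field defensively keeps client-side accounting working before and after that change. The sketch below assumes `units` will be a top-level integer in a JSON response body; the source only says the response will include a units field, so its exact location and type are guesses. `units_charged` is a hypothetical helper.

```python
import json


def units_charged(response_body: str) -> int:
    """Return the cost of a call in units, defaulting to the v1 cost of 1.

    Assumption: `units`, when present, is a top-level integer in a JSON
    object. Anything else falls back to 1.
    """
    payload = json.loads(response_body)
    if not isinstance(payload, dict):
        return 1
    units = payload.get("units", 1)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(units, int) and not isinstance(units, bool):
        return units
    return 1
```

A client that sums `units_charged(...)` per call, instead of counting calls, should track its budget consistently once cost classes differ.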
Self-hosted
The Rust binary defaults to NoopRateLimiter — no limits, all callers share the single UCFP_TOKEN. Set UCFP_RATELIMIT_URL=… to plug in the webhook-based limiter, or rebuild with --features multi-tenant and use InMemoryTokenBucket. See the Rust crate's RATELIMIT.md for the full matrix.