One thousand users authenticated in 116 microseconds. That's high-throughput authentication on a single node. This is the power of H33's batch authentication—enterprise-scale security that fits in your pocket.
Batch Authentication Numbers
- 1,000 users: 116µs (8.6M/sec)
- Per-user overhead: 0.116µs
- Efficiency gain: 1,900x vs sequential
- Full cryptographic verification: Yes
The Scale Challenge
Modern applications face authentication demands that would have seemed impossible a decade ago:
- Social platforms: Billions of daily active users
- Gaming: Millions of concurrent players
- IoT: Trillions of device authentications
- Financial services: Millions of transactions per second
Traditional authentication systems crumble under this load. A typical bcrypt-based flow takes 50-100ms per user. At that rate, authenticating even 10,000 concurrent users requires seconds of wall-clock time and dozens of servers. H33's batch processing was designed from the ground up for this scale—and it does so while maintaining full post-quantum security guarantees.
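A rough back-of-envelope, using the 50-100ms bcrypt figure and the 0.116µs amortized batch cost quoted above, makes the gap concrete (the numbers are the article's; the arithmetic is just illustrative):

```javascript
// Compare sequential bcrypt to amortized batch cost for 10,000 users.
const users = 10_000;
const bcryptMsPerUser = 75;   // midpoint of the 50-100ms range above
const batchUsPerUser = 0.116; // amortized per-user batch cost above

const sequentialSeconds = (users * bcryptMsPerUser) / 1000;
const batchSeconds = (users * batchUsPerUser) / 1e6;

console.log(sequentialSeconds); // 750 seconds of single-core bcrypt work
console.log(batchSeconds);      // ~0.00116 seconds in batch mode
```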
How Batch Processing Works
Instead of processing authentications one at a time, batch mode groups requests and processes them in parallel:
// Sequential (slow) - DON'T DO THIS
for (const user of thousandUsers) {
  await h33.auth.verify(user); // ~0.22ms each ≈ 220ms total
}

// Batch (fast) - DO THIS
const results = await h33.auth.batchVerify({
  users: thousandUsers,
  mode: 'parallel'
});
// 116µs total for all 1,000 users

The key optimizations that enable this performance:
- SIMD vectorization: Process multiple proofs simultaneously using CPU vector instructions
- Shared computation: Common cryptographic operations are computed once and reused
- Memory locality: Batch data is arranged for optimal cache performance
- Parallel verification: Independent verifications run across all available cores
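The parallel-verification idea can be sketched in a few lines. `verifyChunk` below is a stand-in for the real SIMD-backed verifier, not part of the H33 SDK:

```javascript
// Stand-in for the real verifier: a production implementation would run
// the FHE pipeline once for the whole chunk and return one result per user.
async function verifyChunk(chunk) {
  return chunk.map((user) => ({ id: user.id, ok: true }));
}

// Split requests into fixed-size chunks and verify chunks concurrently;
// independent chunks can run across all available cores/workers.
async function batchVerify(users, chunkSize = 32) {
  const chunks = [];
  for (let i = 0; i < users.length; i += chunkSize) {
    chunks.push(users.slice(i, i + chunkSize));
  }
  const perChunk = await Promise.all(chunks.map(verifyChunk));
  return perChunk.flat();
}
```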
The Cryptographic Pipeline Under the Hood
Each batch authentication in H33 executes a three-stage cryptographic pipeline. Every stage is post-quantum secure, and the entire pipeline completes in a single API call.
H33 packs 32 users into a single BFV ciphertext using SIMD batching. The polynomial ring degree N=4096 provides 4,096 plaintext slots. With 128-dimensional biometric templates, that yields 32 user slots per ciphertext. The result: cryptographic operations that would normally run 32 times execute exactly once.
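The slot accounting follows directly from the parameters quoted above:

```javascript
// SIMD slot arithmetic for BFV batching.
const ringDegree = 4096;  // polynomial ring degree N (= plaintext slots)
const templateDim = 128;  // biometric template dimension
const usersPerCiphertext = ringDegree / templateDim;
console.log(usersPerCiphertext); // 32 user slots per ciphertext
```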
Stage 1 — FHE Batch Verification. Biometric templates are encrypted under the BFV (Brakerski/Fan-Vercauteren) fully homomorphic encryption scheme with a single 56-bit modulus and plaintext modulus t=65537. The server computes an encrypted inner product between the enrolled template and the candidate—entirely in ciphertext space—using NTT-domain fused multiply-accumulate. Montgomery arithmetic with Harvey lazy reduction eliminates all division from the hot path. On Graviton4, one 32-user batch completes in approximately 1,109 microseconds.
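For intuition, here is the arithmetic the FHE circuit evaluates, shown in plaintext. In production this runs entirely on BFV ciphertexts; this sketch only mirrors the math mod t = 65537:

```javascript
// Plaintext-domain model of the encrypted inner product: accumulate
// element-wise products of enrolled and candidate templates, reduced
// mod the plaintext modulus t = 65537. BigInt avoids overflow.
const t = 65537n;

function innerProductModT(enrolled, candidate) {
  let acc = 0n;
  for (let i = 0; i < enrolled.length; i++) {
    acc = (acc + BigInt(enrolled[i]) * BigInt(candidate[i])) % t;
  }
  return acc;
}

console.log(innerProductModT([1, 2, 3], [4, 5, 6])); // 32n
```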
Stage 2 — ZKP Cache Lookup. Rather than regenerating a STARK proof from scratch on every request, H33 uses an in-process DashMap for zero-knowledge proof caching. Each lookup completes in 0.085 microseconds—44 times faster than raw STARK generation—with zero TCP overhead, since the cache lives in the same process as the authentication workers.
Stage 3 — Post-Quantum Attestation. A SHA3-256 digest of the batch result is signed with CRYSTALS-Dilithium (ML-DSA), producing a post-quantum attestation that clients can verify independently. Critically, H33 signs once per batch rather than once per user, reducing Dilithium operations by 32x for a full 32-user batch. The sign-and-verify cycle takes approximately 244 microseconds.
| Pipeline Stage | Component | Latency | PQ-Secure |
|---|---|---|---|
| 1. FHE Batch | BFV inner product (32 users/CT) | ~1,109 µs | Yes (lattice) |
| 2. ZKP | In-process DashMap lookup | 0.085 µs | Yes (SHA3-256) |
| 3. Attestation | SHA3 + Dilithium sign+verify | ~244 µs | Yes (ML-DSA) |
| Total (32 users) | | ~1,356 µs | |
| Per authentication | | ~42 µs | |
Production Benchmark: 2.17 Million Auth/sec
On a single AWS c8g.metal-48xl instance (Graviton4, 192 vCPUs), H33 sustains 2,172,518 authentications per second with 96 parallel workers. See our full benchmark results for detailed methodology and hardware configuration. Each worker runs the full FHE + ZKP + Dilithium pipeline on 32-user batches. At approximately $2/hour for spot pricing, that works out to roughly $0.0000000003 per authentication, a fraction of a billionth of a dollar.
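The unit economics follow directly from the quoted throughput and spot price (both figures are the article's benchmark numbers, not independently measured):

```javascript
// Cost per authentication at the quoted sustained rate and spot price.
const authPerSec = 2_172_518;
const dollarsPerHour = 2;
const authPerHour = authPerSec * 3600;
const costPerAuth = dollarsPerHour / authPerHour;
console.log(costPerAuth.toExponential(1)); // "2.6e-10" dollars per auth
```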
Several architecture decisions were critical to reaching this number. The system allocator outperformed jemalloc by 8% on Graviton4's flat memory model, where arena bookkeeping in tight FHE loops is pure overhead. NTT twiddle factors are stored in Montgomery form and never leave it, avoiding modular division entirely. And enrolled biometric templates are stored pre-transformed in NTT domain, eliminating a redundant forward NTT on every comparison.
Real-World Use Cases
Massive multiplayer games: Authenticate all players in a game server instance simultaneously when the match starts. A 64-player lobby resolves in two batch operations—under 3 milliseconds total.
Workforce logins: Process morning login waves for entire enterprises in milliseconds. A 50,000-employee organization authenticates its entire workforce in under 35ms.
IoT device fleets: Verify thousands of sensor readings in a single batch operation. Each device gets a Dilithium-attested proof of authentication that remains valid even against quantum adversaries.
Event ticketing: Authenticate concert or sports event attendees as they arrive. A 100,000-seat stadium clears biometric gate checks in roughly 12ms of compute time at the 8.6M/sec batch rate.
Implementation Patterns
// Pattern 1: Streaming batch
const batchProcessor = h33.createBatchProcessor({
  maxBatchSize: 1000,
  maxWaitMs: 10, // Collect for up to 10ms
});

// Requests are automatically batched
app.post('/verify', async (req, res) => {
  const result = await batchProcessor.add(req.body);
  res.json(result);
});
// Pattern 2: Explicit batching
const pendingRequests = collectRequests(timeWindow);
const results = await h33.auth.batchVerify({
  users: pendingRequests,
  mode: 'parallel',
  continueOnError: true // Don't fail batch on single error
});
// Pattern 3: Scheduled batch
cron.schedule('*/5 * * * * *', async () => {
  const queuedAuth = await getQueuedAuthRequests();
  if (queuedAuth.length > 0) {
    await h33.auth.batchVerify({ users: queuedAuth });
  }
});

Batch ZK Proofs
Batch processing extends to ZK proof generation as well:
- 100 proofs: 35ms (73% faster than sequential)
- Proof aggregation: Multiple proofs combined into single verification
- Recursive composition: Proofs of proofs for unlimited scale
The ZKP cache layer is the enabler here. Once a STARK proof has been generated and verified for a given authentication context, it is stored in a lock-free DashMap keyed by a SHA3-256 digest. Subsequent requests with the same context skip proof generation entirely and return the cached result in under 100 nanoseconds. At 96 workers, this avoids the serialization bottleneck that TCP-based cache proxies introduce—our benchmarks showed a TCP RESP proxy caused an 11x throughput regression from 1.51M to 136K auth/sec.
Cost Efficiency
Batch processing doesn't just improve performance—it reduces costs:
- Compute: 1,900x fewer CPU cycles per authentication
- Memory: Shared buffers reduce memory allocation—SIMD batching compresses template storage from ~32MB/user to ~256KB/user, a 128x reduction
- Network: Single request replaces thousands
- Billing: H33 batch operations are priced per batch, not per user
Because the FHE, ZKP, and attestation stages are all lattice-based or hash-based, H33 batch authentication is fully post-quantum secure. There is no RSA or elliptic-curve dependency anywhere in the pipeline. When NIST's post-quantum standards become mandatory, H33 applications require zero migration effort.
Scale to 8.6M Auth/Second
Batch processing is available on all H33 plans. Start with 1,000 free auths.
Get Free API Key