One thousand users authenticated in 116 microseconds. That's high-throughput authentication on a single node. This is the power of H33's batch authentication—enterprise-scale security that fits in your pocket.
Batch Authentication Numbers
- 1,000 users: 116µs (8.6M/sec)
- Per-user overhead: 0.116µs
- Efficiency gain: 1,900x vs sequential
- Full cryptographic verification: Yes
The Scale Challenge
Modern applications face authentication demands that would have seemed impossible a decade ago:
- Social platforms: Billions of daily active users
- Gaming: Millions of concurrent players
- IoT: Trillions of device authentications
- Financial services: Millions of transactions per second
Traditional authentication systems crumble under this load. A typical bcrypt-based flow takes 50-100ms per user. At that rate, authenticating even 10,000 concurrent users requires seconds of wall-clock time and dozens of servers. H33's batch processing was designed from the ground up for this scale—and it does so while maintaining full post-quantum security guarantees.
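A rough back-of-envelope, using the 50-100ms bcrypt figure and the 0.116µs amortized batch cost quoted above, makes the gap concrete (the numbers are the article's; the arithmetic is just illustrative):

```javascript
// Compare sequential bcrypt to amortized batch cost for 10,000 users.
const users = 10_000;
const bcryptMsPerUser = 75;   // midpoint of the 50-100ms range above
const batchUsPerUser = 0.116; // amortized per-user batch cost above

const sequentialSeconds = (users * bcryptMsPerUser) / 1000;
const batchSeconds = (users * batchUsPerUser) / 1e6;

console.log(sequentialSeconds); // 750 seconds of single-core bcrypt work
console.log(batchSeconds);      // ~0.00116 seconds in batch mode
```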
How Batch Processing Works
Instead of processing authentications one at a time, batch mode groups requests and processes them in parallel:
// Sequential (slow) - DON'T DO THIS
for (const user of thousandUsers) {
  await h33.auth.verify(user); // ~0.22ms each ≈ 220ms total
}

// Batch (fast) - DO THIS
const results = await h33.auth.batchVerify({
  users: thousandUsers,
  mode: 'parallel'
});
// 116µs total for all 1,000 users

The key optimizations that enable this performance:
- SIMD vectorization: Process multiple proofs simultaneously using CPU vector instructions
- Shared computation: Common cryptographic operations are computed once and reused
- Memory locality: Batch data is arranged for optimal cache performance
- Parallel verification: Independent verifications run across all available cores
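The parallel-verification idea can be sketched in a few lines. `verifyChunk` below is a stand-in for the real SIMD-backed verifier, not part of the H33 SDK:

```javascript
// Stand-in for the real verifier: a production implementation would run
// the FHE pipeline once for the whole chunk and return one result per user.
async function verifyChunk(chunk) {
  return chunk.map((user) => ({ id: user.id, ok: true }));
}

// Split requests into fixed-size chunks and verify chunks concurrently;
// independent chunks can run across all available cores/workers.
async function batchVerify(users, chunkSize = 32) {
  const chunks = [];
  for (let i = 0; i < users.length; i += chunkSize) {
    chunks.push(users.slice(i, i + chunkSize));
  }
  const perChunk = await Promise.all(chunks.map(verifyChunk));
  return perChunk.flat();
}
```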
The Cryptographic Pipeline Under the Hood
Each batch authentication in H33 executes a three-stage cryptographic pipeline. Every stage is post-quantum secure, and the entire pipeline completes in a single API call.
H33 packs 32 users into a single BFV ciphertext using SIMD batching. The polynomial ring degree N=4096 provides 4,096 plaintext slots. With 128-dimensional biometric templates, that yields 32 user slots per ciphertext. The result: cryptographic operations that would normally run 32 times execute exactly once.
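The slot accounting follows directly from the parameters quoted above:

```javascript
// SIMD slot arithmetic for BFV batching.
const ringDegree = 4096;  // polynomial ring degree N (= plaintext slots)
const templateDim = 128;  // biometric template dimension
const usersPerCiphertext = ringDegree / templateDim;
console.log(usersPerCiphertext); // 32 user slots per ciphertext
```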
Stage 1 — FHE Batch Verification. Biometric templates are encrypted under the BFV (Brakerski/Fan-Vercauteren) fully homomorphic encryption scheme with a single 56-bit modulus and plaintext modulus t=65537. The server computes an encrypted inner product between the enrolled template and the candidate—entirely in ciphertext space—using NTT-domain fused multiply-accumulate. Montgomery arithmetic with Harvey lazy reduction eliminates all division from the hot path. On Graviton4, one 32-user batch completes in approximately 1,109 microseconds.
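For intuition, here is the arithmetic the FHE circuit evaluates, shown in plaintext. In production this runs entirely on BFV ciphertexts; this sketch only mirrors the math mod t = 65537:

```javascript
// Plaintext-domain model of the encrypted inner product: accumulate
// element-wise products of enrolled and candidate templates, reduced
// mod the plaintext modulus t = 65537. BigInt avoids overflow.
const t = 65537n;

function innerProductModT(enrolled, candidate) {
  let acc = 0n;
  for (let i = 0; i < enrolled.length; i++) {
    acc = (acc + BigInt(enrolled[i]) * BigInt(candidate[i])) % t;
  }
  return acc;
}

console.log(innerProductModT([1, 2, 3], [4, 5, 6])); // 32n
```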
Stage 2 — ZKP Cache Lookup. Rather than regenerating a STARK proof from scratch on every request, H33 uses an in-process DashMap for zero-knowledge proof caching. Each lookup completes in 0.085 microseconds—44 times faster than raw STARK generation—with zero TCP overhead, since the cache lives in the same process as the authentication workers.
Stage 3 — Post-Quantum Attestation. A SHA3-256 digest of the batch result is signed with CRYSTALS-Dilithium (ML-DSA), producing a post-quantum attestation that clients can verify independently. Critically, H33 signs once per batch rather than once per user, reducing Dilithium operations by 32x for a full 32-user batch. The sign-and-verify cycle takes approximately 244 microseconds.
| Pipeline Stage | Component | Latency | PQ-Secure |
|---|---|---|---|
| 1. FHE Batch | BFV inner product (32 users/CT) | ~1,109 µs | Yes (lattice) |
| 2. ZKP | In-process DashMap lookup | 0.085 µs | Yes (SHA3-256) |
| 3. Attestation | SHA3 + Dilithium sign+verify | ~244 µs | Yes (ML-DSA) |
| Total (32 users) | | ~1,356 µs | |
| Per authentication | | ~42 µs | |
Production Benchmark: 2.17 Million Auth/sec
On a single AWS c8g.metal-48xl instance (Graviton4, 192 vCPUs), H33 sustains 2,172,518 authentications per second with 96 parallel workers. See our full benchmark results for detailed methodology and hardware configuration. Each worker runs the full FHE + ZKP + Dilithium pipeline on 32-user batches. At approximately $2/hour for spot pricing, that works out to roughly $0.0000000003 per authentication, a fraction of a billionth of a dollar.
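The unit economics follow directly from the quoted throughput and spot price (both figures are the article's benchmark numbers, not independently measured):

```javascript
// Cost per authentication at the quoted sustained rate and spot price.
const authPerSec = 2_172_518;
const dollarsPerHour = 2;
const authPerHour = authPerSec * 3600;
const costPerAuth = dollarsPerHour / authPerHour;
console.log(costPerAuth.toExponential(1)); // "2.6e-10" dollars per auth
```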
Several architecture decisions were critical to reaching this number. The system allocator outperformed jemalloc by 8% on Graviton4's flat memory model, where arena bookkeeping in tight FHE loops is pure overhead. NTT twiddle factors are stored in Montgomery form and never leave it, avoiding modular division entirely. And enrolled biometric templates are stored pre-transformed in NTT domain, eliminating a redundant forward NTT on every comparison.
Real-World Use Cases
Massive multiplayer games: Authenticate all players in a game server instance simultaneously when the match starts. A 64-player lobby resolves in two batch operations—under 3 milliseconds total.
Workforce logins: Process morning login waves for entire enterprises in milliseconds. A 50,000-employee organization authenticates its entire workforce in under 35ms.
IoT device fleets: Verify thousands of sensor readings in a single batch operation. Each device gets a Dilithium-attested proof of authentication that remains valid even against quantum adversaries.
Event ticketing: Authenticate concert or sports event attendees as they arrive. A 100,000-seat stadium clears biometric gate checks in roughly 12ms of compute time at the 8.6M/sec batch rate.
Implementation Patterns
// Pattern 1: Streaming batch
const batchProcessor = h33.createBatchProcessor({
  maxBatchSize: 1000,
  maxWaitMs: 10, // Collect for up to 10ms
});

// Requests are automatically batched
app.post('/verify', async (req, res) => {
  const result = await batchProcessor.add(req.body);
  res.json(result);
});
// Pattern 2: Explicit batching
const pendingRequests = collectRequests(timeWindow);
const results = await h33.auth.batchVerify({
  users: pendingRequests,
  mode: 'parallel',
  continueOnError: true // Don't fail batch on single error
});
// Pattern 3: Scheduled batch
cron.schedule('*/5 * * * * *', async () => {
  const queuedAuth = await getQueuedAuthRequests();
  if (queuedAuth.length > 0) {
    await h33.auth.batchVerify({ users: queuedAuth });
  }
});

Batch ZK Proofs
Batch processing extends to ZK proof generation as well:
- 100 proofs: 35ms (73% faster than sequential)
- Proof aggregation: Multiple proofs combined into single verification
- Recursive composition: Proofs of proofs for unlimited scale
The ZKP cache layer is the enabler here. Once a STARK proof has been generated and verified for a given authentication context, it is stored in a lock-free DashMap keyed by a SHA3-256 digest. Subsequent requests with the same context skip proof generation entirely and return the cached result in under 100 nanoseconds. At 96 workers, this avoids the serialization bottleneck that TCP-based cache proxies introduce—our benchmarks showed a TCP RESP proxy caused an 11x throughput regression from 1.51M to 136K auth/sec.
Cost Efficiency
Batch processing doesn't just improve performance—it reduces costs:
- Compute: 1,900x fewer CPU cycles per authentication
- Memory: Shared buffers reduce memory allocation—SIMD batching compresses template storage from ~32MB/user to ~256KB/user, a 128x reduction
- Network: Single request replaces thousands
- Billing: H33 batch operations are priced per batch, not per user
Because the FHE, ZKP, and attestation stages are all lattice-based or hash-based, H33 batch authentication is fully post-quantum secure. There is no RSA or elliptic-curve dependency anywhere in the pipeline. When NIST's post-quantum standards become mandatory, H33 applications require zero migration effort.
Scale to 8.6M Auth/Second
Batch processing is available on all H33 plans. Start with 1,000 free auths.
Get Free API Key