Ever built something knowing that if it goes down, you'll be trending on Twitter? And not in a good way?
That was us. Building an audience registration form for a massive celebrity event in India. The brief was deceptively simple—a form. Name, phone, email, city. That's it.
The catch? The link drops on social media. The celebrity's audience does the rest. We're talking 10 million+ people hitting this page in the first 10 minutes. The kind of traffic most startups don't see in their entire lifetime.
One shot. No retries. Page goes down = memes.
So...how do you even begin building for this?
The "Aha" Moment: Do Less, Faster
The first instinct when someone says "10 million requests" is to reach for the big guns. Kubernetes. Service mesh. Event sourcing. The works.
We didn't do any of that.
Instead, we asked one simple question: when a user clicks Submit, what is the absolute minimum that needs to happen?
The answer? Accept the data. Say "got it." That's literally it.
You don't need to validate the captcha, check for duplicates, write to the database, or send a confirmation—not while the user is staring at a loading spinner on their phone in a Mumbai local. All of that can happen 5 seconds later and nobody would know the difference.
So we split the system into two parts:
```
User clicks Submit
  ↓
Cloud Run (ingress) → accepts JSON, yeets it to Pub/Sub → returns 202 in ~100ms
  ↓  (async, user is already scrolling Instagram)
Cloud Run (consumer) → validates, dedupes, writes to Postgres, archives to GCS
```
User gets 202 Accepted in under 200 milliseconds. Sees the success screen. Goes back to their life. Meanwhile, the heavy lifting happens in the background where nobody's waiting and nobody cares about cold starts.
This is the async write path pattern. Probably the single most important decision we made.
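Stripped down, the ingress is barely more than this. A sketch, not our exact code: the topic name is illustrative, and the real handler also runs the validation and reCAPTCHA checks covered later.

```js
// Ingress sketch: accept JSON, publish to Pub/Sub, return 202 immediately.
// 'form-submissions' is an illustrative topic name.
const express = require('express')
const { PubSub } = require('@google-cloud/pubsub')

const app = express()
app.use(express.json({ limit: '8kb' }))

const topic = new PubSub().topic('form-submissions')

app.post('/api/submit', async (req, res) => {
  try {
    await topic.publishMessage({ json: req.body })
    res.status(202).json({ status: 'accepted' })   // the user is done waiting right here
  } catch (err) {
    res.status(503).json({ error: 'please retry' })
  }
})

app.listen(process.env.PORT || 8080)
```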
Why This Doesn't Fall Apart
- Pub/Sub has unlimited throughput. You don't provision capacity. Google just...handles it. We could send 100k messages per second and it wouldn't flinch.
- Cloud Run scales to 1000 instances. Auto-scales based on incoming traffic. We pre-warmed 2 instances and let it rip.
- The consumer processes at its own pace. Falls behind? No problem. Pub/Sub holds messages for 7 days. The queue IS the buffer.
- Cold starts are irrelevant. Ingress is always warm (min 2 instances). Consumer can cold-start all day—nobody's waiting on it.
Simple innit?
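For completeness, the other half of the async path is just another HTTP handler, because a Pub/Sub push subscription POSTs each message to the consumer service. Roughly like this; the endpoint path and the processSubmission helper are placeholders:

```js
// Consumer sketch: Pub/Sub push delivers the payload as base64-encoded JSON.
// Any non-2xx response makes Pub/Sub retry and eventually route to the DLQ.
app.post('/pubsub/push', async (req, res) => {
  const payload = JSON.parse(
    Buffer.from(req.body.message.data, 'base64').toString('utf8')
  )
  await processSubmission(payload)   // validate, dedupe, write to Postgres, archive to GCS
  res.status(204).send()             // ack the message
})
```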
One HTML File to Rule Them All
The landing page is a React app compiled into a single HTML file. One. File. JS, CSS, fonts—everything inlined into one index.html using vite-plugin-singlefile.
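The build config for that is tiny. Assuming a stock Vite + React setup, it's roughly:

```js
// vite.config.js: inline JS, CSS, and assets into a single index.html
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
import { viteSingleFile } from 'vite-plugin-singlefile'

export default defineConfig({
  plugins: [react(), viteSingleFile()],
})
```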
We host it on Cloudflare CDN. Not on our servers. Not on GCP. Cloudflare serves it from 300+ edge locations. When those 10 million people hit the page at launch, they're hitting Cloudflare's cache, not our infrastructure.
```
Cloudflare CDN (static page) ← 99% of traffic goes here
GCP HTTPS LB (/api/*)        ← only form submissions touch GCP
```
The landing page literally cannot go down from traffic. Cloudflare was built for this. We were just along for the ride.
Our GCP load balancer only handles one thing—/api/submit. That's it. Everything else is Cloudflare's problem and honestly? I'm okay with that.
Postgres. Yes, Boring Old Postgres.
We went with Cloud SQL (Postgres) instead of something fancier like Firestore or DynamoDB.
Why? Because SQL is SQL. Everyone knows it. Nobody needs to learn a new query language at 2 AM when things are on fire. Cloud SQL with SSD and autoresize handles the write volume. We needed composite indexes for the admin dashboard—filter by state, city, gender, date range—and SQL was literally made for this.
The consumer does:
```sql
INSERT INTO submissions (...) VALUES (...)
ON CONFLICT (dedupe_hash) DO NOTHING
```
Same person registers twice? Database handles it. No race conditions, no distributed locks, no application-level dedup logic. Postgres just goes "yeah I've seen this one" and moves on. Beautiful.
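With node-postgres, that's one parameterized statement. A sketch with illustrative column names:

```js
// Consumer write path, sketched with the pg library.
const { Pool } = require('pg')
const pool = new Pool()   // connection settings come from env / Secret Manager

async function saveSubmission(s) {
  await pool.query(
    `INSERT INTO submissions (name, email, mobile, city, dedupe_hash)
     VALUES ($1, $2, $3, $4, $5)
     ON CONFLICT (dedupe_hash) DO NOTHING`,
    [s.name, s.email, s.mobile, s.city, s.dedupeHash]
  )
}
```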
We also slapped on a read replica for the admin dashboard so dashboard queries never compete with production writes. The admin team can slice-and-dice data all they want without affecting the main write path.
Now the Fun Part: Making Sure Nobody Breaks It
Here's the thing about a form that gets shared millions of times on WhatsApp and Instagram—you're basically painting a target on your back. Script kiddies, bots, bored teenagers with Postman, the works.
We went a little paranoid. Six layers of paranoid.
Layer 1: Cloud Armor (The Bouncer)
Cloud Armor sits in front of the load balancer. Every single request gets inspected before it touches our code.
- Rate limiting: 50 requests per 5 minutes per IP. Hit the limit? Banned for 10 minutes. Go take a walk.
- OWASP WAF rules: SQL injection, XSS, scanner detection, LFI, RFI, RCE, protocol attacks, session fixation—all blocked at the edge.
- Method enforcement: Only POST is allowed on /api/submit. Try a GET? 403. Try a DELETE? Believe it or not, also 403.
- Body size limit: Anything over 8 KB gets rejected. Our valid payload is ~500 bytes. If you're sending 8 KB of "name," something's wrong with you.
- Geo-restriction: Flip a switch, block everything outside your target geography. No reason to accept traffic from a botnet in Romania when your audience is domestic.
- Layer 7 DDoS: Google's adaptive protection that learns your traffic patterns and auto-blocks anomalies. Basically a bouncer that gets smarter over time.
All of this happens before a single byte reaches Cloud Run. It's free filtering. The good stuff doesn't even know the bad stuff existed.
Layer 2: Server-Side Validation (Trust Nobody)
"But the client already validated the form!"
Bruh. The client is just a suggestion. Anyone with curl can send whatever they want to your endpoint.
```js
// Server validates: required fields, types, email regex,
// mobile format, gender enum, DOB parse
const result = validate(req.body)
if (!result.valid) {
  res.status(400).json({ error: result.error })
  return
}
```
We also verify the reCAPTCHA token server-side. Client gets a token from Google, sends it along, and the ingress calls Google's API to get a bot score. Below 0.5? Flagged. Not rejected—we still store it, but marked for review. We'd rather have a few false positives than reject a legitimate fan on a slow phone.
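The verification itself is one HTTPS call to Google's siteverify endpoint. Roughly, assuming the secret is injected from Secret Manager:

```js
// Server-side reCAPTCHA v3 check: returns the bot score (0 = likely bot, 1 = likely human).
async function getBotScore(token, remoteIp) {
  const res = await fetch('https://www.google.com/recaptcha/api/siteverify', {
    method: 'POST',
    body: new URLSearchParams({
      secret: process.env.RECAPTCHA_SECRET,   // assumed to come from Secret Manager
      response: token,
      remoteip: remoteIp,
    }),
  })
  const data = await res.json()
  return data.success ? data.score : 0
}

// In the handler: flag, don't reject.
// const flagged = (await getBotScore(token, clientIp)) < 0.5
```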
Layer 3: Client-Side Tricks (Cheap and Surprisingly Effective)
These run in the user's browser. They're free and they catch the lazy bots:
- Honeypot field: A hidden input called "website" that's invisible to humans. Bots see a form field, bots fill it. If it has a value—we show a fake success screen and never actually submit. The bot thinks it won. It doesn't adapt.
- Timing gate: We record when the form loads. If someone "fills" 7 fields in under 3 seconds... that's not a human. That's a script. Fake success, no submission.
- Submit button lock: Disabled immediately on click, shows "Submitting..."—prevents the classic panicked double-click.
- Idempotency key: Each form session gets a crypto.randomUUID(). Same submission lands twice? Server dedupes on it.
The key insight here: bots that fail these checks get a 200 success response. They think they won. They don't learn. This is way better than returning a 403 that tells them exactly what to fix.
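Concretely, the pre-submit gate from the list above is only a few lines. A sketch; showSuccessScreen and the hidden website field name are placeholders:

```js
// Client-side gate: honeypot + timing check + idempotency key.
const formLoadedAt = Date.now()

async function handleSubmit(fields) {
  const tooFast = Date.now() - formLoadedAt < 3000   // nobody fills 7 fields in 3 seconds
  const honeypotFilled = Boolean(fields.website)     // hidden field only bots fill

  if (tooFast || honeypotFilled) {
    showSuccessScreen()   // fake success: never actually submit
    return
  }

  await fetch('/api/submit', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      ...fields,
      idempotencyKey: crypto.randomUUID(),   // server dedupes on this
    }),
  })
  showSuccessScreen()
}
```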
Layer 4: Data Protection (Because PII Is Not a Joke)
- IP hashing at ingress: We hash the IP with SHA-256 before publishing to Pub/Sub. The raw IP never enters the message queue, never touches the database, never hits GCS. It dies at the front door.
- Dedupe hash: SHA-256(mobile | email)—the database's UNIQUE constraint catches duplicates silently (both hashes are sketched after this list).
- Parameterized queries: Every database query uses $1, $2, $3 parameters. SQL injection is impossible even if Cloud Armor somehow misses it. Defense in depth, baby.
- Least-privilege IAM: Ingress can only publish to Pub/Sub. Consumer can only write to SQL and GCS. Neither service has permissions it doesn't need. If one is compromised, the blast radius is tiny.
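Both hashes are one-liners with Node's crypto module (clientIp, mobile, and email stand in for the request values):

```js
// IP hash computed at ingress; dedupe hash computed before the INSERT.
const { createHash } = require('crypto')

const sha256 = (value) => createHash('sha256').update(value).digest('hex')

const ipHash = sha256(clientIp)                    // raw IP never leaves the ingress
const dedupeHash = sha256(`${mobile}|${email}`)    // backs the UNIQUE constraint
```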
Layer 5: Headers and Transport (The Stuff Nobody Sees)
- HTTPS everywhere: HTTP → HTTPS redirect with 301. No exceptions.
- HSTS: max-age=31536000; includeSubDomains; preload—the browser remembers to always use HTTPS. Forever.
- CSP: A Content Security Policy that only allows scripts from our domain, reCAPTCHA, and GTM. Some random <script> injection? Blocked.
- X-Frame-Options: DENY: Nobody can embed our page in an iframe. No clickjacking today.
- CORS lockdown: Ingress only accepts requests from our domain. Any other origin? Silent drop.
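In code terms, all of that is a handful of response headers. Sketched here as Express-style middleware; the CSP allowlist is an illustrative approximation, not our exact policy:

```js
// Security headers on every response. Values mirror the bullets above.
app.use((req, res, next) => {
  res.set({
    'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload',
    'Content-Security-Policy':
      "default-src 'self'; script-src 'self' https://www.google.com https://www.gstatic.com https://www.googletagmanager.com",
    'X-Frame-Options': 'DENY',
  })
  next()
})
```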
Unglamorous? Yes. Important? Extremely.
Layer 6: Monitoring (Because You Can't Fix What You Can't See)
We set up alerts for the things that would wake you up at 3 AM:
- Pub/Sub backlog > 5 minutes → consumer falling behind
- DLQ receiving messages → submissions permanently failing
- Ingress or Consumer 5xx → something is broken
- Cloud Armor block spike > 100/min → attack in progress
- IAM policy changes → someone's messing with permissions
- Secret Manager access anomaly → secrets being accessed weirdly
- Cloud SQL data egress > 100 MB/min → possible data exfiltration
Any of these fires an alert within 60 seconds. We sleep better knowing this.
The Whole Stack, Visualized
```
Request from user's phone
│
├── Cloudflare (DDoS, edge cache, static HTML)
│
├── Cloud Armor (WAF, rate limit, geo-block, body size)
│
├── HTTPS LB (HSTS, CSP, X-Frame-Options, CORS headers)
│
├── Cloud Run Ingress (validation, reCAPTCHA, IP hash)
│
├── Pub/Sub (buffer, 7-day retention, DLQ)
│
├── Cloud Run Consumer (dedupe, parameterized SQL, flagging)
│
└── Cloud SQL (UNIQUE constraint, CHECK constraints, least-privilege user)
```
Seven layers. Each catches what the previous one missed. The cost of each check is negligible. The cumulative effect is a system that is really annoying to abuse.
What I'd Do Differently
No system is perfect. Here's what I'd change in hindsight:
- Cloudflare Turnstile over reCAPTCHA: Free, privacy-friendly, no checkboxes. reCAPTCHA v3 works but Google's pricing gets weird at scale.
- Request signing: HMAC the payload so the server can verify it came from our frontend, not someone's Postman collection (a rough sketch follows this list). Cloud Armor catches most of this, but belt and suspenders.
- Canary deploys: We deployed the consumer straight to prod. Should've done 5% canary first. We got lucky. Don't get lucky.
- Load testing a week earlier: We built a k6 stress test with 8 attack scenarios—legitimate users, double submitters, bot spam, rate limit hammering, duplicate floods, payload abuse, idempotency replay, slow clients. Should've run it against staging a week before launch, not the night before.
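The request-signing idea is a few lines of HMAC on each side. A rough sketch of what we'd bolt on; the header name and env var are illustrative:

```js
// Hypothetical request signing: frontend signs the payload, server verifies.
const { createHmac, timingSafeEqual } = require('crypto')

const sign = (body, key) =>
  createHmac('sha256', key).update(JSON.stringify(body)).digest('hex')

function verifySignature(body, signature, key) {
  const expected = Buffer.from(sign(body, key))
  const received = Buffer.from(signature)
  return expected.length === received.length && timingSafeEqual(expected, received)
}

// In the ingress, before publishing to Pub/Sub:
// if (!verifySignature(req.body, req.get('x-signature'), process.env.SIGNING_KEY)) {
//   return res.status(403).end()
// }
```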
The Numbers
Launch day:
| Metric | Value |
|---|---|
| Peak ingestion rate | ~17,000 req/s |
| p95 user-facing latency | < 200ms |
| Cloud Run instances | 2 → 400+ in 30 seconds |
| Pub/Sub message delivery | 100% (zero loss) |
| Cloud Armor bot blocks | Thousands, silently |
| Database throttling | Zero |
| Downtime | Zero |
| Infra cost for launch day | < $50 |
Fifty dollars. For 10 million submissions. Most of that was Cloud Run compute. Pub/Sub and Cloud Armor are essentially free at this scale.
Key Takeaways
If you're building something that needs to handle massive, bursty traffic:
- Decouple the write path. Accept fast, process later. Your user doesn't need to wait for your database to ack.
- CDN everything static. Your infrastructure should only handle the dynamic parts. Let Cloudflare take the punches.
- Security is layers, not a product. WAF at the edge, validation on the server, constraints in the database. Each layer is a safety net for the one above.
- Monitor the things that scare you. Set alerts for the stuff that would ruin your evening.
- Bots are dumb—exploit that. Honeypots and timing gates catch 90% of automated abuse for free. Fake success responses mean they never learn.
The boring architecture wins. No Kubernetes. No service mesh. No event sourcing. Just a load balancer, two containers, a queue, and a database.
Simple enough to fit in your head. Robust enough to handle millions. Boring enough to let you sleep at night.
Built with GCP Cloud Run, Pub/Sub, Cloud SQL, Cloud Armor, and Cloudflare. Frontend is React + Vite compiled to a single HTML file. Infra is Terraform. Total codebase including infrastructure: ~2,000 lines.