At 9:00:00 AM, a surge of traffic hit. Every user, in every time zone, suddenly demanded the same piece of data: the flash sale metadata for item ID #42.
KQR’s cache logic looked like this (pseudocode):
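As a rough illustration of the check-then-load pattern this story turns on, here is a minimal Python sketch. It is not KQR’s actual code: the class `NaiveRowCache`, the `load_from_db` callback, the TTL handling, and the `contended_gets` counter are all hypothetical names chosen for this example. It assumes a single lock guards the row cache and that a thread which misses while the lock is held simply spins and retries.

```python
import threading
import time


class NaiveRowCache:
    """Hypothetical check-then-load row cache (illustration only)."""

    def __init__(self, load_from_db, ttl_seconds=60.0):
        self._load = load_from_db       # assumed loader: key -> value
        self._ttl = ttl_seconds
        self._rows = {}                 # key -> (value, expires_at)
        self._lock = threading.Lock()   # one lock guarding cache fills
        self.contended_gets = 0         # approximate count of misses while the lock was held

    def get(self, key):
        while True:
            entry = self._rows.get(key)
            if entry is not None and entry[1] > time.monotonic():
                return entry[0]                      # cache hit
            # Miss: try to become the thread that refills the row.
            if self._lock.acquire(blocking=False):
                try:
                    value = self._load(key)          # slow database round trip
                    self._rows[key] = (value, time.monotonic() + self._ttl)
                    return value
                finally:
                    self._lock.release()
            # Lock is held by another loader: spin and retry, burning CPU.
            self.contended_gets += 1
```

With one hot key and thousands of callers, every caller except the current loader ends up in the retry loop at the bottom, which is the behavior the diagnostic counters in this story describe.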
KQR’s row cache for item:42 expired. At 9:00:02, 10,000 concurrent GET requests arrived simultaneously.
KQR had a little-known diagnostic command:

    CACHE GETS (total): 10,000
    CACHE HITS: 0
    CACHE MISSES: 10,000
    MISSES WHILE LOCK HELD: 10,000
    CONTENTION RATIO: 1.00
    TOP CONTENDED ROW: item:42
    WAITING THREADS: 9,999
    LOCK HOLD TIME (avg): 487ms

This was a contention storm. The first thread to acquire the cache lock went to the database (487ms). The other 9,999 threads didn’t just wait: they spun, retried, and choked the CPU.

From that day on, KQR’s monitoring dashboard had a new rule: if row cache contention check gets > 1000 per second, flip on single-flight mode. And the team learned a valuable lesson: sometimes, the most dangerous lock isn’t in your database; it’s in your cache’s eagerness to help.
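The single-flight idea named in the monitoring rule can be sketched in Python as request coalescing: the first caller for a key does the load, and every concurrent caller for the same key blocks on an event and reuses that result instead of spinning. This is one common way to build it, in the spirit of Go’s `golang.org/x/sync/singleflight` package. Whether KQR’s single-flight mode works this way is an assumption, and `SingleFlight`, `do`, and `_Call` are hypothetical names.

```python
import threading


class _Call:
    """Bookkeeping for one in-flight load (hypothetical helper)."""

    def __init__(self):
        self.done = threading.Event()
        self.value = None
        self.error = None


class SingleFlight:
    """Collapse concurrent calls for the same key into one execution."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}            # key -> _Call

    def do(self, key, fn):
        # Decide, under the lock, whether this caller leads or follows.
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call

        if leader:
            try:
                call.value = fn()       # the only database round trip
            except BaseException as exc:
                call.error = exc        # followers see the same failure
            finally:
                with self._lock:
                    del self._in_flight[key]
                call.done.set()         # wake every waiting follower
        else:
            call.done.wait()            # sleep instead of spinning

        if call.error is not None:
            raise call.error
        return call.value
```

A caller would wrap the slow path, for example `flight.do("item:42", lambda: load_from_db("item:42"))`, where `load_from_db` stands in for whatever actually queries the database. Followers block on an event rather than retrying, so a 487ms load costs one database query and no busy-waiting.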