
LSM trees, without the hidden reads.

A deferred-update algorithm that drops the hidden read penalty from secondary index maintenance, delivering up to 10x speedups on write-heavy workloads.

At a glance

Venue: IEEE DSIT 2024
Speedup: up to 10x (write-heavy workloads)
Targets: RocksDB · Cassandra · LevelDB · MyRocks · Tarantool
Method: Deferred update (compaction-phase reconciliation)
Bottleneck killed: Hidden read (secondary index maintenance)
Author position: Sole (1/1)

01 · The hidden cost nobody talks about

Log-structured merge trees rule modern databases. RocksDB, Cassandra, BigTable, HBase, MyRocks, Tarantool: all of them lean on the LSM contract that random writes are cheap because we batch and compact. SSDs sealed the deal a decade ago.

But there is a load-bearing exception. The instant a table grows a secondary index, every update has to fish the old tuple out of the primary index, extract the old secondary keys, and explicitly delete them from each secondary tree. Those are hidden reads: reads triggered by writes, and they quietly demolish the LSM write advantage. In a B-tree they are nearly free, since an in-place update reads the old value anyway; in an LSM tree they are a tax on the very operation LSM was supposed to make cheap.
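To make the tax concrete, here is a minimal sketch of synchronous secondary index maintenance. The `Table` class, its dict-backed trees, and the email schema are all hypothetical stand-ins for LSM components, not the paper's code:

```python
class Table:
    """Toy table with one secondary index; dicts stand in for LSM trees."""

    def __init__(self):
        self.primary = {}    # pk -> row
        self.by_email = {}   # email -> pk (secondary index)

    def upsert(self, pk, row):
        # THE HIDDEN READ: we must fetch the old row just to learn which
        # secondary keys to delete before writing the new ones.
        old = self.primary.get(pk)            # read triggered by a write
        if old is not None:
            del self.by_email[old["email"]]   # explicit delete in secondary tree
        self.primary[pk] = row
        self.by_email[row["email"]] = pk

t = Table()
t.upsert(1, {"email": "a@x.com"})
t.upsert(1, {"email": "b@x.com"})   # pays a read of row 1 first
```

Every `upsert` against an existing key performs that lookup synchronously, which in a real LSM engine means probing memtables and SSTables on the write path.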

02 · The deferred-update idea

The contribution is conceptually small. Do not pay the hidden read at write time. Insert the new tuple, log the intent, and let the compaction pass reconcile old and new on its own schedule, when the relevant SSTables are already being touched anyway.
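The write path and the compaction pass can be sketched as follows. This is a simplified illustration of the deferred-update idea under assumed names (`DeferredTable`, an explicit intent log, a dict comprehension standing in for an SSTable merge), not the paper's exact algorithm:

```python
class DeferredTable:
    """Sketch of a deferred-update write path: no read at write time."""

    def __init__(self):
        self.primary = {}    # pk -> row
        self.by_email = {}   # email -> pk (secondary index, may hold stale entries)
        self.deferred = []   # intent log: (pk, new_email) pairs awaiting reconciliation

    def upsert(self, pk, row):
        # Write path: insert the new versions and log the intent. No hidden read.
        self.primary[pk] = row
        self.by_email[row["email"]] = pk
        self.deferred.append((pk, row["email"]))

    def compact(self):
        # Compaction-phase reconciliation: while merging anyway, drop
        # secondary entries that no longer match the live row version.
        self.by_email = {
            email: pk for email, pk in self.by_email.items()
            if self.primary.get(pk, {}).get("email") == email
        }
        self.deferred.clear()
```

Between `upsert` and `compact` the secondary index carries stale entries alongside fresh ones; the whole point is that cleaning them up is folded into work the engine was going to do anyway.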

The trick is making this safe under all the operations that traditionally relied on the synchronous read: unique constraints, foreign keys, and secondary index lookups during reads. The paper works through each, showing where extra metadata is needed and where existing LSM compaction strategies already provide the bookkeeping for free.
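As one illustration of where extra metadata earns its keep, a unique-constraint check under deferral cannot trust the secondary index alone: an index hit may be stale, and a freshly claimed value may live only in the in-flight intent log. The sketch below uses hypothetical names and is not the paper's exact scheme:

```python
def violates_unique(primary, by_email, deferred_intents, email):
    """Unique check under deferred updates (illustrative, assumed schema)."""
    # A secondary hit counts only if it still matches the primary copy
    # (a stale pre-compaction entry must not block a valid insert)...
    pk = by_email.get(email)
    live_hit = pk is not None and primary.get(pk, {}).get("email") == email
    # ...and a value claimed by an in-flight deferred intent also counts.
    pending_hit = any(e == email for _pk, e in deferred_intents)
    return live_hit or pending_hit
```

The extra metadata here is the intent log: without it, the window between a write and its compaction would let duplicate values slip through.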

"On the workloads where LSM trees were supposed to win and do not, deferred updates close the gap. On the workloads where they already win, deferred updates widen it."

03 · What the numbers actually say

Theoretical analysis plus empirical evaluation across realistic workload mixes gives the headline number: up to a tenfold acceleration on write-intensive workloads with multiple secondary indexes. The bigger story is that the speedup scales with index count, exactly the regime where classic LSM tuning struggles hardest.

04 · Why this matters beyond databases

The deferred-update pattern generalises. Anywhere you have a write path that is secretly a read-modify-write, and a periodic background pass that already touches the relevant data, you can usually defer the read. The paper makes this concrete for LSM secondary indexes, but the same logic applies to inverted-index maintenance, tiered caches, and several blockchain state-sync designs I have been looking at since.

FAQ · What people ask me about this paper

Q1. How does this differ from existing batched-update strategies in RocksDB?
Production batched-update tuning still pays the hidden read; it just amortises it across multiple writes. Deferred updates remove the read from the write path entirely, pushing the work into compaction where it costs roughly nothing extra.
Q2. Does this break read-after-write consistency?
No. Reads still see the latest version; the algorithm is careful to surface in-flight deferred updates from the memtable before they reach compaction. The cost is a small bookkeeping overhead in memory.
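One way to picture how reads stay correct despite stale secondary entries is validation against the primary copy at lookup time, a standard companion to lazy index maintenance. A self-contained sketch with hypothetical names:

```python
def lookup_by_email(primary, by_email, email):
    """Secondary lookup that tolerates stale entries: a hit is trusted
    only after validating it against the primary copy (assumed schema)."""
    pk = by_email.get(email)
    if pk is None:
        return None
    row = primary.get(pk)
    # Validate: the secondary entry may predate a deferred update.
    if row is not None and row.get("email") == email:
        return row
    return None  # stale pre-compaction entry; compaction will collect it

# Row 1 was updated from a@x.com to b@x.com; the old secondary entry
# has not been reconciled yet.
primary = {1: {"email": "b@x.com"}}
by_email = {"a@x.com": 1, "b@x.com": 1}
```

The validation step is the small bookkeeping cost the answer above refers to: a read never trusts the secondary tree on its own.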
Q3. What is the worst case?
Read-heavy workloads with frequent unique-constraint checks against rarely-updated keys. The deferred path adds a small constant per check. The break-even is well below typical OLTP write fractions.
Q4. Is there a reference implementation?
A research-grade prototype exists; production integration into RocksDB-style engines is the natural next step. Reach me at s.ali.badami@gmail.com for the prototype.
Q5. How does this connect to the rest of my work?
It is the same instinct as the bootstrap and quantum-kernel papers: find the place where conventional wisdom is paying a hidden cost, then redesign the contract instead of micro-optimising around it.

CITE · How to cite this paper

@inproceedings{badami2024lsm,
  author    = {Shujaatali Badami},
  title     = {Optimizing {LSM} Tree Operations with Deferred Updates:
               A Comparative Study},
  booktitle = {2024 IEEE International Conference on Data Science
               and Information Technology (DSIT)},
  year      = {2024},
  publisher = {IEEE},
  doi       = {10.1109/DSIT61374.2024.10881309}
}
S. Badami, "Optimizing LSM Tree Operations with Deferred Updates: A Comparative Study," in 2024 IEEE International Conference on Data Science and Information Technology (DSIT), 2024, doi: 10.1109/DSIT61374.2024.10881309.
Badami, S. (2024). Optimizing LSM tree operations with deferred updates: A comparative study. In 2024 IEEE International Conference on Data Science and Information Technology (DSIT). IEEE. https://doi.org/10.1109/DSIT61374.2024.10881309
TY  - CONF
AU  - Badami, Shujaatali
TI  - Optimizing LSM Tree Operations with Deferred Updates: A Comparative Study
T2  - 2024 IEEE International Conference on Data Science and Information Technology (DSIT)
PB  - IEEE
PY  - 2024
DO  - 10.1109/DSIT61374.2024.10881309
ER  -

SEE ALSO · Related work in this portfolio