LSM trees, without the hidden reads.
A deferred-update algorithm that drops the hidden read penalty from secondary index maintenance, delivering up to 10x speedups on write-heavy workloads.
At a glance
01 The hidden cost nobody talks about
Log-structured merge trees rule modern databases. RocksDB, Cassandra, BigTable, HBase, MyRocks, Tarantool: all of them lean on the LSM contract that random writes are cheap because we batch and compact. SSDs sealed the deal a decade ago.
But there is a load-bearing exception. The instant a table grows a secondary index, every update has to fish the old tuple out of the primary index, extract the old secondary keys, and explicitly delete them from each secondary tree. Those are hidden reads, reads triggered by writes, and they quietly demolish the LSM write advantage. In a B-tree they are nearly free; in an LSM tree they are a tax on the very operation LSM trees were supposed to make cheap.
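To make the hidden read concrete, here is a toy in-memory model of the synchronous maintenance path (all names are mine for illustration; this is not any real engine's API). The write cannot touch the secondary tree until it has first read the old tuple back out of the primary index:

```python
# Toy model of synchronous secondary-index maintenance (illustrative only,
# not RocksDB's or any other engine's API). hidden_reads counts reads that
# exist purely to service writes.
class SyncTable:
    def __init__(self, indexed_col):
        self.indexed_col = indexed_col
        self.primary = {}        # primary key -> row dict
        self.secondary = {}      # secondary key -> set of primary keys
        self.hidden_reads = 0    # reads triggered purely by writes

    def put(self, pk, row):
        old = self.primary.get(pk)        # <-- the hidden read
        if old is not None:
            self.hidden_reads += 1
            # Explicitly delete the now-stale secondary entry.
            self.secondary[old[self.indexed_col]].discard(pk)
        self.primary[pk] = row
        self.secondary.setdefault(row[self.indexed_col], set()).add(pk)

t = SyncTable("city")
t.put(1, {"city": "Oslo"})
t.put(1, {"city": "Bergen"})   # must read the old row to drop the "Oslo" entry
```

With several secondary indexes, every update pays one such read per index, which is exactly the cost the deferred scheme targets.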
02 The deferred-update idea
The contribution is conceptually small. Do not pay the hidden read at write time. Insert the new tuple, log the intent, and let the compaction pass reconcile old and new on its own schedule, when the relevant SSTables are already being touched anyway.
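A minimal sketch of the deferred alternative under the same toy model (the names and the `compact_secondary` pass are my stand-ins, not the paper's API): the write appends blindly, stale secondary entries are tolerated, and reconciliation rides along with a pass that plays the role of compaction.

```python
# Deferred secondary-index maintenance, sketched (illustrative names only).
# Writes never read the old tuple; the secondary index is an append-only
# log of possibly-stale (sec_key, pk) entries.
class DeferredTable:
    def __init__(self, indexed_col):
        self.indexed_col = indexed_col
        self.primary = {}
        self.secondary = []   # append-only (sec_key, pk) entries, possibly stale

    def put(self, pk, row):
        self.primary[pk] = row                            # no hidden read
        self.secondary.append((row[self.indexed_col], pk))

    def compact_secondary(self):
        # Compaction is already streaming these entries, so the check
        # against the current primary row rides along cheaply: keep an
        # entry only if it still matches the live tuple.
        self.secondary = [
            (k, pk) for (k, pk) in self.secondary
            if self.primary.get(pk, {}).get(self.indexed_col) == k
        ]

t = DeferredTable("city")
t.put(1, {"city": "Oslo"})
t.put(1, {"city": "Bergen"})   # the stale ("Oslo", 1) entry is left in place
t.compact_secondary()          # reconciliation drops it later
```

The write path here does zero reads regardless of how many secondary indexes exist; the cleanup cost moves into a pass that was going to rewrite those SSTables anyway.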
The trick is making this safe under all the operations that traditionally relied on the synchronous read: unique constraints, foreign keys, secondary index lookups during reads. The paper works through each, shows where extra metadata is needed, and where the existing LSM compaction strategies already give you the bookkeeping for free.
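Between a write and the next compaction, the secondary index can contain stale entries, so reads need a guard. One standard way to keep lookups correct in this window (a sketch of the general validation technique, not necessarily the paper's exact mechanism) is to check every candidate against the primary index before returning it:

```python
# Read-side validation over a not-yet-compacted secondary index (sketch;
# function and parameter names are illustrative). A candidate is returned
# only if the live primary row still carries the queried secondary key.
def lookup(secondary_entries, primary, indexed_col, sec_key):
    hits = []
    for k, pk in secondary_entries:
        if k != sec_key:
            continue
        row = primary.get(pk)
        # Drop stale entries: the row was deleted or re-indexed since.
        if row is not None and row[indexed_col] == sec_key:
            hits.append(pk)
    return hits

# Entry ("Oslo", 1) is stale (row 1 moved to Bergen); pk 2 was deleted.
entries = [("Oslo", 1), ("Bergen", 1), ("Oslo", 2)]
primary = {1: {"city": "Bergen"}}
```

The same check is what makes it possible to enforce things like unique constraints lazily: a violation only matters if both conflicting entries still validate against the primary.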
"On the workloads where LSM trees were supposed to win and do not, deferred updates close the gap. On the workloads where they already win, deferred updates widen it."
03 What the numbers actually say
Theoretical analysis plus empirical evaluation across realistic workload mixes give the headline number: up to a tenfold speedup on write-intensive workloads with multiple secondary indexes. The bigger story is that the speedup grows with index count, exactly the regime where classic LSM tuning struggles hardest.
04 Why this matters beyond databases
The deferred-update pattern generalises. Anywhere you have a write path that is secretly a read-modify-write, and a periodic background pass that already touches the relevant data, you can usually defer the read. The paper makes this concrete for LSM secondary indexes, but the same logic applies to inverted-index maintenance, tiered caches, and several blockchain state-sync designs I have been looking at since.
FAQ What people ask me about this paper
Q1 How does this differ from existing batched-update strategies in RocksDB?
Q2 Does this break read-after-write consistency?
Q3 What is the worst case?
Q4 Is there a reference implementation?
Q5 How does this connect to the rest of my work?
CITE How to cite this paper
@inproceedings{badami2024lsm,
  author    = {Shujaatali Badami},
  title     = {Optimizing {LSM} Tree Operations with Deferred Updates: A Comparative Study},
  booktitle = {2024 IEEE International Conference on Data Science and Information Technology (DSIT)},
  year      = {2024},
  publisher = {IEEE},
  doi       = {10.1109/DSIT61374.2024.10881309}
}
S. Badami, "Optimizing LSM Tree Operations with Deferred Updates: A Comparative Study," in 2024 IEEE International Conference on Data Science and Information Technology (DSIT), 2024, doi: 10.1109/DSIT61374.2024.10881309.
Badami, S. (2024). Optimizing LSM tree operations with deferred updates: A comparative study. In 2024 IEEE International Conference on Data Science and Information Technology (DSIT). IEEE. https://doi.org/10.1109/DSIT61374.2024.10881309
TY  - CONF
AU  - Badami, Shujaatali
TI  - Optimizing LSM Tree Operations with Deferred Updates: A Comparative Study
T2  - 2024 IEEE International Conference on Data Science and Information Technology (DSIT)
PB  - IEEE
PY  - 2024
DO  - 10.1109/DSIT61374.2024.10881309
ER  -