From 24b5f9ea3b2488202bdb435fcd5acf21c54009cb Mon Sep 17 00:00:00 2001
From: Alex Chi Z
Date: Tue, 30 Jan 2024 16:58:50 +0800
Subject: [PATCH] more questions

Signed-off-by: Alex Chi Z
---
 mini-lsm-book/src/week1-05-read-path.md | 1 +
 mini-lsm-book/src/week2-03-tiered.md    | 1 +
 mini-lsm-book/src/week2-05-manifest.md  | 1 +
 3 files changed, 3 insertions(+)

diff --git a/mini-lsm-book/src/week1-05-read-path.md b/mini-lsm-book/src/week1-05-read-path.md
index 3c9c6cc..dc5d719 100644
--- a/mini-lsm-book/src/week1-05-read-path.md
+++ b/mini-lsm-book/src/week1-05-read-path.md
@@ -84,5 +84,6 @@ We do not provide reference answers to the questions, and feel free to discuss a
 ## Bonus Tasks
 
 * **The Cost of Dynamic Dispatch.** Implement a `Box` version of merge iterators and benchmark to see the performance differences.
+* **Parallel Seek.** Creating a merge iterator requires loading the first block of every underlying SST (when you create an `SSTIterator`). You may parallelize the process of creating the iterators.
 
 {{#include copyright.md}}
diff --git a/mini-lsm-book/src/week2-03-tiered.md b/mini-lsm-book/src/week2-03-tiered.md
index 68f506f..98fe2b8 100644
--- a/mini-lsm-book/src/week2-03-tiered.md
+++ b/mini-lsm-book/src/week2-03-tiered.md
@@ -130,6 +130,7 @@ As tiered compaction does not use the L0 level of the LSM state, you should dire
 * What happens if compaction speed cannot keep up with the SST flushes?
 * What might needs to be considered if the system schedules multiple compaction tasks in parallel?
 * SSDs also write its own logs (basically it is a log-structured storage). If the SSD has a write amplification of 2x, what is the end-to-end write amplification of the whole system? Related: [ZNS: Avoiding the Block Interface Tax for Flash-based SSDs](https://www.usenix.org/conference/atc21/presentation/bjorling).
+* Consider the case where the user chooses to keep a large number of sorted runs (e.g., 300) for tiered compaction. To make the read path faster, is it a good idea to keep a data structure that reduces the time complexity of finding the SSTs to read for a given key range (e.g., to `O(log n)`)? Note that normally you will need to do a binary search in each sorted run to find the SSTs that cover the key range you want to read. (Check out Neon's [layer map](https://neon.tech/blog/persistent-structures-in-neons-wal-indexing) implementation!)
 
 We do not provide reference answers to the questions, and feel free to discuss about them in the Discord community.
 
diff --git a/mini-lsm-book/src/week2-05-manifest.md b/mini-lsm-book/src/week2-05-manifest.md
index c4d898c..96c18cd 100644
--- a/mini-lsm-book/src/week2-05-manifest.md
+++ b/mini-lsm-book/src/week2-05-manifest.md
@@ -87,5 +87,6 @@ get 1500
 ## Bonus Tasks
 
 * **Manifest Compaction.** When the number of logs in the manifest file gets too large, you can rewrite the manifest file to only store the current snapshot and append new logs to that file.
+* **Parallel Open.** After you collect the list of SSTs to open, you can open and decode them in parallel instead of one by one, thereby accelerating the recovery process.
 
 {{#include copyright.md}}
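The Parallel Seek and Parallel Open bonus tasks added above boil down to the same trick: fan the per-SST work (opening the file and decoding its metadata, or seeking its iterator to the first block) out across threads, then collect the results back in the original order. Below is a minimal Rust sketch of the Parallel Open variant; `SsTable`, `open_sst`, and the one-thread-per-file strategy are illustrative stand-ins rather than the actual mini-lsm API, and a real implementation would bound concurrency with a thread pool or rayon.

```rust
use std::sync::Arc;
use std::thread;

// Stand-in for the real SST handle: opening one means reading the footer,
// block metadata, and bloom filter from disk, which is mostly I/O-bound.
struct SsTable {
    id: usize,
}

fn open_sst(id: usize) -> std::io::Result<SsTable> {
    // The real code would issue disk reads here; we just build a placeholder.
    Ok(SsTable { id })
}

/// Open every SST listed in the manifest, one scoped thread per file, and
/// return the handles in the same order as the input ids.
fn open_all_parallel(ids: &[usize]) -> std::io::Result<Vec<Arc<SsTable>>> {
    let results: Vec<std::io::Result<SsTable>> = thread::scope(|s| {
        // Spawn all workers first so the opens actually overlap...
        let handles: Vec<_> = ids
            .iter()
            .map(|&id| s.spawn(move || open_sst(id)))
            .collect();
        // ...then join them in order to preserve the manifest ordering.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    results.into_iter().map(|r| r.map(Arc::new)).collect()
}

fn main() -> std::io::Result<()> {
    let tables = open_all_parallel(&[1, 2, 3, 4])?;
    println!(
        "recovered {} SSTs: {:?}",
        tables.len(),
        tables.iter().map(|t| t.id).collect::<Vec<_>>()
    );
    Ok(())
}
```

The same shape applies to Parallel Seek: replace `open_sst` with a call that constructs an iterator for each SST seeked to its first block, then feed the collected iterators into the merge iterator as before.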
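For the new tiered-compaction question, the baseline it refers to is a binary search over the SST key ranges within each sorted run, so a lookup touching 300 runs performs 300 such searches; whether a global index across runs pays off is exactly the trade-off the question asks you to weigh. A minimal sketch of that per-run lookup, with `SstMeta` as a hypothetical stand-in for the real SST handle, could look like this:

```rust
// One sorted run keeps its SSTs ordered by key range and non-overlapping,
// so a binary search over the first keys finds the single candidate SST.
struct SstMeta {
    first_key: Vec<u8>,
    last_key: Vec<u8>,
}

/// Index of the SST in this sorted run that may contain `key`, or `None`.
/// Costs `O(log m)` comparisons for a run of `m` SSTs.
fn find_sst_in_run(run: &[SstMeta], key: &[u8]) -> Option<usize> {
    // First SST whose first_key is strictly greater than `key`; the
    // candidate, if any, is the SST just before it.
    let idx = run.partition_point(|sst| sst.first_key.as_slice() <= key);
    let candidate = idx.checked_sub(1)?;
    (run[candidate].last_key.as_slice() >= key).then_some(candidate)
}

fn main() {
    let run = vec![
        SstMeta { first_key: b"a".to_vec(), last_key: b"f".to_vec() },
        SstMeta { first_key: b"g".to_vec(), last_key: b"p".to_vec() },
        SstMeta { first_key: b"q".to_vec(), last_key: b"z".to_vec() },
    ];
    assert_eq!(find_sst_in_run(&run, b"h"), Some(1)); // falls inside the second SST
    assert_eq!(find_sst_in_run(&run, b"0"), None); // sorts before every first key
}
```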