diff --git a/mini-lsm-book/src/week2-01-compaction.md b/mini-lsm-book/src/week2-01-compaction.md
index 7e0b2e5..97c34a9 100644
--- a/mini-lsm-book/src/week2-01-compaction.md
+++ b/mini-lsm-book/src/week2-01-compaction.md
@@ -8,6 +8,14 @@ In this chapter, you will:
 * Implement the logic to update the LSM states and manage SST files on the filesystem.
 * Update LSM read path to incorporate the LSM levels.
 
+## Task 1: Compaction Implementation
+
+## Task 2: Update the LSM State
+
+## Task 3: Concat Iterator
+
+## Task 4: Integrate with the Read Path
+
 ## Test Your Understanding
 
 * What are the definitions of read/write/space amplifications? (This is covered in the overview chapter)
diff --git a/mini-lsm-book/src/week2-02-simple.md b/mini-lsm-book/src/week2-02-simple.md
index a48e1f5..7abc93a 100644
--- a/mini-lsm-book/src/week2-02-simple.md
+++ b/mini-lsm-book/src/week2-02-simple.md
@@ -7,6 +7,12 @@ In this chapter, you will:
 * Implement a simple leveled compaction strategy and simulate it on the compaction simulator.
 * Start compaction as a background task and implement a compaction trigger in the system.
 
+## Task 1: Simple Leveled Compaction
+
+## Task 2: Compaction Simulation
+
+## Task 3: Integrate with the Read Path
+
 ## Test Your Understanding
 
 * Is it correct that a key will only be purged from the LSM tree if the user requests to delete it and it has been compacted in the bottom-most level?
diff --git a/mini-lsm-book/src/week2-03-tiered.md b/mini-lsm-book/src/week2-03-tiered.md
index 35561d2..bfbbfb4 100644
--- a/mini-lsm-book/src/week2-03-tiered.md
+++ b/mini-lsm-book/src/week2-03-tiered.md
@@ -9,6 +9,12 @@ In this chapter, you will:
 
 The tiered compaction we talk about in this chapter is the same as RocksDB's universal compaction. We will use these two terminologies interchangeably.
 
+## Task 1: Universal Compaction
+
+## Task 2: Compaction Simulation
+
+## Task 3: Integrate with the Read Path
+
 ## Test Your Understanding
 
 * What are the pros/cons of universal compaction compared with simple leveled/tiered compaction?
diff --git a/mini-lsm-book/src/week2-04-leveled.md b/mini-lsm-book/src/week2-04-leveled.md
index df6e107..dbc56a7 100644
--- a/mini-lsm-book/src/week2-04-leveled.md
+++ b/mini-lsm-book/src/week2-04-leveled.md
@@ -7,6 +7,12 @@ In this chapter, you will:
 * Implement a leveled compaction strategy and simulate it on the compaction simulator.
 * Incorporate leveled compaction strategy into the system.
 
+## Task 1: Leveled Compaction
+
+## Task 2: Compaction Simulation
+
+## Task 3: Integrate with the Read Path
+
 ## Test Your Understanding
 
 * Finding a good key split point for compaction may potentially reduce the write amplification, or it does not matter at all?
diff --git a/mini-lsm-book/src/week2-05-manifest.md b/mini-lsm-book/src/week2-05-manifest.md
index 84274ad..cabb4a4 100644
--- a/mini-lsm-book/src/week2-05-manifest.md
+++ b/mini-lsm-book/src/week2-05-manifest.md
@@ -7,4 +7,10 @@ In this chapter, you will:
 * Implement encoding and decoding of the manifest file.
 * Recover from the manifest when the system restarts.
 
+## Task 1: Manifest Encoding
+
+## Task 2: Write Manifests
+
+## Task 3: Recover from the State
+
 {{#include copyright.md}}
diff --git a/mini-lsm-book/src/week2-06-wal.md b/mini-lsm-book/src/week2-06-wal.md
index 9b2e5ac..ecc4c59 100644
--- a/mini-lsm-book/src/week2-06-wal.md
+++ b/mini-lsm-book/src/week2-06-wal.md
@@ -7,6 +7,12 @@ In this chapter, you will:
 * Implement encoding and decoding of the write-ahead log file.
 * Recover memtables from the WALs when the system restarts.
 
+## Task 1: WAL Encoding
+
+## Task 2: Write WALs
+
+## Task 3: Recover from the WALs
+
 ## Test Your Understanding
 
 * When can you tell the user that their modifications (put/delete) have been persisted?
diff --git a/mini-lsm-book/src/week2-07-snacks.md b/mini-lsm-book/src/week2-07-snacks.md
index ba2f246..7cd7b06 100644
--- a/mini-lsm-book/src/week2-07-snacks.md
+++ b/mini-lsm-book/src/week2-07-snacks.md
@@ -9,6 +9,16 @@ In this chapter, you will:
 * Implement the batch write interface.
 * Add checksums to the blocks, SST metadata, manifest, and WALs.
 
+## Task 1: Write Batch Interface
+
+## Task 2: Block Checksum
+
+## Task 3: SST Checksum
+
+## Task 4: WAL Checksum
+
+## Task 5: Manifest Checksum
+
 ## Test Your Understanding
 
 * Consider the case that an LSM storage engine only provides `write_batch` as the write interface (instead of single put + delete). Is it possible to implement it as follows: there is a single write thread with an mpsc channel receiver to get the changes, and all threads send write batches to the write thread. The write thread is the single point to write to the database. What are the pros/cons of this implementation? (Congrats if you do so you get BadgerDB!)
diff --git a/mini-lsm-book/src/week2-overview.md b/mini-lsm-book/src/week2-overview.md
index 522a0cf..953366a 100644
--- a/mini-lsm-book/src/week2-overview.md
+++ b/mini-lsm-book/src/week2-overview.md
@@ -39,7 +39,7 @@ SST 6: key range 06000 - key 10010, 1000 keys
 
 The 3 new SSTs are created by merging SST 1, 2, and 3. We can get a sorted 3000 keys and then split them into 3 files, so as to avoid having a super large SST file. Now our LSM state has 3 non-overlapping SSTs, and we only need to access SST 4 to find key 02333.
 
-## Two Extremes and Write Amplification
+## Two Extremes of Compaction and Write Amplification
 
 So from the above example, we have 2 naive ways of handling the LSM structure -- not doing compactions at all, and always do full compaction when new SSTs are flushed.
 
@@ -59,7 +59,7 @@ Compaction strategies usually aim to control the number of sorted runs, so as to
 
 In leveled compaction, the user can specify a maximum number of levels, which is the number of sorted runs in the system (except L0). For example, RocksDB usually keeps 6 levels (sorted runs) in leveled compaction mode. During the compaction process, SSTs from two adjacent levels will be merged and then the produced SSTs will be put to the lower level of the two levels. The sorted runs (levels) grow exponentially in size -- the lower level will be < some number x > of the upper level in size.
 
-In tiered compaction, the engine will dynamically adjust the number of sorted runs by merging them to minimize write amplification. The number of sorted runs can be high if the compaction strategy does not choose to merge them, therefore making read amplification high. In this tutorial, we will implement RocksDB's universal compaction, which is a kind of tiered compaction strategy.
+In tiered compaction, the engine will dynamically adjust the number of sorted runs by merging them or letting new SSTs be flushed as a new sorted run (a tier) to minimize write amplification. The number of tiers can be high if the compaction strategy does not choose to merge them, thereby making read amplification high. In this tutorial, we will implement RocksDB's universal compaction, which is a kind of tiered compaction strategy.
 
 ## Space Amplification
 
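The tiered-compaction paragraph in the `week2-overview.md` hunk above can be made concrete with a small sketch. Below is a minimal, self-contained Rust illustration of a size-amplification trigger in the spirit of RocksDB universal compaction's `max_size_amplification_percent` option; the `should_compact` helper and its exact form are assumptions for illustration, not mini-lsm's actual `TieredCompactionController` API.

```rust
// Illustrative sketch only: decides whether the accumulated tiers (sorted
// runs) should be merged. `should_compact` is a hypothetical helper, not
// mini-lsm's real API. `sizes` holds one entry per tier, newest first, so
// the last entry is the oldest and largest sorted run.
fn should_compact(sizes: &[u64], max_size_amplification_percent: u64) -> bool {
    // With fewer than two tiers there is nothing to merge.
    if sizes.len() < 2 {
        return false;
    }
    // Estimate space amplification as (all tiers except the last) / last:
    // data outside the oldest tier may be newer or deleted versions of keys
    // that already exist in it.
    let last = *sizes.last().unwrap();
    let rest: u64 = sizes[..sizes.len() - 1].iter().sum();
    rest * 100 >= last * max_size_amplification_percent
}

fn main() {
    // Four tiers produced by flushes and earlier merges, oldest tier last.
    let tiers = vec![10u64, 20, 40, 100];
    // rest = 70, last = 100 => estimated amplification is 70%.
    assert!(!should_compact(&tiers, 200)); // 70% < 200%: keep accumulating
    assert!(should_compact(&tiers, 50)); // 70% >= 50%: merge the tiers
    println!("tiered compaction trigger sketch ok");
}
```

The check captures the trade-off described in the overview: letting flushed SSTs pile up as new tiers keeps write amplification low because nothing is rewritten, while merging tiers bounds the number of sorted runs and hence the read and space amplification.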