| @@ -57,8 +57,8 @@ We are working on a new version of the mini-lsm tutorial that is split into 3 we | |||||||
| | 1.7            | Bloom Filter and Key Compression                | ✅        | ✅            | ✅       | | | 1.7            | Bloom Filter and Key Compression                | ✅        | ✅            | ✅       | | ||||||
| | 2.1            | Compaction Implementation                       | ✅        | ✅            | ✅       | | | 2.1            | Compaction Implementation                       | ✅        | ✅            | ✅       | | ||||||
| | 2.2            | Compaction Strategy - Simple                    | ✅        | ✅            | ✅       | | | 2.2            | Compaction Strategy - Simple                    | ✅        | ✅            | ✅       | | ||||||
| | 2.3            | Compaction Strategy - Tiered                    | ✅        | ✅            | 🚧       | | | 2.3            | Compaction Strategy - Tiered                    | ✅        | ✅            | ✅       | | ||||||
| | 2.4            | Compaction Strategy - Leveled                   | ✅        | ✅            | 🚧       | | | 2.4            | Compaction Strategy - Leveled                   | ✅        | ✅            | ✅       | | ||||||
| | 2.5            | Manifest                                        | ✅        | 🚧            | 🚧       | | | 2.5            | Manifest                                        | ✅        | 🚧            | 🚧       | | ||||||
| | 2.6            | Write-Ahead Log                                 | ✅        | 🚧            | 🚧       | | | 2.6            | Write-Ahead Log                                 | ✅        | 🚧            | 🚧       | | ||||||
| | 2.7            | Batch Write + Checksum                          |          |              |         | | | 2.7            | Batch Write + Checksum                          |          |              |         | | ||||||
|   | |||||||
| @@ -100,6 +100,8 @@ The simulator will flush an L0 SST into the LSM state, run your compaction contr | |||||||
|  |  | ||||||
| In your compaction implementation, you should reduce the number of active iterators (i.e., use concat iterator) as much as possible. Also, remember that merge order matters, and you will need to ensure that the iterators you create produces key-value pairs in the correct order, when multiple versions of a key appear. | In your compaction implementation, you should reduce the number of active iterators (i.e., use concat iterator) as much as possible. Also, remember that merge order matters, and you will need to ensure that the iterators you create produces key-value pairs in the correct order, when multiple versions of a key appear. | ||||||
|  |  | ||||||
|  | Also, note that some parameters in the implementation are 0-based and others are 1-based. Be careful when you use `level` as an index into a vector. | ||||||
|  |  | ||||||
| **Note: we do not provide fine-grained unit tests for this part. You can run the compaction simulator and compare with the output of the reference solution to see if your implementation is correct.** | **Note: we do not provide fine-grained unit tests for this part. You can run the compaction simulator and compare with the output of the reference solution to see if your implementation is correct.** | ||||||
|  |  | ||||||
| ## Task 2: Compaction Thread | ## Task 2: Compaction Thread | ||||||
|   | |||||||
| @@ -13,22 +13,108 @@ The tiered compaction we talk about in this chapter is the same as RocksDB's uni | |||||||
|  |  | ||||||
| In this chapter, you will implement RocksDB's universal compaction, which is of the tiered compaction family compaction strategies. Similar to the simple leveled compaction strategy, we only use number of files as the indicator in this compaction strategy. And when we trigger the compaction jobs, we always include a full sorted run (tier) in the compaction job. | In this chapter, you will implement RocksDB's universal compaction, which is of the tiered compaction family compaction strategies. Similar to the simple leveled compaction strategy, we only use number of files as the indicator in this compaction strategy. And when we trigger the compaction jobs, we always include a full sorted run (tier) in the compaction job. | ||||||
|  |  | ||||||
|  | ### Task 1.0: Precondition | ||||||
|  |  | ||||||
|  | In this task, you will need to modify: | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | src/compact/tiered.rs | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | In universal compaction, we do not use L0 SSTs in the LSM state. Instead, we directly flush new SSTs to a single sorted run (called tier). In the LSM state, `levels` will now include all tiers, where the lowest index is the latest SST flushed. The compaction simulator generates tier id based on the first SST id, and you should do the same in your implementation. | ||||||
|  |  | ||||||
|  | Universal compaction will only trigger tasks when the number of tiers (sorted runs) is larger than `num_tiers`. Otherwise, it does not trigger any compaction. | ||||||
|  |  | ||||||
| ### Task 1.1: Triggered by Space Amplification Ratio | ### Task 1.1: Triggered by Space Amplification Ratio | ||||||
|  |  | ||||||
|  | The first trigger of universal compaction is the space amplification ratio. As we discussed in the overview chapter, space amplification can be estimated by `engine_size / last_level_size`. In our implementation, we compute the space amplification ratio as `all levels except last level size / last level size`, so that the ratio falls in `[0, +inf)` instead of `[1, +inf)`. This is also consistent with the RocksDB implementation. | ||||||
|  |  | ||||||
|  | When `all levels except last level size / last level size`, expressed as a percentage, is at least `max_size_amplification_percent`, we will need to trigger a full compaction. | ||||||
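The check above can be sketched as a small predicate. This is only an illustration under stated assumptions: `space_amp_triggered` and its `tier_sizes` argument are hypothetical names, tier size is measured in number of SSTs (as the chapter does), and `max_size_amplification_percent` is read as a percentage, which agrees with the `ratio: 200` line the simulator prints for three one-SST tiers.

```rust
/// Hypothetical helper, not part of the starter code.
/// `tier_sizes` holds the number of SSTs in each tier, newest tier first,
/// so the final entry is the last (bottom) level.
fn space_amp_triggered(tier_sizes: &[usize], max_size_amplification_percent: usize) -> bool {
    // Split off the last level; an empty state never triggers.
    let Some((&last, upper)) = tier_sizes.split_last() else {
        return false;
    };
    if last == 0 {
        return false;
    }
    let upper_total: usize = upper.iter().sum();
    // upper_total / last >= percent / 100, rearranged to stay in integers.
    upper_total * 100 >= last * max_size_amplification_percent
}
```

With tiers of sizes `[1, 1, 1]` and a threshold of 200, the ratio is exactly 200% and the predicate holds. The `num_tiers` precondition from Task 1.0 would still need to be checked before calling it.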
|  |  | ||||||
|  | After you implement this trigger, you can run the compaction simulator. You will see: | ||||||
|  |  | ||||||
|  | ```shell | ||||||
|  | cargo run --bin compaction-simulator tiered | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | --- After Flush --- | ||||||
|  | L3 (1): [3] | ||||||
|  | L2 (1): [2] | ||||||
|  | L1 (1): [1] | ||||||
|  | --- Compaction Task --- | ||||||
|  | compaction triggered by space amplification ratio: 200 | ||||||
|  | L3 [3] L2 [2] L1 [1] -> [4, 5, 6] | ||||||
|  | --- After Compaction --- | ||||||
|  | L4 (3): [3, 2, 1] | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | With only this trigger in place, a full compaction runs once the space amplification ratio is reached. At the end of the simulation, you will see: | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | --- After Flush --- | ||||||
|  | L73 (1): [73] | ||||||
|  | L72 (1): [72] | ||||||
|  | L71 (1): [71] | ||||||
|  | L70 (1): [70] | ||||||
|  | L69 (1): [69] | ||||||
|  | L68 (1): [68] | ||||||
|  | L67 (1): [67] | ||||||
|  | L40 (27): [39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 13, 14, 15, 16, 17, 18, 19, 20, 21] | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | The `num_tiers` in the compaction simulator is set to 3. However, there are far more than 3 tiers in the LSM state, which incurs large read amplification. | ||||||
|  |  | ||||||
|  | The current trigger only reduces space amplification. We will need to add new triggers to the compaction algorithm to reduce read amplification. | ||||||
|  |  | ||||||
| ### Task 1.2: Triggered by Size Ratio | ### Task 1.2: Triggered by Size Ratio | ||||||
|  |  | ||||||
|  | The next trigger is the size ratio trigger. If there is a tier `n` such that `size of all previous tiers / this tier >= (1 + size_ratio) * 100%`, we will compact all `n` tiers. We only do this compaction when there are more than `min_merge_width` tiers to be merged. | ||||||
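One possible reading of this rule, as a rough sketch rather than the reference solution. The function name `size_ratio_pick` is hypothetical. It assumes tier size is the SST count, that the size ratio is supplied as a percentage (so the threshold becomes `(100 + size_ratio_percent) / 100`), and that "all `n` tiers" includes the tier that tripped the ratio. The strict comparison against `min_merge_width` follows the wording "more than"; it is worth checking that boundary against the reference output.

```rust
/// Hypothetical helper, not part of the starter code.
/// `tier_sizes` holds SST counts per tier, newest tier first.
/// Returns how many leading tiers to merge, if the size ratio trigger fires.
fn size_ratio_pick(
    tier_sizes: &[usize],
    size_ratio_percent: usize, // assumption: the option is a percentage
    min_merge_width: usize,
) -> Option<usize> {
    let mut previous_total = 0usize;
    for n in 1..tier_sizes.len() {
        previous_total += tier_sizes[n - 1];
        let this_tier = tier_sizes[n];
        // Tiers 0..=n would be merged, which is n + 1 tiers.
        let merge_width = n + 1;
        // previous_total / this_tier >= (100 + percent) / 100, kept in integers.
        let ratio_reached =
            this_tier > 0 && previous_total * 100 >= this_tier * (100 + size_ratio_percent);
        if ratio_reached && merge_width > min_merge_width {
            return Some(merge_width);
        }
    }
    None
}
```

For tiers `[1, 1, 1, 27]` with a 1% ratio and `min_merge_width = 2`, the first three tiers are picked, since the two newest tiers together are twice the size of the third.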
|  |  | ||||||
|  | With this trigger, you will observe the following in the compaction simulator: | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | L207 (1): [207] | ||||||
|  | L204 (3): [203, 202, 201] | ||||||
|  | L186 (15): [185, 178, 179, 180, 181, 182, 183, 184, 158, 159, 160, 161, 162, 163, 164] | ||||||
|  | L114 (31): [113, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56] | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | There will be fewer 1-SST tiers, and the compaction algorithm will keep the tiers ordered from smaller to larger according to the size ratio. However, when there are more SSTs in the LSM state, there will still be cases where we have more than `num_tiers` tiers. To limit the number of tiers, we will need another trigger. | ||||||
|  |  | ||||||
| ### Task 1.3: Reduce Sorted Runs | ### Task 1.3: Reduce Sorted Runs | ||||||
|  |  | ||||||
|  | If none of the previous triggers produce compaction tasks, we will do a compaction to reduce the number of tiers. We will simply take the top-most tiers to compact into one tier, so that the final state will have exactly `num_tiers` tiers (if no SSTs are flushed during the compaction). | ||||||
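The number of tiers to take follows from simple arithmetic: merging `k` tiers into one removes `k - 1` tiers, so ending with exactly `num_tiers` requires `k = current_tiers - num_tiers + 1`. A minimal sketch of that arithmetic, with a hypothetical function name and following the wording of this chapter (the reference solution may draw the boundary differently):

```rust
/// Hypothetical helper, not part of the starter code.
/// Returns how many of the top-most (newest) tiers to merge into one tier
/// so that exactly `num_tiers` tiers remain, or `None` if no merge is needed.
fn tiers_to_merge(current_tiers: usize, num_tiers: usize) -> Option<usize> {
    if current_tiers <= num_tiers {
        return None;
    }
    // Merging k tiers into one leaves current_tiers - k + 1 tiers.
    // The `min` guards the degenerate case num_tiers == 0.
    Some((current_tiers - num_tiers + 1).min(current_tiers))
}
```

For example, with 4 tiers and `num_tiers = 3`, the two newest tiers are merged and 3 tiers remain.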
|  |  | ||||||
|  | With this compaction enabled, you will see: | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | L427 (1): [427] | ||||||
|  | L409 (18): [408, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407] | ||||||
|  | L208 (31): [207, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72] | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | No compaction result will have more than `num_tiers` tiers. | ||||||
|  |  | ||||||
| **Note: we do not provide fine-grained unit tests for this part. You can run the compaction simulator and compare with the output of the reference solution to see if your implementation is correct.** | **Note: we do not provide fine-grained unit tests for this part. You can run the compaction simulator and compare with the output of the reference solution to see if your implementation is correct.** | ||||||
|  |  | ||||||
| ## Task 2: Integrate with the Read Path | ## Task 2: Integrate with the Read Path | ||||||
|  |  | ||||||
| As tiered compaction does not use the L0 level of the LSM state, you should directly flush your memtables to a new tier instead of as an L0 SST. You can use `self.compaction_controller.flush_to_l0()` to know whether to flush to L0. You may use the first output SST id as the level/tier id for your new sorted run. | In this task, you will need to modify: | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | src/compact.rs | ||||||
|  | src/lsm_storage.rs | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | As tiered compaction does not use the L0 level of the LSM state, you should directly flush your memtables to a new tier instead of as an L0 SST. You can use `self.compaction_controller.flush_to_l0()` to know whether to flush to L0. You may use the first output SST id as the level/tier id for your new sorted run. You will also need to modify your compaction process to construct merge iterators for tiered compaction jobs. | ||||||
|  |  | ||||||
| ## Test Your Understanding | ## Test Your Understanding | ||||||
|  |  | ||||||
| * What are the pros/cons of universal compaction compared with simple leveled/tiered compaction? | * What are the pros/cons of universal compaction compared with simple leveled/tiered compaction? | ||||||
| * How much storage space is it required (compared with user data size) to run universal compaction without using up the storage device space? | * How much storage space is it required (compared with user data size) to run universal compaction without using up the storage device space? | ||||||
|  | * Can we merge two tiers that are not adjacent in the LSM state? | ||||||
|  | * What happens if compaction cannot keep up with the SST flushes? | ||||||
| * The log-on-log problem. | * The log-on-log problem. | ||||||
|  |  | ||||||
| We do not provide reference answers to the questions, and feel free to discuss about them in the Discord community. | We do not provide reference answers to the questions, and feel free to discuss about them in the Discord community. | ||||||
|   | |||||||
| @@ -14,18 +14,136 @@ In chapter 2 day 2, you have implemented the simple leveled compaction strategie | |||||||
| * Compaction always include a full level. Note that you cannot remove the old files until you finish the compaction, and therefore, your storage engine might use 2x storage space while the compaction is going on (if it is a full compaction). Tiered compaction has the same problem. In this chapter, we will implement partial compaction that we select one SST from the upper level for compaction, instead of the full level. | * Compaction always include a full level. Note that you cannot remove the old files until you finish the compaction, and therefore, your storage engine might use 2x storage space while the compaction is going on (if it is a full compaction). Tiered compaction has the same problem. In this chapter, we will implement partial compaction that we select one SST from the upper level for compaction, instead of the full level. | ||||||
| * SSTs may be compacted across empty levels. As you have seen in the compaction simulator, when the LSM state is empty, and the engine flushes some L0 SSTs, these SSTs will be first compacted to L1, then from L1 to L2, etc. An optimal strategy is to directly place the SST from L0 to the lowest level possible, so as to avoid unnecessary write amplification. | * SSTs may be compacted across empty levels. As you have seen in the compaction simulator, when the LSM state is empty, and the engine flushes some L0 SSTs, these SSTs will be first compacted to L1, then from L1 to L2, etc. An optimal strategy is to directly place the SST from L0 to the lowest level possible, so as to avoid unnecessary write amplification. | ||||||
|  |  | ||||||
| In this chapter, you will implement a production-ready leveled compaction strategy. The strategy is the same as RocksDB's leveled compaction. | In this chapter, you will implement a production-ready leveled compaction strategy. The strategy is the same as RocksDB's leveled compaction. You will need to modify: | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | src/compact/leveled.rs | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | To run the compaction simulator, | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | cargo run --bin compaction-simulator leveled | ||||||
|  | ``` | ||||||
|  |  | ||||||
| ### Task 1.1: Compute Target Sizes | ### Task 1.1: Compute Target Sizes | ||||||
|  |  | ||||||
|  | In this compaction strategy, you will need to know the first/last key of each SST and the size of the SSTs. The compaction simulator will set up some mock SSTs for you to access. | ||||||
|  |  | ||||||
|  | You will need to compute the target sizes of the levels. Assume `base_level_size_mb` is 200MB and the number of levels (except L0) is 6. When the LSM state is empty, the target sizes will be: | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | [0 0 0 0 0 200MB] | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | As more SSTs get compacted into a level and it grows, we compute the target sizes based on the size of the last level. When the actual size of the SST files in the last level exceeds `base_level_size_mb` (200MB), for example at 300MB, we compute the target sizes of the other levels by dividing by the `level_size_multiplier`. Assume `level_size_multiplier=10`. | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | 0 0 0 0 30MB 300MB | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | We will only keep at most *one* level below `base_level_size_mb`, and in this case, it is L5. Assuming we now have 30GB of files in the last level, the target sizes will be: | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | 0 0 30MB 300MB 3GB 30GB | ||||||
|  | ``` | ||||||
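The three examples above can be reproduced with a short routine. This is a sketch under assumptions: `target_sizes_mb` is a hypothetical name, sizes are plain megabyte counts (with 1GB taken as 1000MB to match the round numbers in the examples), and `level_size_multiplier` is assumed to be non-zero.

```rust
/// Hypothetical helper, not part of the starter code.
/// Index 0 of the returned vector corresponds to L1, the last index to the
/// bottom level. All sizes are in MB.
fn target_sizes_mb(
    last_level_size_mb: u64,
    max_levels: usize,
    base_level_size_mb: u64,
    level_size_multiplier: u64, // assumed to be > 0
) -> Vec<u64> {
    let mut targets = vec![0u64; max_levels];
    if max_levels == 0 {
        return targets;
    }
    // The last level targets at least the base level size.
    targets[max_levels - 1] = last_level_size_mb.max(base_level_size_mb);
    // Walk upward. A level gets a non-zero target only while the level below
    // it is still above the base size, so at most one level ends up below it.
    for i in (0..max_levels - 1).rev() {
        let lower = targets[i + 1];
        if lower > base_level_size_mb {
            targets[i] = lower / level_size_multiplier;
        }
    }
    targets
}
```

Calling it with a last level of 0MB, 300MB, and 30000MB (six levels, base 200MB, multiplier 10) gives the three target vectors shown in this task.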
|  |  | ||||||
| ### Task 1.2: Decide Base Level | ### Task 1.2: Decide Base Level | ||||||
|  |  | ||||||
|  | Now, let us solve the problem that SSTs may be compacted across empty levels in the simple leveled compaction strategy. When we compact L0 SSTs with lower levels, we do not directly put them into L1. Instead, we compact them with the first level that has `target size > 0`. For example, when the target level sizes are: | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | 0 0 0 0 30MB 300MB | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | We will compact L0 SSTs with L5 SSTs if the number of L0 SSTs reaches the `level0_file_num_compaction_trigger` threshold. | ||||||
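Finding the base level is a single scan over the target sizes. The sketch below uses a hypothetical function name and assumes the target sizes are stored 0-based (index 0 is L1), so it converts to a 1-based level number on the way out. That conversion is exactly the kind of off-by-one worth watching for.

```rust
/// Hypothetical helper, not part of the starter code.
/// `target_sizes[0]` is the target size of L1. Returns the 1-based number of
/// the first level whose target size is non-zero.
fn base_level(target_sizes: &[u64]) -> Option<usize> {
    target_sizes
        .iter()
        .position(|&size| size > 0)
        // Convert the 0-based vector index into a 1-based level number.
        .map(|index| index + 1)
}
```

For the target sizes `0 0 0 0 30MB 300MB`, this yields level 5, matching the example above.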
|  |  | ||||||
|  | Now, you can generate L0 compaction tasks and run the compaction simulator. | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | --- After Flush --- | ||||||
|  | L0 (1): [23] | ||||||
|  | L1 (0): [] | ||||||
|  | L2 (0): [] | ||||||
|  | L3 (2): [19, 20] | ||||||
|  | L4 (6): [11, 12, 7, 8, 9, 10] | ||||||
|  |  | ||||||
|  | ... | ||||||
|  |  | ||||||
|  | --- After Flush --- | ||||||
|  | L0 (2): [102, 103] | ||||||
|  | L1 (0): [] | ||||||
|  | L2 (0): [] | ||||||
|  | L3 (18): [42, 65, 86, 87, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 61, 62, 52, 34] | ||||||
|  | L4 (6): [11, 12, 7, 8, 9, 10] | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | The number of levels in the compaction simulator is 4. Therefore, the SSTs should be directly flushed to L3/L4. | ||||||
|  |  | ||||||
| ### Task 1.3: Decide Level Priorities | ### Task 1.3: Decide Level Priorities | ||||||
|  |  | ||||||
|  | Now we need to handle compactions below L0. L0 compaction always has the top priority: you should compact L0 with other levels first if it reaches the threshold. After that, we can compute the compaction priority of each level as `current_size / target_size`. We only compact levels with this ratio `> 1.0`. The one with the largest ratio will be chosen for compaction with the lower level. For example, if we have: | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | L3: 200MB, target_size=20MB | ||||||
|  | L4: 202MB, target_size=200MB | ||||||
|  | L5: 1.9GB, target_size=2GB | ||||||
|  | L6: 20GB, target_size=20GB | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | The priority of compaction will be: | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | L3: 200MB/20MB = 10.0 | ||||||
|  | L4: 202MB/200MB = 1.01 | ||||||
|  | L5: 1.9GB/2GB = 0.95 | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | L3 and L4 need to be compacted, while L5 does not. L3 has the larger ratio, and therefore we will produce a compaction task of L3 and L4. | ||||||
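The selection in this example can be sketched as follows. The function name and the slice layout are assumptions for illustration: both slices are 0-based with index 0 standing for L1, sizes are in MB, and the last level is skipped because it has no lower level to compact into (the example likewise lists no priority for L6).

```rust
/// Hypothetical helper, not part of the starter code.
/// Returns the 1-based number of the level with the highest
/// current_size / target_size ratio above 1.0, if any.
fn pick_level_to_compact(current_mb: &[u64], target_mb: &[u64]) -> Option<usize> {
    let levels = current_mb.len().min(target_mb.len());
    let mut best: Option<(usize, f64)> = None;
    // Skip the last level: there is nothing below it.
    for i in 0..levels.saturating_sub(1) {
        if target_mb[i] == 0 {
            continue; // levels above the base level have no target yet
        }
        let ratio = current_mb[i] as f64 / target_mb[i] as f64;
        if ratio > 1.0 && best.map_or(true, |(_, best_ratio)| ratio > best_ratio) {
            best = Some((i + 1, ratio));
        }
    }
    best.map(|(level, _)| level)
}
```

Feeding it the sizes from the example (with L1 and L2 empty) selects L3, whose ratio of 10.0 is the largest.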
|  |  | ||||||
|  | ### Task 1.4: Select SST to Compact | ||||||
|  |  | ||||||
|  | Now, let us address the problem from the simple leveled compaction strategy that compaction always includes a full level. When we decide to compact two levels, we always select the oldest SST from the upper level. You can tell when an SST was produced by comparing SST ids. | ||||||
|  |  | ||||||
|  | There are other ways of choosing the compacting SST, for example, by looking into the number of delete tombstones. You can implement this as part of the bonus task. | ||||||
|  |  | ||||||
|  | After you choose the upper level SST, you will need to find all SSTs in the lower level whose keys overlap with the upper level SST. Then, you can generate a compaction task that contains exactly one SST in the upper level and the overlapping SSTs in the lower level. | ||||||
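A minimal sketch of both steps, picking the oldest SST and collecting the overlapping ones. `SstMeta`, `oldest_sst`, and `overlapping_ssts` are hypothetical names, and keys are modelled as `u32` for brevity, whereas the real engine compares byte-string keys. Two inclusive ranges overlap when each one starts no later than the other ends.

```rust
/// Hypothetical stand-in for SST metadata; not the starter code's type.
#[derive(Debug, Clone, PartialEq)]
struct SstMeta {
    id: usize,
    first_key: u32, // assumption: integer keys for illustration only
    last_key: u32,
}

/// The oldest SST in a level is the one with the smallest id.
fn oldest_sst(level: &[SstMeta]) -> Option<&SstMeta> {
    level.iter().min_by_key(|sst| sst.id)
}

/// Ids of the lower-level SSTs whose inclusive key range overlaps `upper`.
fn overlapping_ssts(upper: &SstMeta, lower: &[SstMeta]) -> Vec<usize> {
    lower
        .iter()
        .filter(|sst| sst.first_key <= upper.last_key && sst.last_key >= upper.first_key)
        .map(|sst| sst.id)
        .collect()
}
```

The resulting task would contain the single upper-level SST plus every id returned by `overlapping_ssts`. An empty result means the SST overlaps nothing in the lower level.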
|  |  | ||||||
|  | When the compaction completes, you will need to remove the SSTs from the state and insert new SSTs into the correct place. Note that you should keep SST ids ordered by first keys in all levels except L0. | ||||||
|  |  | ||||||
|  | Running the compaction simulator, you should see: | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | --- After Compaction --- | ||||||
|  | L0 (0): [] | ||||||
|  | L1 (4): [222, 223, 208, 209] | ||||||
|  | L2 (5): [206, 196, 207, 212, 165] | ||||||
|  | L3 (11): [166, 120, 143, 144, 179, 148, 167, 140, 189, 180, 190] | ||||||
|  | L4 (22): [113, 85, 86, 36, 46, 37, 146, 100, 147, 203, 102, 103, 65, 81, 105, 75, 82, 95, 96, 97, 152, 153] | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | The sizes of the levels should be kept under the level multiplier ratio. And the compaction task: | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | Upper L1 [224.sst 7cd080e..=33d79d04] | ||||||
|  | Lower L2 [210.sst 1c657df4..=31a00e1b, 211.sst 31a00e1c..=46da9e43] -> [228.sst 7cd080e..=1cd18f74, 229.sst 1cd18f75..=31d616db, 230.sst 31d616dc..=46da9e43] | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | ...should only have one SST from the upper layer. | ||||||
|  |  | ||||||
| **Note: we do not provide fine-grained unit tests for this part. You can run the compaction simulator and compare with the output of the reference solution to see if your implementation is correct.** | **Note: we do not provide fine-grained unit tests for this part. You can run the compaction simulator and compare with the output of the reference solution to see if your implementation is correct.** | ||||||
|  |  | ||||||
| ## Task 2: Integrate with the Read Path | ## Task 2: Integrate with the Read Path | ||||||
|  |  | ||||||
|  | In this task, you will need to modify: | ||||||
|  |  | ||||||
|  | ``` | ||||||
|  | src/compact.rs | ||||||
|  | src/lsm_storage.rs | ||||||
|  | ``` | ||||||
|  |  | ||||||
|  | The implementation should be similar to simple leveled compaction. Remember to change both get/scan read path and the compaction iterators. | ||||||
|  |  | ||||||
| ## Test Your Understanding | ## Test Your Understanding | ||||||
|  |  | ||||||
| * Finding a good key split point for compaction may potentially reduce the write amplification, or it does not matter at all? | * Finding a good key split point for compaction may potentially reduce the write amplification, or it does not matter at all? | ||||||
|   | |||||||
| @@ -343,7 +343,6 @@ fn main() { | |||||||
|                 } else { |                 } else { | ||||||
|                     storage.dump_original_id(false, false); |                     storage.dump_original_id(false, false); | ||||||
|                 } |                 } | ||||||
|                 println!("--- Compaction Task ---"); |  | ||||||
|                 let mut num_compactions = 0; |                 let mut num_compactions = 0; | ||||||
|                 while let Some(task) = { |                 while let Some(task) = { | ||||||
|                     println!("--- Compaction Task ---"); |                     println!("--- Compaction Task ---"); | ||||||
|   | |||||||