| @@ -84,5 +84,6 @@ We do not provide reference answers to the questions, and feel free to discuss a | |||||||
| ## Bonus Tasks | ## Bonus Tasks | ||||||
|  |  | ||||||
| * **The Cost of Dynamic Dispatch.** Implement a `Box<dyn StorageIterator>` version of merge iterators and benchmark to see the performance differences. | * **The Cost of Dynamic Dispatch.** Implement a `Box<dyn StorageIterator>` version of merge iterators and benchmark to see the performance differences. | ||||||
|  | * **Parallel Seek.** Creating a merge iterator requires loading the first block of all underlying SSTs (when you create `SSTIterator`). You may parallelize the process of creating iterators. | ||||||
|  |  | ||||||
| {{#include copyright.md}} | {{#include copyright.md}} | ||||||
|   | |||||||
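The **Parallel Seek** task added in this hunk can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `FakeSst`, `FakeSstIter`, `create_and_seek_to_first`, and `create_iters_in_parallel` are hypothetical names invented here, and reading the first block is modeled by peeking at an in-memory byte vector. It uses `std::thread::scope`, so the worker threads can borrow the SST list without reference counting.

```rust
use std::thread;

/// Hypothetical stand-in for an SST; `first_block` models the data that
/// has to be loaded before an iterator can be positioned.
struct FakeSst {
    id: usize,
    first_block: Vec<u8>,
}

/// Hypothetical stand-in for `SSTIterator`.
#[derive(Debug, PartialEq)]
struct FakeSstIter {
    sst_id: usize,
    first_key: Option<u8>,
}

/// Models the blocking part of iterator creation (loading the first block).
fn create_and_seek_to_first(sst: &FakeSst) -> FakeSstIter {
    FakeSstIter {
        sst_id: sst.id,
        first_key: sst.first_block.first().copied(),
    }
}

/// Creates one iterator per SST, each on its own scoped thread, and returns
/// the iterators in the same order as `ssts`.
fn create_iters_in_parallel(ssts: &[FakeSst]) -> Vec<FakeSstIter> {
    thread::scope(|scope| {
        // Spawn every worker first so that the loads overlap.
        let handles: Vec<_> = ssts
            .iter()
            .map(|sst| scope.spawn(move || create_and_seek_to_first(sst)))
            .collect();
        // Join in spawn order, which preserves the input order.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Spawning one thread per SST is only reasonable for a small number of tables; a real implementation would more likely cap the parallelism or reuse a thread pool.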
| @@ -130,6 +130,7 @@ As tiered compaction does not use the L0 level of the LSM state, you should dire | |||||||
| * What happens if compaction speed cannot keep up with the SST flushes? | * What happens if compaction speed cannot keep up with the SST flushes? | ||||||
| * What might need to be considered if the system schedules multiple compaction tasks in parallel? | * What might need to be considered if the system schedules multiple compaction tasks in parallel? | ||||||
| * SSDs also write their own logs (an SSD is basically log-structured storage itself). If the SSD has a write amplification of 2x, what is the end-to-end write amplification of the whole system? Related: [ZNS: Avoiding the Block Interface Tax for Flash-based SSDs](https://www.usenix.org/conference/atc21/presentation/bjorling). | * SSDs also write their own logs (an SSD is basically log-structured storage itself). If the SSD has a write amplification of 2x, what is the end-to-end write amplification of the whole system? Related: [ZNS: Avoiding the Block Interface Tax for Flash-based SSDs](https://www.usenix.org/conference/atc21/presentation/bjorling). | ||||||
|  | * Consider the case where the user chooses to keep a large number of sorted runs (e.g., 300) for tiered compaction. To make the read path faster, is it a good idea to keep a data structure that reduces the time complexity (e.g., to `O(log n)`) of finding the SSTs to read in each layer for a given key range? Note that normally you need to do a binary search in each sorted run to find the key ranges to read. (Check out Neon's [layer map](https://neon.tech/blog/persistent-structures-in-neons-wal-indexing) implementation!) | ||||||
|  |  | ||||||
| We do not provide reference answers to the questions; feel free to discuss them in the Discord community. | We do not provide reference answers to the questions; feel free to discuss them in the Discord community. | ||||||
|  |  | ||||||
|   | |||||||
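The baseline that the new question refers to, one binary search per sorted run, might look like the sketch below. `SstMeta`, `find_in_run`, and `candidates` are hypothetical names made up for illustration, keys are simplified to `u64`, and each run is assumed to be sorted by key with non-overlapping SSTs. With `r` runs of about `n` SSTs each, a point lookup costs `O(r log n)`, which is the cost a layer-map-like structure would try to reduce.

```rust
/// Hypothetical SST metadata: an id plus an inclusive key range.
#[derive(Debug, Clone, PartialEq)]
struct SstMeta {
    id: usize,
    first_key: u64,
    last_key: u64,
}

/// Within one sorted run (SSTs ordered by key, non-overlapping), returns
/// the only SST that may contain `key`, found by binary search.
fn find_in_run(run: &[SstMeta], key: u64) -> Option<&SstMeta> {
    // Index of the first SST whose last key is not smaller than `key`.
    let idx = run.partition_point(|sst| sst.last_key < key);
    // That SST only qualifies if its range actually starts at or before `key`.
    run.get(idx).filter(|sst| sst.first_key <= key)
}

/// Probes every sorted run in order; at most one candidate SST per run.
fn candidates<'a>(runs: &'a [Vec<SstMeta>], key: u64) -> Vec<&'a SstMeta> {
    runs.iter()
        .filter_map(|run| find_in_run(run, key))
        .collect()
}
```

With 300 sorted runs this still performs 300 binary searches per lookup, which is why the question asks whether an extra index over all runs is worthwhile.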
| @@ -87,5 +87,6 @@ get 1500 | |||||||
| ## Bonus Tasks | ## Bonus Tasks | ||||||
|  |  | ||||||
| * **Manifest Compaction.** When the number of logs in the manifest file gets too large, you can rewrite the manifest file to only store the current snapshot and append new logs to that file. | * **Manifest Compaction.** When the number of logs in the manifest file gets too large, you can rewrite the manifest file to only store the current snapshot and append new logs to that file. | ||||||
|  | * **Parallel Open.** After you collect the list of SSTs to open, you can open and decode them in parallel, instead of doing it one by one, therefore accelerating the recovery process. | ||||||
|  |  | ||||||
| {{#include copyright.md}} | {{#include copyright.md}} | ||||||
|   | |||||||
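For the **Parallel Open** task added in this hunk, one possible shape is to split the collected SST ids into chunks, open each chunk on a worker thread, and surface the first error. The sketch below is an assumption-laden illustration: `OpenedSst`, `open_sst`, and `open_all` are hypothetical, errors are plain `String`s, and a "corrupt" file is simulated by the id `usize::MAX`.

```rust
use std::thread;

/// Hypothetical handle for an SST that has been opened and decoded.
#[derive(Debug, PartialEq)]
struct OpenedSst {
    id: usize,
}

/// Hypothetical open-and-decode step. The id `usize::MAX` simulates a
/// corrupt file so that the error path can be exercised.
fn open_sst(id: usize) -> Result<OpenedSst, String> {
    if id == usize::MAX {
        Err(format!("sst {id} is corrupt"))
    } else {
        Ok(OpenedSst { id })
    }
}

/// Opens all SSTs using at most `workers` threads, keeping the input order.
/// Returns the first error encountered, if any.
fn open_all(ids: &[usize], workers: usize) -> Result<Vec<OpenedSst>, String> {
    let workers = workers.max(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_len = ((ids.len() + workers - 1) / workers).max(1);

    let results: Vec<Result<Vec<OpenedSst>, String>> = thread::scope(|scope| {
        let handles: Vec<_> = ids
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|&id| open_sst(id))
                        .collect::<Result<Vec<_>, _>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    });

    // Concatenate the per-chunk results in order, stopping at the first error.
    let mut opened = Vec::with_capacity(ids.len());
    for chunk_result in results {
        opened.extend(chunk_result?);
    }
    Ok(opened)
}
```

Bounding the worker count keeps recovery from spawning one thread per SST when the manifest lists many tables.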
	 Alex Chi Z