| @@ -149,7 +149,7 @@ Now that you have multiple memtables, you may modify your read path `get` functi | ||||
| * Why doesn't the memtable provide a `delete` API? | ||||
| * Is it possible to use other data structures as the memtable in LSM? What are the pros/cons of using the skiplist? | ||||
| * Why do we need a combination of `state` and `state_lock`? Can we only use `state.read()` and `state.write()`? | ||||
| * Why does the order to store and to probe the memtables matter? | ||||
| * Why does the order to store and to probe the memtables matter? If a key appears in multiple memtables, which version should you return to the user? | ||||
| * Is the memory layout of the memtable efficient / does it have good data locality? (Think of how `Bytes` is implemented and stored in the skiplist...) What are the possible optimizations to make the memtable more efficient? | ||||
| * So we are using `parking_lot` locks in this tutorial. Is its read-write lock a fair lock? What might happen to the readers trying to acquire the lock if there is one writer waiting for existing readers to stop? | ||||
| * After freezing the memtable, is it possible that some threads still hold the old LSM state and write into these immutable memtables? How does your solution prevent it from happening? | ||||
|   | ||||
| @@ -10,19 +10,110 @@ In this chapter, you will: | ||||
|  | ||||
| ## Task 1: Memtable Iterator | ||||
|  | ||||
| self-referential struct | ||||
| In this chapter, we will implement the LSM `scan` interface. `scan` returns a range of key-value pairs in order using an iterator API. In the previous chapter, you have implemented the `get` API and the logic to create immutable memtables, and your LSM state should now have multiple memtables. You will need to first create iterators on a single memtable, then create a merge iterator on all memtables, and finally implement the range limit for the iterators. | ||||
|  | ||||
| In this task, you will need to modify: | ||||
|  | ||||
| ``` | ||||
| src/mem_table.rs | ||||
| ``` | ||||
|  | ||||
| All LSM iterators implement the `StorageIterator` trait. It has 4 functions: `key`, `value`, `next`, and `is_valid`. When the iterator is created, its cursor stops on some element, and `key` / `value` return the key and value of the first entry in the memtable/block/SST that satisfies the start condition (i.e., the start key). These two functions return a `&[u8]` to avoid copying. Note that this iterator interface is different from the Rust-style iterator. | ||||
|  | ||||
| `next` moves the cursor to the next position. `is_valid` returns whether the iterator is still usable, i.e., it has neither reached the end nor errored. You can assume `next` will only be called when `is_valid` returns true. There will be a `FusedIterator` wrapper that blocks calls to `next` when the iterator is not valid, to prevent users from misusing the iterators. | ||||
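
As a minimal sketch of the cursor-style interface described above (not the tutorial's actual definition), the trait below assumes that `next` reports failure as `Result<(), String>`. `VecIterator` and `collect_keys` are hypothetical names introduced only for illustration.

```rust
/// Assumed shape of the trait; the `String` error type is a placeholder.
pub trait StorageIterator {
    /// Key under the cursor. Only meaningful while `is_valid()` is true.
    fn key(&self) -> &[u8];
    /// Value under the cursor. Only meaningful while `is_valid()` is true.
    fn value(&self) -> &[u8];
    /// True while the cursor points at an entry.
    fn is_valid(&self) -> bool;
    /// Moves the cursor to the next entry.
    fn next(&mut self) -> Result<(), String>;
}

/// Hypothetical iterator over an in-memory list of key-value pairs.
pub struct VecIterator {
    entries: Vec<(Vec<u8>, Vec<u8>)>,
    pos: usize,
}

impl VecIterator {
    /// Sorts the entries and places the cursor on the first key >= `start`.
    pub fn new(mut entries: Vec<(Vec<u8>, Vec<u8>)>, start: &[u8]) -> Self {
        entries.sort();
        let pos = entries.partition_point(|(k, _)| k.as_slice() < start);
        Self { entries, pos }
    }
}

impl StorageIterator for VecIterator {
    fn key(&self) -> &[u8] {
        &self.entries[self.pos].0
    }

    fn value(&self) -> &[u8] {
        &self.entries[self.pos].1
    }

    fn is_valid(&self) -> bool {
        self.pos < self.entries.len()
    }

    fn next(&mut self) -> Result<(), String> {
        self.pos += 1;
        Ok(())
    }
}

/// Drains any iterator with the usual loop: check `is_valid`, read, then `next`.
pub fn collect_keys<I: StorageIterator>(mut iter: I) -> Result<Vec<Vec<u8>>, String> {
    let mut keys = Vec::new();
    while iter.is_valid() {
        keys.push(iter.key().to_vec());
        iter.next()?;
    }
    Ok(keys)
}
```

Unlike a Rust-style `Iterator`, the cursor already points at an entry right after construction, so the caller reads first and advances afterwards.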
|  | ||||
| Back to the memtable iterator. You may have noticed that the iterator does not have any lifetime associated with it. Imagine that you create a `Vec<u64>` and call `vec.iter()`: the iterator type will be something like `VecIterator<'a>`, where `'a` is the lifetime of the `vec` object. The same applies to `SkipMap`, whose `iter` API returns an iterator with a lifetime. However, in our case, we do not want such lifetimes on our iterators, to avoid making the system overcomplicated (and hard to compile...). | ||||
|  | ||||
| If the iterator does not have a generic lifetime parameter, we should ensure that *whenever the iterator is being used, the underlying skiplist object is not freed*. The only way to achieve that is to put the `Arc<SkipMap>` object into the iterator itself. To define such a structure, we would like to write: | ||||
|  | ||||
| ```rust,no_run | ||||
| pub struct MemtableIterator { | ||||
|     map: Arc<SkipMap<Bytes, Bytes>>, | ||||
|     iter: SkipMapRangeIter<'???>, | ||||
| } | ||||
| ``` | ||||
|  | ||||
| Okay, here is the problem: we want to express that the lifetime of the iterator is the same as the `map` in the structure. How can we do that? | ||||
|  | ||||
| This is the first and trickiest Rust language feature that you will meet in this tutorial -- the self-referential structure. If it were possible to write something like: | ||||
|  | ||||
| ```rust,no_run | ||||
| pub struct MemtableIterator { // <- with lifetime 'this | ||||
|     map: Arc<SkipMap<Bytes, Bytes>>, | ||||
|     iter: SkipMapRangeIter<'this>, | ||||
| } | ||||
| ``` | ||||
|  | ||||
| Then the problem is solved! You can do this with the help of third-party libraries like `ouroboros`, which provide an easy way to define self-referential structures. It is also possible to do this with unsafe Rust (and indeed, `ouroboros` itself uses unsafe Rust internally...) | ||||
|  | ||||
| We have already defined the self-referential `MemtableIterator` fields for you, and you will need to implement `MemtableIterator` and the `Memtable::scan` API. | ||||
|  | ||||
| ## Task 2: Merge Iterator | ||||
|  | ||||
| error handling, order requirement | ||||
| In this task, you will need to modify: | ||||
|  | ||||
| ## Task 3: LSM Iterator | ||||
| ``` | ||||
| src/iterators/merge_iterator.rs | ||||
| ``` | ||||
|  | ||||
| Now that you have multiple memtables, you will create multiple memtable iterators. You will need to merge the results from the memtables and return the latest version of each key to the user. | ||||
|  | ||||
| `MergeIterator` maintains a binary heap internally. Note that you will need to handle errors (i.e., when an iterator is not valid) and ensure that the latest version of a key-value pair comes out. | ||||
|  | ||||
| For example, if we have the following data: | ||||
|  | ||||
| ``` | ||||
| iter1: b->del, c->4, d->5 | ||||
| iter2: a->1, b->2, c->3 | ||||
| iter3: e->4 | ||||
| ``` | ||||
|  | ||||
| The sequence that the merge iterator outputs should be: | ||||
|  | ||||
| ``` | ||||
| a->1, b->del, c->4, d->5, e->4 | ||||
| ``` | ||||
|  | ||||
| The constructor of the merge iterator takes a vector of iterators. We assume the one with a lower index (i.e., the first one) has the latest data. | ||||
|  | ||||
| We want to avoid dynamic dispatch as much as possible, and therefore we do not use `Box<dyn StorageIterator>` in the system. Instead, we prefer static dispatch using generics. | ||||
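
The merge order described above can be sketched with the standard library's `BinaryHeap`. This is only an illustration over plain sorted vectors, not the `MergeIterator` you are asked to implement: `merge_latest` is a hypothetical helper, each source is assumed to be sorted by key with no duplicate keys inside one source, and a lower source index is assumed to hold newer data.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merges sorted sources and keeps only the newest version of each key.
/// Tombstones (such as "del") are passed through unchanged.
pub fn merge_latest(sources: &[Vec<(&str, &str)>]) -> Vec<(String, String)> {
    // `BinaryHeap` is a max-heap, so `Reverse` turns it into a min-heap.
    // Heap entries are (key, source index, position in source): ties on the
    // key are broken by the source index, so the newest source pops first.
    let mut heap = BinaryHeap::new();
    for (idx, src) in sources.iter().enumerate() {
        if let Some((key, _)) = src.first() {
            heap.push(Reverse((*key, idx, 0usize)));
        }
    }

    let mut out: Vec<(String, String)> = Vec::new();
    while let Some(Reverse((key, idx, pos))) = heap.pop() {
        // If the key equals the last emitted key, this entry comes from an
        // older source and must be dropped.
        let is_older_duplicate = out.last().map_or(false, |(last, _)| last.as_str() == key);
        if !is_older_duplicate {
            out.push((key.to_string(), sources[idx][pos].1.to_string()));
        }
        // Advance the source that was just popped.
        if let Some((next_key, _)) = sources[idx].get(pos + 1) {
            heap.push(Reverse((*next_key, idx, pos + 1)));
        }
    }
    out
}
```

Including the source index in the heap ordering is what makes the result deterministic when the same key appears in several sources.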
|  | ||||
| ## Task 3: LSM Iterator + Fused Iterator | ||||
|  | ||||
| In this task, you will need to modify: | ||||
|  | ||||
| ``` | ||||
| src/iterators/lsm_iterator.rs | ||||
| ``` | ||||
|  | ||||
| We use the `LsmIterator` structure to represent the internal LSM iterators. You will need to modify this structure multiple times throughout the tutorial when more iterators are added into the system. For now, because we only have multiple memtables, it should be defined as: | ||||
|  | ||||
| ```rust,no_run | ||||
| type LsmIteratorInner = MergeIterator<MemTableIterator>; | ||||
| ``` | ||||
|  | ||||
| You may go ahead and implement the `LsmIterator` structure, which calls the corresponding inner iterator, and also skip deleted keys. | ||||
|  | ||||
| We do not test `LsmIterator` in this task. There will be an integration test in task 4. | ||||
|  | ||||
| Then, we want to provide extra safety on the iterators to prevent users from misusing them. Users should not call `key`, `value`, or `next` when the iterator is not valid. At the same time, they should not use the iterator anymore if `next` returns an error. `FusedIterator` is a wrapper around an iterator that normalizes this behavior across all iterators. You can go ahead and implement it yourself. | ||||
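
One possible shape for such a wrapper is sketched below. It is not the reference solution. The trait signature (with a placeholder `String` error type) is an assumption, `FlakyIterator` is a hypothetical helper that exists only to simulate a failing `next`, and the choices to panic on invalid `key`/`value` access and to silently ignore `next` on an exhausted iterator are assumptions as well.

```rust
/// Assumed shape of the trait; the `String` error type is a placeholder.
pub trait StorageIterator {
    fn key(&self) -> &[u8];
    fn value(&self) -> &[u8];
    fn is_valid(&self) -> bool;
    fn next(&mut self) -> Result<(), String>;
}

/// Wrapper that remembers whether the inner iterator has ever failed.
pub struct FusedIterator<I: StorageIterator> {
    iter: I,
    has_errored: bool,
}

impl<I: StorageIterator> FusedIterator<I> {
    pub fn new(iter: I) -> Self {
        Self { iter, has_errored: false }
    }
}

impl<I: StorageIterator> StorageIterator for FusedIterator<I> {
    fn is_valid(&self) -> bool {
        !self.has_errored && self.iter.is_valid()
    }

    fn key(&self) -> &[u8] {
        assert!(self.is_valid(), "key() called on an invalid iterator");
        self.iter.key()
    }

    fn value(&self) -> &[u8] {
        assert!(self.is_valid(), "value() called on an invalid iterator");
        self.iter.value()
    }

    fn next(&mut self) -> Result<(), String> {
        // Once an error has been observed, every later call fails as well.
        if self.has_errored {
            return Err("the iterator has already failed".to_string());
        }
        // Calls on an exhausted iterator are not forwarded to the inner one.
        if self.iter.is_valid() {
            if let Err(e) = self.iter.next() {
                self.has_errored = true;
                return Err(e);
            }
        }
        Ok(())
    }
}

/// Hypothetical helper: walks over `keys` and fails when asked to advance
/// from position `fail_at`.
pub struct FlakyIterator {
    pub keys: Vec<Vec<u8>>,
    pub pos: usize,
    pub fail_at: Option<usize>,
}

impl StorageIterator for FlakyIterator {
    fn key(&self) -> &[u8] {
        &self.keys[self.pos]
    }

    fn value(&self) -> &[u8] {
        &[]
    }

    fn is_valid(&self) -> bool {
        self.pos < self.keys.len()
    }

    fn next(&mut self) -> Result<(), String> {
        if self.fail_at == Some(self.pos) {
            return Err("simulated I/O error".to_string());
        }
        self.pos += 1;
        Ok(())
    }
}
```

Because the wrapper implements the same trait as the iterator it wraps, callers can use it wherever the inner iterator was expected, still with static dispatch.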
|  | ||||
| ## Task 4: Read Path - Scan | ||||
|  | ||||
| In this task, you will need to modify: | ||||
|  | ||||
| ``` | ||||
| src/iterators/lsm_storage.rs | ||||
| ``` | ||||
|  | ||||
| We are finally there -- with all iterators you have implemented, you can finally implement the `scan` interface of the LSM engine. | ||||
|  | ||||
| ## Test Your Understanding | ||||
|  | ||||
| * What is the time/space complexity of using your merge iterator? | ||||
| * Why do we need a self-referential structure for memtable iterator? | ||||
| * If a key is removed (there is a delete tombstone), do you need to return it to the user? Where did you handle this logic? | ||||
| * If a key has multiple versions, will the user see all of them? Where did you handle this logic? | ||||
| * If we want to get rid of self-referential structure and have a lifetime on the memtable iterator (i.e., `MemtableIterator<'a>`, where `'a` = memtable or `LsmStorageInner` lifetime), is it still possible to implement the `scan` functionality? | ||||
| * Suppose (1) we create an iterator on the skiplist memtable and (2) someone then inserts new keys into the memtable. Will the iterator see the new keys? | ||||
| * What happens if your key comparator cannot give the binary heap implementation a stable order? | ||||
|   | ||||
| @@ -13,9 +13,12 @@ In this chapter, you will: | ||||
|  | ||||
| ## Test Your Understanding | ||||
|  | ||||
| * So `Block` is simply a vector of raw data and a vector of offsets. Can we change them to `Byte` and `Arc<[u16]>`, and change all the iterator interfaces to return `Byte` instead of `&[u8]`? What are the pros/cons? | ||||
| * What is the time complexity of seeking a key in the block? | ||||
| * Where does the cursor stop when you seek a non-existent key in your implementation? | ||||
| * So `Block` is simply a vector of raw data and a vector of offsets. Can we change them to `Bytes` and `Arc<[u16]>`, and change all the iterator interfaces to return `Bytes` instead of `&[u8]`? (Assume that we use `Bytes::slice` to return a slice of the block without copying.) What are the pros/cons? | ||||
| * What is the endianness of the numbers written into the blocks in your implementation? | ||||
| * Is your implementation prone to a maliciously-built block? Will there be invalid memory access, or OOMs, if a user deliberately constructs an invalid block? | ||||
| * Can a block contain duplicated keys? | ||||
| * What happens if the user adds a key larger than the target block size? | ||||
| * Consider the case that the LSM engine is built on object store services (S3). How would you optimize/change the block format and parameters to make it suitable for such services? | ||||
| * Do you love bubble tea? Why or why not? | ||||
|   | ||||
| @@ -15,11 +15,14 @@ In this chapter, you will: | ||||
|  | ||||
| ## Test Your Understanding | ||||
|  | ||||
| * What is the time complexity of seeking a key in the SST? | ||||
| * Where does the cursor stop when you seek a non-existent key in your implementation? | ||||
| * Is it possible (or necessary) to do in-place updates of SST files? | ||||
| * An SST is usually large (e.g., 256MB). In this case, the cost of copying/expanding the `Vec` would be significant. Does your implementation allocate enough space for your SST builder in advance? How did you implement it? | ||||
| * Looking at the `moka` block cache, why does it return `Arc<Error>` instead of the original `Error`? | ||||
| * Does the usage of a block cache guarantee that there will be at most a fixed number of blocks in memory? For example, if you have a `moka` block cache of 4GB and block size of 4KB, will there be more than 4GB/4KB number of blocks in memory at the same time? | ||||
| * Is it possible to store columnar data (e.g., a table of 100 integer columns) in an LSM engine? Is the current SST format still a good choice? | ||||
| * Consider the case that the LSM engine is built on object store services (S3). How would you optimize/change the SST format/parameters and the block cache to make it suitable for such services? | ||||
| * Consider the case that the LSM engine is built on object store services (e.g., S3). How would you optimize/change the SST format/parameters and the block cache to make it suitable for such services? | ||||
|  | ||||
| We do not provide reference answers to the questions. Feel free to discuss them in the Discord community. | ||||
|  | ||||
|   | ||||
| @@ -12,6 +12,8 @@ In this chapter, you will: | ||||
|  | ||||
| * Is it correct that a key will take some storage space even if a user requests to delete it? | ||||
| * Given that compaction takes a lot of write bandwidth and read bandwidth and may interfere with foreground operations, it is a good idea to postpone compaction when there is a heavy write flow. It is even beneficial to stop/pause existing compaction tasks in this situation. What do you think of this idea? (Read the SILK paper!) | ||||
| * Is it a good idea to use/fill the block cache for compactions? Or is it better to fully bypass the block cache during compaction? | ||||
| * Some researchers/engineers propose to offload compaction to a remote server or a serverless lambda function. What are the benefits, and what might be the potential challenges and performance impacts of doing remote compaction? (Think of the point when a compaction completes and the block cache...) | ||||
|  | ||||
| We do not provide reference answers to the questions. Feel free to discuss them in the Discord community. | ||||
|  | ||||
|   | ||||
| @@ -9,6 +9,7 @@ In this chapter, you will: | ||||
|  | ||||
| ## Test Your Understanding | ||||
|  | ||||
| * (I know this is stupid but) could you please repeat the definition of read/write/space amplifications? What are the ways to accurately compute them, and what are the ways to estimate them? | ||||
| * Is it correct that a key will only be purged from the LSM tree if the user requests to delete it and it has been compacted in the bottom-most level? | ||||
| * Is it a good strategy to periodically do a full compaction on the LSM tree? Why or why not? | ||||
| * Is it a good idea to actively choose some old files/levels to compact even if they do not violate the level amplifier? (Look at the Lethe paper!) | ||||
|   | ||||