feat(docs): finish part 3

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2022-12-24 19:38:36 -05:00
parent d8cc9b2cf8
commit d38f802234
4 changed files with 117 additions and 6 deletions
--- a/README.md
+++ b/README.md
@@ -28,10 +28,10 @@ The tutorial has 8 parts (which can be finished in 7 days):
 * Day 2: SST encoding.
 * Day 3: MemTable and Merge Iterators.
 * Day 4: Block cache and Engine. To reduce disk I/O and maximize performance, we will use moka-rs to build a block cache
-* for the LSM tree. In this day we will get a functional (but not persistent) key-value engine with `get`, `put`, `scan`,
+  for the LSM tree. In this day we will get a functional (but not persistent) key-value engine with `get`, `put`, `scan`,
  `delete` API.
 * Day 5: Compaction. Now it's time to maintain a leveled structure for SSTs.
 * Day 6: Recovery. We will implement WAL and manifest so that the engine can recover after restart.
 * Day 7: Bloom filter and key compression. They are widely-used optimizations in LSM tree structures.

-We have reference solution up to day 4 and tutorial up to day 2 for now.
+We have reference solution up to day 4 and tutorial up to day 3 for now.
--- a/mini-lsm-book/src/00-overview.md
+++ b/mini-lsm-book/src/00-overview.md
@@ -97,7 +97,7 @@ In this tutorial, we will build the LSM tree structure in 7 days:
 * Day 2: SST encoding.
 * Day 3: MemTable and Merge Iterators.
 * Day 4: Block cache and Engine. To reduce disk I/O and maximize performance, we will use moka-rs to build a block cache
-* for the LSM tree. In this day we will get a functional (but not persistent) key-value engine with `get`, `put`, `scan`,
+  for the LSM tree. In this day we will get a functional (but not persistent) key-value engine with `get`, `put`, `scan`,
  `delete` API.
 * Day 5: Compaction. Now it's time to maintain a leveled structure for SSTs.
 * Day 6: Recovery. We will implement WAL and manifest so that the engine can recover after restart.
--- a/mini-lsm-book/src/03-memtable.md
+++ b/mini-lsm-book/src/03-memtable.md
@@ -19,10 +19,121 @@ in part 4, we will compose all these things together to make a real storage engi

 ## Task 1 - Mem Table

+In this tutorial, we use [crossbeam-skiplist](https://docs.rs/crossbeam-skiplist) as the implementation of the memtable.
+A skiplist is like a linked list: data is stored in list nodes and is never moved in memory. Instead of a single
+pointer to the next element, each skiplist node holds multiple pointers that let a search "skip some elements", so
+that we can achieve expected `O(log n)` search, insertion, and deletion.
+
+In a storage engine, users create iterators over the data structure. Generally, once the user modifies the data
+structure, existing iterators become invalid (which is the case for C++ STL and Rust containers). However, skiplists
+allow us to access and modify the data structure at the same time, potentially improving performance under
+concurrent access. Some papers argue that skiplists are bad, but the property that data stays in place in memory
+makes the implementation easier for us.
+
+In `mem_table.rs`, you will need to implement a mem-table based on crossbeam-skiplist. The memtable only supports
+`get`, `scan`, and `put`, without `delete`. A deletion is represented as a tombstone `key -> empty value`, and the
+actual data will be deleted during the compaction process (day 5). Note that all `get`, `scan`, `put` functions
+only need `&self`, which means that we can call these operations concurrently.
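
As a rough illustration of this interface shape, here is a minimal sketch that uses only the standard library. It assumes a `BTreeMap` behind an `RwLock` as a stand-in for the skiplist (so it builds without external crates), and the free function `delete` is a hypothetical helper that shows how a tombstone is written. It is not the reference solution.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

/// Stand-in for the skiplist-based mem-table: a `BTreeMap` behind an `RwLock`.
/// (Assumption: chosen only so that the sketch needs no external crates.)
pub struct MemTable {
    map: RwLock<BTreeMap<Vec<u8>, Vec<u8>>>,
}

impl MemTable {
    pub fn new() -> Self {
        Self {
            map: RwLock::new(BTreeMap::new()),
        }
    }

    /// Returns the stored value, which may be an empty tombstone.
    pub fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.read().unwrap().get(key).cloned()
    }

    /// Only needs `&self`: the lock provides the interior mutability.
    pub fn put(&self, key: &[u8], value: &[u8]) {
        self.map
            .write()
            .unwrap()
            .insert(key.to_vec(), value.to_vec());
    }
}

/// Hypothetical helper: a delete is just a put of an empty value.
pub fn delete(table: &MemTable, key: &[u8]) {
    table.put(key, b"");
}
```

With this sketch, `get` on a deleted key returns `Some` of an empty value instead of `None`, so the layer above the mem-table has to interpret the tombstone.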
+
 ## Task 2 - Mem Table Iterator

-## Task 3 - Two-Merge Iterator
+You can now implement an iterator `MemTableIterator` for `MemTable`. `memtable.iter(start, end)` will create an iterator
+that returns all elements within the range from `start` to `end`. Here, `start` and `end` are `std::ops::Bound` values,
+which have 3 variants: `Unbounded`, `Included(key)`, and `Excluded(key)`. The expressiveness of `std::ops::Bound`
+eliminates the need to memorize whether an API takes a closed or an open range.
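
To see how the three `Bound` variants behave, the following sketch runs range queries on a standard-library `BTreeMap`. The function name `keys_in_range` is made up for illustration; the mem-table in this tutorial is built on a skiplist, not on `BTreeMap`.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Collects the keys that fall between `lower` and `upper`.
pub fn keys_in_range(
    map: &BTreeMap<String, String>,
    lower: Bound<&str>,
    upper: Bound<&str>,
) -> Vec<String> {
    // A pair of bounds implements `RangeBounds`, so it can be passed
    // directly to `BTreeMap::range`.
    map.range::<str, _>((lower, upper))
        .map(|(k, _)| k.clone())
        .collect()
}
```

For a map with keys `a`, `b`, `c`, `d`, the bounds `Included("b")` and `Excluded("d")` select `b` and `c`, while `Excluded("b")` and `Unbounded` select `c` and `d`.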

-## Task 4 - Merge Iterator
+Note that `crossbeam-skiplist`'s iterator has the same lifetime as the skiplist itself, which means that we will always
+need to provide a lifetime when using the iterator. This makes the iterator hard to use. You can use the `ouroboros`
+crate to create a self-referential struct that erases the lifetime. You will find the
+[ouroboros examples][ouroboros-example] helpful.
+
+[ouroboros-example]: https://github.com/joshua-maros/ouroboros/blob/main/examples/src/ok_tests.rs
+
+```rust
+pub struct MemTableIterator {
+    /// Holds a reference to the skiplist so that the iterator stays valid.
+    map: Arc<SkipList>,
+    /// The lifetime of the iterator is then tied to the `MemTableIterator` struct itself.
+    iter: SkipList::Iter<'this>,
+}
+```
+
+You will also need to convert the Rust-style iterator API to our storage iterator. In Rust, we use `next() -> Data`. But
+in this tutorial, `next` doesn't have a return value, and the data should be fetched by `key()` and `value()`. You will
+need to think of a way to implement this.
+
+<details>
+<summary>Spoiler: the MemTableIterator struct</summary>
+
+```rust
+#[self_referencing]
+pub struct MemTableIterator {
+    map: Arc<SkipMap<Bytes, Bytes>>,
+    #[borrows(map)]
+    #[not_covariant]
+    iter: SkipMapRangeIter<'this>,
+    item: (Bytes, Bytes),
+}
+```
+
+We have `map` serving as a reference to the skipmap, `iter` as a self-referential item of the struct, and `item` as the
+last item from the iterator. You might have thought of using something like `iter::Peekable`, but it requires `&mut self`
+when retrieving the key and value. Therefore, one approach is to (1) get the first element from the iterator when
+initializing the `MemTableIterator` and store it in `item`, and (2) when `next` is called, get the next element from
+the inner iter's `next` (which also advances the inner iter) and store it in `item`.
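
The same caching idea can be sketched without the skiplist or `ouroboros`, over any plain Rust iterator. `CachedIter` is a hypothetical name, and the sketch assumes `Option` is used to mark the end of iteration, which differs from the `(Bytes, Bytes)` field above.

```rust
/// Wraps a Rust iterator of key-value pairs and exposes the
/// `key()` / `value()` / `next()` style used in this tutorial.
/// `CachedIter` is a hypothetical name for illustration.
pub struct CachedIter<I: Iterator<Item = (Vec<u8>, Vec<u8>)>> {
    iter: I,
    /// The current element; `None` once the inner iterator is exhausted.
    item: Option<(Vec<u8>, Vec<u8>)>,
}

impl<I: Iterator<Item = (Vec<u8>, Vec<u8>)>> CachedIter<I> {
    /// (1) Fetch the first element eagerly when constructing.
    pub fn new(mut iter: I) -> Self {
        let item = iter.next();
        Self { iter, item }
    }

    pub fn is_valid(&self) -> bool {
        self.item.is_some()
    }

    /// Only needs `&self` because the element is already cached.
    pub fn key(&self) -> &[u8] {
        &self.item.as_ref().expect("invalid iterator").0
    }

    pub fn value(&self) -> &[u8] {
        &self.item.as_ref().expect("invalid iterator").1
    }

    /// (2) Advance: replace the cached element with the next one.
    pub fn next(&mut self) {
        self.item = self.iter.next();
    }
}
```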
+
+</details>
+
+## Task 3 - Merge Iterator
+
+Now that you have a lot of mem-tables and SSTs, you might want to merge them to get the latest occurrence of a key.
+In `merge_iterator.rs`, we have `MergeIterator`, which is an iterator that merges all iterators *of the same type*.
+An iterator at a lower index in the vector passed to `create` has higher priority, that is to say, if we have:
+
+```
+iter1: 1->a, 2->b, 3->c
+iter2: 1->d
+iter: MergeIterator::create(vec![iter1, iter2])
+```
+
+The final iterator will produce `1->a, 2->b, 3->c`. The data in iter1 will overwrite the data in other iterators.
+
+You can use a `BinaryHeap` to implement this merge iterator. Note that you should never put any invalid iterator inside
+the binary heap. One common pitfall is on error handling. For example,
+
+```rust
+if let Some(mut inner_iter) = self.iters.peek_mut() {
+    inner_iter.next()?; // <- will cause problem
+}
+```
+
+If `next` returns an error (e.g., due to disk failure, network failure, checksum error, etc.), the iterator is no
+longer valid. However, when we leave the `if` block and return the error to the caller, `PeekMut`'s drop will try to
+move the element within the heap, which causes an access to an invalid iterator. Therefore, you will need to do all
+error handling by yourself instead of using `?` within the scope of `PeekMut`.
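
One possible way to handle the error by hand is to pop the broken element through `PeekMut::pop` before returning. The sketch below assumes a made-up `FakeIter` type in place of a real storage iterator, with a derived ordering (so the heap is a plain max-heap on `key`); `advance_top` is likewise a hypothetical function name.

```rust
use std::collections::binary_heap::PeekMut;
use std::collections::BinaryHeap;

/// Hypothetical stand-in for a storage iterator whose `next` may fail.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct FakeIter {
    pub key: u32,
    pub fail_on_next: bool,
}

impl FakeIter {
    pub fn next(&mut self) -> Result<(), String> {
        if self.fail_on_next {
            return Err("simulated I/O error".to_string());
        }
        self.key += 1;
        Ok(())
    }
}

/// Advances the top element. On error, the broken element is removed from
/// the heap before the error is returned, so it is never compared again.
pub fn advance_top(heap: &mut BinaryHeap<FakeIter>) -> Result<(), String> {
    if let Some(mut top) = heap.peek_mut() {
        if let Err(e) = top.next() {
            // Consume the `PeekMut` guard by popping the element.
            PeekMut::pop(top);
            return Err(e);
        }
    }
    Ok(())
}
```

The difference from the `?` version is that the `PeekMut` guard is consumed explicitly on the error path, so the failed element leaves the heap before the function returns.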
+
+You will also need to define a wrapper for the storage iterator so that `BinaryHeap` can compare across all iterators.
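
A sketch of what such a wrapper's ordering could look like is shown below. `HeapEntry` and `drain_in_order` are hypothetical names, and the entry stores a copy of the key for simplicity; a real wrapper would hold the iterator itself and compare its current key.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical wrapper: the index of an iterator plus its current key.
#[derive(Debug, PartialEq, Eq)]
pub struct HeapEntry {
    pub index: usize,
    pub key: Vec<u8>,
}

impl Ord for HeapEntry {
    fn cmp(&self, other: &Self) -> Ordering {
        // `BinaryHeap` is a max-heap, so reverse the comparison: the smallest
        // key, and for equal keys the smallest index, compares as greatest.
        self.key
            .cmp(&other.key)
            .then(self.index.cmp(&other.index))
            .reverse()
    }
}

impl PartialOrd for HeapEntry {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Pops every entry, returning `(index, key)` pairs in heap order.
pub fn drain_in_order(mut heap: BinaryHeap<HeapEntry>) -> Vec<(usize, Vec<u8>)> {
    let mut out = Vec::new();
    while let Some(entry) = heap.pop() {
        out.push((entry.index, entry.key));
    }
    out
}
```

With this ordering, entries for the same key come out with the lowest index first, which matches the priority rule described for `MergeIterator`.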
+
+## Task 4 - Two Merge Iterator
+
+The LSM has two structures for storing data: the mem-tables in memory, and the SSTs on disk. After we have constructed
+the iterators for all SSTs and all mem-tables respectively, we will need a new iterator to merge iterators of two
+different types. That is `TwoMergeIterator`.
+
+You can implement `TwoMergeIterator` in `two_merge_iter.rs`. Similar to `MergeIterator`, if the same key is found in
+both iterators, the first iterator takes precedence.
+
+In this tutorial, we explicitly did not use something like `Box<dyn StorageIter>` to avoid dynamic dispatch. This is a
+common optimization in LSM storage engines.
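
To illustrate merging through generics instead of trait objects, here is a simplified sketch. The `StorageIter` trait below is an assumption made for this example (`next` returns nothing and cannot fail), `TwoMerge` and `VecIter` are hypothetical names, and each input is assumed to yield unique keys in sorted order.

```rust
/// Minimal iterator trait for this sketch (hypothetical and simplified).
pub trait StorageIter {
    fn key(&self) -> &[u8];
    fn value(&self) -> &[u8];
    fn is_valid(&self) -> bool;
    fn next(&mut self);
}

/// Merges two iterators of possibly different types without `Box<dyn ...>`.
pub struct TwoMerge<A: StorageIter, B: StorageIter> {
    a: A,
    b: B,
}

impl<A: StorageIter, B: StorageIter> TwoMerge<A, B> {
    pub fn new(a: A, b: B) -> Self {
        let mut merged = Self { a, b };
        merged.skip_b();
        merged
    }

    /// If both sides sit on the same key, drop the entry from `b`.
    fn skip_b(&mut self) {
        if self.a.is_valid() && self.b.is_valid() && self.a.key() == self.b.key() {
            self.b.next();
        }
    }

    /// True when the current element should come from `a`.
    fn use_a(&self) -> bool {
        self.a.is_valid() && (!self.b.is_valid() || self.a.key() < self.b.key())
    }
}

impl<A: StorageIter, B: StorageIter> StorageIter for TwoMerge<A, B> {
    fn key(&self) -> &[u8] {
        if self.use_a() { self.a.key() } else { self.b.key() }
    }
    fn value(&self) -> &[u8] {
        if self.use_a() { self.a.value() } else { self.b.value() }
    }
    fn is_valid(&self) -> bool {
        self.a.is_valid() || self.b.is_valid()
    }
    fn next(&mut self) {
        if self.use_a() { self.a.next() } else { self.b.next() }
        self.skip_b();
    }
}

/// Toy iterator over a sorted vector, used only to exercise the sketch.
pub struct VecIter {
    pub data: Vec<(&'static str, &'static str)>,
    pub pos: usize,
}

impl StorageIter for VecIter {
    fn key(&self) -> &[u8] { self.data[self.pos].0.as_bytes() }
    fn value(&self) -> &[u8] { self.data[self.pos].1.as_bytes() }
    fn is_valid(&self) -> bool { self.pos < self.data.len() }
    fn next(&mut self) { self.pos += 1; }
}
```

Because `A` and `B` are type parameters, the compiler generates a specialized `TwoMerge` for each concrete pair of iterator types, and calls such as `self.a.key()` are resolved statically.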

 ## Extra Tasks
+
+* Implement a different mem-table and see how it differs from the skiplist, e.g., a BTree mem-table. You will notice
+  that it is hard to get an iterator over the B+ tree without holding a lock for as long as the iterator lives. You
+  might need to think of smart ways of solving this.
+* Async iterator. One interesting thing to explore is whether it is possible to make everything in the storage engine
+  asynchronous. You might run into some lifetime-related problems and need to work around them.
+* Foreground iterator. In this tutorial we assumed that all operations are short, so that we can hold a reference to
+  the mem-table in the iterator. If an iterator is held by users for a long time, the whole mem-table (which might be
+  256MB) will stay in memory and cannot be freed. To solve this, we can provide a `ForegroundIterator` / `LongIterator`
+  to our users. It periodically creates a new underlying storage iterator so as to allow garbage collection of resources.
--- a/mini-lsm-book/src/SUMMARY.md
+++ b/mini-lsm-book/src/SUMMARY.md
@@ -7,7 +7,7 @@
 - [Store key-value pairs in little blocks](./01-block.md)
 - [And make them into an SST](./02-sst.md)
 - [Now it's time to merge everything](./03-memtable.md)
- [The engine starts](./04-engine.md)
+- [The engine on fire](./04-engine.md)
 - [Let's do something in the background](./05-compaction.md)
 - [Be careful when the system crashes](./06-recovery.md)
 - [A good bloom filter makes life easier](./07-bloom-filter.md)