feat(docs): finish part 3

Signed-off-by: Alex Chi <iskyzh@gmail.com>
Alex Chi
2022-12-24 19:38:36 -05:00
parent d8cc9b2cf8
commit d38f802234
4 changed files with 117 additions and 6 deletions

@@ -28,10 +28,10 @@ The tutorial has 8 parts (which can be finished in 7 days):
* Day 2: SST encoding.
* Day 3: MemTable and Merge Iterators.
* Day 4: Block cache and Engine. To reduce disk I/O and maximize performance, we will use moka-rs to build a block cache
* for the LSM tree. In this day we will get a functional (but not persistent) key-value engine with `get`, `put`, `scan`,
for the LSM tree. In this day we will get a functional (but not persistent) key-value engine with `get`, `put`, `scan`,
`delete` API.
* Day 5: Compaction. Now it's time to maintain a leveled structure for SSTs.
* Day 6: Recovery. We will implement WAL and manifest so that the engine can recover after restart.
* Day 7: Bloom filter and key compression. They are widely-used optimizations in LSM tree structures.
We have reference solution up to day 4 and tutorial up to day 2 for now.
We have reference solution up to day 4 and tutorial up to day 3 for now.

@@ -97,7 +97,7 @@ In this tutorial, we will build the LSM tree structure in 7 days:
* Day 2: SST encoding.
* Day 3: MemTable and Merge Iterators.
* Day 4: Block cache and Engine. To reduce disk I/O and maximize performance, we will use moka-rs to build a block cache
* for the LSM tree. In this day we will get a functional (but not persistent) key-value engine with `get`, `put`, `scan`,
for the LSM tree. In this day we will get a functional (but not persistent) key-value engine with `get`, `put`, `scan`,
`delete` API.
* Day 5: Compaction. Now it's time to maintain a leveled structure for SSTs.
* Day 6: Recovery. We will implement WAL and manifest so that the engine can recover after restart.

@@ -19,10 +19,121 @@ in part 4, we will compose all these things together to make a real storage engi
## Task 1 - Mem Table
In this tutorial, we use [crossbeam-skiplist](https://docs.rs/crossbeam-skiplist) as the implementation of memtable.
A skiplist is like a linked list: data is stored in list nodes and is never moved in memory. Instead of keeping
a single pointer to the next element, each node in a skiplist contains multiple pointers that let a search "skip some
elements", so that search, insertion, and deletion take expected `O(log n)` time.
In a storage engine, users create iterators over the data structure. Generally, once the data structure is modified,
existing iterators become invalid (which is the case for C++ STL and Rust containers). However, skiplists allow us to
access and modify the data structure at the same time, which can improve performance when there is
concurrent access. Some papers argue that skiplists are bad, but the property that data stays in
place in memory makes the implementation easier for us.
In `mem_table.rs`, you will need to implement a mem-table based on crossbeam-skiplist. Note that the memtable only
supports `get`, `scan`, and `put` without `delete`. The deletion is represented as a tombstone `key -> empty value`,
and the actual data will be deleted during the compaction process (day 5). Note that all `get`, `scan`, `put` functions
only need `&self`, which means that we can concurrently call these operations.
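To make the `&self`-only API and the tombstone convention concrete, here is a minimal std-only sketch. It is not the tutorial's implementation: it swaps the skiplist for an `RwLock<BTreeMap>` stand-in so that it compiles without extra crates, and the method signatures are assumptions for illustration.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;
use std::sync::RwLock;

/// Std-only stand-in for the mem-table. Assumption: the real one wraps
/// `crossbeam_skiplist::SkipMap` and therefore needs no lock.
pub struct MemTable {
    map: RwLock<BTreeMap<Vec<u8>, Vec<u8>>>,
}

impl MemTable {
    pub fn new() -> Self {
        Self { map: RwLock::new(BTreeMap::new()) }
    }

    /// Returns the stored value. An empty value is a tombstone ("deleted").
    pub fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.read().unwrap().get(key).cloned()
    }

    /// Inserts or overwrites. Interior mutability is why `&self` is enough.
    pub fn put(&self, key: &[u8], value: &[u8]) {
        self.map.write().unwrap().insert(key.to_vec(), value.to_vec());
    }

    /// Collects all pairs within the bounds. A real mem-table would return
    /// an iterator instead of copying into a `Vec`.
    pub fn scan(&self, lower: Bound<&[u8]>, upper: Bound<&[u8]>) -> Vec<(Vec<u8>, Vec<u8>)> {
        self.map
            .read()
            .unwrap()
            .range::<[u8], _>((lower, upper))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}
```

A delete is then just `put(key, b"")`: `get` still returns `Some`, but with an empty value that upper layers interpret as "not found".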
## Task 2 - Mem Table Iterator
## Task 3 - Two-Merge Iterator
You can now implement an iterator `MemTableIterator` for `MemTable`. `memtable.iter(start, end)` will create an iterator
that returns all elements within the range from `start` to `end`. Here, each bound is a `std::ops::Bound`, which has 3 variants:
`Unbounded`, `Included(key)`, `Excluded(key)`. The expressiveness of `std::ops::Bound` eliminates the need to memorize
whether an API takes a closed range or an open range.
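As a small illustration of how the three `Bound` variants behave, the sketch below runs range queries against a standard `BTreeMap`. The helper name `keys_in_range` is made up for this example; the skiplist's range API is assumed to follow the same bound semantics.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Returns the keys of `map` that fall between `lower` and `upper`.
/// (Hypothetical helper, only here to show the bound semantics.)
fn keys_in_range(map: &BTreeMap<u32, &str>, lower: Bound<u32>, upper: Bound<u32>) -> Vec<u32> {
    // A `(Bound, Bound)` tuple implements `RangeBounds`, so it can be
    // passed directly to `range`.
    map.range::<u32, _>((lower, upper)).map(|(k, _)| *k).collect()
}
```

With keys `1..=5`, `(Included(2), Excluded(4))` selects `2, 3`, while `(Excluded(2), Unbounded)` selects `3, 4, 5`.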
## Task 4 - Merge Iterator
Note that the lifetime of `crossbeam-skiplist`'s iterator is tied to the skiplist itself, which means that we will always
need to provide a lifetime when using the iterator. This makes the iterator very hard to use. You can use the `ouroboros` crate to
create a self-referential struct that erases the lifetime. You will find the [ouroboros examples][ouroboros-example]
helpful.
[ouroboros-example]: https://github.com/joshua-maros/ouroboros/blob/main/examples/src/ok_tests.rs
```rust
// Pseudocode: the field types are placeholders.
pub struct MemTableIterator {
    /// Holds a reference to the skiplist so that the iterator stays valid.
    map: Arc<SkipList>,
    /// The lifetime of the iterator is then tied to the `MemTableIterator` struct itself.
    iter: SkipList::Iter<'this>,
}
```
You will also need to convert the Rust-style iterator API to our storage iterator. In Rust, we use `next() -> Data`. But
in this tutorial, `next` doesn't have a return value, and the data should be fetched by `key()` and `value()`. You will
need to think of a way to implement this.
<details>
<summary>Spoiler: the MemTableIterator struct</summary>
```rust
#[self_referencing]
pub struct MemTableIterator {
map: Arc<SkipMap<Bytes, Bytes>>,
#[borrows(map)]
#[not_covariant]
iter: SkipMapRangeIter<'this>,
item: (Bytes, Bytes),
}
```
We have `map` serving as a reference to the skipmap, `iter` as a self-referential item of the struct, and `item` as the
last item from the iterator. You might have thought of using something like `iter::Peekable`, but it requires `&mut self`
when retrieving the key and value. Therefore, one approach is to (1) get the first element from the iterator when initializing
the `MemTableIterator` and store it in `item`, and (2) when calling `next`, take the element from the inner iter's `next`,
store it in `item`, and thereby move the inner iter to the next position.
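The same "cache the current item" idea can be sketched without `ouroboros` by keeping the lifetime explicit. This is a simplified illustration over a `BTreeMap`, not the skiplist version; the names `CursorIter` and `is_valid` are assumptions made for this example.

```rust
use std::collections::btree_map;
use std::collections::BTreeMap;

/// Adapts a Rust-style iterator to a cursor-style API: `next()` returns
/// nothing, and the current entry is read through `key()` / `value()`.
pub struct CursorIter<'a> {
    inner: btree_map::Iter<'a, Vec<u8>, Vec<u8>>,
    /// Entry the cursor currently points at; `None` once exhausted.
    item: Option<(&'a [u8], &'a [u8])>,
}

impl<'a> CursorIter<'a> {
    pub fn new(map: &'a BTreeMap<Vec<u8>, Vec<u8>>) -> Self {
        let mut inner = map.iter();
        // Fetch the first entry eagerly so `key()` works before any `next()`.
        let item = inner.next().map(|(k, v)| (k.as_slice(), v.as_slice()));
        Self { inner, item }
    }

    pub fn is_valid(&self) -> bool {
        self.item.is_some()
    }

    /// Only needs `&self` because the current entry is cached in `item`.
    pub fn key(&self) -> &[u8] {
        self.item.expect("iterator is not valid").0
    }

    pub fn value(&self) -> &[u8] {
        self.item.expect("iterator is not valid").1
    }

    /// Advances the inner iterator and caches whatever it yields.
    pub fn next(&mut self) {
        self.item = self.inner.next().map(|(k, v)| (k.as_slice(), v.as_slice()));
    }
}
```

Unlike the self-referential struct above, this version still carries the `'a` lifetime, which is exactly the inconvenience `ouroboros` removes.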
</details>
## Task 3 - Merge Iterator
Now that you have a lot of mem-tables and SSTs, you might want to merge them to get the latest occurrence of a key.
In `merge_iterator.rs`, we have `MergeIterator`, which is an iterator that merges all iterators *of the same type*.
An iterator at a lower index in the vector passed to the constructor has higher priority. That is to say, if we have:
```
iter1: 1->a, 2->b, 3->c
iter2: 1->d
iter: MergeIterator::create(vec![iter1, iter2])
```
The final iterator will produce `1->a, 2->b, 3->c`. The data in iter1 will overwrite the data in other iterators.
You can use a `BinaryHeap` to implement this merge iterator. Note that you should never put an invalid iterator inside
the binary heap. One common pitfall is error handling. For example,
```rust
if let Some(mut inner_iter) = self.iters.peek_mut() {
    inner_iter.next()?; // <- will cause problem
}
```
If `next` returns an error (e.g., due to disk failure, network failure, checksum error, etc.), the iterator is no longer valid.
However, when we leave the `if` block and return the error to the caller, `PeekMut`'s drop will try to move the
element within the heap, which causes an access to an invalid iterator. Therefore, you will need to do all error
handling by yourself instead of using `?` within the scope of `PeekMut`.
You will also need to define a wrapper for the storage iterator so that `BinaryHeap` can compare across all iterators.
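One possible shape for the wrapper and the manual error handling is sketched below. It is a toy under stated assumptions: `VecIter`, `HeapWrapper`, and the free function `merge` are hypothetical names, keys are plain `u32`s, and the result is collected into a `Vec` instead of being exposed as an iterator.

```rust
use std::cmp::Ordering;
use std::collections::binary_heap::PeekMut;
use std::collections::BinaryHeap;

/// Toy storage iterator over sorted `(key, value)` pairs (hypothetical stand-in).
pub struct VecIter {
    data: Vec<(u32, char)>,
    pos: usize,
}

impl VecIter {
    pub fn new(data: Vec<(u32, char)>) -> Self { Self { data, pos: 0 } }
    pub fn is_valid(&self) -> bool { self.pos < self.data.len() }
    pub fn key(&self) -> u32 { self.data[self.pos].0 }
    pub fn value(&self) -> char { self.data[self.pos].1 }
    /// Fallible in a real engine (disk I/O); never fails in this toy version.
    pub fn next(&mut self) -> Result<(), String> { self.pos += 1; Ok(()) }
}

/// Heap entry: `(index in the input vector, iterator)`.
struct HeapWrapper(usize, VecIter);

impl PartialEq for HeapWrapper {
    fn eq(&self, other: &Self) -> bool { self.cmp(other) == Ordering::Equal }
}
impl Eq for HeapWrapper {}
impl PartialOrd for HeapWrapper {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> { Some(self.cmp(other)) }
}
impl Ord for HeapWrapper {
    fn cmp(&self, other: &Self) -> Ordering {
        // `BinaryHeap` is a max-heap, so reverse the order: the smallest key,
        // then the smallest index, compares as the greatest element.
        (self.1.key(), self.0).cmp(&(other.1.key(), other.0)).reverse()
    }
}

/// Merges sorted iterators; on duplicate keys the lowest index wins.
pub fn merge(iters: Vec<VecIter>) -> Result<Vec<(u32, char)>, String> {
    let mut heap = BinaryHeap::new();
    for (idx, iter) in iters.into_iter().enumerate() {
        if iter.is_valid() {
            heap.push(HeapWrapper(idx, iter)); // never push an invalid iterator
        }
    }
    let mut out: Vec<(u32, char)> = Vec::new();
    while let Some(mut top) = heap.peek_mut() {
        let (key, value) = (top.1.key(), top.1.value());
        // Handle the error by hand: pop the broken iterator before returning,
        // so dropping `PeekMut` never compares an invalid iterator.
        if let Err(e) = top.1.next() {
            PeekMut::pop(top);
            return Err(e);
        }
        if !top.1.is_valid() {
            PeekMut::pop(top); // exhausted: remove it from the heap
        }
        // Otherwise `top` is dropped at the end of this iteration, which
        // sifts the advanced iterator back into its place in the heap.
        if out.last().map(|(k, _)| *k) != Some(key) {
            out.push((key, value)); // first occurrence comes from the lowest index
        }
    }
    Ok(out)
}
```

The comparison deliberately calls `key()` on both sides, which would panic on an exhausted `VecIter`; that is the reason invalid iterators must be popped before the heap gets a chance to reorder.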
## Task 4 - Two Merge Iterator
The LSM tree has two structures for storing data: the mem-tables in memory, and the SSTs on disk. After constructing one
iterator for all SSTs and one for all mem-tables, we will need a new iterator to merge iterators of two different
types. That is `TwoMergeIterator`.
You can implement `TwoMergeIterator` in `two_merge_iter.rs`. Similar to `MergeIterator`, if the same key is found in
both iterators, the first iterator takes precedence.
In this tutorial, we deliberately avoid something like `Box<dyn StorageIter>` so that there is no dynamic dispatch. This is a
common optimization in LSM storage engines.
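A minimal sketch of this idea follows, using generics so that both sides are dispatched statically. The trait below is a simplified assumption (the tutorial's real trait likely differs, for example in error handling), and `PairsIter` and `TwoMergeIter` are hypothetical names for illustration.

```rust
/// Simplified cursor trait (assumption: not the tutorial's exact trait).
pub trait StorageIter {
    fn key(&self) -> &[u8];
    fn value(&self) -> &[u8];
    fn is_valid(&self) -> bool;
    fn next(&mut self);
}

/// Toy iterator over sorted in-memory pairs, used for both sides in the usage below.
pub struct PairsIter {
    data: Vec<(Vec<u8>, Vec<u8>)>,
    pos: usize,
}

impl PairsIter {
    pub fn new(pairs: &[(&str, &str)]) -> Self {
        let data = pairs
            .iter()
            .map(|(k, v)| (k.as_bytes().to_vec(), v.as_bytes().to_vec()))
            .collect();
        Self { data, pos: 0 }
    }
}

impl StorageIter for PairsIter {
    fn key(&self) -> &[u8] { &self.data[self.pos].0 }
    fn value(&self) -> &[u8] { &self.data[self.pos].1 }
    fn is_valid(&self) -> bool { self.pos < self.data.len() }
    fn next(&mut self) { self.pos += 1; }
}

/// Merges two iterators of possibly different types; `a` wins on equal keys.
pub struct TwoMergeIter<A: StorageIter, B: StorageIter> {
    a: A,
    b: B,
}

impl<A: StorageIter, B: StorageIter> TwoMergeIter<A, B> {
    pub fn new(a: A, b: B) -> Self {
        let mut iter = Self { a, b };
        iter.skip_b();
        iter
    }

    /// If both sides point at the same key, drop `b`'s entry: `a` takes precedence.
    fn skip_b(&mut self) {
        if self.a.is_valid() && self.b.is_valid() && self.a.key() == self.b.key() {
            self.b.next();
        }
    }

    /// True when the current entry should come from `a`.
    fn use_a(&self) -> bool {
        if !self.a.is_valid() {
            return false;
        }
        if !self.b.is_valid() {
            return true;
        }
        self.a.key() < self.b.key()
    }
}

impl<A: StorageIter, B: StorageIter> StorageIter for TwoMergeIter<A, B> {
    fn key(&self) -> &[u8] {
        if self.use_a() { self.a.key() } else { self.b.key() }
    }
    fn value(&self) -> &[u8] {
        if self.use_a() { self.a.value() } else { self.b.value() }
    }
    fn is_valid(&self) -> bool {
        self.a.is_valid() || self.b.is_valid()
    }
    fn next(&mut self) {
        if !self.is_valid() {
            return;
        }
        if self.use_a() { self.a.next() } else { self.b.next() }
        self.skip_b();
    }
}
```

Because `A` and `B` are type parameters, the compiler generates a specialized `TwoMergeIter` for each pair of concrete iterator types, so calls to `key()` and `next()` can be inlined rather than going through a vtable.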
## Extra Tasks
* Implement a different mem-table and see how it differs from the skiplist, e.g., a BTree mem-table. You will notice that it is
hard to get an iterator over the B+ tree without holding a lock for the same timespan as the iterator. You might need
to think of smart ways of solving this.
* Async iterator. One interesting thing to explore is to see if it is possible to make everything in the storage
engine asynchronous. You might find some lifetime-related problems and need to work around them.
* Foreground iterator. In this tutorial we assumed that all operations are short, so that we can hold a reference to the
mem-table in the iterator. If an iterator is held by users for a long time, the whole mem-table (which might be 256MB)
will stay in memory and cannot be freed. To solve this, we can provide a `ForegroundIterator` / `LongIterator` to our user. The
iterator will periodically create a new underlying storage iterator so as to allow garbage collection of the resources.

@@ -7,7 +7,7 @@
- [Store key-value pairs in little blocks](./01-block.md)
- [And make them into an SST](./02-sst.md)
- [Now it's time to merge everything](./03-memtable.md)
- [The engine starts](./04-engine.md)
- [The engine on fire](./04-engine.md)
- [Let's do something in the background](./05-compaction.md)
- [Be careful when the system crashes](./06-recovery.md)
- [A good bloom filter makes life easier](./07-bloom-filter.md)