add something for w1d2

Signed-off-by: Alex Chi <iskyzh@gmail.com>
Alex Chi
2024-01-20 22:30:02 +08:00
parent a95d866cac
commit 5cff2ec707
2 changed files with 25 additions and 2 deletions


@@ -24,6 +24,8 @@ You will also notice that the `MemTable` structure does not have a `delete` inte
In this task, you will need to implement `MemTable::get` and `MemTable::put` to enable modifications of the memtable.
We use the `bytes` crate to store the data in the memtable. `bytes::Bytes` is similar to `Arc<[u8]>`. When you clone a `Bytes` or take a slice of it, the underlying data is not copied, so cloning is cheap. Instead, it simply creates a new reference to the storage area, and the storage area is freed when there are no more references to it.
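As a rough sketch of what `get` and `put` need to do, here is a `std`-only stand-in. It makes two assumptions that differ from the tutorial: the tutorial's memtable is skiplist-backed and stores `Bytes`, while this sketch keeps `Arc<[u8]>` keys and values (the type `Bytes` is compared to above) in a `BTreeMap` behind an `RwLock`, so that `put` can take `&self`. `put` copies the input slices once; `get` returns a clone that shares the stored allocation. The names are illustrative, not the tutorial's code.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for `bytes::Bytes`: a reference-counted byte buffer.
/// Cloning it only bumps a reference count; the bytes themselves are not copied.
type SharedBytes = Arc<[u8]>;

/// Minimal memtable sketch. A `BTreeMap` behind an `RwLock` replaces the
/// tutorial's skiplist only to stay within `std`.
#[derive(Default)]
pub struct MemTable {
    map: RwLock<BTreeMap<SharedBytes, SharedBytes>>,
}

impl MemTable {
    /// Returns the value stored for `key`, if any. The returned handle shares
    /// storage with the memtable entry, so this clone is O(1).
    pub fn get(&self, key: &[u8]) -> Option<SharedBytes> {
        self.map.read().unwrap().get(key).cloned()
    }

    /// Inserts `key`, or overwrites it if present. The input slices are copied
    /// once into new shared buffers owned by the memtable.
    pub fn put(&self, key: &[u8], value: &[u8]) {
        self.map
            .write()
            .unwrap()
            .insert(Arc::from(key), Arc::from(value));
    }
}
```

Calling `get` twice on the same key yields two handles for which `Arc::ptr_eq` is true: both point at the single buffer written by the last `put`.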
## Task 2: A Single Memtable in the Engine
In this task, you will need to modify:
@@ -148,11 +150,13 @@ Now that you have multiple memtables, you may modify your read path `get` functi
* Is it possible to use other data structures as the memtable in LSM? What are the pros/cons of using the skiplist?
* Why do we need a combination of `state` and `state_lock`? Can we only use `state.read()` and `state.write()`?
* Why does the order to store and to probe the memtables matter?
* Is the memory layout of the memtable efficient? Does it have good data locality? (Think about how `Bytes` is implemented...) What are the possible optimizations to make the memtable more efficient?
* We use `parking_lot` locks in this tutorial. Is its read-write lock a fair lock? What might happen to readers trying to acquire the lock if one writer is waiting for the existing readers to release it?
We do not provide reference answers to the questions; feel free to discuss them in the Discord community.
## Bonus Tasks
* **More Memtable Formats.** You may implement other memtable formats, for example, a BTree memtable, a vector memtable, or an ART memtable.
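For the vector memtable mentioned above, one possible shape is sketched below. `VecMemTable` is a hypothetical name, not part of the tutorial; it keeps entries in a key-sorted `Vec` and locates keys by binary search. It only illustrates the trade-off against a skiplist: lookups and ordered scans walk contiguous memory, but inserting a new key shifts every later entry (O(n)), and `put` takes `&mut self`, so concurrent writers would need an external lock.

```rust
/// Hypothetical vector memtable: `(key, value)` pairs kept sorted by key.
#[derive(Default)]
pub struct VecMemTable {
    entries: Vec<(Vec<u8>, Vec<u8>)>,
}

impl VecMemTable {
    /// Binary search over the sorted entries: O(log n) comparisons.
    pub fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.entries
            .binary_search_by(|(k, _)| k.as_slice().cmp(key))
            .ok()
            .map(|i| self.entries[i].1.as_slice())
    }

    pub fn put(&mut self, key: &[u8], value: &[u8]) {
        match self.entries.binary_search_by(|(k, _)| k.as_slice().cmp(key)) {
            // Existing key: overwrite the value in place.
            Ok(i) => self.entries[i].1 = value.to_vec(),
            // New key: insert at the position that keeps the vector sorted,
            // shifting all later entries by one.
            Err(i) => self.entries.insert(i, (key.to_vec(), value.to_vec())),
        }
    }

    /// Iterates over all entries in ascending key order.
    pub fn iter(&self) -> impl Iterator<Item = (&[u8], &[u8])> + '_ {
        self.entries.iter().map(|(k, v)| (k.as_slice(), v.as_slice()))
    }
}
```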
{{#include copyright.md}}


@@ -10,8 +10,27 @@ In this chapter, you will:
## Task 1: Memtable Iterator
self-referential struct
## Task 2: Merge Iterator
error handling, order requirement
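The order requirement can be illustrated with a small k-way merge. This sketch (the function `merge_sorted` is hypothetical) works on in-memory vectors, each assumed sorted by key with unique keys, and treats the input index as the construction order: when several inputs contain the same key, only the entry from the input passed first survives. The heap orders entries by `(key, input index)`, so ties on a key pop the lowest index first. A real merge iterator would wrap storage iterators and would also have to propagate errors from their `next` calls, which this sketch omits.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

type Entry = (Vec<u8>, Vec<u8>);

/// Merges key-sorted inputs into one key-sorted output. For duplicate keys,
/// the entry from the input with the smallest index (passed first) wins.
pub fn merge_sorted(inputs: Vec<Vec<Entry>>) -> Vec<Entry> {
    let mut iters: Vec<_> = inputs.into_iter().map(|v| v.into_iter()).collect();
    // `Reverse` turns the max-heap into a min-heap on (key, input index, value).
    let mut heap = BinaryHeap::new();
    for (idx, it) in iters.iter_mut().enumerate() {
        if let Some((k, v)) = it.next() {
            heap.push(Reverse((k, idx, v)));
        }
    }
    let mut out: Vec<Entry> = Vec::new();
    while let Some(Reverse((key, idx, value))) = heap.pop() {
        // Refill the heap from the input we just consumed.
        if let Some((k, v)) = iters[idx].next() {
            heap.push(Reverse((k, idx, v)));
        }
        // Equal keys pop consecutively, lowest index first; any later copy
        // comes from an input constructed later, so it is skipped.
        if out.last().map_or(true, |(last, _)| *last != key) {
            out.push((key, value));
        }
    }
    out
}
```

For example, merging `[("a", "new"), ("c", "3")]` with `[("a", "old"), ("b", "2")]`, in that order, yields `a = new`, `b = 2`, `c = 3`.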
## Task 3: LSM Iterator
## Task 4: Read Path - Scan
## Test Your Understanding
* Why do we need a self-referential structure for the memtable iterator?
* If we want to get rid of the self-referential structure and put a lifetime on the memtable iterator (i.e., `MemtableIterator<'a>`, where `'a` is the lifetime of the memtable or of `LsmStorageInner`), is it still possible to implement the `scan` functionality?
* Suppose that (1) we create an iterator on the skiplist memtable, and (2) someone then inserts new keys into the memtable. (3) Will the iterator see the new keys?
* Why do we need to ensure the merge iterator returns data in the iterator construction order?
* Is it possible to implement a Rust-style iterator (i.e., `next(&mut self) -> Option<(Key, Value)>`) for LSM iterators? What are the pros/cons?
* The scan interface looks like `fn scan(&self, lower: Bound<&[u8]>, upper: Bound<&[u8]>)`. How can we make this API compatible with Rust-style ranges (e.g., `key_a..key_b`)? If you implement this, try passing a full range `..` to the interface and see what happens.
## Bonus Task
* **Foreground Iterator.** In this tutorial, we assume that all operations are short-lived, so the iterator can hold a reference to the memtable. If a user holds an iterator for a long time, the whole memtable (which might be 256MB) stays in memory even after it has been flushed to disk. To solve this, we can provide a `ForegroundIterator` / `LongIterator` to the user. This iterator periodically recreates its underlying storage iterator so that resources it no longer needs can be garbage-collected.
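One way the `ForegroundIterator` idea could work is sketched below, under assumptions that are not from the tutorial: a hypothetical `open` callback builds a fresh iterator over the current storage state starting at a given lower bound, and the wrapper replaces its underlying iterator every `refresh_every` items (also a hypothetical parameter), resuming just after the last key it returned. Replacing the old iterator drops it, which is what releases the memtables it was keeping alive. For brevity it implements Rust's `Iterator` trait rather than the tutorial's iterator interface.

```rust
use std::ops::Bound;

pub struct ForegroundIterator<I, F>
where
    I: Iterator<Item = (Vec<u8>, Vec<u8>)>,
    F: Fn(Bound<Vec<u8>>) -> I,
{
    open: F,                  // hypothetical: opens a fresh storage iterator
    inner: I,                 // current underlying iterator
    last_key: Option<Vec<u8>>,
    since_refresh: usize,
    refresh_every: usize,
}

impl<I, F> ForegroundIterator<I, F>
where
    I: Iterator<Item = (Vec<u8>, Vec<u8>)>,
    F: Fn(Bound<Vec<u8>>) -> I,
{
    pub fn new(open: F, refresh_every: usize) -> Self {
        let inner = open(Bound::Unbounded);
        let refresh_every = refresh_every.max(1);
        Self { open, inner, last_key: None, since_refresh: 0, refresh_every }
    }
}

impl<I, F> Iterator for ForegroundIterator<I, F>
where
    I: Iterator<Item = (Vec<u8>, Vec<u8>)>,
    F: Fn(Bound<Vec<u8>>) -> I,
{
    type Item = (Vec<u8>, Vec<u8>);

    fn next(&mut self) -> Option<Self::Item> {
        if self.since_refresh >= self.refresh_every {
            if let Some(last) = &self.last_key {
                // Assigning a new iterator drops the old one, releasing
                // whatever storage it referenced.
                self.inner = (self.open)(Bound::Excluded(last.clone()));
            }
            self.since_refresh = 0;
        }
        let (key, value) = self.inner.next()?;
        self.last_key = Some(key.clone());
        self.since_refresh += 1;
        Some((key, value))
    }
}
```

A side effect of this design is that the iterator no longer reads from a single point-in-time view: keys written between refreshes may show up in later results.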
{{#include copyright.md}}