@@ -24,6 +24,8 @@ You will also notice that the `MemTable` structure does not have a `delete` inte
In this task, you will need to implement `MemTable::get` and `MemTable::put` to enable modifications of the memtable.

We use the `bytes` crate for storing the data in the memtable. `bytes::Bytes` is similar to `Arc<[u8]>`. When you clone a `Bytes`, or take a slice of it, the underlying data is not copied, so cloning is cheap. The clone simply creates a new reference to the same storage area, and the storage area is freed when no references to it remain.
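The reference-sharing behaviour described above can be illustrated with the standard library alone. The following is a minimal sketch that uses `Arc<[u8]>` as a stand-in for `Bytes`, following the analogy in the text; the helper names `make_shared` and `shares_storage` are hypothetical and are not part of the tutorial's code or of the `bytes` crate.

```rust
use std::sync::Arc;

/// Hypothetical helper: copies `data` once into a reference-counted buffer.
fn make_shared(data: &[u8]) -> Arc<[u8]> {
    Arc::from(data)
}

/// Hypothetical helper: returns true when both handles point at the same
/// heap allocation, i.e. no bytes were copied between them.
fn shares_storage(a: &Arc<[u8]>, b: &Arc<[u8]>) -> bool {
    Arc::ptr_eq(a, b)
}

/// Hypothetical helper: clones a handle and reports the reference count
/// observed while both handles are alive.
fn count_while_cloned(a: &Arc<[u8]>) -> usize {
    // Cloning only bumps the reference count; the bytes stay where they are.
    let b = Arc::clone(a);
    assert!(shares_storage(a, &b));
    Arc::strong_count(a)
    // `b` is dropped here; the buffer itself is freed only when the last
    // handle goes away.
}
```

One difference worth keeping in mind is that `Arc<[u8]>` cannot hand out a shared handle to a sub-range of the buffer, whereas the text notes that slicing a `Bytes` also avoids copying.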
## Task 2: A Single Memtable in the Engine

In this task, you will need to modify:
@@ -148,11 +150,13 @@ Now that you have multiple memtables, you may modify your read path `get` functi
* Is it possible to use other data structures as the memtable in LSM? What are the pros/cons of using the skiplist?
* Why do we need a combination of `state` and `state_lock`? Can we use only `state.read()` and `state.write()`?
* Why does the order in which we store and probe the memtables matter?
* Is the memory layout of the memtable efficient / does it have good data locality? (Think of how `Bytes` is implemented...) What are the possible optimizations to make the memtable more efficient?
* We are using `parking_lot` locks in this tutorial. Is its read-write lock a fair lock? What might happen to readers trying to acquire the lock if one writer is waiting for existing readers to finish?

We do not provide reference answers to the questions. Feel free to discuss them in the Discord community.
## Bonus Tasks

* **More Memtable Formats.** You may implement other memtable formats. For example, BTree memtable, vector memtable, and ART memtable.

{{#include copyright.md}}
@@ -10,8 +10,27 @@ In this chapter, you will:
## Task 1: Memtable Iterator

self-referential struct

## Task 2: Merge Iterator

error handling, order requirement

## Task 3: LSM Iterator

## Task 4: Read Path - Scan
## Test Your Understanding

* Why do we need a self-referential structure for the memtable iterator?
* If we want to get rid of the self-referential structure and have a lifetime on the memtable iterator (i.e., `MemtableIterator<'a>`, where `'a` = memtable or `LsmStorageInner` lifetime), is it still possible to implement the `scan` functionality?
* Suppose (1) we create an iterator on the skiplist memtable and (2) someone then inserts new keys into the memtable. Will the iterator see the new keys?
* Why do we need to ensure the merge iterator returns data in the iterator construction order?
* Is it possible to implement a Rust-style iterator (i.e., `next(&self) -> (Key, Value)`) for LSM iterators? What are the pros/cons?
* The scan interface is like `fn scan(&self, lower: Bound<&[u8]>, upper: Bound<&[u8]>)`. How can we make this API compatible with Rust-style ranges (i.e., `key_a..key_b`)? If you implement this, try to pass a full range `..` to the interface and see what will happen.
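For readers who have not used `std::ops::Bound` before, the sketch below shows how a pair of bounds selects keys. It is an illustration only: it assumes a `BTreeMap` as a stand-in for the memtable, and the names `Table`, `scan_keys`, and `sample_table` are hypothetical rather than part of the tutorial's API. It does not attempt to answer the questions above.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Stand-in for a memtable (an assumption for this sketch): an ordered map
/// from key bytes to value bytes.
type Table = BTreeMap<Vec<u8>, Vec<u8>>;

/// Collects the keys selected by a pair of bounds, in ascending key order.
/// The signature mirrors the shape of the `scan` interface quoted above.
fn scan_keys(table: &Table, lower: Bound<&[u8]>, upper: Bound<&[u8]>) -> Vec<Vec<u8>> {
    table
        // A `(Bound<&T>, Bound<&T>)` tuple is itself a valid range argument.
        .range::<[u8], _>((lower, upper))
        .map(|(key, _value)| key.clone())
        .collect()
}

/// Builds a small table with the keys "a", "b", "c", and "d".
fn sample_table() -> Table {
    let mut table = Table::new();
    for key in ["a", "b", "c", "d"] {
        table.insert(key.as_bytes().to_vec(), b"v".to_vec());
    }
    table
}
```

With this table, `scan_keys(&sample_table(), Bound::Included(&b"b"[..]), Bound::Excluded(&b"d"[..]))` selects the keys `b` and `c`, while `Bound::Unbounded` on either side leaves that side open. Note that `BTreeMap::range` panics when the lower bound is greater than the upper bound, which a real `scan` implementation may prefer to handle differently.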
## Bonus Task

* **Foreground Iterator.** In this tutorial we assumed that all operations are short, so that we can hold a reference to the memtable in the iterator. If an iterator is held by users for a long time, the whole memtable (which might be 256MB) will stay in memory even if it has been flushed to disk. To solve this, we can provide a `ForegroundIterator` / `LongIterator` to our users. The iterator will periodically create a new underlying storage iterator so as to allow garbage collection of the resources.

{{#include copyright.md}}