add week 1 day 3 blocks

Signed-off-by: Alex Chi Z <iskyzh@gmail.com>
Alex Chi Z
2024-01-21 13:55:49 +08:00
parent 71db0cc6a1
commit f88394a686
9 changed files with 220 additions and 28 deletions


@@ -78,7 +78,7 @@ The constructor of the merge iterator takes a vector of iterators. We assume the
One common pitfall involves error handling. For example,
```rust
```rust,no_run
if let Some(mut inner_iter) = self.iters.peek_mut() {
    inner_iter.next()?; // <- will cause problem
}
@@ -116,7 +116,7 @@ In this task, you will need to modify:
src/iterators/lsm_storage.rs
```
We are finally there -- with all iterators you have implemented, you can finally implement the `scan` interface of the LSM engine.
We are finally there -- with all the iterators you have implemented, you can now implement the `scan` interface of the LSM engine. You can simply construct an LSM iterator from the memtable iterators (remember to put the latest memtable at the front of the merge iterator), and your storage engine will be able to handle scan requests.
## Test Your Understanding
@@ -133,7 +133,7 @@ We are finally there -- with all iterators you have implemented, you can finally
We do not provide reference answers to the questions; feel free to discuss them in the Discord community.
## Bonus Task
## Bonus Tasks
* **Foreground Iterator.** In this tutorial we assumed that all operations are short, so that we can hold a reference to the mem-table in the iterator. If an iterator is held by users for a long time, the whole mem-table (which might be 256MB) will stay in memory even if it has been flushed to disk. To solve this, we can provide a `ForegroundIterator` / `LongIterator` to our users. The iterator will periodically create a new underlying storage iterator so as to allow garbage collection of the resources.


@@ -9,8 +9,81 @@ In this chapter, you will:
## Task 1: Block Builder
You have already implemented all in-memory structures for an LSM storage engine in the previous two chapters. Now it's time to build the on-disk structures. The basic unit of the on-disk structure is the block. Blocks are usually 4 KB in size (the size may vary depending on the storage medium), which matches the page size in the operating system and the page size on an SSD. A block stores ordered key-value pairs. An SST is composed of multiple blocks. When the number of memtables exceeds the system limit, the engine will flush a memtable to disk as an SST. In this chapter, you will implement the encoding and decoding of a block.
In this task, you will need to modify:
```
src/block/builder.rs
src/block.rs
```
The block encoding format in our tutorial is as follows:
```plaintext
----------------------------------------------------------------------------------------------------
| Data Section | Offset Section | Extra |
----------------------------------------------------------------------------------------------------
| Entry #1 | Entry #2 | ... | Entry #N | Offset #1 | Offset #2 | ... | Offset #N | num_of_elements |
----------------------------------------------------------------------------------------------------
```
Each entry is a key-value pair.
```plaintext
-----------------------------------------------------------------------
| Entry #1 | ... |
-----------------------------------------------------------------------
| key_len (2B) | key (keylen) | value_len (2B) | value (varlen) | ... |
-----------------------------------------------------------------------
```
Key length and value length are each stored in 2 bytes (internally as `u16`), which means a key or a value can be at most 65535 bytes long.
We assume that keys will never be empty, and values can be empty. An empty value means that the corresponding key has been deleted in the view of other parts of the system. For the `BlockBuilder` and `BlockIterator`, we just treat the empty value as-is.
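The entry layout above can be sketched with the standard library alone. This is a minimal illustration, not the tutorial's reference solution: the helper names `encode_entry` and `decode_entry` are hypothetical, and the big-endian byte order for the length fields is an assumption, since the format description does not fix one.

```rust
/// Appends one entry to `buf` using the layout
/// key_len (2B) | key | value_len (2B) | value.
/// Big-endian length fields are an assumption of this sketch.
fn encode_entry(buf: &mut Vec<u8>, key: &[u8], value: &[u8]) {
    assert!(!key.is_empty(), "keys are assumed to be non-empty");
    assert!(key.len() <= u16::MAX as usize, "key longer than 65535 bytes");
    assert!(value.len() <= u16::MAX as usize, "value longer than 65535 bytes");
    buf.extend_from_slice(&(key.len() as u16).to_be_bytes());
    buf.extend_from_slice(key);
    buf.extend_from_slice(&(value.len() as u16).to_be_bytes());
    buf.extend_from_slice(value);
}

/// Reads the entry that starts at `offset` and returns borrowed
/// (key, value) slices without copying either of them.
fn decode_entry(data: &[u8], offset: usize) -> (&[u8], &[u8]) {
    let key_len = u16::from_be_bytes([data[offset], data[offset + 1]]) as usize;
    let key_start = offset + 2;
    let key_end = key_start + key_len;
    let value_len = u16::from_be_bytes([data[key_end], data[key_end + 1]]) as usize;
    let value_start = key_end + 2;
    (
        &data[key_start..key_end],
        &data[value_start..value_start + value_len],
    )
}
```

An empty value is encoded as a `value_len` of 0 followed by no value bytes, so it round-trips as-is, which matches how the builder and iterator are expected to treat it.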
At the end of each block, we will store the offset of each entry and the total number of entries. For example, suppose
the first entry starts at byte offset 0 of the block and the second entry starts at byte offset 12:
```
-------------------------------
|offset|offset|num_of_elements|
-------------------------------
| 0 | 12 | 2 |
-------------------------------
```
The footer of the block will look as shown above. Each number is stored as a `u16`.
The block has a size limit, which is `target_size`. Unless the first key-value pair alone exceeds the target block size, you should ensure that the encoded block size is always less than or equal to `target_size`. (In the provided code, `target_size` is essentially the `block_size`.)
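One way to enforce the size limit is to track the encoded size incrementally and reject an entry that would push the block past `target_size`. The sketch below only tracks sizes; `SizeOnlyBuilder` and its fields are hypothetical names used for illustration, not the tutorial's `BlockBuilder`.

```rust
/// Size of one `u16` field in the encoded block.
const SIZEOF_U16: usize = 2;

/// Hypothetical stand-in for a block builder that tracks sizes only.
struct SizeOnlyBuilder {
    data_len: usize,    // bytes used by the data section so far
    num_entries: usize, // number of entries (and therefore offsets)
    target_size: usize,
}

impl SizeOnlyBuilder {
    fn new(target_size: usize) -> Self {
        Self { data_len: 0, num_entries: 0, target_size }
    }

    /// Data section + offset section + the trailing num_of_elements field.
    fn encoded_size(&self) -> usize {
        self.data_len + self.num_entries * SIZEOF_U16 + SIZEOF_U16
    }

    /// Returns false if the entry does not fit. The first entry is always
    /// accepted, even if it alone exceeds `target_size`.
    fn add(&mut self, key_len: usize, value_len: usize) -> bool {
        // key_len field + key + value_len field + value
        let entry_size = SIZEOF_U16 + key_len + SIZEOF_U16 + value_len;
        // Each entry also costs one offset in the offset section.
        let extra = entry_size + SIZEOF_U16;
        if self.num_entries > 0 && self.encoded_size() + extra > self.target_size {
            return false;
        }
        self.data_len += entry_size;
        self.num_entries += 1;
        true
    }
}
```

With a `target_size` of 32, two entries with 4-byte keys and 4-byte values take 30 bytes in total, so a third entry of any size is rejected and the caller would start a new block.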
The `BlockBuilder` produces the data section and the unencoded entry offsets when `build` is called, and this information is stored in the `Block` structure. Because key-value entries are kept in raw format and offsets are kept in a separate vector, decoding avoids unnecessary memory allocations and processing overhead: all you need to do is copy the raw block data into the `data` vector and decode the entry offsets two bytes at a time, *instead of* creating something like `Vec<(Vec<u8>, Vec<u8>)>` to hold all the key-value pairs of a block in memory. This compact memory layout is very efficient.
In `Block::encode` and `Block::decode`, you will need to encode/decode the block in the format as indicated above.
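As a rough sketch of the format described above, the following round-trips a block through its encoded form. It assumes a `Block` with a `data` byte vector and a `Vec<u16>` of offsets, big-endian integers, and a plain `Vec<u8>` as the encoded output; the starter code may use different field types and a different buffer type.

```rust
/// Simplified block: raw entry bytes plus one offset per entry.
#[derive(Debug, PartialEq)]
struct Block {
    data: Vec<u8>,
    offsets: Vec<u16>,
}

impl Block {
    /// Layout: data section | offsets (u16 each) | num_of_elements (u16).
    fn encode(&self) -> Vec<u8> {
        let mut buf = self.data.clone();
        for offset in &self.offsets {
            buf.extend_from_slice(&offset.to_be_bytes());
        }
        buf.extend_from_slice(&(self.offsets.len() as u16).to_be_bytes());
        buf
    }

    /// Reads the footer from the back, then splits off the data section.
    /// This sketch panics on malformed input instead of returning an error.
    fn decode(raw: &[u8]) -> Self {
        let num_pos = raw.len() - 2;
        let num = u16::from_be_bytes([raw[num_pos], raw[num_pos + 1]]) as usize;
        let data_end = num_pos - num * 2;
        let offsets = raw[data_end..num_pos]
            .chunks_exact(2)
            .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
            .collect();
        Self { data: raw[..data_end].to_vec(), offsets }
    }
}
```

Decoding starts from the last two bytes because the number of elements is the only field whose position is known without any other information.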
## Task 2: Block Iterator
In this task, you will need to modify:
```
src/block/iterator.rs
```
Now that we have an encoded block, we will need to implement the `StorageIterator` interface, so that the user can look up or scan keys in the block.
`BlockIterator` can be created with an `Arc<Block>`. If `create_and_seek_to_first` is called, it will be positioned at the first key in the block. If `create_and_seek_to_key` is called, the iterator will be positioned at the first key that is `>=` the provided key. For example, suppose a block contains the keys `1`, `3`, and `5`:
```rust,no_run
let mut iter = BlockIterator::create_and_seek_to_key(block, b"2");
assert_eq!(iter.key(), b"3");
```
Seeking to `2` above positions the iterator at the first key that is greater than or equal to `2`, which in this case is `3`.
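Because the entries are ordered and the offsets give random access to them, one possible way to find the seek position is a binary search over the offset array. The sketch below is an illustration under assumptions: `Block` is a simplified stand-in, `key_at`, `seek_idx`, and `block_from` are hypothetical helpers, and lengths are assumed to be big-endian. A linear scan would also satisfy the interface.

```rust
/// Simplified block: raw entry bytes plus one offset per entry.
struct Block {
    data: Vec<u8>,
    offsets: Vec<u16>,
}

/// Hypothetical helper that builds a block from already sorted pairs.
fn block_from(entries: &[(&str, &str)]) -> Block {
    let mut block = Block { data: Vec::new(), offsets: Vec::new() };
    for (key, value) in entries {
        block.offsets.push(block.data.len() as u16);
        block.data.extend_from_slice(&(key.len() as u16).to_be_bytes());
        block.data.extend_from_slice(key.as_bytes());
        block.data.extend_from_slice(&(value.len() as u16).to_be_bytes());
        block.data.extend_from_slice(value.as_bytes());
    }
    block
}

/// Returns the key of the entry at index `idx` without copying it.
fn key_at(block: &Block, idx: usize) -> &[u8] {
    let offset = block.offsets[idx] as usize;
    let key_len = u16::from_be_bytes([block.data[offset], block.data[offset + 1]]) as usize;
    &block.data[offset + 2..offset + 2 + key_len]
}

/// Returns the index of the first entry whose key is >= `target`,
/// or the number of entries if every key is smaller.
fn seek_idx(block: &Block, target: &[u8]) -> usize {
    let (mut lo, mut hi) = (0, block.offsets.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if key_at(block, mid) < target {
            lo = mid + 1; // everything up to `mid` is too small
        } else {
            hi = mid; // `mid` is a candidate, keep looking to the left
        }
    }
    lo
}
```

For the block holding `1`, `3`, and `5`, seeking to `2` yields index 1 (the key `3`), and seeking to `6` yields index 3, which is past the last entry and would leave an iterator invalid.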
The iterator should copy the `key` from the block and store it inside the iterator (we will add key compression in the future, at which point this copy becomes necessary). For the value, you should only store its begin/end offsets in the iterator without copying it.
When `next` is called, the iterator will move to the next position. If we reach the end of the block, we can set `key` to empty and return `false` from `is_valid`, so that the caller can switch to another block if possible.
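The behavior described above (a copied key, a value kept only as a byte range, and an empty key marking the end of the block) could look roughly like this. `BlockIter`, `seek_to`, and `block_from` are hypothetical names for this sketch, the `Block` type is simplified, and lengths are assumed to be big-endian; the real `BlockIterator` has a different, richer interface.

```rust
use std::sync::Arc;

/// Simplified block: raw entry bytes plus one offset per entry.
struct Block {
    data: Vec<u8>,
    offsets: Vec<u16>,
}

/// Hypothetical helper that builds a block from already sorted pairs.
fn block_from(entries: &[(&str, &str)]) -> Block {
    let mut block = Block { data: Vec::new(), offsets: Vec::new() };
    for (key, value) in entries {
        block.offsets.push(block.data.len() as u16);
        block.data.extend_from_slice(&(key.len() as u16).to_be_bytes());
        block.data.extend_from_slice(key.as_bytes());
        block.data.extend_from_slice(&(value.len() as u16).to_be_bytes());
        block.data.extend_from_slice(value.as_bytes());
    }
    block
}

struct BlockIter {
    block: Arc<Block>,
    key: Vec<u8>,                // copied out of the block
    value_range: (usize, usize), // begin/end offsets into block.data
    idx: usize,
}

impl BlockIter {
    fn create_and_seek_to_first(block: Arc<Block>) -> Self {
        let mut iter = Self { block, key: Vec::new(), value_range: (0, 0), idx: 0 };
        iter.seek_to(0);
        iter
    }

    /// Positions the iterator at entry `idx`, or marks it invalid
    /// (empty key) if `idx` is past the last entry.
    fn seek_to(&mut self, idx: usize) {
        self.idx = idx;
        self.key.clear();
        self.value_range = (0, 0);
        if idx >= self.block.offsets.len() {
            return;
        }
        let data = &self.block.data;
        let offset = self.block.offsets[idx] as usize;
        let key_len = u16::from_be_bytes([data[offset], data[offset + 1]]) as usize;
        let key_end = offset + 2 + key_len;
        let value_len = u16::from_be_bytes([data[key_end], data[key_end + 1]]) as usize;
        self.key.extend_from_slice(&data[offset + 2..key_end]);
        self.value_range = (key_end + 2, key_end + 2 + value_len);
    }

    fn key(&self) -> &[u8] {
        &self.key
    }

    fn value(&self) -> &[u8] {
        &self.block.data[self.value_range.0..self.value_range.1]
    }

    fn is_valid(&self) -> bool {
        !self.key.is_empty()
    }

    fn next(&mut self) {
        self.seek_to(self.idx + 1);
    }
}
```

Using an empty key as the invalid marker works only because keys are assumed to never be empty; an empty value, by contrast, is a legitimate entry and leaves the iterator valid.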
## Test Your Understanding
* What is the time complexity of seeking a key in the block?
@@ -27,6 +100,6 @@ We do not provide reference answers to the questions, and feel free to discuss a
## Bonus Tasks
* **Backward Iterators.**
* **Backward Iterators.** You may implement `prev` for your `BlockIterator` so that you can iterate over the key-value pairs in reverse order. You may also implement backward variants of the merge iterator and the SST iterator (in the next chapter) so that your storage engine can do a reverse scan.
{{#include copyright.md}}