@@ -42,7 +42,7 @@ We are working on a new version of the mini-lsm tutorial that is split into 3 we
|
|||||||
| -------------- | ----------------------------------------------- | -------- | ------------ | ------- |
|
| -------------- | ----------------------------------------------- | -------- | ------------ | ------- |
|
||||||
| 1.1 | Memtables | ✅ | ✅ | ✅ |
|
| 1.1 | Memtables | ✅ | ✅ | ✅ |
|
||||||
| 1.2 | Merge Iterators | ✅ | ✅ | ✅ |
|
| 1.2 | Merge Iterators | ✅ | ✅ | ✅ |
|
||||||
| 1.3 | Block Format | ✅ | 🚧 | 🚧 |
|
| 1.3 | Block Format | ✅ | ✅ | ✅ |
|
||||||
| 1.4 | Table Format | ✅ | 🚧 | 🚧 |
|
| 1.4 | Table Format | ✅ | 🚧 | 🚧 |
|
||||||
| 1.5 | Storage Engine - Read Path | ✅ | 🚧 | 🚧 |
|
| 1.5 | Storage Engine - Read Path | ✅ | 🚧 | 🚧 |
|
||||||
| 1.6 | Storage Engine - Write Path | ✅ | 🚧 | 🚧 |
|
| 1.6 | Storage Engine - Write Path | ✅ | 🚧 | 🚧 |
|
||||||
|
|||||||
@@ -78,7 +78,7 @@ The constructor of the merge iterator takes a vector of iterators. We assume the
|
|||||||
|
|
||||||
One common pitfall is on error handling. For example,
|
One common pitfall is on error handling. For example,
|
||||||
|
|
||||||
```rust
|
```rust,no_run
|
||||||
if let Some(mut inner_iter) = self.iters.peek_mut() {
|
if let Some(mut inner_iter) = self.iters.peek_mut() {
|
||||||
inner_iter.next()?; // <- will cause problem
|
inner_iter.next()?; // <- will cause problem
|
||||||
}
|
}
|
||||||
@@ -116,7 +116,7 @@ In this task, you will need to modify:
|
|||||||
src/iterators/lsm_storage.rs
|
src/iterators/lsm_storage.rs
|
||||||
```
|
```
|
||||||
|
|
||||||
We are finally there -- with all iterators you have implemented, you can finally implement the `scan` interface of the LSM engine.
|
We are finally there -- with all iterators you have implemented, you can finally implement the `scan` interface of the LSM engine. You can simply construct an LSM iterator with the memtable iterators (remember to put the latest memtable at the front of the merge iterator), and your storage engine will be able to handle the scan request.
|
||||||
|
|
||||||
## Test Your Understanding
|
## Test Your Understanding
|
||||||
|
|
||||||
@@ -133,7 +133,7 @@ We are finally there -- with all iterators you have implemented, you can finally
|
|||||||
|
|
||||||
We do not provide reference answers to the questions. Feel free to discuss them in the Discord community.
|
We do not provide reference answers to the questions. Feel free to discuss them in the Discord community.
|
||||||
|
|
||||||
## Bonus Task
|
## Bonus Tasks
|
||||||
|
|
||||||
* **Foreground Iterator.** In this tutorial we assumed that all operations are short, so that we can hold reference to mem-table in the iterator. If an iterator is held by users for a long time, the whole mem-table (which might be 256MB) will stay in the memory even if it has been flushed to disk. To solve this, we can provide a `ForegroundIterator` / `LongIterator` to our user. The iterator will periodically create new underlying storage iterator so as to allow garbage collection of the resources.
|
* **Foreground Iterator.** In this tutorial we assumed that all operations are short, so that we can hold reference to mem-table in the iterator. If an iterator is held by users for a long time, the whole mem-table (which might be 256MB) will stay in the memory even if it has been flushed to disk. To solve this, we can provide a `ForegroundIterator` / `LongIterator` to our user. The iterator will periodically create new underlying storage iterator so as to allow garbage collection of the resources.
|
||||||
|
|
||||||
|
|||||||
@@ -9,8 +9,81 @@ In this chapter, you will:
|
|||||||
|
|
||||||
## Task 1: Block Builder
|
## Task 1: Block Builder
|
||||||
|
|
||||||
|
You have already implemented all in-memory structures for an LSM storage engine in the previous two chapters. Now it's time to build the on-disk structures. The basic unit of the on-disk structure is the block. Blocks are usually 4 KB in size (the size may vary depending on the storage medium), which matches the page size in the operating system and the page size on an SSD. A block stores ordered key-value pairs. An SST is composed of multiple blocks. When the number of memtables exceeds the system limit, the engine flushes a memtable to disk as an SST. In this chapter, you will implement the encoding and decoding of a block.
|
||||||
|
|
||||||
|
In this task, you will need to modify:
|
||||||
|
|
||||||
|
```
|
||||||
|
src/block/builder.rs
|
||||||
|
src/block.rs
|
||||||
|
```
|
||||||
|
|
||||||
|
The block encoding format in our tutorial is as follows:
|
||||||
|
|
||||||
|
```plaintext
|
||||||
|
----------------------------------------------------------------------------------------------------
|
||||||
|
| Data Section | Offset Section | Extra |
|
||||||
|
----------------------------------------------------------------------------------------------------
|
||||||
|
| Entry #1 | Entry #2 | ... | Entry #N | Offset #1 | Offset #2 | ... | Offset #N | num_of_elements |
|
||||||
|
----------------------------------------------------------------------------------------------------
|
||||||
|
```
|
||||||
|
|
||||||
|
Each entry is a key-value pair.
|
||||||
|
|
||||||
|
```plaintext
|
||||||
|
-----------------------------------------------------------------------
|
||||||
|
| Entry #1 | ... |
|
||||||
|
-----------------------------------------------------------------------
|
||||||
|
| key_len (2B) | key (keylen) | value_len (2B) | value (varlen) | ... |
|
||||||
|
-----------------------------------------------------------------------
|
||||||
|
```
|
||||||
|
|
||||||
|
Key length and value length are both 2 bytes (stored internally as `u16`), which means a key or a value can be at most 65535 bytes long.
|
||||||
|
|
||||||
|
We assume that keys will never be empty, and values can be empty. An empty value means that the corresponding key has been deleted in the view of other parts of the system. For the `BlockBuilder` and `BlockIterator`, we just treat the empty value as-is.
|
||||||
|
|
||||||
|
At the end of each block, we store the offset of each entry and the total number of entries. For example, if
|
||||||
|
the first entry is at position 0 of the block and the second entry is at position 12, the end of the block is laid out as follows:
|
||||||
|
|
||||||
|
```
|
||||||
|
-------------------------------
|
||||||
|
|offset|offset|num_of_elements|
|
||||||
|
-------------------------------
|
||||||
|
| 0 | 12 | 2 |
|
||||||
|
-------------------------------
|
||||||
|
```
|
||||||
|
|
||||||
|
The footer of the block will be as above. Each of the numbers is stored as a `u16`.
|
||||||
|
|
||||||
|
The block has a size limit, which is `target_size`. Unless the first key-value pair exceeds the target block size, you should ensure that the encoded block size is always less than or equal to `target_size`. (In the provided code, the `target_size` here is essentially the `block_size`)
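The size rule above can be sketched as follows. This is only an illustration, not the reference solution: `SketchBlockBuilder`, its field names, and `estimated_size` are hypothetical, and big-endian encoding of the `u16` length fields is an assumption (the text does not specify a byte order).

```rust
/// Size of a `u16` in bytes; lengths, offsets, and the element count all use it.
const SIZEOF_U16: usize = std::mem::size_of::<u16>();

/// Hypothetical builder used only for this sketch.
pub struct SketchBlockBuilder {
    data: Vec<u8>,
    offsets: Vec<u16>,
    target_size: usize,
}

impl SketchBlockBuilder {
    pub fn new(target_size: usize) -> Self {
        Self { data: Vec::new(), offsets: Vec::new(), target_size }
    }

    /// Encoded size so far: data section + offset section + num_of_elements.
    fn estimated_size(&self) -> usize {
        self.data.len() + self.offsets.len() * SIZEOF_U16 + SIZEOF_U16
    }

    /// Returns `false` (and adds nothing) when the entry would push the block
    /// past `target_size`. The first entry is always accepted.
    pub fn add(&mut self, key: &[u8], value: &[u8]) -> bool {
        assert!(!key.is_empty(), "keys are assumed to be non-empty");
        // Extra bytes: key, value, key_len, value_len, and one offset slot.
        let extra = key.len() + value.len() + 3 * SIZEOF_U16;
        if !self.offsets.is_empty() && self.estimated_size() + extra > self.target_size {
            return false;
        }
        // Keys or values longer than 65535 bytes are not handled in this sketch.
        self.offsets.push(self.data.len() as u16);
        self.data.extend_from_slice(&(key.len() as u16).to_be_bytes()); // assumed big-endian
        self.data.extend_from_slice(key);
        self.data.extend_from_slice(&(value.len() as u16).to_be_bytes());
        self.data.extend_from_slice(value);
        true
    }
}
```

With a target size of 16, the first `("11", "11")` entry brings the estimated size to 12 bytes, so a second entry of the same shape (10 more bytes) is rejected, while an oversized first entry is still accepted.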
|
||||||
|
|
||||||
|
The `BlockBuilder` will produce the data part and unencoded entry offsets when `build` is called. The information will be stored in the `Block` structure. Because key-value entries are stored in raw format and offsets are stored in a separate vector, decoding avoids unnecessary memory allocations and processing overhead: you simply copy the raw block data to the `data` vector and decode the entry offsets two bytes at a time, *instead of* creating something like `Vec<(Vec<u8>, Vec<u8>)>` to hold all the key-value pairs of a block in memory. This compact memory layout is very efficient.
|
||||||
|
|
||||||
|
In `Block::encode` and `Block::decode`, you will need to encode/decode the block in the format as indicated above.
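As a rough illustration of the layout described above, here is a minimal encode/decode sketch. `SketchBlock` is a hypothetical stand-in for `Block`; it returns a plain `Vec<u8>` to stay dependency-free, and it assumes big-endian `u16` values, which the text does not specify.

```rust
const SIZEOF_U16: usize = std::mem::size_of::<u16>();

/// Hypothetical stand-in for `Block`, used only for this sketch.
pub struct SketchBlock {
    pub data: Vec<u8>,
    pub offsets: Vec<u16>,
}

impl SketchBlock {
    /// Layout: data section | offset section | num_of_elements.
    pub fn encode(&self) -> Vec<u8> {
        let mut buf = self.data.clone();
        for offset in &self.offsets {
            buf.extend_from_slice(&offset.to_be_bytes()); // assumed big-endian
        }
        buf.extend_from_slice(&(self.offsets.len() as u16).to_be_bytes());
        buf
    }

    /// Reads the element count from the last two bytes, then walks backwards
    /// to find where the offset section starts. Panics on malformed input.
    pub fn decode(raw: &[u8]) -> Self {
        let count_pos = raw.len() - SIZEOF_U16;
        let num = u16::from_be_bytes([raw[count_pos], raw[count_pos + 1]]) as usize;
        let data_end = count_pos - num * SIZEOF_U16;
        let offsets = raw[data_end..count_pos]
            .chunks_exact(SIZEOF_U16)
            .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
            .collect();
        Self { data: raw[..data_end].to_vec(), offsets }
    }
}
```

For a block with two entries at offsets 0 and 12, the last six encoded bytes under these assumptions are `00 00 00 0c 00 02`, matching the footer example above.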
|
||||||
|
|
||||||
## Task 2: Block Iterator
|
## Task 2: Block Iterator
|
||||||
|
|
||||||
|
In this task, you will need to modify:
|
||||||
|
|
||||||
|
```
|
||||||
|
src/block/iterator.rs
|
||||||
|
```
|
||||||
|
|
||||||
|
Now that we have an encoded block, we will need to implement the `StorageIterator` interface, so that the user can look up or scan keys in the block.
|
||||||
|
|
||||||
|
`BlockIterator` can be created with an `Arc<Block>`. If `create_and_seek_to_first` is called, it will be positioned at the first key in the block. If `create_and_seek_to_key` is called, the iterator will be positioned at the first key that is `>=` the provided key. For example, suppose a block contains the keys `1`, `3`, and `5`:
|
||||||
|
|
||||||
|
```rust,no_run
|
||||||
|
let mut iter = BlockIterator::create_and_seek_to_key(block, b"2");
|
||||||
|
assert_eq!(iter.key(), b"3");
|
||||||
|
```
|
||||||
|
|
||||||
|
The above `seek 2` positions the iterator at the first available key that is not smaller than `2`, which in this case is `3`.
|
||||||
|
|
||||||
|
The iterator should copy the `key` from the block and store it inside the iterator (we will add key compression in the future, at which point copying becomes necessary). For the value, you should only store its begin/end offsets in the iterator without copying the bytes.
|
||||||
|
|
||||||
|
When `next` is called, the iterator will move to the next position. If we reach the end of the block, we can set `key` to empty and return `false` from `is_valid`, so that the caller can switch to another block if possible.
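The iterator behavior described above might be sketched as follows. All `Sketch*` names and the private `seek_to` helper are hypothetical, the `u16` fields are assumed to be big-endian, and binary search is just one way to implement the seek.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `Block`, used only for this sketch.
pub struct SketchBlock {
    pub data: Vec<u8>,
    pub offsets: Vec<u16>,
}

pub struct SketchBlockIterator {
    block: Arc<SketchBlock>,
    key: Vec<u8>,                // copied out of the block; empty means invalid
    value_range: (usize, usize), // begin/end of the value inside `block.data`
    idx: usize,
}

impl SketchBlockIterator {
    pub fn create_and_seek_to_key(block: Arc<SketchBlock>, key: &[u8]) -> Self {
        let mut iter = Self { block, key: Vec::new(), value_range: (0, 0), idx: 0 };
        iter.seek_to_key(key);
        iter
    }

    pub fn key(&self) -> &[u8] {
        &self.key
    }

    pub fn value(&self) -> &[u8] {
        &self.block.data[self.value_range.0..self.value_range.1]
    }

    pub fn is_valid(&self) -> bool {
        !self.key.is_empty()
    }

    pub fn next(&mut self) {
        self.seek_to(self.idx + 1);
    }

    /// Decodes the entry at `idx`, or marks the iterator invalid past the end.
    fn seek_to(&mut self, idx: usize) {
        self.idx = idx;
        if idx >= self.block.offsets.len() {
            self.key.clear();
            self.value_range = (0, 0);
            return;
        }
        let data = &self.block.data;
        let start = self.block.offsets[idx] as usize;
        let key_len = u16::from_be_bytes([data[start], data[start + 1]]) as usize;
        let key_end = start + 2 + key_len;
        let value_len = u16::from_be_bytes([data[key_end], data[key_end + 1]]) as usize;
        self.key.clear();
        self.key.extend_from_slice(&data[start + 2..key_end]);
        self.value_range = (key_end + 2, key_end + 2 + value_len);
    }

    /// Binary search for the first key that is `>=` `target`.
    pub fn seek_to_key(&mut self, target: &[u8]) {
        let (mut low, mut high) = (0, self.block.offsets.len());
        while low < high {
            let mid = low + (high - low) / 2;
            self.seek_to(mid);
            if self.key.as_slice() < target {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        self.seek_to(low);
    }
}
```

Seeking past the last key leaves `low` equal to the number of entries, so the final `seek_to` clears the key and `is_valid` returns `false`.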
|
||||||
|
|
||||||
## Test Your Understanding
|
## Test Your Understanding
|
||||||
|
|
||||||
* What is the time complexity of seeking a key in the block?
|
* What is the time complexity of seeking a key in the block?
|
||||||
@@ -27,6 +100,6 @@ We do not provide reference answers to the questions, and feel free to discuss a
|
|||||||
|
|
||||||
## Bonus Tasks
|
## Bonus Tasks
|
||||||
|
|
||||||
* **Backward Iterators.**
|
* **Backward Iterators.** You may implement `prev` for your `BlockIterator` so that you can iterate over the key-value pairs in reverse order. You may also add backward variants of the merge iterator and the SST iterator (in the next chapter) so that your storage engine can do a reverse scan.
|
||||||
|
|
||||||
{{#include copyright.md}}
|
{{#include copyright.md}}
|
||||||
|
|||||||
@@ -5,22 +5,13 @@ mod builder;
|
|||||||
mod iterator;
|
mod iterator;
|
||||||
|
|
||||||
pub use builder::BlockBuilder;
|
pub use builder::BlockBuilder;
|
||||||
/// You may want to check `bytes::BufMut` out when manipulating continuous chunks of memory
|
|
||||||
use bytes::Bytes;
|
use bytes::Bytes;
|
||||||
pub use iterator::BlockIterator;
|
pub use iterator::BlockIterator;
|
||||||
|
|
||||||
/// A block is the smallest unit of read and caching in LSM tree.
|
/// A block is the smallest unit of read and caching in LSM tree. It is a collection of sorted key-value pairs.
|
||||||
/// It is a collection of sorted key-value pairs.
|
|
||||||
/// The `actual` storage format is as below (After `Block::encode`):
|
|
||||||
///
|
|
||||||
/// ----------------------------------------------------------------------------------------------------
|
|
||||||
/// | Data Section | Offset Section | Extra |
|
|
||||||
/// ----------------------------------------------------------------------------------------------------
|
|
||||||
/// | Entry #1 | Entry #2 | ... | Entry #N | Offset #1 | Offset #2 | ... | Offset #N | num_of_elements |
|
|
||||||
/// ----------------------------------------------------------------------------------------------------
|
|
||||||
pub struct Block {
|
pub struct Block {
|
||||||
data: Vec<u8>,
|
pub(crate) data: Vec<u8>,
|
||||||
offsets: Vec<u16>,
|
pub(crate) offsets: Vec<u16>,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl Block {
|
impl Block {
|
||||||
@@ -35,6 +26,3 @@ impl Block {
|
|||||||
unimplemented!()
|
unimplemented!()
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
#[cfg(test)]
|
|
||||||
mod tests;
|
|
||||||
|
|||||||
@@ -11,8 +11,8 @@ pub struct BlockIterator {
|
|||||||
block: Arc<Block>,
|
block: Arc<Block>,
|
||||||
/// The current key, empty represents the iterator is invalid
|
/// The current key, empty represents the iterator is invalid
|
||||||
key: Vec<u8>,
|
key: Vec<u8>,
|
||||||
/// The corresponding value, can be empty
|
/// The (begin, end) byte range of the current value within the block data
|
||||||
value: Vec<u8>,
|
value_range: (usize, usize),
|
||||||
/// Current index of the key-value pair, should be in range of [0, num_of_elements)
|
/// Current index of the key-value pair, should be in range of [0, num_of_elements)
|
||||||
idx: usize,
|
idx: usize,
|
||||||
}
|
}
|
||||||
@@ -22,7 +22,7 @@ impl BlockIterator {
|
|||||||
Self {
|
Self {
|
||||||
block,
|
block,
|
||||||
key: Vec::new(),
|
key: Vec::new(),
|
||||||
value: Vec::new(),
|
value_range: (0, 0),
|
||||||
idx: 0,
|
idx: 0,
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -1 +0,0 @@
|
|||||||
//! Please copy `mini-lsm/src/block/tests.rs` here so that you can run tests.
|
|
||||||
@@ -159,6 +159,3 @@ impl SsTable {
|
|||||||
self.id
|
self.id
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
#[cfg(test)]
|
|
||||||
mod tests;
|
|
||||||
|
|||||||
@@ -1 +0,0 @@
|
|||||||
//! Please copy `mini-lsm/src/table/tests.rs` here so that you can run tests.
|
|
||||||
136
mini-lsm/src/tests/day3.rs
Normal file
@@ -0,0 +1,136 @@
|
|||||||
|
use std::sync::Arc;
|
||||||
|
|
||||||
|
use bytes::Bytes;
|
||||||
|
|
||||||
|
use crate::block::{Block, BlockBuilder, BlockIterator};
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_block_build_single_key() {
|
||||||
|
let mut builder = BlockBuilder::new(16);
|
||||||
|
assert!(builder.add(b"233", b"233333"));
|
||||||
|
builder.build();
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_block_build_full() {
|
||||||
|
let mut builder = BlockBuilder::new(16);
|
||||||
|
assert!(builder.add(b"11", b"11"));
|
||||||
|
assert!(!builder.add(b"22", b"22"));
|
||||||
|
builder.build();
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_block_build_large_1() {
|
||||||
|
let mut builder = BlockBuilder::new(16);
|
||||||
|
assert!(builder.add(b"11", &b"1".repeat(100)));
|
||||||
|
builder.build();
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_block_build_large_2() {
|
||||||
|
let mut builder = BlockBuilder::new(16);
|
||||||
|
assert!(builder.add(b"11", b"1"));
|
||||||
|
assert!(!builder.add(b"11", &b"1".repeat(100)));
|
||||||
|
}
|
||||||
|
|
||||||
|
fn key_of(idx: usize) -> Vec<u8> {
|
||||||
|
format!("key_{:03}", idx * 5).into_bytes()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn value_of(idx: usize) -> Vec<u8> {
|
||||||
|
format!("value_{:010}", idx).into_bytes()
|
||||||
|
}
|
||||||
|
|
||||||
|
fn num_of_keys() -> usize {
|
||||||
|
100
|
||||||
|
}
|
||||||
|
|
||||||
|
fn generate_block() -> Block {
|
||||||
|
let mut builder = BlockBuilder::new(10000);
|
||||||
|
for idx in 0..num_of_keys() {
|
||||||
|
let key = key_of(idx);
|
||||||
|
let value = value_of(idx);
|
||||||
|
assert!(builder.add(&key[..], &value[..]));
|
||||||
|
}
|
||||||
|
builder.build()
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_block_build_all() {
|
||||||
|
generate_block();
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_block_encode() {
|
||||||
|
let block = generate_block();
|
||||||
|
block.encode();
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_block_decode() {
|
||||||
|
let block = generate_block();
|
||||||
|
let encoded = block.encode();
|
||||||
|
let decoded_block = Block::decode(&encoded);
|
||||||
|
assert_eq!(block.offsets, decoded_block.offsets);
|
||||||
|
assert_eq!(block.data, decoded_block.data);
|
||||||
|
}
|
||||||
|
|
||||||
|
fn as_bytes(x: &[u8]) -> Bytes {
|
||||||
|
Bytes::copy_from_slice(x)
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_block_iterator() {
|
||||||
|
let block = Arc::new(generate_block());
|
||||||
|
let mut iter = BlockIterator::create_and_seek_to_first(block);
|
||||||
|
for _ in 0..5 {
|
||||||
|
for i in 0..num_of_keys() {
|
||||||
|
let key = iter.key();
|
||||||
|
let value = iter.value();
|
||||||
|
assert_eq!(
|
||||||
|
key,
|
||||||
|
key_of(i),
|
||||||
|
"expected key: {:?}, actual key: {:?}",
|
||||||
|
as_bytes(&key_of(i)),
|
||||||
|
as_bytes(key)
|
||||||
|
);
|
||||||
|
assert_eq!(
|
||||||
|
value,
|
||||||
|
value_of(i),
|
||||||
|
"expected value: {:?}, actual value: {:?}",
|
||||||
|
as_bytes(&value_of(i)),
|
||||||
|
as_bytes(value)
|
||||||
|
);
|
||||||
|
iter.next();
|
||||||
|
}
|
||||||
|
iter.seek_to_first();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_block_seek_key() {
|
||||||
|
let block = Arc::new(generate_block());
|
||||||
|
let mut iter = BlockIterator::create_and_seek_to_key(block, &key_of(0));
|
||||||
|
for offset in 1..=5 {
|
||||||
|
for i in 0..num_of_keys() {
|
||||||
|
let key = iter.key();
|
||||||
|
let value = iter.value();
|
||||||
|
assert_eq!(
|
||||||
|
key,
|
||||||
|
key_of(i),
|
||||||
|
"expected key: {:?}, actual key: {:?}",
|
||||||
|
as_bytes(&key_of(i)),
|
||||||
|
as_bytes(key)
|
||||||
|
);
|
||||||
|
assert_eq!(
|
||||||
|
value,
|
||||||
|
value_of(i),
|
||||||
|
"expected value: {:?}, actual value: {:?}",
|
||||||
|
as_bytes(&value_of(i)),
|
||||||
|
as_bytes(value)
|
||||||
|
);
|
||||||
|
iter.seek_to_key(&format!("key_{:03}", i * 5 + offset).into_bytes());
|
||||||
|
}
|
||||||
|
iter.seek_to_key(b"k");
|
||||||
|
}
|
||||||
|
}
|
||||||