docs: improve day 2 documentation & add guidance for task 1 (#19)

* feat(docs): improve day 2 documentation & add guidance for task 1

* Apply suggestions from code review

---------

Co-authored-by: Alex Chi Z <iskyzh@gmail.com>
Xu
2023-07-11 12:08:03 +08:00
committed by GitHub
parent a5ac71c99f
commit 26b8e6c7d8


@@ -24,12 +24,17 @@ The SST builder is similar to block builder -- users will call `add` on the buil
 inside SST builder and split block when necessary. Also, you will need to maintain block metadata `BlockMeta`, which
 includes the first key in each block and the offset of each block. The `build` function will encode the SST, write
 everything to disk using `FileObject::create`, and return an `SsTable` object. Note that in part 2, you don't need to
-actually write the data to the disk. Just store everything in memory as a vector until we implement a block cache.
+actually write the data to the disk.
+Just store everything in memory as a vector until we implement a block cache (Day 4, Task 5).
 The encoding of SST is like:
 ```
-| data block | data block | data block | data block | meta block | meta block offset (u32) |
+-------------------------------------------------------------------------------------------
+|         Block Section         |         Meta Section          |          Extra          |
+-------------------------------------------------------------------------------------------
+| data block | ... | data block | meta block | ... | meta block | meta block offset (u32) |
+-------------------------------------------------------------------------------------------
 ```
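The layout in the diagram above can be sketched in Rust roughly as follows. This is a minimal sketch, not the course's reference solution: `encode_sst` and `split_sst` are hypothetical helper names, the data blocks and the meta section are passed in as already-encoded bytes, and big-endian byte order for the trailing `u32` is an assumption.

```rust
/// Hypothetical helper: concatenates the sections in the order shown in the
/// layout diagram (block section, meta section, meta block offset).
pub fn encode_sst(data_blocks: &[Vec<u8>], encoded_meta: &[u8]) -> Vec<u8> {
    let mut buf = Vec::new();
    // Block section: data blocks back to back.
    for block in data_blocks {
        buf.extend_from_slice(block);
    }
    // The meta section starts right after the last data block.
    let meta_offset = buf.len() as u32;
    buf.extend_from_slice(encoded_meta);
    // Extra: where the meta section starts (big-endian is an assumption).
    buf.extend_from_slice(&meta_offset.to_be_bytes());
    buf
}

/// Hypothetical helper: splits an encoded SST into
/// `(block section, meta section)`, or returns `None` if it is malformed.
pub fn split_sst(sst: &[u8]) -> Option<(&[u8], &[u8])> {
    // The last 4 bytes hold the meta block offset.
    let tail = sst.len().checked_sub(4)?;
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&sst[tail..]);
    let meta_offset = u32::from_be_bytes(raw) as usize;
    if meta_offset > tail {
        return None;
    }
    Some((&sst[..meta_offset], &sst[meta_offset..tail]))
}
```

Reading the offset from the end of the buffer is what lets a reader find the meta section without scanning the data blocks first.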
 You also need to implement the `estimated_size` function of `SsTableBuilder`, so that the caller can know when it can start
@@ -39,6 +44,17 @@ more data than meta block, we can simply return the size of data blocks for `est
 You can also align blocks to a 4KB boundary so as to make it possible to do direct I/O in the future. This is an optional
 optimization.
+The recommended sequence to finish **Task 1** is as below:
+- Implement `SsTableBuilder` in `src/table/builder.rs`
+  - Before implementing `SsTableBuilder`, you may want to take a look at `FileObject` and `BlockMeta` in `src/table.rs`.
+  - For `FileObject`, you should at least implement `read`, `size` and `create` (no need for disk I/O) before day 4.
+  - For `BlockMeta`, you may want to add some extra fields when encoding / decoding the `BlockMeta` to / from a buffer.
+- Implement `SsTable` in `src/table.rs`
+  - Same as above, you do not need to worry about `BlockCache` until day 4.
+After finishing **Task 1**, you should be able to pass all the current tests except two iterator tests.
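For the `BlockMeta` step in the guidance above, one possible encoding is sketched below. Everything beyond "first key plus block offset" is an assumption made for illustration: the field names, the `encode_all` / `decode_all` function names, the big-endian `u32` offset, and the `u16` key length. The key length is an example of an extra field that only exists in the encoded form.

```rust
/// Hypothetical stand-in for the course's `BlockMeta`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BlockMeta {
    /// Offset of the data block inside the SST.
    pub offset: usize,
    /// First key stored in the data block.
    pub first_key: Vec<u8>,
}

impl BlockMeta {
    /// Appends every entry to `buf` as
    /// `offset (u32) | key_len (u16) | key bytes` (assumed layout).
    pub fn encode_all(metas: &[BlockMeta], buf: &mut Vec<u8>) {
        for meta in metas {
            buf.extend_from_slice(&(meta.offset as u32).to_be_bytes());
            buf.extend_from_slice(&(meta.first_key.len() as u16).to_be_bytes());
            buf.extend_from_slice(&meta.first_key);
        }
    }

    /// Decodes entries until `buf` is exhausted; returns `None` if the
    /// buffer ends in the middle of an entry.
    pub fn decode_all(mut buf: &[u8]) -> Option<Vec<BlockMeta>> {
        let mut metas = Vec::new();
        while !buf.is_empty() {
            if buf.len() < 6 {
                return None;
            }
            let offset = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
            let key_len = u16::from_be_bytes([buf[4], buf[5]]) as usize;
            let key_end = 6 + key_len;
            if buf.len() < key_end {
                return None;
            }
            metas.push(BlockMeta {
                offset,
                first_key: buf[6..key_end].to_vec(),
            });
            // Advance past the entry that was just decoded.
            buf = &buf[key_end..];
        }
        Some(metas)
    }
}
```

Storing the key length explicitly is what allows variable-length keys to be decoded back without a separator byte.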
 ## Task 2 - SST Iterator
 Like `BlockIterator`, you will need to implement an iterator over an SST. Note that you should load data on demand. For
@@ -53,15 +69,18 @@ which block might possibly contain the key. It is possible that the key doesn't
 block iterator will be invalid immediately after a seek. For example,
 ```
-| block 1 | block 2 | block meta |
+----------------------------------
+| block 1 | block 2 | block meta |
+----------------------------------
 | a, b, c | e, f, g | 1: a, 2: e |
+----------------------------------
 ```
 If we do `seek(b)` in this SST, it is quite simple -- using binary search, we can know block 1 contains keys `a <= keys
 < e`. Therefore, we load block 1 and seek the block iterator to the corresponding position.
 But if we do `seek(d)`, we will position to block 1, but seeking `d` in block 1 will reach the end of the block.
-Therefore, we should check if the iterator is invalid after seek, and switch to the next block if necessary.
+Therefore, we should check if the iterator is invalid after the seek, and switch to the next block if necessary.
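The `seek(b)` and `seek(d)` cases above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `SketchSst` is a hypothetical type in which each block is a sorted, non-empty list of keys already loaded in memory, and `first_keys` stands in for the block meta. A real SST iterator would load and decode blocks on demand instead.

```rust
/// Hypothetical in-memory stand-in for an SST. `first_keys[i]` plays the
/// role of the block meta entry for `blocks[i]`; blocks are assumed to be
/// sorted and non-empty.
pub struct SketchSst {
    pub blocks: Vec<Vec<&'static str>>,
    pub first_keys: Vec<&'static str>,
}

impl SketchSst {
    /// Returns `(block index, index inside the block)` of the first key
    /// that is `>= target`, or `None` if every key is smaller.
    pub fn seek(&self, target: &str) -> Option<(usize, usize)> {
        // Binary search on the block meta: count the blocks whose first key
        // is <= target, then step back to the last such block.
        let n = self.first_keys.partition_point(|first| *first <= target);
        let mut block_idx = n.saturating_sub(1);
        // Seek inside that block (a real iterator would load the block here).
        let mut pos = self
            .blocks
            .get(block_idx)?
            .partition_point(|key| *key < target);
        // The block iterator is invalid after the seek: switch to the start
        // of the next block.
        if pos == self.blocks[block_idx].len() {
            block_idx += 1;
            pos = 0;
        }
        if block_idx < self.blocks.len() {
            Some((block_idx, pos))
        } else {
            None
        }
    }
}
```

With blocks `[a, b, c]` and `[e, f, g]`, `seek("b")` stays in the first block, while `seek("d")` runs off the end of the first block and lands on `e` at the start of the second.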
 ## Extra Tasks