In this part, you will need to modify:

* `src/lsm_iterator.rs`
* `src/lsm_storage.rs`
* `src/table.rs`
* Other parts that use `SsTable::read_block`

You can use `cargo x copy-test day4` to copy our provided test cases to the starter code directory. After you have
finished this part, use `cargo x scheck` to check the style and run all test cases. If you want to write your own
test cases, write a new module `#[cfg(test)] mod user_tests { /* your test cases */ }`.

## Task 1 - Put and Delete
Before implementing put and delete, let's revisit how an LSM tree works. The structure of an LSM tree includes:

* Mem-table: one active mutable mem-table and multiple immutable mem-tables.
* Write-ahead log: each mem-table corresponds to a WAL.
* SSTs: mem-tables can be flushed to disk in SST format, and SSTs are organized into multiple levels.

In this part, we only need to take the lock and write the entry (or a tombstone for deletes) into the active
mem-table. You can modify `lsm_storage.rs`.
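
As a concrete reference, here is a minimal sketch of `put` and `delete` under this scheme. It uses a `BTreeMap` as a
stand-in for the real mem-table and encodes deletes as empty values (the tombstone convention used in this tutorial);
the field layout is illustrative, not the exact starter-code structure.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

/// Illustrative stand-in for the real mem-table.
type MemTable = BTreeMap<Vec<u8>, Vec<u8>>;

struct LsmStorage {
    /// Guards the LSM state; reduced here to just the active mem-table.
    memtable: RwLock<MemTable>,
}

impl LsmStorage {
    pub fn put(&self, key: &[u8], value: &[u8]) {
        assert!(!value.is_empty(), "value cannot be empty");
        // Critical section: write the entry into the active mem-table.
        let mut guard = self.memtable.write().unwrap();
        guard.insert(key.to_vec(), value.to_vec());
    }

    pub fn delete(&self, key: &[u8]) {
        // A tombstone is simply an entry with an empty value.
        let mut guard = self.memtable.write().unwrap();
        guard.insert(key.to_vec(), Vec::new());
    }
}
```

Note that `delete` does not remove the key from the mem-table: the tombstone has to stick around so that it can
shadow older versions of the key living in lower levels.
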
## Task 2 - Get
To get a value from the LSM tree, we can simply probe the active mem-table, the immutable mem-tables (from latest to
earliest), and then all the SSTs. To shorten the critical section, we can hold the read lock just long enough to copy
the pointers to the mem-tables and SSTs out of the `LsmStorageInner` structure, and then create the iterators outside
of the critical section. Be careful about the order when creating iterators and probing.
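
Here is a sketch of the probing order, assuming hypothetical point-lookup `get` helpers on each structure (in the
actual code, the SST probe goes through an iterator seeked to the key):

```rust
use std::sync::Arc;

/// Illustrative stand-ins for the real types.
struct MemTable;
impl MemTable {
    fn get(&self, _key: &[u8]) -> Option<Vec<u8>> { None }
}
struct SsTable;
impl SsTable {
    fn get(&self, _key: &[u8]) -> Option<Vec<u8>> { None }
}

/// State copied out of `LsmStorageInner` while holding the read lock.
struct Snapshot {
    memtable: Arc<MemTable>,
    /// Newest immutable mem-table first.
    imm_memtables: Vec<Arc<MemTable>>,
    /// Newest SST first.
    l0_sstables: Vec<Arc<SsTable>>,
}

fn get(snapshot: &Snapshot, key: &[u8]) -> Option<Vec<u8>> {
    // Probe newest data first: active mem-table, then immutable
    // mem-tables, then the level-0 SSTs.
    let found = snapshot
        .memtable
        .get(key)
        .or_else(|| snapshot.imm_memtables.iter().find_map(|m| m.get(key)))
        .or_else(|| snapshot.l0_sstables.iter().find_map(|t| t.get(key)));
    // An empty value is a tombstone: the key was deleted.
    found.filter(|v| !v.is_empty())
}
```

Stopping at the first hit matters: a tombstone in a newer structure must hide older versions of the key, so we stop
probing as soon as any structure returns a value, and only then check whether it is a tombstone.
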
## Task 3 - Scan
To create a scan iterator `LsmIterator`, you will need to use `TwoMergeIterator` to merge a `MergeIterator` over the
mem-tables and a `MergeIterator` over the SSTs. You can implement this in `lsm_iterator.rs`. Optionally, you can
implement `FusedIterator`, so that if a user accidentally calls `next` after the iterator becomes invalid, the
underlying iterator won't panic.
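
If you implement `FusedIterator`, it can be a thin wrapper around any iterator. Below is a self-contained sketch
against a minimal version of the `StorageIterator` trait from the earlier chapters, using `anyhow` for errors as the
rest of the course does; one reasonable behavior is to also refuse further calls to `next` once the inner iterator
has errored.

```rust
/// Minimal version of the iterator trait used in earlier chapters.
trait StorageIterator {
    fn key(&self) -> &[u8];
    fn value(&self) -> &[u8];
    fn is_valid(&self) -> bool;
    fn next(&mut self) -> anyhow::Result<()>;
}

/// Wraps an iterator so that calling `next` on an invalid or errored
/// iterator is safe instead of a panic.
struct FusedIterator<I: StorageIterator> {
    iter: I,
    has_errored: bool,
}

impl<I: StorageIterator> StorageIterator for FusedIterator<I> {
    fn key(&self) -> &[u8] {
        self.iter.key()
    }
    fn value(&self) -> &[u8] {
        self.iter.value()
    }
    fn is_valid(&self) -> bool {
        !self.has_errored && self.iter.is_valid()
    }
    fn next(&mut self) -> anyhow::Result<()> {
        if self.has_errored {
            anyhow::bail!("the iterator has errored before");
        }
        if self.iter.is_valid() {
            if let Err(e) = self.iter.next() {
                self.has_errored = true;
                return Err(e);
            }
        }
        // Calling `next` on an already-invalid iterator is a no-op.
        Ok(())
    }
}
```
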
The sequence of key-value pairs produced by `TwoMergeIterator` may contain empty values, which means that the
corresponding keys are deleted. `LsmIterator` should filter out these empty values. It also needs to handle the start
and end bounds correctly.
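
For the tombstone filtering, one common shape is a skip loop that runs right after the iterator is constructed (and
seeked to the start bound) and again after every `next`. A sketch, reusing the `StorageIterator` trait from the
previous snippet and keeping `LsmIterator` generic over its inner iterator for illustration:

```rust
/// Sketch: `LsmIterator` wraps the merged inner iterator.
/// `StorageIterator` is the trait from the FusedIterator sketch above.
struct LsmIterator<I: StorageIterator> {
    inner: I,
}

impl<I: StorageIterator> LsmIterator<I> {
    /// Skip entries whose value is empty: those are tombstones for
    /// deleted keys and must not be surfaced to the user.
    fn move_to_non_delete(&mut self) -> anyhow::Result<()> {
        while self.inner.is_valid() && self.inner.value().is_empty() {
            self.inner.next()?;
        }
        Ok(())
    }
}
```
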
## Task 4 - Sync
In this part, we will implement flushing mem-tables to L0 SSTs in `lsm_storage.rs`. As in task 1, write operations go
directly into the active mutable mem-table. Once `sync` is called, we flush the mem-table to disk in three steps:

* First, move the current mutable mem-table to the immutable mem-table list, so that no future requests will go into
  the current mem-table, and create a new mem-table. All of this should happen in one single critical section and
  stall all reads.
* Then, we can flush the mem-table to disk as an SST file without holding any lock.
* Finally, in one critical section, remove the mem-table and put the SST into `l0_tables`.

Only one thread can sync at a time, so you should use a mutex to enforce this requirement.
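
Putting the three steps and the mutex together, `sync` might be shaped like the sketch below. All types are
simplified stand-ins and error handling is omitted; the point is which lock is held during which step.

```rust
use std::mem;
use std::sync::{Arc, Mutex, RwLock};

/// Illustrative stand-ins for the real mem-table and SST types.
#[derive(Default)]
struct MemTable;
struct SsTable;

/// Encode the mem-table into SST format and write the file to disk.
fn build_sst(_memtable: &MemTable) -> SsTable {
    SsTable
}

struct LsmState {
    memtable: Arc<MemTable>,
    /// Newest immutable mem-table first.
    imm_memtables: Vec<Arc<MemTable>>,
    /// Newest SST first.
    l0_tables: Vec<Arc<SsTable>>,
}

struct LsmStorage {
    state: RwLock<LsmState>,
    /// Serializes `sync` so only one flush runs at a time.
    flush_lock: Mutex<()>,
}

impl LsmStorage {
    pub fn sync(&self) {
        let _flush_guard = self.flush_lock.lock().unwrap();

        // Step 1 (critical section): freeze the active mem-table and
        // install a fresh one, so no new writes hit the frozen table.
        let flush_memtable;
        {
            let mut state = self.state.write().unwrap();
            flush_memtable = mem::take(&mut state.memtable);
            state.imm_memtables.insert(0, flush_memtable.clone());
        }

        // Step 2 (no lock held): flush the frozen mem-table to an SST file.
        let sst = Arc::new(build_sst(&flush_memtable));

        // Step 3 (critical section): swap the flushed mem-table for the SST.
        // In this sketch `sync` is the only place mem-tables are frozen, and
        // it is serialized by `flush_lock`, so the frozen table is at index 0.
        {
            let mut state = self.state.write().unwrap();
            state.imm_memtables.remove(0);
            state.l0_tables.insert(0, sst);
        }
    }
}
```
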
## Task 5 - Block Cache
Now that we have implemented the LSM structure, we can start writing something to the disk! Previously in `table.rs`,
we implemented a `FileObject` struct without writing anything to disk. In this task, we will change the implementation
so that:

* `read` will read from the disk without any caching, using `read_exact_at` from `std::os::unix::fs::FileExt`.
* The size of the file should be stored inside the struct, and the `size` function directly returns it.
* `create` should write the file to the disk. Generally, you should call `fsync` on that file, but this would slow
  down the unit tests a lot. Therefore, we don't do fsync until day 6 recovery.
* `open` remains unimplemented until day 6 recovery.
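
A sketch of such a `FileObject`, assuming it keeps the open file handle together with the size, and deliberately
skipping `fsync` as discussed above (error handling via `anyhow`):

```rust
use std::fs::File;
use std::os::unix::fs::FileExt;
use std::path::Path;

use anyhow::Result;

/// Sketch of a `FileObject` backed by a real file on disk.
struct FileObject {
    file: File,
    size: u64,
}

impl FileObject {
    /// Write `data` to `path` and keep the handle for later reads.
    /// We deliberately skip `fsync` here; durability is deferred to
    /// day 6 recovery.
    fn create(path: &Path, data: Vec<u8>) -> Result<Self> {
        std::fs::write(path, &data)?;
        let file = File::open(path)?;
        Ok(FileObject {
            file,
            size: data.len() as u64,
        })
    }

    /// Read `len` bytes starting at `offset`, straight from the disk.
    fn read(&self, offset: u64, len: u64) -> Result<Vec<u8>> {
        let mut buf = vec![0; len as usize];
        self.file.read_exact_at(&mut buf, offset)?;
        Ok(buf)
    }

    fn size(&self) -> u64 {
        self.size
    }
}
```
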
After that, we can implement a new `read_block_cached` function on `SsTable`, so that we can leverage the block cache
to serve read requests. Upon initializing the `LsmStorage` struct, you should create a block cache of 4GB using
`moka-rs`. Blocks are cached by SST id + block id. Use `try_get_with` to get the block from the cache, populating it
on a miss. If multiple requests read the same block and the cache misses, `try_get_with` will only issue a single
read request to the disk and broadcast the result to all requests.
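
A sketch of the cache type and `read_block_cached`, assuming blocks decode into `Arc<Block>` and are keyed by
`(sst_id, block_idx)`; the closure passed to `try_get_with` performs the uncached read:

```rust
use std::sync::Arc;

use anyhow::{anyhow, Result};

/// Illustrative stand-in for the decoded block type.
struct Block;

/// Blocks are keyed by (SST id, block index).
type BlockCache = moka::sync::Cache<(usize, usize), Arc<Block>>;

struct SsTable {
    id: usize,
    block_cache: Option<Arc<BlockCache>>,
}

impl SsTable {
    fn read_block(&self, _block_idx: usize) -> Result<Arc<Block>> {
        // Read and decode the block from disk, without any caching.
        Ok(Arc::new(Block))
    }

    fn read_block_cached(&self, block_idx: usize) -> Result<Arc<Block>> {
        if let Some(ref cache) = self.block_cache {
            // On a miss, `try_get_with` runs the closure once and
            // broadcasts the result to all concurrent readers of the
            // same block.
            cache
                .try_get_with((self.id, block_idx), || self.read_block(block_idx))
                .map_err(|e| anyhow!("{}", e))
        } else {
            self.read_block(block_idx)
        }
    }
}
```

Note that `moka`'s `Cache::new(max_capacity)` counts entries by default, so to get roughly 4GB you can either pick an
entry count that matches your block size (for example, `1 << 20` entries of 4KB blocks) or configure a byte-based
`weigher` on the cache builder.
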
Remember to change `SsTableIterator` to use the block cache.
## Extra Tasks
* As you might have seen, each time we do a put or a deletion, we need to take a write lock protecting the LSM
  structure. This can cause a lot of contention. Some lock implementations are fair, which means that as long as
  there is a writer waiting on the lock, no reader can take the lock. Therefore, the writer will wait until the
  slowest reader finishes its operation before it can actually do any work. One possible optimization is to
  implement `WriteBatch`: we don't need to immediately write users' requests into the mem-table + WAL, and we can
  allow users to do a batch of writes (see the sketch after this list).
* Align blocks to 4K and use direct I/O.
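
As for `WriteBatch`, the tutorial leaves the design open; here is a purely illustrative sketch of what the API
surface could look like:

```rust
/// A single operation in a write batch. Illustrative only.
enum BatchRecord {
    Put { key: Vec<u8>, value: Vec<u8> },
    Del { key: Vec<u8> },
}

/// Collects writes so that the write lock is taken once per batch
/// instead of once per operation.
#[derive(Default)]
struct WriteBatch {
    records: Vec<BatchRecord>,
}

impl WriteBatch {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.records.push(BatchRecord::Put {
            key: key.to_vec(),
            value: value.to_vec(),
        });
    }

    fn delete(&mut self, key: &[u8]) {
        self.records.push(BatchRecord::Del { key: key.to_vec() });
    }
}
```
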