finish 3.2 3.3

Signed-off-by: Alex Chi Z <iskyzh@gmail.com>
2024-01-29 22:44:49 +08:00
parent 3cecf09d59
commit 1e49ba8a07
5 changed files with 184 additions and 17 deletions
--- a/mini-lsm-book/src/week3-03-snapshot-read-part-2.md
+++ b/mini-lsm-book/src/week3-03-snapshot-read-part-2.md
@@ -8,21 +8,92 @@ In this chapter, you will:

 At the end of the day, your engine will be able to give the user a consistent view of the storage key space.

-## Task 1: Lsm Iterator with Read Timestamp
+During the refactor, you might need to change the signature of some functions from `&self` to `self: &Arc<Self>`.
+
+## Task 1: LSM Iterator with Read Timestamp
+
+The goal of this chapter is to have something like:
+
+```rust,no_run
+let snapshot1 = engine.new_txn();
+// write something to the engine
+let snapshot2 = engine.new_txn();
+// write something to the engine
+snapshot1.get(/* ... */); // we can retrieve a consistent snapshot of a previous state of the engine
+```
+
+To achieve this, we can record the read timestamp (which is the latest committed timestamp) when creating the transaction. When we do a read operation over the transaction, we will only read versions of keys whose timestamps are less than or equal to the read timestamp.
+
+In this task, you will need to modify:
+
+```
+src/lsm_iterator.rs
+```
+
+To do this, you will need to record a read timestamp in `LsmIterator`.
+
+```rust,no_run
+impl LsmIterator {
+    pub(crate) fn new(
+        iter: LsmIteratorInner,
+        end_bound: Bound<Bytes>,
+        read_ts: u64,
+    ) -> Result<Self> {
+        // ...
+    }
+}
+```
+
+You will also need to change the `next` logic of your LSM iterator so that it skips versions newer than `read_ts` and returns the correct version of each key.

 ## Task 2: Multi-Version Scan and Get

-For now, inner = `Fused<LsmIterator>`, do not use `TxnLocalIterator`
+In this task, you will need to modify:

-explain why store txn inside iterator
+```
+src/mvcc.rs
+src/mvcc/txn.rs
+src/lsm_storage.rs
+```

-do not implement put and delete
+Now that we have `read_ts` in the LSM iterator, we can implement `scan` and `get` on the transaction structure, so that we can read data at a given point in the storage engine.
+
+We recommend creating helper functions such as `scan_with_ts(/* original parameters */, read_ts: u64)` and `get_with_ts` in your `LsmStorageInner` structure if necessary. The original get/scan on the storage engine should be implemented by creating a transaction (snapshot) and doing a get/scan over that transaction. The call path would look like:
+
+```
+LsmStorageInner::scan -> new_txn and Transaction::scan -> LsmStorageInner::scan_with_ts
+```
+
+To create a transaction in `LsmStorageInner::scan`, we will need to provide an `Arc<LsmStorageInner>` to the transaction constructor. Therefore, we can change the signature of `scan` to take `self: &Arc<Self>` instead of simply `&self`, so that we can create a transaction with `let txn = self.mvcc().new_txn(self.clone(), /* ... */)`.
+
+You will also need to change your `scan` function to return a `TxnIterator`. The snapshot must stay alive for as long as the user iterates over the engine, and therefore `TxnIterator` stores the snapshot object. Inside `TxnIterator`, we can store a `FusedIterator<LsmIterator>` for now. We will change it to something else later when we implement OCC.
+
+You do not need to implement `Transaction::put/delete` for now, and all modifications will still go through the engine.

 ## Task 3: Store Largest Timestamp in SST

+In this task, you will need to modify:
+
+```
+src/table.rs
+src/table/builder.rs
+```
+
+In your SST encoding, you should store the largest timestamp after the block metadata, and recover it when loading the SST. This helps the system determine the latest commit timestamp during recovery.
+
 ## Task 4: Recover Commit Timestamp

-We do not have test cases for this section. You should pass all persistence tests from previous chapters (2.5 and 2.6) after finishing this section.
+Now that we have the largest timestamp stored in the SSTs and timestamp information in the WAL, we can obtain the largest timestamp committed before the engine starts, and use that timestamp as the latest committed timestamp when creating the `mvcc` object.
+
+If the WAL is not enabled, you can simply compute the latest committed timestamp by finding the largest timestamp among the SSTs. If the WAL is enabled, you should additionally iterate over all recovered memtables and take the largest timestamp found there into account.
+
+In this task, you will need to modify:
+
+```
+src/lsm_storage.rs
+```
+
+We do not have test cases for this section. You should pass all persistence tests from previous chapters (including 2.5 and 2.6) after finishing this section.

 ## Test Your Understanding

@@ -31,5 +102,4 @@ We do not have test cases for this section. You should pass all persistence test

 We do not provide reference answers to the questions, and feel free to discuss about them in the Discord community.

-
 {{#include copyright.md}}
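
A note on Task 1 of the chapter text in this diff: the rule of reading only versions at or below the read timestamp can be sketched in isolation. The following is a minimal illustration, not the engine's implementation. The `Versions` map and this standalone `get_with_ts` function are hypothetical stand-ins for the memtables and SSTs, and the sketch assumes that an empty value marks a deleted key and that versions are sorted by ascending timestamp within a key.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the versioned key space:
/// (user key, commit timestamp) -> value.
/// Assumption for this sketch: an empty value marks a deleted key.
type Versions = BTreeMap<(Vec<u8>, u64), Vec<u8>>;

/// Returns the value of `key` as seen by a snapshot taken at `read_ts`,
/// i.e. the version with the largest timestamp that is <= `read_ts`.
fn get_with_ts(versions: &Versions, key: &[u8], read_ts: u64) -> Option<Vec<u8>> {
    // All versions of `key` with a timestamp in [0, read_ts].
    let lower = (key.to_vec(), 0u64);
    let upper = (key.to_vec(), read_ts);
    versions
        .range(lower..=upper)
        // The last entry in the range is the newest visible version.
        .next_back()
        .map(|(_, value)| value.clone())
        // A visible tombstone means the key does not exist in this snapshot.
        .filter(|value| !value.is_empty())
}
```

With versions of key `a` written at timestamps 1 and 3, a snapshot with `read_ts = 2` sees the version from timestamp 1, and later writes do not affect it. The same filtering is what the `next` logic of an iterator carrying `read_ts` would apply to each key it visits.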