docs: update the introduction of StorageIterator (#152)

2025-06-05 15:56:56 +08:00
parent fc4765b925
commit 067fd2e682
2 changed files with 24 additions and 4 deletions
--- a/mini-lsm-book/src/week1-02-merge-iterator.md
+++ b/mini-lsm-book/src/week1-02-merge-iterator.md
@@ -29,9 +29,30 @@ In this task, you will need to modify:
 src/mem_table.rs
 ```

-All LSM iterators implement the `StorageIterator` trait. It has 4 functions: `key`, `value`, `next`, and `is_valid`. When the iterator is created, its cursor will stop on some element, and `key` / `value` will return the first key in the memtable/block/SST satisfying the start condition (i.e., start key). These two interfaces will return a `&[u8]` to avoid copy. Note that this iterator interface is different from the Rust-style iterator.
+All LSM iterators implement the `StorageIterator` trait. It has 4 functions: `key`, `value`, `next`, and `is_valid`. If you're familiar with the `Iterator` trait in Rust's standard library, you will find `StorageIterator` a bit different: it exposes a cursor-based API, a design pattern common in database systems and notably inspired by RocksDB's iterators (see [`iterator_base.h`](https://github.com/facebook/rocksdb/blob/main/include/rocksdb/iterator_base.h) and [`iterator.h`](https://github.com/facebook/rocksdb/blob/main/include/rocksdb/iterator.h) for reference).

-`next` moves the cursor to the next place. `is_valid` returns if the iterator has reached the end or errored. You can assume `next` will only be called when `is_valid` returns true. There will be a `FusedIterator` wrapper for iterators that block calls to `next` when the iterator is not valid to avoid users from misusing the iterators.
+When the iterator is created, its cursor stops on the first element in the memtable/block/SST that satisfies the start condition (i.e., the start key), and `key` / `value` return that element's key and value. Both return a `&[u8]` to avoid copying.
+
+From the caller's perspective, the typical usage pattern is:
+
+```rust
+let mut iter = ...; // placeholder: any value whose type implements StorageIterator
+while iter.is_valid() {
+    let key = iter.key();
+    let value = iter.value();
+    // Process key and value
+    iter.next()?; // Advance to the next item, handling potential errors
+}
+```
+
+The two core methods of `StorageIterator` have distinct responsibilities:
+
+* `next()`: This method is solely responsible for attempting to move the cursor to the next element. It returns a `Result` to report any errors encountered during this advancement (e.g., I/O issues). It does *not* inherently guarantee that the new position is valid, only that the attempt to move was made.
+* `is_valid()`: This method indicates whether the iterator's current cursor points to a valid data element. It does *not* advance the iterator.
+
+Therefore, as an implementer of `StorageIterator`, after each call to `next()` (even if it succeeds without an error from the `next()` operation itself), you are responsible for updating the internal state so that `is_valid()` correctly reflects whether the new cursor position actually points to a valid item.
+
+In summary, `next` moves the cursor to the next position, and `is_valid` returns whether the cursor still points to a valid element, that is, whether the iterator has neither reached the end nor errored. You can assume `next` will only be called when `is_valid` returns true. There will be a `FusedIterator` wrapper that blocks calls to `next` when the iterator is not valid, to prevent users from misusing the iterators.

 Back to the memtable iterator. You should have found out that the iterator does not have any lifetime associated with it. If you create a `Vec<u64>` and call `vec.iter()`, the iterator type will be something like `VecIterator<'a>`, where `'a` is the lifetime of the `vec` object. The same applies to `SkipMap`, where its `iter` API returns an iterator with a lifetime. However, in our case, we do not want to have such lifetimes on our iterators to avoid making the system overcomplicated (and hard to compile...).

--- a/mini-lsm-book/src/week1-04-sst.md
+++ b/mini-lsm-book/src/week1-04-sst.md
@@ -11,7 +11,6 @@ In this chapter, you will:
 * Implement SST encoding and metadata encoding.
 * Implement SST decoding and iterator.
  
-
 To copy the test cases into the starter code and run them,

 ```
@@ -84,7 +83,7 @@ src/table.rs

 You can implement a new `read_block_cached` function on `SsTable` .

-We use `moka-rs` as our block cache implementation. Blocks are cached by `(sst_id, block_id)` as the cache key. You may use `try_get_with` to get the block from cache if it hits the cache / populate the cache if it misses the cache. If there are multiple requests reading the same block and cache misses, `try_get_with` will only issue a single read request to the disk and broadcast the result to all requests.
+We use [`moka-rs`](https://docs.rs/moka/latest/moka/) as our block cache implementation. Blocks are cached by `(sst_id, block_id)` as the cache key. You may use `try_get_with` to get the block from the cache on a hit or to populate the cache on a miss. If multiple requests read the same block and all of them miss the cache, `try_get_with` will only issue a single read request to the disk and broadcast the result to all requests.

 At this point, you may change your table iterator to use `read_block_cached` instead of `read_block` to leverage the block cache.
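
To illustrate the `StorageIterator` semantics introduced in `week1-02-merge-iterator.md` above, here is a minimal sketch of the cursor-based contract from the implementer's side. It is not the project's code: the trait below is a simplified stand-in that only mirrors the four functions named in the text, the `String` error type is an assumption made to keep the example self-contained, and `VecIterator` and `collect_all` are hypothetical names. Returning an error when `next` is called on an invalid iterator is a choice made for this sketch only.

```rust
/// Simplified stand-in for the `StorageIterator` trait described above.
/// Assumption: errors are plain `String`s here; the real trait may differ.
pub trait StorageIterator {
    fn key(&self) -> &[u8];
    fn value(&self) -> &[u8];
    fn is_valid(&self) -> bool;
    fn next(&mut self) -> Result<(), String>;
}

/// Hypothetical iterator over an owned, key-sorted list of pairs.
/// It owns its data, so it carries no lifetime parameter.
pub struct VecIterator {
    entries: Vec<(Vec<u8>, Vec<u8>)>,
    pos: usize,
}

impl VecIterator {
    /// Places the cursor on the first entry whose key is >= `start`.
    /// `entries` must already be sorted by key.
    pub fn new(entries: Vec<(Vec<u8>, Vec<u8>)>, start: &[u8]) -> Self {
        let pos = entries.partition_point(|(k, _)| k.as_slice() < start);
        Self { entries, pos }
    }
}

impl StorageIterator for VecIterator {
    // `key` and `value` borrow from the iterator instead of copying.
    // Callers must check `is_valid` first; indexing panics otherwise.
    fn key(&self) -> &[u8] {
        &self.entries[self.pos].0
    }

    fn value(&self) -> &[u8] {
        &self.entries[self.pos].1
    }

    // Reports the cursor state without moving the cursor.
    fn is_valid(&self) -> bool {
        self.pos < self.entries.len()
    }

    // Moves the cursor; `is_valid` reflects the new position afterwards
    // because it is derived from `pos`.
    fn next(&mut self) -> Result<(), String> {
        if !self.is_valid() {
            // Sketch-only choice: reject advancing past the end.
            return Err("next called on an invalid iterator".to_string());
        }
        self.pos += 1;
        Ok(())
    }
}

/// Drains an iterator with the caller-side loop shown in the diff.
pub fn collect_all(
    iter: &mut impl StorageIterator,
) -> Result<Vec<(Vec<u8>, Vec<u8>)>, String> {
    let mut out = Vec::new();
    while iter.is_valid() {
        out.push((iter.key().to_vec(), iter.value().to_vec()));
        iter.next()?;
    }
    Ok(out)
}
```

Because `is_valid` is computed from the cursor position, there is no separate flag to keep in sync after `next`. An iterator that reads from disk would more likely need to update explicit state when it advances.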