explain copy-on-write strategy (#148)
* explain copy-on-write strategy
* fix a typo
@@ -33,7 +33,7 @@ crossbeam-skiplist provides similar interfaces to the Rust std's `BTreeMap`: ins
You will also notice that the `MemTable` structure does not have a `delete` interface. In the mini-lsm implementation, deletion is represented as a key corresponding to an empty value.
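
For example, at the storage-engine level a delete can simply be a `put` of the empty value. A minimal sketch under that convention (the exact signatures are assumptions, not the course skeleton):

```rust
use anyhow::Result;

impl LsmStorageInner {
    /// Sketch: a delete is a put of the empty value; a read that sees an empty
    /// value treats the key as deleted.
    pub fn delete(&self, key: &[u8]) -> Result<()> {
        self.put(key, b"")
    }
}
```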
In this task, you will need to implement `MemTable::get` and `MemTable::put` to enable modifications of the memtable. Note that `put` should always overwrite a key if it already exists. You won't have mutiple entries of the same key in a single memtable.

In this task, you will need to implement `MemTable::get` and `MemTable::put` to enable modifications of the memtable. Note that `put` should always overwrite a key if it already exists. You won't have multiple entries of the same key in a single memtable.
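
A possible shape for these two functions, assuming the memtable wraps a `crossbeam_skiplist::SkipMap<Bytes, Bytes>` in a field named `map` (the field name and error type are assumptions):

```rust
use anyhow::Result;
use bytes::Bytes;
use crossbeam_skiplist::SkipMap;

pub struct MemTable {
    map: SkipMap<Bytes, Bytes>, // assumed field; the skeleton may wrap it in an Arc
}

impl MemTable {
    pub fn get(&self, key: &[u8]) -> Option<Bytes> {
        // Cloning `Bytes` only bumps a reference count; the payload is not copied.
        self.map.get(key).map(|entry| entry.value().clone())
    }

    pub fn put(&self, key: &[u8], value: &[u8]) -> Result<()> {
        // `SkipMap::insert` replaces any existing entry with the same key, which
        // gives the "always overwrite" behavior described above.
        self.map
            .insert(Bytes::copy_from_slice(key), Bytes::copy_from_slice(value));
        Ok(())
    }
}
```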
We use the `bytes` crate for storing the data in the memtable. `bytes::Bytes` is similar to `Arc<[u8]>`. When you clone a `Bytes` or take a slice of it, the underlying data is not copied, so cloning is cheap. Instead, it simply creates a new reference to the storage area, and the storage area is freed when there are no references left to it.
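
A standalone illustration of that cheap-clone behavior:

```rust
use bytes::Bytes;

fn main() {
    let data = Bytes::from(vec![1u8, 2, 3, 4, 5]);
    // `clone` and `slice` share the same allocation; only reference counts change.
    let whole = data.clone();
    let part = data.slice(1..4);
    assert_eq!(&part[..], &[2, 3, 4]);
    // The backing buffer lives until `data`, `whole`, and `part` are all dropped.
    drop(data);
    assert_eq!(&whole[..], &[1, 2, 3, 4, 5]);
}
```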
@@ -72,6 +72,23 @@ A memtable cannot continuously grow in size, and we will need to freeze them (an
In this task, you will need to compute the approximate memtable size when a key is put into or deleted from the memtable. This can be computed by simply adding the total number of bytes of keys and values whenever `put` is called. If a key is put twice, the skiplist only keeps the latest value, but you may still count it twice in the approximate memtable size. Once a memtable reaches the limit, you should call `force_freeze_memtable` to freeze the memtable and create a new one.
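
One way to track this, assuming an `approximate_size: AtomicUsize` counter on the `MemTable` (the field name is an assumption; a relaxed atomic is enough because the value only needs to be approximate):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

use anyhow::Result;
use bytes::Bytes;
use crossbeam_skiplist::SkipMap;

pub struct MemTable {
    map: SkipMap<Bytes, Bytes>,
    approximate_size: AtomicUsize, // assumed field for size accounting
}

impl MemTable {
    pub fn put(&self, key: &[u8], value: &[u8]) -> Result<()> {
        // Count every put, even one that overwrites an existing key; the
        // estimate may overshoot, which is fine for a freeze heuristic.
        self.approximate_size
            .fetch_add(key.len() + value.len(), Ordering::Relaxed);
        self.map
            .insert(Bytes::copy_from_slice(key), Bytes::copy_from_slice(value));
        Ok(())
    }

    pub fn approximate_size(&self) -> usize {
        self.approximate_size.load(Ordering::Relaxed)
    }
}
```

The engine-level `put` can then compare `approximate_size()` against the configured memtable size limit and trigger the freeze when it is exceeded.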
The `state: Arc<RwLock<Arc<LsmStorageState>>>` field in `LsmStorageInner` is structured this way to manage the LSM tree's overall state concurrently and safely, primarily using a Copy-on-Write (CoW) strategy:
1. Inner `Arc<LsmStorageState>`: This holds an **immutable snapshot** of the actual `LsmStorageState` (which contains memtable lists, SST references, etc.). Cloning this `Arc` is very cheap (just an atomic reference count increment) and gives any reader a consistent, unchanging view of the state for the duration of their operation.
2. `RwLock<Arc<LsmStorageState>>`: This read-write lock protects the *pointer* to the current `Arc<LsmStorageState>` (the active snapshot).
   * **Readers** acquire a read lock, clone the `Arc<LsmStorageState>` (getting their own reference to the current snapshot), and then quickly release the read lock. They can then work with their snapshot without further locking.
   * **Writers** (when modifying the state, e.g., freezing a memtable) will:
      * Create a *new* `LsmStorageState` instance, often by cloning the data from the current snapshot and then applying modifications.
      * Wrap this new state in a new `Arc<LsmStorageState>`.
      * Acquire the write lock on the `RwLock`.
      * Replace the old `Arc<LsmStorageState>` with the new one.
      * Release the write lock.
3. Outer `Arc<RwLock<...>>`: This allows the `RwLock` itself (and thus the mechanism for accessing and updating the state) to be shared safely across multiple threads or parts of your application that might need to interact with `LsmStorageInner`.
This CoW approach ensures that readers always see a valid, consistent state snapshot and experience minimal blocking. Writers update the state atomically by swapping out the entire state snapshot, reducing the time critical locks are held and thus improving concurrency.
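
A condensed sketch of that pattern, assuming `parking_lot::RwLock` (the `LsmStorageState` contents and the method names below are placeholders, not the exact course skeleton):

```rust
use std::sync::Arc;

use parking_lot::RwLock;

#[derive(Clone)]
pub struct LsmStorageState {
    // current memtable, frozen memtables, SST lists, ... (omitted here)
}

pub struct LsmStorageInner {
    state: Arc<RwLock<Arc<LsmStorageState>>>,
}

impl LsmStorageInner {
    /// Reader path: clone the current snapshot and drop the read lock right away.
    fn read_snapshot(&self) -> Arc<LsmStorageState> {
        let guard = self.state.read();
        Arc::clone(&guard)
    }

    /// Writer path: build a new state from the old snapshot, then swap the
    /// pointer under the write lock. Concurrent writers still need extra
    /// coordination (see below), or two of them could both clone the same old
    /// snapshot and one update would be lost.
    fn swap_state(&self, modify: impl FnOnce(&mut LsmStorageState)) {
        let mut new_state = self.read_snapshot().as_ref().clone(); // CoW copy
        modify(&mut new_state); // possibly slow work, done outside the write lock
        let mut guard = self.state.write();
        *guard = Arc::new(new_state); // short critical section: just a pointer swap
    }
}
```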
Because there could be multiple threads putting data into the storage engine, `force_freeze_memtable` might be called concurrently from multiple threads. You will need to think about how to avoid race conditions in this case.
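
One common way to handle this, building on the sketch above (the `state_lock` field, the `try_freeze` helper, and the size accessor are hypothetical names, and the real `force_freeze_memtable` may take the lock guard as a parameter): serialize state-changing decisions behind a dedicated mutex and re-check the freeze condition after acquiring it, so that two threads that both observed a full memtable do not freeze it twice.

```rust
use anyhow::Result;

// Assumed extra field on `LsmStorageInner`:
//     state_lock: parking_lot::Mutex<()>, // serializes freeze/flush/compaction decisions
impl LsmStorageInner {
    fn try_freeze(&self, memtable_size_limit: usize) -> Result<()> {
        if self.read_snapshot().memtable_approximate_size() >= memtable_size_limit {
            let _state_lock = self.state_lock.lock();
            // Re-check: another thread may have frozen the memtable while we
            // were waiting for the lock.
            if self.read_snapshot().memtable_approximate_size() >= memtable_size_limit {
                self.force_freeze_memtable()?; // still holding the state lock
            }
        }
        Ok(())
    }
}
```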
There are multiple places where you may want to modify the LSM state: freezing a mutable memtable, flushing a memtable to an SST, and GC/compaction. All of these modifications may involve I/O operations. An intuitive way to structure the locking strategy is to: