@@ -18,9 +18,9 @@ src/compact.rs
Specifically, the `force_full_compaction` and `compact` functions. `force_full_compaction` is the compaction trigger that decides which files to compact and updates the LSM state. `compact` does the actual compaction job: it merges some SST files and returns a set of new SST files.
Your compaction implementation should take all SSTs in the storage engine, do a merge over them by using `MergeIterator`, and then use the SST builder to write the result into new files. You will need to split the SST files if a file gets too large. After compaction completes, you can update the LSM state to add the new sorted run to the first level of the LSM tree, and you will need to remove unused files from the LSM tree. In your implementation, SSTs should only be stored in two places: the L0 SSTs and the L1 SSTs. That is to say, the `levels` structure in the LSM state should only have one vector. In `LsmStorageState`, we have already initialized the LSM to have L1 in the `levels` field.
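To make the shape of this step concrete, below is a self-contained sketch that models every sorted run as an in-memory `Vec` of key-value pairs instead of the real `SsTable`, `MergeIterator`, and `SsTableBuilder` types. All names (`Run`, `full_compaction`, `target_size`) are illustrative; your implementation should stream entries from a `MergeIterator` and feed them into the SST builder rather than materializing vectors.

```rust
// Simplified model: each run is sorted by key, and runs are ordered
// newest-to-oldest (run 0 is the newest). Keys and values are raw bytes.
type Run = Vec<(Vec<u8>, Vec<u8>)>;

/// Merge all runs into new runs of bounded size, keeping only the newest
/// version of each key. (Dropping delete markers is discussed below.)
fn full_compaction(runs: Vec<Run>, target_size: usize) -> Vec<Run> {
    // Tag every entry with the index of the run it came from, then sort by
    // (key, run index) so the newest version of each key comes first; this is
    // the ordering a MergeIterator over the SSTs would give you.
    let mut entries: Vec<(usize, Vec<u8>, Vec<u8>)> = Vec::new();
    for (idx, run) in runs.into_iter().enumerate() {
        for (key, value) in run {
            entries.push((idx, key, value));
        }
    }
    entries.sort_by(|a, b| a.1.cmp(&b.1).then(a.0.cmp(&b.0)));

    let mut output: Vec<Run> = Vec::new();
    let mut current: Run = Vec::new();
    let mut current_size = 0;
    let mut last_key: Option<Vec<u8>> = None;
    for (_, key, value) in entries {
        // Skip older versions of a key we have already emitted.
        if last_key.as_deref() == Some(key.as_slice()) {
            continue;
        }
        last_key = Some(key.clone());
        current_size += key.len() + value.len();
        current.push((key, value));
        // Split the output once the current "file" is large enough, the way
        // you would check the SST builder's estimated size in practice.
        if current_size >= target_size {
            output.push(std::mem::take(&mut current));
            current_size = 0;
        }
    }
    if !current.is_empty() {
        output.push(current);
    }
    output
}
```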
Compaction should not block L0 flush, and therefore you should not take the state lock while merging the files. You should only take the state lock at the end of the compaction process, when you update the LSM state, and release it right after you finish modifying the state.
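As a minimal sketch of this locking pattern, the snippet below uses a stripped-down `State` holding only SST ids behind `std::sync::RwLock`; all names are stand-ins for the real `LsmStorageState`, and the real code also coordinates through the `state_lock` mutex used by other state changes, which is elided here. Note that only the compacted L0 ids are removed from the state, so L0 SSTs flushed while the merge was running survive the update.

```rust
use std::sync::{Arc, RwLock};

// Stand-in for LsmStorageState: just the lists of SST ids per level.
#[derive(Clone, Default)]
struct State {
    l0_sstables: Vec<usize>,
    l1_sstables: Vec<usize>,
}

fn force_full_compaction(state: &Arc<RwLock<Arc<State>>>) {
    // 1. Snapshot which SSTs we are going to compact; no lock is held afterwards.
    let snapshot = state.read().unwrap().clone();
    let l0_to_compact = snapshot.l0_sstables.clone();
    let l1_to_compact = snapshot.l1_sstables.clone();

    // 2. Do the expensive merge with no lock held, so L0 flushes can proceed.
    let new_l1: Vec<usize> = vec![/* ids of the freshly written SSTs */];

    // 3. Take the lock only to install the result, then release it right away.
    {
        let mut guard = state.write().unwrap();
        let mut new_state = guard.as_ref().clone();
        // Keep any L0 SSTs that were flushed while compaction was running.
        new_state
            .l0_sstables
            .retain(|id| !l0_to_compact.contains(id));
        new_state.l1_sstables = new_l1;
        *guard = Arc::new(new_state);
    } // write lock released here

    // Only now is it safe to delete the old files referenced by
    // l0_to_compact / l1_to_compact from disk.
    let _ = l1_to_compact;
}
```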
You can assume that the user will ensure there is only one compaction going on; `force_full_compaction` will be called from only one thread at any time. The SSTs placed in level 1 should be sorted by their first key and should not have overlapping key ranges.
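If you want to sanity-check that invariant while developing, a small debug assertion along the following lines can help. It models an SST as just its `(first_key, last_key)` range; `check_l1_invariant` is a hypothetical helper, not part of the starter code.

```rust
/// Panics if the L1 SSTs (given as (first_key, last_key) ranges) are not
/// sorted by first key or have overlapping key ranges.
fn check_l1_invariant(l1: &[(Vec<u8>, Vec<u8>)]) {
    for pair in l1.windows(2) {
        let (prev, next) = (&pair[0], &pair[1]);
        // The previous SST must end strictly before the next one begins.
        assert!(prev.1 < next.0, "L1 SSTs overlap or are out of order");
    }
}
```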
@@ -51,7 +51,7 @@ In your compaction implementation, you only need to handle `FullCompaction` for
Because we always compact all SSTs, if we find multiple versions of a key, we can simply retain the latest one. If the latest version is a delete marker, we do not need to keep it in the produced SST files. This does not apply to the compaction strategies in the next few chapters.
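As a tiny worked example of this rule, the snippet below uses the convention from the earlier chapters that an empty value is a delete marker; `latest_visible` is a hypothetical helper for illustration, not part of the starter code.

```rust
/// Given all versions of one key, newest first, return the value that a full
/// compaction should keep, or None if the key should disappear entirely.
fn latest_visible(versions_newest_first: &[&[u8]]) -> Option<Vec<u8>> {
    let newest = versions_newest_first.first()?;
    if newest.is_empty() {
        None // the latest version is a delete marker: drop the key
    } else {
        Some(newest.to_vec())
    }
}

#[test]
fn full_compaction_keeps_only_latest() {
    // "23" overwritten by "233": only "233" survives.
    assert_eq!(
        latest_visible(&[b"233".as_slice(), b"23".as_slice()]),
        Some(b"233".to_vec())
    );
    // Written and then deleted: neither the tombstone nor the key survives.
    assert_eq!(latest_visible(&[b"".as_slice(), b"23".as_slice()]), None);
}
```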
There are some corner cases that you might need to think about.
* How does your implementation handle L0 flush in parallel with compaction? (Not taking the state lock when doing the compaction, and also needing to consider new L0 files produced while compaction is going on.)
* If your implementation removes the original SST files immediately after the compaction completes, will it cause problems in your system? (Generally not on macOS/Linux, because the OS will not actually remove the file until no file handle is held.)
@@ -73,11 +73,14 @@ In this task, you will need to modify,
```
src/lsm_iterator.rs
src/lsm_storage.rs
src/compact.rs
```
Now that we have a two-level structure for your LSM tree, you can change your read path to use the new concat iterator to optimize reads.
You will need to change the inner iterator type of `LsmStorageIterator`. After that, you can construct a two-merge iterator that merges memtables and L0 SSTs, and then another two-merge iterator that merges that iterator with the L1 concat iterator.
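For example, the inner type could end up looking like the alias below. The iterator names and module paths follow the ones used in earlier chapters of this tutorial; adjust them if your own definitions differ.

```rust
use crate::iterators::concat_iterator::SstConcatIterator;
use crate::iterators::merge_iterator::MergeIterator;
use crate::iterators::two_merge_iterator::TwoMergeIterator;
use crate::mem_table::MemTableIterator;
use crate::table::SsTableIterator;

/// Memtables and L0 SSTs can contain overlapping key ranges, so they go
/// through merge iterators; L1 is a single sorted run of non-overlapping
/// SSTs, so a concat iterator is enough.
type LsmIteratorInner = TwoMergeIterator<
    TwoMergeIterator<MergeIterator<MemTableIterator>, MergeIterator<SsTableIterator>>,
    SstConcatIterator,
>;
```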
You can also change your compaction implementation to leverage the concat iterator.
You will need to implement `num_active_iterators` for the concat iterator so that the test cases can check whether concat iterators are being used by your implementation; it should always return 1.
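A minimal sketch, assuming your concat iterator keeps at most one SST iterator open at a time:

```rust
// Hypothetical shape of the concat iterator; the actual fields are up to you,
// e.g. the current SST iterator plus the list of remaining SSTs.
struct SstConcatIterator {
    // current: Option<SsTableIterator>, next_sst_idx: usize, ...
}

impl SstConcatIterator {
    /// Unlike a merge iterator, a concat iterator only ever drives a single
    /// underlying SST iterator, so the test expects this to be exactly 1.
    fn num_active_iterators(&self) -> usize {
        1
    }
}
```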
@@ -47,7 +47,7 @@ impl LsmStorageState {
                .map(|level| (level, Vec::new()))
                .collect::<Vec<_>>(),
            CompactionOptions::Tiered(_) => Vec::new(),
            CompactionOptions::NoCompaction => vec![(1, Vec::new())],
        };
        Self {
            memtable: Arc::new(MemTable::create(0)),
mini-lsm/src/tests/week2_day1.rs (new file)
@@ -0,0 +1,76 @@
use std::ops::Bound;

use bytes::Bytes;
use tempfile::tempdir;

use self::harness::check_iter_result;

use super::*;
use crate::lsm_storage::{LsmStorageInner, LsmStorageOptions};

fn sync(storage: &LsmStorageInner) {
    storage
        .force_freeze_memtable(&storage.state_lock.lock())
        .unwrap();
    storage.force_flush_next_imm_memtable().unwrap();
}

#[test]
fn test_task3_integration() {
    let dir = tempdir().unwrap();
    let storage = LsmStorageInner::open(&dir, LsmStorageOptions::default_for_week1_test()).unwrap();
    storage.put(b"0", b"2333333").unwrap();
    storage.put(b"00", b"2333333").unwrap();
    storage.put(b"4", b"23").unwrap();
    sync(&storage);

    storage.delete(b"4").unwrap();
    sync(&storage);

    storage.force_full_compaction().unwrap();
    assert!(storage.state.read().l0_sstables.is_empty());
    assert!(!storage.state.read().levels[0].1.is_empty());

    storage.put(b"1", b"233").unwrap();
    storage.put(b"2", b"2333").unwrap();
    sync(&storage);

    storage.put(b"00", b"2333").unwrap();
    storage.put(b"3", b"23333").unwrap();
    storage.delete(b"1").unwrap();
    // sync(&storage);
    // storage.force_full_compaction().unwrap();

    // assert!(storage.state.read().l0_sstables.is_empty());
    // assert!(!storage.state.read().levels[0].1.is_empty());

    check_iter_result(
        &mut storage.scan(Bound::Unbounded, Bound::Unbounded).unwrap(),
        vec![
            (Bytes::from("0"), Bytes::from("2333333")),
            (Bytes::from("00"), Bytes::from("2333")),
            (Bytes::from("2"), Bytes::from("2333")),
            (Bytes::from("3"), Bytes::from("23333")),
        ],
    );

    assert_eq!(
        storage.get(b"0").unwrap(),
        Some(Bytes::from_static(b"2333333"))
    );
    assert_eq!(
        storage.get(b"00").unwrap(),
        Some(Bytes::from_static(b"2333"))
    );
    assert_eq!(
        storage.get(b"2").unwrap(),
        Some(Bytes::from_static(b"2333"))
    );
    assert_eq!(
        storage.get(b"3").unwrap(),
        Some(Bytes::from_static(b"23333"))
    );
    assert_eq!(storage.get(b"4").unwrap(), None);
    assert_eq!(storage.get(b"--").unwrap(), None);
    assert_eq!(storage.get(b"555").unwrap(), None);
}