# Sorted String Table (SST)

![Chapter Overview](./lsm-tutorial/week1-04-overview.svg)

In this chapter, you will:

* Implement SST encoding and metadata encoding.
* Implement SST decoding and iterator.
  
## Task 1: SST Builder
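
As a starting point for the metadata-encoding part of this task, here is a minimal sketch of one possible layout. The `BlockMeta` struct, the function names, and the byte layout (`offset` as a big-endian `u32`, then a `u16` key length, then the key bytes) are all assumptions made for illustration, not the format required by the course.

```rust
/// Hypothetical per-block metadata: where the block starts in the SST file
/// and the first key stored in that block.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BlockMeta {
    pub offset: u32,
    pub first_key: Vec<u8>,
}

/// Appends every entry to `buf` as:
/// offset (u32, big-endian) | key_len (u16, big-endian) | key bytes.
/// Assumes every key is shorter than 64 KiB.
pub fn encode_block_meta(metas: &[BlockMeta], buf: &mut Vec<u8>) {
    // Reserve the exact encoded size once instead of growing repeatedly.
    let size: usize = metas.iter().map(|m| 4 + 2 + m.first_key.len()).sum();
    buf.reserve(size);
    for meta in metas {
        debug_assert!(meta.first_key.len() <= u16::MAX as usize);
        buf.extend_from_slice(&meta.offset.to_be_bytes());
        buf.extend_from_slice(&(meta.first_key.len() as u16).to_be_bytes());
        buf.extend_from_slice(&meta.first_key);
    }
}

/// Decodes entries written by `encode_block_meta`.
/// Returns `None` if the input ends in the middle of an entry.
pub fn decode_block_meta(mut buf: &[u8]) -> Option<Vec<BlockMeta>> {
    let mut metas = Vec::new();
    while !buf.is_empty() {
        if buf.len() < 6 {
            return None;
        }
        let offset = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]);
        let key_len = u16::from_be_bytes([buf[4], buf[5]]) as usize;
        buf = &buf[6..];
        if buf.len() < key_len {
            return None;
        }
        metas.push(BlockMeta {
            offset,
            first_key: buf[..key_len].to_vec(),
        });
        buf = &buf[key_len..];
    }
    Some(metas)
}
```

Encoding a slice of `BlockMeta` and decoding the resulting bytes should give back the same entries, which makes a convenient round-trip test while you build the real format.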

## Task 2: SST Iterator
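
When seeking, the iterator first has to decide which block may contain the target key. One way to do that, assuming the SST keeps the first key of every block in ascending order, is a binary search over those keys. The function name `find_block_idx` and the plain `Vec<u8>` key type are hypothetical, chosen only for this sketch.

```rust
/// Returns the index of the last block whose first key is <= `target`.
/// If `target` is smaller than every first key, returns 0 so that the
/// caller starts from the first block.
///
/// `first_keys` is assumed to be sorted in ascending order.
pub fn find_block_idx(first_keys: &[Vec<u8>], target: &[u8]) -> usize {
    // `partition_point` binary-searches for the number of leading elements
    // that satisfy the predicate, i.e. blocks starting at or before `target`.
    first_keys
        .partition_point(|first_key| first_key.as_slice() <= target)
        .saturating_sub(1)
}
```

After picking the block, the iterator still has to seek inside it, and may need to move on to the next block if the target is greater than every key in the chosen one.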

## Task 3: Block Cache
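
To make the idea concrete, the sketch below shows a read-through cache built only from the standard library. It is a stand-in for illustration, not the `moka` API: the `BlockCache` type, the `get_or_load` method, the `(sst_id, block_idx)` key, and the `Block` alias are all assumptions, and unlike a real cache it never evicts anything.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for a decoded block; a real block would carry more structure.
pub type Block = Vec<u8>;

/// A toy read-through cache keyed by (sst_id, block_idx). No eviction.
pub struct BlockCache {
    inner: Mutex<HashMap<(usize, usize), Arc<Block>>>,
}

impl BlockCache {
    pub fn new() -> Self {
        Self {
            inner: Mutex::new(HashMap::new()),
        }
    }

    /// Returns the cached block, or runs `load` (e.g. a disk read) on a miss
    /// and caches its result. Errors from `load` are returned and not cached.
    pub fn get_or_load<E>(
        &self,
        sst_id: usize,
        block_idx: usize,
        load: impl FnOnce() -> Result<Block, E>,
    ) -> Result<Arc<Block>, E> {
        let key = (sst_id, block_idx);
        // Clone the Arc out so the lock is released before any I/O happens.
        let cached = self.inner.lock().unwrap().get(&key).cloned();
        if let Some(block) = cached {
            return Ok(block);
        }
        // Two threads may both miss and load the same block; the later
        // insert simply replaces the earlier one.
        let block = Arc::new(load()?);
        self.inner.lock().unwrap().insert(key, Arc::clone(&block));
        Ok(block)
    }
}
```

Handing out `Arc<Block>` lets readers keep using a block without holding the cache lock, and it means a block stays alive for as long as any reader still holds a reference to it.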

## Test Your Understanding

* What is the time complexity of seeking a key in the SST?
* Where does the cursor stop when you seek a non-existent key in your implementation?
* Is it possible (or necessary) to do in-place updates of SST files?
* An SST is usually large (e.g., 256MB). In this case, the cost of copying/expanding the `Vec` as it grows would be significant. Does your implementation allocate enough space for your SST builder in advance? How did you implement it?
* Looking at the `moka` block cache, why does it return `Arc<Error>` instead of the original `Error`?
* Does using a block cache guarantee that there will be at most a fixed number of blocks in memory? For example, if you have a `moka` block cache of 4GB and a block size of 4KB, can there be more than 4GB/4KB blocks in memory at the same time?
* Is it possible to store columnar data (e.g., a table of 100 integer columns) in an LSM engine? Is the current SST format still a good choice?
* Consider the case where the LSM engine is built on object store services (e.g., S3). How would you optimize/change the SST format/parameters and the block cache to make it suitable for such services?

We do not provide reference answers to the questions. Feel free to discuss them in the Discord community.

## Bonus Tasks

* **Explore different SST encodings and layouts.** For example, in the [Lethe](https://disc-projects.bu.edu/lethe/) paper, the authors add secondary key support to SST. Or you can use a B+ tree as the SST format instead of sorted blocks.
* **Index Blocks.**
* **Index Cache.**

{{#include copyright.md}}