# Mini-LSM Overview

## Overview of LSM
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
An LSM storage engine generally contains three parts:

1. A write-ahead log (WAL) that persists temporary data for recovery.
2. SSTs on disk that maintain the tree structure.
3. Memtables in memory that batch small writes.
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
The storage engine generally provides the following interfaces:

* `Put(key, value)`: store a key-value pair in the LSM tree.
* `Delete(key)`: remove a key and its corresponding value.
* `Get(key)`: get the value corresponding to a key.
* `Scan(range)`: get a range of key-value pairs.
						 
					
						
							
								
									
										
										
										
											2022-12-23 18:44:59 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
To ensure persistence, the engine also provides:

* `Sync()`: ensure that all operations issued before `Sync` are persisted to the disk.
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
Some engines choose to combine `Put` and `Delete` into a single operation called `WriteBatch`, which accepts a batch of key-value pairs.
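
To make the interfaces above concrete, here is a minimal sketch in Python. The class name `ToyEngine`, the half-open `[lower, upper)` range, and the `(op, key, value)` batch format are assumptions made for this illustration and are not the API used in the tutorial. The toy keeps everything in a single in-memory dict, so it is not an LSM tree and `sync` has nothing to do.

```python
from typing import Dict, Iterator, List, Optional, Tuple


class ToyEngine:
    """In-memory stand-in for the interfaces above (hypothetical, not a real LSM tree)."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: bytes) -> None:
        # Removing a missing key is treated as a no-op (an assumption of this sketch).
        self._data.pop(key, None)

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def scan(self, lower: bytes, upper: bytes) -> Iterator[Tuple[bytes, bytes]]:
        # Yield pairs with lower <= key < upper, in key order.
        for key in sorted(self._data):
            if lower <= key < upper:
                yield key, self._data[key]

    def sync(self) -> None:
        # Nothing to persist here; a real engine would typically flush its log to disk.
        pass

    def write_batch(self, batch: List[Tuple[str, bytes, Optional[bytes]]]) -> None:
        # Each entry is ("put", key, value) or ("delete", key, None).
        for op, key, value in batch:
            if op == "put":
                if value is None:
                    raise ValueError("put requires a value")
                self.put(key, value)
            elif op == "delete":
                self.delete(key)
            else:
                raise ValueError(f"unknown op: {op}")
```

With this shape, `put` and `delete` are just one-element batches, which is one reason an engine might expose only the batch form.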
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
In this tutorial, we assume the LSM tree uses the leveled compaction algorithm, which is commonly used in real-world systems.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
### Write Path
  
						 
					
						
							
								
									
										
										
										
The write path of LSM contains 4 steps:
							 
						 
					
						
							
								
									
										
										
										
1. Write the key-value pair to the write-ahead log, so that it can be recovered after the storage engine crashes.
2. Write the key-value pair to the memtable. After steps 1 and 2 complete, we can notify the user that the write operation is completed.
3. When a memtable is full, we freeze it into an immutable memtable and flush it to the disk as an SST file in the background.
4. We compact files from one level into lower levels to keep the LSM tree in good shape, so that read amplification stays low.
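
The four steps can be sketched with plain Python containers. This is only an illustration under stated assumptions: `ToyWritePath`, `MEMTABLE_LIMIT`, and the two-level layout (`l0`, `l1`) are hypothetical names, the "log" and "SSTs" are in-memory lists rather than files, and flushing and compaction are called by hand instead of running in the background.

```python
from typing import Dict, List, Tuple

MEMTABLE_LIMIT = 2  # hypothetical threshold: freeze the memtable once it holds this many keys

SortedRun = List[Tuple[bytes, bytes]]  # stand-in for one SST: pairs sorted by key


class ToyWritePath:
    def __init__(self) -> None:
        self.wal: List[Tuple[bytes, bytes]] = []       # stand-in for the write-ahead log
        self.memtable: Dict[bytes, bytes] = {}         # current mutable memtable
        self.immutable: List[Dict[bytes, bytes]] = []  # frozen memtables, newest first
        self.l0: List[SortedRun] = []                  # flushed SSTs, newest first
        self.l1: SortedRun = []                        # lower level, kept as one sorted run

    def put(self, key: bytes, value: bytes) -> None:
        self.wal.append((key, value))  # step 1: record the write in the log
        self.memtable[key] = value     # step 2: apply it to the memtable
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self.freeze()              # step 3, first half: freeze the full memtable

    def freeze(self) -> None:
        self.immutable.insert(0, self.memtable)
        self.memtable = {}

    def flush_oldest(self) -> None:
        # Step 3, second half: turn the oldest frozen memtable into a sorted run.
        if self.immutable:
            frozen = self.immutable.pop()
            self.l0.insert(0, sorted(frozen.items()))

    def compact_l0(self) -> None:
        # Step 4: merge every L0 run into L1; newer values overwrite older ones.
        merged = dict(self.l1)
        for run in reversed(self.l0):  # oldest first, so later updates win
            merged.update(run)
        self.l1 = sorted(merged.items())
        self.l0 = []
```

After compaction, each key appears once in `l1`, so a reader has fewer places to look. That is the sense in which compaction keeps read amplification low.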
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
### Read Path
  
						 
					
						
							
								
									
										
										
										
When we want to read a key:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
1. We first probe all the memtables, from latest to oldest.
2. If the key is not found, we then search the SSTs in the LSM tree to find the data.
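
The two lookup steps can be sketched as follows. The function names `get` and `search_sst`, and the representation of memtables as dicts and SSTs as sorted lists of pairs, are assumptions for illustration. Both lists are expected newest first, and handling of deleted keys is left out to keep the sketch short.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

SortedRun = List[Tuple[bytes, bytes]]  # stand-in for one SST: pairs sorted by key


def search_sst(sst: SortedRun, key: bytes) -> Optional[bytes]:
    """Binary-search one sorted run for the key."""
    # (key,) sorts before any (key, value) pair, so this finds the first entry >= key.
    i = bisect_left(sst, (key,))
    if i < len(sst) and sst[i][0] == key:
        return sst[i][1]
    return None


def get(
    key: bytes,
    memtables: List[Dict[bytes, bytes]],
    ssts: List[SortedRun],
) -> Optional[bytes]:
    # Step 1: probe the memtables from latest to oldest.
    for memtable in memtables:
        if key in memtable:
            return memtable[key]
    # Step 2: fall back to the SSTs, again newest first.
    for sst in ssts:
        value = search_sst(sst, key)
        if value is not None:
            return value
    return None
```

Stopping at the first hit is what makes the newest-to-oldest order matter: an older copy of the same key is never consulted once a newer one is found.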
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
There are two types of reads: lookup and scan. A lookup finds one key in the LSM tree, while a scan iterates over all keys within a range in the storage engine. We will cover both of them throughout the tutorial.
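
A scan differs from a lookup in that it has to combine every source at once. One way to sketch this, assuming each memtable or SST can be presented as a list of pairs sorted by key, is a merge that keeps only the newest copy of each key. The function name `scan` and the half-open range `[lower, upper)` are assumptions for this example.

```python
import heapq
from typing import Iterator, List, Tuple

SortedRun = List[Tuple[bytes, bytes]]  # pairs sorted by key, keys unique within a run


def scan(
    sources: List[SortedRun], lower: bytes, upper: bytes
) -> Iterator[Tuple[bytes, bytes]]:
    """Merge sorted runs (newest first) and yield each key in [lower, upper) once."""
    # Tag entries with the index of their source so that, for equal keys,
    # the entry from the newest source (smallest index) sorts first.
    tagged = (
        [(key, index, value) for key, value in run if lower <= key < upper]
        for index, run in enumerate(sources)
    )
    last_key = None
    for key, _index, value in heapq.merge(*tagged):
        if key != last_key:  # the first occurrence comes from the newest source
            yield key, value
            last_key = key
```

For example, with `newest = [(b"a", b"1"), (b"c", b"3-new")]` and `older = [(b"b", b"2"), (b"c", b"3-old")]`, calling `scan([newest, older], b"a", b"d")` yields the pairs for `a`, `b`, and `c`, with `c` taken from `newest`.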
							 
						 
					
						
							
								
									
										
										
										
## Tutorial Structure
  
						 
					
						
							
								
									
										
										
										
This tutorial has 3 parts (weeks). In the first week, we will focus on the storage structure and the storage format of an LSM storage engine. In the second week, we will dive into compaction in depth and implement persistence support for the storage engine. In the third week, we will implement multi-version concurrency control.
							 
						 
					
						
							
								
									
										
										
										
{{#include copyright.md}}