diff --git a/mini-lsm-book/src/00-get-started.md b/mini-lsm-book/src/00-get-started.md index 4d32f0a..d1603bf 100644 --- a/mini-lsm-book/src/00-get-started.md +++ b/mini-lsm-book/src/00-get-started.md @@ -1,4 +1,4 @@ -# Get Started +# Environment Setup The starter code and reference solution is available at [https://github.com/skyzh/mini-lsm](https://github.com/skyzh/mini-lsm). diff --git a/mini-lsm-book/src/00-overview.md b/mini-lsm-book/src/00-overview.md index 602cc20..93c0f3a 100644 --- a/mini-lsm-book/src/00-overview.md +++ b/mini-lsm-book/src/00-overview.md @@ -1,44 +1,4 @@ -# Overview - - - -In this tutorial, you will learn how to build a simple LSM-Tree storage engine in the Rust programming language. - -## What is LSM, and Why LSM? - -Log-structured merge tree is a data structure to maintain key-value pairs. This data structure is widely used in -distributed database systems like [TiDB](https://www.pingcap.com) and [CockroachDB](https://www.cockroachlabs.com) as -their underlying storage engine. [RocksDB](http://rocksdb.org), based on [LevelDB](https://github.com/google/leveldb), -is an implementation of LSM-Tree storage engine. It provides a wide range of key-value access functionalities and is -used in a lot of production systems. - -Generally speaking, LSM Tree is an append-friendly data structure. It is more intuitive to compare LSM to other -key-value data structure like RB-Tree and B-Tree. For RB-Tree and B-Tree, all data operations are in-place. That is to -say, when you update the value corresponding to the key, the value will be overwritten at its original memory or disk -space. But in an LSM Tree, all write operations, i.e., insertions, updates, deletions, are performed in somewhere else. -These operations will be batched into SST (sorted string table) files and be written to the disk. Once written to the -disk, the file will not be changed. These operations are applied lazily on disk with a special task called compaction. 
-The compaction job will merge multiple SST files and remove unused data. - -This architectural design makes LSM tree easy to work with. - -1. Data are immutable on persistent storage, which means that it is easier to offload the background tasks (compaction) - to remote servers. It is also feasible to directly store and serve data from cloud-native storage systems like S3. -2. An LSM tree can balance between read, write and space amplification by changing the compaction algorithm. The data - structure itself is super versatile and can be optimized for different workloads. - -In this tutorial, we will learn how to build an LSM-Tree-based storage engine in the Rust programming language. - -## Prerequisites of this Tutorial - -* You should know the basics of the Rust programming language. Reading [the Rust book](https://doc.rust-lang.org/book/) - is enough. -* You should know the basic concepts of key-value storage engines, i.e., why we need somehow complex design to achieve - persistence. If you have no experience with database systems and storage systems before, you can implement Bitcask - in [PingCAP Talent Plan](https://github.com/pingcap/talent-plan/tree/master/courses/rust/projects/project-2). -* Knowing the basics of an LSM tree is not a requirement but we recommend you to read something about it, e.g., the - overall idea of LevelDB. This would familiarize you with concepts like mutable and immutable mem-tables, SST, - compaction, WAL, etc. +# Mini-LSM Overview ## Overview of LSM @@ -65,95 +25,34 @@ of key value pairs. In this tutorial, we assume the LSM tree is using leveled compaction algorithm, which is commonly used in real-world systems. -## Write Flow +### Write Path - + -The write flow of LSM contains 4 steps: +The write path of LSM contains 4 steps: 1. Write the key-value pair to write-ahead log, so that it can be recovered after the storage engine crashes. 2. Write the key-value pair to memtable. 
After (1) and (2) completes, we can notify the user that the write operation is completed. -3. When a memtable is full, we will flush it to the disk as an SST file in the background. +3. When a memtable is full, we will freeze it into an immutable memtable, and immutable memtables will be flushed to the disk as SST files in the background. 4. We will compact some files in some level into lower levels to maintain a good shape for the LSM tree, so that read amplification is low. -## Read Flow +### Read Path - + When we want to read a key, 1. We will first probe all the memtables from latest to oldest. 2. If the key is not found, we will then search the entire LSM tree containing SSTs to find the data. -## Community +There are two types of reads: lookup and scan. A lookup finds a single key in the LSM tree, while a scan iterates over all keys within a range in the storage engine. We will cover both of them throughout the tutorial. -You may join skyzh's Discord server and study with the mini-lsm community. +## Tutorial Structure -[](https://skyzh.dev/join/discord) + -## About the Author - -As of writing (at the end of 2022), Chi is a first-year master's student in Carnegie Mellon University. He has 5 years' -experience with the Rust programming language since 2018. He has been working on a variety of database systems including -[TiKV][db1], [AgateDB][db2], [TerarkDB][db3], [RisingLight][db4], and [RisingWave][db5]. In his first semester in CMU, -he worked as a teaching assistant for CMU's [15-445/645 Intro to Database Systems][15445-course] course, where he built -a new SQL processing layer for the [BusTub][bustub] educational database system, added more query optimization stuff into -the course, and made the course [more challenging than ever before][tweet]. Chi is interested in exploring how the Rust -programming language can fit in the database world. Check out his [previous tutorial][type-exercise] on building a -vectorized expression framework if you are also interested in that topic.
- -[db1]: https://github.com/tikv/tikv -[db2]: https://github.com/tikv/agatedb -[db3]: https://github.com/bytedance/terarkdb -[db4]: https://github.com/risinglightdb/risinglight -[db5]: https://github.com/risingwavelabs/risingwave -[15445-course]: https://15445.courses.cs.cmu.edu/fall2022/ -[tweet]: https://twitter.com/andy_pavlo/status/1598137241016360961 -[type-exercise]: https://github.com/skyzh/type-exercise-in-rust -[bustub]: https://github.com/cmu-db/bustub - - +This tutorial has three parts (weeks). In the first week, we will focus on the storage structure and the storage format of an LSM storage engine. In the second week, we will dive deep into compaction and implement persistence support for the storage engine. In the third week, we will implement multi-version concurrency control. {{#include copyright.md}} diff --git a/mini-lsm-book/src/00-preface.md b/mini-lsm-book/src/00-preface.md new file mode 100644 index 0000000..906f439 --- /dev/null +++ b/mini-lsm-book/src/00-preface.md @@ -0,0 +1,98 @@ +# Preface + + + +In this tutorial, you will learn how to build a simple LSM-Tree storage engine in the Rust programming language. + +## What is LSM, and Why LSM? + +A log-structured merge tree (LSM tree) is a data structure that maintains key-value pairs. This data structure is widely used in +distributed database systems like [TiDB](https://www.pingcap.com) and [CockroachDB](https://www.cockroachlabs.com) as +their underlying storage engine. [RocksDB](http://rocksdb.org), based on [LevelDB](https://github.com/google/leveldb), +is an implementation of an LSM-Tree storage engine. It provides a wide range of key-value access functionality and is +used in many production systems. + +Generally speaking, LSM Tree is an append-friendly data structure. It is more intuitive to compare LSM to other +key-value data structures like RB-Tree and B-Tree. For RB-Tree and B-Tree, all data operations are in-place.
That is to +say, when you update the value corresponding to the key, the value will be overwritten at its original memory or disk +space. But in an LSM Tree, all write operations, i.e., insertions, updates, and deletions, are performed elsewhere. +These operations will be batched into SST (sorted string table) files and be written to the disk. Once written to the +disk, the file will not be changed. These operations are applied lazily on disk with a special task called compaction. +The compaction job will merge multiple SST files and remove unused data. + +This architectural design makes LSM trees easy to work with. + +1. Data are immutable on persistent storage, which means that it is easier to offload the background tasks (compaction) + to remote servers. It is also feasible to directly store and serve data from cloud-native storage systems like S3. +2. An LSM tree can balance read, write, and space amplification by changing the compaction algorithm. The data + structure itself is super versatile and can be optimized for different workloads. + +In this tutorial, we will learn how to build an LSM-Tree-based storage engine in the Rust programming language. + +## Prerequisites + +* You should know the basics of the Rust programming language. Reading [the Rust book](https://doc.rust-lang.org/book/) + is enough. +* You should know the basic concepts of key-value storage engines, i.e., why we need a somewhat complex design to achieve + persistence. If you have no prior experience with database or storage systems, you can implement Bitcask + in [PingCAP Talent Plan](https://github.com/pingcap/talent-plan/tree/master/courses/rust/projects/project-2). +* Knowing the basics of an LSM tree is not a requirement, but we recommend reading something about it, e.g., the + overall idea of LevelDB. This would familiarize you with concepts like mutable and immutable memtables, SST, + compaction, WAL, etc. + +## What should you expect from this tutorial...
+ +After completing this course, you should have a deep understanding of how an LSM-based storage system works, gain hands-on experience in designing such systems, and be able to apply what you have learned in your study and career. You will become an expert in LSM storage systems, understand the design tradeoffs in such systems, and find optimal ways to design an LSM-based storage system that meets your workload requirements and goals. This is a very in-depth tutorial that covers all the important implementation details and design choices of modern storage systems (e.g., RocksDB), based on the author's experience with several LSM-like storage systems, and you will be able to directly apply what you have learned in both industry and academia. + +### Structure + +The tutorial is a large course that is split into several parts (weeks). Each week usually has seven chapters, and each chapter can be finished within 2-3 hours. The first six chapters of each part will instruct you to build a working system, and the last chapter of each week will be a *snack time* chapter that implements some easy things on top of what you have built in the previous six days. In each chapter, there will be required tasks, *check your understanding* questions, and bonus tasks. + +### Testing + +We provide a full test suite and some CLI tools for you to validate whether your solution is correct. Note that the test suite is not exhaustive, and your solution might not be 100% correct after passing all test cases. You might need to fix earlier bugs when implementing later parts of the system. We recommend that you think thoroughly about your implementation, especially when there are multi-threaded operations and race conditions. + +### Solution + +We have a solution that implements all the functionality required by the tutorial in the mini-lsm main repo. At the same time, we also have a mini-lsm solution checkpoint repo where each commit corresponds to a chapter in the tutorial.
+ +Keeping such a checkpoint repo up to date with the mini-lsm tutorial is hard because each bug fix or new feature needs to be applied to all commits (or checkpoints). Therefore, this repo might not be using the latest starter code or incorporating the latest features from the mini-lsm tutorial. + +**TL;DR: We do not guarantee that the solution checkpoint repo contains a correct solution, passes all tests, or has the correct doc comments.** For a correct implementation of the complete system, please refer to the solution in the main repo instead: [https://github.com/skyzh/mini-lsm/tree/main/mini-lsm](https://github.com/skyzh/mini-lsm/tree/main/mini-lsm). + +If you are stuck on some part of the tutorial or do not know where to implement some functionality, you can refer to this repo for help. You may compare the diff between commits to see what has changed. Some functions in the mini-lsm tutorial might change multiple times throughout the chapters, and this repo shows exactly what is expected to be implemented in each chapter. + +You may access the solution checkpoint repo at [https://github.com/skyzh/mini-lsm-solution-checkpoint](https://github.com/skyzh/mini-lsm-solution-checkpoint). + +### Feedback + +Your feedback is greatly appreciated. We rewrote the whole course from scratch in 2024 based on feedback from students. We hope you can share your learning experience and help us continuously improve the course. Please join the [Discord community](https://skyzh.dev/join/discord) to share your experience. + +The tutorial was originally planned as general guidance: students would start from an empty directory and implement whatever they wanted based on the specification we had. We had minimal tests that checked whether the behavior was correct. However, the original tutorial was so open-ended that it created huge obstacles to the learning experience.
As students did not have an overview of the whole system beforehand and the instructions were somewhat vague, it was sometimes hard for them to know why a design decision was made and what they needed to do to achieve a goal. In addition, some parts of the course were so compact that it was impossible to deliver the expected content within just one chapter. Therefore, we completely redesigned the course to have an easier learning curve and clearer learning goals. The original one-week tutorial is now split into two weeks (the first week on storage format, and the second week on a deep dive into compaction), with an extra part on MVCC. We hope you find this course interesting and helpful in your study and career. We would like to thank everyone who commented in [Feedback after coding day 1](https://github.com/skyzh/mini-lsm/issues/11) and [Hello, when is the next update plan for the tutorial?](https://github.com/skyzh/mini-lsm/issues/7) -- your feedback greatly helped us improve the course. + +### License + +The source code of this course is licensed under Apache 2.0, while the author owns the full copyright of the tutorial itself (markdown files + figures). + +### Will this tutorial be free forever? + +Yes! Everything publicly available now will be free forever and will receive lifetime updates and bug fixes. Meanwhile, we might provide paid code review and office hour services in the future. For the DLC part (*rest of your life* chapters), we do not plan to finish it as of 2024, and have not decided whether it will be publicly available. + +## Community + +You may join skyzh's Discord server and study with the mini-lsm community. + +[](https://skyzh.dev/join/discord) + + +## About the Author + +As of writing (at the beginning of 2024), Chi has obtained his master's degree in Computer Science from Carnegie Mellon University and his bachelor's degree from Shanghai Jiao Tong University.
He has been working on a variety of database systems including [TiKV][db1], [AgateDB][db2], [TerarkDB][db3], [RisingWave][db4], and [Neon][db5]. Starting in 2022, he worked as a teaching assistant for [CMU's Database Systems course](https://15445.courses.cs.cmu) for three semesters, working on the BusTub educational system, where he added many new features and more challenges to the course (check out the redesigned [query execution](https://15445.courses.cs.cmu.edu/fall2022/project3/) project and the super challenging [multi-version concurrency control](https://15445.courses.cs.cmu.edu/fall2023/project4/) project). Besides working on the BusTub educational system, he is also a maintainer of the [RisingLight](https://github.com/risinglightdb/risinglight) educational database system. Chi is interested in exploring how the Rust programming language can fit in the database world. Check out his previous tutorials on building a vectorized expression framework ([type-exercise-in-rust](https://github.com/skyzh/type-exercise-in-rust)) and a vector database ([write-you-a-vector-db](https://github.com/skyzh/write-you-a-vector-db)) if you are also interested in those topics.
+ +[db1]: https://github.com/tikv/tikv +[db2]: https://github.com/tikv/agatedb +[db3]: https://github.com/bytedance/terarkdb +[db4]: https://github.com/risingwavelabs/risingwave +[db5]: https://github.com/neondatabase/neon + +{{#include copyright.md}} diff --git a/mini-lsm-book/src/SUMMARY.md b/mini-lsm-book/src/SUMMARY.md index dde6d2a..fb90440 100644 --- a/mini-lsm-book/src/SUMMARY.md +++ b/mini-lsm-book/src/SUMMARY.md @@ -1,7 +1,8 @@ # LSM in a Week -[Overview](./00-overview.md) -[Get Started](./00-get-started.md) +[Preface](./00-preface.md) +[Mini-LSM Overview](./00-overview.md) +[Environment Setup](./00-get-started.md) --- @@ -22,7 +23,7 @@ # Mini-LSM v2 -- [Week 1: Mini-LSM](./week1-overview.md) +- [Week 1 Overview: Mini-LSM](./week1-overview.md) - [Memtable](./week1-01-memtable.md) - [Merge Iterator](./week1-02-merge-iterator.md) - [Block](./week1-03-block.md) @@ -31,7 +32,7 @@ - [Write Path](./week1-06-write-path.md) - [Snack Time: SST Optimizations](./week1-07-sst-optimizations.md) -- [Week 2: Compaction and Persistence](./week2-overview.md) +- [Week 2 Overview: Compaction and Persistence](./week2-overview.md) - [Compaction Implementation](./week2-01-compaction.md) - [Simple Compaction Strategy](./week2-02-simple.md) - [Tiered Compaction Strategy](./week2-03-tiered.md) @@ -40,7 +41,7 @@ - [Write-Ahead Log (WAL)](./week2-06-wal.md) - [Snack Time: Batch Write and Checksums](./week2-07-snacks.md) -- [Week 3: MVCC](./week3-overview.md) +- [Week 3 Overview: MVCC](./week3-overview.md) # The Rest of Your Life (TBD) diff --git a/mini-lsm-book/src/copyright.md b/mini-lsm-book/src/copyright.md index 794a83b..82aaa7e 100644 --- a/mini-lsm-book/src/copyright.md +++ b/mini-lsm-book/src/copyright.md @@ -1 +1 @@ -
Copyright © 2022 - 2024 Alex Chi Z. All Rights Reserved.
+Your feedback is greatly appreciated. You are welcome to join our Discord community.
Copyright © 2022 - 2024 Alex Chi Z. All Rights Reserved.