7 Log Compaction
Raft's log grows during normal operation to incorporate more client requests, but in a practical system, it cannot grow without bound. As the log grows longer, it occupies more space and takes more time to replay. This will eventually cause availability problems without some mechanism to discard obsolete information that has accumulated in the log.
Snapshotting is the simplest approach to compaction. In snapshotting, the entire current system state is written to a snapshot on stable storage, then the entire log up to that point is discarded. Snapshotting is used in Chubby and ZooKeeper, and the remainder of this section describes snapshotting in Raft.
Incremental approaches to compaction, such as log cleaning [] and log-structured merge trees [5], are also possible. These operate on a fraction of the data at once, so they spread the load of compaction more evenly over time. They first select a region of data that has accumulated many deleted and overwritten objects, then they rewrite the live objects from that region more compactly and free the region. This requires significant additional mechanism and complexity compared to snapshotting, which simplifies the problem by always operating on the entire data set. While log cleaning would require modifications to Raft, state machines can implement LSM trees using the same interface as snapshotting.
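The select-rewrite-free cycle described above can be illustrated with a minimal sketch. It is not taken from any particular log-cleaning system: the `Segment` class, the `index` map from each key to the location of its latest version, and the "least live data first" selection policy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A region of an append-only store holding (key, value) records (hypothetical layout)."""
    records: list = field(default_factory=list)


def live_fraction(segment, index, seg_id):
    """Fraction of the segment's records that are still the latest version of their key."""
    if not segment.records:
        return 1.0
    live = sum(1 for pos, (key, _) in enumerate(segment.records)
               if index.get(key) == (seg_id, pos))
    return live / len(segment.records)


def clean_one_segment(segments, index):
    """Select the segment with the least live data, rewrite its live records
    compactly into a fresh segment, and free the old region.

    segments: dict mapping segment id -> Segment
    index:    dict mapping key -> (segment id, position) of the latest version;
              deleted keys are simply absent.
    Returns the id of the freed segment.
    """
    victim = min(segments, key=lambda sid: live_fraction(segments[sid], index, sid))
    new_id = max(segments) + 1
    compacted = Segment()
    for pos, (key, value) in enumerate(segments[victim].records):
        if index.get(key) == (victim, pos):        # still live: copy it forward
            index[key] = (new_id, len(compacted.records))
            compacted.records.append((key, value))
    del segments[victim]                           # free the old region
    if compacted.records:
        segments[new_id] = compacted
    return victim
```

Only one segment is touched per call, which is how this style of compaction spreads its cost over time. The price is the extra bookkeeping (the index and the selection policy) that snapshotting avoids.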
The figure shows the basic idea of snapshotting in Raft. Each server takes snapshots independently, covering just the committed entries in its log. Most of the work consists of the state machine writing its current state to the snapshot. Raft also includes a small amount of metadata in the snapshot: the last included index is the index of the last entry in the log that the snapshot replaces (the last entry the state machine had applied), and the last included term is the term of this entry. These are preserved to support the AppendEntries consistency check for the first log entry following the snapshot, since that entry needs a previous log index and term. To enable cluster membership changes (Section 6), the snapshot also includes the latest configuration in the log as of last included index. Once a server completes writing a snapshot, it may delete all log entries up through the last included index, as well as any prior snapshot.
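The following sketch shows one way a server might record this metadata and truncate its log. It is an illustration under stated assumptions, not the authors' implementation: the class and field names, the `("set", key, value)` command format, and keeping the configuration as a server attribute rather than reading it from the log are all hypothetical simplifications. Everything is held in memory; a real server would write the snapshot to stable storage before discarding log entries.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    term: int
    command: tuple          # hypothetical format: ("set", key, value)


@dataclass
class Snapshot:
    last_included_index: int
    last_included_term: int
    configuration: tuple    # latest configuration as of last_included_index
    state: dict


class Server:
    def __init__(self, configuration):
        self.configuration = configuration
        self.snapshot = None
        self.log = []
        self.log_start = 1  # Raft index of self.log[0]
        self.last_applied = 0
        self.state = {}

    def append(self, term, command):
        self.log.append(Entry(term, command))

    def apply_through(self, index):
        """Apply committed entries up to and including the given Raft index."""
        while self.last_applied < index:
            self.last_applied += 1
            _, key, value = self.log[self.last_applied - self.log_start].command
            self.state[key] = value

    def take_snapshot(self):
        """Snapshot the applied state, then drop the log prefix it replaces."""
        idx = self.last_applied
        if idx < self.log_start:
            return          # nothing applied since the previous snapshot
        term = self.log[idx - self.log_start].term
        # Assigning here also discards any prior snapshot.
        self.snapshot = Snapshot(idx, term, self.configuration, dict(self.state))
        self.log = self.log[idx - self.log_start + 1:]
        self.log_start = idx + 1

    def term_at(self, index):
        """Term for the AppendEntries consistency check; falls back to the
        snapshot metadata when the entry itself has been discarded."""
        if self.snapshot and index == self.snapshot.last_included_index:
            return self.snapshot.last_included_term
        return self.log[index - self.log_start].term
```

`term_at` is the reason the two metadata fields exist: after compaction, the first remaining entry still needs the index and term of its predecessor, and only the snapshot can supply them.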
Although servers normally take snapshots independently, the leader must occasionally send snapshots to followers that lag behind. This happens when the leader has already discarded the next log entry that it needs to send to a follower. Fortunately, this situation is unlikely in normal operation: a follower that has kept up with the leader would already have this entry. However, an exceptionally slow follower or a new server joining the cluster (Section 6) would not. The way to bring such a follower up-to-date is for the leader to send it a snapshot over the network.
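The leader's decision can be stated as a one-line check. The sketch below assumes the leader tracks, for each follower, the index of the next entry to send (`next_index`) and knows the last included index of its own snapshot. The function name and the string return values are hypothetical.

```python
def choose_rpc(next_index, last_included_index):
    """Pick the RPC a leader would use for one follower.

    next_index:          index of the next log entry the follower needs
    last_included_index: last index covered by the leader's snapshot
                         (0 if the leader has no snapshot)
    """
    if next_index <= last_included_index:
        # The needed entry was discarded during compaction; only the
        # snapshot can bring this follower up to date.
        return "InstallSnapshot"
    return "AppendEntries"
```

For example, with a snapshot through index 10, a follower that needs entry 5 gets a snapshot, while a follower that needs entry 11 is served from the log as usual.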
The leader uses a new RPC called InstallSnapshot to send snapshots to followers that are too far behind; see Figure 13. When a follower receives a snapshot with this RPC, it must decide what to do with its existing log entries. Usually the snapshot will contain new information not already in the recipient's log. In this case, the follower discards its entire log; it is all superseded by the snapshot and may possibly have uncommitted entries that conflict with the snapshot. If instead the follower receives a snapshot that describes a prefix of its log (due to retransmission or by mistake), then log entries covered by the snapshot are deleted but entries following the snapshot are still valid and must be retained.
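The two cases can be sketched as follows. The log is reduced to a list of entry terms for brevity, and the function name and signature are hypothetical. Treating "describes a prefix" as "the follower holds an entry with the same index and term as the snapshot's last included entry" is this sketch's reading of the rule, and ignoring a snapshot older than the follower's own compaction point is a further assumption.

```python
def install_snapshot(log_terms, log_start, last_included_index, last_included_term):
    """Decide what remains of a follower's log after receiving a snapshot.

    log_terms: terms of the follower's log entries; log_terms[0] has
               Raft index log_start
    Returns (new_log_terms, new_log_start).
    """
    pos = last_included_index - log_start
    if pos < 0:
        # Stale snapshot: it covers less than what this follower has
        # already compacted (assumption: simply ignore it).
        return log_terms, log_start
    if pos < len(log_terms) and log_terms[pos] == last_included_term:
        # Snapshot describes a prefix of the log: drop the covered
        # entries, retain everything after them.
        return log_terms[pos + 1:], last_included_index + 1
    # Snapshot contains new information: the whole log is superseded.
    return [], last_included_index + 1
```

The term comparison matters in the second case: an entry at the right index but with a different term signals a conflicting, uncommitted suffix, so the log is discarded rather than retained.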
This snapshotting approach departs from Raft's strong leader principle, since followers can take snapshots without the knowledge of the leader. However, we think this departure is justified. While having a leader helps avoid conflicting decisions in reaching consensus, consensus has already been reached when snapshotting, so no decisions conflict. Data still only flows from leaders to followers; followers can now simply reorganize their own data.
We considered an alternative leader-based approach in which only the leader would create a snapshot, then it would send this snapshot to each of its followers. However, this has two disadvantages. First, sending the snapshot to each follower would waste network bandwidth and slow the snapshotting process. Each follower already has the information needed to produce its own snapshots, and it is typically much cheaper for a server to produce a snapshot from its local state than it is to send and receive one over the network. Second, the leader's implementation would be more complex. For example, the leader would need to send snapshots to followers in parallel with replicating new log entries to them, so as not to block new client requests.
There are two more issues that impact snapshotting performance. First, servers must decide when to snapshot. If a server snapshots too often, it wastes disk bandwidth and energy; if it snapshots too infrequently, it risks exhausting its storage capacity, and it increases the time required to replay the log during restarts. One simple strategy is to take a snapshot when the log reaches a fixed size in bytes. If this size is set to be significantly larger than the expected size of a snapshot, then the disk bandwidth overhead for snapshotting will be small.
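The fixed-size strategy and its overhead can be made concrete with a short sketch. The function names are hypothetical, and the overhead estimate rests on a simplifying assumption: exactly one snapshot is written for every `threshold_bytes` of log appended.

```python
def should_snapshot(log_bytes, threshold_bytes):
    """Fixed-size policy: snapshot once the log reaches the threshold."""
    return log_bytes >= threshold_bytes


def snapshot_write_overhead(snapshot_bytes, threshold_bytes):
    """Extra disk writes caused by snapshotting, as a fraction of log writes.

    Assumes one snapshot of snapshot_bytes per threshold_bytes of log.
    """
    return snapshot_bytes / threshold_bytes
```

For instance, with an expected snapshot size of 64 MiB and a threshold of 1 GiB, `snapshot_write_overhead(64 * 2**20, 2**30)` gives 0.0625, i.e. snapshotting adds roughly 6% to the disk writes already incurred by the log. Setting the threshold equal to the snapshot size would instead double the write volume.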
The second performance issue is that writing a snapshot can take a significant amount of time, and we do not want this to delay normal operations. The solution is to use copy-on-write techniques so that new updates can be accepted without impacting the snapshot being written. For example, state machines built with functional data structures naturally support this. Alternatively, the operating system's copy-on-write support (e.g., fork on Linux) can be used to create an in-memory snapshot of the entire state machine (our implementation uses this approach).
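The fork-based approach can be sketched in a few lines. This is an illustration of the technique on a POSIX system, not a reproduction of the authors' implementation: the function name, the JSON encoding, and the write-to-temporary-then-rename step are all assumptions.

```python
import json
import os


def snapshot_with_fork(state, path):
    """Write a point-in-time snapshot of `state` without blocking the caller.

    fork() gives the child a copy-on-write view of the parent's memory, so
    the child serializes the state as it was at the moment of the fork while
    the parent keeps accepting updates. Returns the child's pid; the caller
    should later reap it with os.waitpid.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: write the snapshot, then exit without running
        # any of the parent's cleanup handlers.
        status = 1
        try:
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
                f.flush()
                os.fsync(f.fileno())   # make sure the data reaches the disk
            os.replace(tmp, path)      # atomically publish the snapshot
            status = 0
        finally:
            os._exit(status)
    return pid
```

A caller would invoke `pid = snapshot_with_fork(state, path)`, continue applying log entries to `state`, and call `os.waitpid(pid, 0)` before discarding the log prefix, since the entries may only be deleted once the snapshot is safely on stable storage.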
Consistency Algorithm Quest (Extended Version) 9