October 2, 2022 7:00 PM PDT
This document summarizes the discussions and technical details regarding consensus algorithms, focusing on fault tolerance, leader election, log replication, and various consensus mechanisms like Paxos and Raft.
Presenter: Longwei, Tech Lead
Key Topics
Fault and Partial Failure
- Unreliable clocks: distinction between time of day clock and monotonic clock.
- In distributed systems, total order of events cannot be defined; only partial order exists.
- Truth is defined by majority consensus.
Consensus Mechanisms
- Linearizability: Ensures total order of operations.
- Causality: Establishes the order of events.
- Sequence numbers and timestamps are critical for maintaining order.
- Non-causal ordering examples: TinyURL for global unique keys.
Key Generation
- Techniques include single machine incremental counters, odd-even sequences, and Snowflake (timestamp + instance + sequence).
- Lamport timestamps are used to guarantee causality without total order.
Fault-Tolerant Consensus
- Paxos: Lacks a leader and requires multiple round trips for agreement.
- Raft: A simplified version of Paxos that includes leader election, log replication, and crash recovery.
Raft Consensus Algorithm
- Leader election process involves sending messages with terms and maintaining three states: follower, candidate, and leader.
- Heartbeats are used for periodic suppression to prevent unnecessary elections.
- If followers do not hear from a leader, they can become candidates.
- Log replication requires new writes to go to the leader, which then commits changes after majority agreement.
Edge Cases
- Scenarios where no leader is elected and the importance of having an odd number of nodes to avoid ties.
- Handling network partitions: majority wins, and clients can write to partitions but only the majority can commit changes.
Network Repair and Rollback
- Use of terms to agree on the real leader and rollback uncommitted changes.
Consensus Costs
- The cost of consensus is often the delay introduced by network communication.
Quorum and Building Blocks
- Quorum strategies involve read and write quorums, with Raft always using majority.
- Discussed the implications of not using consensus, including high write availability at the cost of slower reads.
Two-Phase Commit (2PC)
- Involves a prepare phase and a commit phase, with the drawback of a coordinator being a single point of failure.
Discussions
- References to DynamoDB and its implementation of consensus mechanisms.
- Mention of projects utilizing Raft, such as Etcd and Kafka (kRaft in beta).
Conclusion
- The discussions highlighted the complexities and trade-offs involved in implementing consensus algorithms in distributed systems, emphasizing the importance of fault tolerance and efficient communication strategies.