February 26, 2023 7:00 PM PST
This document summarizes a design discussion for a distributed key-value (KV) store. The discussion covered functional requirements, architecture, scaling, and the technical implications of supporting services such as authentication, authorization, and messaging. The design emphasizes eventual consistency, high availability, and efficient data handling.
Requirements
Functional Requirements
- Design a distributed KV store to support services like authentication, authorization, and messaging.
- Exclude payment processing.
- Support range queries and batch queries/updates.
- Implement eventual consistency, eliminating the need for transactions.
- Consider in-memory storage options.
Scaling Requirements
- The KV store should scale linearly as nodes are added.
- Anticipate multi-data center support after five years.
- Ensure real-time user experience with a latency target.
- Aim for 99.9% availability, with enough durability to prevent data loss.
- Read/write ratio is 99/1, so data structures should be optimized for reads.
Capacity Planning
- Target capacity of 1 petabyte of data.
- Handle 1 million queries per second, with 10,000 write queries per second and 990,000 read queries per second.
- Distribute data across multiple nodes to spread load and reduce pressure on the master node.
- Average key-value size is approximately 2KB.
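The capacity figures above can be cross-checked with some quick arithmetic. The sketch below assumes decimal units (1 PB = 10^15 bytes, 1 KB = 10^3 bytes); the 4 TB per-node capacity is a purely hypothetical figure used to illustrate a node count and was not part of the discussion. Replication is ignored.

```python
# Back-of-the-envelope sizing from the figures in the notes.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 KB = 10**3 bytes).
TOTAL_BYTES = 10**15            # 1 PB target capacity
AVG_PAIR_BYTES = 2 * 10**3      # ~2 KB average key-value size
READ_QPS = 990_000
WRITE_QPS = 10_000

# Hypothetical per-node capacity, chosen only for illustration.
NODE_CAPACITY_BYTES = 4 * 10**12

total_pairs = TOTAL_BYTES // AVG_PAIR_BYTES
read_bytes_per_sec = READ_QPS * AVG_PAIR_BYTES
write_bytes_per_sec = WRITE_QPS * AVG_PAIR_BYTES
min_nodes = -(-TOTAL_BYTES // NODE_CAPACITY_BYTES)  # ceiling division, no replication

print(f"key-value pairs:  {total_pairs:.2e}")
print(f"read throughput:  {read_bytes_per_sec / 10**9:.2f} GB/s")
print(f"write throughput: {write_bytes_per_sec / 10**6:.0f} MB/s")
print(f"nodes (no replication, 4 TB each): {min_nodes}")
```

Under these assumptions the store holds roughly 5 x 10^11 pairs and serves about 1.98 GB/s of reads and 20 MB/s of writes in aggregate, which is why read-path efficiency dominates the design.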
API Design
- getDataRequest(user_token, key, version): returns a string result and a version_id.
- Errors are reported through an error_code and an error_message.
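One way the request and response shape could look is sketched below. Only the names user_token, key, version, result, version_id, error_code, and error_message come from the notes. The in-memory store, the token set, the numeric error codes, and the "latest version when version is omitted" behavior are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GetDataResponse:
    result: Optional[str] = None
    version_id: Optional[int] = None
    error_code: Optional[int] = None
    error_message: Optional[str] = None

# Hypothetical backing data: key -> {version_id: value}.
_STORE = {"user:42": {1: "alice", 2: "alice_v2"}}
# Hypothetical set of accepted tokens; real authentication is out of scope.
_VALID_TOKENS = {"token-abc"}

def get_data_request(user_token, key, version=None):
    """Return the value for key at the given version (latest if omitted)."""
    if user_token not in _VALID_TOKENS:
        # Error codes here are HTTP-like placeholders, not from the notes.
        return GetDataResponse(error_code=401, error_message="invalid user_token")
    versions = _STORE.get(key)
    if not versions:
        return GetDataResponse(error_code=404, error_message="key not found")
    version_id = max(versions) if version is None else version
    if version_id not in versions:
        return GetDataResponse(error_code=404, error_message="version not found")
    return GetDataResponse(result=versions[version_id], version_id=version_id)

print(get_data_request("token-abc", "user:42"))
print(get_data_request("token-abc", "user:42", version=1))
print(get_data_request("bad-token", "user:42"))
```

Returning errors in the response object rather than raising keeps the sketch close to the error_code/error_message wording in the notes.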
Architecture
Data Model
- Key-value storage with globally unique keys.
- Values can be strings, integers, or JSON.
Key Methodology
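- Use a hash function to convert keys to storage keys. Note that an ordinary hash does not preserve key order, so it cannot by itself serve range queries.
- Handle hash collisions in any hash-based index with a strategy such as open addressing or separate chaining.
- Shard by key range, assigning contiguous ranges of the original keys to nodes, so that a range query only touches the nodes that own the requested range.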
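Range-based sharding can be sketched as a sorted list of split points searched with binary search. The split points and node names below are hypothetical; in practice they would come from whatever service tracks shard ownership.

```python
import bisect

# Hypothetical shard map: node i owns keys from SHARD_STARTS[i] (inclusive)
# up to SHARD_STARTS[i + 1] (exclusive); the last node has no upper bound.
SHARD_STARTS = ["", "g", "n", "t"]
SHARD_NODES = ["node-0", "node-1", "node-2", "node-3"]

def node_for_key(key):
    """Find the node whose range contains key using binary search."""
    idx = bisect.bisect_right(SHARD_STARTS, key) - 1
    return SHARD_NODES[idx]

for k in ["apple", "g", "mango", "zebra"]:
    print(k, "->", node_for_key(k))
```

Because the lookup works on the original key rather than a hash of it, keys that are adjacent in sort order land on the same or neighboring nodes.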
Storage Considerations
- Use a B+ tree for storage due to its efficiency in handling range queries.
- Consider hybrid storage solutions combining in-memory and disk-based storage.
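The property that makes a B+ tree attractive for range queries is that its leaf entries are kept in key order, so a range scan is a seek followed by a sequential read. The sketch below is not a B+ tree; it is a simplified in-memory stand-in (a sorted key list plus a dict, with hypothetical class and method names) that shows the same access pattern.

```python
import bisect

class SortedStore:
    """Ordered key-value store; a simplified stand-in for B+ tree leaf ordering."""

    def __init__(self):
        self._keys = []     # keys kept in sorted order
        self._values = {}   # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def range(self, start_key, end_key):
        """Return (key, value) pairs with start_key <= key < end_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, end_key)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]

store = SortedStore()
for k, v in [("user:3", "c"), ("user:1", "a"), ("user:2", "b"), ("zone:1", "z")]:
    store.put(k, v)
print(store.range("user:1", "user:3"))
```

A real B+ tree adds paged nodes and disk-friendly fan-out, which this sketch omits.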
Query Handling
- Implement a mechanism for range queries across nodes.
- Utilize a configuration service to manage key ranges in each node.
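A cross-node range query can be sketched as a scatter-gather: ask the configuration service which nodes overlap the requested range, query each, and concatenate the results in shard order. The shard table, node contents, and function names below are hypothetical and held in memory purely for illustration.

```python
# Hypothetical configuration-service view:
# (start_key inclusive, end_key exclusive, node); end_key None means unbounded.
SHARD_CONFIG = [("", "h", "node-0"), ("h", "p", "node-1"), ("p", None, "node-2")]

# Hypothetical per-node data, standing in for remote storage nodes.
NODE_DATA = {
    "node-0": {"apple": 1, "fig": 2},
    "node-1": {"kiwi": 3, "mango": 4},
    "node-2": {"pear": 5, "plum": 6},
}

def overlapping_nodes(start_key, end_key):
    """Nodes whose key range intersects [start_key, end_key)."""
    nodes = []
    for shard_start, shard_end, node in SHARD_CONFIG:
        starts_before_query_end = shard_start < end_key
        ends_after_query_start = shard_end is None or shard_end > start_key
        if starts_before_query_end and ends_after_query_start:
            nodes.append(node)
    return nodes

def range_query(start_key, end_key):
    """Collect (key, value) pairs with start_key <= key < end_key across nodes."""
    results = []
    for node in overlapping_nodes(start_key, end_key):
        local = NODE_DATA[node]
        results.extend(
            (k, v) for k, v in sorted(local.items()) if start_key <= k < end_key
        )
    return results

print(overlapping_nodes("a", "c"))
print(range_query("fig", "pear"))
```

Since shards are listed in key order and each node returns its keys sorted, the concatenated result is already globally ordered without a merge step.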
Caching
- Introduce a caching layer in front of disk-based storage to improve read performance.
- Consider using an LRU cache as an independent service.
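For reference, a least-recently-used (LRU) eviction policy can be sketched in a few lines using the standard library's OrderedDict. The class name and capacity are illustrative; running it as an independent service would add a network layer on top.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" is now the most recently used entry
cache.put("c", 3)     # evicts "b"
print(cache.get("a"), cache.get("b"), cache.get("c"))
```

With a 99/1 read/write ratio, most requests would be served by get, so keeping both operations O(1) matters more than minimizing memory overhead.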
Leader Election
- Implement leader election to manage master node failures and ensure data consistency across replicas.
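A heavily simplified illustration of the failover decision: when the master fails, promote the healthy replica that has applied the most writes, breaking ties by node id. The Replica fields and the selection rule are assumptions for this sketch. It is not a consensus protocol; a production system would also need quorum agreement and terms or epochs to avoid split-brain.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Replica:
    node_id: str
    last_applied_seq: int   # hypothetical counter of writes applied so far
    healthy: bool

def elect_leader(replicas):
    """Pick the healthy replica with the highest applied sequence number."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        return None
    # Ties on sequence number are broken by node_id to keep the choice deterministic.
    return max(candidates, key=lambda r: (r.last_applied_seq, r.node_id))

replicas = [
    Replica("node-a", last_applied_seq=120, healthy=False),  # failed master
    Replica("node-b", last_applied_seq=118, healthy=True),
    Replica("node-c", last_applied_seq=119, healthy=True),
]
leader = elect_leader(replicas)
print(leader.node_id if leader else "no healthy replica")
```

Preferring the most up-to-date replica limits how many acknowledged writes are lost on failover, which ties back to the data-loss requirement.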
Discussion Points
- Explore consistent hashing to spread load evenly across nodes and mitigate hot-key issues.
- Discuss the implications of strong vs. eventual consistency.
- Evaluate the need for queues based on latency requirements.
- Consider the use of technologies like Redis and RocksDB for specific use cases.
- Review the design patterns of existing systems like DynamoDB and Cassandra.
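As background for the consistent hashing discussion point, here is a minimal ring with virtual nodes. The class name, the choice of SHA-256, and the vnode count are assumptions for illustration. Note that virtual nodes even out how keys are spread, but a single very hot key still maps to one node and typically needs replication or caching in addition.

```python
import bisect
import hashlib

def _hash(value):
    """Map a string to a 64-bit integer position on the ring."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

class HashRing:
    """Consistent hash ring; each node is placed at several virtual points."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

keys = [f"key-{i}" for i in range(1000)]
before = HashRing(["node-0", "node-1", "node-2", "node-3"])
after = HashRing(["node-0", "node-1", "node-2"])  # node-3 removed

moved = sum(1 for k in keys if before.node_for(k) != after.node_for(k))
print(f"{moved} of {len(keys)} keys moved after removing one node")
```

Only keys that were owned by the removed node change owner, which is the property that makes adding and removing nodes cheap compared with modulo-based hashing.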
Grading Criteria
- Soft Skills: Clarification of requirements, discussion of trade-offs, clear presentation, and pacing of the interview.
- Hard Skills: Design quality, knowledge of data structures, availability, durability, and familiarity with specific technologies or algorithms.
Areas Covered in System Design Training
- High throughput infrastructure (e.g., KV store, distributed message queue).
- High volume infrastructure (e.g., cloud file systems, distributed log collection).
- Collaboration applications (e.g., multi-user chat, news feeds).
- Distributed transaction applications (e.g., auction systems, payment processing).
- Content sharing applications (e.g., YouTube, Google Photos).
- Geography applications (e.g., Uber, Yelp).
This structured approach ensures clarity in the design process and highlights key considerations for building a robust distributed key-value store.