October 30, 2022 7:00 PM PDT
This meeting focused on the design and implementation of distributed caching systems, discussing various database technologies, caching strategies, and their applications in services such as authentication and friendship management. The discussion highlighted the importance of optimizing data retrieval and maintaining consistency between caches and databases.
Presenter: Pauline
Key Topics Discussed
1. System Design Summary
- Data Storage: Data is saved in a database to accelerate lookup.
- Common Scenarios:
- Authentication service
- Friendship management
- User tables
2. Operations Analysis
- Common Operations:
- Registration
- Login
- Query
- Update user info
- Demanding Operation: Querying user information is the most demanding operation.
3. Database Performance
- QPS (Queries Per Second):
- MySQL/PostgreSQL: 1k QPS
- MongoDB/Cassandra: 10k QPS
- Redis/Memcached: 100k to 1M QPS
4. Database Preferences
- For operations like registration, login, and modification (300 QPS), MySQL is preferred.
- For querying user info, a custom system with lots of reads and fewer writes can be optimized with caching.
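As a rough illustration of the sizing reasoning above, the sketch below maps an expected load to the first storage tier whose approximate capacity covers it, using the QPS figures quoted in these notes. The function name `pick_store` and the `TIERS` table are hypothetical, and real capacity depends heavily on hardware, schema, and query shape.

```python
# Approximate capacities as quoted in the notes (order of magnitude only).
TIERS = [
    ("MySQL/PostgreSQL", 1_000),
    ("MongoDB/Cassandra", 10_000),
    ("Redis/Memcached", 100_000),
]

def pick_store(expected_qps: int) -> str:
    """Return the first tier whose rough capacity covers the expected load."""
    for name, capacity in TIERS:
        if expected_qps <= capacity:
            return name
    # Beyond the lower bound quoted for the fastest tier in this rough model.
    return "needs sharding or multiple nodes"
```

Under these assumptions, the 300 QPS write path falls in the MySQL/PostgreSQL tier, which matches the preference recorded above.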
5. Caching Overview
- Definition: A cache is a key-value store that can include features such as expiration.
- Types of Caches:
- Memcached: Does not support persistence.
- Redis: Supports persistence.
6. Cache Operations
- Cache Mechanism:
- The application reads from the cache first and falls back to the database only on a miss, which reduces database queries.
- Potential for dirty data due to inconsistencies between cache and database.
- Case A: db.set(user); cache.set(key, user);
- Case B: db.set(user); cache.delete(key);
- Case C: cache.set(key, user); db.set(user);
- Case D: cache.delete(key); db.set(user); (can lead to dirty data: a read that arrives between the two steps re-caches the old value).
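One way Case D can go wrong is sketched below. The notes only state that Case D can lead to dirty data, so the specific interleaving (a reader arriving between the writer's two steps) is an assumption, and plain dicts stand in for the database and the cache.

```python
db = {"user:1": "old"}   # stand-in for the database
cache = {}               # stand-in for the cache

def read(key):
    """Cache-first read: on a miss, load from the DB and fill the cache."""
    if key in cache:
        return cache[key]
    value = db[key]
    cache[key] = value
    return value

# Case D, with a reader arriving between the writer's two steps.
cache.pop("user:1", None)        # writer: cache.delete(key)
seen_by_reader = read("user:1")  # reader: miss, reloads "old" and re-caches it
db["user:1"] = "new"             # writer: db.set(user)

stale = read("user:1")           # served from the cache, still the old value
```

After this sequence the database holds the new value while the cache keeps serving the old one until the entry is deleted or expires.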
7. Consistency Strategies
- Avoiding Inconsistency:
- Write to the database first, then delete the cache entry (Case B): db.set(user); cache.delete(key).
- Set a TTL (Time To Live) on cache entries, e.g., 7 days, so that any stale entry eventually expires.
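The two strategies above can be combined as in the following minimal sketch. The `TTLCache` class, the helper names `update_user` and `get_user`, and the dict used as the database are all hypothetical stand-ins, not part of any real cache client.

```python
import time

class TTLCache:
    """Tiny in-memory cache with per-entry expiry (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def delete(self, key):
        self._data.pop(key, None)

SEVEN_DAYS = 7 * 24 * 60 * 60

def update_user(db, cache, key, user):
    db[key] = user      # 1. write the database first
    cache.delete(key)   # 2. then invalidate the cached copy

def get_user(db, cache, key):
    user = cache.get(key)
    if user is None:
        user = db.get(key)
        if user is not None:
            # The TTL bounds how long a stale entry can survive.
            cache.set(key, user, SEVEN_DAYS)
    return user
```

The clock is injectable so that expiry can be exercised in tests without waiting; in production code the default monotonic clock would be used.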
8. Caching Strategies
- Cache Aside: Server communicates with DB and cache separately.
- Cache Through: Server only talks to the cache, which in turn reads from and writes to the DB (Redis was cited as an example).
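To contrast with cache aside, here is a minimal sketch of the cache-through shape, in which callers never touch the database directly. The class name `CacheThroughStore` is hypothetical and a dict-like object stands in for the database; this is not how any particular product implements it.

```python
class CacheThroughStore:
    """Cache that owns all DB access; callers only talk to this object."""

    def __init__(self, db):
        self._db = db      # hypothetical backing store: any dict-like object
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            value = self._db.get(key)  # read-through on a miss
            if value is None:
                return None
            self._cache[key] = value
        return self._cache[key]

    def set(self, key, value):
        self._db[key] = value          # write-through: DB first,
        self._cache[key] = value       # then the cached copy
```

With cache aside, the same read and write logic would live in the application server instead of inside the cache layer.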
9. Authentication Service
- Components:
- Login, session, cookie management.
- Session table to store user information (session key, user_id, expired_at).
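A session table with the three columns listed above could look like the following sketch, which uses an in-memory SQLite database for illustration. The function names, the one-hour default lifetime, and the use of `secrets.token_hex` for the session key are assumptions; password verification is omitted.

```python
import secrets
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE session ("
    " session_key TEXT PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " expired_at REAL NOT NULL)"
)

def login(user_id, ttl_seconds=3600, now=None):
    """Create a session row and return the key to place in the cookie."""
    now = time.time() if now is None else now
    key = secrets.token_hex(16)
    conn.execute(
        "INSERT INTO session (session_key, user_id, expired_at) VALUES (?, ?, ?)",
        (key, user_id, now + ttl_seconds),
    )
    return key

def current_user(session_key, now=None):
    """Return the user_id for a live session, or None if missing or expired."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT user_id FROM session WHERE session_key = ? AND expired_at > ?",
        (session_key, now),
    ).fetchone()
    return row[0] if row else None

def logout(session_key):
    """Delete the session row so the cookie no longer resolves to a user."""
    conn.execute("DELETE FROM session WHERE session_key = ?", (session_key,))
```

On each request the server would read the session key from the cookie and call `current_user` to resolve it.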
10. Friendship Service
- Types of Friendship:
- One-way friendship: Friendship table with from_user_id and to_user_id.
- Two-way friendship: A record for each friendship pair.
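The two table shapes can be sketched with SQLite as follows. The one-way table uses the from_user_id and to_user_id columns from the notes; the table names and the two-way convention of storing one row per pair with the smaller id first are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One-way: a row means from_user_id follows to_user_id.
conn.execute(
    "CREATE TABLE follow (from_user_id INTEGER, to_user_id INTEGER,"
    " PRIMARY KEY (from_user_id, to_user_id))"
)
# Two-way: one row per pair, stored with the smaller id first (an assumption).
conn.execute(
    "CREATE TABLE friendship (user_a INTEGER, user_b INTEGER,"
    " PRIMARY KEY (user_a, user_b))"
)

def follow(src, dst):
    conn.execute("INSERT OR IGNORE INTO follow VALUES (?, ?)", (src, dst))

def following(user_id):
    rows = conn.execute(
        "SELECT to_user_id FROM follow WHERE from_user_id = ? ORDER BY to_user_id",
        (user_id,),
    )
    return [r[0] for r in rows]

def add_friend(x, y):
    a, b = min(x, y), max(x, y)  # normalize so each pair has exactly one row
    conn.execute("INSERT OR IGNORE INTO friendship VALUES (?, ?)", (a, b))

def friends(user_id):
    # The user may appear in either column, so query both sides.
    rows = conn.execute(
        "SELECT user_b FROM friendship WHERE user_a = ?"
        " UNION SELECT user_a FROM friendship WHERE user_b = ?",
        (user_id, user_id),
    )
    return sorted(r[0] for r in rows)
```

Storing one row per pair halves the storage but requires checking both columns on lookup; storing two rows per friendship is the usual alternative when lookups by a single column matter more.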
11. Database Technologies
- NoSQL (Cassandra):
- Scalable and fault-tolerant, with tunable consistency.
- Wide-column store whose design draws on Amazon Dynamo and Google Bigtable.
- RDBMS vs. Cassandra:
- RDBMS: Structured data handling.
- Cassandra: Unstructured data handling.
12. Friendship Storage in Databases
- Choosing the Right Database:
- SQL for structured data with easy indexing.
- NoSQL for distributed, auto-scaling, and replica capabilities.
13. Extended Friendship Modeling
- One-way Friendship:
- Redis: key = user_id, value = set of friend_user_id.
- Cassandra: row_key = user_id, column_key = friends_id.
- Two-way Friendship:
- Fetch the friend sets of users A and B, then compute their intersection in cache/memory.
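The set-per-user layout and the in-memory intersection can be sketched as below. A Python dict of sets stands in for the Redis layout (key = user_id, value = set of friend_user_id); in Redis itself the corresponding commands would be SADD to add a member and SINTER to intersect two sets. The function names are hypothetical.

```python
from collections import defaultdict

# key = user_id, value = set of friend_user_id (mirrors the Redis layout above)
friends_of = defaultdict(set)

def add_one_way(user_id, friend_id):
    """Record that user_id lists friend_id as a friend (one direction only)."""
    friends_of[user_id].add(friend_id)

def common_friends(a, b):
    """Intersect the two friend sets in memory."""
    return friends_of.get(a, set()) & friends_of.get(b, set())
```

For example, if user 1 lists friends {10, 11} and user 2 lists {11, 12}, `common_friends(1, 2)` yields the single shared friend 11.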
14. Performance Expectations
- For a user base exceeding 100M with an average of 1000 friends per user, the expected number of database queries should be fewer than 20.
Conclusion
The meeting provided insights into the design and optimization of distributed caching systems, emphasizing the importance of maintaining data consistency and the selection of appropriate database technologies based on use cases.