December 26, 2021 7:00 PM PST
This document summarizes a mock system design interview focused on the implementation of a rate limiter. The discussion covers functional and non-functional requirements, system design considerations, algorithms, and feedback from the interview process.
Requirements
Functional Requirements
- Limit the rate of requests:
  - By request type or by user.
  - The allowed frequency per minute varies by user.
- Services are deployed on multiple servers.
- Return a boolean; if it is false, reject the request with an HTTP 400 status. (Note: HTTP 429 Too Many Requests is the conventional status for rate limiting.)
- Limit based on requests per minute and per hour.
Non-Functional Requirements
- System design should consider external APIs.
- Input parameters for requests:
  - User ID
  - Request type
- Output:
  - Boolean indicating whether to process the request (true) or reject it with an HTTP error status (false).
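A minimal sketch of this interface, assuming fixed per-minute and per-hour windows kept in process memory; the class and method names (`RateLimiter`, `allow_request`) are illustrative, not from the interview:

```python
import time
from collections import defaultdict

class RateLimiter:
    """Illustrative fixed-window limiter: input (user_id, request_type),
    output a boolean as described in the requirements."""

    def __init__(self, per_minute: int, per_hour: int):
        self.per_minute = per_minute
        self.per_hour = per_hour
        # (user_id, request_type, window_seconds) -> {window_index: count}
        self.counts = defaultdict(lambda: defaultdict(int))

    def allow_request(self, user_id: str, request_type: str) -> bool:
        now = int(time.time())
        minute_key = (user_id, request_type, 60)
        hour_key = (user_id, request_type, 3600)
        if (self.counts[minute_key][now // 60] >= self.per_minute
                or self.counts[hour_key][now // 3600] >= self.per_hour):
            return False  # caller rejects the request with an HTTP error
        self.counts[minute_key][now // 60] += 1
        self.counts[hour_key][now // 3600] += 1
        return True
```

A request handler would call `allow_request(user_id, request_type)` and return the error status whenever it yields false. (Old window counters are never evicted here; a real implementation would expire them.)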
System Design
Database Schema
- APITable
  - Key: api_id
  - Value: rate_limit
- UserTable
  - Key: userId
  - Values: rate_limit, email, phone
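As a sketch, the two tables map onto record types like these; the field names follow the schema above, while the Python types are assumptions:

```python
from dataclasses import dataclass

@dataclass
class APIRecord:
    """APITable: keyed by api_id."""
    api_id: str
    rate_limit: int  # allowed requests per time interval for this API

@dataclass
class UserRecord:
    """UserTable: keyed by userId."""
    userId: str
    rate_limit: int  # per-user limit, overriding the API default
    email: str
    phone: str
```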
Low Latency Considerations
- Cache the rate count of each API per time interval.
- Cache user information and API information.
- Use multiple clusters globally.
Storage Types
- Cache options: Redis, Memcached.
- For each API, create a hash table (a sketch follows this list):
  - Key: userID
  - Value: request count in the current time interval.
- Storage calculation:
  - API_ID (2 bytes) + userID (8 bytes) + hash table overhead (20 bytes) = 30 bytes per stored entry.
  - Assuming 1,000 requests tracked per time interval, that is about 30 KB per user, or about 30 GB for 1 million users.
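A minimal sketch of the per-API hash table, which also implements the "cache the rate count of each API per time interval" idea from the Low Latency Considerations. It assumes Redis via the redis-py client; the key format `rate:{api_id}:{window}` is illustrative:

```python
import time
import redis  # assumes the redis-py package and a reachable Redis instance

r = redis.Redis()

def record_request(api_id: str, user_id: str, interval: int = 60) -> int:
    """Increment and return the user's request count for this API in the
    current time interval. One Redis hash per API per interval."""
    window = int(time.time()) // interval
    name = f"rate:{api_id}:{window}"      # key: one hash per API per interval
    count = r.hincrby(name, user_id, 1)   # field: userID, value: interval count
    r.expire(name, interval * 2)          # let old intervals age out
    return count
```

The limiter would then compare the returned count against the API's or user's rate_limit.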
User Identification
- IP addresses alone cannot reliably distinguish good users from bad ones; for example, many users behind a shared NAT or proxy present the same IP.
- Limiting per user helps manage request surges.
Algorithms
- For APIs served by a single server, a sliding window algorithm is effective (see the first sketch after this list).
- For APIs served by multiple servers, a token bucket method is recommended (see the second sketch after this list):
  - Allocate a share of the tokens to each server.
  - Decrement tokens locally for each request.
  - Servers must communicate with one another to reconcile token counts.
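A minimal single-server sketch of the sliding-window idea (the sliding-log variant, which stores a timestamp per request); the limit and window length are assumed parameters:

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self.log = defaultdict(deque)  # user_id -> timestamps of recent requests

    def allow(self, user_id: str) -> bool:
        now = time.monotonic()
        q = self.log[user_id]
        while q and q[0] <= now - self.window:
            q.popleft()                # drop requests outside the window
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```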
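And a minimal local token bucket for the multi-server case, assuming each server is allocated a share of the global budget (for example, the global limit divided by the number of servers); the cross-server communication about token counts is left out of this sketch:

```python
import time

class TokenBucket:
    def __init__(self, capacity: float, refill_per_second: float):
        self.capacity = capacity          # this server's share of the tokens
        self.tokens = capacity
        self.refill_per_second = refill_per_second
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_second)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0            # spend one token locally per request
            return True
        return False
```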
Discussions During the Interview
- The rate limiter should be positioned before requests hit the server.
- Eventual consistency can allow more requests to be processed in a time interval than the limit permits, which can drive token counts negative.
- Considerations for handling traffic spikes and request queuing.
- Rate limits can be applied at both the load balancer and server levels, allowing flexibility for different customer needs.
Feedback
Positive Aspects
- Effective requirement gathering.
- Clear API design and caching strategy.
- Good understanding of the pros and cons of using IP versus user-based limits.
Areas for Improvement
- Clarify the distinction between cluster and single machine setups, emphasizing that the rate limiter operates before requests reach the server.
- Provide a more detailed explanation of the sliding and fixed window algorithms, including their respective advantages and disadvantages.
- Consider the implications of high QPS (queries per second) on the rate limiter's design.
Conclusion
The interview provided valuable insights into the design and implementation of a rate limiter, highlighting the importance of balancing user needs, system performance, and security considerations. Further refinement of algorithm explanations and system architecture discussions will enhance understanding and application in real-world scenarios.