June 5, 2022 7:00 PM PDT
This document summarizes a mock system design interview focused on implementing an auto-complete feature for a search functionality. The discussion covered functional and non-functional requirements, system design considerations, and various technical approaches to handle high user demand and data processing.
Requirements
Functional Requirements
- User type prefix should provide suggestions.
- Display top 10 suggestions sorted by popularity.
- Support English only, accommodating both lower and upper case.
- Build out history data.
- Prefix length should be less than 20 characters.
Non-Functional Requirements
- Real-time response.
- Based on one week of history data.
- Scalability to support 100 million users.
- High availability (HA).
- No strong consistency required.
System Design
External APIs
- Assumed 100 million daily active users (DAU).
- Each user performs 5 queries per day.
- Calculated write queries per second (QPS):
- ( \text{Write QPS} = \frac{100M \times 5}{24 \times 3600} = 10,000 )
- Calculated read QPS:
- ( \text{Read QPS} = 10^4 \times 20 = 200,000 )
- Estimated storage requirement:
- ( 20 \times 100M \times 5 \times 20% = 2G ) per day.
Database Schema
- Use a relational database management system (RDBMS) to satisfy functional requirements.
- API endpoints:
GET searchText=xxxx
GET prefix=a
- Response format:
{words: [apple, act, action]}
System Components
- Data Collection: Collect search data to provide suggestions.
- Search Results for Autocomplete: Generate suggestions based on collected data.
- Queue System: Buffer search requests to manage load.
- Frequency Counting: Track the frequency of each search term over the past 7 days.
- Aggregation Service: Use batch jobs for data aggregation.
- Trie Structure: Implement a trie to optimize search space and speed.
- Key-Value Store: Store prefixes as keys for efficient retrieval.
- Caching: Use Redis as a caching layer to store top suggestions.
Caching Strategy
- All data is stored in the database; cache only retains the top 10 suggestions.
- Cache misses may return an empty string if no suggestions exist for a prefix.
- Cache updates can occur every hour.
Feedback and Discussion Points
Interviewer Feedback
- Covered requirements thoroughly.
- Demonstrated strong functional system understanding.
- Good grasp of real-time processing and scalability.
- Effective use of trie data structure and caching for performance.
- Suggested that the interviewee spent too much time on the basic design.
Audience Insights
- Discussed the necessity of a two-step solution: basic and advanced.
- Emphasized the importance of keeping the solution fresh and relevant.
- Raised questions about data storage and aggregation methods.
- Suggested combining batch and streaming jobs for efficiency.
- Discussed the implications of sudden news events on data updates.
Technical Considerations
- The final solution leaned towards using key-value pairs over trie due to existing solutions.
- Consideration of RDBMS vs. NoSQL for high throughput.
- Discussed the need for low latency in the system.
- Explored methods for calculating top K suggestions and the frequency of updates.
Conclusion
The interview highlighted the complexities involved in designing a scalable and efficient auto-complete system. Key takeaways include the importance of a robust data structure, effective caching strategies, and the need for a balance between real-time updates and batch processing. The discussion provided valuable insights into both the technical and strategic aspects of system design.