December 4, 2022 6:00 PM PST
This document summarizes a mock system design interview on building a recommendation system for short videos on platforms such as YouTube or TikTok. The interview assessed the candidate's ability to design a scalable system that handles a large number of users and video interactions.
Requirements
Functional Requirements
- Design a recommendation system for short videos.
- Daily Active Users (DAU): 100 million, scalable to 1 billion.
- Average video length: 5-10 minutes; average user browsing time: 1 hour.
- Provide a customized list of short video recommendations for users, assuming 20-50 videos per list.
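The figures above allow a rough load estimate. The sketch below is a back-of-the-envelope calculation only, and it assumes things the interview did not state: every user spends the full hour watching, every video is watched to the end, and traffic is spread evenly over the day. The function names are illustrative.

```python
# Back-of-the-envelope load estimate from the stated requirements.
# Assumptions (not from the interview): no skipping, uniform traffic.
SECONDS_PER_DAY = 24 * 60 * 60


def daily_views(dau: int, browse_minutes: float, video_minutes: float) -> float:
    """Videos watched per day if each user fills their browsing time with videos."""
    return dau * (browse_minutes / video_minutes)


def estimate(dau: int) -> dict:
    """Return (low, high) bounds using 10-minute and 5-minute videos."""
    low = daily_views(dau, browse_minutes=60, video_minutes=10)
    high = daily_views(dau, browse_minutes=60, video_minutes=5)
    return {
        "views_per_day": (low, high),
        "avg_views_per_second": (low / SECONDS_PER_DAY, high / SECONDS_PER_DAY),
    }


if __name__ == "__main__":
    for dau in (100_000_000, 1_000_000_000):
        print(dau, estimate(dau))
```

Under these assumptions, 100 million DAU works out to 600 million to 1.2 billion views per day (roughly 6,900 to 13,900 per second on average), and 1 billion DAU is ten times that. Real traffic peaks well above the average, so these numbers are a floor for capacity planning rather than a target.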
Non-Functional Requirements
- Focus on performance and availability.
- Recommendation algorithm is the primary focus, not the entire system.
System Design
External APIs
- Use external APIs to collect data and to handle interaction with clients.
Feature Generation
- Collect three types of information:
  - User profile
  - Video profile
  - User-video interaction
- Use a NoSQL database (e.g., Redis) to store this information.
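As an illustration of the three record types and a key-value layout for them, here is a minimal sketch. The field names, the key scheme (`user:<id>`, `video:<id>`, `interaction:<user>:<video>`), and the `FeatureStore` class are all hypothetical; an in-memory dictionary stands in for the NoSQL store so the example runs without any external service.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class UserProfile:
    user_id: str
    interests: list = field(default_factory=list)  # hypothetical field


@dataclass
class VideoProfile:
    video_id: str
    tags: list = field(default_factory=list)  # hypothetical field
    duration_s: int = 0


@dataclass
class Interaction:
    user_id: str
    video_id: str
    action: str  # e.g. "watch", "like", "unlike"


class FeatureStore:
    """In-memory stand-in for a key-value NoSQL store (values are JSON strings)."""

    def __init__(self):
        self._kv = {}

    def put(self, key: str, record) -> None:
        self._kv[key] = json.dumps(asdict(record))

    def get(self, key: str) -> Optional[dict]:
        raw = self._kv.get(key)
        return json.loads(raw) if raw is not None else None


if __name__ == "__main__":
    store = FeatureStore()
    store.put("user:u1", UserProfile("u1", ["cooking"]))
    store.put("video:v1", VideoProfile("v1", ["cooking", "pasta"], 420))
    store.put("interaction:u1:v1", Interaction("u1", "v1", "like"))
    print(store.get("interaction:u1:v1"))
```

Serializing each record to a single value keyed by ID keeps lookups to one read per entity, which is the access pattern a key-value store handles well.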
Candidate Generation and Ranking
- Run candidate generation first to narrow the full video catalog down to a manageable set of candidates.
- Rank those candidates and keep the top results.
- Apply filtering to enforce business logic (e.g., removing inappropriate content) and to ensure diversity in the final list.
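The three stages can be sketched as a small pipeline. This is a toy version under stated assumptions, not the design presented in the interview: candidates are recalled by tag overlap, the score is tag overlap plus a precomputed popularity value, and diversity is enforced with a per-category cap. The `Video` fields, the `flagged` moderation flag, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    video_id: str
    category: str
    tags: frozenset
    popularity: float      # assumed precomputed signal in [0, 1]
    flagged: bool = False  # assumed moderation flag


def generate_candidates(videos, interests):
    """Cheap recall step: keep videos sharing at least one tag with the user."""
    return [v for v in videos if v.tags & interests]


def score(video, interests):
    """Toy score; a production ranker would be a trained model."""
    return len(video.tags & interests) + video.popularity


def rank(candidates, interests):
    return sorted(candidates, key=lambda v: score(v, interests), reverse=True)


def filter_ranked(ranked, list_size=20, max_per_category=2):
    """Drop flagged videos and cap each category to keep the list diverse."""
    result, per_category = [], {}
    for v in ranked:
        if v.flagged or per_category.get(v.category, 0) >= max_per_category:
            continue
        per_category[v.category] = per_category.get(v.category, 0) + 1
        result.append(v)
        if len(result) == list_size:
            break
    return result


def recommend(videos, interests, list_size=20):
    interests = frozenset(interests)
    candidates = generate_candidates(videos, interests)
    return filter_ranked(rank(candidates, interests), list_size)
```

Filtering runs last so that the ranker's order is preserved among the videos that survive, and the category cap only ever removes lower-ranked duplicates.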
Handling New Uploads
- Precompute recommendations overnight in a batch job over static data.
- For newly uploaded videos, score only the new videos (the delta) and rerank, rather than recomputing the full list.
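One way to realize the delta-and-rerank idea is to merge the overnight list with the freshly scored uploads. The sketch below assumes both lists hold `(score, video_id)` pairs on a comparable score scale and that the overnight list is already sorted by descending score; `merge_delta` is an illustrative name.

```python
import heapq
from itertools import islice


def merge_delta(precomputed, new_scored, list_size=20):
    """Merge an overnight ranked list with freshly scored uploads.

    `precomputed` must already be sorted by descending score, so only
    the (small) delta is sorted here before the two lists are merged.
    """
    delta = sorted(new_scored, key=lambda item: item[0], reverse=True)
    merged = heapq.merge(precomputed, delta, key=lambda item: item[0], reverse=True)
    return [video_id for _, video_id in islice(merged, list_size)]


if __name__ == "__main__":
    overnight = [(0.9, "v1"), (0.7, "v2"), (0.4, "v3")]
    uploads = [(0.5, "n1"), (0.8, "n2")]
    print(merge_delta(overnight, uploads, list_size=4))
```

Because the merge is lazy and stops after `list_size` items, the cost is driven by the size of the delta and the list length, not by the size of the overnight result.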
Database Considerations
- Discussed the implications of new uploads on read/write ratios.
- Considered SQL for handling delta changes but concluded that NoSQL would be more suitable due to performance concerns with concurrent reads and writes.
Dynamic Recommendations
- Identified challenges in data collection and performance for candidate generation and ranking.
- Suggested training the model offline and serving the pre-trained model, with Kafka handling the stream of incoming interaction data.
User Interaction
- Addressed how the algorithm responds to user actions (e.g., clicking "unlike").
- Discussed the need for device IDs in user-video interactions to differentiate between multiple clients.
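To make the device-ID point concrete, the sketch below keys events by `(user_id, device_id)` and then merges them per user, letting the most recent action on a video win. It is an assumption that "unlike" should remove the video from that user's list; the timestamp field, the `InteractionLog` class, and `apply_feedback` are hypothetical names introduced for illustration.

```python
from collections import defaultdict


class InteractionLog:
    """Stores events per (user, device) and derives a merged per-user view."""

    def __init__(self):
        # (user_id, device_id) -> list of (timestamp, video_id, action)
        self._events = defaultdict(list)

    def record(self, user_id, device_id, video_id, action, ts):
        self._events[(user_id, device_id)].append((ts, video_id, action))

    def latest_actions(self, user_id):
        """Latest action per video for this user, merged across all devices."""
        latest = {}
        for (uid, _device), events in self._events.items():
            if uid != user_id:
                continue
            for ts, video_id, action in events:
                if video_id not in latest or ts > latest[video_id][0]:
                    latest[video_id] = (ts, action)
        return {vid: action for vid, (_ts, action) in latest.items()}


def apply_feedback(ranked_ids, log, user_id):
    """Drop videos whose most recent action by this user is 'unlike'."""
    actions = log.latest_actions(user_id)
    return [vid for vid in ranked_ids if actions.get(vid) != "unlike"]


if __name__ == "__main__":
    log = InteractionLog()
    log.record("u1", "phone", "v2", "like", ts=1)
    log.record("u1", "tablet", "v2", "unlike", ts=2)
    print(apply_feedback(["v1", "v2", "v3"], log, "u1"))
```

Keeping the device ID in the key preserves the option of device-specific recommendations later, while the merged view supports the user-based approach.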
Component Distribution
- Identified distributed components (e.g., load balancer, application server, user profile) versus centralized components (e.g., ranker, filter).
- Suggested that candidate generation could benefit from a pre-trained model to improve performance.
Feedback
Interviewer Feedback
- The overall structure of the system design was correct, but some details contained minor errors.
- Suggested improving how the system scales and how read/write ratios are reasoned about.
Audience Feedback
- Discussed the importance of dynamic recommendations and the trade-offs between device-based and user-based approaches.
- Emphasized the need for multiple-choice questions during interviews to guide the discussion effectively.
Conclusion
The interview highlighted the complexities of designing a recommendation system, including the need for scalability, efficient data handling, and user interaction management. The candidate demonstrated a solid understanding of the key components and considerations necessary for building such a system.