July 17, 2022 8:00 PM PDT
This document summarizes a mock system design interview focused on creating a news aggregator similar to Google News. The discussion covered functional and scaling requirements, system architecture, and ranking mechanisms for delivering personalized news feeds to users. The interview aimed to assess the interviewee's understanding of system design principles and their ability to communicate effectively with the interviewer.
Functional Requirements
- News Aggregator: A system that aggregates news from multiple sources.
- Subscription Service: Users can subscribe to different categories of news topics.
- News Feed: Users receive a personalized news feed based on their subscriptions.
- Sources: The system crawls five different news sources.
System Design
Components
- Frontend: Serves the news feed to users.
- Backend: Includes a crawler to fetch news and a ranking service to prioritize news items.
External APIs
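- getNewsFeed(userId): Returns the personalized news feed for the given user.
- subscribe(userId, topicList): Subscribes the user to the listed news topics.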
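The two calls above could be sketched as a small in-memory service. This is only an illustration, not the design presented in the interview: the NewsItem fields, the add_item helper, and the limit parameter are assumptions, and the method names are snake_case equivalents of getNewsFeed and subscribe.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsItem:
    # Hypothetical record shape; the source only says items hold a URL and similar metadata.
    news_id: int
    url: str
    topic: str


class NewsService:
    """In-memory stand-in for the backend behind the two external APIs."""

    def __init__(self):
        self._subscriptions = defaultdict(set)  # user_id -> subscribed topics
        self._items = []                        # all known items, oldest first

    def add_item(self, item):
        # Hypothetical helper standing in for the crawler writing a new item.
        self._items.append(item)

    def subscribe(self, user_id, topic_list):
        # Mirrors subscribe(userId, topicList).
        self._subscriptions[user_id].update(topic_list)

    def get_news_feed(self, user_id, limit=20):
        # Mirrors getNewsFeed(userId); `limit` is an assumed page size.
        topics = self._subscriptions.get(user_id, set())
        matching = [item for item in self._items if item.topic in topics]
        return list(reversed(matching))[:limit]  # newest first


service = NewsService()
service.add_item(NewsItem(1, "https://example.com/a", "tech"))
service.add_item(NewsItem(2, "https://example.com/b", "sports"))
service.subscribe("u1", ["tech"])
feed = service.get_news_feed("u1")  # contains only the "tech" item
```

A real service would put these calls behind an HTTP or RPC layer and read from the ranking service rather than filtering a list on every request.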
Crawler
- Responsible for fetching news from various sources.
- Stores only links in the database, not the content.
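A minimal sketch of the "links only" idea, assuming each source exposes an RSS feed (the summary does not say how sources are read). The table schema, the sample feed, and the function names are hypothetical, and the network fetch is left out so the example stays self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed sample payload; a real crawler would download this from the source.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <item><title>Rates rise</title><link>https://example.com/a</link></item>
  <item><title>Cup final</title><link>https://example.com/b</link></item>
</channel></rss>"""


def extract_links(rss_xml):
    """Return (title, url) pairs; article bodies are never read or stored."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link"))
        for item in root.iter("item")
        if item.findtext("link")
    ]


def store_links(conn, source, links):
    """Persist only the link and light metadata; the URL is the dedup key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news "
        "(url TEXT PRIMARY KEY, title TEXT, source TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO news (url, title, source) VALUES (?, ?, ?)",
        [(url, title, source) for title, url in links],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_links(conn, "example-source", extract_links(SAMPLE_RSS))
```

Using the URL as the primary key means re-crawling the same source does not create duplicate rows.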
Scaling Requirements
- Each news item requires approximately 200 bytes (URL, etc.).
- The system must retrieve the latest news within 5 seconds.
- Designed to support 300 million daily active users (DAU).
Performance Metrics
- Queries Per Second (QPS): 300 million DAU * 10 requests per user per day = 3 billion requests per day, or roughly 35,000 QPS on average (3 billion / 86,400 seconds).
- Cache Strategy:
  - Cache for hot news items.
  - Store a list of news IDs for quick access.
  - Consider trade-offs between latency and user experience.
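The load estimate and the cache idea above can be sketched together. This is a rough illustration under stated assumptions: traffic is treated as evenly spread over the day, the hot-item cache uses LRU eviction (the summary does not name an eviction policy), and HotNewsCache, resolve_feed, and load_from_db are hypothetical names.

```python
from collections import OrderedDict

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(daily_active_users, requests_per_user_per_day):
    """Average requests per second, assuming uniform traffic across the day."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


class HotNewsCache:
    """LRU cache mapping news_id -> item record (eviction policy is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, news_id):
        if news_id not in self._items:
            return None
        self._items.move_to_end(news_id)  # mark as most recently used
        return self._items[news_id]

    def put(self, news_id, item):
        self._items[news_id] = item
        self._items.move_to_end(news_id)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


def resolve_feed(news_ids, cache, load_from_db):
    """Turn a stored list of news IDs into items, reading the cache first."""
    feed = []
    for news_id in news_ids:
        item = cache.get(news_id)
        if item is None:
            item = load_from_db(news_id)  # hypothetical slow path
            cache.put(news_id, item)
        feed.append(item)
    return feed


qps = average_qps(300_000_000, 10)  # about 34,700 requests per second
```

Storing each feed as a list of IDs keeps per-user data small, while the shared cache absorbs repeated reads of the same popular items. Peak traffic would be higher than this average, so capacity planning needs headroom beyond the figure computed here.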
Ranking Mechanism
- Core Inputs: Source, news, user preferences, and category.
- Top News Selection: Based on crawling data and user interactions (likes, comments, shares).
- Machine Learning Model: Used to generate a personalized feed of 500 news items for each user.
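As an illustration of how the inputs above might combine, the sketch below swaps the machine learning model for a hand-written weighted score. It is a stand-in, not the model discussed in the interview: the weights, the Candidate fields, and the source_weights mapping are all assumptions. Only the feed size of 500 comes from the summary.

```python
import heapq
from dataclasses import dataclass

FEED_SIZE = 500  # per-user feed size mentioned in the summary


@dataclass
class Candidate:
    # Hypothetical feature record for one crawled news item.
    news_id: int
    source: str
    category: str
    likes: int = 0
    comments: int = 0
    shares: int = 0


def score(item, subscribed_categories, source_weights):
    """Heuristic stand-in for the ML model; all weights are assumed values."""
    engagement = item.likes + 2 * item.comments + 3 * item.shares
    category_boost = 2.0 if item.category in subscribed_categories else 1.0
    source_weight = source_weights.get(item.source, 1.0)
    return (1 + engagement) * category_boost * source_weight


def rank_feed(candidates, subscribed_categories, source_weights, feed_size=FEED_SIZE):
    """Return the top `feed_size` candidates, highest score first."""
    return heapq.nlargest(
        feed_size,
        candidates,
        key=lambda c: score(c, subscribed_categories, source_weights),
    )


candidates = [
    Candidate(1, "s1", "sports", likes=10),
    Candidate(2, "s1", "tech", likes=5),
    Candidate(3, "s2", "tech"),
]
top = rank_feed(candidates, {"tech"}, {"s2": 0.5}, feed_size=2)
```

heapq.nlargest avoids sorting the full candidate set when only the top items are needed, which matters when the candidate pool is much larger than the feed.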
Interviewer and Audience Feedback
- The interviewee demonstrated a good understanding of system design but needed to communicate more effectively with the interviewer to clarify requirements.
- Suggestions included:
  - Focusing on urgent news optimization.
  - Exploring database design in more depth.
  - Considering both push and pull mechanisms for news delivery.
Additional Discussion Points
- Crawling Latency: Acknowledged as a challenge but not a core requirement.
- Hybrid Model: Combining pull and push strategies for news delivery.
- Personalization Depth: Discussed the potential use of clickstream data for enhanced personalization.
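One way the hybrid push and pull model could look is sketched below. The split used here, pushing urgent items to subscriber inboxes at publish time and pulling everything else at read time, is an assumption that connects the hybrid model to the urgent-news suggestion; the summary does not specify where the boundary lies. The class and its fields are hypothetical.

```python
from collections import defaultdict


class HybridFeed:
    """Push urgent items on write; pull regular items on read (assumed split)."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # topic -> user ids
        self.topics_of = defaultdict(set)    # user id -> topics
        self.by_topic = defaultdict(list)    # topic -> news ids (pull path)
        self.inbox = defaultdict(list)       # user id -> pushed news ids

    def subscribe(self, user_id, topic_list):
        for topic in topic_list:
            self.subscribers[topic].add(user_id)
            self.topics_of[user_id].add(topic)

    def publish(self, news_id, topic, urgent=False):
        if urgent:
            # Push path: fan out to every current subscriber of the topic.
            for user_id in self.subscribers[topic]:
                self.inbox[user_id].append(news_id)
        else:
            # Pull path: store once per topic; readers fetch it later.
            self.by_topic[topic].append(news_id)

    def get_news_feed(self, user_id):
        pushed = list(self.inbox[user_id])
        pulled = [
            news_id
            for topic in sorted(self.topics_of[user_id])
            for news_id in self.by_topic[topic]
        ]
        return pushed + pulled  # urgent items come first


feed = HybridFeed()
feed.subscribe("u1", ["tech"])
feed.publish(1, "tech")
feed.publish(2, "tech", urgent=True)
```

Push keeps urgent items fast to read at the cost of write fan-out, while pull keeps writes cheap and defers the work to read time.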
Conclusion
The interview highlighted the complexities involved in designing a scalable news aggregator system. It emphasized the importance of clear communication during the design process and the need to balance functional requirements with performance metrics. The feedback provided valuable insights for improving the interviewee's approach to system design discussions.