February 6, 2022 7:00 PM PST
This document summarizes a mock system design interview on a cloud file storage system. The discussion covered system architecture, functional and non-functional requirements, API design, and implementation challenges. The interview assessed the candidate's ability to design a scalable and reliable cloud storage solution.
Requirements
Functional Requirements
- File Operations:
- Edit, download, and upload files (similar to Google Drive).
- Sync local folders with cloud folders (similar to Dropbox).
- Manage permissions (public/private).
- List, view, upload, download, and delete files in the client/web interface.
Non-Functional Requirements
- Availability: 99.9%
- Reliability: Ensure no loss of user data.
- Latency: Real-time user experience for listing files.
- Capacity:
- Daily Active Users (DAU): 10M
- Total Users: 50M
- Average storage per user: 1 GB
- Total storage requirement: 50M users * 1 GB = 50M GB (50 PB)
- Network Bandwidth:
- Average upload per user: 100 MB per day
- Upload bandwidth: 10M users * 100 MB / (24h * 3600s) ≈ 11.6 GB/s
- Download bandwidth: 20M users * 100 MB / (24h * 3600s) ≈ 23.1 GB/s
- Files per User: With 1 GB stored per user, an average file size of 50 MB leads to approximately 20 files per user.
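The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 GB = 1000 MB, 1 PB = 1,000,000 GB) and treats the 100 MB upload figure as a per-day value; the variable names are illustrative.

```python
# Back-of-envelope capacity estimate using the figures from the notes.
# Assumption: decimal units (1 GB = 1000 MB, 1 PB = 1_000_000 GB).
SECONDS_PER_DAY = 24 * 3600

total_users = 50_000_000
daily_active_users = 10_000_000
storage_per_user_gb = 1
upload_per_user_mb = 100      # assumed to be per user per day
avg_file_size_mb = 50

total_storage_pb = total_users * storage_per_user_gb / 1_000_000
upload_gb_per_s = daily_active_users * upload_per_user_mb / SECONDS_PER_DAY / 1000
# The notes use 20M users for the download estimate; that figure is taken as given.
download_gb_per_s = 20_000_000 * upload_per_user_mb / SECONDS_PER_DAY / 1000
files_per_user = storage_per_user_gb * 1000 / avg_file_size_mb

print(f"Total storage:  {total_storage_pb:.0f} PB")
print(f"Upload:         {upload_gb_per_s:.1f} GB/s")
print(f"Download:       {download_gb_per_s:.1f} GB/s")
print(f"Files per user: {files_per_user:.0f}")
```

These are daily averages; peak traffic would likely be several times higher, which the notes do not quantify.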
System Design
API Design
- UploadFileRequest:
(user_token, file_path, file(data_stream), description, title, permission)
- DownloadFileRequest:
(user_token, file_path) -> file(data_stream)
- DeleteFile:
(user_token, file_path)
- ListFile:
(user_token, folder_path, pagination, sort) -> list<file_meta_data>
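The four calls above could be sketched as methods on a service class. The following is a toy in-memory version, not the design discussed in the interview: the class name `InMemoryFileService`, the `FileMeta` fields, the `page`/`page_size` pagination parameters, and the shortcut of treating the token as the user id are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileMeta:
    file_path: str
    title: str
    description: str
    permission: str  # "public" or "private"
    size_bytes: int


class InMemoryFileService:
    """Hypothetical stand-in for the backend; stores everything in a dict."""

    def __init__(self) -> None:
        # (user_id, file_path) -> (metadata, file bytes)
        self._files: Dict[Tuple[str, str], Tuple[FileMeta, bytes]] = {}

    def _authenticate(self, user_token: str) -> str:
        # Assumption: the token doubles as the user id. A real service
        # would verify the token and look up the user.
        if not user_token:
            raise PermissionError("missing user token")
        return user_token

    def upload_file(self, user_token: str, file_path: str, data: bytes,
                    description: str = "", title: str = "",
                    permission: str = "private") -> FileMeta:
        user_id = self._authenticate(user_token)
        if permission not in ("public", "private"):
            raise ValueError("permission must be 'public' or 'private'")
        meta = FileMeta(file_path, title, description, permission, len(data))
        self._files[(user_id, file_path)] = (meta, data)
        return meta

    def download_file(self, user_token: str, file_path: str) -> bytes:
        user_id = self._authenticate(user_token)
        return self._files[(user_id, file_path)][1]

    def delete_file(self, user_token: str, file_path: str) -> None:
        user_id = self._authenticate(user_token)
        del self._files[(user_id, file_path)]

    def list_files(self, user_token: str, folder_path: str,
                   page: int = 0, page_size: int = 50,
                   sort: str = "file_path") -> List[FileMeta]:
        user_id = self._authenticate(user_token)
        prefix = folder_path.rstrip("/") + "/"
        matches = [meta for (owner, path), (meta, _) in self._files.items()
                   if owner == user_id and path.startswith(prefix)]
        # Sort by any FileMeta field name, then slice out the requested page.
        matches.sort(key=lambda meta: getattr(meta, sort))
        start = page * page_size
        return matches[start:start + page_size]
```

For example, after `svc.upload_file("alice", "/docs/a.txt", b"a")`, calling `svc.list_files("alice", "/docs")` returns a one-element list holding that file's metadata.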
Data Storage
- FileMetadataTable (NoSQL + Transaction):
- Fields:
file_id
,user_id
,permission
,title
,description
,file_path
,list<chunk_address>
,creation_time
,last_updated_time
.
- Fields:
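One way to picture a row of the FileMetadataTable is as an immutable record. The sketch below mirrors the listed fields; the Python types, the `chunk_addresses` name for `list<chunk_address>`, and the `with_chunks` helper are assumptions for illustration, since the notes do not specify a schema language or storage engine.

```python
import time
from dataclasses import dataclass, replace
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class FileMetadata:
    file_id: str
    user_id: str
    permission: str                         # "public" or "private"
    title: str
    description: str
    file_path: str
    chunk_addresses: Tuple[str, ...] = ()   # ordered chunk locations
    creation_time: float = 0.0              # Unix timestamps (assumed)
    last_updated_time: float = 0.0


def with_chunks(record: FileMetadata, chunk_addresses: Iterable[str],
                now: Optional[float] = None) -> FileMetadata:
    """Return a copy of the record pointing at a new chunk list.

    The original record is left untouched, so a writer can build the new
    version and swap it in as a single step inside a transaction.
    """
    timestamp = time.time() if now is None else now
    return replace(record, chunk_addresses=tuple(chunk_addresses),
                   last_updated_time=timestamp)
```

Keeping the chunk list inside the metadata record means a file update is one record swap, which is where the transaction support mentioned above would matter.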
File Upload Strategies
- Chunked Upload:
- Pros: Can resume uploads if interrupted.
- Cons: Requires client-side support and increases CPU usage.
- Full File Upload:
- Pros: Easier to implement, supports web clients.
- Cons: If upload fails, the entire process must restart.
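The resume property of chunked upload can be illustrated with a small simulation. This is a sketch under stated assumptions: `UploadSession`, `upload`, and the `fail_after` parameter (which simulates a dropped connection) are hypothetical names, and the per-chunk SHA-256 check is one possible integrity scheme, not something the notes prescribe.

```python
import hashlib
from typing import Dict, Iterator, List, Optional, Tuple


def split_chunks(data: bytes, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, chunk) pairs covering the whole payload."""
    for offset in range(0, len(data), chunk_size):
        yield offset // chunk_size, data[offset:offset + chunk_size]


class UploadSession:
    """Server-side record of which chunks have arrived so far."""

    def __init__(self, total_chunks: int) -> None:
        self.total_chunks = total_chunks
        self.received: Dict[int, bytes] = {}

    def put_chunk(self, index: int, chunk: bytes, checksum: str) -> None:
        if hashlib.sha256(chunk).hexdigest() != checksum:
            raise ValueError(f"checksum mismatch for chunk {index}")
        self.received[index] = chunk

    def missing(self) -> List[int]:
        return [i for i in range(self.total_chunks) if i not in self.received]

    def assemble(self) -> bytes:
        if self.missing():
            raise RuntimeError("upload incomplete")
        return b"".join(self.received[i] for i in range(self.total_chunks))


def upload(data: bytes, session: UploadSession, chunk_size: int,
           fail_after: Optional[int] = None) -> int:
    """Send only the chunks the server lacks; return how many were sent.

    fail_after simulates a network drop after that many chunks.
    """
    missing = set(session.missing())
    sent = 0
    for index, chunk in split_chunks(data, chunk_size):
        if index not in missing:
            continue  # already on the server, skip on resume
        if fail_after is not None and sent >= fail_after:
            raise ConnectionError("simulated network drop")
        session.put_chunk(index, chunk, hashlib.sha256(chunk).hexdigest())
        sent += 1
    return sent
```

A second call to `upload` after a failure sends only the chunks listed by `session.missing()`, whereas a full file upload would start again from the first byte. The per-chunk hashing also hints at the extra client CPU cost noted above.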
Syncing Mechanisms
- WebSocket:
- Pros: Bidirectional communication for uploads/downloads.
- Cons: High resource usage to maintain connections.
- Pull/Long Polling:
- Pros: Cost-effective and easy to implement.
- Cons: Requires additional requests for downloads.
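The pull-based option can be sketched as a cursor over a server-side change log. This is an illustrative model only: `ChangeLog`, `Change`, and the `poll(cursor, limit)` signature are assumed names, and a real long-polling endpoint would hold the request open until a change arrives rather than return immediately.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Change:
    seq: int          # monotonically increasing sequence number
    file_path: str
    action: str       # e.g. "upload" or "delete"


class ChangeLog:
    """Append-only log of file changes that clients poll with a cursor."""

    def __init__(self) -> None:
        self._changes: List[Change] = []

    def record(self, file_path: str, action: str) -> Change:
        change = Change(len(self._changes) + 1, file_path, action)
        self._changes.append(change)
        return change

    def poll(self, cursor: int, limit: int = 100) -> Tuple[List[Change], int]:
        """Return changes with seq > cursor, plus the cursor to use next."""
        # seq starts at 1, so entries after `cursor` begin at list index `cursor`.
        batch = self._changes[cursor:cursor + limit]
        new_cursor = batch[-1].seq if batch else cursor
        return batch, new_cursor
```

The poll response carries only change descriptions, so the client still has to issue a separate download request for each changed file, which is the extra-request cost listed above.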
Handling Concurrent Modifications
- Implement a File Processor to manage metadata and deduplication.
- Use a Fanout Message Queue for notifications about updated chunks.
- Deduplication options:
- Based on a checksum of the content (recommended).
- Based on comparing the entire data.
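The File Processor, checksum-based deduplication, and fanout notification can be sketched together. Everything here is a simplified assumption: `ChunkStore`, `Fanout`, and `FileProcessor` are hypothetical names, SHA-256 is one reasonable checksum choice, and plain in-memory lists stand in for a real message queue.

```python
import hashlib
from typing import Dict, List, Tuple


class ChunkStore:
    """Content-addressed store: a chunk's checksum is its address."""

    def __init__(self) -> None:
        self._chunks: Dict[str, bytes] = {}

    def put(self, chunk: bytes) -> Tuple[str, bool]:
        """Store the chunk if unseen; return (address, was_new)."""
        address = hashlib.sha256(chunk).hexdigest()
        is_new = address not in self._chunks
        if is_new:
            self._chunks[address] = chunk
        return address, is_new

    def get(self, address: str) -> bytes:
        return self._chunks[address]

    def __len__(self) -> int:
        return len(self._chunks)


class Fanout:
    """Copies every published message into each subscriber's queue."""

    def __init__(self) -> None:
        self._queues: Dict[str, List[str]] = {}

    def subscribe(self, device_id: str) -> None:
        self._queues[device_id] = []

    def publish(self, message: str) -> None:
        for queue in self._queues.values():
            queue.append(message)

    def drain(self, device_id: str) -> List[str]:
        messages, self._queues[device_id] = self._queues[device_id], []
        return messages


class FileProcessor:
    """Deduplicates chunks and notifies subscribers about new ones."""

    def __init__(self, store: ChunkStore, fanout: Fanout) -> None:
        self.store = store
        self.fanout = fanout

    def process(self, chunks: List[bytes]) -> List[str]:
        addresses = []
        for chunk in chunks:
            address, is_new = self.store.put(chunk)
            addresses.append(address)
            if is_new:
                # Only chunks not seen before trigger a notification.
                self.fanout.publish(address)
        return addresses
```

Comparing fixed-size checksums avoids reading stored data back for a byte-by-byte comparison, which is presumably why the checksum option was recommended. A production system would also need to account for hash collisions, which the notes do not discuss.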
Feedback and Discussion Points
- The interviewer emphasized arriving at a working solution, with time budgeted for requirements gathering, system architecture design, and the normal cases.
- Audience feedback highlighted the importance of soliciting regular feedback during the interview process.
- The design should consider scalability, availability, and reliability, with discussions on high availability and the implications of different availability levels.
- Suggestions included using Server-Sent Events (SSE) for better connection management and considering the implications of chunking on client-side processing.
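To make the point about availability levels concrete, the allowed downtime for a given target can be computed directly. This is a small sketch assuming a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum downtime per year that still meets the availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.1f} minutes/year ({minutes / 60:.2f} hours)")
```

The 99.9% target from the requirements allows roughly 8.76 hours of downtime per year, while each additional nine cuts that budget by a factor of ten.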
Conclusion
The mock interview provided valuable insights into the design of a cloud file storage system, emphasizing the importance of scalability, reliability, and user experience. The candidate demonstrated a solid understanding of system architecture and the challenges involved in implementing a robust cloud storage solution.