November 13, 2022 6:00 PM PST
This document summarizes the key points discussed during a design interview focused on a cloud storage service. The interview covered various aspects including requirements gathering, system design, API definitions, and architectural choices. The goal was to create a scalable, durable, and highly available cloud storage solution that supports file uploads, downloads, and synchronization across multiple clients.
Requirements Gathering
Basic Requirements
- Functional Requirements:
  - Support for directories
  - Upload files
  - Download files
  - File synchronization across multiple clients
- Out of Scope:
  - Permission management
  - File sharing
  - Notifications
Scale Estimates
- User Base:
  - 50 million signed-up users
  - 10 million daily active users
- Storage and Usage:
  - 10 GB of free space per user
  - Average of 2 uploads per day, with an average file size of 500 KB
  - 1:1 read-to-write ratio
  - Average of 1000 files per user
Capacity Planning
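- Total Storage Requirement:
  - 50 million users * 10 GB = 500 PB (an upper bound that assumes every user fills the free quota)
  - Estimated number of files: 10 billion (1000 files * 10 million users). This counts daily active users only; counting all 50 million signed-up users would give 50 billion.
  - Metadata storage requirement: 2 TB (10 billion files * 200 bytes of metadata per file)
- Bandwidth Requirements:
  - Upload rate: 10 million daily active users * 2 uploads per day / 86,400 seconds per day is roughly 230 uploads per second, rounded to 200
  - 200 uploads per second * 0.5 MB per upload = 100 MB per second
  - With the 1:1 read-to-write ratio, download bandwidth is about the same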
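The estimates above can be reproduced with a few lines of arithmetic. The sketch below uses only the figures from the notes and assumes decimal units (1 KB = 1,000 bytes); the variable names are illustrative. Without rounding, the upload rate comes out near 231 per second and the bandwidth near 116 MB per second, which the notes round down to 200 and 100.

```python
# Back-of-envelope capacity estimates using the figures from the notes.
# Assumption: decimal units (1 KB = 1,000 bytes).
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15
SECONDS_PER_DAY = 86_400

signed_up_users = 50_000_000
daily_active_users = 10_000_000
free_space_per_user = 10 * GB
uploads_per_user_per_day = 2
avg_file_size = 500 * KB
files_per_user = 1_000
metadata_per_file = 200  # bytes

# Upper bound: every signed-up user fills the free quota.
total_storage = signed_up_users * free_space_per_user

# The notes count files for daily active users only.
file_count = files_per_user * daily_active_users
metadata_storage = file_count * metadata_per_file

uploads_per_second = daily_active_users * uploads_per_user_per_day / SECONDS_PER_DAY
upload_bandwidth = uploads_per_second * avg_file_size  # bytes per second

print(f"total storage:     {total_storage / PB:.0f} PB")
print(f"file count:        {file_count:,}")
print(f"metadata storage:  {metadata_storage / TB:.0f} TB")
print(f"uploads per sec:   {uploads_per_second:.0f}")
print(f"upload bandwidth:  {upload_bandwidth / MB:.0f} MB/s")
```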
Non-Functional Requirements
- Durability: 99.99999% (3 replicas)
- Availability: 99.999% (approximately 5 minutes of downtime per year)
- Performance: Quick synchronization and minimal bandwidth usage, with the ability to scale as the user base grows
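As a quick check on the availability target, the downtime budget follows directly from the percentage. This is a minimal sketch; the function name is illustrative and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years

def downtime_minutes_per_year(availability: float) -> float:
    """Return the yearly downtime budget implied by an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR

five_nines = downtime_minutes_per_year(0.99999)
print(f"99.999% availability allows about {five_nines:.2f} minutes of downtime per year")
```

Each additional nine cuts the budget by a factor of ten, so 99.99% would allow roughly 53 minutes per year.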
API Design
Core APIs
- File Operations: list, upload, download, uploadChunk, downloadChunk
- Client Notifications:
  - Mechanism for notifying clients when a file version is updated on the server (consider trade-offs between polling and push notifications).
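The notes list the file operations by name only, so the following is a hedged sketch of how they might fit together rather than the interface agreed in the interview. It assumes content-addressed chunks (each chunk is identified by its SHA-256 hash) so that a re-upload sends only chunks the server does not already hold, which is one way to approach the "minimal bandwidth usage" goal. `InMemoryStorage`, `split_chunks`, the snake_case method names, and the tiny chunk size are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


@dataclass
class InMemoryStorage:
    """Hypothetical stand-in for the storage service; not a real API."""
    chunks: dict[str, bytes] = field(default_factory=dict)     # chunk hash -> bytes
    files: dict[str, list[str]] = field(default_factory=dict)  # path -> ordered chunk hashes

    def list_files(self, prefix: str = "") -> list[str]:
        return sorted(p for p in self.files if p.startswith(prefix))

    def upload_chunk(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.chunks.setdefault(digest, data)
        return digest

    def download_chunk(self, digest: str) -> bytes:
        return self.chunks[digest]

    def upload(self, path: str, data: bytes) -> int:
        """Store a file and return how many chunks actually had to be sent."""
        sent = 0
        manifest = []
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # skip chunks the server already holds
                self.upload_chunk(chunk)
                sent += 1
            manifest.append(digest)
        self.files[path] = manifest
        return sent

    def download(self, path: str) -> bytes:
        return b"".join(self.download_chunk(d) for d in self.files[path])


store = InMemoryStorage()
first = store.upload("docs/a.txt", b"aaaabbbbcccc")   # all three chunks are new
second = store.upload("docs/a.txt", b"aaaaXXXXcccc")  # only the middle chunk changed
print(first, second, store.download("docs/a.txt"), store.list_files("docs/"))
```

A real service would likely use chunks of several megabytes and would need to decide whether deduplication is scoped per user or shared across users; the sketch shares one chunk store for simplicity.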
Database Schema
- Design considerations for metadata storage and file management.
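The notes do not record a concrete schema, so the table below is only an illustration of what per-file metadata might look like. Every column name is an assumption, and SQLite is used purely so the sketch runs without a server; the interview considered MySQL and Cassandra. A per-file `version` column is one simple way to let a client detect that its local copy is stale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE files (
        user_id      INTEGER NOT NULL,
        path         TEXT    NOT NULL,            -- full path, e.g. 'docs/a.txt'
        is_directory INTEGER NOT NULL DEFAULT 0,
        size_bytes   INTEGER NOT NULL DEFAULT 0,
        version      INTEGER NOT NULL DEFAULT 1,  -- bumped on every upload
        content_hash TEXT,
        PRIMARY KEY (user_id, path)
    )
    """
)


def record_upload(user_id: int, path: str, size_bytes: int, content_hash: str) -> int:
    """Insert a file row, or bump its version if it exists; return the version."""
    cur = conn.execute(
        "UPDATE files SET size_bytes = ?, content_hash = ?, version = version + 1 "
        "WHERE user_id = ? AND path = ?",
        (size_bytes, content_hash, user_id, path),
    )
    if cur.rowcount == 0:  # no existing row, so this is a new file
        conn.execute(
            "INSERT INTO files (user_id, path, size_bytes, content_hash) "
            "VALUES (?, ?, ?, ?)",
            (user_id, path, size_bytes, content_hash),
        )
    row = conn.execute(
        "SELECT version FROM files WHERE user_id = ? AND path = ?",
        (user_id, path),
    ).fetchone()
    return row[0]


def list_directory(user_id: int, prefix: str) -> list[str]:
    """List paths under a prefix (the prefix is assumed to contain no % or _)."""
    rows = conn.execute(
        "SELECT path FROM files WHERE user_id = ? AND path LIKE ? ORDER BY path",
        (user_id, prefix + "%"),
    )
    return [r[0] for r in rows]


v1 = record_upload(1, "docs/a.txt", 10, "hash-1")
v2 = record_upload(1, "docs/a.txt", 12, "hash-2")
record_upload(1, "photos/b.jpg", 500, "hash-3")
print(v1, v2, list_directory(1, "docs/"))
```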
Architectural Choices
- Storage Solutions:
  - Consider using Amazon S3, Google Cloud Storage, or HDFS for file storage.
  - Evaluate SQL (MySQL) vs NoSQL (Cassandra) for metadata storage.
- Message Queue System:
  - Consider using a message queue (e.g., RabbitMQ) to handle file operations and notifications.
  - Define message content structure (user ID, file name, operation type).
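The message structure mentioned above (user ID, file name, operation type) could be sketched as follows. The class names, the JSON encoding, and the set of operation values are assumptions, and the standard-library `queue.Queue` merely stands in for a broker such as RabbitMQ; no RabbitMQ client API is shown.

```python
import json
import queue
from dataclasses import dataclass
from enum import Enum


class Operation(str, Enum):
    # Hypothetical operation types; the notes do not enumerate them.
    UPLOAD = "upload"
    UPDATE = "update"
    DELETE = "delete"


@dataclass(frozen=True)
class FileEvent:
    user_id: int
    file_name: str
    operation: Operation

    def to_json(self) -> str:
        return json.dumps(
            {
                "user_id": self.user_id,
                "file_name": self.file_name,
                "operation": self.operation.value,
            }
        )

    @classmethod
    def from_json(cls, raw: str) -> "FileEvent":
        data = json.loads(raw)
        return cls(data["user_id"], data["file_name"], Operation(data["operation"]))


# In-process queue standing in for the message broker.
events: "queue.Queue[str]" = queue.Queue()
events.put(FileEvent(42, "notes/todo.txt", Operation.UPLOAD).to_json())

received = FileEvent.from_json(events.get())
print(received)
```

Serializing to a plain JSON string keeps the producer and consumer decoupled from each other's classes, at the cost of validating the payload on receipt.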
System Design Considerations
- High-Level Design:
  - Include API gateway, encryption, and load balancers.
  - Handle different file sizes (small, medium, large).
- Client-Side Operations:
  - Client applications should monitor directories for changes and communicate with the API service for file operations.
- Failure Handling:
  - Ensure the system can handle API service crashes and maintain data integrity.
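For the client-side monitoring described above, one simple approach (an assumption, since the notes do not say how changes are detected) is to take periodic snapshots of the watched directory and diff them. The function names below are illustrative. The sketch reads each file fully into memory, which a real client would replace with streamed hashing or OS-level file change notifications.

```python
import hashlib
import os
import tempfile


def snapshot(root: str) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            result[os.path.relpath(full, root)] = digest
    return result


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as created, modified, or deleted between two snapshots."""
    return {
        "created": sorted(new.keys() - old.keys()),
        "modified": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
        "deleted": sorted(old.keys() - new.keys()),
    }


def write(path: str, text: str) -> None:
    with open(path, "w") as fh:
        fh.write(text)


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    write(os.path.join(root, "a.txt"), "one")
    write(os.path.join(root, "b.txt"), "two")
    before = snapshot(root)

    write(os.path.join(root, "a.txt"), "changed")  # modify
    os.remove(os.path.join(root, "b.txt"))         # delete
    write(os.path.join(root, "c.txt"), "three")    # create
    after = snapshot(root)

changes = diff_snapshots(before, after)
print(changes)
```

Each entry in `changes` would then map to a call to the API service: uploads for created and modified paths, and a delete for removed ones.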
Conclusion
The design interview explored various aspects of building a cloud storage service, focusing on scalability, durability, and performance. Key considerations included API design, database schema, and architectural choices to ensure a robust and efficient system. Further investigation into different cloud providers and storage solutions is recommended to finalize the architecture.