November 13, 2022 6:00 PM PST
This document summarizes the key points discussed during a design interview focused on a cloud storage service. The interview covered various aspects including requirements gathering, system design, API definitions, and architectural choices. The goal was to create a scalable, durable, and highly available cloud storage solution that supports file uploads, downloads, and synchronization across multiple clients.
Requirements Gathering
Basic Requirements
Functional Requirements:
- Support for directories
- Upload files
- Download files
- File synchronization across multiple clients

Out of Scope:
- Permission management
- File sharing
- Notifications
 
 
Scale Estimates
User Base:
- 50 million signed-up users
- 10 million daily active users

Storage and Usage:
- 10 GB of free space per user
- Average of 2 uploads per active user per day, with an average file size of 500 KB
- 1:1 read-to-write ratio
- Average of 1000 files per user
 
 
Capacity Planning
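Total Storage Requirement:
- Allocated storage: 50 million users * 10 GB = 500 PB (an upper bound that assumes every user fills the quota)
- Estimated number of files: 10 billion (1000 files * 10 million daily active users)
- Metadata storage requirement: 2 TB (10 billion files * 200 bytes of metadata per file)

Bandwidth Requirements:
- Upload rate: 10 million daily active users * 2 uploads per day / 86,400 seconds per day = about 230 uploads per second, rounded to about 200
- Upload bandwidth: 200 uploads per second * 0.5 MB per upload = 100 MB per second
- Download bandwidth: roughly the same, given the 1:1 read-to-write ratio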
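The capacity figures can be reproduced with a few lines of arithmetic. This sketch assumes decimal units (1 KB = 10^3 bytes, 1 PB = 10^15 bytes) and, like the estimate above, counts files over daily active users. Unrounded, the upload rate is about 231 uploads per second, or about 116 MB per second, so the 200 uploads/s and 100 MB/s figures are rounded-down approximations.

```python
# Back-of-the-envelope capacity estimates (decimal units: 1 KB = 10**3 bytes).
SIGNED_UP_USERS = 50_000_000
DAILY_ACTIVE_USERS = 10_000_000
QUOTA_BYTES = 10 * 10**9            # 10 GB of free space per user
FILES_PER_USER = 1_000
METADATA_BYTES_PER_FILE = 200
UPLOADS_PER_USER_PER_DAY = 2
AVG_FILE_BYTES = 500 * 10**3        # 500 KB average file
SECONDS_PER_DAY = 86_400


def estimate() -> dict[str, float]:
    """Reproduce the capacity figures from the scale estimates."""
    total_storage = SIGNED_UP_USERS * QUOTA_BYTES        # upper bound: every quota full
    files = FILES_PER_USER * DAILY_ACTIVE_USERS          # counted over daily active users
    metadata = files * METADATA_BYTES_PER_FILE
    uploads_per_sec = DAILY_ACTIVE_USERS * UPLOADS_PER_USER_PER_DAY / SECONDS_PER_DAY
    return {
        "storage_pb": total_storage / 10**15,
        "files": float(files),
        "metadata_tb": metadata / 10**12,
        "uploads_per_sec": uploads_per_sec,
        "upload_mb_per_sec": uploads_per_sec * AVG_FILE_BYTES / 10**6,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.1f}")
```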
 
 
Non-Functional Requirements
- Durability: 99.99999% (3 replicas)
- Availability: 99.999% (approximately 5 minutes of downtime per year)
- Performance: quick synchronization and minimal bandwidth usage, while remaining scalable and highly available
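The availability target converts to a yearly downtime budget as (1 - availability) * minutes per year. Here is a small helper that assumes a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum yearly downtime permitted by an availability target given in percent."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        print(f"{target}% -> {downtime_minutes_per_year(target):.1f} minutes/year")
```

At 99.999%, the budget works out to about 5.3 minutes per year.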
 
API Design
Core APIs
File Operations:
- list, upload, download, uploadChunk, downloadChunk

Client Notifications:
- Mechanism for notifying clients when a file version is updated on the server (consider the trade-offs between polling and push notifications).
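The chunk-level operations (uploadChunk, downloadChunk) are what make minimal-bandwidth synchronization possible. The following is a minimal sketch and not the interview's API definition. It assumes files are split into fixed-size chunks identified by their SHA-256 digest, so a client only uploads the chunks the server does not already hold. The 4 MiB chunk size and all function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB chunk size


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split file contents into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_ids(chunks: list[bytes]) -> list[str]:
    """Identify each chunk by the SHA-256 hex digest of its contents."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in chunks]


def chunks_to_upload(local_ids: list[str], server_ids: set[str]) -> list[int]:
    """Return the indexes of local chunks the server does not already store."""
    return [i for i, cid in enumerate(local_ids) if cid not in server_ids]


if __name__ == "__main__":
    old = b"a" * 10 + b"b" * 10
    new = b"a" * 10 + b"c" * 10
    server = set(chunk_ids(split_into_chunks(old, chunk_size=10)))
    # Only the second chunk changed, so only it would be sent via uploadChunk.
    print(chunks_to_upload(chunk_ids(split_into_chunks(new, chunk_size=10)), server))
```

Fixed-size chunking is the simplest option. Its drawback is that inserting bytes near the start of a file shifts every later chunk boundary. Content-defined chunking (for example, with a rolling hash) avoids that at the cost of extra complexity.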
 
 
Database Schema
- Design considerations for metadata storage and file management.
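The schema itself was left open. Here is a minimal sketch of one possible relational layout, using Python's built-in sqlite3 as a stand-in for MySQL. It separates files (including directories), their versions, and the ordered chunk list of each version. The table and column names and the add_file_version helper are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE files (
    file_id        INTEGER PRIMARY KEY,
    user_id        INTEGER NOT NULL,
    path           TEXT    NOT NULL,          -- full path, e.g. '/docs/a.txt'
    is_directory   INTEGER NOT NULL DEFAULT 0,
    latest_version INTEGER NOT NULL DEFAULT 0,
    UNIQUE (user_id, path)
);
CREATE TABLE file_versions (
    file_id    INTEGER NOT NULL REFERENCES files(file_id),
    version    INTEGER NOT NULL,
    size_bytes INTEGER NOT NULL,
    created_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (file_id, version)
);
CREATE TABLE version_chunks (
    file_id     INTEGER NOT NULL,
    version     INTEGER NOT NULL,
    chunk_index INTEGER NOT NULL,
    chunk_hash  TEXT    NOT NULL,             -- key of the chunk in blob storage
    PRIMARY KEY (file_id, version, chunk_index),
    FOREIGN KEY (file_id, version) REFERENCES file_versions(file_id, version)
);
"""


def create_metadata_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_file_version(conn: sqlite3.Connection, user_id: int, path: str,
                     chunk_hashes: list[str], size_bytes: int) -> int:
    """Create the file row if needed and append a new version in one transaction."""
    with conn:  # commits on success, rolls back on any exception
        conn.execute("INSERT OR IGNORE INTO files (user_id, path) VALUES (?, ?)",
                     (user_id, path))
        file_id, version = conn.execute(
            "SELECT file_id, latest_version + 1 FROM files WHERE user_id = ? AND path = ?",
            (user_id, path)).fetchone()
        conn.execute("INSERT INTO file_versions (file_id, version, size_bytes) VALUES (?, ?, ?)",
                     (file_id, version, size_bytes))
        conn.executemany(
            "INSERT INTO version_chunks (file_id, version, chunk_index, chunk_hash) "
            "VALUES (?, ?, ?, ?)",
            [(file_id, version, i, h) for i, h in enumerate(chunk_hashes)])
        conn.execute("UPDATE files SET latest_version = ? WHERE file_id = ?",
                     (version, file_id))
    return version
```

Because each version row points to chunk hashes rather than file bytes, the metadata stays small, in line with the roughly 200 bytes per file assumed in the capacity planning.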
 
Architectural Choices
Storage Solutions:
- Consider using Amazon S3, Google Cloud Storage, or HDFS for file storage.
- Evaluate SQL (MySQL) vs. NoSQL (Cassandra) for metadata storage.

Message Queue System:
- Consider using a message queue (e.g., RabbitMQ) to handle file operations and notifications.
- Define the message content structure (user ID, file name, operation type).
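A message could carry exactly the three fields listed above. This sketch models it as a JSON-serializable dataclass and uses Python's in-process queue.Queue purely as a stand-in for RabbitMQ. The FileEvent and Operation names and the set of operation values are assumptions, and a real deployment would publish through a RabbitMQ client library instead.

```python
import json
import queue
from dataclasses import asdict, dataclass
from enum import Enum


class Operation(str, Enum):
    """Hypothetical set of operation types carried in a message."""
    CREATE = "create"
    UPDATE = "update"
    DELETE = "delete"


@dataclass(frozen=True)
class FileEvent:
    user_id: int
    file_name: str
    operation: Operation

    def to_json(self) -> str:
        # Serialize the enum by its string value so any consumer can parse it.
        return json.dumps({**asdict(self), "operation": self.operation.value})

    @staticmethod
    def from_json(raw: str) -> "FileEvent":
        data = json.loads(raw)
        return FileEvent(data["user_id"], data["file_name"], Operation(data["operation"]))


def publish(bus: queue.Queue, event: FileEvent) -> None:
    """Stand-in for publishing a message to the broker."""
    bus.put(event.to_json())


def consume(bus: queue.Queue) -> FileEvent:
    """Stand-in for a consumer, such as a service that notifies a user's other clients."""
    return FileEvent.from_json(bus.get())
```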
 
 
System Design Considerations
High-Level Design:
- Include an API gateway, encryption, and load balancers.
- Handle different file sizes (small, medium, large).
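One way to handle different file sizes is to route small files through a single upload call and larger ones through uploadChunk. The size thresholds, the chunk size, and the plan_upload helper here are hypothetical values chosen for illustration, not figures from the interview.

```python
SMALL_FILE_LIMIT = 1 * 1024 * 1024      # hypothetical: below this, one upload call
CHUNK_SIZE = 4 * 1024 * 1024            # hypothetical chunk size for uploadChunk
LARGE_FILE_LIMIT = 1024 * 1024 * 1024   # hypothetical: above this, send chunks in parallel


def plan_upload(size_bytes: int) -> dict:
    """Choose an upload strategy based on file size."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    if size_bytes < SMALL_FILE_LIMIT:
        return {"api": "upload", "chunks": 1, "parallel": False}
    chunks = -(-size_bytes // CHUNK_SIZE)  # integer ceiling division
    return {"api": "uploadChunk", "chunks": chunks, "parallel": size_bytes > LARGE_FILE_LIMIT}
```

With these thresholds, the 500 KB average file from the scale estimates would go through a single upload call.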
 
Client-Side Operations:
- Client applications should monitor directories for changes and communicate with the API service for file operations.
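Directory monitoring can be approximated portably by periodically snapshotting a folder and diffing the snapshots. This is a polling sketch under that assumption. Native clients more commonly rely on OS change-notification APIs such as inotify (Linux), FSEvents (macOS), or ReadDirectoryChangesW (Windows). The function names are hypothetical, and only regular files are tracked here.

```python
import os
import tempfile

# Maps a path relative to the watched root to (modification time in ns, size in bytes).
Snapshot = dict[str, tuple[int, int]]


def take_snapshot(root: str) -> Snapshot:
    """Record mtime and size for every regular file under root."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            st = os.stat(full_path)
            snap[os.path.relpath(full_path, root)] = (st.st_mtime_ns, st.st_size)
    return snap


def diff_snapshots(old: Snapshot, new: Snapshot) -> dict[str, list[str]]:
    """Classify paths as created, deleted, or modified between two snapshots."""
    return {
        "created": sorted(p for p in new if p not in old),
        "deleted": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in new if p in old and new[p] != old[p]),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        before = take_snapshot(root)
        with open(os.path.join(root, "notes.txt"), "w") as f:
            f.write("hello")
        # A real client would call the API service for each change instead of printing it.
        print(diff_snapshots(before, take_snapshot(root)))
```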
 
Failure Handling:
- Ensure the system can handle API service crashes and maintain data integrity.
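One common way to preserve integrity across API service crashes is to write file contents before metadata. Chunks are stored content-addressed, so a retried upload is idempotent, and a new version is committed only once every chunk it references exists. If the service crashes mid-upload, no committed version points at missing data, and the client can simply retry. The in-memory BlobStore and MetadataStore classes are hypothetical stand-ins for object storage and the metadata database.

```python
import hashlib


class BlobStore:
    """In-memory stand-in for object storage, keyed by chunk hash."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)  # idempotent: re-uploading a chunk is harmless
        return key

    def has(self, key: str) -> bool:
        return key in self._blobs


class MetadataStore:
    """In-memory stand-in for the metadata DB: path -> list of versions (chunk-key lists)."""

    def __init__(self, blobs: BlobStore) -> None:
        self._blobs = blobs
        self.versions: dict[str, list[list[str]]] = {}

    def missing_chunks(self, chunk_keys: list[str]) -> list[str]:
        return [k for k in chunk_keys if not self._blobs.has(k)]

    def commit(self, path: str, chunk_keys: list[str]) -> int:
        """Record a new version only if all referenced chunks are stored."""
        if self.missing_chunks(chunk_keys):
            raise RuntimeError("some chunks are not uploaded; version not committed")
        history = self.versions.setdefault(path, [])
        history.append(list(chunk_keys))
        return len(history)  # 1-based version number
```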
 
 
Conclusion
The design interview explored various aspects of building a cloud storage service, focusing on scalability, durability, and performance. Key considerations included API design, database schema, and architectural choices to ensure a robust and efficient system. Further investigation into different cloud providers and storage solutions is recommended to finalize the architecture.