March 5, 2023 7:00 PM PST
This document outlines the design considerations and functional requirements for a Google Drive-like application, focusing on file management, storage, and API interactions.
Functional Requirements
- Support directories
- Upload files
- Download files
- File synchronization across multiple clients
Out of Scope
- Permission management
- File sharing
- Notification services
Scale
- File size limit: < 1 GB
- User base: 50 million signed-up users, 10 million daily active users
- Free storage: 10 GB per user
- Upload rate: 2 files per user per day, average size 500 KB
- Read-to-write ratio: 1:1
- Average files per user: 1000 files
Estimates
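- Total storage: 50 million users * 10 GB = 500 PB (upper bound, if every user fills the free quota)
- Uploads: 2 files per user per day
- QPS for uploads:
  - 10 million daily active users * 2 uploads / 100,000 seconds = 200 QPS (a day is 86,400 seconds, rounded up to 100,000 for easier arithmetic)
  - Peak QPS = 200 QPS * 5 = 1000 QPS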
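The storage and QPS arithmetic above can be checked with a short script. This is a minimal sketch: the constant names are invented for illustration, decimal units are assumed, and the exact-day figure is included only to show how much the 100,000-second rounding changes the result.

```python
# Inputs taken from the Scale section above.
SIGNED_UP_USERS = 50_000_000
DAILY_ACTIVE_USERS = 10_000_000
FREE_STORAGE_GB = 10
UPLOADS_PER_USER_PER_DAY = 2
PEAK_FACTOR = 5

# Assumptions made for this sketch: decimal units, and a day rounded up
# from 86,400 to 100,000 seconds to keep the mental arithmetic simple.
GB_PER_PB = 1_000_000
ROUNDED_SECONDS_PER_DAY = 100_000
EXACT_SECONDS_PER_DAY = 86_400

total_storage_pb = SIGNED_UP_USERS * FREE_STORAGE_GB / GB_PER_PB
daily_uploads = DAILY_ACTIVE_USERS * UPLOADS_PER_USER_PER_DAY
average_upload_qps = daily_uploads / ROUNDED_SECONDS_PER_DAY
peak_upload_qps = average_upload_qps * PEAK_FACTOR
exact_upload_qps = daily_uploads / EXACT_SECONDS_PER_DAY

print(f"Total storage (worst case): {total_storage_pb:,.0f} PB")
print(f"Average upload QPS (rounded day): {average_upload_qps:,.0f}")
print(f"Peak upload QPS: {peak_upload_qps:,.0f}")
print(f"Average upload QPS (exact day): {exact_upload_qps:,.1f}")
```

The exact-day average comes out somewhat higher than 200 QPS, which is well within the margin that the 5x peak factor already covers.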
Metadata DB Storage
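- Maximum files per user: 10 GB / 500 KB = 20,000 files (if the quota is filled with average-sized files)
- Total files: 1000 files * 10 million users = 10 billion file entries (counting daily active users only; all 50 million signed-up users would give 50 billion)
- Metadata per file: file path, S3 path, user, date (assume about 200 bytes per entry)
- Total metadata storage: 10 billion files * 200 bytes = 2 TB (10 TB for 50 billion files)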
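One way to sanity-check the 200-byte figure is to model a metadata row and measure a sample. The sketch below assumes a record with the four fields listed above; the field names, the example values, and the 8-byte width for the user ID and timestamp are all illustrative assumptions, and a real database would add index and row overhead on top.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FileMetadata:
    file_path: str         # path as the user sees it
    s3_path: str           # location of the blob in object storage
    user_id: int           # owner of the file
    modified_at: datetime  # the "date" field from the list above


def approximate_row_bytes(record: FileMetadata) -> int:
    """Rough payload size: UTF-8 strings plus two assumed 8-byte fields."""
    strings = len(record.file_path.encode("utf-8")) + len(record.s3_path.encode("utf-8"))
    return strings + 8 + 8


# Hypothetical example values, not taken from the source.
example = FileMetadata(
    file_path="/photos/2023/trip/beach.jpg",
    s3_path="s3://example-bucket/users/42/3f9a1c7e",
    user_id=42,
    modified_at=datetime(2023, 3, 5, tzinfo=timezone.utc),
)

BYTES_PER_ROW_BUDGET = 200
TOTAL_FILES = 1_000 * 10_000_000  # 1000 files * 10 million users
total_metadata_tb = TOTAL_FILES * BYTES_PER_ROW_BUDGET / 1e12

print(approximate_row_bytes(example), "bytes of payload for the example row")
print(f"{total_metadata_tb:.0f} TB of metadata at {BYTES_PER_ROW_BUDGET} bytes per row")
```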
Bandwidth
- Uploads: 200 uploads per second * 0.5 MB per file = 100 MB per second
- Downloads: at the 1:1 read-to-write ratio, also about 100 MB per second
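The bandwidth figure follows directly from the QPS estimate. As a rough sketch, the script below also derives the download side from the 1:1 read-to-write ratio and the peak from the 1000 QPS figure; the variable names and the conversion to gigabits per second are additions for illustration.

```python
AVERAGE_UPLOAD_QPS = 200
PEAK_UPLOAD_QPS = 1_000
AVERAGE_FILE_MB = 0.5
READS_PER_WRITE = 1  # from the 1:1 read-to-write ratio

upload_mb_per_s = AVERAGE_UPLOAD_QPS * AVERAGE_FILE_MB
download_mb_per_s = upload_mb_per_s * READS_PER_WRITE
peak_upload_mb_per_s = PEAK_UPLOAD_QPS * AVERAGE_FILE_MB
upload_gbit_per_s = upload_mb_per_s * 8 / 1_000  # megabytes to gigabits

print(f"Average upload bandwidth: {upload_mb_per_s:.0f} MB/s ({upload_gbit_per_s:.1f} Gbit/s)")
print(f"Average download bandwidth: {download_mb_per_s:.0f} MB/s")
print(f"Peak upload bandwidth: {peak_upload_mb_per_s:.0f} MB/s")
```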
Non-Functional Requirements
- Durability
- Quick synchronization
- Minimized bandwidth usage
- Scalability
- Availability
API Design
- UploadFile
- DownloadFile
- GetFileDirectory
- Support for pulling or pushing new changes
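To make the three calls concrete, here is a minimal in-memory sketch. The class name, method signatures, and version counter are assumptions for illustration; a dictionary stands in for both the metadata database and the blob store.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InMemoryDrive:
    """Toy stand-in for the service; a dict replaces the metadata DB and blob store."""

    # Maps (user_id, path) to (version, content).
    _files: dict = field(default_factory=dict)

    def upload_file(self, user_id: str, path: str, content: bytes) -> int:
        """UploadFile: store the content and return the new version number."""
        version = self._files.get((user_id, path), (0, b""))[0] + 1
        self._files[(user_id, path)] = (version, content)
        return version

    def download_file(self, user_id: str, path: str) -> bytes:
        """DownloadFile: return the latest content or raise FileNotFoundError."""
        try:
            return self._files[(user_id, path)][1]
        except KeyError:
            raise FileNotFoundError(path) from None

    def get_file_directory(self, user_id: str, directory: str) -> list[str]:
        """GetFileDirectory: list the immediate children of a directory."""
        prefix = directory.rstrip("/") + "/"
        children = set()
        for owner, path in self._files:
            if owner == user_id and path.startswith(prefix):
                # Keep only the first path segment below the directory.
                children.add(path[len(prefix):].split("/", 1)[0])
        return sorted(children)


drive = InMemoryDrive()
drive.upload_file("user-1", "/docs/report.txt", b"draft")
drive.upload_file("user-1", "/notes.txt", b"todo")
print(drive.get_file_directory("user-1", "/"))
```

Directories are implicit here, derived from path prefixes. A real design might store them as explicit rows so that empty directories can exist.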
Client-Server Interaction
- User connects to the service
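- The client holds a long-polling request or WebSocket connection open to receive change notifications
- When a file changes, the server notifies the client, which then initiates a download through the API gateway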
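The long-polling variant of this interaction can be sketched with a condition variable: the client passes the position of the last change it has seen, and the call blocks until something newer arrives or the timeout expires. The `ChangeFeed` class and its method names are hypothetical, and the sketch runs in a single process rather than over HTTP.

```python
import threading


class ChangeFeed:
    """Server-side log of changed paths with a blocking read for long polling."""

    def __init__(self):
        self._changes = []
        self._cond = threading.Condition()

    def publish(self, path: str) -> None:
        """Record a change and wake every waiting poller."""
        with self._cond:
            self._changes.append(path)
            self._cond.notify_all()

    def wait_for_changes(self, cursor: int, timeout: float):
        """Block until changes past `cursor` exist or `timeout` seconds elapse.

        Returns (new_changes, new_cursor); new_changes is empty on timeout.
        """
        with self._cond:
            self._cond.wait_for(lambda: len(self._changes) > cursor, timeout=timeout)
            new_changes = self._changes[cursor:]
            return new_changes, cursor + len(new_changes)


feed = ChangeFeed()
feed.publish("/docs/report.txt")
changes, cursor = feed.wait_for_changes(cursor=0, timeout=1.0)
print(changes, cursor)
```

On an empty result the client simply reissues the request with the same cursor. A WebSocket would replace the repeated requests with one persistent connection, at the cost of keeping connection state on the server.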
Sharding
- Choose a shard key that spreads data evenly while keeping each request on as few shards as possible (for example, sharding metadata by user ID)
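As one possible illustration, the sketch below assigns metadata to shards by hashing the user ID, so that all of a user's file entries land on one shard. The shard count and function name are assumptions, and plain modulo placement is used only for brevity.

```python
import hashlib

NUM_SHARDS = 16  # assumed shard count; not specified in the source


def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index using a stable hash."""
    # Python's built-in hash() is randomized per process, so use SHA-256.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


for user in ("alice", "bob", "carol"):
    print(user, "->", shard_for_user(user))
```

Modulo placement moves most keys when the shard count changes. Consistent hashing or a lookup table of user-to-shard assignments would reduce that movement.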
Points to Cover
API Endpoints
- List, upload, download, uploadChunk, downloadChunk
- Client notifications for version updates on the server (trade-offs between polling and push)
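The uploadChunk and downloadChunk endpoints suggest splitting files into chunks, which also helps minimize bandwidth: only chunks the server does not already hold need to be sent. The following is a sketch under assumed names and an assumed 4 MiB chunk size, using fixed-size chunks and SHA-256 digests.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # assumed 4 MiB chunks; not specified in the source


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Cut data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Return the SHA-256 hex digest of every chunk, in order."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in split_into_chunks(data, chunk_size)]


def chunks_to_upload(new_data: bytes, known_digests: set, chunk_size: int = CHUNK_SIZE) -> list:
    """Return indexes of chunks whose digest the server does not already hold."""
    digests = chunk_digests(new_data, chunk_size)
    return [index for index, digest in enumerate(digests) if digest not in known_digests]


# Tiny chunk size so the example is easy to follow.
old_version = b"a" * 8 + b"b" * 8
new_version = b"a" * 8 + b"c" * 8
server_has = set(chunk_digests(old_version, chunk_size=8))
print(chunks_to_upload(new_version, server_has, chunk_size=8))
```

Fixed-size chunking handles in-place edits well, but an insertion near the start of a file shifts every later chunk boundary. Content-defined chunking is a common alternative for that case.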
Database Schema
- Architecture choices:
  - Routing file traffic through the application server vs. letting clients talk to S3/Google Cloud Storage directly
  - Push vs. poll for change propagation
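The alternative to routing file bytes through the application server is to hand the client a short-lived signed URL and let it upload straight to object storage. Both S3 and Google Cloud Storage offer signed URLs for this purpose. The sketch below illustrates only the general idea with a standard-library HMAC; it is not either provider's actual signing scheme, and the host name, function names, and parameters are hypothetical.

```python
import hashlib
import hmac


def _signature(object_key: str, expires_at: int, secret: bytes) -> str:
    """HMAC-SHA256 over the object key and expiry time."""
    message = f"{object_key}:{expires_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_upload_url(object_key: str, expires_at: int, secret: bytes) -> str:
    """Application server: issue a URL the client can upload to directly."""
    sig = _signature(object_key, expires_at, secret)
    # storage.example.com is a placeholder host for this sketch.
    return f"https://storage.example.com/{object_key}?expires={expires_at}&sig={sig}"


def is_valid_upload(object_key: str, expires_at: int, sig: str, secret: bytes, now: int) -> bool:
    """Storage side: accept the request only if unexpired and correctly signed."""
    if now > expires_at:
        return False
    return hmac.compare_digest(_signature(object_key, expires_at, secret), sig)


secret = b"demo-secret"  # illustrative only; never hard-code real secrets
url = sign_upload_url("users/42/report.pdf", expires_at=1_700_000_000, secret=secret)
print(url)
```

With this approach the application server handles only metadata and authorization, so the upload and download bandwidth never passes through it. The trade-off is less control over the transfer itself, such as inline virus scanning or throttling.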
Storage Choices
- Caching mechanisms
- Database options: SQL (MySQL) vs. NoSQL (Cassandra, eventual consistency)
- File storage options: Amazon S3, Google Cloud Storage, HDFS
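For the caching item, a least-recently-used cache in front of the metadata database is one common option. This is a minimal sketch with an invented class name and a tiny capacity; a production system would more likely use a dedicated cache service.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self._capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value) -> None:
        """Insert or update an entry, evicting the oldest one if over capacity."""
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("/docs/a.txt", {"version": 3})
cache.put("/docs/b.txt", {"version": 1})
cache.get("/docs/a.txt")                  # touch a.txt so b.txt becomes oldest
cache.put("/docs/c.txt", {"version": 7})  # evicts b.txt
print(cache.get("/docs/b.txt"))
```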
Bar Raiser
- Familiarity with Amazon S3, Google Cloud Storage, or HDFS workflows
- Implementation of tiered storage to optimize costs
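Tiered storage usually comes down to a policy that moves rarely accessed files to cheaper storage classes. The sketch below picks a tier from the last access time; the tier names and the 30-day and 180-day thresholds are illustrative assumptions, not values from any provider.

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=30)    # assumed threshold
WARM_WINDOW = timedelta(days=180)  # assumed threshold


def storage_tier(last_accessed: datetime, now: datetime) -> str:
    """Choose a storage tier based on how long ago the file was last accessed."""
    age = now - last_accessed
    if age <= HOT_WINDOW:
        return "hot"    # frequently accessed, lowest latency, highest cost
    if age <= WARM_WINDOW:
        return "warm"   # infrequent access
    return "cold"       # archival, cheapest storage, slow retrieval


now = datetime(2023, 3, 5, tzinfo=timezone.utc)
for days in (1, 90, 400):
    print(days, "days ->", storage_tier(now - timedelta(days=days), now))
```

In practice object stores can apply rules like this automatically through lifecycle policies, so the application may only need to configure the thresholds.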
Skills Assessment
Soft Skills
- Requirement gathering
- Discussing trade-offs
- Clear presentation and communication
Hard Skills
- Design quality
- Knowledge of existing solutions and trade-offs
- Integration into the larger context of project and product lifecycle
Additional Considerations
- Google File System (GFS) integration
- Effective communication of knowledge and design choices
- API flow and schema design