October 31, 2021 7:00 PM PDT
This document summarizes the discussions and design considerations for a metrics system utilizing a time series database. The focus is on handling high volumes of data with specific functional and non-functional requirements, including scalability, performance, and data consistency.
Google Monarch time series database
Requirements
User Requirements
- Users: 1 million
- Traffic: write-heavy, roughly 10 writes for every 1 read
Functional Requirements
- Data collection
- CRUD operations
- Aggregation
- Calculation
- Storage
- Querying
- Data visualization
Non-Functional Requirements
- High availability
- High scalability
- High performance
- Data consistency
Constraints
- Math Constraints:
- Users: 1 million
- Queries per second (QPS):
- Write QPS: assuming each user generates roughly 1,000 metric events per day, ( \frac{10^6 \times 10^3}{10^5 \text{ s/day}} = 10^4 )
- Peak write QPS: ( 3 \times 10^4 )
- Read QPS: with the 10:1 write-to-read ratio, about ( 10^3 )
- Storage (3-year retention, ~1 KB per record): ( 3 \times 365 \times 10^5 \times 10^4 \times 1 \text{ KB} \approx 10^6 \text{ GB} \approx 1 \text{ PB} )
- Bandwidth:
- Write: ( 1 \text{ KB} \times 3 \times 10^4 \approx 30 \text{ MB/s} ) at peak
- Read: about a tenth of that, ( \approx 3 \text{ MB/s} )
- Memory (cache the latest data point for ~20% of users): ( 10^6 \times 0.2 \times 1 \text{ KB} = 200 \text{ MB} )
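These estimates can be sanity-checked with a short script. The per-user rate of ~1,000 events/day is an assumption chosen to line up with the 10^4 write-QPS figure; it is not stated in the requirements.

```python
# Back-of-envelope check of the constraints above. The per-user event rate
# is an assumption; everything else follows from the stated requirements.

USERS = 1_000_000
EVENTS_PER_USER_PER_DAY = 1_000   # assumption, not from the requirements
SECONDS_PER_DAY = 10**5           # ~86,400, rounded for estimation
RECORD_KB = 1                     # ~1 KB per metric record
PEAK_FACTOR = 3
RETENTION_DAYS = 3 * 365          # 3-year retention

write_qps = USERS * EVENTS_PER_USER_PER_DAY // SECONDS_PER_DAY
peak_write_qps = PEAK_FACTOR * write_qps
read_qps = write_qps // 10        # 10:1 write-to-read ratio

storage_kb = RETENTION_DAYS * SECONDS_PER_DAY * write_qps * RECORD_KB
storage_pb = storage_kb / 1024**4          # KB -> PB, roughly 1 PB

peak_write_bw_mb = peak_write_qps * RECORD_KB / 1024   # KB/s -> MB/s
```

Rounding ( 86{,}400 ) seconds/day to ( 10^5 ) keeps the arithmetic mental-math friendly without changing the order of magnitude.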
System Design
System Design Diagram
- API Endpoints:
- metric.send(userId, eventName, status, timestamp): returns statusCode
- metric.get(eventName, timestamp, range): returns a List of metrics or an aggregate
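A minimal in-memory sketch of these two endpoints; the Metric shape, function names, and the list-backed store are assumptions for illustration, not a production interface:

```python
from dataclasses import dataclass

@dataclass
class Metric:
    user_id: str
    event_name: str
    status: str
    timestamp: int  # epoch seconds

_store: list[Metric] = []  # stands in for the real storage layer

def metric_send(user_id: str, event_name: str, status: str, timestamp: int) -> int:
    """Accept a single metric record; returns an HTTP-style status code."""
    _store.append(Metric(user_id, event_name, status, timestamp))
    return 200

def metric_get(event_name: str, timestamp: int, range_s: int) -> list[Metric]:
    """Return records for event_name within [timestamp, timestamp + range_s)."""
    return [m for m in _store
            if m.event_name == event_name
            and timestamp <= m.timestamp < timestamp + range_s]
```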
Data Flow
- Writes:
- Client → Load Balancer → Aggregator Service → Message Queue (Kafka) → Log Service → NoSQL Database (Elasticsearch) → Visualization Service
- Reads:
- Client → Load Balancer → Redis (cache) → Read Service → Elasticsearch
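The read path's cache-aside behavior can be sketched as follows; a plain dict stands in for Redis, and query_es is a hypothetical placeholder for the Elasticsearch lookup:

```python
import json

cache: dict[str, str] = {}  # stands in for Redis (TTL/eviction omitted)

def query_es(event_name: str, start: int, range_s: int) -> list[dict]:
    # Hypothetical placeholder for the Elasticsearch range query.
    raise NotImplementedError

def read_metrics(event_name: str, start: int, range_s: int,
                 es_query=query_es) -> list[dict]:
    key = f"{event_name}:{start}:{range_s}"
    if key in cache:                    # cache hit: no database round trip
        return json.loads(cache[key])
    result = es_query(event_name, start, range_s)
    cache[key] = json.dumps(result)     # populate cache for later reads
    return result
```

Only the second request for the same (event, window) pair reaches the database, which is the load reduction the Redis discussion below is about.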
Database Schema
- NoSQL Schema:
{ "index": "eventName", "timestamp": "time", "status": "running", "tenant": ["tenant1", "tenant2"] }
- SQL Schema: | NotificationID | Sender | Receiver | Content | Status | Timestamp |
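For illustration, the metric.get lookup against the NoSQL schema might be built in Elasticsearch query-DSL style. The field names follow the document shape above; the helper and its signature are assumptions, and the DSL fragment is indicative rather than verified against a cluster:

```python
from typing import Optional

def build_metric_query(event_name: str, start: int, end: int,
                       tenant: Optional[str] = None) -> dict:
    """Build a filter query over the fields in the document shape above."""
    clauses = [
        {"term": {"eventName": event_name}},
        {"range": {"timestamp": {"gte": start, "lt": end}}},
    ]
    if tenant is not None:
        clauses.append({"term": {"tenant": tenant}})
    return {"query": {"bool": {"filter": clauses}}}
```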
Discussion Points
- API and Schema Review: The interviewee discussed the API design and provided examples for metric.send.
- Visualization Service: The addition of Grafana for visualization was suggested, allowing users to customize their UI.
- Redis Usage: Redis was proposed to cache results and reduce database load, especially for frequent read requests.
- Real-Time Computation: The interviewee acknowledged that real-time computation can be challenging and suggested using sampling for quick estimates.
- Data Aggregation: The interviewer emphasized the importance of aggregating data before writing to the time series database to improve performance.
- Handling Large Payloads: Suggestions included splitting large payloads into smaller files and using hashing to distribute loads across multiple aggregator instances.
- Monitoring and Validation: Discussed the need for monitoring missing data and validating incoming records.
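The pre-aggregation idea the interviewer emphasized can be sketched as a tumbling-window roll-up; the 60-second window and the count-only roll-up are assumptions for illustration:

```python
from collections import defaultdict

def aggregate(events: list[tuple[str, int]],
              window_s: int = 60) -> dict[tuple[str, int], int]:
    """Roll (event_name, epoch_ts) pairs up into per-window counts."""
    counts: dict[tuple[str, int], int] = defaultdict(int)
    for name, ts in events:
        window_start = ts - ts % window_s  # bucket the timestamp into its window
        counts[(name, window_start)] += 1
    return dict(counts)
```

Each window then produces one write to the time series database instead of one write per raw event, which is where the performance gain comes from.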
Audience Feedback
- General Observations: The interviewee appeared nervous and could improve by asking for clarification on requirements before diving into design.
- Technical Insights: Audience members raised questions about Kafka partitioning strategies, the use of Elasticsearch, and the importance of aggregation in reducing query load.
- System Design Considerations: Emphasized the need for a robust architecture that avoids single points of failure and maintains performance under heavy traffic.
Conclusion
The meeting provided valuable insights into the design and operational considerations for a metrics system utilizing a time series database. The discussions highlighted the importance of scalability, performance optimization, and effective data handling strategies in building a robust metrics monitoring solution.