June 12, 2022 7:00 PM PDT
This document summarizes the design and discussion of a metadata delivery system, focusing on its requirements, architecture, and potential failure cases. The system aims to allow customers to self-onboard their services and manage metadata delivery efficiently.
Requirements
Functional Requirements
- Customers can self-onboard their services running within their infrastructure nodes.
- The metadata delivery system is responsible for:
  - Delivering custom metadata to all nodes associated with the service (metadata can expire based on customer configuration).
  - Updating custom dynamic metadata periodically.
  - Allowing customers to force metadata delivery to a single node via an API call.
Basic Assumptions
- An existing API is available to retrieve metadata.
- A mapping between service names and infrastructure nodes is provided via an S3 object.
Data Specifications
- Data size: 10KB
- Services: 100k
- Nodes: > 10M
- Certificate expiry: customizable (typically around 1 hour).
Non-Functional Requirements
- Reliability and high availability.
- Must deliver at least once before the certificate expires.
- Traffic estimate: 20 QPS.
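As a rough sanity check on these figures, the sketch below estimates the delivery load. It assumes (the notes do not say this) that every node receives one 10 KB payload per expiry window and that deliveries are spread evenly over the window. The function name delivery_load is illustrative only.

```python
# Back-of-envelope load estimate from the figures in the notes.
# Assumption (not stated in the notes): each node receives one 10 KB
# payload per expiry window, spread evenly over that window.

NODES = 10_000_000          # "> 10M" nodes, taken as a lower bound
PAYLOAD_BYTES = 10 * 1000   # 10 KB per delivery
EXPIRY_SECONDS = 3600       # "typically around 1 hour"


def delivery_load(nodes, payload_bytes, window_seconds):
    """Return (deliveries per second, bytes per second) for one refresh per window."""
    deliveries_per_second = nodes / window_seconds
    bytes_per_second = deliveries_per_second * payload_bytes
    return deliveries_per_second, bytes_per_second


rate, throughput = delivery_load(NODES, PAYLOAD_BYTES, EXPIRY_SECONDS)
print(f"{rate:,.0f} deliveries/s, {throughput / 1e6:.1f} MB/s")
# prints "2,778 deliveries/s, 27.8 MB/s"
```

Under these assumptions the per-node fan-out is in the thousands of deliveries per second, so the 20 QPS estimate presumably describes customer-facing API calls rather than node deliveries.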
System Design
External APIs
- API Example:
  GetMetaData(node_id, serviceName_id)
  Register(serviceName_id, s3_file_link)
  Refresh(serviceName_id, new_node_id)
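The three calls above could be sketched as follows. This is a minimal in-memory illustration, not the design from the session: the class name MetadataApi, the storage layout, and the example S3 link are all hypothetical, and the dictionaries stand in for the database the interviewer suggested.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataApi:
    """In-memory stand-in for the API server; a real system would use a database."""

    # service id -> S3 link of the service-to-node mapping (set by register)
    registrations: dict = field(default_factory=dict)
    # (service id, node id) -> metadata payload; hypothetical storage layout
    metadata: dict = field(default_factory=dict)
    # (service id, node id) pairs waiting for forced delivery (set by refresh)
    pending_refresh: list = field(default_factory=list)

    def register(self, service_id: str, s3_file_link: str) -> None:
        """Onboard a service by recording where its node mapping lives."""
        self.registrations[service_id] = s3_file_link

    def get_metadata(self, node_id: str, service_id: str) -> bytes:
        """Return the stored metadata for one node of a registered service."""
        if service_id not in self.registrations:
            raise KeyError(f"service {service_id!r} is not registered")
        return self.metadata[(service_id, node_id)]

    def refresh(self, service_id: str, node_id: str) -> None:
        """Request forced delivery to a single node."""
        if service_id not in self.registrations:
            raise KeyError(f"service {service_id!r} is not registered")
        self.pending_refresh.append((service_id, node_id))


api = MetadataApi()
api.register("svc-1", "s3://example-bucket/svc-1/nodes.json")  # made-up link
api.refresh("svc-1", "node-42")
```

Here refresh only records the request; the actual push to the node would be done by a separate delivery component.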
Architecture Design
- Add a dedicated delivery service that pushes metadata to nodes.
- Raise alarms on system outages.
- Handle hardware failures.
Failure Handling
- Each successful delivery should be acknowledged.
- The API server should detect failed deliveries and retry them through an alternative delivery service.
- Multiple API servers should be deployed to mitigate single points of failure.
- In the event of a complete system failure, alarms should be triggered for manual recovery.
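The acknowledgment and retry behaviour listed above might look like the sketch below. It assumes each delivery backend is a callable that returns True once the node acknowledges receipt. The name deliver_with_fallback, the attempt count, and the choice of ConnectionError as the failure signal are illustrative assumptions, not details from the session.

```python
from typing import Callable, Sequence

# A delivery backend takes (node_id, payload) and returns True when the node
# acknowledges receipt. The backends themselves are hypothetical.
Backend = Callable[[str, bytes], bool]


def deliver_with_fallback(
    node_id: str,
    payload: bytes,
    backends: Sequence[Backend],
    attempts_per_backend: int = 2,
) -> bool:
    """Try each backend in order until one returns an acknowledgment."""
    for backend in backends:
        for _ in range(attempts_per_backend):
            try:
                if backend(node_id, payload):
                    return True  # acknowledged: delivery is done
            except ConnectionError:
                pass  # treat a connection failure like a missing ack and retry
    return False  # every backend failed: the caller should raise an alarm
```

A False result corresponds to the complete-failure case, where an alarm is triggered for manual recovery. A production version would likely add backoff between attempts.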
Interviewer and Audience Feedback
Interviewer Feedback
- Soft Skills:
  - Demonstrated good soft skills and covered key points.
  - Noted the omission of onboarding details in the API design.
- Hard Skills:
  - Suggested the need for a database to store customer information.
  - Highlighted potential issues with synchronous calls in large system failures.
Interviewee Reflection
- Acknowledged the interview was not strong and that key requirements were overlooked.
- Recognized the need for a queue so that delivery does not depend on synchronous calls.
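A minimal sketch of the queue idea from the reflection: the API handler enqueues a job and returns at once, while a background worker does the slow delivery. Python's in-process queue.Queue stands in here for a durable message queue, and the names handle_refresh and worker are hypothetical.

```python
import queue
import threading

# Delivery jobs are (service_id, node_id) pairs; None tells the worker to stop.
jobs = queue.Queue()
delivered = []


def handle_refresh(service_id, node_id):
    """API handler: enqueue the job and return without waiting for delivery."""
    jobs.put((service_id, node_id))
    return "accepted"


def worker():
    """Background worker: drain the queue and perform the slow delivery step."""
    while True:
        job = jobs.get()
        if job is None:
            break
        delivered.append(job)  # placeholder for the real delivery call


thread = threading.Thread(target=worker)
thread.start()
handle_refresh("svc-1", "node-42")
jobs.put(None)  # shut the worker down after the queued job
thread.join()
```

Because the handler never waits on a node, a slow or failing delivery path does not block incoming API requests.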
Design Considerations
- The API server can handle customer requests for service registration and metadata retrieval.
- Metadata is assumed to include certificates, which should be delivered periodically.
- Discussed which metadata types exist and how expiration should be handled.
Audience Questions
- Discussed the storage of service name to node mapping and the assignment of work to workers.
- Explored the use of message queues to prevent blocking of the API server.
- Addressed how to implement expiration using cron jobs or priority queues.
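The priority-queue option for expiration can be sketched with a min-heap keyed on refresh time. The class name RefreshScheduler and the five-minute safety margin are assumptions made for illustration; the session did not specify either.

```python
import heapq


class RefreshScheduler:
    """Min-heap of (refresh_time, node_id): the earliest deadline is always on top."""

    def __init__(self, safety_margin=300.0):
        # Hypothetical margin: refresh this many seconds before expiry.
        self.safety_margin = safety_margin
        self._heap = []

    def schedule(self, node_id, expires_at):
        """Queue a node for refresh shortly before its metadata expires."""
        heapq.heappush(self._heap, (expires_at - self.safety_margin, node_id))

    def due(self, now):
        """Pop and return every node whose refresh time has arrived."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            _, node_id = heapq.heappop(self._heap)
            ready.append(node_id)
        return ready


scheduler = RefreshScheduler(safety_margin=300)
scheduler.schedule("node-a", expires_at=3600)
scheduler.schedule("node-b", expires_at=1800)
print(scheduler.due(now=1500))  # ['node-b']
print(scheduler.due(now=3300))  # ['node-a']
```

A worker would call due in a loop and reschedule each node after a successful delivery. Compared with a cron job that scans every record on a fixed interval, the heap only touches entries that are actually due.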
Conclusion
The metadata delivery system design emphasizes reliability, efficient metadata management, and the ability to handle failures gracefully. Feedback from the interview highlighted areas for improvement in both design and communication.