May 7, 2023 7:00 PM PDT
This document outlines the design considerations and discussions from a system design interview focused on creating a URL shortening service similar to TinyURL. The interview covers storage solutions, URL conversion methods, API definitions, and strategies for handling high traffic.
System Design Interview Notes
Storage Solutions
- Discussed options for storing data: SQL vs. NoSQL.
- Proposed a key-value store using short URLs as keys.
- Suggested creating a two-way table for mapping long URLs to short URLs and vice versa.
- Considered the architecture of having one write server and multiple read servers for scalability.
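
The two-way mapping discussed above can be sketched with an in-memory stand-in for the key-value store. This is a minimal illustration, not the design chosen in the interview: the class name `UrlStore` and its methods are hypothetical, and a real deployment would back both maps with the chosen database.

```python
from typing import Dict, Optional


class UrlStore:
    """In-memory stand-in for a key-value store with a two-way mapping (hypothetical)."""

    def __init__(self) -> None:
        self._short_to_long: Dict[str, str] = {}
        self._long_to_short: Dict[str, str] = {}

    def put(self, short_url: str, long_url: str) -> bool:
        """Store the pair in both directions; refuse to overwrite a taken key."""
        existing = self._short_to_long.get(short_url)
        if existing is not None and existing != long_url:
            return False  # the short key already belongs to a different long URL
        self._short_to_long[short_url] = long_url
        self._long_to_short[long_url] = short_url
        return True

    def resolve(self, short_url: str) -> Optional[str]:
        """Redirect path: short URL to long URL."""
        return self._short_to_long.get(short_url)

    def lookup_short(self, long_url: str) -> Optional[str]:
        """Reverse path: find an existing short URL for a long URL."""
        return self._long_to_short.get(long_url)
```

The reverse map lets the service return an existing short URL instead of minting a new one, at the cost of storing each pair twice.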
URL Conversion
- Discussed methods for converting long URLs into short URLs using hash algorithms.
- Estimated size requirements for long URLs (64 bits).
- Introduced Base64 encoding of the hash output and addressed potential collisions.
- Proposed adding more bytes to the end of the hash and incorporating timestamps into the input value.
- Suggested a hash function of the form hash_function(time_stamp + long_url) and taking the first 10 bits.
- Mentioned the use of a sliding window to ensure the uniqueness of short URLs.
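
The conversion steps above can be sketched as follows. This is one possible reading of the notes, under stated assumptions: SHA-256 stands in for the unspecified hash function, the starting code length of 8 characters is an illustrative choice rather than a figure from the interview, and the names `make_encoded_hash`, `shorten`, and `is_taken` are hypothetical. On a collision, the sketch extends the code by one more character of the encoded hash, which is one way to interpret "adding more bytes to the end".

```python
import base64
import hashlib
import time
from typing import Callable, Optional

CODE_LENGTH = 8  # assumed starting length in characters; not from the notes


def make_encoded_hash(long_url: str, timestamp_ns: int) -> str:
    """Hash timestamp + long URL, then Base64-encode the digest."""
    payload = f"{timestamp_ns}{long_url}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # URL-safe Base64 avoids '+' and '/', which are awkward in URL paths.
    return base64.urlsafe_b64encode(digest).decode("ascii")


def shorten(long_url: str, is_taken: Callable[[str], bool],
            timestamp_ns: Optional[int] = None, max_extra: int = 4) -> str:
    """Return the shortest unused prefix of the encoded hash."""
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    encoded = make_encoded_hash(long_url, timestamp_ns)
    # On collision, extend the code by one character and try again.
    for length in range(CODE_LENGTH, CODE_LENGTH + max_extra + 1):
        code = encoded[:length]
        if not is_taken(code):
            return code
    raise RuntimeError("could not find a free short code")
```

A caller would pass a uniqueness check backed by the store, for example `shorten(url, lambda code: store.resolve(code) is not None)` with a hypothetical `store` object.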
Data Persistence and Caching
- Emphasized the need to persist data in a database and to implement caching.
- Clarified that the cache should be a server-side cache behind the load balancer, not a browser cache.
- Discussed the importance of defining the API early in the design process.
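
The persistence-plus-caching idea above can be illustrated with a cache-aside read path: check the server-side cache first and fall back to the database on a miss. This is a minimal sketch under assumptions. The LRU eviction policy, the `LruCache` class, and the `db_lookup` callable are illustrative choices; the notes do not specify a cache product or eviction policy.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LruCache:
    """Small least-recently-used cache (hypothetical stand-in for a cache tier)."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


def resolve(short_url: str, cache: LruCache,
            db_lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Cache-aside read: serve from cache, else read the database and fill the cache."""
    cached = cache.get(short_url)
    if cached is not None:
        return cached
    long_url = db_lookup(short_url)
    if long_url is not None:
        cache.put(short_url, long_url)
    return long_url
```

Because short-to-long mappings rarely change once created, this read path tolerates stale-free caching well; unknown codes are deliberately not cached here, so a later write becomes visible immediately.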
API and Traffic Management
- Estimated traffic handling capabilities, with calculations showing 1000 QPS.
- Discussed the need for eventual consistency and the handling of large read volumes through read replicas.
- Suggested implementing an extra layer of cache to check the uniqueness of short URLs.
- Addressed security concerns regarding potential abuse of the system by adding rate limiting and using a Bloom filter to quickly filter out invalid URLs.
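
The two abuse protections mentioned above can be sketched as follows. Both classes are illustrative and rest on assumptions: "invalid URLs" is read here as short codes that were never issued, the Bloom filter sizes are arbitrary, and the token bucket is one common rate-limiting scheme that the notes do not name. The names `BloomFilter` and `TokenBucket` are hypothetical.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4) -> None:
        self._size = size_bits
        self._num_hashes = num_hashes
        self._bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by hashing the item with k different prefixes.
        for i in range(self._num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self._size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self._bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class TokenBucket:
    """Per-client limiter: refills `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self._rate = rate
        self._capacity = capacity
        self._tokens = capacity
        self._last = now

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, then spend one token if available.
        elapsed = max(0.0, now - self._last)
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A request handler could reject a lookup early when `might_contain` returns False, since that code was definitely never added, and only hit the cache or database otherwise. The caller supplies `now`, for example from `time.monotonic()`, which keeps the limiter easy to test.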
Scalability and Cost Considerations
- Discussed the scalability of the system with projections of high QPS (queries per second).
- Evaluated the cost implications of NoSQL options, comparing managed databases like DynamoDB with self-managed solutions like Cassandra.
- Highlighted that while NoSQL databases may require more maintenance, they can be more extensible and suited for large amounts of data.
Conclusion
- Concluded the interview without additional comments from the interviewee.
- Emphasized the importance of considering both technical and cost factors in the design of the URL shortening service.