August 13, 2023 7:00 PM PDT
This document outlines the key features of Azure Cosmos DB and compares it with DynamoDB, focusing on architecture, performance, and use cases.
Presenter: R, Tech Lead
System Design Presentation - Azure Cosmos DB
Key Features
- Fully Managed: Azure Cosmos DB is a fully managed database service.
- High Resiliency: Offers an availability SLA of up to 99.999% (the five-nines figure applies to multi-region configurations).
- Low Latency: Less than 10 ms read and write latency at the 99th percentile (P99).
- Elastic Scale:
  - Throughput: Supports from 100 requests per second to over a trillion requests per second.
  - Storage: Scales from 50 GB to petabytes.
- Global Distribution: Can be deployed across all Azure regions.
- Tunable Consistency: Five levels, from weakest to strongest: eventual, consistent prefix, session, bounded staleness, and strong.
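To put the availability figure in concrete terms, the following is a small arithmetic sketch that converts an availability percentage into a yearly downtime budget. The function name is hypothetical and the calculation assumes a 365-day year; it is not taken from any Azure tooling.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the maximum downtime per year, in minutes, for a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


for availability in (99.9, 99.99, 99.999):
    minutes = downtime_budget_minutes(availability)
    print(f"{availability}% -> {minutes:.2f} minutes/year")
# 99.9% -> 525.60 minutes/year
# 99.99% -> 52.56 minutes/year
# 99.999% -> 5.26 minutes/year
```

At 99.999%, the budget works out to roughly five minutes of unavailability per year.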
Architecture
- Data Structure: Indexing relies on a Bw-tree instead of the traditional B+ tree.
  - B+ Tree: The traditional choice; adequate at low write throughput, but writes slow down as the number of indices grows.
  - Bw-tree: Provides lock-free updates using compare-and-swap (CAS), improving write throughput.
- Partitioning:
  - Horizontal partitioning with a maximum of 50 GB per partition.
  - Each replica set contains four replicas, one of which is the leader.
  - A forwarder propagates changes to the leader.
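The partitioning rule above can be illustrated with a minimal sketch. The 50 GB cap comes from these notes; the hash-based routing, the function names, and the sample key are assumptions made for illustration and do not reflect Cosmos DB's actual internal hashing scheme.

```python
import hashlib
import math

MAX_PARTITION_GB = 50  # per-partition cap stated in the notes


def partitions_needed(total_gb: float) -> int:
    """Minimum number of partitions so that none exceeds the 50 GB cap."""
    return max(1, math.ceil(total_gb / MAX_PARTITION_GB))


def partition_for(key: str, partition_count: int) -> int:
    """Route a partition key to a partition index using a stable hash (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


count = partitions_needed(1024)  # 1 TB of data
print(count)  # 21
print(partition_for("user-42", count))  # some index in the range 0..20
```

A stable hash is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would route the same key differently across runs.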
Conflict Resolution
- Strong consistency eliminates write conflicts.
- Delta records are written to SSD to improve write throughput.
- Conflict detection occurs during the compare-and-swap step.
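The interaction between delta records and compare-and-swap can be sketched as follows. This is a simplified, in-memory illustration of the Bw-tree idea of prepending delta records to a chain, not Cosmos DB's implementation. Python has no hardware CAS primitive, so the `MappingSlot` class (a hypothetical name, as are `Delta`, `apply_update`, and `lookup`) emulates one with a lock; a real Bw-tree would use an atomic instruction and would also consolidate long chains.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Delta:
    """An immutable delta record that points at the previous head of the chain."""
    key: str
    value: str
    next: Optional["Delta"]


class MappingSlot:
    """One mapping-table slot. The lock only emulates an atomic CAS instruction."""

    def __init__(self) -> None:
        self._head: Optional[Delta] = None
        self._lock = threading.Lock()

    def load(self) -> Optional[Delta]:
        return self._head

    def compare_and_swap(self, expected: Optional[Delta], new: Delta) -> bool:
        """Install `new` only if the head is still `expected`; report success."""
        with self._lock:
            if self._head is expected:
                self._head = new
                return True
            return False  # another writer got there first: a conflict


def apply_update(slot: MappingSlot, key: str, value: str) -> int:
    """Prepend a delta record, retrying on conflict. Returns the attempt count."""
    attempts = 0
    while True:
        attempts += 1
        head = slot.load()
        if slot.compare_and_swap(head, Delta(key, value, head)):
            return attempts


def lookup(slot: MappingSlot, key: str) -> Optional[str]:
    """Walk the chain from newest to oldest and return the first match."""
    node = slot.load()
    while node is not None:
        if node.key == key:
            return node.value
        node = node.next
    return None


slot = MappingSlot()
apply_update(slot, "a", "1")
apply_update(slot, "a", "2")
print(lookup(slot, "a"))  # 2

stale = slot.load()           # a writer reads the head...
apply_update(slot, "b", "3")  # ...and another writer updates it first
print(slot.compare_and_swap(stale, Delta("a", "9", stale)))  # False
```

The failed swap at the end is the conflict detection described above: the writer holding a stale head learns of the conflict from the CAS result and must re-read before retrying.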
Comparison with DynamoDB
- Indexing:
  - DynamoDB offers an optional sort key and supports a limited number of global secondary indices.
  - Cosmos DB supports indexing on all JSON fields.
- Query Capabilities: Cosmos DB supports a wider range of query types across the multiple data models it exposes.
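As a rough illustration of what indexing every JSON field means, the sketch below flattens each document into path/value pairs and builds an inverted index over them. The path notation, function names, and sample documents are assumptions for this example; the real index is far more sophisticated.

```python
from collections import defaultdict
from typing import Any, Iterator


def flatten(doc: Any, prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield a (path, scalar value) pair for every leaf in a JSON-like document."""
    if isinstance(doc, dict):
        for name, child in doc.items():
            yield from flatten(child, f"{prefix}/{name}")
    elif isinstance(doc, list):
        for child in doc:
            yield from flatten(child, f"{prefix}/[]")
    else:
        yield prefix, doc


def build_index(docs: dict[str, Any]) -> dict[tuple[str, Any], set[str]]:
    """Map every (path, value) pair to the ids of the documents containing it."""
    index: dict[tuple[str, Any], set[str]] = defaultdict(set)
    for doc_id, doc in docs.items():
        for path, value in flatten(doc):
            index[(path, value)].add(doc_id)
    return index


docs = {
    "1": {"name": "Ada", "address": {"city": "Seattle"}, "tags": ["a", "b"]},
    "2": {"name": "Lin", "address": {"city": "Seattle"}},
}
index = build_index(docs)
print(sorted(index[("/address/city", "Seattle")]))  # ['1', '2']
print(sorted(index[("/tags/[]", "b")]))             # ['1']
```

Because every path is indexed without declaring it up front, a filter on any field can be answered from the index, in contrast to a design where only the key attributes and explicitly declared secondary indices are queryable.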
Pros and Cons
- NoSQL Databases:
  - Pros: Flexible schema design, horizontal scaling, performance tuning for specific workloads, favors availability over consistency.
  - Cons: Limited query capabilities (e.g., no joins).
- SQL Databases:
  - Pros: ACID compliance, structured schema, supports complex queries including joins.
  - Cons: Performance bottlenecks with large datasets.
Use Cases
- SQL:
  - E-commerce, financial systems, content management systems.
- NoSQL:
  - Social media, graph databases, chat applications, big data applications (e.g., Cassandra for Netflix), Internet of Things, high write-throughput scenarios (e.g., Philips Hue using DynamoDB).
- Hybrid Cases:
  - Gaming industry (e.g., Redis for leaderboards), e-commerce with personalized recommendations.