Real-life performance comparison of MongoDB Atlas on Azure vs GCP
TL;DR: MongoDB Atlas M20-M40 clusters on Azure and GCP perform quite similarly, with Azure seemingly a bit faster (surprise!) and GCP seemingly a bit more stable (fewer outliers).
This post compares the real-life performance of MongoDB Atlas M20-M40 clusters running on Azure and GCP. Read on if you want the condensed results of what was otherwise a multi-day/week exercise[1].
The trigger for the load tests was a series of disk-related performance issues on Azure, together with repeated claims that MongoDB on AWS and GCP runs much better.
Important Notes:

Note that all database operations are single-document ones where the document size is less than 1 KB, and fall into 3 categories:
1) find a single document by UUID id (primary key)
2) insert a single document
3) replace a single document
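As a rough illustration of the three operation categories, here is a minimal Python sketch. The helper names and the in-memory stand-in are hypothetical and only there so the snippet runs without a cluster; they are not taken from the actual load test code. The three calls used (`insert_one`, `find_one`, `replace_one`) are the standard pymongo `Collection` methods, so a real collection could be passed in instead.

```python
import uuid
from types import SimpleNamespace


def insert_document(collection, doc):
    """Category 2: insert a single document and return its _id."""
    return collection.insert_one(doc).inserted_id


def find_by_id(collection, doc_id):
    """Category 1: find a single document by its UUID primary key."""
    return collection.find_one({"_id": doc_id})


def replace_document(collection, doc):
    """Category 3: replace a whole document, matched by _id."""
    result = collection.replace_one({"_id": doc["_id"]}, doc)
    return result.matched_count == 1


class InMemoryCollection:
    """Hypothetical stand-in that mimics only the three calls used above."""

    def __init__(self):
        self._docs = {}

    def insert_one(self, doc):
        self._docs[doc["_id"]] = dict(doc)
        return SimpleNamespace(inserted_id=doc["_id"])

    def find_one(self, flt):
        doc = self._docs.get(flt["_id"])
        return dict(doc) if doc is not None else None

    def replace_one(self, flt, replacement):
        if flt["_id"] not in self._docs:
            return SimpleNamespace(matched_count=0)
        self._docs[flt["_id"]] = dict(replacement)
        return SimpleNamespace(matched_count=1)


# With pymongo, one would pass a real collection instead, e.g. one obtained from
# MongoClient(uri, uuidRepresentation="standard") so that uuid.UUID values can be stored.
orders = InMemoryCollection()
order_id = insert_document(orders, {"_id": uuid.uuid4(), "status": "new"})
replaced = replace_document(orders, {"_id": order_id, "status": "paid"})
found = find_by_id(orders, order_id)
```

Each helper touches exactly one small document, which matches the single-document, sub-1 KB workload described above.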
The load test infrastructure involves the following tools:


The MongoDB instances used are summarized in the following table:
Table: MongoDB instances, including IOPS and $$$
Cloud | Instance Type | Storage Size (GB) | vCPUs | RAM (GB) | IOPS | $$$/h | $$$/month |
---|---|---|---|---|---|---|---|
Azure | M20 | 128 | 2 [4] | 4 | 500 | 0.34 | 245 |
GCP | M20 | 128 | 1 | 3.75 | 7680 [5] | 0.33 | 238 |
Azure | M20 | 256 | 2 [4] | 4 | 1100 | 0.45 | 324 |
GCP | M20 | 256 | 1 | 3.75 | 15000 [5] | 0.45 | 324 |
Azure | M30 | 128 | 2 | 8 | 500 | 0.80 | 576 |
GCP | M30 | 128 | 2 | 7.5 | 7680 [5] | 0.60 | 432 |
Azure | M30 | 256 | 2 | 8 | 1100 | 0.91 | 655 |
GCP | M30 | 256 | 2 | 7.5 | 15000 [5] | 0.72 | 518 |
Azure | M40 | 128 | 4 | 16 | 500 | 1.46 | 1051 |
GCP | M40 | 128 | 4 | 15 | 7680 [5] | 1.06 | 763 |
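The cost columns can be cross-checked with a few lines of Python. This is only a sketch: the 720 hours per month (30 days x 24 h) is my assumption, chosen because it reproduces the monthly column from the hourly one, and the function names are made up for illustration.

```python
HOURS_PER_MONTH = 720  # assumption: 30 days x 24 h, which matches the table's monthly column

# Hourly prices copied from the table, keyed by (instance type, storage in GB).
hourly = {
    ("M20", 128): {"Azure": 0.34, "GCP": 0.33},
    ("M20", 256): {"Azure": 0.45, "GCP": 0.45},
    ("M30", 128): {"Azure": 0.80, "GCP": 0.60},
    ("M30", 256): {"Azure": 0.91, "GCP": 0.72},
    ("M40", 128): {"Azure": 1.46, "GCP": 1.06},
}


def monthly_cost(hourly_rate):
    """Monthly cost rounded to whole currency units."""
    return round(hourly_rate * HOURS_PER_MONTH)


def azure_premium(prices):
    """Relative extra cost of Azure over GCP for the same configuration."""
    return prices["Azure"] / prices["GCP"] - 1


for (tier, storage_gb), prices in hourly.items():
    print(
        f"{tier}/{storage_gb} GB: "
        f"Azure {monthly_cost(prices['Azure'])}, GCP {monthly_cost(prices['GCP'])}, "
        f"Azure premium {azure_premium(prices):.0%}"
    )
```

With these numbers the Azure premium is roughly 3% and 0% for the two M20 configurations, but about 33%, 26% and 38% for M30/128, M30/256 and M40/128.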
Worth noting:
The following sections will present the load testing results from different perspectives.


Interpretation:


Interpretation:


Interpretation:

Interpretation:

Interpretation:


Interpretation:
There are a few hypotheses as to why the results could be incorrect:
Hypothesis 1: Burstability of M20 on Azure might lead to good results in shorter test durations, and worse results in longer ones
Hypothesis 2: There is another bottleneck in the load test infrastructure (K8s node utilization checked)
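To make Hypothesis 1 more concrete, here is a toy credit model in Python. It is a sketch under loud assumptions: only the 40% baseline and the 200% maximum come from the burstable footnote, while the credit unit (percent-minutes), the starting balance and the accrual rule are invented for illustration and are not Azure's actual accounting.

```python
def simulate_burst(minutes, demand_pct, baseline_pct=40.0, max_pct=200.0,
                   initial_credits=1600.0, credit_cap=1600.0):
    """Return the CPU percentage delivered in each minute of a test run.

    Hypothetical model: credits are counted in percent-minutes. Running below the
    baseline banks the unused share; running above it spends credits until none remain.
    """
    credits = initial_credits
    delivered = []
    for _ in range(minutes):
        want = min(demand_pct, max_pct)
        if want <= baseline_pct:
            # Unused baseline capacity is saved for later bursts, up to the cap.
            credits = min(credit_cap, credits + (baseline_pct - want))
            got = want
        else:
            # Bursting above the baseline is paid for from the credit balance.
            extra = min(want - baseline_pct, credits)
            credits -= extra
            got = baseline_pct + extra
        delivered.append(got)
    return delivered


short_run = simulate_burst(minutes=10, demand_pct=200.0)
long_run = simulate_burst(minutes=60, demand_pct=200.0)
```

In this toy setup a 10-minute test sees the full 200% throughout, while a 60-minute test drops to the 40% baseline once the credits are spent, which is the kind of effect that would flatter Azure M20 in short runs.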
As stated above, Azure overall seems to have better averages with the smaller instance/storage sizes, while GCP has lower duration variability (fewer outliers). The results become more equal with bigger instance/storage sizes. Costs move in the opposite direction: GCP gets substantially cheaper than Azure as instance/storage sizes grow, so GCP looks like the winner in the M30/M40 range. However, if M20 suits your needs better (as in my case), Azure costs almost the same as GCP but performs better. Once again, beware of the various disk-related performance issues on Azure.
P.S. I would appreciate any comments pointing out an obvious mistake leading to wrong results!
[1] Including the load test automation/setup itself.
[2] Synchronous = the client calls Service1 via HTTPS/REST; Service1 acts as an orchestrator, calls Service2, Service3 etc. synchronously and awaits (async) their responses. Some of the inter-microservice calls are done in sequence, and some in parallel.
[3] "Transaction" does not mean database transaction or distributed database transaction. No database transactions are used (no database was harmed ;)
[4] Burstable, which means that of 2 x 100% = 200% only 40% is provisioned, but accumulated credits allow bursting up to the full 200% from time to time.
[5] Half of the IOPS are read IOPS and half are write IOPS.