How to identify cost increases in Amazon Aurora due to I/O consumption and strategies to reduce costs.

Carlos Andres Zambrano Barrera
Globant
Published in
4 min readApr 14, 2021

--

Introduction

Many of us use Amazon Aurora every day for different projects; without a doubt, it is the best relational database currently on the market. However, it is often a bit complex to understand the database's pricing, and even more difficult is to identify cost-saving strategies.

In this post, I will show you how to identify and optimize the cost of an Aurora Database in which the cost doubled in 1 month because of I/O consumption.

Below points are covered in the article:

  1. Current Workload
  2. Database’s Cost Increase
  3. Identify Root Cause
  4. Strategies to reduce I/O consumption’s cost
  5. Lessons Learned

1. Current Workload

This database works in production and supports the operation of hundreds of thousands of users in LATAM; the database has the following characteristics:

  • Instance Type: db.r5.xlarge
  • Write queries per second: 510
  • Read queries per second: 6
  • Average connections to the DB: 100
  • Type: Aurora Provisioned MySQL.

The workload receives a lot of queries from multiple sources, such as web servers.

2. Database's Cost Increase

The database's cost began to increase suddenly, rising approximately 800 USD in a month; when I looked at the detail, I realized that the cost was due to I/O requests, which increased from 2,811,123,309 IOs to 6,000,196,824 IOs in one month.

AWS Cost Explorer report.

After reviewing AWS Cost Explorer and identifying the RDS cost increase, I check the monthly billing for this increase.

February 2021

March 2021

3. Identify Root Cause

The most important thing when we have increased costs in our services is to be able to identify the root cause, for this we must start asking ourselves questions about our workload:

  • What recent changes have we made to the workload?
  • Are there changes in our business KPIs, for example, more users of our application?

After this, we must rely on all the tools that AWS provides us to identify the cause.

CloudWatch Metrics

1- Chek the CloudWatch metrics for VolumeReadIOPs and VolumeWriteIOPs, and we could see the increase of the read operations

Performance Insights

2- It does not currently support the version of Amazon Aurora that we have running. But if you are having the same problem, you should use it.

Audit Logs + CloudWatch Insights

3- To identify the queries causing high consumption, we must enable the Audit logs in the database; once enabled, we must go to CloudWatch Insights and query the logs to find these queries that are being more recurrent in the database.

fields @message,@ingestionTime,@timestamp
| filter @message like /QUERY/
| limit 1000

There we can already identify the queries running on the database and work to correct the problem.

4. Strategies to reduce I/O consumption's cost

The best strategies for reducing de I/O consumption's cost are:

1- Work closely with your database administrator and identify if there are queries that can optimize.
2- Add a cache layer in front of the database like an ElastiCache Redis. But take care of ElastiCache cluster's cost and consider that you have to read the RDS to update the Cache database.
3- Bring this reporting layer to S3 and do the queries with Athena, connecting it to a visualization tool.

5. Lessons Learned

1- When you need to do more detailed troubleshooting, temporarily enable the logs in the database; in this case, we use audit logs; additionally, we could also enable slow queries, errors, and general logs. Note that leaving these logs permanently enabled will result in increased costs for CloudWatch Logs.

2- When we have high costs in our IO databases, understand how your workload is performing, investigate the consumption and correct it at the application level.

3- When the cost is more for reading operations, look for alternatives; for example, evaluate if you can use architecture with S3 + Athena for your reporting.

4- To control DB schemas changes, use different tools, such as a Liquibase + Jenkins that allow controlling changes in DBs.

Carlos Zambrano

--

--

Carlos Andres Zambrano Barrera
Globant

AWS x10, Tech Director en Globant con más de 7 años de experiencia en AWS.