Best practices and tips to reduce AWS S3 costs

Amazon Simple Storage Service (Amazon S3) is a widely used object storage service that Amazon Web Services (AWS) provides. It offers scalable storage solutions to store and retrieve data, making it a fundamental component of cloud-based applications and data management strategies. While AWS S3 provides unparalleled flexibility and accessibility, users need to understand the cost structure associated with using the service.

S3 pricing

First, you need to understand the AWS S3 pricing structure. We will not be covering all the pricing details here, but at a high level, these are the S3 pricing components:

  1. Storage – You pay for storing objects in your S3 buckets by size. The rates depend on the region, the amount of storage, and the storage tier you selected for your objects. Typically, the storage price for the S3 Standard tier is ~$0.023-$0.027 per GB-month.
  2. Object Requests – You pay for requests (PUT, COPY, POST, LIST, GET, SELECT, etc.) made against your S3 buckets and objects. The S3 Standard price is ~$0.005 per 1,000 requests for write and list operations, while GET, SELECT, and other read operations cost ~1/10th of the write price. In short, a read operation (not including LIST) is much cheaper than a write operation.
  3. Data transfer – You pay for all bandwidth out of Amazon S3. You never pay for ingress! Further, you don’t pay anything for transfers between buckets or for egress from S3 to any AWS service within the same region. The rates depend on the region and the amount of data transferred. Typically, you are looking at ~$0.09/GB for the first 10 TB/month.
  4. Other costs – There are other costs that you could incur related to management, insights, replication, and transformations.

Tips to manage, control, and reduce S3 costs

Here, we list a few things to help you be aware of S3 costs – some aren’t obvious. You may be able to identify low-hanging fruit to reduce your AWS S3 bill:

Use S3 Intelligent Tiering

This is a no-brainer! If you haven’t already done so, we highly recommend that you go ahead and do it now and save a ton of money.

AWS provides this automatic cost-saving service for a small monitoring and automation fee. Objects are automatically moved between tiers based on data access patterns: rarely accessed data is automatically transitioned to the less expensive Infrequent Access tiers, and when you access data in an Infrequent Access tier, S3 Intelligent-Tiering moves it back to the Frequent Access tier. Therefore, you get the best performance for the price you pay.

Note: Objects smaller than 128 KB will not be monitored and will always be charged at the Frequent Access tier rates.
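
If you upload via the AWS CLI, you can place new objects directly into Intelligent-Tiering at write time; a minimal sketch (the bucket name and key are placeholders):

# Upload straight into the Intelligent-Tiering storage class.
aws s3 cp large-dataset.parquet s3://YOUR_BUCKET_NAME/data/large-dataset.parquet \
  --storage-class INTELLIGENT_TIERING

You can also move existing objects into Intelligent-Tiering with a lifecycle transition rule instead of re-uploading them.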

Know your S3 charges for requests

Amazon charges for requests made to the S3 service, including PUT, COPY, POST, LIST, GET, SELECT, lifecycle transition, and data retrieval requests against objects and buckets. This can be especially costly if you store numerous small files in S3 and access them frequently: each access counts as a request, and you’ll be charged for it.
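
For a rough sense of scale: writing 10 million small objects at ~$0.005 per 1,000 PUT requests costs ~$50 in request fees alone, regardless of object size. If your access pattern allows it, one mitigation (a sketch; the file names and paths are placeholders) is to bundle small files before uploading:

# One PUT for the archive instead of thousands of PUTs for individual files.
tar -czf logs-2024-06-01.tar.gz logs/
aws s3 cp logs-2024-06-01.tar.gz s3://YOUR_BUCKET_NAME/archives/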

Delete S3 incomplete Multipart Uploads

If a multipart upload is interrupted or never completed for some reason, S3 keeps the already-uploaded parts as ‘incomplete MPUs’, and you are charged for their storage. To find them, use Storage Lens; it has Free and Advanced tiers, and the free tier is enough for checking incomplete multipart uploads. Set up a bucket lifecycle policy so a bucket won’t accumulate incomplete MPU storage; see Configuring a bucket lifecycle configuration to delete incomplete multipart uploads. You can also do this via the AWS CLI by creating a JSON file named mpu-retention.json, as below, and then running the command that follows it:


{
  "Rules": [
    {
      "ID": "MPU Retention",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}

aws s3api put-bucket-lifecycle-configuration --bucket YOUR_BUCKET_NAME --lifecycle-configuration file://mpu-retention.json

Adjust the retention period (DaysAfterInitiation) to match your organization’s policy.
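
To spot-check a bucket for in-progress (possibly abandoned) multipart uploads before the lifecycle rule takes effect, you can list them directly; YOUR_BUCKET_NAME is a placeholder:

# Lists the key, upload ID, and start time of each incomplete multipart upload.
aws s3api list-multipart-uploads --bucket YOUR_BUCKET_NAME \
  --query 'Uploads[*].[Key,UploadId,Initiated]' --output table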

Be aware of EBS snapshot storage costs

EBS snapshots are stored in S3 and are charged at more than the normal cost of S3 Standard storage. For instance, in US East (Ohio), S3 Standard is ~$0.02/GB-month, while EBS snapshots are charged at $0.05/GB-month. If you store EBS snapshots in the EBS Snapshots Archive tier, there is also a retrieval fee [EBS pricing]. Even after you delete the EBS volume and the attached EC2 instance, costs keep accruing for the orphaned snapshots. To get the list of snapshots not linked to any volume, try this:

ORPHANED_SNAPSHOT_IDS=$(comm -23 <(aws ec2 describe-snapshots --owner-ids $AWS_ACCOUNT_ID --query 'Snapshots[*].SnapshotId' --output text | tr '\t' '\n' | sort) <(aws ec2 describe-volumes --query 'Volumes[*].SnapshotId' --output text | tr '\t' '\n' | sort | uniq))

To get the list of snapshots not linked to any AMIs, do as follows:

ORPHANED_SNAPSHOT_IDS=$(comm -23 <(aws ec2 describe-snapshots --owner-ids $AWS_ACCOUNT_ID --query 'Snapshots[*].SnapshotId' --output text | tr '\t' '\n' | sort) <(aws ec2 describe-images --filters Name=state,Values=available --owners $AWS_ACCOUNT_ID --query "Images[*].BlockDeviceMappings[*].Ebs.SnapshotId" --output text | tr '\t' '\n' | sort | uniq))
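
After you have reviewed the list, a minimal cleanup sketch looks like this:

# WARNING: verify $ORPHANED_SNAPSHOT_IDS manually first; deleted snapshots cannot be recovered.
for SNAPSHOT_ID in $ORPHANED_SNAPSHOT_IDS; do
  aws ec2 delete-snapshot --snapshot-id "$SNAPSHOT_ID"
done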

Configure S3 Lifecycle Policies

Leverage AWS S3 storage classes based on your organization’s business needs, and configure AWS S3 Lifecycle Policies to reduce S3 costs. S3 Inventory can generate reports of the objects in a bucket along with metadata such as last-modified dates; objects that have not been touched in a long time are candidates to delete or move to Glacier.
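
As an illustration, a lifecycle rule that moves objects under a prefix to Glacier after 90 days and deletes them after a year (the prefix and day counts are placeholders you should adapt) can be applied with the same put-bucket-lifecycle-configuration command shown earlier:

{
  "Rules": [
    {
      "ID": "Archive then expire",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [ { "Days": 90, "StorageClass": "GLACIER" } ],
      "Expiration": { "Days": 365 }
    }
  ]
}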

Save on S3 retrieval fees by selecting the parts you need for analysis

AWS charges data transfer fees for egress to the internet, to your application in another AWS region, or for transfers to other AWS services across regions. Other charges apply for transfer out to Direct Connect. When you run applications that process massive amounts of data (say, AI/ML applications), retrieving large objects from S3 quickly adds to the AWS bill.

Instead of retrieving large S3 objects and then running analysis on them, leverage S3 Select capability with SQL integration to retrieve only part of the data set required for analysis. You will save a lot of money on the S3 retrieval fees. See Filtering and retrieving data using Amazon S3 Select.
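
As a sketch (the bucket, key, and column names are placeholders), the AWS CLI can run an S3 Select query against a CSV object and download only the matching rows:

# Retrieve only two columns of rows matching the WHERE clause, not the whole object.
aws s3api select-object-content \
  --bucket YOUR_BUCKET_NAME \
  --key data/large-dataset.csv \
  --expression "SELECT s.id, s.status FROM S3Object s WHERE s.status = 'FAILED'" \
  --expression-type SQL \
  --input-serialization '{"CSV": {"FileHeaderInfo": "USE"}}' \
  --output-serialization '{"CSV": {}}' \
  results.csv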

Data analysis costs can be reduced severalfold if the data resides in S3. Leverage Athena, which can analyze data sitting in S3 using SQL without moving it out of S3 (see Amazon Athena – Interactive SQL Queries for Data in Amazon S3). You can do something similar with Amazon Redshift Spectrum, which lets an analyst run SQL queries on data stored in Amazon S3 buckets. Both approaches reduce S3 egress costs.

S3 Bucket Versioning adds costs

S3 versioning is a bucket-level setting that is essential for compliance, disaster recovery, and audit use cases. When you enable versioning for a bucket, the setting applies to all (not some) objects.

Be aware of the price implications of enabling object versioning: every versioned copy of an object is stored (and billed) in S3. The decision to enable or retain object versions should be made in the context of business requirements and ROI.

You should limit object versions to reduce S3 storage costs. Delete previous versions of objects you no longer need; we highly recommend configuring a bucket lifecycle rule for any versioning-enabled bucket. You can also use S3 Storage Lens to see noncurrent versions of objects (versions that are no longer the latest version of an object) in your S3 buckets.
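
A lifecycle rule like the following (30 days is a placeholder) expires noncurrent versions automatically, applied with the same put-bucket-lifecycle-configuration command shown earlier:

{
  "Rules": [
    {
      "ID": "Expire old versions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 }
    }
  ]
}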

Leverage the Cross-Region Replication (CRR) feature of S3

If you frequently access S3 data from a workload or service in another region, consider the Cross-Region Replication (CRR) feature of S3.

Consider a simple scenario where an AWS service or workload in us-east-1 reads the same 200 GB dataset every day from your S3 bucket in us-west-2. Then you pay:

  • Data transfer charges at $0.02/GB for cross-region data transfer. Therefore, your monthly cost = 200 GB * $0.02/GB * 30 = $120.00.

Instead, use the Cross-Region Replication (CRR) feature of S3 to do a one-time data transfer between S3 buckets from us-west-2 to us-east-1. This improves performance and significantly reduces data transfer charges. For the above example, let’s recalculate the cost with CRR.

  • You pay a one-time data transfer cost of 200 GB * $0.02/GB = $4.00 (replicating the bucket from us-west-2 to us-east-1).
  • You don’t pay any data transfer charges from S3 to your workload or service, since there is no S3 data transfer charge within the same region.
  • You pay $0.023/GB for storage over a month = 200 GB * $0.023/GB = $4.60.

Therefore, you can reduce the overall cost from $120.00 to $8.60 by using CRR. A massive reduction of 93%!
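
Setting up CRR requires versioning enabled on both buckets and an IAM role that S3 can assume; a minimal replication configuration (the role ARN and destination bucket are placeholders) might look like this, applied with aws s3api put-bucket-replication --bucket YOUR_SOURCE_BUCKET --replication-configuration file://crr.json:

{
  "Role": "arn:aws:iam::123456789012:role/s3-crr-role",
  "Rules": [
    {
      "ID": "ReplicateAll",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": { "Prefix": "" },
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": { "Bucket": "arn:aws:s3:::YOUR-DESTINATION-BUCKET" }
    }
  ]
}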

Compress your data before storing

We highly recommend compressing your data before storing it in S3. By compressing stored objects, you reduce both S3 storage and data transfer costs – remember, with features like replication and versioning, costs balloon fast. You often get better performance, too, since fewer bytes move over the network. So why not?
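
A minimal sketch of compressing before upload (the file names are placeholders); setting Content-Encoding lets HTTP clients decompress transparently:

gzip -9 export.csv          # produces export.csv.gz
aws s3 cp export.csv.gz s3://YOUR_BUCKET_NAME/exports/export.csv.gz \
  --content-encoding gzip --content-type text/csv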

Track the number of requests to your bucket

You can monitor requests to your bucket in one or more of the following ways:

  • You can enable server access logging. For more information on how to review access logs, see Amazon S3 Server Access Log Format.
  • You can enable object-level logging using AWS CloudTrail.
  • You can enable Amazon CloudWatch metrics for Amazon S3 requests. Metrics such as AllRequests and BytesDownloaded can help you monitor the requests made to your bucket.

After you understand the requests made to your bucket, you can take measures to reduce your costs from requests. For example, you can prevent unauthorized access or limit public access to your bucket using bucket policies or AWS Identity and Access Management (IAM) policies.
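
As a sketch of the CloudWatch option (note that S3 request metrics are an opt-in, paid CloudWatch feature; the bucket name and dates are placeholders):

# Enable request metrics for the entire bucket.
aws s3api put-bucket-metrics-configuration \
  --bucket YOUR_BUCKET_NAME --id EntireBucket \
  --metrics-configuration '{"Id": "EntireBucket"}'

# Query total requests per hour for one day.
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 --metric-name AllRequests \
  --dimensions Name=BucketName,Value=YOUR_BUCKET_NAME Name=FilterId,Value=EntireBucket \
  --start-time 2024-06-01T00:00:00Z --end-time 2024-06-02T00:00:00Z \
  --period 3600 --statistics Sum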

Should you use S3 Gateway VPC Endpoints for within-region access from your VPC?

The AWS S3 data transfer charge for egress out to the internet is $0.09/GB. You don’t pay these charges if you use the S3 public service endpoint to access your data from within your VPC. Per the AWS PrivateLink documentation, when you access S3 from a public subnet, the traffic to Amazon S3 traverses the internet gateway but never leaves the AWS network. Therefore, you don’t pay the $0.09/GB internet egress charge when accessing S3 from a public subnet. So there is no cost benefit in this scenario – but read further.

You cannot access the S3 public service endpoint from a private subnet. To access S3 from a private subnet, you can either create a NAT gateway or configure an S3 Gateway Endpoint. With a NAT gateway you pay additional charges, while the Gateway VPC endpoint is free!

NAT gateways are expensive. In a region like us-east-1, you pay $0.045/hour plus $0.045/GB of data processed. That adds up fast!

Therefore, we recommend configuring your VPC to use a gateway endpoint if you need to access S3 from a private subnet.
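
Creating a gateway endpoint is a one-line CLI call; the VPC ID, route table ID, and region below are placeholders:

# Route S3 traffic from the given route table through a free gateway endpoint.
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc1234 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0def5678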

Review the cost of individual buckets

Check out the s3-find-bucket-cost article in the AWS Knowledge Center. If you have many large buckets to identify, use S3 Storage Lens: it gives you a centralized view of all the buckets in your account, and you can even configure an AWS Organizations-level dashboard to see the buckets across all your accounts.
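
For a quick per-bucket size check without Storage Lens, the free daily CloudWatch storage metrics work too; a sketch (the bucket name, dates, and storage type are placeholders):

# Average bucket size in bytes for S3 Standard storage on one day.
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 --metric-name BucketSizeBytes \
  --dimensions Name=BucketName,Value=YOUR_BUCKET_NAME Name=StorageType,Value=StandardStorage \
  --start-time 2024-06-01T00:00:00Z --end-time 2024-06-02T00:00:00Z \
  --period 86400 --statistics Average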

Write less, or write with a policy to delete automatically

S3 data, once written, is difficult to modify or delete later. After all, who owns the data? Is it still required? Who is going to take on the massive task of cleaning up unused objects? Data can sit in S3 for years with no benefit or ROI to the business. Be proactive: write less in the first place, or build a mechanism into your applications or processes that automatically deletes data once it is no longer required.
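
One way to build that mechanism (a sketch; the tag key/value and retention period are placeholders) is to tag transient objects at write time and pair the tag with a lifecycle expiration rule:

# Upload an object tagged as transient.
aws s3api put-object --bucket YOUR_BUCKET_NAME --key tmp/report.csv \
  --body report.csv --tagging 'retention=transient'

A matching lifecycle rule then deletes anything with that tag after 30 days:

{
  "Rules": [
    {
      "ID": "Expire transient data",
      "Status": "Enabled",
      "Filter": { "Tag": { "Key": "retention", "Value": "transient" } },
      "Expiration": { "Days": 30 }
    }
  ]
}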

Final Note

AWS S3 cost optimization is an ongoing effort your organization should invest in. We are sure there are other opportunities to save on S3 – if we missed something, please let us know!
