Amazon Simple Storage Service (Amazon S3) is an object storage service. Users of all sizes and industries can store and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps.
Amazon S3 is one of the most inexpensive, scalable and versatile ways to store your data in the cloud. However, if you’re relying on S3 to store, access and transfer large volumes of data, costs and complexity can quickly escalate — leading to thousands of dollars in unnecessary S3 costs.
If you’re struggling with high Amazon S3 costs, there are some best practices you can follow to help. By choosing the right storage classes and managing your S3 data efficiently, you can make significant progress on AWS S3 cost optimization. First, let’s quickly cover the basics.
What are the Amazon Simple Storage Service (S3) classes?
S3 Standard Storage Class
S3 Standard - Infrequent Access Tier
S3 One Zone - Infrequent Access Tier
S3 Intelligent-Tiering
S3 Glacier Flexible Retrieval
S3 Glacier Deep Archive
S3 Glacier Instant Retrieval
S3 on Outposts Storage Classes
How does S3 pricing work?
Amazon S3 uses a pay-as-you-go pricing model, with no upfront payment or commitment required. S3’s pricing is usage-based, so you pay only for the resources you actually use.
AWS offers a free tier to new AWS customers, which includes 5 GB of Amazon S3 storage in the S3 Standard storage class; 20,000 GET requests; 2,000 PUT, COPY, POST, or LIST requests; and 100 GB of data transfer out each month.
After that, here are the main variables that are taken into account when calculating S3 pricing.
Storage Costs
Request and Data Retrieval Costs
Data Transfer Costs
Additional Features and Costs
Amazon S3 pricing also includes charges for management and analytics tools, data replication across regions or within the same region, security and access tools, and costs associated with data transformation and querying through features like Amazon S3 Select.
Furthermore, using Amazon S3 Object Lambda results in charges based on the data processed, and costs can vary significantly with server location and data transfer destinations, particularly when transferring data across different AWS regions.
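To see how these variables combine into a monthly bill, here is a rough sketch of the arithmetic. All of the rates below are placeholder assumptions for illustration only, not current AWS prices; always check the S3 pricing page for your region and storage class.

```python
# Illustrative S3 cost estimate. Every rate below is a placeholder assumption,
# not a current AWS price -- look up real rates for your region before relying on it.

ASSUMED_STORAGE_RATE_PER_GB = 0.023     # S3 Standard, per GB-month (assumed)
ASSUMED_PUT_RATE_PER_1000 = 0.005       # PUT/COPY/POST/LIST, per 1,000 requests (assumed)
ASSUMED_GET_RATE_PER_1000 = 0.0004      # GET/SELECT, per 1,000 requests (assumed)
ASSUMED_TRANSFER_OUT_PER_GB = 0.09      # data transfer out to the internet, per GB (assumed)

def estimate_monthly_cost(storage_gb, put_requests, get_requests, transfer_out_gb):
    """Return a rough monthly S3 cost estimate in USD under the assumed rates."""
    return (
        storage_gb * ASSUMED_STORAGE_RATE_PER_GB
        + (put_requests / 1000) * ASSUMED_PUT_RATE_PER_1000
        + (get_requests / 1000) * ASSUMED_GET_RATE_PER_1000
        + transfer_out_gb * ASSUMED_TRANSFER_OUT_PER_GB
    )

# Example: 500 GB stored, 100k PUTs, 1M GETs, 50 GB transferred out per month
print(f"${estimate_monthly_cost(500, 100_000, 1_000_000, 50):.2f}")  # about $16.90
```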
What are the top 10 best practices for S3 cost optimization?
1. Use Lifecycle Policies
Through the Amazon S3 Management Console, you can set rules to move data to S3 Standard-IA after 30 days if infrequently accessed, and to S3 Glacier Flexible Retrieval after 90 days for rarely accessed data.
Some practical tips are to define expiration actions (such as deleting outdated logs or incomplete multipart uploads after predefined periods) and to implement tagging to categorize data, enabling more granular control in applying lifecycle rules to specific datasets.
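As a concrete sketch, the boto3 snippet below applies the rules described above to a hypothetical bucket (the bucket name, prefix, and rule IDs are placeholders): transition to S3 Standard-IA after 30 days, to S3 Glacier Flexible Retrieval after 90 days, and clean up incomplete multipart uploads after 7 days.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and prefix; adjust to your own environment.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-logs-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-old-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequently accessed
                    {"Days": 90, "StorageClass": "GLACIER"},      # Glacier Flexible Retrieval
                ],
            },
            {
                "ID": "abort-stale-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            },
        ]
    },
)
```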
2. Delete Unused Data
You always incur charges for data stored in S3, so you should periodically find and delete data you no longer use, or data you could recreate relatively easily if you needed it. If you’re not sure about deleting objects permanently, you can instead archive vast amounts of data at very low cost with S3 Glacier Deep Archive.
You can delete one or more objects directly from Amazon S3 using the Amazon S3 console, AWS SDKs, AWS Command Line Interface (AWS CLI), or REST API.
You have the following API options when deleting an object:
- Delete a single object – Amazon S3 provides the DELETE (DeleteObject) API operation that you can use to delete one object in a single HTTP request.
- Delete multiple objects – Amazon S3 provides the Multi-Object Delete (DeleteObjects) API operation that you can use to delete up to 1,000 objects in a single HTTP request.
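For example, here is a minimal boto3 sketch of both options against a hypothetical bucket (the bucket name and keys are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Delete a single object (DeleteObject).
s3.delete_object(Bucket="example-bucket", Key="reports/old-report.csv")

# Delete up to 1,000 objects in one request (DeleteObjects).
response = s3.delete_objects(
    Bucket="example-bucket",
    Delete={
        "Objects": [
            {"Key": "tmp/file-1.json"},
            {"Key": "tmp/file-2.json"},
        ],
        "Quiet": True,  # only report errors in the response
    },
)
print(response.get("Errors", []))
```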
3. Compress Data Before You Send to S3
You incur Amazon S3 charges based on the amount of data you store and transfer. By compressing data before sending it to Amazon S3, you can reduce the amount of both.
Several effective compression methods can help optimize storage costs and efficiency. Algorithms like GZIP and BZIP2 are widely used for text data, offering good compression ratios and compatibility. LZMA provides even higher compression rates, though it typically requires more processing power. For binary data or quick compression needs, LZ4 is an excellent choice due to its very fast compression and decompression speeds.
Additionally, using file formats like Parquet, which supports various compression codecs, can further optimize storage for complex datasets by enabling efficient columnar data querying and storage.
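A minimal sketch of the idea, assuming a hypothetical bucket, key, and local file, is to gzip the payload in memory before uploading with boto3:

```python
import gzip
import boto3

s3 = boto3.client("s3")

with open("events.json", "rb") as f:        # local file to upload (placeholder path)
    raw = f.read()
compressed = gzip.compress(raw)             # text and JSON often shrink several-fold

s3.put_object(
    Bucket="example-bucket",
    Key="events/events.json.gz",
    Body=compressed,
    ContentEncoding="gzip",                 # lets downstream clients decompress transparently
    ContentType="application/json",
)
print(f"Uploaded {len(compressed)} bytes instead of {len(raw)}")
```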
4. Use S3 Select to retrieve only the data you need
Amazon S3 Select is a feature that allows you to retrieve only a subset of data from an object, which can significantly reduce the amount of data retrieved and consequently lower your data retrieval and transfer costs. This is particularly useful when dealing with large amounts of structured data stored in formats like CSV, JSON, or Apache Parquet.
When you use S3 Select, you specify SQL-like statements to filter the data and return only the information relevant to your query. This means you can avoid downloading an entire object, processing it on the application side, and then discarding the unnecessary data, which reduces data transfer and processing costs.
It’s worth noting that the Amazon S3 console limits the amount of data returned to 40 MB. To retrieve more data, you need to use the AWS CLI or the API.
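As an illustration, the boto3 sketch below runs an S3 Select query against a hypothetical gzipped CSV object and streams back only the matching rows (the bucket, key, and column names are assumptions):

```python
import boto3

s3 = boto3.client("s3")

response = s3.select_object_content(
    Bucket="example-bucket",
    Key="orders/2024/orders.csv.gz",
    ExpressionType="SQL",
    Expression="SELECT s.order_id, s.total FROM s3object s WHERE s.country = 'DE'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "GZIP"},
    OutputSerialization={"CSV": {}},
)

# The result arrives as an event stream; only 'Records' events carry row data.
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"), end="")
```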
5. Choose the right AWS Region and Limit Data Transfers
Selecting the right AWS region for your S3 storage can have a significant impact on storage costs, especially when it comes to data transfer fees.
Data stored in a region closer to your users or applications typically reduces latency and transfer costs, because AWS charges for data transferred out of an S3 region to another region or the internet.
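If most of your users or compute runs in a particular region, you can create the bucket there explicitly. A minimal sketch, assuming eu-west-1 and a placeholder bucket name:

```python
import boto3

region = "eu-west-1"                      # pick the region closest to your users/applications
s3 = boto3.client("s3", region_name=region)

s3.create_bucket(
    Bucket="example-eu-logs-bucket",      # bucket names are globally unique; placeholder
    CreateBucketConfiguration={"LocationConstraint": region},  # omit this block for us-east-1
)
```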
6. Consolidate and Aggregate Data
Consolidating and aggregating data before storing it on S3 can lead to significant cost savings, especially for use cases involving analytics and data processing.
By combining smaller files into larger ones and aggregating similar data types, you can optimize storage utilization and reduce the number of requests made to S3, which in turn minimizes costs associated with PUT, GET, and LIST operations.
Some examples include batching small files (as fewer, larger files reduce overhead and request costs) and aggregating data at the source before uploading (for example, if you’re collecting log data, summarize or filter it prior to storage).
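As a simple sketch of batching at the source, the snippet below concatenates many small local log files into one gzipped object before upload (the local paths, bucket, and key are placeholders):

```python
import glob
import gzip
import io
import boto3

s3 = boto3.client("s3")

# Combine many small log files into a single object to cut PUT request counts.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    for path in sorted(glob.glob("logs/2024-06-01/*.log")):  # placeholder local path
        with open(path, "rb") as f:
            gz.write(f.read())
            gz.write(b"\n")

s3.put_object(
    Bucket="example-bucket",
    Key="logs/batched/2024-06-01.log.gz",
    Body=buffer.getvalue(),
)
```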
7. Monitor and Analyze Usage with S3 Storage Lens
Amazon S3 Storage Lens is a storage analytics tool that helps you visualize and manage storage usage and activity across your S3 buckets and objects. With its dashboard and metrics, you can gain insights into operational and cost efficiencies within your S3 environment. You can use S3 Storage Lens to:
Identify cost drivers: S3 Storage Lens provides metrics on data usage patterns, enabling you to pinpoint high-cost areas. For example, you can identify buckets where data retrieval is frequent and costs are high, or find buckets with stale data that could be moved to a cheaper storage class or deleted.
Optimize storage distribution: The dashboard allows you to see how data is distributed across different storage classes. You might find opportunities to shift data from S3 Standard to Infrequent Access tiers if the access patterns support such a move, reducing costs significantly.
Review replication and data protection: View metrics on replication and data protection to help ensure you’re not overspending on redundancy for non-critical data.
Monitor access patterns: By examining access patterns, you can adjust your storage strategy to better align with actual usage. If certain data is accessed infrequently, you can automate the transfer of this data to lower-cost storage classes using lifecycle policies.
Set customizable metrics and alerts: You can publish Storage Lens metrics to Amazon CloudWatch and configure alerts when certain thresholds are met, such as an unexpected increase in PUT or GET requests, which could indicate an inefficiency or a potential issue in your S3 usage pattern.
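Storage Lens is configured and viewed mostly through the console, but you can also list your configurations programmatically via the S3 Control API. A small sketch, assuming your default boto3 credentials can call STS and S3 Control:

```python
import boto3

# Storage Lens configurations live in the s3control API and are scoped to an account ID.
account_id = boto3.client("sts").get_caller_identity()["Account"]
s3control = boto3.client("s3control")

resp = s3control.list_storage_lens_configurations(AccountId=account_id)
for entry in resp.get("StorageLensConfigurationList", []):
    print(entry.get("Id"), entry.get("HomeRegion"), entry.get("IsEnabled"))
```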
8. Use Requester Pays
Enable the Requester Pays option to shift data transfer and request costs to the user accessing your Amazon S3 bucket data. This feature is particularly useful if you host large datasets publicly and want to avoid bearing the cost of data egress.
For example, you might use Requester Pays buckets when making large datasets available, such as zip code directories, reference data, geospatial information, or web crawling data. When enabled, anyone accessing your data incurs the charges for requests and data transfer out of Amazon S3, while you continue to pay for storage. Requester Pays can be set on a per-bucket basis.
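Enabling it from code is a one-line call per bucket. A minimal boto3 sketch with a placeholder bucket name:

```python
import boto3

s3 = boto3.client("s3")

# Shift request and data-transfer-out charges to the requester for this bucket.
s3.put_bucket_request_payment(
    Bucket="example-public-dataset-bucket",
    RequestPaymentConfiguration={"Payer": "Requester"},
)
```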
9. Set up IAM to limit access
Set up Identity and Access Management (IAM) to limit access to your Amazon S3 resources effectively. By configuring IAM policies, you can control who can access your Amazon S3 data and what actions they can perform. This is crucial for minimizing unnecessary data access, which can lead to additional costs, especially with operations like PUT and GET requests.
Implement least privilege access by granting permissions only to the extent necessary for users to perform their assigned tasks. You can utilize IAM roles and policies to specify allowed actions on specific buckets or objects. For instance, you might allow a group of users to only read data from a particular Amazon S3 bucket, while administrative access could be restricted to IT staff.
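For instance, a read-only policy scoped to a single bucket might look like the sketch below (the bucket name and policy name are placeholders; attach the resulting policy to the appropriate group or role):

```python
import json
import boto3

iam = boto3.client("iam")

# Least-privilege policy: list the bucket and read its objects, nothing else.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::example-analytics-bucket",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-analytics-bucket/*",
        },
    ],
}

iam.create_policy(
    PolicyName="ExampleS3ReadOnlyAnalyticsBucket",
    PolicyDocument=json.dumps(read_only_policy),
)
```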
10. Partition your data before querying it
By partitioning data into segments based on specific keys such as date, time, or other relevant attributes, you can enable query services like Amazon Athena or Amazon Redshift Spectrum to scan only pertinent parts of your data.
You can start by defining partition keys that align with common query filters, such as time-based keys (year, month, day) or geographic identifiers (country, region). This approach ensures queries are more efficient, accessing only the necessary data for analysis.
Additionally, consider implementing a folder structure in your Amazon S3 buckets that mirrors your partitioning strategy, facilitating direct and fast access to subsets of data. The partition management process can be automated with AWS Glue or custom scripts to maintain and update partitions as new data is ingested, keeping your storage organized and cost-effective with less manual effort.
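A common convention is Hive-style key prefixes, which Athena and Redshift Spectrum can prune against when a query filters on the partition columns. The sketch below uploads one day of data under such a prefix (the bucket, dataset name, and local file are placeholders):

```python
from datetime import date
import boto3

s3 = boto3.client("s3")

d = date(2024, 6, 1)
# Hive-style partition layout: query engines can skip partitions that don't match the filter.
key = f"analytics/events/year={d.year}/month={d.month:02d}/day={d.day:02d}/events.parquet"

s3.upload_file("events.parquet", "example-analytics-bucket", key)
```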
Understand and optimize your cloud costs with nOps
Whether you’re looking to optimize just your S3 costs or your entire cloud bill, nOps can help. It gives you complete cost visibility and intelligence across your entire AWS infrastructure. Analyze S3 costs by product, feature, team, deployment, environment, or any other dimension.
If your AWS bill is a big mystery, you’re not alone. nOps makes it easy to understand and allocate 100% of your AWS bill, even fixing mistagged and untagged resources for you.
nOps also offers a suite of ML-powered cost optimization features that help cloud users reduce their costs by up to 50% on autopilot, including:
Compute Copilot: automatically selects the optimal compute resource at the most cost-effective price in real time for you — also makes it easy to save with Spot discounts
ShareSave: automatic life-cycle management of your EC2/RDS/EKS commitments with risk-free guarantee
nOps Essentials: set of easy-apply cloud optimization features including EC2 and ASG rightsizing, resource scheduling, idle instance removal, storage optimization, and gp2 to gp3 migration
nOps processes over 1.5 billion dollars in cloud spend and was recently named #1 in G2’s cloud cost management category.
You can book a demo to find out how nOps can help you start saving today.