Amazon Simple Storage Service (Amazon S3) is an object storage service. Users of all sizes and industries can store and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps.
Amazon S3 is one of the most inexpensive, scalable and versatile ways to store your data in the cloud. However, if you’re relying on S3 to store, access and transfer large volumes of data, costs and complexity can quickly escalate — leading to thousands of dollars in unnecessary S3 costs.
If you’re struggling with high Amazon S3 costs, there are some best practices you can follow to help. By choosing the right storage classes and managing your S3 data efficiently, you can make significant progress on AWS S3 cost optimization. First, let’s quickly cover the basics.
What are the Amazon Simple Storage Service (S3) classes?
S3 Standard Storage Class
S3 Standard - Infrequent Access Tier
S3 One Zone - Infrequent Access Tier
S3 Intelligent-Tiering
S3 Glacier Flexible Retrieval
S3 Glacier Deep Archive
S3 Glacier Instant Retrieval
S3 on Outposts Storage Classes
How does S3 pricing work?
Amazon S3 uses a pay-as-you-go pricing model, with no upfront payment or commitment required. S3’s pricing is usage-based, so you pay only for the resources you actually use.
AWS offers a free tier to new AWS customers, which includes 5 GB of Amazon S3 storage in the S3 Standard storage class; 20,000 GET requests; 2,000 PUT, COPY, POST, or LIST requests; and 100 GB of data transfer out each month.
After that, here are the main variables that are taken into account when calculating S3 pricing.
Storage Costs
Request and Data Retrieval Costs
Data Transfer Costs
Additional Features and Costs
Amazon S3 pricing also includes charges for management and analytics tools, data replication across regions or within the same region, security and access tools, and costs associated with data transformation and querying through features like Amazon S3 Select.
Furthermore, using Amazon S3 Object Lambda results in charges based on the data processed, and costs can vary significantly with server location and data transfer destinations, particularly when transferring data across different AWS regions.
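To see how these variables combine into a monthly bill, here is a rough sketch of the arithmetic. All of the rates below are placeholder assumptions for illustration only, not current AWS prices; always check the S3 pricing page for your region and storage class.

```python
# Illustrative S3 cost estimate. Every rate below is a placeholder assumption,
# not a current AWS price -- look up real rates for your region before relying on it.

ASSUMED_STORAGE_RATE_PER_GB = 0.023     # S3 Standard, per GB-month (assumed)
ASSUMED_PUT_RATE_PER_1000 = 0.005       # PUT/COPY/POST/LIST, per 1,000 requests (assumed)
ASSUMED_GET_RATE_PER_1000 = 0.0004      # GET/SELECT, per 1,000 requests (assumed)
ASSUMED_TRANSFER_OUT_PER_GB = 0.09      # data transfer out to the internet, per GB (assumed)

def estimate_monthly_cost(storage_gb, put_requests, get_requests, transfer_out_gb):
    """Return a rough monthly S3 cost estimate in USD under the assumed rates."""
    return (
        storage_gb * ASSUMED_STORAGE_RATE_PER_GB
        + (put_requests / 1000) * ASSUMED_PUT_RATE_PER_1000
        + (get_requests / 1000) * ASSUMED_GET_RATE_PER_1000
        + transfer_out_gb * ASSUMED_TRANSFER_OUT_PER_GB
    )

# Example: 500 GB stored, 100k PUTs, 1M GETs, 50 GB transferred out per month
print(f"${estimate_monthly_cost(500, 100_000, 1_000_000, 50):.2f}")  # about $16.90
```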
What are the top 10 best practices for S3 cost optimization?
1. Use Lifecycle Policies
Through the Amazon S3 Management Console, you can set rules to move data to S3 Standard-IA after 30 days if infrequently accessed, and to S3 Glacier Flexible Retrieval after 90 days for rarely accessed data.
Some practical tips are to define expiration actions (such as deleting outdated logs or incomplete multipart uploads after predefined periods) and to implement tagging to categorize data, enabling more granular control in applying lifecycle rules to specific datasets.
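As a concrete sketch, the boto3 snippet below applies the rules described above to a hypothetical bucket (the bucket name, prefix, and rule IDs are placeholders): transition to S3 Standard-IA after 30 days, to S3 Glacier Flexible Retrieval after 90 days, and clean up incomplete multipart uploads after 7 days.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and prefix; adjust to your own environment.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-logs-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-old-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequently accessed
                    {"Days": 90, "StorageClass": "GLACIER"},      # Glacier Flexible Retrieval
                ],
            },
            {
                "ID": "abort-stale-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            },
        ]
    },
)
```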
2. Delete Unused Data
You always incur charges for data stored in S3, so you should periodically find and delete data you no longer use, or data you could recreate relatively easily if you needed it. If you’re not sure about deleting objects permanently, you can instead archive vast amounts of data at very low cost with S3 Glacier Deep Archive.
You can delete one or more objects directly from Amazon S3 using the Amazon S3 console, AWS SDKs, AWS Command Line Interface (AWS CLI), or REST API.
You have the following API options when deleting an object:
- Delete a single object – Amazon S3 provides the DELETE (DeleteObject) API operation that you can use to delete one object in a single HTTP request.
- Delete multiple objects – Amazon S3 provides the Multi-Object Delete (DeleteObjects) API operation that you can use to delete up to 1,000 objects in a single HTTP request.
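For example, here is a minimal boto3 sketch of both options against a hypothetical bucket (the bucket name and keys are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Delete a single object (DeleteObject).
s3.delete_object(Bucket="example-bucket", Key="reports/old-report.csv")

# Delete up to 1,000 objects in one request (DeleteObjects).
response = s3.delete_objects(
    Bucket="example-bucket",
    Delete={
        "Objects": [
            {"Key": "tmp/file-1.json"},
            {"Key": "tmp/file-2.json"},
        ],
        "Quiet": True,  # only report errors in the response
    },
)
print(response.get("Errors", []))
```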
3. Compress Data Before You Send to S3
You incur Amazon S3 charges based on the amount of data you store and transfer. By compressing data before sending it to Amazon S3, you can reduce the amount of both.
Several effective compression methods can help optimize storage costs and efficiency. Algorithms like GZIP and BZIP2 are widely used for text data, offering good compression ratios and compatibility. LZMA provides even higher compression rates, though it typically requires more processing power. For binary data or quick compression needs, LZ4 is an excellent choice due to its very fast compression and decompression speeds.
Additionally, using file formats like Parquet, which supports various compression codecs, can further optimize storage for complex datasets by enabling efficient columnar data querying and storage.
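A minimal sketch of the idea, assuming a hypothetical bucket, key, and local file, is to gzip the payload in memory before uploading with boto3:

```python
import gzip
import boto3

s3 = boto3.client("s3")

with open("events.json", "rb") as f:        # local file to upload (placeholder path)
    raw = f.read()
compressed = gzip.compress(raw)             # text and JSON often shrink several-fold

s3.put_object(
    Bucket="example-bucket",
    Key="events/events.json.gz",
    Body=compressed,
    ContentEncoding="gzip",                 # lets downstream clients decompress transparently
    ContentType="application/json",
)
print(f"Uploaded {len(compressed)} bytes instead of {len(raw)}")
```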
4. Use S3 Select to retrieve only the data you need
Amazon S3 Select is a feature that allows you to retrieve only a subset of data from an object, which can significantly reduce the amount of data retrieved and consequently lower your data retrieval and transfer costs. This is particularly useful when dealing with large amounts of structured data stored in formats like CSV, JSON, or Apache Parquet.
When you use S3 Select, you specify SQL-like statements to filter the data and return only the information relevant to your query. This means you can avoid downloading an entire object, processing it on the application side, and then discarding the unnecessary data, which reduces data transfer and processing costs.
It’s worth noting that the Amazon S3 console limits the amount of data returned to 40 MB. To retrieve more data, you need to use the AWS CLI or the API.
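As an illustration, the boto3 sketch below runs an S3 Select query against a hypothetical gzipped CSV object and streams back only the matching rows (the bucket, key, and column names are assumptions):

```python
import boto3

s3 = boto3.client("s3")

response = s3.select_object_content(
    Bucket="example-bucket",
    Key="orders/2024/orders.csv.gz",
    ExpressionType="SQL",
    Expression="SELECT s.order_id, s.total FROM s3object s WHERE s.country = 'DE'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "GZIP"},
    OutputSerialization={"CSV": {}},
)

# The result arrives as an event stream; only 'Records' events carry row data.
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"), end="")
```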
5. Choose the right AWS Region and Limit Data Transfers
Selecting the right AWS region for your S3 storage can have a significant impact on storage costs, especially when it comes to data transfer fees.
Data stored in a region closer to your users or applications typically reduces latency and transfer costs, because AWS charges for data transferred out of an S3 region to another region or the internet.
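If most of your users or compute runs in a particular region, you can create the bucket there explicitly. A minimal sketch, assuming eu-west-1 and a placeholder bucket name:

```python
import boto3

region = "eu-west-1"                      # pick the region closest to your users/applications
s3 = boto3.client("s3", region_name=region)

s3.create_bucket(
    Bucket="example-eu-logs-bucket",      # bucket names are globally unique; placeholder
    CreateBucketConfiguration={"LocationConstraint": region},  # omit this block for us-east-1
)
```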
6. Consolidate and Aggregate Data
Consolidating and aggregating data before storing it on S3 can lead to significant cost savings, especially for use cases involving analytics and data processing.
By combining smaller files into larger ones and aggregating similar data types, you can optimize storage utilization and reduce the number of requests made to S3, which in turn minimizes costs associated with PUT, GET, and LIST operations.
Some examples include batching small files (as fewer, larger files reduce overhead and request costs) and aggregating data at the source before uploading (for example, if you’re collecting log data, summarize or filter it prior to storage).
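As a simple sketch of batching at the source, the snippet below concatenates many small local log files into one gzipped object before upload (the local paths, bucket, and key are placeholders):

```python
import glob
import gzip
import io
import boto3

s3 = boto3.client("s3")

# Combine many small log files into a single object to cut PUT request counts.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    for path in sorted(glob.glob("logs/2024-06-01/*.log")):  # placeholder local path
        with open(path, "rb") as f:
            gz.write(f.read())
            gz.write(b"\n")

s3.put_object(
    Bucket="example-bucket",
    Key="logs/batched/2024-06-01.log.gz",
    Body=buffer.getvalue(),
)
```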
7. Monitor and Analyze Usage with S3 Storage Lens
Amazon S3 Storage Lens is a storage analytics tool that helps you visualize and manage storage usage and activity across your S3 buckets and objects. With its dashboard and metrics, you can gain insights into operational and cost efficiencies within your S3 environment. You can use S3 Storage Lens to:
Identify cost drivers: S3 Storage Lens provides metrics on data usage patterns, enabling you to pinpoint high-cost areas. For example, you can identify buckets where data retrieval is frequent and costs are high, or find buckets with stale data that could be moved to a cheaper storage class or deleted.
Optimize storage distribution: The dashboard allows you to see how data is distributed across different storage classes. You might find opportunities to shift data from S3 Standard to Infrequent Access tiers if the access patterns support such a move, reducing costs significantly.
Review replication and data protection: View metrics on replication and data protection to help ensure you’re not overspending on redundancy for non-critical data.
Monitor access patterns: By examining access patterns, you can adjust your storage strategy to better align with actual usage. If certain data is accessed infrequently, you can automate the transfer of this data to lower-cost storage classes using lifecycle policies.
Set customizable metrics and alerts: You can publish Storage Lens metrics to Amazon CloudWatch and configure alerts when certain thresholds are met, such as an unexpected increase in PUT or GET requests, which could indicate an inefficiency or a potential issue in your S3 usage pattern.
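Storage Lens is configured and viewed mostly through the console, but you can also list your configurations programmatically via the S3 Control API. A small sketch, assuming your default boto3 credentials can call STS and S3 Control:

```python
import boto3

# Storage Lens configurations live in the s3control API and are scoped to an account ID.
account_id = boto3.client("sts").get_caller_identity()["Account"]
s3control = boto3.client("s3control")

resp = s3control.list_storage_lens_configurations(AccountId=account_id)
for entry in resp.get("StorageLensConfigurationList", []):
    print(entry.get("Id"), entry.get("HomeRegion"), entry.get("IsEnabled"))
```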
8. Use Requester Pays
Enable the Requester Pays option to shift data transfer and request costs to the user accessing your Amazon S3 bucket data. This feature is particularly useful if you host large datasets publicly and want to avoid bearing the cost of data egress.
For example, you might use Requester Pays buckets when making large datasets available, such as zip code directories, reference data, geospatial information, or web crawling data. When enabled, anyone accessing your data incurs the charges for requests and data transfer out of Amazon S3, while you continue to pay for storage. Requester Pays can be set on a per-bucket basis.
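Enabling it from code is a one-line call per bucket. A minimal boto3 sketch with a placeholder bucket name:

```python
import boto3

s3 = boto3.client("s3")

# Shift request and data-transfer-out charges to the requester for this bucket.
s3.put_bucket_request_payment(
    Bucket="example-public-dataset-bucket",
    RequestPaymentConfiguration={"Payer": "Requester"},
)
```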
9. Set up IAM to limit access
Set up Identity and Access Management (IAM) to limit access to your Amazon S3 resources effectively. By configuring IAM policies, you can control who can access your Amazon S3 data and what actions they can perform. This is crucial for minimizing unnecessary data access, which can lead to additional costs, especially with operations like PUT and GET requests.
Implement least privilege access by granting permissions only to the extent necessary for users to perform their assigned tasks. You can utilize IAM roles and policies to specify allowed actions on specific buckets or objects. For instance, you might allow a group of users to only read data from a particular Amazon S3 bucket, while administrative access could be restricted to IT staff.
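For instance, a read-only policy scoped to a single bucket might look like the sketch below (the bucket name and policy name are placeholders; attach the resulting policy to the appropriate group or role):

```python
import json
import boto3

iam = boto3.client("iam")

# Least-privilege policy: list the bucket and read its objects, nothing else.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::example-analytics-bucket",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-analytics-bucket/*",
        },
    ],
}

iam.create_policy(
    PolicyName="ExampleS3ReadOnlyAnalyticsBucket",
    PolicyDocument=json.dumps(read_only_policy),
)
```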
10. Partition your data before querying it
By partitioning data into segments based on specific keys such as date, time, or other relevant attributes, you can enable query services like Amazon Athena or Amazon Redshift Spectrum to scan only pertinent parts of your data.
You can start by defining partition keys that align with common query filters, such as time-based keys (year, month, day) or geographic identifiers (country, region). This approach ensures queries are more efficient, accessing only the necessary data for analysis.
Additionally, consider implementing a folder structure in your Amazon S3 buckets that mirrors your partitioning strategy, facilitating direct and fast access to subsets of data. The partition management process can be automated with AWS Glue or custom scripts to maintain and update partitions as new data is ingested, keeping your storage organized and cost-effective with less manual effort.
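A common convention is Hive-style key prefixes, which Athena and Redshift Spectrum can prune against when a query filters on the partition columns. The sketch below uploads one day of data under such a prefix (the bucket, dataset name, and local file are placeholders):

```python
from datetime import date
import boto3

s3 = boto3.client("s3")

d = date(2024, 6, 1)
# Hive-style partition layout: query engines can skip partitions that don't match the filter.
key = f"analytics/events/year={d.year}/month={d.month:02d}/day={d.day:02d}/events.parquet"

s3.upload_file("events.parquet", "example-analytics-bucket", key)
```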
Understand and optimize your cloud costs with nOps
Whether you’re looking to optimize just your S3 costs or your entire cloud bill, nOps can help. It gives you complete cost visibility and intelligence across your entire AWS infrastructure. Analyze S3 costs by product, feature, team, deployment, environment, or any other dimension.
If your AWS bill is a big mystery, you’re not alone. nOps makes it easy to understand and allocate 100% of your AWS bill, even fixing mistagged and untagged resources for you.
nOps also offers a suite of ML-powered cost optimization features that help cloud users reduce their costs by up to 50% on autopilot, including:
Compute Copilot: automatically selects the optimal compute resource at the most cost-effective price in real time for you — also makes it easy to save with Spot discounts
ShareSave: automatic life-cycle management of your EC2/RDS/EKS commitments with risk-free guarantee
nOps Essentials: set of easy-apply cloud optimization features including EC2 and ASG rightsizing, resource scheduling, idle instance removal, storage optimization, and gp2 to gp3 migration
nOps processes over 1.5 billion dollars in cloud spend and was recently named #1 in G2’s cloud cost management category.
You can book a demo to find out how nOps can help you start saving today.