Why Should S3 Customers Start with Intelligent-Tiering?
AWS customers rely heavily on S3 as an object store, a data lake, and a repository for everything from snapshots and backups to configurations and prebuilt code artifacts. Since the default S3 storage class is Standard, costs grow steadily as objects accumulate. This post explains why Intelligent-Tiering ("INT") should be considered the first choice of storage class.
About Intelligent-Tiering
S3 offers multiple storage classes that differ in access characteristics and price:
- Standard
- Intelligent-Tiering
- Standard-Infrequent Access (Standard-IA)
- One Zone-Infrequent Access (One Zone-IA)
- Glacier
- Glacier Deep Archive
The Intelligent-Tiering class (introduced in 2018) is designed to optimize costs by automatically moving data to the most cost-effective access tier based on observed access patterns. For a small monitoring and automation fee ($2.50 per month for every million objects monitored), Intelligent-Tiering automatically moves objects between the Frequent Access and Infrequent Access tiers, so users end up paying a lot less than if everything accumulated in the Standard tier at its higher price. It is aimed at objects with unknown or changing access patterns, and it can be applied at the bucket, prefix, or object-tag level.
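To put the monitoring fee in perspective, here is a back-of-the-envelope sketch in Python, assuming us-east-1 list prices at the time of writing (verify against the S3 pricing page before relying on them):

# Back-of-the-envelope: when do Infrequent Access savings cover the
# monitoring fee? Assumed us-east-1 list prices.
standard_gb_month = 0.023                  # Standard / Frequent Access, $ per GB-month
int_ia_gb_month = 0.0125                   # Intelligent-Tiering Infrequent Access
monitoring_per_object = 2.50 / 1_000_000   # $2.50 per million objects per month

savings_per_gb = standard_gb_month - int_ia_gb_month    # $0.0105 per GB-month
break_even_gb = monitoring_per_object / savings_per_gb  # ~0.00024 GB
print(f"Break-even object size: {break_even_gb * 1024 * 1024:.0f} KB")  # ~250 KB

At these assumed prices, an object only needs to be a couple of hundred KB and spend a month in Infrequent Access for the savings to cover its own monitoring fee.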
Why Intelligent-Tiering, and Why Now?
S3 lifecycle policies can transition objects between storage classes. A lifecycle configuration can apply to all objects in a bucket, or use a filter based on tags and/or prefixes to target a specific set of objects. Since lifecycle policies do not support last-access time as a filter, users have to know which objects have not been accessed recently (or know their access patterns) before moving them from Standard to Standard-IA or another storage class. Intelligent-Tiering helps precisely in these situations, when access patterns are unknown or changing.
Customers with clear knowledge of their access patterns can avoid the Intelligent-Tiering fee and set up lifecycle policies that move objects directly to the appropriate tiers (e.g., move files to Infrequent Access after a week or two, or to Glacier after a month or after some event), as in the sketch below. This approach, however, leaves the onus of understanding usage patterns and applying transitions on the user, and it requires practicing good hygiene (with some help from automation) to realize the savings.
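For example, a direct tier-down for a known access pattern might look like the following boto3 sketch (the bucket name and prefix are hypothetical):

import boto3

s3 = boto3.client("s3")

# Direct tier-down for a known access pattern: objects under a hypothetical
# "logs/" prefix move to Standard-IA after 30 days and to Glacier after 90.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-logs-bucket",  # hypothetical name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-known-pattern",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)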
In the absence of active monitoring and oversight, most objects simply stay and accumulate in the Standard tier, even when they are rarely accessed and do not need to be there, which translates directly into higher storage bills.
Intelligent-Tiering removes one housekeeping task, tracking objects and moving them from the Frequent to the Infrequent Access tier, for both known and unknown access patterns, which matters when dealing with a multitude of ever-growing S3 buckets and objects.
Avoid Access Fees
A different problem arises if a customer moves objects to Standard-IA but then ends up accessing them heavily, incurring retrieval fees. S3 lifecycle policies do not support transitioning from IA back to Standard. For such changing access patterns, moving the objects into Intelligent-Tiering avoids retrieval fees altogether while transparently shifting objects between the Infrequent and Frequent Access tiers as access patterns change.
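A one-off way to move such an object into Intelligent-Tiering is an in-place copy that rewrites its storage class; the boto3 sketch below uses a hypothetical bucket and key (a lifecycle rule transitioning to INTELLIGENT_TIERING achieves the same at scale):

import boto3

s3 = boto3.client("s3")

# In-place copy that rewrites a hypothetical object into INTELLIGENT_TIERING;
# note this is a COPY request with a request charge of its own.
s3.copy_object(
    Bucket="example-bucket",
    Key="reports/heavily-accessed.parquet",
    CopySource={"Bucket": "example-bucket", "Key": "reports/heavily-accessed.parquet"},
    StorageClass="INTELLIGENT_TIERING",
    MetadataDirective="COPY",  # keep the object's existing metadata
)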
Cold Storage and Archive Instant Access Support
Even though Intelligent-Tiering was useful for most customers, there were use cases where objects could be stored even more cheaply as archives in Glacier. In November 2020, S3 announced support for Intelligent-Tiering to span hot, warm, and cold data by adding optional Archive access tiers. Objects can now move across the entire breadth of S3 storage tiers based on their access patterns: hot (Frequent Access), warm (Infrequent Access), and cold (Archive Access and Deep Archive Access). The transitions have minimum-duration requirements: Frequent to Infrequent Access after 30 consecutive days without access, Infrequent to Archive Access after 90 days, and Archive to Deep Archive Access after 180 days. The Frequent and Infrequent Access tiers provide low latency and high throughput, while the archive tiers provide the same performance and pricing as the S3 Glacier storage classes.
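The archive tiers are opt-in and enabled per bucket, optionally scoped by prefix or tags. A minimal boto3 sketch, assuming a hypothetical bucket name:

import boto3

s3 = boto3.client("s3")

# Opt in to the archive tiers for a hypothetical bucket; a "Filter" key can
# scope the configuration to a prefix or tags. The Days values below are the
# minimums and can be raised, but not lowered.
s3.put_bucket_intelligent_tiering_configuration(
    Bucket="example-bucket",
    Id="archive-config",
    IntelligentTieringConfiguration={
        "Id": "archive-config",
        "Status": "Enabled",
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)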
In November 2021, S3 announced the Archive Instant Access tier within Intelligent-Tiering, which gives customers millisecond access to archived data for those circumstances when one cannot wait for a lengthy restore. Intelligent-Tiering continues to manage the transitions between tiers automatically, so customers get the same fast access across hot, warm, and cold storage along with the associated cost savings. This coincided with the introduction of the Glacier Instant Retrieval storage class (the former Glacier class became Glacier Flexible Retrieval).
Many customers would like to move things into archives to save costs but are unsure when there might be a sudden demand to retrieve the data (an end customer wanting to look at old data, for example), and hence cannot realize the savings of the archive classes. With Archive Instant Access and the automated management within Intelligent-Tiering, they can both access those objects with very low latency and save on cost.
Cost Comparison and Benefits
The biggest benefit is the automated savings from moving objects out of the Standard (Frequent Access) tier into the Infrequent Access tier based on access patterns, and then into the archive tiers once objects no longer need fast access (or into Archive Instant Access if they still do). In addition, unlike Standard-IA, Intelligent-Tiering charges no retrieval fees for accessing the Infrequent Access tier. S3 Intelligent-Tiering customers have realized savings of up to 40% for Infrequent Access compared to Frequent Access; the archive tiers can reduce storage costs by up to 95% for rarely accessed objects, and Archive Instant Access can still reduce costs by up to 68% compared to Infrequent Access.

The automation and monitoring aspect of Intelligent-Tiering also removes the hassle of tracking access patterns for objects across buckets, setting up lifecycle policies, and other housekeeping needed to move objects between tiers.
Refer to the S3 pricing page for more details.
Adopting Intelligent-Tiering
New Buckets
Intelligent-Tiering is straightforward when starting with a fresh bucket and unknown access patterns for the objects residing in it. Simply set the storage class to INTELLIGENT_TIERING when making a put-object API call. Here's an example of put-object using the CLI (all SDKs provide the same functionality; a boto3 sketch follows the response below):
Request
aws s3api put-object --bucket test-cur-report-bucket --storage-class INTELLIGENT_TIERING --key testcurreports/year=2020/month=4/sample.pdf --body sample.pdf
Response
{
"ETag": "\"k495igh39485hbdeur74586012345e2m\"",
"VersionId": "MOOHGESP_XyVpnZL0JhQs4YXPAmRSppu"
}
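The equivalent call through an SDK, shown here as a boto3 sketch against the same bucket and key, looks like this:

import boto3

s3 = boto3.client("s3")

# boto3 equivalent of the CLI call above; the response carries the same ETag
# (and a VersionId when the bucket is versioned).
with open("sample.pdf", "rb") as body:
    response = s3.put_object(
        Bucket="test-cur-report-bucket",
        Key="testcurreports/year=2020/month=4/sample.pdf",
        Body=body,
        StorageClass="INTELLIGENT_TIERING",
    )
print(response["ETag"], response.get("VersionId"))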
Existing Buckets
If the number of objects in an existing bucket is large (millions of objects), it is better to first run S3 Storage Class Analysis to understand access patterns and object size and age distributions, and then decide on cleanup, consolidation, or movement between tiers. This reduces time, storage, and overall cost if most objects can be deleted or consolidated into compressed files and moved straight to IA or archive tiers. Use S3 Batch Operations to consolidate, clean up, filter, or tag objects as necessary, and then use lifecycle policies to transition to Intelligent-Tiering. Once objects have transitioned, whether via direct selection or lifecycle rules, the associated INT monitoring costs show up in Cost Explorer as well as in the Cost and Usage Reports (CUR).
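Storage Class Analysis itself can be enabled with a single SDK call; a boto3 sketch with hypothetical names for the source bucket and the CSV export destination:

import boto3

s3 = boto3.client("s3")

# Enable Storage Class Analysis on a hypothetical bucket, exporting daily
# CSVs to a (also hypothetical) results bucket for review.
s3.put_bucket_analytics_configuration(
    Bucket="example-bucket",
    Id="whole-bucket-analysis",
    AnalyticsConfiguration={
        "Id": "whole-bucket-analysis",
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": "arn:aws:s3:::example-analytics-results",
                        "Prefix": "storage-class-analysis/",
                    }
                },
            }
        },
    },
)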
Cost Explorer displays the Intelligent-Tiering transition cost as <region>-Requests-Tier4. Refer to the AWS S3 usage report documentation for details on the various S3 usage types.
Similarly, one can use Athena to query the Cost and Usage Reports for S3 costs (refer to the documentation on querying CUR with Athena):
SELECT
  bill_payer_account_id,
  product_servicecode,
  line_item_usage_account_id,
  line_item_usage_type,
  DATE_FORMAT(line_item_usage_start_date, '%Y-%m-%d') AS day_line_item_usage_start_date,
  line_item_usage_start_date AS startdate,
  line_item_resource_id,
  line_item_operation
FROM "cur"."sabhacurreports"
WHERE year = '2021'
  AND month = '4'
  AND line_item_product_code = 'AmazonS3'
  AND line_item_resource_id LIKE 'sabha-cur-report%'
ORDER BY day_line_item_usage_start_date DESC
Accessing Objects in Intelligent-Tiering
Now that objects have transitioned to Intelligent-Tiering, what happens if an object is not in the Frequent Access tier but in the Infrequent Access tier when you request it?
Objects stored in the Infrequent Access tier have the same durability (99.999999999%), low latency, and throughput as the Frequent Access tier, but Infrequent Access offers 99.9% availability compared to 99.99%. For the rare case when data is not available, you will receive a 500 Internal Error indicating that S3 was unable to handle the request at that time; the object will typically be available on a retry of the GET within milliseconds. All AWS SDKs have built-in retry logic (such as the ClientConfiguration class in the Java SDK) that simplifies handling this, as sketched below.
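With boto3, for instance, the retry behavior can be tuned explicitly (the defaults already retry 500s); the bucket and key below are hypothetical:

import boto3
from botocore.config import Config

# Explicit retry tuning; boto3 already retries 500s by default, and the Java
# SDK's ClientConfiguration exposes the equivalent settings.
s3 = boto3.client(
    "s3",
    config=Config(retries={"max_attempts": 10, "mode": "standard"}),
)

# A GET that hits the rare 500 from the Infrequent Access tier is retried
# transparently.
obj = s3.get_object(Bucket="example-bucket", Key="reports/sample.pdf")
print(obj["ContentLength"])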
What if objects are in one of the archive tiers?
You can query the status of an object using the S3 HEAD Object API call and inspect the storage class.
aws s3api head-object --bucket test-cur-report-bucket --key testcurreports/year=2020/month=4/testcurreports-00001.snappy.parquet
{
"AcceptRanges": "bytes",
"LastModified": "2020-07-08T23:53:05+00:00",
"ContentLength": 445162,
"ETag": "\"929a4a1dd099f1a899b74c1779a88de0\"",
"ContentType": "application/octet-stream",
"Metadata": {},
"StorageClass": "INTELLIGENT_TIERING"
}
If the object is in the Archive Access tier, the HEAD response shows the archive status:
HTTP/1.1 200 OK
x-amz-id-2: FSVaTMjrmBp3Izs1NnwBZeu7M19iI8UbxMbi0A8AirHANJBo+hEftBuiESACOMJp
x-amz-request-id: E5CEFCB143EB505A
Date: Fri, 13 Nov 2020 00:28:38 GMT
Last-Modified: Mon, 15 Oct 2012 21:58:07 GMT
ETag: "1accb31fcf202eba0c0f41fa2f09b4d7"
x-amz-storage-class: INTELLIGENT_TIERING
x-amz-archive-status: ARCHIVE_ACCESS
x-amz-restore: ongoing-request="true"
If the object is undergoing a restore, the additional entry (x-amz-restore: ongoing-request="true") indicates that it is being restored. Unlike an S3 Glacier restore, which creates a temporary copy unless indicated otherwise, an object managed by Intelligent-Tiering is restored from Archive back to the Frequent Access tier, and it takes 30 days of no access to transition back to Infrequent Access.
Things to Keep in Mind
A few things to remember when enabling Intelligent-Tiering:
- New changes (as of late 2021) to monitoring and automation charges for objects under 128 KB and to the minimum storage duration:
- S3 does not charge monitoring and automation fees for objects smaller than 128 KB; such objects simply remain in the Frequent Access tier and are never auto-tiered.
- There is no longer a minimum storage duration. Previously, objects had to stay at least 30 days before being deleted, moved, or transitioned to a different tier; as of September 2021, Intelligent-Tiering no longer applies prorated charges for objects deleted within 30 days.
- The minimum durations for transitions (30 days from Frequent to Infrequent Access, 90 days to Archive or Archive Instant Access, 180 days to Deep Archive) can be increased but cannot be lowered.
- Objects archived to the Glacier tiers carry per-object metadata overhead: roughly 8 KB charged at S3 Standard rates plus 32 KB charged at Glacier rates (about 40 KB in total).
- S3 Batch Operations can also be used to delete unnecessary object tags (tags carry a cost that adds up as objects accumulate).
The key is to decide whether your objects meet the above minimum size and duration requirements before adopting Intelligent-Tiering; a quick way to gauge the size distribution is to scan the bucket, as in the sketch below.
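A boto3 sketch for the size check, counting how many objects in a (hypothetical) bucket fall below the 128 KB monitoring threshold:

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Count objects below the 128 KB threshold; these would sit unmonitored in
# the Frequent Access tier and never auto-tier.
small, total = 0, 0
for page in paginator.paginate(Bucket="example-bucket"):
    for obj in page.get("Contents", []):
        total += 1
        if obj["Size"] < 128 * 1024:
            small += 1
print(f"{small} of {total} objects are below 128 KB")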
Maintaining S3 Hygiene
While Intelligent-Tiering helps with more efficient access and better cost optimization for unknown access patterns, it is still incumbent on the bucket owner to maintain S3 hygiene by:
- Periodically reviewing objects (leverage S3 Storage Lens).
- Using tags and partitions to manage tracking and cost allocation of objects; also check out the S3 best-practice design patterns.
- Applying policies to clean up unwanted objects, or to move objects directly to IA or other tiers via lifecycle transition or expiration policies when the usage pattern is known, rather than incurring the additional Intelligent-Tiering monitoring cost.
- Consolidating or combining multiple related files at regular intervals (whenever possible) to reduce per-object fees. The cost of both storing and retrieving small objects adds up considerably (due to the metadata overhead) once things move into the Glacier archives, so consolidation is key.
- Cleaning up partial or incomplete multipart uploads, using lifecycle policies (see the sketch after this list) or by listing and deleting them; S3 Storage Lens can also show details of incomplete uploads.
- Using versioning to keep object versions and protect against accidental deletion, and leveraging cross-region replication (CRR) for backups as appropriate.
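For the multipart-upload cleanup mentioned above, here is a boto3 sketch of the corresponding lifecycle rule (bucket name hypothetical):

import boto3

s3 = boto3.client("s3")

# Abort multipart uploads that never completed after 7 days. NOTE: this call
# REPLACES the bucket's existing lifecycle configuration, so merge this rule
# with any rules already in place.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",  # hypothetical name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-stale-multipart-uploads",
                "Filter": {"Prefix": ""},  # applies bucket-wide
                "Status": "Enabled",
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)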
Conclusion
We saw how Intelligent-Tiering can reduce costs for most S3 customers, along with when, what, and how to go about adopting it. We hope this helps AWS customers make informed decisions and leverage Intelligent-Tiering to get cost efficiency out of S3 storage.