With dozens of announcements and blogs published each week, it can be challenging to sift through and understand the most impactful updates in the world of AWS.

That’s why we developed this monthly series highlighting the most notable recent AWS news and thought leadership in FinOps and GenAI, curated by the nOps engineering team.

Find out what’s new, what’s hot, and what’s going to save you money on your AWS bill.

This month includes the general availability of cheaper Graviton4 instances, preventing problematic GenAI behaviors like hallucination, how to not die during IPv6 migration, new MAP incentive credits, new Llama models, and more.

July 2024: The latest in AWS FinOps

First, let’s dive into the updates with the biggest impact on your AWS bill.

It’s now easier and faster to get Migration Acceleration Program (MAP) credits

AWS offers MAP credits to customers migrating workloads onto AWS. The latest changes streamline the process, with a simplified approval workflow and new Strategic Partner Initiatives (SPIs) among the updates.
AWS MAP dashboard
This article breaks down everything that’s new with MAP — nOps also has a free tool to automatically tag your resources and track migration progress.

AWS Graviton4-powered EC2 R8g instances now generally available

AWS Graviton4-based EC2 instances deliver the best performance and cost efficiency for a broad range of workloads running on EC2, with up to 30% better performance than Graviton3-based instances.

These long-awaited instances were originally announced 7 months ago at re:Invent — read the official release here.

Graviton4 Chips
Graviton4 chips (image source: AWS)

AWS Glue Studio now offers a no-code data preparation authoring experience

AWS Glue Studio Visual ETL has released “data preparation authoring”, a new no-code data preparation tool with a spreadsheet-style UI for business users and data analysts. This enables scalable data integration jobs on AWS Glue for Spark, simplifying data cleaning and transformation for analytics and ML.

This allows you to scale up data preparation jobs to process petabytes of data, at a lower price point than standard AWS Glue jobs.

GenAI news & updates in July

July was a big month in GenAI, including:

Llama 3.1 405B, 70B, and 8B models from Meta now in Amazon Bedrock

Llama 3.1 405B Instruct model
Llama 3.1 405B Instruct model (image source: AWS)
Llama 3.1 models, Meta’s most advanced and capable models to date, are now available in Amazon Bedrock. The collection spans 8B, 70B, and 405B parameter sizes, demonstrates state-of-the-art performance on a wide range of industry benchmarks, and offers new capabilities for your generative AI applications.
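Calling these models from Bedrock follows the usual boto3 pattern. Here is a minimal sketch of building a Converse API request — the model ID and prompt are illustrative assumptions; check the Bedrock console for the exact identifiers available in your Region.

```python
# Sketch: preparing a request for Llama 3.1 on Amazon Bedrock via the
# Converse API. The model ID below is an assumption -- verify the exact
# identifier and Region availability in your Bedrock console.
MODEL_ID = "meta.llama3-1-70b-instruct-v1:0"  # assumed identifier

def build_request(prompt: str) -> dict:
    """Build the keyword arguments for bedrock_runtime.converse()."""
    return {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.5},
    }

request = build_request("Summarize the key FinOps updates from July in three bullets.")

# Uncomment to call the live API (requires AWS credentials and model access):
# import boto3
# bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
# response = bedrock_runtime.converse(**request)
# print(response["output"]["message"]["content"][0]["text"])
```

The Converse API gives you one request shape across all Bedrock models, so swapping between the 8B, 70B, and 405B variants is just a change of model ID.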

Higher throughput and lower costs on SageMaker

This new inference optimization capability for SageMaker delivers up to ~2x higher throughput while reducing costs by up to ~50% for generative AI models such as Llama 3, Mistral, and Mixtral.

For example, with a Llama 3-70B model, you can achieve up to ~2,400 tokens/sec on an ml.p5.48xlarge instance, versus ~1,200 tokens/sec previously without optimization. The capability lets you apply optimization techniques like speculative decoding, quantization, and compilation to your generative AI models.
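To see what doubling throughput means for your bill, here's some back-of-the-envelope arithmetic: at a fixed hourly instance price, cost per token is inversely proportional to throughput. The hourly price below is a placeholder assumption — substitute the current ml.p5.48xlarge rate for your Region.

```python
# Rough cost-per-million-tokens arithmetic for the SageMaker numbers above.
# HOURLY_PRICE is an assumed placeholder, not a quoted AWS price.
HOURLY_PRICE = 98.32    # USD/hour, placeholder -- check current pricing
BASELINE_TPS = 1200     # tokens/sec without optimization
OPTIMIZED_TPS = 2400    # tokens/sec with the optimized inference stack

def cost_per_million_tokens(hourly_price: float, tokens_per_sec: float) -> float:
    """Cost to generate one million tokens at a given sustained throughput."""
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_price / tokens_per_hour * 1_000_000

baseline = cost_per_million_tokens(HOURLY_PRICE, BASELINE_TPS)
optimized = cost_per_million_tokens(HOURLY_PRICE, OPTIMIZED_TPS)
print(f"baseline:  ${baseline:.2f} per 1M tokens")
print(f"optimized: ${optimized:.2f} per 1M tokens")  # half the baseline
```

Whatever the actual instance price, a 2x throughput gain halves the cost per token on the same hardware — which is where the "up to ~50%" cost reduction comes from.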

Speculative decoding vs non speculative decoding
Speculative decoding vs non speculative decoding (image source: AWS)
You can read more in the accompanying blog part 1 and part 2.

How to choose a database for your GenAI applications

This great post explores the key factors to consider when selecting a database for your generative AI applications.

This includes high-level considerations like familiarity, ease of implementation, scalability, and performance, as well as the unique characteristics of the fully managed databases with vector search capabilities currently available on AWS.

Retrieval Augmented Generation Diagram
Retrieval Augmented Generation (image source: AWS)

Making Amazon Bedrock models better and less problematic

Multi-modal generative AI capabilities of Amazon Bedrock provide an alternative, easy on-ramp into the world of image analysis, object recognition, and more.

However, issues like hallucination, bias, and a lack of safety controls can threaten the efficacy of models — check out these recent articles on the topic.

Accelerate your GenAI distributed training workloads with the NVIDIA NeMo Framework

NVIDIA NeMo is an end-to-end, cloud-native framework for training and deploying generative AI models with billions to trillions of parameters at scale.

The NVIDIA NeMo Framework provides a comprehensive set of tools, scripts, and recipes for each stage of the LLM journey, from data preparation to training and deployment. You can deploy and manage it using either Slurm or Kubernetes orchestration platforms — read more here.

NVIDIA NeMo framework
NVIDIA NeMo framework (image source: AWS)

In other key AWS news from the past month…

New EventBridge console dashboard

The new console dashboard surfaces account level metrics, providing deeper insight into your event-driven applications and allowing you to quickly identify and resolve issues as they arise.

You can use the dashboard to answer basic questions such as “How many buses and pipes have I configured in my account?”, “What was my PutEvents traffic pattern over the last 3 hours?” or “What is the concurrency of my pipe?”. It is available by default in the EventBridge console in all AWS Regions.

Amazon OpenSearch Service announces Natural Language Query Generation for log analysis

This feature lets you ask log exploration questions in plain English, which are then automatically translated to the relevant Piped Processing Language (PPL) queries and executed to fetch the requested data.

This opens up log analysis to a wider set of team members who don’t need to be proficient in PPL — they can simply explore their log data by asking questions like “show me the count of 5xx errors for each of the pages on my website” or “show me the throughput by hosts”.

New controls make it easier to search, filter, and aggregate Lambda function logs

AWS Lambda has announced advanced logging controls that enable you to natively capture logs in JSON structured format, adjust log levels, and select the Amazon CloudWatch log group for your Lambda functions.
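These controls are set per function via a logging configuration. Below is a hedged sketch of the settings using boto3 — the function name and log group are hypothetical examples, and the live call is left commented out since it requires AWS credentials.

```python
# Sketch: Lambda's advanced logging controls as a LoggingConfig mapping.
# Function name and log group below are hypothetical examples.
logging_config = {
    "LogFormat": "JSON",              # emit structured JSON instead of plain text
    "ApplicationLogLevel": "WARN",    # filter out DEBUG/INFO application logs
    "SystemLogLevel": "INFO",         # level for Lambda's own system logs
    "LogGroup": "/custom/my-app-logs",  # route logs to a chosen CloudWatch log group
}

# Uncomment to apply (requires AWS credentials):
# import boto3
# lambda_client = boto3.client("lambda")
# lambda_client.update_function_configuration(
#     FunctionName="my-function",
#     LoggingConfig=logging_config,
# )
print(logging_config)
```

Filtering at the log level (e.g. dropping DEBUG/INFO in production) also trims CloudWatch ingestion costs, which makes this a FinOps lever as well as an observability one.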

Amazon ECS now enforces software version consistency for containerized applications

Amazon ECS now automatically enforces software version consistency for services created or updated after June 25, 2024 that run on AWS Fargate platform version 1.4.0 or later, or on version 1.70.0 or later of the Amazon ECS Agent, in all commercial and AWS GovCloud (US) Regions.

This helps prevent inconsistencies and enhance the reliability and predictability of deployments.

The best AWS engineering blogs from July 2024

How to Work With IPv6 in AWS and Not Die in the Process

Starting February 1, 2024, AWS began charging $0.005 per IP per hour for all public IPv4 addresses, which were previously free when used with actively running services. This change has been praised by some as a step towards incentivizing good internet hygiene.
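The per-hour rate sounds tiny, but it compounds quickly across a fleet. A quick back-of-the-envelope calculation:

```python
# Back-of-the-envelope impact of the public IPv4 charge ($0.005 per IP-hour,
# per the AWS announcement). Fleet sizes below are illustrative.
RATE_PER_IP_HOUR = 0.005  # USD

def annual_ipv4_cost(num_public_ips: int, hours_per_year: int = 8760) -> float:
    """Yearly cost of holding the given number of public IPv4 addresses."""
    return num_public_ips * RATE_PER_IP_HOUR * hours_per_year

print(f"1 IP:    ${annual_ipv4_cost(1):,.2f}/year")    # $43.80
print(f"100 IPs: ${annual_ipv4_cost(100):,.2f}/year")  # $4,380.00
```

At roughly $3.65 per IP per month, a few hundred idle or unnecessary public IPv4 addresses add up to real money — which is exactly the nudge toward IPv6 that AWS intends.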

While it may not yet be the end of the IPv4 era, many companies are migrating to IPv6 — check out this great article on how to do it as painlessly as possible.

Sample architecture implementing a VPC with a public dual-stack subnet and a private IPv6-only subnet
Sample architecture implementing a VPC with a public dual-stack subnet and a private IPv6-only subnet (image source: AWS)

Harnessing Karpenter: Transitioning Kafka to Amazon EKS with AWS solutions

Here at nOps, we’re big fans of Karpenter.

Find out how it can help in complex data environments in this case study — featuring AppsFlyer’s journey to Karpenter and how it unlocked new efficiencies for production Kafka clusters.

Architecture diagram
Architecture diagram (image source: AWS)
Related Content

AWS Cloud Cost Allocation: The Complete Guide

How to tag and allocate every dollar of your AWS spend

Integrate AWS Cost Anomaly Detection Notifications with IT Service Management Workflow

This post explains how to integrate AWS Cost Anomaly notifications with ServiceNow, so you can leverage ServiceNow workflows to review and resolve cost anomalies.
Cost Anomaly Detection Workflow
Cost Anomaly Detection Workflow (image source: AWS)

Enable CloudWatch memory metrics for Windows EC2 instances with Systems Manager

This blog post demonstrates how to reduce the administrative burden of enabling Amazon CloudWatch memory metric monitoring on Windows Server EC2 instances using AWS Systems Manager automation.

Once enabled, you can use these metrics with downstream services like AWS Compute Optimizer for more accurate cost savings recommendations.
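Under the hood, what the automation distributes is a CloudWatch agent configuration. A minimal sketch of the relevant fragment is below, assuming the standard Windows “% Committed Bytes In Use” counter is what you want to collect:

```python
# Minimal sketch of the CloudWatch agent configuration fragment for Windows
# memory metrics -- assumes the standard "% Committed Bytes In Use" counter.
import json

agent_config = {
    "metrics": {
        "metrics_collected": {
            "Memory": {
                "measurement": ["% Committed Bytes In Use"],
                "metrics_collection_interval": 60,  # seconds
            }
        }
    }
}

print(json.dumps(agent_config, indent=2))
```

The Systems Manager approach described in the post saves you from hand-editing and distributing a file like this to every Windows instance individually.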

About nOps

If you’re looking to save on your AWS costs, nOps makes it easy and painless for engineers to take action on cloud cost optimization.

The nOps all-in-one cloud platform features include:

  • Business Contexts: Understand and allocate 100% of your AWS bill down to the container level
  • Compute Copilot: Intelligent provisioner that helps you reduce On-Demand costs by up to 90% with Spot discounts
  • Commitment management: Automatic life-cycle management of your EC2/RDS/EKS commitments with risk-free guarantee
  • Storage migration: One-Click EBS volume migration
  • Rightsizing: Rightsize EC2 instances and Auto Scaling Groups
  • Resource Scheduling: Automatically schedule and pause idle resources

nOps was recently ranked #1 with five stars in G2’s cloud cost management category, and we optimize $1.5+ billion in cloud spend for our customers.

Join our customers using nOps to understand your cloud costs and leverage automation with complete confidence by booking a demo today!