The Real Cost of Cloud in 2026: How Mid-Stage Startups Are Cutting AWS Bills in Half

AWS bills have gotten quietly absurd. Here's where the money actually goes for mid-stage startups in 2026, the biggest wasteful patterns, and the optimisations that produce 30-50% cuts without sacrificing velocity.

Author
ebita.ai engineering
Published
MAY 20, 2026
Read
10 min
Figure: pie chart of a typical mid-stage SaaS AWS cost breakdown, annotated with optimisation opportunities.

The VC mood on burn shifted in 2024 and hasn't shifted back. Two years later, the cloud bill is the second-most-discussed line item in board meetings (after headcount), and the FinOps function has gone from "nice to have at Series C" to "needs to exist by Series A." Most mid-stage startups can cut their cloud bill by 30–50% without losing velocity. The optimisations are well-known but inconsistently applied.

This is the playbook we'd run if a Series A or B founder asked "what do I do about the AWS bill" — based on the engagements we've actually run with cost-pressured teams.

Where the money actually goes

Before optimising, you need to know what you're optimising. The typical mid-stage SaaS cost shape we see, as a percentage of total AWS bill:

  • Compute (EC2, ECS, EKS, Fargate, Lambda): 35–50%
  • Data transfer (egress, inter-AZ, NAT): 10–20% (often higher than expected)
  • Storage (S3, EBS, RDS storage and snapshots): 10–15%
  • Inference / AI workloads (Bedrock, SageMaker): 5–25% (new and growing)
  • Observability (CloudWatch, X-Ray): 5–15% (often much higher than expected)
  • Managed services (RDS, ElastiCache, OpenSearch, MSK): 10–20%
  • Everything else: 5–10%

The exact shape varies, but two categories almost always over-shoot the team's mental model: data transfer and observability. Teams know they're paying for compute. They are routinely surprised by data transfer and CloudWatch.

The first move in any optimisation engagement is to get this breakdown in writing for your bill. AWS Cost Explorer plus a couple of tags will do it. Without this, you'll spend time optimising the wrong things.
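As a rough sketch of that breakdown, the snippet below groups billing line items into the categories above and reports each as a share of the total. The `LineItem` shape, the `CATEGORY_BY_SERVICE` map, and the sample figures are illustrative assumptions, not Cost Explorer's actual export format.

```typescript
// Hypothetical line-item shape; a real billing export needs mapping into it.
interface LineItem {
  service: string;
  usd: number;
}

// Illustrative mapping from service name to the categories used in this post.
const CATEGORY_BY_SERVICE: Record<string, string> = {
  EC2: "Compute",
  Fargate: "Compute",
  Lambda: "Compute",
  "Data Transfer": "Data transfer",
  "NAT Gateway": "Data transfer",
  S3: "Storage",
  EBS: "Storage",
  CloudWatch: "Observability",
  RDS: "Managed services",
  Bedrock: "Inference",
};

// Returns each category's share of the total bill in percent, largest first.
function breakdown(items: LineItem[]): Array<{ category: string; pct: number }> {
  const totals: Record<string, number> = {};
  let total = 0;
  for (const item of items) {
    const category = CATEGORY_BY_SERVICE[item.service] ?? "Everything else";
    totals[category] = (totals[category] ?? 0) + item.usd;
    total += item.usd;
  }
  if (total === 0) return [];
  return Object.keys(totals)
    .map((category) => ({ category, pct: (totals[category] / total) * 100 }))
    .sort((a, b) => b.pct - a.pct);
}

// Made-up sample bill, for illustration only.
const sampleBill: LineItem[] = [
  { service: "EC2", usd: 4000 },
  { service: "RDS", usd: 2000 },
  { service: "CloudWatch", usd: 1500 },
  { service: "NAT Gateway", usd: 1500 },
  { service: "S3", usd: 1000 },
];
console.log(breakdown(sampleBill));
```

Anything that lands in "Everything else" at more than a few percent is usually a sign that the mapping or the tagging needs another pass.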

The 80/20 list — biggest wins in order

1. Right-size your compute

The single most common pattern: instances that were sized in 2023 for a peak that never came back, or for a workload that's now half the original size. The same instance types running at 8–12% CPU utilisation. You're paying for capacity you're not using.

The fix is mechanical:

  • Pull the CPU and memory utilisation graphs for every long-running instance.
  • Anything consistently below 30% CPU on a non-burstable type → consider a smaller instance or a t-family burstable.
  • Anything consistently below 50% memory → consider a memory-light variant.

This single optimisation typically saves 15–30% of the EC2 bill for teams who haven't done it. The fear is "what if traffic spikes?" Auto-scaling exists for exactly that, and the math overwhelmingly favours right-sizing the base and letting the autoscaler handle spikes.
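The thresholds in the list above can be sketched as a small classifier. The `InstanceStats` shape and the recommendation labels are hypothetical, and using a window average is a simplification of "consistently below"; a real pass would also look at peaks.

```typescript
// Hypothetical shape: average utilisation over a lookback window (say 14 days).
interface InstanceStats {
  id: string;
  instanceType: string; // e.g. "m5.2xlarge"
  avgCpuPct: number;
  avgMemPct: number;
}

type Recommendation = "downsize-or-burstable" | "memory-light-variant" | "keep";

// Applies the two rules from the text: <30% CPU on a non-burstable type,
// then <50% memory.
function recommend(s: InstanceStats): Recommendation {
  const family = s.instanceType.split(".")[0];
  const isBurstable = /^t\d/.test(family); // t2, t3, t3a, t4g
  if (s.avgCpuPct < 30 && !isBurstable) return "downsize-or-burstable";
  if (s.avgMemPct < 50) return "memory-light-variant";
  return "keep";
}

// Made-up fleet, for illustration only.
const fleet: InstanceStats[] = [
  { id: "api-1", instanceType: "m5.2xlarge", avgCpuPct: 10, avgMemPct: 70 },
  { id: "cache-1", instanceType: "r5.xlarge", avgCpuPct: 55, avgMemPct: 20 },
  { id: "cron-1", instanceType: "t3.large", avgCpuPct: 12, avgMemPct: 65 },
];
for (const s of fleet) console.log(s.id, recommend(s));
```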

2. Savings Plans and Reserved Instances

If you have predictable baseline compute usage — and you do, by definition, if you're past PMF — you should be on Savings Plans for most of it.

Compute Savings Plans give 20–30% off on-demand pricing in exchange for a 1- or 3-year commitment. They apply across instance families, sizes, and regions, and also cover Fargate and Lambda; it is the EC2 Instance Savings Plans that tie you to one instance family in one region in return for a deeper discount. For most mid-stage teams, a 1-year plan covering 70–80% of baseline usage is the right answer: you pocket the discount on the predictable portion without committing to capacity you might not need.

The mistake we see: founders who heard "Savings Plans are complex" or "you should wait until you understand your workload" and are still on 100% on-demand at Series B. The complexity is real but the math is straightforward. A two-hour analysis and a 30-minute commit produce roughly 25% off the committed portion of the compute bill for the next year.
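To make that math concrete, here is a minimal sketch of the arithmetic. Every input is an illustrative assumption; real discount rates depend on term, payment option, and region.

```typescript
interface PlanInputs {
  monthlyOnDemandUsd: number; // current on-demand compute spend
  baselineShare: number;      // fraction of that spend that is steady baseline (0-1)
  coverage: number;           // fraction of the baseline you commit to (0.7-0.8 in the text)
  discount: number;           // plan discount vs on-demand (0.2-0.3 in the text)
}

// Annual saving: the discount applies only to the committed slice of spend.
function annualSavingsUsd(p: PlanInputs): number {
  const coveredUsd = p.monthlyOnDemandUsd * p.baselineShare * p.coverage;
  return coveredUsd * p.discount * 12;
}

// Saving as a share of the whole compute bill, not just the covered slice.
function effectiveDiscount(p: PlanInputs): number {
  return p.baselineShare * p.coverage * p.discount;
}

// Example inputs: $30k/month compute, 80% baseline, 75% of baseline covered, 25% off.
const example: PlanInputs = {
  monthlyOnDemandUsd: 30000,
  baselineShare: 0.8,
  coverage: 0.75,
  discount: 0.25,
};
console.log(annualSavingsUsd(example), effectiveDiscount(example));
```

With these inputs the commitment covers $18k of monthly spend, which works out to about $54k a year, or about 15% of the total compute bill. The headline discount only applies to the slice you commit.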

3. Kill the unused stuff

The accumulated detritus of a 3-year-old AWS account is enormous:

  • Unattached EBS volumes from terminated instances.
  • Old EBS snapshots from the AMI you built and never used.
  • S3 buckets that should have lifecycle rules moving cold data to Glacier but don't.
  • Long-idle Load Balancers from features that got cut.
  • Elastic IPs not attached to anything.
  • CloudWatch Logs groups with Never Expire retention.

Each item is small. Together, this is reliably 5–15% of the total bill. We've seen teams find a $4,000/month line that turned out to be a forgotten OpenSearch cluster from a 2023 spike that never got terminated.

Run AWS Trusted Advisor alongside a tool like Cloudability or Vantage, or even just a careful script that lists every billable resource and asks "what is this?" Do it formally, twice a year. The accumulated savings pay for the audit many times over.
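The "careful script" can be as simple as the sketch below. It assumes the inventory has already been collected into a hypothetical `Resource` shape (from the relevant AWS list and describe calls, or a cost tool's export); the field names are ours, not an AWS API.

```typescript
// Hypothetical inventory record assembled from whatever source you have.
interface Resource {
  id: string;
  kind: "ebs-volume" | "elastic-ip" | "load-balancer" | "log-group";
  attached?: boolean;            // volumes and Elastic IPs
  requests30d?: number;          // load balancers
  retentionDays?: number | null; // log groups; null means Never Expire
}

interface Finding {
  id: string;
  reason: string;
}

// Flags the categories of detritus listed above.
function findWaste(inventory: Resource[]): Finding[] {
  const findings: Finding[] = [];
  for (const r of inventory) {
    if ((r.kind === "ebs-volume" || r.kind === "elastic-ip") && r.attached === false) {
      findings.push({ id: r.id, reason: "not attached to anything" });
    } else if (r.kind === "load-balancer" && r.requests30d === 0) {
      findings.push({ id: r.id, reason: "no requests in 30 days" });
    } else if (r.kind === "log-group" && r.retentionDays === null) {
      findings.push({ id: r.id, reason: "retention set to Never Expire" });
    }
  }
  return findings;
}

// Made-up inventory, for illustration only.
const inventory: Resource[] = [
  { id: "vol-old", kind: "ebs-volume", attached: false },
  { id: "vol-live", kind: "ebs-volume", attached: true },
  { id: "eip-spare", kind: "elastic-ip", attached: false },
  { id: "alb-legacy", kind: "load-balancer", requests30d: 0 },
  { id: "/app/debug", kind: "log-group", retentionDays: null },
  { id: "/app/audit", kind: "log-group", retentionDays: 365 },
];
console.log(findWaste(inventory));
```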

4. Tame CloudWatch and observability

CloudWatch can be 15% or more of the AWS bill if you're not paying attention. The drivers:

  • Logs: ingestion is usually the expensive part, billed per GB every time a line is written. Storage is cheap per GB but never stops accruing: every application log going to CloudWatch with no retention policy costs money forever.
  • Custom metrics: each unique metric costs more than people think, especially with high cardinality.
  • High-resolution metrics: 1-second resolution can cost several times as much as 60-second resolution once the extra publishing calls and pricier high-resolution alarms are counted.

The optimisations:

  • Set retention on every log group. 14 days is fine for most. Production audit logs may need more, application debug logs need much less.
  • Aggregate high-cardinality metrics before publishing. Per-customer metrics for thousands of customers will quietly destroy your bill.
  • Choose the right resolution. 60-second metrics are sufficient for most use cases.
  • Consider a third-party APM (Datadog, Honeycomb, Better Stack) once your observability volume justifies it — they often work out cheaper than self-managing CloudWatch at scale.

This is often the single biggest "Huh, I had no idea" line item in a cost audit.
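The cardinality problem is easiest to see as arithmetic. CloudWatch bills each unique combination of metric name and dimension values as its own metric, so the sketch below multiplies them out. The flat per-metric price is an assumption for illustration (real pricing is tiered and varies by region), and the count is a worst case in which every combination actually reports.

```typescript
// Assumed flat price; check current CloudWatch pricing for your region and tier.
const ASSUMED_USD_PER_METRIC_MONTH = 0.3;

// Worst-case number of billable series: metric names times every dimension's cardinality.
function metricSeriesCount(metricNames: number, dimensionCardinalities: number[]): number {
  return dimensionCardinalities.reduce((n, c) => n * c, metricNames);
}

function monthlyMetricCostUsd(
  series: number,
  usdPerMetric: number = ASSUMED_USD_PER_METRIC_MONTH,
): number {
  return series * usdPerMetric;
}

// 10 metric names, tagged per customer (2,000) and per endpoint (20).
const perCustomerSeries = metricSeriesCount(10, [2000, 20]); // 400,000 series
// The same 10 metrics aggregated across customers before publishing.
const aggregatedSeries = metricSeriesCount(10, [20]); // 200 series

console.log(monthlyMetricCostUsd(perCustomerSeries), monthlyMetricCostUsd(aggregatedSeries));
```

Dropping one high-cardinality dimension cuts the series count by that dimension's size, which is why aggregating before publishing matters more than any other metric setting.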

5. Inter-AZ and NAT gateway costs

Cross-availability-zone data transfer inside AWS is not free. Two services in different AZs talking to each other pay about $0.01 per GB in each direction, so roughly $0.02 per GB exchanged. For a chatty microservice constellation, this adds up surprisingly fast.

NAT gateway charges are worse. A NAT gateway bills an hourly charge just for existing, plus a per-GB processing charge on every byte that passes through it. Teams that route all egress through a single NAT for "security" without thinking about the cost often pay 5–15% of their bill for the privilege.

Fixes:

  • Pin chatty services to the same AZ where availability tolerance allows. Keep cross-AZ redundancy for the stateful tier (databases), where the cost is justified by reliability.
  • Add VPC gateway endpoints for S3 and DynamoDB so that traffic doesn't go through NAT. Gateway endpoints carry no charge, take about an hour to set up, and save real money.
  • Audit your NAT bill. If it's surprisingly high, find out what's egressing — it's often an analytics SDK, a misconfigured monitoring agent, or a logging shipping pattern.
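A back-of-envelope estimate helps decide which fix to do first. The rates below are assumptions based on typical us-east-1 list prices and should be treated as placeholders; the traffic volumes are made up.

```typescript
// Assumed rates; check current pricing for your region.
const RATES = {
  interAzUsdPerGbEachWay: 0.01,
  natUsdPerHour: 0.045,
  natUsdPerGbProcessed: 0.045,
};

// Cross-AZ traffic is charged on both the sending and the receiving side.
function interAzMonthlyUsd(gbPerMonth: number): number {
  return gbPerMonth * RATES.interAzUsdPerGbEachWay * 2;
}

// Hourly charge per gateway plus processing on every GB through it.
function natMonthlyUsd(gateways: number, gbPerMonth: number, hoursPerMonth: number = 730): number {
  return gateways * hoursPerMonth * RATES.natUsdPerHour + gbPerMonth * RATES.natUsdPerGbProcessed;
}

// Illustrative volumes: 50 TB/month between AZs, 20 TB/month through 3 NAT gateways,
// of which 15 TB is S3 traffic that a gateway endpoint would take off the NAT.
const interAz = interAzMonthlyUsd(50000);
const natBefore = natMonthlyUsd(3, 20000);
const natAfter = natMonthlyUsd(3, 5000);
console.log({ interAz, natBefore, natAfter, endpointSaving: natBefore - natAfter });
```

Under these assumptions the endpoint change alone removes roughly two thirds of the NAT bill, which is why it tends to come first.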

6. RDS — the biggest opportunity nobody touches

RDS bills are often the largest single line item. The optimisations:

  • Right-size the instance. Same logic as EC2. Most prod databases run at 15–30% CPU; they could be on smaller instances or a smaller IOPS tier.
  • Use Reserved Instances. The discount is similar to compute savings plans and the math is identical.
  • Move dev/staging to smaller instances that scale down when not in use.
  • Aurora I/O-Optimized vs Standard — for IO-heavy workloads, the Optimized tier is cheaper net of IO charges.
  • Storage right-sizing. Storage that grows automatically often grows past what you need.
  • Backup retention — every snapshot has a cost. Keep what's required for RPO; expire the rest.

A focused week on RDS often takes 25–40% off the RDS bill.
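For the dev/staging item, the saving from running only during working hours is simple arithmetic. The sketch below covers instance-hours only: a stopped RDS instance still bills for storage and backups, and RDS restarts a stopped instance automatically after seven days, so a real schedule needs to re-stop it. The schedule and price are illustrative assumptions.

```typescript
const HOURS_PER_WEEK = 168;

// Fraction of instance-hours still paid for under a fixed weekly schedule.
function scheduledCostFraction(hoursPerDay: number, daysPerWeek: number): number {
  return (hoursPerDay * daysPerWeek) / HOURS_PER_WEEK;
}

// Monthly saving on the instance-hour portion of the bill.
function monthlyInstanceSavingsUsd(
  instanceUsdPerMonth: number,
  hoursPerDay: number,
  daysPerWeek: number,
): number {
  return instanceUsdPerMonth * (1 - scheduledCostFraction(hoursPerDay, daysPerWeek));
}

// A staging database costing $1,000/month in instance-hours, run 12 hours a day on weekdays.
console.log(scheduledCostFraction(12, 5), monthlyInstanceSavingsUsd(1000, 12, 5));
```

Twelve hours on weekdays is 60 of 168 hours, so roughly 64% of the instance-hour cost disappears before any right-sizing.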

7. Inference costs — the new one

Inference is the cost line that didn't exist for most companies in 2022 and is now 5–25% of the bill. The optimisations:

  • Use prompt caching. Most providers now offer it. For RAG-heavy workloads, this is a 40–70% saving on the inference line.
  • Route by complexity. Simple tasks shouldn't go to the most expensive model. A small router that picks between Haiku/Mini/Flash for easy work and the larger models for hard work is a 30–50% win.
  • Cache aggressively at the application layer. Don't call the model twice for the same input.
  • Batch where possible. Batch APIs from providers can be 50% cheaper than realtime.
  • Watch the embedding bill. Embedding all of S3 to "build a vector store" before knowing what you'll search is a classic premature-cost.

This is the line item with the fastest cost growth on most bills, and the one where patterns adopted in 2025 leave the most room for optimisation in 2026.

lib/llm-router.ts
```typescript
// Two-tier model router. ~70% of traffic hits the cheap model. The eval
// suite gates promotion of new cheap-model versions so a regression
// doesn't quietly degrade quality.
import Anthropic from "@anthropic-ai/sdk";

const CHEAP = "claude-haiku-4-5";
const SMART = "claude-sonnet-4-6";

// One shared client; it reads ANTHROPIC_API_KEY from the environment.
const client = new Anthropic();

export async function complete(prompt: string, opts: { complex?: boolean } = {}) {
  // Escalate when the caller flags the task as complex or the prompt is long.
  const model = opts.complex || prompt.length > 8000 ? SMART : CHEAP;
  return client.messages.create({
    model,
    max_tokens: 1024,
    messages: [{ role: "user", content: prompt }],
  });
}
```
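The application-layer caching point can be sketched as a wrapper around any prompt-to-completion function. This is a minimal in-memory version keyed on the exact prompt string; `withCache` and `callModel` are hypothetical names, and a production setup would more likely hash the key and use a shared store with expiry.

```typescript
// Wraps a prompt -> completion function so identical prompts hit the model once.
// `callModel` stands in for a real client call.
function withCache<T>(callModel: (prompt: string) => Promise<T>): (prompt: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();
  return (prompt: string): Promise<T> => {
    let hit = cache.get(prompt);
    if (!hit) {
      hit = callModel(prompt);
      // Drop failed calls so a transient error is not cached forever.
      hit.catch(() => cache.delete(prompt));
      cache.set(prompt, hit);
    }
    return hit;
  };
}

// Usage with a fake model that just counts how often it is called.
let modelCalls = 0;
const fakeModel = async (prompt: string): Promise<string> => {
  modelCalls += 1;
  return prompt.toUpperCase();
};
const cachedModel = withCache(fakeModel);
```

Caching the promise rather than the resolved value means two concurrent requests for the same prompt share one in-flight call instead of both paying for it.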

8. S3 — quiet, accumulating

S3 costs creep. Every uploaded file lives forever unless you tell it not to. Two optimisations cover most of it:

  • Intelligent-Tiering. S3 will automatically move objects to cheaper tiers based on access patterns. For most bucket use cases this is a 10–30% saving with no operational change, in exchange for a small per-object monitoring fee.
  • Lifecycle rules. Move old objects to Glacier or delete them. Set the rule once, save forever.

Avoid the trap of "let's save everything in case we need it later." Storage costs compound. The cost of re-creating data when you actually need it is usually less than the cost of storing it for years.
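A lifecycle rule is a small piece of configuration. The sketch below types out one rule in the shape of an S3 lifecycle configuration document, with a sanity check; the rule ID, the `exports/` prefix, and the day counts are illustrative assumptions, and the storage class list is deliberately partial.

```typescript
// Subset of the S3 lifecycle configuration shape, enough for one rule.
interface LifecycleRule {
  ID: string;
  Status: "Enabled" | "Disabled";
  Filter: { Prefix: string };
  Transitions?: Array<{
    Days: number;
    StorageClass: "STANDARD_IA" | "INTELLIGENT_TIERING" | "GLACIER" | "DEEP_ARCHIVE";
  }>;
  Expiration?: { Days: number };
}

// Hypothetical rule: archive exports after 90 days, delete them after a year.
const lifecycle: { Rules: LifecycleRule[] } = {
  Rules: [
    {
      ID: "archive-then-expire-exports",
      Status: "Enabled",
      Filter: { Prefix: "exports/" },
      Transitions: [{ Days: 90, StorageClass: "GLACIER" }],
      Expiration: { Days: 365 },
    },
  ],
};

// Sanity check: every transition must happen before the object expires.
function transitionsBeforeExpiry(rule: LifecycleRule): boolean {
  if (!rule.Expiration || !rule.Transitions) return true;
  const expiryDays = rule.Expiration.Days;
  return rule.Transitions.every((t) => t.Days < expiryDays);
}

console.log(lifecycle.Rules.every(transitionsBeforeExpiry));
```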

"We spent a week on cost optimisation and got 38% off the bill. The first three days were finding things we'd forgotten existed. The last two days were savings plans and CloudWatch retention. There was no clever architecture work, just the boring stuff we'd been deferring."

Head of Infrastructure, Series C SaaS, $80k/mo AWS

The audit cadence that works

Cost optimisation is a recurring practice, not a one-time project.

  • Weekly: A 15-minute review of the AWS bill change vs the previous week. Anomalies caught here are cheap to fix.
  • Monthly: A 1-hour review of the top 10 cost drivers by service. Trends are visible here; structural changes get planned.
  • Quarterly: A formal cost optimisation review. Right-sizing, savings plans, unused resource clean-up, observability audit.
  • Annually: Architecture-level review. Are the choices we made when we were small still right at this scale?

Teams that run this cadence don't have cost crises. Teams that skip it have one every quarter.
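The weekly review can start from something as small as the sketch below, which compares per-service cost week over week and flags the jumps. The thresholds (20% and $50) and the sample figures are illustrative assumptions to tune against your own bill.

```typescript
type CostByService = Record<string, number>;

interface Anomaly {
  service: string;
  deltaUsd: number;
  deltaPct: number;
}

// Flags services whose weekly cost rose by more than `thresholdPct` and at least `minUsd`.
// A service that did not exist last week counts as an infinite percentage increase.
function weeklyAnomalies(
  prev: CostByService,
  curr: CostByService,
  thresholdPct: number = 20,
  minUsd: number = 50,
): Anomaly[] {
  const flagged: Anomaly[] = [];
  for (const service of Object.keys(curr)) {
    const before = prev[service] ?? 0;
    const deltaUsd = curr[service] - before;
    const deltaPct = before > 0 ? (deltaUsd / before) * 100 : Infinity;
    if (deltaUsd >= minUsd && deltaPct > thresholdPct) {
      flagged.push({ service, deltaUsd, deltaPct });
    }
  }
  return flagged.sort((a, b) => b.deltaUsd - a.deltaUsd);
}

// Made-up weekly totals, for illustration only.
const lastWeek: CostByService = { EC2: 1000, CloudWatch: 200, S3: 100 };
const thisWeek: CostByService = { EC2: 1050, CloudWatch: 400, S3: 110, OpenSearch: 300 };
console.log(weeklyAnomalies(lastWeek, thisWeek));
```

The minimum-dollar floor keeps the review at 15 minutes: a 300% jump on a $2 line item is not worth anyone's attention.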

When to bring in help

You probably don't need a FinOps consultant if your bill is under $20k/month — the optimisations above are tractable in-house. Once you're past $50k/month and the team isn't actively working on cost, a focused outside engagement usually pays for itself in 30–60 days. The patterns we surface are predictable, and a senior engineer who has done this five times will find the same things in a week that the in-house team would find in a quarter.

The signal that you need help: nobody on the team can confidently answer "what's the biggest cost driver and what's the trend." That means the data isn't being looked at.

Frequently asked questions

Should we move off AWS to save money?

Usually no. The saving from switching providers is rarely as large as the migration cost, and operational dependencies on AWS services accumulate quickly. Stay and optimise. Consider a multi-cloud strategy only if you have specific reasons (latency, regulation, customer contractual requirements).

Is the hyperscaler's startup credit programme worth it?

Yes, take it. Just don't let it become an excuse to defer cost discipline. Teams that build with discipline during the credit period pay for the post-credit transition in a single quarter. Teams that don't are looking at a 3-month margin shock.

Spot instances — yes or no?

Yes, for workloads that tolerate interruption (batch jobs, async work, fault-tolerant services with quick recovery). The savings are 60–90% off on-demand. Not for stateful single-instance services without checkpointing.

What about Kubernetes — does it help or hurt cost?

Hurts more often than helps for small-to-mid teams. The operational complexity exceeds the savings. Beyond about 20 services or 30 engineers, the cost-of-coordination math flips and k8s starts to make sense. Below that, ECS/Fargate or App Runner is cheaper.


Closing thought

The AWS bill is one of the few line items where a couple of focused weeks of work can produce a permanent 25–40% reduction. It's also one of the few line items where the "we'll get to it" answer compounds against you — every month you defer is a month of overspend that doesn't come back.

If you want a fixed-scope cloud cost audit aimed at startups under $200k/month AWS spend, we offer a one-week engagement that returns a prioritised list of cuts with implementation effort and savings estimates. Most of our audits find more than 25%; most are paid back in under 90 days.
