Tracking Customer Cost-of-Serve from Your Trace Data
If you can't see which customer is consuming 40% of your CPU, you can't price your enterprise tier. Here's how to derive cost-of-serve per customer from OpenTelemetry traces.
If you sell enterprise tiers, you've had this conversation: a customer wants a 30% discount, your CFO asks what that does to gross margin, and nobody actually knows because per-customer cost is a guess. The data to answer this is in your traces. You just have to compute it.
For broader SaaS observability context, see the SaaS observability page and the per-tenant SLOs guide.
The simplest cost model
Start with a single approximation: the cost to serve a customer is proportional to the time your servers spend handling their requests. If a customer accounts for 5% of total request-seconds, they account for roughly 5% of your variable infrastructure cost.
The formula:
customer_cost_share = sum(duration of all spans for this tenant) / sum(duration of all spans across all tenants)
customer_monthly_cost = customer_cost_share × monthly_infrastructure_cost
For a SaaS with $20K/month in cloud infrastructure and a customer accounting for 8% of request-seconds, their cost-of-serve is roughly $1,600/month. If their MRR is $1,500, that customer is unprofitable.
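The arithmetic above can be sketched in a few lines of Python. The figures are the hypothetical ones from the example (4M of 50M request-seconds is the 8% share):

```python
# Hypothetical figures matching the example above.
monthly_infra_cost = 20_000         # total cloud spend, $/month
tenant_request_seconds = 4_000_000  # this tenant's summed span durations (8%)
total_request_seconds = 50_000_000  # all tenants combined

cost_share = tenant_request_seconds / total_request_seconds  # 0.08
monthly_cost = cost_share * monthly_infra_cost               # ~$1,600

mrr = 1_500
print(f"cost-of-serve ${monthly_cost:,.0f}, margin ${mrr - monthly_cost:,.0f}")
```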
The query
Assuming you've tagged spans with tenant.id (see per-tenant SLOs):
SELECT
span_attributes['tenant.id'] AS tenant,
sum(duration_ns) / 1e9 AS total_seconds,
count() AS request_count,
sum(duration_ns) / 1e9 / 86400 / 30 AS avg_concurrent_load -- busy seconds / seconds in 30 days
FROM otel_traces
WHERE
span_kind = 'SERVER' AND
timestamp > now() - INTERVAL 30 DAY
GROUP BY tenant
ORDER BY total_seconds DESC;
This gives you the raw per-tenant compute time. Multiply by your cost-per-second to get dollars.
To compute cost-per-second:
cost_per_second = monthly_infrastructure_cost / (total_request_seconds_across_all_tenants)
For a service running 24/7 on $5K/month of compute serving 50M request-seconds/month, that's $0.0001 per request-second. A customer with 5M request-seconds costs you $500/month in compute alone.
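As a sanity check on those numbers, a minimal Python sketch using the same hypothetical $5K/month and 50M request-seconds:

```python
# Hypothetical figures from the example above.
monthly_compute_cost = 5_000        # $/month of compute
total_request_seconds = 50_000_000  # summed SERVER span seconds, all tenants

cost_per_second = monthly_compute_cost / total_request_seconds  # $0.0001

tenant_seconds = 5_000_000          # one heavy tenant
print(f"${tenant_seconds * cost_per_second:,.0f}/month")  # $500/month
```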
Refinements
The simple model is good enough for first-pass pricing decisions. Three refinements make it more accurate:
Database query cost. Traces include database span durations. Database compute is typically 30–50% of total backend cost; weighting database span time at 1.5× general server time captures this.
Outbound API cost. If you call paid third-party APIs (Stripe, OpenAI, Twilio), those have per-request costs. Tag outbound spans with app.api.cost_usd and sum per tenant.
Storage by tenant. Traces don't include this directly; query your database tables grouped by tenant to get bytes-per-tenant. Most SaaS data costs ~$0.10/GB/month at scale.
A more complete formula:
total_cost = (compute_seconds × $/s) +
(db_seconds × 1.5 × $/s) +
sum(api_costs) +
(gb_stored × $/gb)
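The complete formula translates directly into a function. This is a sketch with the illustrative rates used throughout this post ($0.0001/s, 1.5× DB weight, $0.10/GB); the example tenant's numbers are made up:

```python
def tenant_cost(compute_s, db_s, api_costs, gb_stored,
                usd_per_s=0.0001, db_weight=1.5, usd_per_gb=0.10):
    """Cost-of-serve combining the four components above.

    Rates are the illustrative figures from the article, not universal
    constants -- substitute your own blended costs.
    """
    return (compute_s * usd_per_s           # general server time
            + db_s * db_weight * usd_per_s  # DB time, weighted 1.5x
            + sum(api_costs)                # per-request third-party fees
            + gb_stored * usd_per_gb)       # storage

# Hypothetical tenant: 2M compute-seconds, 600K DB-seconds,
# $40 of third-party API calls, 120 GB stored.
print(round(tenant_cost(2_000_000, 600_000, [25.0, 15.0], 120), 2))  # 342.0
```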
Per-customer profitability dashboard
Joined with your billing data:
SELECT
t.tenant,
t.total_cost,
b.monthly_revenue,
b.monthly_revenue - t.total_cost AS gross_margin,
(b.monthly_revenue - t.total_cost) / b.monthly_revenue AS margin_pct
FROM (
SELECT
span_attributes['tenant.id'] AS tenant,
sum(duration_ns) / 1e9 * 0.0001 AS compute_cost,
-- ... add other cost components
compute_cost AS total_cost -- reusing a SELECT alias works in ClickHouse; other dialects need a subquery
FROM otel_traces
WHERE timestamp > now() - INTERVAL 30 DAY
GROUP BY tenant
) t
JOIN billing_data b ON t.tenant = b.tenant_id
ORDER BY margin_pct ASC;
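If you'd rather prototype the join in code before wiring up the dashboard, the same ranking is a few lines of Python. Tenant names and figures here are hypothetical:

```python
# Hypothetical per-tenant figures: trace-derived cost vs. billed MRR.
costs = {"acme": 1600.0, "globex": 90.0, "initech": 410.0}
revenue = {"acme": 1500.0, "globex": 299.0, "initech": 399.0}

rows = sorted(
    ((t, revenue[t] - costs[t], (revenue[t] - costs[t]) / revenue[t])
     for t in costs),
    key=lambda r: r[2],  # worst margin first, like the ORDER BY above
)
for tenant, margin, pct in rows:
    print(f"{tenant:8} margin ${margin:8.2f}  ({pct:6.1%})")
```

Negative-margin tenants sort to the top, which is exactly the "problem children" list described below.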
The customers at the bottom of this list are your problem children. Sometimes they're early-stage customers ramping up usage; sometimes they're plan-mismatched (using enterprise features on a starter plan); sometimes they're outright unprofitable and should be repriced.
What to do with the data
Three concrete uses:
Plan repricing. If your $99/month plan customers are costing you $130/month to serve, the plan is wrong. Either raise the price or move the high-cost customers to a higher tier. Trace data tells you which customers, not just that the plan is bad.
Enterprise contract sizing. When negotiating a custom enterprise deal, you can quote a price floor based on actual cost-of-serve plus a margin target. Beats guessing.
Churn risk identification. Counterintuitively, the customers most likely to churn are often the ones costing you the most — they're using the product hard and any pricing change disproportionately affects them. Cross-reference cost-of-serve with usage frequency and support ticket volume to find at-risk accounts.
The vendor-pricing parallel
The same logic applies in reverse to your APM/observability bill. If your APM vendor charges per-host but only 3 hosts are doing 80% of the actual work, you're overpaying. Per-host pricing is a rough approximation of actual cost-of-serve from the vendor's perspective; usage-based pricing matches it better.
This is one reason SecureNow and similar tools use $/TB scanned: it scales with your real usage, not with infrastructure that may or may not be doing useful work.
The honest limitation
This methodology assumes your cost is proportional to compute time. For most SaaS that's roughly true; for some it isn't.
- Storage-heavy SaaS (file hosting, video) — compute time is misleading, you need to track bytes-stored per tenant.
- AI/ML-heavy SaaS — GPU time per inference dominates and isn't visible in regular traces. Tag inference spans separately.
- Bandwidth-heavy SaaS (CDN, video) — egress cost dominates. Tag spans with response byte counts and weight accordingly.
For the typical web app SaaS, compute time is the right proxy and the simple model works. For specialized cases, refine the cost model with the dominant axis.
Setup time
If you already have OpenTelemetry traces with tenant.id attribution, this is one query and one dashboard panel — about 30 minutes. If you don't have tenant ID tagging yet, that's the prerequisite (instructions) and adds 1–2 hours.
The first time you see your customer profitability sorted by margin, three numbers will surprise you. That's the point.
Frequently Asked Questions
What's customer cost-of-serve?
The cost your business incurs to serve one specific customer — server time, database load, third-party API calls, storage, support burden. Used for pricing decisions, churn analysis, and identifying which enterprise deals are actually profitable.
Why use trace data instead of cloud billing tags?
Cloud billing tags work for shared infrastructure but miss the granular per-request CPU/memory consumption that traces capture directly. Combine them — cloud billing for fixed costs, traces for variable consumption.
What's the simplest cost model?
Total request count × average request duration, weighted by some cost-per-second factor (your blended infrastructure cost / total request seconds). Refine from there.
How accurate is this?
Within 10–20% of true cost for most SaaS. Refine by adding database query cost, outbound API cost, and storage by tenant. The first iteration is good enough to drive pricing decisions.
Recommended reading
- If your team uses Sentry for frontend errors and needs backend distributed tracing without doubling the Sentry bill, here's the OpenTelemetry path that doesn't make you choose.
- Five approaches to bot blocking in Express, ranked by effort vs. effectiveness. From a 5-line allowlist to a full IP-reputation firewall — all without Cloudflare, AWS WAF, or any new infrastructure.
- Fastify hooks (onRequest) and the SecureNow preload both work cleanly. Here's the production setup for IP blocking and user-agent filtering.