How to Migrate from Datadog APM to OpenTelemetry in One Afternoon
A pragmatic four-hour migration playbook from Datadog APM to OpenTelemetry-native observability — including the gotchas nobody warns you about.
Most "migration guides" for observability are 47 pages of marketing dressed as documentation. This one assumes you have a Node application using the dd-trace package, you want to swap it for OpenTelemetry, and you have an afternoon. Here's the order of operations.
If you'd rather see a comparison of where the migrated traces could land, the Datadog alternative comparison covers the destination options.
Pre-flight (10 minutes)
Open three browser tabs:
- Your current Datadog APM dashboard for one service. Note the top 5 charts your team actually uses.
- Your application's `package.json`. Confirm `dd-trace` is in `dependencies`, not `devDependencies`.
- Your service's start command. Find where the agent is initialized — usually `import 'dd-trace/init.js'` at the top of `app.js`, or via a `NODE_OPTIONS='-r dd-trace/init'` env var.
If you're in containers, find the entrypoint command in your Dockerfile or Helm chart. Same logic applies.
Step 1 — install the OTel SDK (5 minutes)
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-http
Or, if you want auto-instrumentation plus the SecureNow firewall in one package:
npm install securenow
(This is the one I'll use for the rest of the post because it collapses six dependencies into one. Pick whichever you prefer.)
Step 2 — replace the start command (2 minutes)
Find the line where dd-trace is loaded. There are usually two places it can be:
// Option A: explicit import at the top of app.js
import 'dd-trace/init.js';
// Option B: NODE_OPTIONS env var
NODE_OPTIONS='-r dd-trace/init' node app.js
Replace whichever you have with:
node -r securenow/register app.js
# or:
node -r @opentelemetry/auto-instrumentations-node/register app.js
That's the actual swap. Restart the service. You're now emitting OpenTelemetry spans instead of Datadog's proprietary format.
Step 3 — point at a destination (5 minutes)
The default OTLP endpoint is http://localhost:4318 (the OpenTelemetry collector). If you don't have one running, set:
SECURENOW_API_KEY=snk_live_... # if using SecureNow
# or:
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-otlp-endpoint
The first option (SecureNow) is the simplest because it's a managed destination with a free tier. The second is what you'd do if you're standing up SigNoz, Grafana Tempo, or any self-hosted OTLP backend.
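If you'd rather wire the exporter in code than lean on env vars alone, here's a minimal sketch (the file name `instrumentation.cjs` and the service name are placeholders; preload it with `node -r ./instrumentation.cjs app.js`):

```js
// instrumentation.cjs — programmatic alternative to the register preload.
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

const sdk = new NodeSDK({
  serviceName: 'checkout-api', // placeholder: set your service's name
  // Honors OTEL_EXPORTER_OTLP_ENDPOINT; otherwise defaults to http://localhost:4318
  traceExporter: new OTLPTraceExporter(),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```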
If you want to keep Datadog as the destination during the migration (so you don't have to rebuild dashboards immediately), Datadog accepts OTLP at:
OTEL_EXPORTER_OTLP_ENDPOINT=https://trace.agent.<region>.datadoghq.com
OTEL_EXPORTER_OTLP_HEADERS=DD-API-KEY=<your-datadog-api-key>
This is the path most teams take: data layer → OpenTelemetry; destination → still Datadog (for now).
Step 4 — verify (15 minutes)
Hit the service with a few requests. In your destination dashboard, look for:
- A trace with the right `service.name` (matches your app)
- Spans for HTTP requests, with the right path and status code
- Database spans, if applicable
- Error events on traces that 500'd
If something is missing, the most common cause is that the SDK loaded after your framework started. The -r flag is critical because it loads the OTel SDK before any of your require() calls patch the framework. If you skip the -r flag and try to import OTel at the top of app.js, you'll see partial coverage.
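To see why ordering matters, here's a sketch of the failure mode, using the hypothetical `instrumentation.cjs` from Step 3:

```js
// app.js — the SDK cannot patch modules that were loaded before it started.
const express = require('express');   // express is loaded and cached here...
require('./instrumentation.cjs');      // ...so starting the SDK afterwards patches nothing
// Preloading avoids this, because the SDK runs before any app require():
//   node -r ./instrumentation.cjs app.js
```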
Step 5 — port custom traces (30 minutes)
If you have hand-written dd-trace API calls, they need to be rewritten to OpenTelemetry's API. The good news: the concepts map 1:1. The bad news: the symbol names differ.
| Datadog | OpenTelemetry |
|---|---|
| `tracer.startSpan('op')` | `tracer.startSpan('op')` |
| `span.setTag('key', 'value')` | `span.setAttribute('key', 'value')` |
| `span.finish()` | `span.end()` |
| `tracer.scope().activate(span, fn)` | `context.with(trace.setSpan(context.active(), span), fn)` |
Search the codebase for dd-trace imports and tracer. references. Replace with the OTel equivalents. A team with 30 hand-written spans typically finishes this in 30–45 minutes.
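As a concrete sketch, a hand-ported span might look like this (the span name, attribute, and function are illustrative, not from your codebase):

```js
const { trace, context, SpanStatusCode } = require('@opentelemetry/api');

const tracer = trace.getTracer('billing'); // instrumentation scope name: your choice

// Previously: const span = tracer.startSpan('charge.process'); span.setTag(...); span.finish();
function processCharge(orderId) {
  const span = tracer.startSpan('charge.process');
  return context.with(trace.setSpan(context.active(), span), () => {
    try {
      span.setAttribute('order.id', orderId);
      // ... existing business logic ...
      return true;
    } catch (err) {
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```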
Step 6 — port the dashboards (90 minutes)
This is the biggest chunk and it's mostly clicking, not engineering. List the 5–10 dashboards your team actually uses (be honest — the count is always lower than people think). For each:
- Latency histograms → query the same data in your new tool's chart language
- Error rate → either a status-code histogram or an exception count
- Throughput → request count per minute
- Top endpoints → group by `http.target` (OTel's standard attribute name)
If you're moving to PromQL-based tools (Grafana, Mimir, SigNoz), most of the queries are one-line PromQL. If you're moving to a SQL-based tool (SecureNow, ClickHouse-backed), the queries are SQL on the spans table.
Step 7 — port the alerts (45 minutes)
Same logic as dashboards. The actual alert rules (P99 > 500ms for 5 minutes, error rate > 1%, etc.) should be identical — only the query syntax changes. For most teams, alerts are 10–20 rules; budget two minutes per rule for finding the right metric and setting thresholds.
Step 8 — flip the agent off (5 minutes)
Once the new destination has 24–48 hours of data and you're confident, disable the Datadog agent's tracer:
DD_APM_ENABLED=false
Or remove the agent install entirely if you also want logs and infrastructure metrics off Datadog. The agent will keep running for log forwarding until you replace that too — which is a separate migration with its own checklist.
What goes wrong (and how to fix it)
"Spans missing entirely." The -r flag loaded after your framework. Confirm your start command actually has -r securenow/register or -r @opentelemetry/auto-instrumentations-node/register before the script name.
"Spans are there but missing attributes." OTel's auto-instrumentation respects framework conventions; Datadog's was more aggressive about attaching custom attributes. If a specific attribute is missing, you can add it in a span processor or set it manually on the active span.
"Errors aren't linked to traces." OTel's exception recording uses span.recordException(err). If you have global error handlers (Express's error middleware, etc.), OTel auto-instrumentation patches them, but custom error pipelines may need an explicit call.
"Trace propagation broken between services." OpenTelemetry uses W3C traceparent headers by default; Datadog uses x-datadog-trace-id. If you have two services where one is OTel and one is still on Datadog, configure both to support both headers — most SDKs allow this with a propagators config flag.
The afternoon's actual time budget
Realistic per-service: 30–60 minutes for the SDK swap and verification. Multiply by the number of services. Add 90 minutes for dashboards (one-time, not per-service). Add 45 minutes for alerts. Add a coffee break.
For a 5-service stack: ~5 × 45 minutes = 3.75 hours of focused work, plus 90 minutes of dashboard/alert work, plus parallel verification = ~6 hours. One engineer, one afternoon.
For a 50-service stack: do it 5 services at a time over a sprint. Don't do them all at once — the parallel verification is what makes it safe.
After the migration
You're now OpenTelemetry-native. The destination question — Datadog, SecureNow, SigNoz, Grafana Cloud — is now a separate decision, and changing it is a config-flag change rather than a rewrite. For most teams the second-order benefit (negotiation leverage at renewal) pays for the migration on its own.
Frequently Asked Questions
How long does a Datadog-to-OTel migration actually take?
For a single Node service: 30 minutes to 2 hours including verification. For a 10-service stack with shared instrumentation: roughly an afternoon. The time-consuming part is dashboard recreation, not the SDK swap.
Can I keep my Datadog dashboards?
Not directly. Datadog's query language doesn't translate to OTLP or PromQL. Plan to rebuild the 5–10 dashboards your team actually uses; ignore the rest. Most teams find this is the biggest single chunk of the migration.
What about my custom Datadog metrics?
OpenTelemetry has a metrics API; the migration is one-to-one for counters, gauges, and histograms. The catch is that you need to also migrate your alerting rules to whatever destination understands the new metric names.
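For instance, a custom counter might port like this (metric and attribute names are illustrative, and the SDK's meter provider must already be registered):

```js
const { metrics } = require('@opentelemetry/api');

const meter = metrics.getMeter('checkout');
const ordersPlaced = meter.createCounter('orders.placed', {
  description: 'Orders placed, previously a DogStatsD increment',
});

ordersPlaced.add(1, { region: 'us-east-1' });
```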
Will I lose data during the cutover?
Not if you run both in parallel for 48 hours. Standard practice: install OTel alongside the Datadog agent, send traces to both, verify, then disable the agent. Zero-downtime is achievable for any team that's careful.