Why Your Sales Dropped on Tuesday: How We Built RCA That Actually Diagnoses
How Clevrr's RCA pipeline diagnoses what moved a D2C business this week, in the time it takes to read an email.

The Monday-morning question
Every D2C founder has lived through some version of this Monday morning.
Sales were down 18% over the weekend. Meta spend went up. Google held steady. Order volume looks about the same on Saturday but collapsed on Sunday. AOV is suspiciously flat. The growth lead pings the analyst. The analyst opens six dashboards, exports two CSVs, and starts the slog.
Four hours later, the answer comes back: “It looks like a mix of higher discounts on Meta-attributed orders and a spike in RTO from Tier-2 cities, plus we lost the Bangalore Sale code that was running last week.”
That answer is correct. It’s also four hours too late, and it’s still missing the part where someone has to decide what to do about it.
This is the problem Clevrr’s RCA pipeline was built to solve — not “summarize my dashboard,” but “tell me what actually moved my business yesterday, the way a good analyst would, in the time it takes to read an email.”
Why “ask the LLM” doesn’t work
The obvious first instinct is to throw the whole metric set at an LLM and ask:
“Why did sales drop?”
We tried it. It’s terrible. Three reasons:
The LLM doesn’t know what to look at first. Without structure, it picks whatever looks most dramatic. A 250% spike on a campaign with ₹400 of spend reads as a bigger deal than a 6% dip on a campaign burning ₹3 lakh a day. Anyone who has run a D2C business knows that’s backwards.
It hallucinates causal chains. Sales dropped, ad spend was high, so the LLM “concludes” ad fatigue. Sometimes that’s right. Sometimes the real cause was a discount code expiring, and ad spend is incidental. Without a fixed traversal, you can’t tell which it is.
You can’t audit it. If the same prompt produces different reasoning on different days, you can’t trust it, and you can’t improve it.
The lesson we kept relearning: the LLM is a great analyst, but only if you hand it the right slice of data at the right moment. Topology is our job. Judgment is the model’s.
The shape of an analyst’s brain
If you watch a senior D2C analyst do RCA, they don’t open a fresh notebook. They follow a tree they’ve internalised over hundreds of investigations.
Net Sales is off. Is it orders, AOV, discount, or refunds? Orders. Is it paid or organic? Paid. Which platform? Meta. Is it spend, reach, click-through, or conversion? CTR is fine, CVR collapsed. Which campaigns? These two, both retargeting. What changed there?
It’s always the same tree, walked top-down, narrowing at each step. The branches are stable. The data underneath them changes daily.
So we hardcoded the tree.
Net Sales (root)
├── Orders
│   ├── Paid Sources
│   │   ├── Meta → full funnel (Spend → Impr → CPM → CTR → Clicks → CVR → CPC → LPV → A2C → CI → Purchases) → top campaigns
│   │   └── Google → same funnel → top campaigns
│   ├── Unpaid Sources (Organic, Social, Direct, Other)
│   └── New vs Returning Customers
├── AOV
│   ├── AOV Bands (<₹500, ₹500-1k, ₹1k-2k, ...)
│   └── Collections
├── Discount
│   ├── Discount Codes
│   └── Discount by Ad Source
└── Fulfillment Loss
    ├── Cancelled Orders → reasons, products
    ├── RTO → by city
    └── Returns → products, payment type
About 150 nodes once you expand it. Each node is a metric or dimension, with a current value, a previous-period value, and a delta. The tree is the same for every brand, every day. What changes is the data inside it.
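To make “a current value, a previous-period value, and a delta” concrete, here is a minimal sketch of a node, assuming a Python dataclass. The field names (`name`, `unit`, `current`, `previous`, `children`) and the trimmed-down skeleton are our illustrative choices, not Clevrr’s actual schema. The point it shows is the split: the `children` lists are fixed when the skeleton is stamped out, and only the values change from run to run.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One node of the RCA tree (hypothetical schema for illustration)."""
    name: str
    unit: str = "count"                 # e.g. "currency", "count", "ratio"
    current: Optional[float] = None     # value for the current period
    previous: Optional[float] = None    # value for the comparison period
    children: list[Node] = field(default_factory=list)

    @property
    def delta(self) -> Optional[float]:
        """Absolute change between periods, or None if either value is missing."""
        if self.current is None or self.previous is None:
            return None
        return self.current - self.previous

    @property
    def delta_pct(self) -> Optional[float]:
        """Relative change in percent; None when undefined (missing values or zero base)."""
        if self.delta is None or not self.previous:
            return None
        return 100.0 * self.delta / self.previous

    def walk(self):
        """Yield this node and all its descendants depth-first (pre-order)."""
        yield self
        for child in self.children:
            yield from child.walk()


def build_skeleton() -> Node:
    """Stamp out a (heavily trimmed) fixed topology; values are filled in later."""
    return Node("Net Sales", unit="currency", children=[
        Node("Orders", children=[Node("Paid Orders"), Node("Unpaid Orders")]),
        Node("AOV", unit="currency"),
        Node("Discount", unit="currency"),
        Node("Fulfillment Loss", unit="currency"),
    ])
```

Every run starts from the same `build_skeleton()` output, so two runs can only differ in their numbers, never in which questions got asked.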
This is the single most important design choice in the whole system. It sounds boring. It’s the reason the rest works.
Step 1: Build the tree, fill it in
The tree builder does this. Given a brand and a date range, it:
Stamps out the fixed topology — the branches and their leaves are the same every run.
Walks every node that has a metric attached and fetches the current and previous period values from the warehouse via our shared metric engine, the same path the dashboard uses, so numbers in the report match numbers on screen.
For nodes that are dimensions rather than metrics (AOV bands, discount codes, RTO cities, top campaigns), it runs purpose-built breakdown queries that group orders, shipments, or ad rows by the relevant key.
Computes the delta and a parent-delta contribution for each child: “this child explains 34% of its parent’s movement.” That single number is what lets you say “the AOV drop is mostly the 500-1k band shrinking,” instead of just “AOV dropped.”
Reconciles parent and child numbers. The Orders KPI and the sum of Paid + Unpaid Orders have to agree, or the analyst (and the LLM) will lose trust on the first read. So the core sales KPIs — net sales, gross sales, orders, discounts, AOV — all come from one canonical summary query, the same one the children’s breakdowns sit on, instead of two different aggregation paths that can drift apart.
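The contribution share and the reconciliation check fit in a few lines. This sketch assumes an additive parent, like Orders = Paid Orders + Unpaid Orders, where a child’s share is its delta divided by the parent’s delta; ratio metrics such as AOV need a different decomposition, so treat this as the additive case only. The function names, the tolerance, and the paid/unpaid split below are hypothetical; only the 1,245 → 980 Orders figures come from this post.

```python
from __future__ import annotations

from typing import Optional


def contribution_share(child_delta: float, parent_delta: float) -> Optional[float]:
    """Fraction of the parent's movement explained by one child (additive parents only).

    Returns None when the parent did not move, since the share is undefined.
    """
    if parent_delta == 0:
        return None
    return child_delta / parent_delta


def reconciles(parent_value: float, child_values: list[float], tol: float = 1e-6) -> bool:
    """True if an additive parent equals the sum of its children within `tol`."""
    return abs(parent_value - sum(child_values)) <= tol


# Illustrative split of Orders falling from 1,245 to 980 (paid/unpaid numbers made up).
paid_prev, paid_cur = 800.0, 590.0
unpaid_prev, unpaid_cur = 445.0, 390.0
orders_prev, orders_cur = paid_prev + unpaid_prev, paid_cur + unpaid_cur

orders_delta = orders_cur - orders_prev  # -265 orders
paid_share = contribution_share(paid_cur - paid_prev, orders_delta)
unpaid_share = contribution_share(unpaid_cur - unpaid_prev, orders_delta)
```

With these numbers paid orders explain about 79% of the drop, which is exactly the kind of sentence (“the Orders drop is mostly paid”) that a bare delta can’t support. Deriving both parent and children from one canonical query is what keeps `reconciles` true in the first place.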
A handful of small but load-bearing things live in this layer:
One adjustment factor per brand. Some brands route partial revenue through specific discount mechanics. A per-brand adjustment factor flows into the sales summary so gross sales and discounts are reported consistently with how the brand thinks about them.
Currency vs. count safety. A spend node (₹) and a clicks node (count) can’t be summed into their parent; the units don’t match. The contribution calculation explicitly checks that the child’s unit matches the parent’s before computing a percentage. You’d be surprised how easy it is to silently produce nonsense without this.
Best-effort failures. Every breakdown query is wrapped to capture failures and keep going. The RCA run that ships at 7 AM doesn’t get to fail because one of fifteen subqueries returned malformed JSON.
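The last two guards can be sketched as follows, assuming each node carries a unit label and each breakdown query is a callable returning a JSON string. `safe_contribution` and `run_best_effort` are hypothetical names; the broad `except` is deliberate, because the goal is to record the failure and keep going rather than let one subquery sink the run.

```python
from __future__ import annotations

import json
import logging
from typing import Any, Callable, Optional

log = logging.getLogger("rca.tree")


def safe_contribution(child_unit: str, child_delta: float,
                      parent_unit: str, parent_delta: float) -> Optional[float]:
    """Share of the parent's movement, or None if the units differ or the parent is flat.

    A clicks child (count) under a spend parent (currency) gets None instead of
    a meaningless percentage.
    """
    if child_unit != parent_unit or parent_delta == 0:
        return None
    return child_delta / parent_delta


def run_best_effort(
    queries: dict[str, Callable[[], str]],
) -> tuple[dict[str, Any], dict[str, str]]:
    """Run every breakdown query; collect parsed results and failures instead of raising."""
    results: dict[str, Any] = {}
    failures: dict[str, str] = {}
    for name, query in queries.items():
        try:
            results[name] = json.loads(query())
        except Exception as exc:  # broad on purpose: one bad subquery must not fail the run
            failures[name] = f"{type(exc).__name__}: {exc}"
            log.warning("breakdown %s failed: %s", name, exc)
    return results, failures
```

Returning `failures` alongside `results`, rather than just logging, lets the report note which breakdowns are missing instead of silently omitting them.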
By the end of this step, we have a fully-populated tree: every node has a number, a delta, a contribution share, and a formatted display value. No LLM has been called yet.
Step 2: Let the analyst look at the tree
Now the LLM enters. But not to ask “what happened?” — to ask “which of these movements are worth talking about?”
This is the context selector. It walks the tree depth-first and, for each of the four top-level branches (Orders, AOV, Discount, Fulfillment), sends the LLM the full branch and the root context — the business scale — and asks: which of these nodes deserve a place in the narrative?
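A sketch of what one branch’s context might look like, assuming the branch is serialized to JSON in depth-first order with each node’s depth, so the model can see the hierarchy, plus the root node for scale. The payload shape and the names `flatten_branch` and `branch_payload` are our assumptions, not the actual prompt format.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical minimal node: name, both periods' values, children.
    name: str
    current: float
    previous: float
    children: list[Node] = field(default_factory=list)


def flatten_branch(node: Node, depth: int = 0) -> list[dict]:
    """Depth-first (pre-order) list of the nodes in one branch, tagged with depth."""
    rows = [{"name": node.name, "depth": depth,
             "current": node.current, "previous": node.previous}]
    for child in node.children:
        rows.extend(flatten_branch(child, depth + 1))
    return rows


def branch_payload(root: Node, branch: Node) -> str:
    """Prompt context for one branch: the whole branch plus the root for business scale."""
    return json.dumps({
        "root": {"name": root.name, "current": root.current, "previous": root.previous},
        "branch": flatten_branch(branch),
    }, indent=2)
```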
The prompt is built around three explicit ideas, because every one of them is something that goes wrong when you don’t say it out loud:
Numbers, not adjectives. “Orders dropped from 1,245 to 980, delta −21.3%” — not “orders declined.”
Mechanisms, not vibes. If CTR is down, name the reason — creative fatigue, audience saturation, broken landing page — instead of saying “performance is weak.”
Signal vs. noise against the business scale. A 200% jump on a base of 3 orders is noise. A 5% dip on 50,000 orders is the entire story. The LLM has the root metric in its context for exactly this reason.
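The arithmetic behind the signal-vs-noise point is worth spelling out. In the pipeline the LLM makes this call itself, with the root metric in context; the sketch below only computes the two numbers it is weighing: a node’s relative change, and the same change measured against the root. `scale_impact` and the 50,000-order scale are hypothetical, and both examples from the list above are placed in one business for comparison.

```python
from __future__ import annotations

from typing import Optional


def scale_impact(previous: float, current: float,
                 root_previous: float) -> dict[str, Optional[float]]:
    """A node's relative change next to the same change measured against the business scale."""
    delta = current - previous
    return {
        "delta_pct": 100.0 * delta / previous if previous else None,
        "share_of_root_pct": 100.0 * delta / root_previous if root_previous else None,
    }


ROOT_ORDERS = 50_000  # hypothetical business scale (orders in the previous period)

noise = scale_impact(previous=3, current=9, root_previous=ROOT_ORDERS)            # +200% on 3 orders
story = scale_impact(previous=50_000, current=47_500, root_previous=ROOT_ORDERS)  # -5% on 50,000 orders
```

The first moves 200% but shifts the business by about 0.012%; the second moves only 5% but shifts the whole business by 5%. Ranking by `delta_pct` alone gets this backwards.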
Each branch is evaluated in its own thread (four workers, one per branch), and each call returns a structured response with an include/skip flag, a confidence, a priority, and a reason per node. We persist every one of those decisions into a trace, so when someone asks why a particular node ended up in the report we can show them.
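A sketch of the per-branch fan-out and the structured response, assuming the model returns a JSON list of per-node verdicts. `NodeDecision`, `parse_decisions`, and `select_all` are hypothetical names, the confidence range of 0 to 1 is our assumption, and `call_llm` stands in for whatever client actually sends the branch prompt.

```python
from __future__ import annotations

import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

BRANCHES = ["Orders", "AOV", "Discount", "Fulfillment"]


@dataclass(frozen=True)
class NodeDecision:
    """Per-node verdict from the selector (hypothetical schema)."""
    node: str
    include: bool
    confidence: float  # assumed to be in [0, 1]
    priority: int      # assumed: lower = more important
    reason: str


def parse_decisions(raw: str) -> list[NodeDecision]:
    """Validate the model's JSON into typed decisions; raises on malformed output."""
    out = []
    for item in json.loads(raw):
        conf = float(item["confidence"])
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence out of range for {item['node']}: {conf}")
        out.append(NodeDecision(item["node"], bool(item["include"]), conf,
                                int(item["priority"]), str(item["reason"])))
    return out


def select_all(call_llm: Callable[[str], str]) -> dict[str, list[NodeDecision]]:
    """Evaluate the four branches concurrently, one worker per branch.

    All decisions, included and skipped, are returned so the caller can
    persist them to a trace.
    """
    with ThreadPoolExecutor(max_workers=len(BRANCHES)) as pool:
        futures = {b: pool.submit(call_llm, b) for b in BRANCHES}
        return {b: parse_decisions(f.result()) for b, f in futures.items()}
```

Keeping the skipped verdicts, not just the included ones, is what makes the “why is this node in the report?” question answerable later.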
After the four branches finish, there’s a global review pass. The LLM sees the union of selections across all branches plus a sample of rejected nodes, and gets one shot to add anything it missed or drop anything that became redundant once the branches were combined. This is the step that catches “you picked both Meta CTR drop and Meta CVR drop, but the real story is one upstream of the other.”
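The mechanics around the review pass can be sketched as set operations, assuming selections are tracked by node name and the reviewer replies with explicit add and drop lists. The names, the sample size `k`, and the seeded sampling are our assumptions; which node to drop in the CTR/CVR case is the model’s judgment, hard-coded here only for illustration.

```python
from __future__ import annotations

import random


def review_context(selected: set[str], rejected: set[str],
                   k: int = 20, seed: int = 0) -> dict:
    """What the reviewer sees: every selection plus a reproducible sample of rejects."""
    rng = random.Random(seed)  # seeded so re-running the same inputs shows the same sample
    sample = rng.sample(sorted(rejected), min(k, len(rejected)))
    return {"selected": sorted(selected), "rejected_sample": sorted(sample)}


def apply_review(branch_selections: dict[str, set[str]],
                 add: set[str], drop: set[str]) -> set[str]:
    """Union the per-branch picks, then apply the reviewer's one-shot edits.

    Drops are applied after adds, so a node listed in both ends up dropped.
    """
    union: set[str] = set().union(*branch_selections.values())
    return (union | add) - drop
```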
A few production realities worth calling out:
Prompt hashing. Each call’s prompt is hashed and stored. If we change the prompt, runs from before and after that change are distinguishable — important for evals.
Token accounting. Every call’s input/output tokens are recorded per phase and per branch. Cost per RCA is a number we can monitor, not a guess.
Traces nest correctly. The branch threads carry their parent’s tracing context, so each branch’s span nests under the parent run instead of floating off as an orphan. Small thing, makes debugging multi-branch runs sane.
Streaming variant. The same selector has a live mode that emits events node-by-node, for the dashboard view where users watch the analysis assemble in real time. The static and streaming paths share the exact same selection logic.
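Three of these are easy to sketch in plain Python. We don’t know which tracing library Clevrr uses, so the standard `contextvars` module stands in for it here; `current_span`, `prompt_hash`, `record_tokens`, and `run_branches` are hypothetical names, and truncating SHA-256 to 16 hex characters is our choice. The key detail is that pool worker threads don’t reliably inherit the submitting thread’s context, so each task is run inside a copy of it.

```python
from __future__ import annotations

import contextvars
import hashlib
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tracing state: the span id that new work should nest under.
current_span: contextvars.ContextVar[str] = contextvars.ContextVar("current_span", default="orphan")

# [input_tokens, output_tokens] keyed by (phase, branch); guarded by a lock
# because branch workers may record usage concurrently.
token_usage: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])
_usage_lock = threading.Lock()


def prompt_hash(prompt: str) -> str:
    """Stable fingerprint stored with each call, so prompt versions are distinguishable."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:16]


def record_tokens(phase: str, branch: str, input_tokens: int, output_tokens: int) -> None:
    with _usage_lock:
        usage = token_usage[(phase, branch)]
        usage[0] += input_tokens
        usage[1] += output_tokens


def evaluate_branch(branch: str) -> str:
    # Runs in a worker thread; reads the span it inherited from the parent run.
    return f"{current_span.get()}/{branch}"


def run_branches(branches: list[str]) -> list[str]:
    token = current_span.set("rca-run")
    try:
        with ThreadPoolExecutor(max_workers=len(branches)) as pool:
            # One copy_context() per task: a single Context object cannot be
            # entered by two threads at the same time.
            futures = [pool.submit(contextvars.copy_context().run, evaluate_branch, b)
                       for b in branches]
            return [f.result() for f in futures]
    finally:
        current_span.reset(token)
```

Without the `copy_context().run` wrapper, `evaluate_branch` would typically see the default `"orphan"` span, which is exactly the floating-span problem described above.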
Step 3: Turn selections into a report
By this point we have somewhere between 15 and 40 selected nodes — the “this matters” subset of the tree. Two more agents take it from here:
The summary agent writes the narrative. It’s the same agent that powers the interactive chat-based RCA, but called synchronously with the selections pre-formatted as context. So the manual chat experience and the automated email come out of the same brain.
The issues & todos agent turns the narrative into action. For every ad-side selection, we attach the platform’s top campaigns (the full nested campaign → ad set → ad shape, with both periods’ metrics) so the agent can produce specific issues like “Retargeting campaign X dropped CTR from 1.8% to 0.9%; ad set Y is the culprit, three creatives below benchmark” instead of “Meta performance declined.”
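Building that nested shape from flat ad rows is a small grouping step. This sketch assumes each row carries `campaign`, `ad_set`, `ad`, and a `metrics` dict holding both periods; those field names, and the sample values in the test, are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def nest_ads(rows: list[dict]) -> list[dict]:
    """Group flat ad rows into campaign -> ad set -> ad, keeping both periods' metrics.

    Each row is assumed to look like:
    {"campaign": ..., "ad_set": ..., "ad": ...,
     "metrics": {"ctr": {"current": 0.9, "previous": 1.8}}}
    """
    tree: dict[str, dict[str, list[dict]]] = defaultdict(lambda: defaultdict(list))
    for row in rows:
        tree[row["campaign"]][row["ad_set"]].append(
            {"ad": row["ad"], "metrics": row["metrics"]})
    # dicts preserve insertion order, so campaigns keep the order of the input rows
    return [
        {"campaign": campaign,
         "ad_sets": [{"ad_set": ad_set, "ads": ads} for ad_set, ads in ad_sets.items()]}
        for campaign, ad_sets in tree.items()
    ]
```

Handing the agent this nested structure, rather than a flat table, is what lets it walk from a campaign-level drop down to the specific ad set and creatives.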
Both get rendered into the email template, which ships at the scheduled time, alongside a run log that gets persisted for the dashboard’s audit view.
What this is actually solving
The framework for RCA in performance marketing isn’t a mystery. Pull the levers apart — audience, creative, bids, budget, plus everything downstream of the click — figure out which one moved, model what would have happened if it hadn’t, translate that into action. Most growth teams know the framework.
The bottleneck isn’t the framework. The bottleneck is that doing this every day, for every brand, across hundreds of dimensions, faster than the next standup, is a labour problem. And throwing one big LLM call at it doesn’t solve it; it just creates a new failure mode where the answers sound confident and aren’t repeatable.
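The split that worked for us is the one that matches how the work actually decomposes: building the tree, fetching and reconciling the numbers, and rendering the email are deterministic code; deciding which movements matter and turning them into a narrative and a list of issues are LLM calls.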
The LLM is in the loop in the two places where judgment and language live. Everything else is code, because everything else should be code.
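The same split can be sketched as a pipeline where each stage is tagged as code or LLM, and the model is injected only into the two judgment and language steps. The stage names mirror the steps in this post, but `run_pipeline`, `make_stages`, and the stub stage bodies are hypothetical placeholders.

```python
from __future__ import annotations

from typing import Callable

# A stage: (name, who runs it, function from pipeline state to new keys).
Stage = tuple[str, str, Callable[[dict], dict]]


def run_pipeline(stages: list[Stage], state: dict) -> dict:
    """Run stages in order and log which were deterministic code vs. LLM calls."""
    log: list[tuple[str, str]] = []
    for name, kind, fn in stages:
        state = {**state, **fn(state)}
        log.append((name, kind))
    return {**state, "log": log}


def make_stages(llm: Callable[[str], str]) -> list[Stage]:
    """Hypothetical wiring with stub bodies; only the two middle stages see the LLM."""
    return [
        ("build_tree", "code", lambda s: {"tree": {"Orders": -265}}),
        ("select_nodes", "llm", lambda s: {"selected": llm(f"select from {s['tree']}").split(",")}),
        ("write_report", "llm", lambda s: {"report": llm(f"summarise {s['selected']}")}),
        ("render_email", "code", lambda s: {"email": f"<p>{s['report']}</p>"}),
    ]
```

Because the LLM is a parameter rather than a global, the code stages can be tested with a fake model, and swapping models touches exactly two stages.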
What we’d still like to fix
A few things on the list, in case you’re building something similar:
Counterfactuals. The natural next step beyond “what moved” is “what would have happened if you’d done X instead.” We have the data shape to support it; the modelling is still ahead of us.
Variance attribution at the cohort level. New vs. returning is a binary split today. Real cohort decomposition (acquisition channel × month × LTV band) is where this starts paying real money.
Brand-specific tree extensions. Some brands genuinely need a fifth L1 branch — for instance, anything offline-heavy needs a “Channel Mix” branch. The tree should be configurable per brand without giving up the topology guarantees.
But the foundation — fixed tree, deterministic data, LLM as judgment layer, full decision audit — has held up well enough that we ship it daily, and the four-hour Monday morning is a four-minute email read.
That’s the win we wanted.