When agents negotiate, who pays for the conversation?
- Graham Anderson
- Mar 24
- 16 min read

Series: From Automation Tools to Agentic Orchestration - A Telemetry Guide for the Travel Industry. This follows Part 1b: The Agentic Coordination Stack Arrives.
It’s always good to talk …
A recent conversation with someone in my network surfaced a question that neither of us could answer. He is working with two organisations that have connected their AI agents via A2A - the Agent-to-Agent protocol that standardises how independent AI systems collaborate regardless of the frameworks they were built on. Both agents are live. Structured requests flow in both directions. It works.
We were comparing notes on telemetry capture - how to tag input tokens, output tokens and system tokens across the integration - when a straightforward question stopped us both: “Who pays?” It was fast followed by a second question: “Does it matter?”
The answer to "Who pays?" was less dramatic than I expected. What the question forced me to examine was more important. This article frames that thinking.
Right now, each organisation absorbs its own token costs. Organisation A pays for the inference its agent consumes when formulating a request. Organisation B pays for the inference its agent consumes when reasoning about the response. Simple, clean and entirely unexamined. It is not a decision. It is the absence of a decision. Nobody agreed this was the right model. It is just what happens when you build the integration and the token bill arrives.
The moment you ask whether it should work that way, the question cascades. If Organisation B is the service provider, should Organisation A’s query cost be Organisation B’s problem - a new form of distribution cost? What happens when queries do not convert into a progressive action or even a commercial transaction? What happens when the volume of agent-to-agent queries scales to thousands per day, most of which produce no transaction? What happens when the chain extends beyond two organisations to five, six, ten - each burning tokens on coordination that may or may not produce a commercial outcome?
What counting tokens taught me: inference cost as the operating cost of intelligence
I have spent the last month building observability into an AI product pipeline - not at the “did the system return a response” level, but at the level of understanding what the model received, what it relied on and what it ignored. The work started as an accountability exercise after my first telemetry article: am I practising what I preach? The test scenario: if someone disagrees with an AI-generated recommendation, can I explain exactly why the system produced it?
What I did not expect was what token-level telemetry would reveal about the economics of model reasoning. I had assumed token costs were essentially a billing line item - something you optimise by choosing the right model and keeping prompts tight. That is true as far as it goes but it misses something. Token costs are also a signal. A model reasoning with good context - clear inputs, well-structured instructions, relevant data - tends to be cheaper and more accurate. A model compensating for missing or ambiguous inputs consumes more tokens doing so. I did not appreciate this until I saw the same pipeline step vary by 25% in token consumption across different inputs, doing the same job with different source material. Redundancy and vagueness are computationally expensive. When the prompt is unfocused - duplicated context, unconstrained output space, rules restated in three places - the model uses the room it is given. Tighter instructions produced shorter, equally accurate output at lower cost.
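The per-step variance described above only becomes visible if telemetry records tokens per pipeline step rather than per monthly invoice. A minimal sketch of that idea in Python - the prices and step names are hypothetical assumptions, not any provider's actual rates:

```python
from dataclasses import dataclass, field
import time

# Hypothetical per-million-token prices; real prices vary by model and provider.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00

@dataclass
class CallRecord:
    step: str
    input_tokens: int
    output_tokens: int
    timestamp: float = field(default_factory=time.time)

    @property
    def cost_usd(self) -> float:
        # Cost of a single inference call at the assumed prices.
        return (self.input_tokens * INPUT_PRICE_PER_M
                + self.output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

class TokenLedger:
    """Accumulates per-step token telemetry so cost variance is visible."""

    def __init__(self):
        self.records: list[CallRecord] = []

    def log(self, step: str, input_tokens: int, output_tokens: int) -> CallRecord:
        rec = CallRecord(step, input_tokens, output_tokens)
        self.records.append(rec)
        return rec

    def step_variance(self, step: str) -> tuple[int, int]:
        """Min and max total tokens observed for one pipeline step."""
        totals = [r.input_tokens + r.output_tokens
                  for r in self.records if r.step == step]
        return (min(totals), max(totals))
```

Logging the same pipeline step across many runs is what surfaces the kind of 25% spread mentioned above: `step_variance` makes the gap between the cheapest and most expensive execution of the same job a first-class number.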
More revealing still, inputs can be present in the system but not influential in the model’s reasoning. In one case, a critical data block was correctly looked up, validated, stored in the pipeline and captured in telemetry - yet it was never actually injected into the model’s input. The model produced plausible-looking outputs by guessing from adjacent context. Fifty-one automated tests passed. The telemetry caught it because it could see the gap between what was available and what was used.
This matters beyond any single pipeline. It reveals a principle. The cost of inference is not fixed. It varies with the quality of the inputs and the effectiveness of the receiving agent’s prompts. The model does not tell you when it is guessing. It produces outputs with the same fluency regardless of whether it is reasoning from strong evidence or compensating for weak context. Only the telemetry - the detailed record of what went in, what came out and what the model reported about its own reasoning - makes the difference visible.
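The “present but not injected” failure is detectable with even a crude check at prompt-assembly time: compare what the pipeline holds against what the model was actually given. A simplified sketch - real telemetry would compare structured prompt segments or token spans rather than raw substrings, and the function name here is my own invention:

```python
def missing_from_prompt(available: dict[str, str], prompt: str) -> list[str]:
    """Return the keys of context blocks that the pipeline retrieved and
    validated but that never made it into the model's actual input.

    This is the 'available vs used' gap: a naive substring containment
    check, sufficient to catch a block that was silently dropped."""
    return [key for key, content in available.items() if content not in prompt]
```

Run at the last moment before the inference call, this turns a silent omission - fifty-one passing tests notwithstanding - into a logged, alertable event.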
The hardware industry figured this out before most of the people building software did. Nvidia’s pivot toward inference-optimised chipsets is a bet that the dominant cost of AI at scale is not training but running. Training is writing the recipe book - expensive, done once. Yes, there will be improvements, second and third editions of that book; the models will keep evolving. Inference, though, is like running the restaurant - every customer, every order, every day. Most organisations focus on the visible capital expenditure of training while the operational cost of inference, which compounds with every AI workload added, is where the real financial exposure sits. At scale, 60–70% of compute spend is inference, not training. An emerging discipline, now sometimes called inference economics or FinOps for AI, is forming around exactly this recognition.
This is one of the required shifts from “doing AI” to “being AI prepared.” Treating inference costs as a line item your engineering team manages is “doing AI.” Recognising that inference is the operating cost of your organisation’s intelligence - and that this cost changes fundamentally when your agents start coordinating across organisational boundaries - is the beginning of “being AI prepared.” The moment your agent calls another agent outside your organisation, the inference cost is no longer just yours. It is distributed, and nobody has agreed how to distribute it.
The coordination cost layer
The settlement protocols attracting headlines and investment are solving an important problem: how does money move after agents agree on a transaction? Those rails are arriving. However, the coordination cost - the inference spend that happens before, during and around every agent interaction - sits in a blind spot. Settlement protocols tell you money moved. Coordination protocols tell you agents communicated. Nothing in the current stack tells you what the coordination itself cost or who should bear that cost.
To illustrate why this matters, start with the simplest case - the same kind of bilateral exchange that surfaced the question in the opening conversation - and then follow what happens as the chain gets longer.
The query asymmetry: a single exchange reveals the structure
Take two organisations. A traveller's agent sends a request to a hotel's agent. Even in this simplest case, the cost is not symmetric. It is more complex than it first appears.
Both sides carry input and output token costs. The requestor's agent consumes input tokens (its system prompt, user context, conversation history) and generates output tokens (the structured request). The responder's agent consumes input tokens (its own system prompt, the received request, its inventory and pricing context) and generates output tokens (the reasoned response). The requestor's agent then consumes further input tokens processing the response and generates output tokens interpreting the result. A single round-trip exchange involves at minimum three inference calls.
In practice, the agent layer is not a simple pass-through. It reasons about the request, makes decisions about how to fulfil it, acts by calling structured APIs, databases and pricing engines underneath, observes the result and if the result is insufficient, reasons again. That think-do-observe-think loop is the agentic behaviour that distinguishes this from traditional integration. The infrastructure below the agent remains deterministic and predictable. The coordination cost sits in the reasoning and decision layer on top: the tokens consumed interpreting the request, deciding how to act on it, evaluating whether the result is adequate and structuring the response. That layer is where the variable cost, the quality uncertainty and the absence of an agreed allocation model all live.
How the cost distributes depends on how each side's agent is built. A responder running a fully autonomous reasoning loop, i.e. interpreting the request, evaluating options, iterating until the result is adequate, will consume significantly more tokens than one running a structured workflow that maps requests to predefined steps. In practice, most responder-side agents today sit somewhere between those extremes. A hotel's agent is more likely to query the property management system and return availability than to reason autonomously about whether the room is a good fit for the potential guest. At current mid-tier model pricing, output tokens cost 3–5× more than input tokens, so whichever side does more generative reasoning bears the heavier cost. However, that is an architecture choice, not a fixed law.
The structural asymmetry sits elsewhere: in volume control. In human commerce, the volume of non-converting enquiries is naturally constrained by human attention. A person can only make so many phone calls. In an agentic world, both sides pay tokens for every exchange. The requestor controls the volume. An agent can query a thousand suppliers if the use case justifies it. The responder does not control the inbound volume. A hotel's agent faces queries from potentially thousands of different requestors' agents, each expecting a response. The requestor decides how many queries to send and can budget for it. The responder absorbs whatever arrives. That is the asymmetry that matters. Not cost per call, but control over how many calls you are exposed to.
All of this assumes a clean single exchange. If the responder's agent needs clarification - because the request lacked sufficient context or was ambiguously structured - the exchange becomes a multi-round conversation. Each round trip carries a full set of input and output tokens on both sides. What should have been one exchange becomes three or four, with the cost multiplying accordingly. The cheapest A2A exchange is one where the request is clear enough to resolve in a single round trip - which makes input structure a product design decision with direct economic consequences.
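The bilateral economics above - at minimum three inference calls per round trip, output tokens priced several times higher than input tokens, clarification rounds repeating the full set - can be sketched as a toy cost model. Every token count and price below is an illustrative assumption, not a measurement:

```python
# Hypothetical USD prices per million tokens; output ~5x input, at the top of
# the 3-5x range mid-tier pricing suggests.
IN_PRICE, OUT_PRICE = 3.00, 15.00

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one inference call at the assumed prices."""
    return (input_tokens * IN_PRICE + output_tokens * OUT_PRICE) / 1_000_000

def round_trip_cost(rounds: int = 1) -> dict[str, float]:
    """Cost of a bilateral A2A exchange.

    Each round is three inference calls: the requestor formulates the
    request, the responder reasons about it and responds, the requestor
    interprets the result. A clarification loop repeats the full set."""
    requestor = responder = 0.0
    for _ in range(rounds):
        requestor += call_cost(2_000, 400)   # formulate the structured request
        responder += call_cost(3_000, 800)   # reason about it and respond
        requestor += call_cost(1_500, 300)   # interpret the response
    return {"requestor": requestor, "responder": responder,
            "total": requestor + responder}
```

With these (invented) numbers a clean single round trip costs about four cents in total, split roughly evenly - and a two-round clarification loop simply doubles it, which is the economic case for well-structured requests.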
At the simplest level, this resembles existing API economics: the provider bears the compute cost of processing requests, manages search-to-book ratios and throttles or bills for overages. However, traditional API calls have predictable per-call costs, return structured data without interpretation and do not trigger clarification loops. An LLM-based agent processing an A2A request has variable cost per call, reasons rather than retrieves and may initiate multi-round exchanges that multiply the cost in ways a standard rate card cannot anticipate. Emerging practices like semantic caching and model routing address the per-call cost but not the cross-organisational allocation question.
The reasoning quality cost: when one organisation's inputs shape another's bill
Staying with the bilateral exchange: from my own telemetry work, I know that models guess when inputs are insufficient, and that guessing consumes more tokens overall or at the very least wastes them. In a cross-organisation A2A exchange, one organisation's poorly structured context becomes another's inflated token bill. If the requesting agent sends ambiguous constraints or incomplete preference data, the responding agent spends more inference reasoning about the request - and may still produce a suboptimal result. The traveller gets a room that was available rather than the room that was optimal. The hotel misses a chance to match a high-value guest to a premium product.
The reverse is also true. If the responding agent's own reasoning is inefficiently structured - redundant instructions, unconstrained output, poorly scoped context - it inflates its own costs regardless of how well-formed the incoming request was. Neither side can optimise the other's inference economics. Each organisation controls the quality of its own agent's reasoning but bears the consequences of the other's.
This is qualitatively different from deterministic system-to-system integration. When an OTA sends a structured query to a GDS, the response is binary: the data is available or it is not. The price is what it is. There is no interpretation cost. When one LLM-based agent sends a request to another LLM-based agent, the interpretation of that request is probabilistic. The receiving model might weight certain inputs heavily, deprioritise others or ignore them entirely depending on how it was built, what context it carries and the inherent randomness in how language models generate responses. The same query sent to the same agent twice might produce different quality coordination at different token costs. That variability is new.
The compound cost: when the front doors multiply
The examples so far assume a direct exchange between two agents. In travel, the path from a traveller's intent to a confirmed booking - and on through the days leading up to the trip, whatever happens during it and the post-trip follow-up - rarely involves just two parties.
Today, when a traveller searches an OTA, the OTA manages the downstream handoff through established pipelines - wholesalers, aggregators, channel managers - using structured, deterministic integrations that already work. In an agentic world, the agent-to-agent exchange sits at the front door of that pipeline. The traveller's agent communicates intent to the OTA's agent (or directly to a supplier's agent) and the existing infrastructure behind that entry point does what it already does. A well-structured intent arriving via A2A could actually reduce the downstream workload by sending more precise queries through the existing pipeline. A poorly structured one creates the same broadcast search the system was already built for, but with token cost added on top.
The cost question beyond a single exchange takes two distinct forms that are worth separating.
The breadth cost, i.e. many independent front doors. A single trip involves exchanges with many different providers across the full lifecycle. The traveller's agent coordinates with an accommodation provider, an insurance agent, a visa advisory agent, an airport transfer agent. On the day of travel, it checks real-time flight status, queries a ground transportation agent. In-trip, it books local experiences, finds an alternative restaurant when the first is full. Post-trip, it manages dispute resolution, loyalty credit reconciliation, feedback. Each of those is a separate A2A exchange at a separate front door with a different provider's agent. Each carries a token cost on both sides and a reasoning quality dimension. Most are independent of each other - the insurance exchange does not feed into the restaurant booking. Multiply fifteen or twenty independent front-door exchanges per trip by millions of agent-equipped travellers and the aggregate coordination cost becomes material, even if each individual exchange is cheap.
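The aggregate arithmetic behind the breadth cost is simple enough to make explicit. A back-of-envelope sketch, with every input an illustrative assumption:

```python
def lifecycle_cost(exchanges_per_trip: int, cost_per_exchange: float,
                   trips_per_year: int) -> float:
    """Aggregate coordination spend: cheap independent exchanges, multiplied
    by many front doors per trip, multiplied by many trips.

    All three inputs are illustrative assumptions, not industry figures."""
    return exchanges_per_trip * cost_per_exchange * trips_per_year

# e.g. 18 front-door exchanges at ~$0.04 each, across 5 million
# agent-equipped trips a year: roughly $3.6M of coordination spend -
# material in aggregate, invisible per exchange.
```

The point is not the specific total but the shape: no single exchange is worth arguing about, while the sum is a line item someone will eventually want allocated.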
The chain cost, i.e. when front doors connect. Some exchanges are not independent. A flight delay triggers a hotel check-in renegotiation, which triggers a ground transport rebooking. Here, the output of one exchange becomes the input of the next. If the traveller's agent misinterprets the flight delay or packages the changed context poorly, that degradation carries into the hotel renegotiation and compounds into the transport rebooking. This is where quality degrades across a chain - not because agents are guessing at random, but because each downstream exchange inherits the quality of whatever came before it. These connected scenarios are narrower than the full lifecycle but they are where the cost and quality risks concentrate.
The common thread across both is 'source quality'. Whether the exchanges are independent or connected, the accuracy and completeness of the traveller's intent and context needs to be consistent across every one. If the traveller's agent packages preferences poorly for the hotel booking, it probably packages them poorly for the insurance query and the transfer booking too. That is not a compound problem - it is a systemic quality problem at the source. A traveller's agent that structures context well benefits every front door interaction. One that structures context poorly degrades every interaction independently. The cost of poor source quality is not visible in any single exchange. It is visible in aggregate across a lifecycle of exchanges, each slightly worse than it needed to be.
This dynamic applies wherever multi-party coordination involves probabilistic reasoning, e.g. insurance claims routed through broker chains, logistics optimised across carrier networks, clinical decisions coordinated across specialist referrals. Travel is simply an acute example: the lifecycle is long and the number of front doors per journey is high.
Chain of thought reasoning - where a model exposes its working before reaching a conclusion - partially addresses this within a single system. It is how my own telemetry captures whether the model relied on the right inputs. However, reasoning traces are internal. No protocol requires sharing them across organisational boundaries and few organisations would want to - they contain proprietary logic. The gap is not about whether individual agents can explain themselves. It is about whether the coordination between them can be explained by anyone.
What is genuinely different: three things
A grounding check is necessary here because the temptation in any discussion of agentic economics is to present everything as unprecedented. It is not.
The “who pays for the conversation” question has been solved, imperfectly, for decades. When a travel agent phones a hotel, the hotel pays for the receptionist. When an OTA sends a search query to a GDS, there is a transaction fee structure (segment rebates, look-to-book overage fees and so on). When a website serves a search result, the supplier pays a commission or a click fee. The cost of coordination being absorbed somewhere in the value chain is not new. It is usually invisible because it is bundled into distribution costs, headcount or platform fees.
Settlement disputes between intermediaries are not new either. In a previous article, I described arriving at a hotel with a party of seven for a stay booked and paid for months in advance - only to discover the hotel had no record of our payment. Somewhere in the chain of intermediaries, settlement had failed. The money had left us but had not reached them. That happened in a human-mediated system with existing technology. Adding AI agents to this chain does not create settlement ambiguity. It inherits it.
So what is genuinely different? I think three things, and only three.
First, the volume control asymmetry at scale. In human commerce, the volume of enquiries is naturally constrained by human attention on both sides. In agentic commerce, the requestor controls how many queries to send and can budget accordingly. The responder does not control the inbound volume and absorbs whatever arrives. That asymmetry - not in cost per call but in control over exposure - is an observable consequence of how agent-to-agent coordination works. It creates an economic dynamic where being discoverable and responsive becomes an escalating commitment, one that smaller organisations may struggle to manage. This is new.
Second, the probabilistic reasoning layer. When two deterministic systems exchange structured data, the interpretation cost is zero. The data is what it is. When two probabilistic models exchange context, interpretation is variable. The receiving model might weight inputs differently each time. Inputs can be present but not influential. The same query might produce different outcomes at different costs. This variability does not exist in deterministic integration and it introduces a cost dimension that is novel.
Third, the potential volume. The oft-cited 2% figure for consumer willingness to grant AI full booking autonomy measures a very specific thing: total unsupervised delegation - a level of trust most people would not extend to a human agent either. That is not the question that matters for coordination cost. The question is whether the assist-me use cases, i.e. where the human stays in the loop but agents coordinate behind the scenes, generate enough inter-agent traffic to make the coordination cost material. The adoption trend for AI-assisted trip planning is clearly upward. However, nobody is measuring inter-agent coordination volume specifically, which means the cost question is invisible until someone starts counting. My instinct is that it is already enough to matter. That is opinion, not evidence.
Everything else - liability, reconciliation, regulatory ambiguity and the rest - is an existing problem wearing new clothes. The coordination stack does not create these problems. It amplifies them by removing the human judgment that currently provides the safety net, while adding complexity that makes diagnosis harder. That amplification is real and it matters. That said, it is amplification, not invention.
The question that remains open
The conversation that started this article asked two questions: “Who pays?” and “Does it matter?”
Having worked through the economics, the asymmetries and the volume question, I think the answers are less dramatic than they felt when the questions first came up.
Who pays? Probably the same way the industry has always managed distribution infrastructure costs … absorbed into platform economics through commercial agreements between platforms and their participants, bundled into usage tiers and revenue shares and priced accordingly. Adding agent inference cost to that bundle is not a paradigm shift. It is a new line item. The travel industry has decades of practice at this.
Does it matter? At current token prices, probably not as a financial problem. A single bilateral A2A exchange costs cents. Prices are falling. This is not going to break anyone's budget today.
However, working through those questions surfaced things that do matter regardless of whether the cost itself is material.
The reasoning quality gap is real. Models guess when inputs are insufficient and neither side of an A2A exchange can see or control the other's reasoning quality. That is a governance problem whether the token cost is a cent or a dollar.
The volume control asymmetry is real. The responder cannot control the inbound volume of agent queries. Managing that requires the same kind of tooling - throttling, quality gates, usage tiers - that platforms already use for API access. Organisations building agent-facing endpoints should be designing these controls now, not after the traffic arrives.
The telemetry requirement is real. If you cannot trace what your agent sent, what it received, what the model relied on and what it ignored, you cannot diagnose coordination failures, you cannot optimise costs and you cannot explain your agent's decisions when someone asks. Building that observability early, before you need it, is significantly cheaper than retrofitting it after a problem surfaces.
The accountability absence is real. If nobody owns the coordination cost, nobody owns the coordination quality. The governance direction globally is toward explainability and accountability for automated decisions. Getting the scaffolding in place early - telemetry, quality gates, cost visibility - is not about solving a crisis. It is about being ready for questions that are coming whether the volumes are high or not.
If I had to choose one place to start, it would be telemetry because you cannot manage what you cannot see and you cannot govern what you cannot measure.
My working hypothesis, and I flag this as opinion not analysis, is that the organisations which treat agent coordination as an engineering and governance discipline from the start will have a structural advantage over those who bolt it on later. Not because the cost is large today, but because the practices that manage cost also manage quality, accountability and trust.
In October 2025, I wrote that the coordination stack was here but the hardest work had not started. More of it has been built since then than I expected. The settlement rails are real. The identity layers are emerging.
The coordination cost question turned out to be less about money than I expected when I started writing this article. It is about the working practices, the observability and the governance scaffolding that good cost discipline forces you to build. The organisations that build it early will not just manage their costs better. They will coordinate better, fail less and recover faster. That is the real answer to “Does it matter?”
Further Reading
The coordination cost problem is being discovered empirically across the industry, even if the cross-organisational economics remain unaddressed. These pieces approach the question from different angles:
"The Multi-Agent Trap" - Towards Data Science, March 2026. Research-backed analysis of coordination overhead, including the 3.5× token cost multiplier in multi-agent versus single-agent implementations and a failure taxonomy drawn from 1,642 execution traces.
"We Spent $47,000 Running AI Agents in Production" - Towards AI, December 2025. A story of cost escalation in a four-agent production system using A2A and MCP, documenting how coordination costs compound in ways that pre-deployment estimates failed to predict.
"Multi-Agent Swarms and the Economics of Coordination Overhead" - James Fahey, August 2025. Frames coordination overhead as a "hidden tax" on multi-agent performance and proposes economic thinking - including an "Agent GDP" metric - for managing agent systems at scale.
"Towards Multi-Agent Economies: Enhancing A2A with Ledger-Anchored Identities and x402 Micropayments" - arxiv, July 2025. A research prototype integrating x402 micropayments into the A2A protocol, demonstrating technical feasibility while identifying unresolved challenges in registry standardisation and settlement latency.
"The Hidden Economics of AI Agents: Managing Token Costs and Latency Trade-offs" - Stevens Institute, January 2026. An engineering-focused guide to inference cost management, covering caching, routing and the "unreliability tax" that compounds in agentic architectures.
"A Better Method for Identifying Overconfident Large Language Models" - MIT News, March 2026. Research demonstrating that LLMs can be confidently wrong and that cross-model disagreement is a more reliable indicator of true uncertainty than self-consistency checks. Directly relevant to the question of whether agents in a coordination chain can detect when a counterpart is guessing.
“Inference Economics: Solving 2026 Enterprise AI Cost Crisis” - AnalyticsWeek, March 2026. Frames inference cost as an enterprise governance discipline, validating the thesis that the shift from training to running is where the real financial exposure sits. Useful context for organisations building the FinOps capabilities this article argues are necessary.