There are two credible paths to building agentic experiences. The first is platform-first: stand up a unified agent framework with the core capabilities (multi-turn conversation, a knowledge base, and memory) and then layer in signifiers and affordances that fit your environment. The second is scenario-first: begin with the thinnest viable surface and add only the features that demonstrably create value beyond what ChatGPT or Copilot already provide, bringing in memory and other “platform” features only once they have earned their keep. The platform-first approach yields a consistent engineering experience and lets teams reuse prior agent work, but it risks poor agent-scenario fit. The scenario-first approach can feel messier and demands more from product managers, yet it validates real-world use cases faster. I don’t claim one approach is universally better (startups and large companies face different constraints), but I do believe there is only one way to prototype: ship quickly, test explicit hypotheses, and iterate without delay.
A clarifying question keeps this cadence honest: what is the minimum version of the product that lets us learn whether the solution can find product-market fit? Counterintuitively, you often do not need a working prototype to answer that. Walking through end-to-end customer scenarios frequently reveals whether a proposed feature fits existing workflows and where it will break. That said, some questions hinge on new engineering: experiences that are hard to reason about in the abstract. In those cases, the objective is not to “build the demo,” but to surface and test the assumptions that matter. Each design choice should map to the outcome it seeks and to the user challenge it addresses; this is where a PRD should specify the agent workflow, the hypothesis, and the validation path. The simpler the stack, the more learning cycles you can run with less effort, which is the real engine of progress.
The AI-Powered Development Advantage for Agile Prototyping

Survey data from a16z Enterprise reveals why AI-powered tools are gaining traction over traditional low-code solutions, with natural language interaction and rapid prototyping leading the advantages.
Modern AI coding tools make this possible. Cursor, GitHub Copilot, and Claude Code compress build time by generating boilerplate, suggesting common patterns, and helping troubleshoot. A single engineer can now produce a functional MVP in a fraction of the time that used to require a small team. Much like Figma tightened the collaboration loop in design, these tools narrow the gap between product intent and implementation. The result is not merely faster engineering; it is broader participation. Product managers, designers, even sales and customer success teams can test ideas more directly, while engineers concentrate on production-grade systems and reliability concerns that truly benefit from their specialization.
Involving Cross-Functional Stakeholders in Agent Workflows
An agentic experience is only as good as our understanding of the underlying problem. This is especially true for expert workflows (consumption-based cost estimation or SOC investigation, for example), where product and engineering teams are rarely the domain experts. Involving architects, sales engineers, and analysts only at the prompt-iteration stage is not enough. To build agent behaviors that actually fit, we have to internalize existing workflows and best practices, then design signifiers and affordances that match practitioner expectations. Language, steps, intermediate outputs, and handoffs should mirror how experts already think and work. When the agent speaks their dialect and respects their process, adoption follows because the experience feels native rather than novel for novelty’s sake. For evidence-driven workflow discovery, I use the market-to-product approach outlined in Social Listening for Product Insight.

This is exactly where the Figma analogy (Kevin Kwok’s point about non-linear returns from tighter collaboration loops) becomes operational. Figma did not just make drawing easier; it made critique, alignment, and decision-making happen in the same place, by the right people, at the right time. AI coding assistants catalyze a similar shift for agentic products: they collapse the distance between a domain expert’s intent and a working prototype, making assumptions explicit, turning tacit heuristics into checkable rules, and surfacing disagreements while they are still cheap to resolve. When prototypes function as shared canvases, co-edited by PMs, engineers, and subject-matter experts, the loop tightens further: experts shape the signifiers and workflows, product sharpens the hypotheses, and engineering focuses on robustness and safety. The compounding return comes not from adding more features, but from aligning agent behavior with the realities of the domain. That same loop is the feedback layer I describe in The Real-Time Feedback Layer.
Learnings from the Cost Estimator Agent
To ground these principles, let’s look at an agentic implementation of a cost estimation scenario.
Project Context
Customers need accurate cost estimates for budget planning and solution comparison, yet consumption-based pricing is notoriously hard to predict. We heard repeatedly from the field that this uncertainty stalls decisions and, in competitive deals, can tilt outcomes against us. Existing tools do not help enough. Web calculators feel like black boxes with coarse, inflexible inputs and little transparency. Spreadsheet models are opaque and fragile, with assumptions scattered across cells. Both often ask for inputs customers do not understand or cannot provide without heavy translation.

In other words, this is not a known unknowns problem where a general-purpose copilot can retrieve an answer upon request. Nor is it an unknown knowns problem where the customer already has a tried-and-true estimation method and we simply need to automate it. It is often an unknown unknowns problem: customers do not know what to ask, and they do not have the raw data in the needed form. The result is planning paralysis and, ultimately, stalled or lost deals.
Design Rationale
Designing for “unknown unknowns” required optimizing along three intertwined dimensions. First, we focused on transparency and control so that users could see the reasoning behind estimates (the assumptions, intermediate calculations, and trade-offs) and adjust inputs with confidence. Numbers without narrative do not build trust, and trust is the currency of estimation. Second, we embedded domain expertise directly in the experience. Instead of pushing the knowledge gap back to the user, the system translated familiar facts (industry patterns, ingestion profiles, retention policies) into the metrics the pricing model requires, pre-populating where possible and teaching as it went. Third, we treated estimation as a process rather than a form, and we designed for iterative refinement. The goal was not a one-shot answer but a guided conversation that converges on confidence.
At a basic level, we began with an agent side-panel, similar to a Copilot, to unify product documentation, pricing schemas, and frequently asked questions. This supported conversational guidance throughout the estimation process, but it also exposed three frictions we had to solve in order to achieve fit. First, use-case discovery was weak: without strong signifiers, users did not know what to ask and often ventured beyond the agent’s scope. Second, chat lacked context: humans are economical with effort, so expecting users to restate all the fields they had filled and the stage they were in created unnecessary friction. Third, people don’t know what they don’t know: there is a structural gap between what customers know about their business (for example, number of users, typical event patterns) and what we require to estimate costs (for example, daily gigabytes ingested). Simply asking, “How many gigabytes per day?” does not bridge that gap.

These insights shaped a prototype with two synchronized surfaces: a pricing panel and an agent panel kept in bidirectional sync. Edits in the graphical interface updated the conversation’s context, and the agent’s reasoning flowed back as explanation cards anchored beside the fields they affected.
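To make the bidirectional sync concrete, here is a minimal Python sketch of one way the two surfaces could share state. It is an illustration under assumptions, not the prototype's actual implementation: the names `EstimateSession`, `ExplanationCard`, `edit_from_panel`, `propose_from_agent`, and `context_for_agent` are all hypothetical, as are the field names in the usage lines.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ExplanationCard:
    """Agent reasoning anchored to one field of the pricing panel (hypothetical)."""
    field_name: str
    text: str


@dataclass
class EstimateSession:
    """Single source of truth shared by the pricing panel and the agent panel."""
    fields: Dict[str, Any] = field(default_factory=dict)
    cards: Dict[str, ExplanationCard] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)

    def edit_from_panel(self, name: str, value: Any) -> None:
        # A GUI edit updates the shared state, so the user never has to
        # restate it in chat. Any agent explanation for that field is now
        # stale and is dropped.
        self.fields[name] = value
        self.cards.pop(name, None)
        self.history.append(f"user set {name} = {value}")

    def propose_from_agent(self, name: str, value: Any, rationale: str) -> None:
        # An agent suggestion fills the field and attaches an explanation
        # card next to it.
        self.fields[name] = value
        self.cards[name] = ExplanationCard(name, rationale)
        self.history.append(f"agent set {name} = {value}")

    def context_for_agent(self) -> str:
        # Compact summary that could be prepended to the agent's next turn.
        lines = [f"{k}: {v}" for k, v in sorted(self.fields.items())]
        return "Current estimate inputs:\n" + "\n".join(lines)


session = EstimateSession()
session.propose_from_agent("daily_ingest_gb", 5.0, "Derived from user count and event rate.")
session.edit_from_panel("retention_days", 30)
print(session.context_for_agent())
```

The design choice worth noting is that neither surface owns the data: both read and write one session object, which is what keeps the conversation aware of the form and the form aware of the conversation.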
In brownfield scenarios, the agent could pull relevant account signals to prefill inputs and explain each value’s provenance. In greenfield scenarios, the experience offered size recommendations (small, medium, large, enterprise) that users could apply with one click, each accompanied by clear rationales and editable assumptions.
When hard numbers were missing (say, daily ingestion in gigabytes), the agent asked questions users could answer about environment size, event rates, and retention needs, then converted those responses into derived estimates, showing the math and inviting adjustments. Under the hood, a focused knowledge base provided product and pricing facts, while three structured workflows (volume estimation, pricing estimation, and design recommendations) gave the conversation shape and kept it oriented toward decisions rather than dialogue for its own sake.
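The conversion from answerable questions to a derived ingestion figure can be sketched in a few lines of Python. This is a hedged illustration only: the formula (users times events per user times average event size), the decimal gigabyte constant, and every name and number below are assumptions chosen for the example, not the product's actual estimation method.

```python
from dataclasses import dataclass
from typing import List, Tuple

BYTES_PER_GB = 1_000_000_000  # decimal gigabytes, an assumption for this sketch


@dataclass
class VolumeAnswers:
    """Facts a customer can plausibly provide (hypothetical input shape)."""
    users: int
    events_per_user_per_day: float
    avg_event_bytes: float
    retention_days: int


def estimate_volume(a: VolumeAnswers) -> Tuple[float, float, List[str]]:
    """Return (daily GB, retained GB, human-readable steps showing the math)."""
    daily_events = a.users * a.events_per_user_per_day
    daily_gb = daily_events * a.avg_event_bytes / BYTES_PER_GB
    retained_gb = daily_gb * a.retention_days
    steps = [
        f"{a.users} users x {a.events_per_user_per_day} events/user/day "
        f"= {daily_events} events/day",
        f"{daily_events} events/day x {a.avg_event_bytes} bytes/event "
        f"= {daily_gb} GB/day",
        f"{daily_gb} GB/day x {a.retention_days} days retention "
        f"= {retained_gb} GB retained",
    ]
    return daily_gb, retained_gb, steps


# Illustrative inputs only; each one stays editable by the user.
answers = VolumeAnswers(users=2000, events_per_user_per_day=5000.0,
                        avg_event_bytes=500.0, retention_days=30)
daily_gb, retained_gb, steps = estimate_volume(answers)
for line in steps:
    print(line)
```

Returning the steps alongside the numbers is the point: the user sees which assumption produced which figure and can correct the one that is wrong.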
Evaluation and Benchmarking
Agent platforms encourage generality, but effectiveness must be demonstrated on concrete tasks. We evaluate the experience by asking whether it completes representative estimation scenarios end to end, how its outputs compare to human-expert baselines, and how quickly it converges to a result stakeholders trust. Accuracy matters, but so do user effort and confidence. When building agentic experiences, we should track time to an acceptable estimate, the number of clarifying turns, and whether users report understanding and accepting the assumptions they carry forward. Scenario coverage also matters: behavior needs to hold not only in the “happy path,” but across brownfield and greenfield cases, high-volume and bursty workloads, and strict-retention and cost-optimized policies. When behavior degrades, it should degrade gracefully with clear explanations, ranges, or a handoff to a human expert.
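A scenario-based scorecard along these lines might look like the following Python sketch. The record shape, the 15% tolerance against the expert baseline, and the sample numbers are all assumptions made for illustration; they are not measurements from the cost estimator project.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class ScenarioRun:
    """One end-to-end estimation scenario (hypothetical record shape)."""
    name: str
    agent_estimate: float       # cost produced with the agent
    expert_estimate: float      # human-expert baseline for the same scenario
    clarifying_turns: int       # questions the agent had to ask
    seconds_to_estimate: float  # time to an estimate the user accepted
    assumptions_accepted: bool  # user reported understanding the assumptions


def relative_error(run: ScenarioRun) -> float:
    return abs(run.agent_estimate - run.expert_estimate) / run.expert_estimate


def summarize(runs: List[ScenarioRun], tolerance: float = 0.15) -> Dict[str, float]:
    """Aggregate accuracy, effort, and confidence across scenarios."""
    return {
        "within_tolerance_rate": mean(
            1.0 if relative_error(r) <= tolerance else 0.0 for r in runs),
        "mean_clarifying_turns": mean(r.clarifying_turns for r in runs),
        "mean_seconds_to_estimate": mean(r.seconds_to_estimate for r in runs),
        "assumption_acceptance_rate": mean(
            1.0 if r.assumptions_accepted else 0.0 for r in runs),
    }


# Made-up runs covering one brownfield and one greenfield case.
runs = [
    ScenarioRun("brownfield, steady load", 110.0, 100.0, 3, 240.0, True),
    ScenarioRun("greenfield, bursty load", 150.0, 100.0, 5, 420.0, False),
]
print(summarize(runs))
```

Keeping one row per scenario makes coverage gaps visible: a missing workload or retention policy shows up as a missing row rather than hiding inside an average.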
In larger organizations, evaluation pairs with safeguards. Data validation and drift monitoring ensure that quotes reflect current pricing and product information, with alerts when underlying references change. Guardrails protect embedded expert logic (estimation methods and pricing strategies) against prompt injection and leakage of system instructions, and they constrain access to sensitive APIs. Finally, bad-case handling is a first-class requirement: the system detects ambiguous inputs, surfaces low-confidence steps, and offers conservative defaults or escalation paths rather than silently producing spurious precision. Specifications and engineering plans that omit scenario walkthroughs, benchmarks, and safeguards drift toward imagined use cases and weak agent-scenario fit; those that include them turn agentic ambition into reliable impact.
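The bad-case handling described above can be sketched as a small presentation rule in Python. Everything here is an assumption for illustration: the confidence score, the 0.5 threshold, the 30% spread, and the names `StepResult` and `present` are hypothetical, and a real system would derive confidence and ranges from its own estimation logic.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class StepResult:
    """Output of one estimation step (hypothetical shape)."""
    name: str
    value: Optional[float]  # None when the input was too ambiguous to compute
    confidence: float       # 0.0 to 1.0, an assumed score


def present(step: StepResult, min_confidence: float = 0.5,
            spread: float = 0.3) -> Dict[str, Any]:
    """Decide how to show a step: point value, range, or human handoff."""
    if step.value is None:
        # Ambiguous input: escalate instead of guessing.
        return {"step": step.name, "action": "escalate",
                "message": "Input is ambiguous; routing to a human expert."}
    if step.confidence < min_confidence:
        # Low confidence: show a range rather than a single precise number.
        return {"step": step.name, "action": "range",
                "low": step.value * (1 - spread),
                "high": step.value * (1 + spread),
                "message": "Low confidence; showing a range."}
    return {"step": step.name, "action": "point", "value": step.value}


print(present(StepResult("daily_ingest_gb", 100.0, 0.3)))
print(present(StepResult("daily_ingest_gb", None, 0.0)))
print(present(StepResult("daily_ingest_gb", 100.0, 0.9)))
```

The rule never returns a bare number for a low-confidence step, which is one simple way to avoid spurious precision.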
Closing Thoughts
Choose a build path that fits your context, but always prototype to learn, not to impress. Use AI tools to shorten the distance between ideas and feedback. Bring domain experts into the design of signifiers and workflows so the agent respects reality. Make reasoning visible, embed expertise at the point of need, and shape the experience for iterative refinement. Then prove it with scenario-based evaluation and strong guardrails. This, I believe, is how you truly iterate at the pace of AI.