Text transcript

The Agent Development Lifecycle — Showcase with Harry Castle

AI Summit Held March 24–26
Disclaimer: This transcript was created using AI
  • Julia Nimchinski:
    Awesome. We have to transition to our… Thank you again. We have to transition to our next session, and we welcome Harry Castle, AI Transformation Architect at Miro. What a treat! Harry will be walking us through the new life cycle of agent development. What an incredible framework.
    Can’t wait to get into it, Harry, but before we do, please share: what’s in your agentic OS? What tools are you using?

    Harry Castle:
    That’s a great question. I think we at Miro have approached this in a really, sort of, experimental way. We think first about how teams need to solve their AI needs, and we’ve then decided on paths depending on which teams need access to what.
    So we think about it in sort of a layered approach, and we try and observe how AI is falling into people’s work and how they’re optimizing, and then we sort of roll those parts out to the business.

    Julia Nimchinski:
    Let’s get into your presentation, super excited.

    Harry Castle:
    Sounds good. I’m gonna share my screen… Can you see it?

    Julia Nimchinski:
    Yep,

    Harry Castle:
    Awesome. Awesome. So, first of all, a little bit, about me. So, I’ve been working at Miro now for five and a half years. I joined when we were less than 500 people during our, sort of, COVID blitzscaling period.
    We were working on building a sort of best-in-class business technology stack that could help us scale from, sort of, 500, employees and, sort of, 10, 20 million users to today with, 1,800 employees and over 100 million users.
    Today, we are now powering our AI innovation workspace, and we are an AI-first company developing AI products, to help companies become AI-first.
    I’m excited to share with you a bit more about my journey over the last year as I moved from sort of all-encompassing business technology into a more focused approach towards AI transformation and systems and tech architecture to support that AI transformation. So let’s get into it.
    Most organizations I speak to right now are somewhere on the same journey. They’ve got agents running, things are moving fast, and somewhere in the back of their minds, there’s this nagging question of, are we really in control of this? Today, I want to share a little bit about how we answered that question at Miro.
    My talk is called Chaos to Confidence, and it’s a story about how we went from a handful of exciting but fragile AI experiments to a repeatable, auditable, production-grade way of deploying agents at scale. Everything I’m gonna share is something we’ve actually lived through. The good, the messy, and the occasionally very embarrassing.
    The gap between we have agents and we trust our agents is where most organizations are stuck right now, and this talk is about closing that gap. Firstly, let me start by painting a picture of the problem that we were trying to solve a little over a year ago. A year ago, I watched a beautifully built AI agent get killed in a security review.
    Not because it was doing anything wrong, but because nobody could answer a basic question about what data it touched, what decisions it made, or what happened when it failed. We had an agent tested and approved by end users, it was bringing value, and it got crushed because no one had any idea how it worked or what it did.
    That moment crystallized the problem for me. Anyone can spin up an agent in minutes now. The tooling is incredible, the models are powerful, and the barrier to entry is basically zero.
    You can build full end-to-end solutions using Claude Code that tap into API endpoints and create usable outputs, but that same accessibility creates a governance nightmare for IT, security, and privacy teams.
    Nearly two-thirds of McKinsey respondents say that their organizations haven’t begun scaling AI pilots, and I’d argue a big reason for that is this exact problem. Without structure, agents become a new category of shadow IT, but operating at a speed and scale that traditional shadow IT never could.
    The question we kept asking ourselves was, how do we enable experimentation without breaking compliance or our DPAs? Because both things have to be true simultaneously. The obvious answer was to reach for our existing lifecycle frameworks, but that’s where we hit our next problem.
    Our product and engineering teams already had strong lifecycle discipline. The product development lifecycle gave us governance and consistency for software. So the first instinct was to just apply that to agents. But it didn’t work. And it didn’t work for very specific reasons that are worth naming clearly.
    Agents operate across diverse platforms that your software development lifecycle was never designed for. Their behavior is non-deterministic. The same input doesn’t always produce the same output. And they access data in dynamic, sometimes unpredictable ways.
    They’re continuously learning and drifting from the version you originally tested, and crucially, they require human-in-the-loop patterns that don’t map cleanly onto any traditional testing or deployment model. On top of all of that, you’re dealing with LLM hallucinations as a first-class risk, not an edge case.
    The software development lifecycle was built for systems that do exactly what you tell them. The AADLC that we’ve built at Miro is built for systems that sometimes decide for themselves. So we went back to first principles and asked, what would a lifecycle look like if it was purpose-built for agents from day one?
    What we landed on is the agent and automation development lifecycle. And the core idea is deceptively simple. A standardized process for documenting, evaluating, building, and releasing agents. The AADLC has five stages. Request intake, requirements and risk validation, solution design and build, QA and validation, and assessment and handover.
    But more important than the stages themselves is what the AADLC actually achieves structurally. It defines ways of working across AI transformation, data, security, privacy, and business systems. All the teams that need to be in the room from day one, not as an afterthought.
    It ensures value and compliance are evaluated together rather than sequentially, and it creates an audit trail and institutional learning that compounds across every agent you ship.
    You can think of it less as a checklist and more as a shared language, a way for everyone from an analytics engineer to a chief privacy officer to talk about the same agent in the same terms. Before we go stage by stage, here’s the one-sentence version of what the AADLC actually is.
    It’s a standardized process for documenting, evaluating, building, and releasing agents. But that one sentence contains three distinct promises. Standardization. Shared ways of working across every team that touches an agent. No more five parallel conversations about the same system, no more teams working in silos, building the same solution.
    Evaluation. Value and compliance are assessed from day one, not bolted on at the end, when it’s expensive to change anything. And deployment. Every agent creates an audit trail and institutional learning. The tenth agent you ship is faster and safer than the first, because the organization gets smarter with every cycle.

  • Harry Castle:
    Let me walk you through each stage in practice. The most expensive mistake you can make with AI agents is building the wrong thing with great discipline. Stage 1 exists to prevent exactly that. Before anyone writes a single prompt or pulls a dataset, the request intake stage filters ideas for real business value.
    It sounds obvious, but in practice, most organizations skip this entirely. Someone sees a demo, or a cool LinkedIn post, gets excited, and 3 months later, you’ve got an agent solving a problem that either doesn’t exist, or is already being solved by something else within the organization. Our intake process forces two honest questions up front.
    What problem are we actually solving for? And does a solution already exist that solves that problem today? From there, the request is logged with scope, impact, and requester context, and a go-no-go decision is made jointly by a product owner and a solutions architect.
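As a rough sketch of what that intake record and joint go/no-go might look like in code: the field names and the approval logic below are illustrative assumptions, not Miro's actual schema.

```python
# Illustrative Stage 1 intake record and joint go/no-go check.
# Field names and logic are assumptions, not Miro's actual schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class IntakeRequest:
    requester: str
    problem_statement: str            # "What problem are we actually solving for?"
    existing_solution: Optional[str]  # "Does a solution already exist today?"
    scope: str
    estimated_impact: str

def go_no_go(req: IntakeRequest, product_owner_approves: bool,
             architect_approves: bool) -> str:
    # A request that duplicates an existing solution is rejected outright.
    if req.existing_solution:
        return f"no-go: already solved by {req.existing_solution}"
    # The decision is made jointly: both roles must approve.
    if product_owner_approves and architect_approves:
        return "go"
    return "no-go: missing joint approval"
```

The point is less the code than the forcing function: the two honest questions become required fields, and neither role can approve alone.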
    This stage alone has saved us significant development time, not by slowing things down, but making sure we’re accelerating in the right direction towards key organizational objectives. Once a request clears intake, we move into the stage where the real work gets defined and assessed.
    This is the stage most teams skip straight past, and it’s the one that causes the most pain 6 months later. Stage 2 brings together biz tech, product, architecture, security, data analytics, and engineering in a single, collaborative moment before any building begins.
    Together, the team defines the build approach, maps the data sources and system integrations the agent will touch, and critically completes a full security and privacy assessment. The output is a risk impact score with mitigations that are baked into the design itself and not bolted on afterwards.
    Think of this as the moment where you find out whether your exciting new agent idea has a fatal flaw, while it’s still cheap to change course. The single biggest governance failure we see is risk assessments that happen after the build. Stage 2 makes risk a design input, not a deployment blocker.
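A risk impact score with baked-in mitigations could be sketched like this; the 1-to-5 scales, the weights, and the mitigation fractions are invented for illustration and are not Miro's actual scoring model:

```python
# Illustrative Stage 2 risk impact score. The likelihood/impact scales (1-5)
# and mitigation fractions are invented; Miro's actual model may differ.
def risk_impact_score(risks):
    """Sum residual risk: inherent risk (likelihood * impact) reduced by the
    fraction of risk that the designed-in mitigation removes."""
    residual = 0.0
    for r in risks:
        inherent = r["likelihood"] * r["impact"]            # 1..25 per risk
        residual += inherent * (1.0 - r["mitigation_factor"])
    return residual

risks = [
    # PII exposure, largely mitigated by masking in the data layer
    {"likelihood": 4, "impact": 5, "mitigation_factor": 0.8},
    # hallucinated output, partially mitigated by a human review checkpoint
    {"likelihood": 2, "impact": 3, "mitigation_factor": 0.5},
]
score = risk_impact_score(risks)   # 20*0.2 + 6*0.5, roughly 7.0
```

Because mitigations enter the formula directly, the design conversation and the risk conversation happen at the same time, which is the whole argument of Stage 2.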
    With risk understood and mitigation defined, the team can move into design and build with real confidence. Now we actually get to build, but in a way that’s architecturally intentional, rather than just moving fast and hoping for the best.
    Before anyone writes a line of code or connects a single API, we define the architecture in five distinct layers. And I want to walk you through each one, because this is where the AADLC really earns its keep. At the bottom is the data platform. This is Snowflake, and our data mesh. This is the foundation.
    Every agent we build draws from trusted, governed enterprise data. If this layer isn’t solid, nothing above it can be trusted. Above that sits the knowledge layer, semantic models that convert raw data into business understanding. This is what stops an agent from seeing a number and not knowing what that number means.
    The knowledge layer gives data context. Then the intelligence layer. RAG, vector databases, model gateways. This is the engine. It’s where AI capabilities live. Retrieval, reasoning, model routing. This layer is what makes the agent smart. Above that is the automation and agentic platform. Workflows and orchestration.
    This is where the agent actually runs. Workato sits here for us, this layer decides what happens, in what order, triggered by what, and handed off to whom. And at the top, the experience layer. Slack, CRM, native applications. Or even a vibe-coded front end. This is the only layer most end users will ever see.
    The AE gets a Slack message, the people team gets a communication sent out, nobody sees the four layers underneath that made it happen. The reason this architecture matters is that every agent we build maps to all five layers explicitly. So when something breaks, you know exactly where to look.
    When you need to extend it, you know exactly where you need to build. And when security asks what data the agent touches, you can point to the layer and show them. Architecture without RACI is just a diagram, and RACI without architecture is just a spreadsheet. Stage 3 makes both real simultaneously.
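Mapping an agent to all five layers explicitly can be as simple as a lookup table. The layer names below follow the talk; the components listed per layer are examples, and a real mapping would live in each agent's design doc:

```python
# The five layers from the talk, mapped explicitly for one hypothetical agent.
# Component lists are examples only; each agent's design doc fills these in.
AGENT_LAYER_MAP = {
    "data_platform": ["Snowflake", "data mesh"],            # trusted, governed data
    "knowledge":     ["semantic models"],                   # raw data -> business meaning
    "intelligence":  ["RAG", "vector DB", "model gateway"], # retrieval, reasoning, routing
    "automation":    ["Workato workflows"],                 # what runs, in what order
    "experience":    ["Slack", "CRM"],                      # the only layer users see
}

def where_to_look(layer: str) -> list:
    # When something breaks, point at the components in the failing layer;
    # when security asks what data the agent touches, show them this map.
    return AGENT_LAYER_MAP.get(layer, [])
```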
    Once the build is complete, we don’t just ship it, we validate it rigorously. Testing an AI agent is fundamentally different from testing software, and if you’re using your existing QA playbook unchanged, you’re almost certainly missing the most important failure points. Stage 4 is a structured QA process designed specifically for agent behavior.
    It checks three things in parallel. That the build matches the design defined in Stage 3, that all the controls and mitigations agreed in Stage 2 are actually present and functional, and that data handling, logging, human review checkpoints, and fail-safes are all working as intended. This isn’t just functional testing, it’s behavioral validation.
    We’re not just asking, does this work, but does it work in a way that we’d be comfortable explaining to our legal team, to our customers, to our board, or even to our internal stakeholders? The human review checkpoints are particularly non-negotiable.
    Every agent we ship has a defined point at which a human can inspect, override, or halt what the agent is doing. And once QA signs off, we move into the final gate before deployment. This is the stage where the organization formally accepts the agent, and formal acceptance is what turns a prototype into a production system.
    Security and privacy review all the mitigations from Stage 2, and create a risk record for anything that remains outstanding. If the residual risk rating is above a certain threshold, additional approvals or compensating controls are required before we proceed.
    The business requester then formally accepts the residual risk, which is an important act of ownership that most organizations skip. After that, user acceptance testing validates that the solution actually solves the original problem from the intake stage, and then deployment and change management follows.
    It’s a clean, documented handover with a clear chain of accountability. The risk record and formal acceptance aren’t bureaucracy, they’re institutional memory. When something goes wrong 6 months from now, and something always inevitably does, you know exactly who decided what and why.
    So you can point to the right owner, and they can take ownership of the resolution.

  • Harry Castle:
    So that’s the life cycle in theory. Now let’s talk about what it looks like in practice. The best test of any framework is whether real teams actually use it to ship real things. Here are three agents we’ve put through the full AADLC at Miro.
    The first one is the Marketing Claims Bank AI Agent. It’s a conversational AI interface that turns scattered research data into an instantly searchable, always available knowledge base for content teams. The next one is PACE, Personalized Automated Communication Engine.
    Our people ops and internal comms teams were spending a significant amount of time manually crafting and sending targeted messages to different employee groups. Benefits updates, policy changes, compliance deadlines, all done by hand, all dependent on someone remembering to send the right message to the right people at the right time.
    The consequence of getting that wrong isn’t just an annoyed employee, it’s a missed compliance deadline. And at scale, that’s a legal exposure problem.
    PACE automates that entire workflow, the right message to the right employee group at the right time, without anyone having to manually build a list, write the communication from scratch, or chase the send. And then there’s the one I’m most proud of, the AE Deal Copilot.
    This is a combined agentic automation, meaning it’s not just an AI layer on top of existing data, it’s an orchestrated system that actually actively connects multiple live data sources and acts on them together.
    This agent generates actionable account intelligence for strategic enterprise accounts by combining AI-derived expansion signals with real-time Salesforce pipeline hygiene data and Slack deal room activity.
    When triggered, it queries Snowflake for pre-computed Cortex expansion intelligence, evaluates field completion across all open Salesforce opportunities, ranked by win rate impact, and delivers a formatted summary to a specific Slack channel or in a DM to the right salesperson.
    The output enables account executives to act on prioritized expansion opportunities and pipeline hygiene gaps without manual research. It’s built in Workato, zero manual research needed by a sales rep, prioritized expansion opportunities and pipeline hygiene gaps on demand. That’s what Agentic actually looks like in a go-to-market context.
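The Deal Copilot flow described above can be sketched with every integration replaced by a stand-in function. To be clear about assumptions: the real system is orchestrated in Workato against Snowflake Cortex, Salesforce, and Slack, and the field names and win-rate weights below are invented for illustration.

```python
# Hedged sketch of the Deal Copilot flow. All function bodies are stand-ins;
# the production system is a Workato orchestration, not Python.
def fetch_expansion_signals(account_id):
    # Stand-in for the Snowflake query of pre-computed Cortex signals.
    return [{"signal": "seat usage up 40%", "strength": 0.9}]

def hygiene_gaps(opportunities, required_fields, field_weights):
    """Rank open opportunities by the win-rate impact of their missing fields."""
    gaps = []
    for opp in opportunities:
        missing = [f for f in required_fields if not opp.get(f)]
        if missing:
            impact = sum(field_weights.get(f, 0) for f in missing)
            gaps.append({"opp": opp["name"], "missing": missing, "impact": impact})
    return sorted(gaps, key=lambda g: g["impact"], reverse=True)

def build_summary(account_id, opportunities):
    # Assumed weights: which missing fields hurt win rate most.
    weights = {"next_step": 3, "close_date": 2, "champion": 5}
    signals = fetch_expansion_signals(account_id)
    gaps = hygiene_gaps(opportunities, list(weights), weights)
    lines = [f"Expansion: {s['signal']}" for s in signals]
    lines += [f"Fix {g['opp']}: missing {', '.join(g['missing'])}" for g in gaps]
    return "\n".join(lines)  # delivered to a Slack channel or DM in the real system
```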
    Not a chatbot, but a system that goes and gets the intelligence, connects the dots, and tells you what to do next. Six months in, the thing that surprised us most wasn’t the individual agent outcomes, it was what the lifecycle itself was doing to our organization. Four things stand out. Repeatability.
    We no longer reinvent the process for every agent. Each new one moves faster than the last, because the foundations are already there. Alignment. “We’re at Stage 2” is now a complete sentence that instantly orients every stakeholder in the room, from an engineer to a CISO. Security and quality.
    Compliance is now designed in, not reviewed in, and that’s a fundamentally different risk posture. And scale. We’ve moved from two hero agents, that a handful of people knew how to maintain, to a portfolio of production-grade systems that the organization trusts and relies on. The compounding effect is real.
    Every agent you put through the AADLC makes the next one cheaper, faster, and safer to ship. So the real question is, how do you start? And I’m gonna give you a genuinely practical answer. I want to leave you with something you can actually do this week.
    Not a 12-month transformation roadmap, just a first step that’s small enough to start and meaningful enough to matter. Three things. First, run a lightweight intake and risk workshop for whatever agent idea is already sitting in your backlog. One hour, the right people in the room, two honest questions.
    What problem are we solving, and what’s the risk if something goes wrong? Second, define a cross-functional RACI, and stand up a simple five-stage board, even in a spreadsheet, that mirrors the AADLC stages. Third, use our upcoming AADLC template on the Miroverse as your starting point, and adapt it to your governance context.
    You don’t have to implement the full lifecycle on day one. Start with one agent, one workflow, and run it through the full five stages. That’s your proof of concept for the lifecycle itself.
    The organizations that are gonna win with agents aren’t the ones moving fastest right now, they’re the ones building the foundation that lets them move fast sustainably. The template that we’re working on will be a good foundation for this.
    There’s also a blog post that I’ve written about this exact topic, and I’m generally happy to talk through how this might apply in your context. You’re free to connect with me on LinkedIn to talk more. The main thing that I would say is don’t rush this. You have time for this.
    The industry is making you feel like everything’s moving extremely fast, and you have to pay attention to every new development, everything that’s happening. But by thinking through how this is going to work long-term, you’re miles ahead, and I think creating those foundations will reap the rewards later on. There’ll be a template, as I mentioned.
    It’s on its way. If you’re a Miro user, you’re free to use it. Before I hand it back, I have one final thought, and this shift from chaos to confidence with AI agents isn’t a technology problem. It’s an operating model problem. The technology is already good enough.
    What most organizations are missing is the shared language, the structured process, and the cross-functional trust to move from experiments to production at scale. That’s why we built the AADLC. You can find the full write-up at miro.com/blog/agent-automation-development-lifecycle, and the Miroverse template is on the way, as I mentioned. Thank you very much. The agents are ready. The question is whether your organization is ready to govern them. And I’m happy to take any questions that you have.

    Julia Nimchinski:
    Fascinating session. Thank you so much, Harry. We’ll share the blog post on our Slack, so everybody’s welcome to read it. One question here from Devin: do ROI and scale need to be addressed and measurable before starting?

    Harry Castle:
    I mean, it depends. I think that we need to have clearly defined objectives. What we’re seeing a lot is that when people start working on a solution, the scope drift is massive, because the tools that we’re using are suggesting opportunities for a better solution.
    But sometimes that causes drift from the original outcome. So I would say, as long as you’re clear about what objectives you’re trying to… resolve or impact, then you have a good starting point to be able to implement such a solution.
    And it can be very small, you know, like, anything… when we started, we were working primarily in back office things, just things that people were doing that didn’t make sense to us. And now we’ve grown into fully supporting go-to-market finance teams on making their processes better using AI.

    Julia Nimchinski:
    Another question here, people are asking what failures typically occur post-deployment despite passing QA and validation?

    Harry Castle:
    I mean, we’ve seen a lot of different things. One part of this is we build our solutions in deterministic ways, because we want to have as much control over the output as possible.
    The reason that we do that is when we’re fully relying on LLM chatbot-based agents to provide core instruction, the outputs can vary depending on many factors, including what that initial input is. And we see that drift with model updates, for example.
    We see also that drift based on whether or not people are providing the right context, whether they’re prompting correctly, and we can’t really control how an end user is using the solution that we’re providing.
    But what we can control is the pipeline that that initial prompt goes through, which leads us to an output that is structured and repeatable as much as it can be.

    Julia Nimchinski:
    Quite a futuristic question here. How does this evolve over the next 12 to 18 months?

    Harry Castle:
    Hmm. I mean, I don’t know how this evolves over the next 6 months, but I think, you know… it’s hard to say. I am currently in the process of figuring out how this can work in the context of vibe-coded applications.
    Right now, we really try and keep everything in, like, the core systems that we rely on, so that we can manage access control, we can manage, you know, data policies and things like this, because obviously there is inherent risk with all of that.
    But everyone having access to solutions that allow them to vibe code things means that we have to rethink this constantly, and that’s what we’re trying to do right now, is create policies, and ways of working and solutions and processes like the AADLC that will work in 12 months’ time, but I can’t guarantee that at all.

    Julia Nimchinski:
    Folks are asking also what tools and infrastructure are required.

    Harry Castle:
    So, that’s a great question. When we started on this journey, which was… more than a year and a half ago, we had a bunch of different integration platforms. That’s really how we saw this working for us, was integration platforms where we would have AI transformation and AI nodes within the processes, so these automations.
    We ended up partnering with Workato, which is an integration platform as a service, and they’ve been a great partner for us as that core orchestration layer to build these agents.

    Julia Nimchinski:
    Makes sense. And how do you prevent this framework from becoming a slow approval layer?

    Harry Castle:
    It’s a great question. Hold people accountable, I would say, is the main thing. You know, this didn’t come out of a vacuum. This came because we were presented with a problem that needed to be solved, and we in AI transformation didn’t need to solve this alone. We needed to solve this with privacy, with security, with data teams.
    So, I think that that’s the main point, is that we need to assign responsibility for these things, because the agents, for now, in 2026, are going to be standard, and we should all be striving to figure out how to build them and to incorporate them into our workflows.
    And I think that we need to hold, you know, our teams accountable to support us in developing them.

    Julia Nimchinski:
    Harry, folks are asking you to add some color to the UAT process, and they’re wondering about confidently wrong outputs that are hurting adoption.

    Harry Castle:
    That’s a good one. Yeah, this is tricky. This is tricky, because… you know, the best person to validate whether or not an agent is working as intended is the person that brought up the issue at the beginning. They’re the original stakeholder that made that request.
    So they’re signing off on the outputs, but they’re also signing off on those original requirements, right? If the outputs don’t satisfy their original requirements, we go through iterations. We improve them.
    And it can also be the case that after using the solution for 2-3 months, they realize, hey, actually, I’ve come up with a bunch of improvements that we need to implement. And then we roll straight back through the process again, and all of those improvements for a version 2 get revalidated.
    But again, it’s an iterative process, and I think that that’s actually what makes this interesting and exciting, is you’re co-building a way of working that everyone can get involved in and figure out how it can work for your team.

    Julia Nimchinski:
    Another one here, who truly owns risk in practice, the business requester or security?

    Harry Castle:
    That is a great question. So, the way that we’re doing it at Miro is that the requester owns the final risk. The security team owns the identification of risk, and also owns providing the risk mitigations to the building team, so our core AI transformation team, so that we can build those mitigations into the process.
    If a risk is too high, if we identify that we’re tapping into data that we shouldn’t be tapping into for this use case, we will just not prioritize that use case, and that remains the case.
    We’re not here to breach compliance laws and mess around with core European data practices, but it’s important that we do identify where risk lies, and that the right person owns the risk.

    Julia Nimchinski:
    And another one here, how do you measure agent reliability beyond functional correctness?

    Harry Castle:
    Great question. Part of the intake that we have is we have to identify core metrics and objectives that the agent needs to have an impact on. So we’re constantly measuring, and we do a lot of A-B testing, where one group of employees, one team might have access to a solution, versus another team that don’t have access to that solution.
    So, we can then have an idea of how that’s impacting their workload, how much time is being spent using the application. And we’re… we’re still building out that observability functionality, but, you know, as I said, it’s a work in progress.
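The A/B comparison Harry describes reduces to comparing a metric across the group with the agent and the group without it. Here's a minimal illustration; the metric (hours per week on the task) and the numbers are made up:

```python
# Minimal sketch of the A/B measurement described: one team has the agent,
# a comparable team doesn't, and we compare time spent on the task.
# The metric and all numbers are illustrative, not Miro's actual data.
def mean(xs):
    return sum(xs) / len(xs)

def ab_time_saved(with_agent_hours, without_agent_hours):
    """Average weekly hours saved per person in the group using the agent."""
    return mean(without_agent_hours) - mean(with_agent_hours)

treatment = [3.0, 2.5, 3.5]   # hours/week on the task, with the agent
control   = [6.0, 5.5, 6.5]   # hours/week on the task, without it
saved = ab_time_saved(treatment, control)   # 6.0 - 3.0 = 3.0 hours/week
```

A real rollout would also want sample sizes large enough for a significance test, which is part of the observability work he says is still in progress.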

    Julia Nimchinski:
    Folks want to know what percentage of requests are rejected at Stage 1, and why.

    Harry Castle:
    Very, very good question. That’s a tough question to answer. In percentage terms, I would say maybe 30% actually get rejected, but I think that’s usually because our team gets identified as, you know, the people that need to be worked with in order to roll out these solutions.
    So people come to us, they talk to us, we talk through their plan, and once they… we’ve identified, okay, yeah, actually, this use case is valuable, they then go through the intake.
    Obviously, having conversations with people doesn’t necessarily scale if everyone is suddenly asking for agents, but but for now, yeah, I would say only about 30% get rejected.

    Julia Nimchinski:
    Since we only have one minute here, Harry, I’m just curious about what’s next for Miro. You just recently acquired Reforge. Yeah, share anything you’re allowed to share. Super excited for all of the updates and what’s coming.

    Harry Castle:
    Yeah, I mean, we’re extremely happy to have Reforge joining Miro. I think it’s gonna be really interesting. I think the future for us is… we recognize that Miro as a platform is lowering the barriers to entry for AI automation and AI development.
    Anyone that has access to a Miro board is able to create automations and chat with sidekicks, and so having this Reforge team join us, with their AI transformation expertise, really positions us as sort of partners for everyone in terms of how they’re approaching their AI transformation and how Miro can serve as that core layer to help them do that.

    Julia Nimchinski:
    Fascinating. And you mentioned that this framework is coming to Miroverse, but where should our community go? Is the best next step to reach out to you directly on LinkedIn?

    Harry Castle:
    Yes, absolutely. Feel free to reach out to me on LinkedIn. I’m talking a lot about this topic at the moment, and I’ll be sharing any news about the release of this framework over there.

    Julia Nimchinski:
    Awesome. Thank you so much, Harry. Amazing session. Thank you.

    Harry Castle:
    Have a good day. Bye-bye.
