22 September 2026 14:15 - 14:45
The new product stack: Specs, evals & reliable AI agents
Most AI products perform well in demos. Far fewer perform reliably in production.
As organizations race to operationalize generative and agentic AI, product teams are discovering that traditional product development approaches break down when systems become probabilistic, adaptive, and increasingly autonomous.
In this main stage session, Balachander Keelapudi and Madhuri Peri from AWS examine the emerging shift toward spec-driven and evaluation-driven AI product development, and why these approaches are quickly becoming foundational for building trustworthy, scalable AI systems.
Drawing from real-world experience building reliable AI agents and enterprise AI workflows, the session explores:
- why evals are becoming a core product management capability,
- how leading teams are moving beyond prompt engineering toward systematic AI quality frameworks,
- the operational realities of building reliable AI agents,
- and how non-technical product leaders can drive AI initiatives without deep ML expertise.
Attendees will gain a practical understanding of how modern product organizations can move from AI experimentation toward scalable, measurable, production-ready AI systems.
23 September 2026 11:00 - 11:45
The AI evaluation lab for product teams: Building reliable AI agents
AI products cannot be managed with “gut feel” alone.
As AI systems become increasingly agentic, multimodal, and autonomous, product teams need new ways to define quality, measure reliability, and operationalize trust at scale.
In this highly interactive Product Lab workshop, Balachander Keelapudi and Madhuri Peri from AWS lead a hands-on session exploring how teams can use specs, evals, and structured testing frameworks to build more reliable AI products and agentic workflows.
Designed for product, operations, and transformation leaders, including those without deep engineering backgrounds, the workshop focuses on practical implementation rather than theory.
Through live exercises, collaborative problem-solving, workflow examples, and real-world case studies, attendees will explore:
- how to design meaningful AI evaluations,
- the difference between traditional QA and AI eval frameworks,
- how to define success criteria for AI agents,
- how to identify failure modes and operational risks,
- and how product teams can improve AI reliability without slowing innovation.
Participants will leave with practical frameworks, implementation approaches, and actionable methods they can apply immediately within their own AI products and workflows.