Unusual use cases, bleeding-edge prototypes, or new approaches to voice, video, or interaction—think spatial audio, metaverse comms, etc.
Real-time systems generate a constant stream of events, yet most AI models are still designed to run offline and produce reports rather than actions. This gap between prediction and decision is where many production systems fail. Developers often have accurate models but no safe, reliable way to use them in live environments. This talk focuses on how to design AI-driven decision systems that act on live events with low latency and high reliability. It explains how models connect to event streams, APIs, and rule engines, and how decisions flow through real software systems. The emphasis is on architecture and system behavior rather than algorithms or math. Using real examples from large-scale retail and marketing platforms, the session highlights common failure modes such as delayed signals, noisy data, unstable actions, and lack of explainability. It then shows practical design patterns, such as guardrails, decision thresholds, rollback strategies, and continuous monitoring, that allow AI systems to operate safely in production. The ideas and patterns discussed translate directly to WebRTC, telephony, and real-time communication systems, where decisions such as routing, prioritization, or optimization must happen quickly and predictably. Attendees will leave with a clear reference architecture and practical guidelines they can apply to their own real-time systems.
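As a rough illustration of the guardrail and decision-threshold patterns named in the abstract above, the sketch below turns a model score into an action only when the signal is fresh and the score is valid. The names `Decision` and `decide`, and the default threshold and age limits, are assumptions made for this example and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str  # "act", "hold", or "fallback"
    reason: str  # short explanation, kept for explainability


def decide(score: float, signal_age_s: float,
           threshold: float = 0.8, max_age_s: float = 2.0) -> Decision:
    """Turn a model score into an action, falling back when a guardrail trips."""
    # Guardrail 1: a delayed signal is not trusted, whatever the score says.
    if signal_age_s > max_age_s:
        return Decision("fallback", f"signal is {signal_age_s:.1f}s old")
    # Guardrail 2: reject scores outside the expected [0, 1] range (NaN included).
    if not 0.0 <= score <= 1.0:
        return Decision("fallback", f"score {score} out of range")
    # Decision threshold: act only when the model is confident enough.
    if score >= threshold:
        return Decision("act", f"score {score:.2f} >= {threshold}")
    return Decision("hold", f"score {score:.2f} < {threshold}")
```

Keeping a reason string on every decision is one simple way to make each action traceable afterwards.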
For years, the story of AI has been about going bigger — bigger models, bigger data, bigger GPUs. But a powerful counter-trend is emerging: going smaller and smarter. Small Language Models (SLMs) and Tiny LMs are reshaping how we think about deploying and using AI. In this talk, we’ll explore how this shift is enabling organizations and individuals to run advanced language capabilities on edge devices, low-cost GPUs, and even mobile hardware. We’ll look at what’s driving this movement — from efficiency breakthroughs like pruning and quantization to new training approaches that let smaller models punch far above their weight. More importantly, we’ll talk about why this matters: making AI more accessible, energy-efficient, privacy-friendly, and deployable in real-world environments where massive compute isn’t an option. We’ll also look ahead at what the next few years might hold for the SLM ecosystem — including personalized on-device models, hybrid AI architectures, and new business opportunities enabled by this “small is powerful” era. Whether you’re an AI practitioner, product leader, or just curious about where the field is heading, this session will give you a clear view of the trend, future, and impact of this major shift in AI.
AI is automating cognitive work faster than most organisations realise, but the gap between what the technology can theoretically do and what it's actually doing in practice is still enormous. This talk looks at where that gap sits today, what current labour market research says about how it's closing, and which parts of cognitive work are genuinely at risk versus which parts remain distinctly human. Grounded in real deployment experience and recent research, not forecasts.
Modern retail platforms operate at a scale where traditional observability (metrics, logs, and alerts) is necessary but no longer sufficient. At Walmart Global Tech, our checkout platform processes millions of real-time transactions daily, where even brief degradation translates directly into customer impact and lost revenue. In such environments, engineers don’t just need visibility; they need reasoning. This talk explores how agentic AI systems can move beyond passive observability to actively reason about failures in large-scale, highly distributed retail systems. Instead of alert floods and manual triage, we apply AI agents that correlate signals across dependencies, deployments, traffic patterns, and historical incidents to infer why a failure is happening and what to do next. Drawing from real production lessons and my open-source work on Dependency-OPS-Sentinel (DOS), an AI-driven DevOps intelligence system adopted by teams and featured in the PySpark community, I will demonstrate how failure reasoning can be modeled as a graph problem, not a dashboard problem. AI agents traverse dependency graphs, evaluate blast radius, detect change-induced instability, and recommend mitigations such as rollback, traffic shaping, or graceful degradation. Attendees will learn: why observability breaks down at extreme scale and high availability targets; how agentic AI differs from rule-based automation in incident response; architectural patterns for AI-assisted failure reasoning using telemetry and dependency graphs; guardrails for building deterministic, trustworthy AI agents in mission-critical systems; and practical lessons from deploying these ideas in global retail environments. This session is aimed at architects, SREs, and platform engineers building real-time, highly available systems and looking to evolve from reactive monitoring to self-reasoning operational intelligence.
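To make the idea of treating failure reasoning as a graph problem a little more concrete, here is a minimal sketch of a blast-radius calculation over a dependency graph. The service names and the `blast_radius` function are hypothetical and are not drawn from Dependency-OPS-Sentinel; a real system would build the graph from telemetry rather than a hard-coded map.

```python
from collections import deque

# Hypothetical dependency map: service -> services it calls.
DEPENDS_ON = {
    "checkout": ["payments", "cart"],
    "cart": ["inventory"],
    "payments": ["fraud"],
    "search": ["inventory"],
}


def blast_radius(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a dependency to its callers.
    callers: dict[str, list[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            callers.setdefault(dep, []).append(service)

    affected: set[str] = set()
    queue = deque([failed])
    while queue:  # breadth-first walk up the caller graph
        current = queue.popleft()
        for caller in callers.get(current, []):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected
```

In this toy graph, a failure in `inventory` reaches `cart`, `search`, and `checkout`, while a failure in `checkout` affects nothing upstream.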
I’ll walk through high-level architectural decisions, implementation, and key challenges of integrating Voice AI into healthcare information systems, where patient data security and compliance are critical. The talk will also cover lessons learned from building a production-oriented voice AI prototype, highlighting how early design choices impact latency, system design, and data handling under real-world constraints.
We put real-time transcription and a language model into a live call's media path to assist agents. This talk covers the architecture — media forking, streaming speech-to-text, model-driven suggestions — the latency war that decides whether any of it is usable, and the honest list of things that didn't work.
2026 is the year AI agents move from demos to production, but most multi-agent systems still fail in the wild. This talk dives deep into the architectural patterns, orchestration frameworks, and engineering principles behind reliable agentic systems. We'll explore tool use, memory, planning loops, and inter-agent communication, alongside the often-overlooked challenges of observability, cost control, and failure recovery. Expect real-world case studies, anti-patterns to avoid, and a practical blueprint for building agents that ship.
As enterprise AI deployments mature from experimental pilots to mission-critical applications, managing the associated cloud infrastructure costs has become a top priority for the C-suite. Explore how organizations are balancing aggressive AI innovation with sustainable cloud spending. Discuss strategies for forecasting compute costs, optimizing cloud resource allocation, and proving the business ROI of their infrastructure investments. Discover how to architect a cloud strategy that scales your AI capabilities without breaking your budget.
LLM progress now depends heavily on one practical issue: training stability at scale. Sparse Mixture-of-Experts (MoE) models are especially sensitive, since routing drift can overload experts, collapse utilization, and stall learning. In this talk, I will share an "anti-loss-spike" playbook from a recent open-weight run: a 400B-parameter MoE with 13B active parameters per token, trained for 17T tokens with an unsmoothed loss curve and zero loss spikes. I will start with the failure pattern we saw (router drift, overload, MaxVio divergence, and plateau), then cover the fixes that restored steady convergence: bounded and momentum expert-bias updates (SMEBU), z-loss for logit stabilization, a precision fallback from MXFP8 to BF16, better balancing objectives, and data/packing choices that reduced step-to-step variance. You will leave with a concrete checklist for stability instrumentation and first-response fixes to keep large open-weight runs on track.
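For readers unfamiliar with two of the fixes listed above, the sketch below shows the commonly used form of z-loss (the squared log-sum-exp of the logits) and a generic bounded, momentum-smoothed expert-bias update. The second function is an illustrative assumption only: the abstract does not spell out the SMEBU formulation, which may differ, and all function names, coefficients, and defaults here are hypothetical.

```python
import math


def z_loss(logits_per_token: list[list[float]], coef: float = 1e-3) -> float:
    """Mean squared log-sum-exp of the logits, scaled by `coef`."""
    total = 0.0
    for logits in logits_per_token:
        m = max(logits)  # subtract the max for numerical stability
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        total += lse ** 2
    return coef * total / len(logits_per_token)


def update_expert_bias(bias, momentum, load, lr=0.01, beta=0.9, bound=1.0):
    """Nudge per-expert routing biases toward balanced load.

    Generic sketch, not the talk's method: the load error is smoothed with
    momentum and the resulting bias is clamped to [-bound, bound].
    """
    target = sum(load) / len(load)  # perfectly balanced load per expert
    new_bias, new_momentum = [], []
    for b, m, l in zip(bias, momentum, load):
        error = target - l  # positive when the expert is under-loaded
        m = beta * m + (1 - beta) * error
        b = max(-bound, min(bound, b + lr * m))
        new_bias.append(b)
        new_momentum.append(m)
    return new_bias, new_momentum
```

The z-loss term penalizes logits that grow large in magnitude, and the clamp keeps any single expert's bias from running away when its load stays unbalanced for many steps.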
Modern telecom and VoIP platforms increasingly demand flexible, real-time billing architectures that can scale independently from the switching core. This talk explores how to integrate CGRateS with FreeSWITCH using the Event Socket Library (ESL) to implement externalized, real-time billing and charging workflows. We will walk through the architecture, event flow, and implementation details required to decouple rating logic from the switching layer while maintaining low latency and high reliability. The session covers how CGRateS processes call events in real time, how ESL enables event-driven communication with FreeSWITCH, and how to build scalable charging pipelines for prepaid and postpaid scenarios. CGRateS is a battle-tested Enterprise Billing Suite with support for various prepaid and postpaid billing modes.
In a world where publicly available data is routinely collected and AI-based tools have increased the capacity of bad actors to launch attacks, this presentation looks at the components Kamailio offers for running it as a security guardian for VoIP and telephony platforms, and at its capabilities for ensuring privacy and confidentiality between end users and operator core networks.
Scaling a telephony platform to handle tens of thousands of concurrent calls is a complex challenge for any deployment. This session shares real-world experiences from ASTPP, an open source billing and routing platform, highlighting strategies for high concurrency, load balancing, monitoring, and maintaining resilience under heavy traffic. Attendees will leave with practical, transferable insights that can be applied to open source telephony platforms.
APIBAN is a free service helping protect your network from unwanted SIP and HTTP traffic. Find out what's new in APIBAN and how we remain FREE (as in beer). Also, what's new with Kamailio?
On the last day of the conference, attendees will go head to head for amazing prizes and bragging rights for the most elegant, fan-favorite demo.
The telecommunications and real-time communications industry operates under strict latency, availability, and reliability requirements. Even small performance issues can lead to dropped calls, poor audio quality, or service outages. This talk focuses on how AI can be practically applied within real-time communication systems (VoIP, messaging, and event-driven platforms) to improve system stability and operational efficiency. It explores real-world approaches to using AI for anomaly detection, traffic prediction, intelligent alerting, and automated scaling without disrupting critical real-time workflows. Key topics include: AI-driven detection of call quality degradation and traffic anomalies; predictive insights for handling peak load and sudden traffic spikes; intelligent observability using metrics, logs, and traces; automating operational decisions while maintaining low latency and high availability; and lessons learned from applying AI in production telecom environments. Attendees will gain actionable insights into how AI can enhance, not replace, core engineering practices in real-time communication platforms.
Roundtable TBD
AI voice is exploding right now and this community has been preparing for this exact moment for twenty years. We already cracked the brutally hard problems: sub-second latency at scale, media that never breaks, conversational state that survives reality, and infrastructure that stays rock-solid under fire. Those solutions are the foundation production AI voice actually needs. In this keynote, Anthony Minessale shares how the builders behind FreeSWITCH and SignalWire are combining that battle-tested platform with modern AI, so we can all ship intelligent systems faster and more reliably. The smartest teams in the room aren’t starting the real-time stack from scratch. They’re standing on mature infrastructure so they can focus their firepower on the intelligence, the agent experiences, and the breakthroughs that will define the winners. This is our category to win. Just like the great platforms before us (Linux, databases, the web), the biggest victories come when builders choose leverage and ship together instead of diluting effort on the plumbing. Everyone in this room is part of building what comes next, and the teams that move fastest on top of proven foundations are the ones who will own the future.
The AI voice agent space is drowning in noise. 10,000+ LinkedIn posts a month, almost none showing architecture. This talk draws a hard line between demo-grade and carrier-grade. I'll walk through SignalWire's three-layer AI Agent control plane -- typed functions, state machines, and prompts -- and why collapsing them into a single system prompt is the root cause of every production failure nobody posts about. I'll introduce Programmatic Governed Inference (PGI), the architectural principle that separates code-driven business decisions from AI-driven conversational decisions, and show a concrete PGI technique: zero-argument tool calls that read from validated stored state instead of letting the LLM re-supply data on every call. Fewer arguments, fewer failures, guaranteed consistency. Everything shown is open source at github.com/signalwire. No demos, no happy-path videos. Just architecture, code, and the failure modes the industry skips.
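A minimal sketch of the zero-argument tool call idea described above: code validates and stores a value once, and the tool the model can call takes no arguments and reads only from that stored state. Everything here (`CallState`, `lookup_balance`, `dispatch`, the 8-digit account rule, and the stubbed balances) is hypothetical and is not the SignalWire SDK API.

```python
from dataclasses import dataclass, field

# Hypothetical backend data, stubbed for the example.
FAKE_BALANCES = {"12345678": "42.50"}


@dataclass
class CallState:
    """Validated data stored by code, never re-supplied by the model."""
    data: dict[str, str] = field(default_factory=dict)

    def store_account_id(self, raw: str) -> None:
        # Validation happens once, in code, when the value is first captured.
        cleaned = raw.strip()
        if not (cleaned.isascii() and cleaned.isdigit() and len(cleaned) == 8):
            raise ValueError("account id must be 8 digits")
        self.data["account_id"] = cleaned


def lookup_balance(state: CallState) -> str:
    """Tool exposed to the model with zero arguments."""
    account_id = state.data.get("account_id")
    if account_id is None:
        return "No verified account on this call yet."
    balance = FAKE_BALANCES.get(account_id, "0.00")
    return f"Balance for account {account_id}: {balance}"


TOOLS = {"lookup_balance": lookup_balance}


def dispatch(tool_name: str, model_args: dict, state: CallState) -> str:
    """Run a tool requested by the model; the model may not pass any data."""
    if model_args:
        raise ValueError(f"{tool_name} takes no arguments")
    return TOOLS[tool_name](state)
```

Because the model only chooses when to call the tool, it cannot mistype or invent the account number on a later turn.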