We put real-time transcription and a language model into a live call's media path to assist agents. This talk covers the architecture (media forking, streaming speech-to-text, model-driven suggestions), the latency war that decides whether any of it is usable, and the honest list of things that didn't work.
