AI coding agent at a terminal workstation building multi-system integrations - case study featured image

    AI Agent Case Study: How an AI Coding Agent Built a Voice Intelligence Platform Without Writing a Single Line of Code


    An AI coding agent built a complete multi-system voice intelligence platform — Twilio, Microsoft Teams, Supabase, n8n — without a single line of human-written code. Every workflow, every database schema, every configuration file, every shell script. All authored and deployed by the agent through API calls and file writes. The human directed the architecture, set constraints, and reviewed output. The agent did the building. This is what that looked like.

    The Brief

    A joint venture between AI Governance Group and Black Gazelle needed something specific: an AI system that dials into Microsoft Teams meetings via the public telephone network, listens to the full conversation in real time, and at precisely the right moment, asks one highly targeted question.

    The domain is Action Learning, a methodology where peer enterprise leaders work through complex issues, difficult decisions, and high-stakes workplace challenges together. The AI doesn’t moderate. It doesn’t summarize. It listens deeply to everything being said and contributes a single, well-crafted question that moves the group’s thinking forward.

    The stakes are unusual. If the system asks one bad question out of a thousand, that’s a failure. Every question has to be excellent. Every time. There is no room for the kind of verbose, hedging output most AI systems produce. The question has to be precise, grounded in what was actually said, and delivered without preamble.

    Security and data locality were non-negotiable. Everything runs and is stored in Europe. All data, models, and systems are either European or locally hosted, giving the client a high degree of control despite Fountain City being a US-based company.

    The entire technical build for the voice infrastructure, the system that handles dialing in, transcribing, and speaking back into live calls, was executed by an AI coding agent. Zero lines of human-written code. Zero manual UI configuration. Every n8n workflow node, every SQL migration, every TwiML template, and every shell script was authored and deployed by the agent through API calls and file writes.

    System architecture diagram showing how Twilio, n8n, Supabase, Microsoft Teams, and AWS work together in the AI-built voice intelligence platform

    The Architecture

    Six systems had to work together. Each one handles a different layer of the problem:

    | Layer | Technology | Role |
    |---|---|---|
    | Telephony | Twilio Programmable Voice | PSTN dialing, DTMF tones, audio codec negotiation, call lifecycle |
    | Transcription | Twilio Real-Time Transcription (Deepgram Nova-3) | Speech-to-text, streamed as HTTP webhooks |
    | Orchestration | n8n (self-hosted, Docker) | Workflow logic, webhook endpoints, API coordination |
    | Database | Supabase (managed PostgreSQL) | Transcript storage, session state |
    | Conferencing | Microsoft Teams (Audio Conferencing) | The meeting room the system dials into |
    | Infrastructure | AWS EC2 + Cloudflare + Nginx | Hosting, SSL termination, reverse proxy |
    | AI Agent | Claude Code CLI (Anthropic) | The builder — wrote and deployed everything |

    The core architectural constraint, set by the human director before a single line was written: Twilio owns the ears and mouth, n8n owns the brain. No custom server. No WebSocket handling. No raw audio streaming. No media processing. All audio stays inside Twilio’s infrastructure. The orchestration layer only ever touches text and REST API calls. The AI agent respected this constraint throughout every milestone.

    How the AI Agent Built It

    The build happened across three milestones. Each one was independently testable and had a concrete “it works” checkpoint before moving forward. This is Part 1 of the case study, covering the voice infrastructure. Part 2 will cover the operator dashboard and AI question generation pipeline.

    Milestone 1: Infrastructure Validation

    The directive was simple: prove that Twilio, n8n, and Supabase can all talk to each other. No call logic yet.

    The agent created a Supabase schema by generating SQL and executing it against the Postgres database over IPv6. A transcripts table with columns for session tracking, segment ordering, and confidence scores, plus a composite index for ordered retrieval.
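
    A hypothetical reconstruction of that schema, with illustrative table and column names (the article confirms session tracking, segment ordering, confidence scores, and a composite index, but not the exact DDL the agent wrote):

```python
# Illustrative sketch only: the names below are assumptions,
# not the agent's actual migration.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS transcripts (
    id           BIGSERIAL PRIMARY KEY,
    session_id   TEXT        NOT NULL,   -- session tracking
    segment      INTEGER     NOT NULL,   -- increments after each AI interjection
    sequence_id  INTEGER     NOT NULL,   -- Twilio's per-utterance ordering
    transcript   TEXT        NOT NULL,
    confidence   REAL,                   -- Deepgram confidence score
    created_at   TIMESTAMPTZ DEFAULT now()
);

-- Composite index for ordered retrieval
CREATE INDEX IF NOT EXISTS idx_transcripts_order
    ON transcripts (session_id, segment, sequence_id);
"""
```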

    Then it built three n8n webhook endpoints by composing workflow JSON and deploying it through the n8n REST API. No human touched the n8n UI. The agent created a transcript receiver webhook that parses Twilio’s form-encoded POST data, filters for final transcriptions only, and inserts rows into Supabase. It also built a TwiML server webhook for XML responses and a status callback webhook for call lifecycle events.

    The agent configured n8n credentials for both Twilio and Supabase through API calls, setting up API key pairs, database connection strings, and authentication headers programmatically. Then it validated the entire pipeline end-to-end with curl commands simulating Twilio webhook payloads, confirming data flowed from webhook to n8n to Supabase.

    The first real surprise came here. n8n’s Supabase node uses internal field names that don’t match the documentation: dataToSend, fieldsUi, and fieldId instead of the documented fieldsToSend, fieldValues, and fieldName. The agent discovered this through trial-and-error API calls and inspection of n8n’s node source patterns, then corrected the workflow JSON accordingly. A human clicking through the UI would never have encountered this, but a human reading documentation would have been equally misled.

    Milestone 2: Dial-In and Transcription

    The directive: dial into a real Teams meeting, transcribe the audio, store it in Supabase.

    The agent built a call initiator workflow using the Twilio REST API to place an outbound PSTN call to a Teams dial-in number. It engineered the TwiML response to handle the Teams auto-attendant sequence: an initial pause for Teams to answer, a DTMF digit sequence with calibrated timing to enter the conference ID, real-time transcription activation with Deepgram Nova-3, and a long pause to hold the call open.
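
    That TwiML sequence can be sketched as a small builder. The verb names (Pause, Play, Start, Transcription) are real TwiML; the pause lengths, wait characters, and attribute values here are illustrative, not the calibrated production values:

```python
def build_dial_in_twiml(conference_id: str, transcript_webhook: str) -> str:
    """Sketch of the Teams dial-in TwiML: wait for the auto-attendant,
    send the conference ID as DTMF, start transcription, hold the call."""
    # Note: a webhook URL containing '&' must be XML-entity-encoded first.
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Pause length="10"/>
  <Play digits="ww{conference_id}#"/>
  <Start>
    <Transcription statusCallbackUrl="{transcript_webhook}"/>
  </Start>
  <Pause length="3600"/>
</Response>"""
```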

    Live testing uncovered several things documentation doesn’t tell you. Transcript text arrives inside a JSON string in TranscriptionData (not TranscriptionText), and the sequence field is SequenceId, not SequenceNumber. The agent updated the workflow parsing logic in-place via API calls.

    It also caught that ampersand characters in webhook URLs within TwiML must be encoded as &amp;, and that Twilio silently rejects malformed XML with a generic error code (12100). Diagnosed from Twilio’s error logs, fixed without human intervention.

    DTMF timing was the last piece. Teams auto-attendants vary by tenant, so the agent ran 3 to 5 test calls, adjusting pause durations between each attempt, until the digits were accepted reliably. Total cost for this calibration: about $0.50 in Twilio call charges.
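
    In a digits string, Twilio treats 'w' as a half-second wait, so the calibration amounts to varying the leading waits between attempts. A trivial sketch of one attempt's digit string:

```python
def dtmf_sequence(conference_id: str, wait_halves: int) -> str:
    """Build one calibration attempt's DTMF string.
    Each 'w' is a 0.5-second pause before the digits are sent."""
    return "w" * wait_halves + conference_id + "#"
```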

    The result: a real Teams meeting was dialed into, and live utterances from meeting participants appeared in the Supabase transcripts table in real time, correctly ordered by session, segment, and sequence number.

    Milestone 3: Voice-Out (AI Speech Injection)

    The directive: make the system speak into the live call, then restart transcription so nothing is lost after the AI talks.

    The agent built a voice-out trigger workflow exposing a webhook that accepts a JSON payload with the text to speak, the session ID, and the Twilio CallSid.

    This milestone produced the most significant discovery. Twilio’s <Start><Transcription> runs as a sidecar process. When TwiML is replaced via the Call Update API, the old transcription session does not automatically stop. Without explicit cleanup, transcription sessions stack up silently, producing duplicate or ghost utterances. The agent solved this by prepending <Stop><Transcription> to every voice-out TwiML payload, using unique session names tied to the segment counter.
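
    A hypothetical reconstruction of that voice-out payload (the Stop/Say/Start structure matches the article; the session-name scheme and pause length are assumptions):

```python
from xml.sax.saxutils import escape

def build_voice_out_twiml(text: str, segment: int, transcript_webhook: str) -> str:
    """Stop the previous transcription sidecar, speak the AI's text,
    then restart transcription under a new segment-tied session name."""
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Stop><Transcription name="session-seg-{segment}"/></Stop>
  <Say>{escape(text)}</Say>
  <Start>
    <Transcription name="session-seg-{segment + 1}"
                   statusCallbackUrl="{transcript_webhook}"/>
  </Start>
  <Pause length="3600"/>
</Response>"""
```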

    It implemented segment tracking using n8n’s workflow static data, which persists across webhook executions. Each voice-out increments the segment number, so transcripts before and after each AI interjection are correctly sequenced.

    The agent also added XML escaping for dynamic text (AI-generated responses can contain ampersands, angle brackets, quotes, and apostrophes that break TwiML) and enforced the TwiML 4,000-character size limit on the Call Update API, capping AI response text at 3,500 characters to leave room for XML overhead.
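
    Those two safeguards combine into a few lines. Truncating before escaping matters, because cutting after escaping could split an XML entity in half:

```python
from xml.sax.saxutils import escape

TEXT_BUDGET = 3500  # cap from the article, under TwiML's 4,000-char limit

def sanitize_for_twiml(text: str) -> str:
    """Truncate first, then escape &, <, >, quotes, and apostrophes."""
    clipped = text[:TEXT_BUDGET]
    return escape(clipped, {'"': "&quot;", "'": "&apos;"})
```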

    The result: the system successfully spoke AI-generated text into a live Teams call, then seamlessly resumed transcription. Multiple speak-listen cycles were tested, with all segments correctly tracked in the database.

    Testing interface for the AI voice intelligence platform showing how operators trigger AI-generated phrases during live calls

    This screen shows a test view of the tool, displaying multiple candidate questions and which one the AI is about to speak, so we can verify the system is asking the most effective question into the conversation.

    The AI Agent’s Toolkit

    The agent operated entirely through a terminal session on the EC2 instance. Here is what it actually used:

    For n8n workflow authoring, it used the n8n REST API to create and update complete workflow definitions as JSON, including node positions, connections, parameters, and credential references. Every workflow was built API-first. The n8n web UI was only used by the human for visual verification after the fact. It also used the credential API to set up Twilio and Supabase authentication, and the workflow activation API to toggle workflows when forcing webhook re-registration after updates.
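
    The activation toggling can be sketched as a small helper. The endpoint paths follow n8n's public REST API; the HTTP transport (and the X-N8N-API-KEY header) is left to a caller-supplied send function, so this is a sketch of the sequence rather than a full client:

```python
def redeploy_workflow(wf_id: str, workflow_json: dict, send) -> list:
    """Force n8n to re-register webhooks after an API update:
    deactivate, push the updated workflow JSON, reactivate."""
    calls = [
        ("POST", f"/workflows/{wf_id}/deactivate", None),  # detach live webhooks
        ("PUT",  f"/workflows/{wf_id}", workflow_json),    # push updated JSON
        ("POST", f"/workflows/{wf_id}/activate", None),    # re-register webhooks
    ]
    for method, path, body in calls:
        send(method, path, body)
    return calls
```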

    For database work, it ran psql over IPv6 directly against Supabase’s Postgres instance for schema creation, data inspection, and debugging. It wrote migration files to the repo for version control.

    For Twilio integration, it used the Twilio REST API via curl to initiate test calls, update live calls with new TwiML, and query call status. It inspected Twilio error logs to diagnose webhook failures and TwiML parse errors.

    For infrastructure, it managed the n8n Docker container, wrote and validated Nginx reverse proxy configs and SSL certificate setup, and created Cloudflare WAF rules to bypass bot challenges on webhook paths.

    Everything else: file writes for documentation, shell scripts, environment files, and SQL migrations. Git commits at each logical milestone.

    One of several AI-generated workflows in n8n. Keeping the logic in n8n means a human can review the system easily and understand at a glance what the AI is building, even if that person is not technical. Code nodes (shown with {}) handle any logic parsing.

    AI agent orchestrating multiple interconnected systems through holographic interfaces

    What the Human Actually Did

    Before the build started, the human provisioned accounts on Twilio, Supabase, Microsoft Azure, and Cloudflare. Purchased a Twilio phone number. Set up an Azure app registration and Teams bot for audio conferencing. Provisioned an AWS EC2 instance, installed Docker, deployed n8n via Docker Compose with a public URL, and populated a .env file with API keys, tokens, and connection strings.

    During the build, the human directed each milestone (“Start M2. Here’s the Teams dial-in number and conference ID.”), set architectural constraints (“No WebSockets. No audio processing. Twilio handles all media.”), reviewed the agent’s work by checking n8n workflows in the UI and querying Supabase to verify data, course-corrected when needed (“The DTMF timing is too fast, add more pause.”), and approved destructive actions when the agent asked for confirmation before overwriting workflows or restarting services.

    What the human never did: write or edit any code. Configure any n8n workflow node through the UI. Manually set up any database table or index. Debug any API integration by hand. Write any Nginx config or Cloudflare rules.

    Human director reviewing AI agent work in a modern office - the director not builder paradigm

    This is the pattern we call “human as director, AI as builder.” The human’s expertise was essential for architectural decisions, constraint-setting, and knowing when the agent’s output was correct. The mechanical work of translating those decisions into running code was entirely delegated. It’s a different skill set: systems thinking and quality judgment, not implementation detail.

    The Gotchas: What the Agent Had to Figure Out

    Building integrations between four live systems produces edge cases that no documentation covers completely. This table is the unedited record of what the AI agent encountered and resolved:

    | Problem | How the Agent Solved It |
    |---|---|
    | n8n Supabase node uses undocumented internal field names | Tried the documented names, got errors, inspected n8n source patterns, iterated until the correct names were found |
    | Twilio sends Final as string "true", not boolean | Discovered through live webhook inspection, updated the n8n IF node comparison |
    | Transcript text nested inside a JSON string field | Parsed TranscriptionData JSON to extract .transcript and .confidence |
    | TwiML & in URLs causes silent Twilio rejection | Diagnosed from Twilio error code 12100, applied XML entity encoding |
    | Two webhooks in one n8n workflow require a specific response mode | Both must use responseMode: "responseNode", not the default lastNode |
    | n8n API workflow updates don’t take effect on active webhooks | Learned to deactivate, update, then reactivate the workflow to force re-registration |
    | Transcription sessions persist as sidecar processes after TwiML replacement | Added explicit <Stop><Transcription> to every voice-out payload with unique session names |
    | Teams auto-attendant timing varies by tenant | Ran iterative test calls, adjusting DTMF pause characters until digits were accepted reliably |

    Each of these would have cost a human developer minutes to hours of debugging. The AI agent encountered and resolved them within its normal workflow, typically within 2 to 3 retry cycles.

    Anyone can claim an agent built something. The gotchas prove it encountered real-world messiness and worked through it.

    AI agent debugging and iterating through integration challenges - discovering solutions through trial and error

    Cost of the Build

    Under $5 in total telephony costs for the entire build-and-test cycle.

    | Item | Cost |
    |---|---|
    | Twilio test calls (M1 through M3, approximately 15 calls) | ~$3.00 |
    | Twilio phone number (monthly) | $1.15 |
    | Supabase (free tier) | $0.00 |
    | n8n (self-hosted) | $0.00 |
    | AWS EC2 (existing instance) | Marginal |
    | Claude Code CLI usage | Per-token API costs |

    The primary cost was AI agent API usage. The infrastructure and telephony costs were negligible.

    What This Proves

    AI agents can handle multi-system integration plumbing. This was not a single-API wrapper or a CRUD app. The agent coordinated four live external services, each with its own authentication model, data format, and behavioral quirks. It managed form-encoded webhooks, XML generation, JSON APIs, SQL DDL, Docker containers, and Nginx configs in a single continuous workflow.

    No-code platforms become more powerful with AI agents. n8n is a visual workflow builder, but the AI agent never used the visual interface. It authored workflows as JSON and deployed them via REST API. This is faster than clicking through a UI, produces version-controllable artifacts, and allows the agent to iterate programmatically when something doesn’t work. The “no-code” platform became a headless orchestration engine.

    The human role shifts from builder to director. The human’s expertise was essential for architectural decisions, constraint-setting, and recognizing when the agent’s output was correct. But the mechanical work of turning those decisions into running code was entirely delegated. This is a different skill set: systems thinking and quality judgment, not implementation detail.

    Debugging is where AI agents earn their keep. Half the work in any integration project is diagnosing why System A’s output doesn’t match System B’s expected input. The agent’s ability to make an API call, inspect the error, form a hypothesis, modify the code, and retry in a tight loop — without fatigue or frustration — is where the productivity gain is largest. The gotcha table above is the evidence: eight real problems, each solved through methodical iteration that would have cost a human developer significant debugging time.

    Human silhouette before a vast interconnected AI network - the transformation from traditional to agentic development

    Documentation becomes a natural byproduct. Because the agent operates through explicit tool calls and file writes, every action is logged. The project ended up with comprehensive documentation (milestone specs, a rebuild-from-zero guide, exported workflow JSON, SQL migrations) not because someone sat down to write docs, but because the agent’s working method inherently produces artifacts.

    The Hardest Parts (And They Weren’t Code)

    The most time-consuming part of this project was, ironically, not the coding. It was getting permissions, credentials, DNS changes, and other access systems granted and permitted. At one point we waited about a week for Microsoft to provision the agent with the right credentials to act on calls as expected. Once all the tools and system access points were in place, things moved at breakneck speed again.

    The other part that required significant iteration was the quality of the questions themselves. Getting an AI to ask precisely the right question, without going into extended preamble before and after it, is hard. Conventional wisdom says that to increase an AI’s output quality, you should make it think out loud. That’s what all agentic systems do when they have “thinking mode” enabled: in the background, they reason through the problem step by step, and this greatly increases reliability and correctness. The challenge here was getting to a high-quality result with no space at all for that kind of self-dialogue.

    This was solved during an initial proof-of-concept phase where extensive testing was done across a hundred-plus case study conversations using synthetic data. All of it was human-validated for quality to train and refine the system until it consistently produced excellent questions.

    This pattern was used as a loop to bulk-test large amounts of synthetic data until we reached the quality needed to ensure consistent, excellent AI questions.

    What the Client Says

    Ghislaine Caulat from the client team: “I really want to thank you, Sebastian, for the way you engage with us, for your transparency and for your precision. You have demonstrated a deep and effective understanding of our needs and Action Learning principles. I appreciate this a lot!”

    How to Reproduce This Pattern

    The prerequisites: an AI coding agent with shell access running on the target server, all external accounts provisioned with API keys stored in an environment file, a clear architecture document that becomes the agent’s persistent reference, and a milestone-based build plan where each phase is independently testable.

    The workflow loop: the human directs (“Here’s the architecture. Here are the credentials. Build Milestone 1.”), the agent builds (creates schemas, configures credentials, builds workflows, tests end-to-end), the human reviews and provides corrections, the agent iterates and moves to the next milestone. Repeat until the system is complete.

    Two things make this pattern work reliably. First, constraint-driven architecture. The more clearly you define what the system should not do, the fewer wrong turns the agent takes. Second, API-first tooling. Every system in this stack (Twilio, n8n, Supabase, Cloudflare) exposes a REST API. If a component can only be configured through a GUI, the agent cannot touch it.

    If you want to learn how to work this way, we run agentic coding training for development teams and agencies.

    What We Learned Building This

    This system is the first step in building an agentic system that can sit with leaders and provide deep insight into important meetings and discussions, performing at the level of thought that difficult decisions and high-stakes situations demand.

    The project was fun to build. With the agentic coding approach, it came together at a fraction of the time, cost, and complexity of a traditional build. The end result is a better-designed and better-built system than we could have produced by hand-coding it the traditional way.

    There is still serious craft required to work this way, but the effort goes into solution design: collaborating with AI to refine the solution until it’s better than either could produce alone.

    As Martin Fowler’s team at ThoughtWorks notes, “CLI coding agents represent a fundamental shift from AI as a writing assistant to AI as a development partner.” This case study is what that shift looks like in production, with real systems, real clients, and real money flowing through the wires.

    We build managed autonomous AI agents for companies that need this kind of capability. We also run agents in our own operations — an autonomous SEO research agent and an autonomous content pipeline that produced this article. The gap between conversational AI and agentic AI is where the real value lives, and it’s where we spend our time.

    What’s Next

    With the MVP complete, the application will now be tested in real-life calls. Based on feedback, we will spec and document the subsequent development phase. That phase will focus primarily on quality improvements driven by customer feedback, but also on technical aspects such as scaling the system to handle multiple parallel sessions.

    Frequently Asked Questions

    Can AI coding agents really build production integrations without human code?

    Yes. This case study documents a complete multi-system integration (Twilio, Microsoft Teams, Supabase, n8n) where zero lines of code were written by a human. The AI agent authored every workflow, every database schema, every configuration file, and every shell script. The human role was provisioning accounts, directing milestones, and reviewing output.

    What kinds of projects are best suited for AI agent development?

    Projects with API-first tooling, clear architectural constraints, and milestone-based build plans. The systems being integrated need to expose REST APIs. If the only way to configure a tool is through a GUI, the agent cannot work with it. Multi-system integrations, workflow automation, and data pipeline projects are strong candidates.

    How much does it cost to have an AI agent build a multi-system integration?

    In this case, the infrastructure and telephony costs for the entire build-and-test cycle were under $5. The primary cost is the AI agent’s API usage (per-token pricing) and the human director’s time for supervision and review. Traditional development of equivalent scope would typically involve multiple developer-weeks of effort.

    What role does the human play when an AI agent is doing the building?

    The human is the director, not the builder. That means three things: provisioning accounts and API keys before the build, directing the approach at each milestone, and reviewing what the agent produces. The human’s expertise in architecture, constraint-setting, and quality judgment remains essential. The mechanical implementation is delegated.

    What happens when the AI agent encounters a bug it can’t solve?

    In our experience, the agent resolves most integration bugs within 2 to 3 retry cycles by inspecting error logs, forming a hypothesis, modifying the code, and retesting. For problems requiring information the agent doesn’t have (like tenant-specific Teams auto-attendant timing), it runs iterative tests to converge on the solution empirically. The human steps in with domain knowledge or redirects the approach when the agent is stuck.

    How does this compare to traditional development approaches?

    Traditional development would have a developer manually reading API documentation, writing code, configuring UIs, debugging integration issues, and documenting the work separately. With the agent-based approach, all of those steps happen in a continuous automated loop. The agent also produces documentation as a natural byproduct of its working method, and it can iterate on bugs without fatigue. The trade-off is that the human needs to know enough about the systems to direct well and recognize correct output.