iBridge Digital

Manifesto · why we exist

Voice AI is misnamed.

By Kalpesh Upadhyay · Founder, iBridge Digital · May 2026

The voice part is the cheap part. Speech-to-text and text-to-speech are commodity APIs you rent for fractions of a cent per minute. A college sophomore can wire one up in an afternoon. The hardest engineering problem in this category is not making a computer talk on the phone. The hardest engineering problem is everything that happens between the listening and the speaking — and that is the part most vendors in this space have not built.

I have spent twenty-five years shipping production software for clients across forty industries. I have watched a dozen technology trends arrive, peak, and either deliver or evaporate. AI in healthcare is the first trend in a long time where the gap between the demo and the production deployment is wide enough to swallow entire operations teams. That gap is what this company exists to close.

The wrappers

Most of what is being sold as "healthcare voice AI" right now is a thin layer of prompt engineering wrapped around three commodity APIs: speech recognition, a large language model, and voice synthesis. The vendor adds a UI, picks a vertical, raises a round, and ships. The demos are beautiful. They have to be — the demo is the entire product.

Then the customer turns it on in production and discovers what nobody talks about in the demo. The agent says things it was never taught to say. The agent confidently makes claims about authorization status that do not exist. The agent has no record of why it routed a call the way it did. When the payer changes its IVR menu on a Tuesday morning, the agent loops in the menu tree for forty minutes before someone notices.

The wrappers are not built for production. They are built for the round. There is a difference, and an operations team learns the difference in their first month, often at significant cost.


What is actually hard

The unsexy parts. The parts no founder gets to demo on a podcast. The parts that are five percent of the marketing surface area and ninety-five percent of whether the deployment lives or dies.

A serious voice AI for healthcare needs scope discipline — the agent must refuse to say things it has not been authorized to say, even when the prompt could plausibly produce them. It needs an authority chart — a defined set of who can teach the agent what, with verification before any teaching takes effect. It needs layered knowledge — foundation models for language, authoritative APIs for facts (CMS coverage, payer policy, AAPC codes), and clinic-specific memory that the customer can edit. It needs hash-chained audit — a tamper-evident log of every state change, queryable by the customer's engineering team. It needs PHI-zero architecture — patient data lives only at the boundary, only for the duration of one call, and is destroyed before any persistent system writes the call.
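To make the hash-chained audit requirement concrete: each log entry carries the hash of the entry before it, so altering or deleting any record breaks every hash that follows. The sketch below is a minimal illustration of that idea, not iBridge's implementation; the field names are hypothetical.

```python
import hashlib
import json


def append_event(log, event):
    """Append an event, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {"event": event, "prev_hash": prev_hash}
    # Hash the canonical JSON of the entry body, including the back-link.
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)
    return entry


def verify_chain(log):
    """Recompute every hash; any edit to any earlier entry breaks the chain."""
    prev_hash = "0" * 64
    for entry in log:
        body = {"event": entry["event"], "prev_hash": entry["prev_hash"]}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

A customer's engineering team can run the equivalent of `verify_chain` themselves, which is what makes the log tamper-evident rather than merely append-only.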

None of this shows up in a thirty-second demo video. All of it shows up in production within the first ninety days. This is what we have spent the last several years building.

The thing I am most likely wrong about

I am willing to be wrong about how big this market is. I think the operational layer of US healthcare is one of the largest underserved software markets on earth. I might be overestimating it. Healthcare is famously slow to buy, famously protective of incumbent vendors, famously skeptical of anything that touches clinical workflow. We may discover that the market we want exists at one tenth of the size we believe.

I am willing to be wrong about timing. The voice models keep improving. Within a few years, what is hard today (mid-call disambiguation, payer IVR navigation, graceful handoff) will be solved by the foundation models themselves. The discipline layer we are building — the scope, the authority, the audit — will still matter, but the wrappers will get more convincing in their demos and harder to distinguish from us in the procurement review.

I am willing to be wrong about positioning. We have chosen to call our agents "colleagues" rather than "assistants" or "bots." I think this captures something true about how a serious agent should be onboarded, trained, and supervised. But I might be selling a frame the market is not ready to buy.

What I am not willing to be wrong about is the architecture. PHI never persists to the platform. Reasoning operates on typed proofs, not patient records. Every decision is captured in an audit log the customer can query. These are not features. They are the structural commitments of the company. If we are wrong about anything, we are not wrong about these.
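"Reasoning operates on typed proofs, not patient records" can be illustrated with a small sketch of the two-zone idea: the boundary resolves patient-identifying data into a PHI-free typed value, and only that value crosses into the core. Everything below is hypothetical illustration under my own naming, not iBridge's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EligibilityProof:
    """What the core is allowed to see: facts about coverage,
    nothing that identifies a patient (no name, DOB, member ID)."""
    plan_active: bool
    copay_cents: int
    prior_auth_required: bool


def boundary_resolve(call_phi: dict) -> EligibilityProof:
    """Boundary zone: holds PHI only for the duration of one call.
    In production this would query a payer API with call_phi and then
    discard call_phi when the call ends; here it is stubbed."""
    return EligibilityProof(
        plan_active=True, copay_cents=2500, prior_auth_required=False
    )


def core_decide(proof: EligibilityProof) -> str:
    """Core zone: reasons only over the typed proof."""
    if not proof.plan_active:
        return "escalate_to_human"
    return "start_prior_auth" if proof.prior_auth_required else "schedule"
```

The type system does the enforcement: `core_decide` cannot leak PHI because PHI never appears in any type it accepts.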

What I am asking

If you are evaluating voice AI for a clinic group or an RCM operation, I am asking you to ask any vendor — including us — the same five questions:

  1. Show me the audit log. Not in a deck. Live. Show me what gets captured for one call, and let me query it from your dashboard.
  2. Tell me what your agent will refuse. If the answer is "our agent is helpful and will try to answer anything," you are looking at a wrapper. A serious agent has an explicit, customer-editable scope.
  3. Where does PHI live, and for how long? If patient data sits in a database the vendor can read, you are buying a compliance liability disguised as a product.
  4. What happens when your agent does not know? Does it hallucinate? Does it escalate? Does it route the call cleanly to a human with the full context attached, or does the human start over at hold-position-one?
  5. Run a hundred of my real calls through your system before we contract. Any vendor unwilling to run a benchmark on actual customer calls is selling you their best-case scenario, not yours.

These five questions will eliminate most of your evaluation set in the first conversation. The vendors that remain will be the small set of companies actually building production systems. We are one of them. There are others. The point is to find them.

Why I am writing this

Because the cost of getting voice AI wrong in healthcare is asymmetric. The wrong vendor does not waste your quarter; it wastes your year and burns your operations team's trust in AI for the next several years after that. The right vendor compounds. Picking correctly is one of the more consequential procurement decisions a clinic group or RCM operation will make this decade.

I have a commercial interest in writing this, obviously. We are a vendor. We want your business. But I would write this manifesto whether or not I owned a company, because I have watched too many operations teams get burned by demo-driven sales cycles, and the only durable defense is buyers who know what to ask.

If you are at the start of your evaluation, read on through our security posture. If you have already been burned and are evaluating again, send me an email at business@ibridgedigital.com. I read every one personally, and I will tell you honestly whether what we have built is a fit for what you need. If it is not, I will tell you what is.

— Kalpesh Upadhyay
Founder, iBridge Digital · Mumbai · May 2026

Next

The architecture behind the argument.

Watch a call resolve through our two-zone boundary model. PHI lives at the boundary, briefly, and never persists. The core operates on typed proofs only. Every state change is captured in a hash-chained audit log.

See the architecture