Voice AI

Dukan

Voice-AI Ordering Platform for India's Kirana Stores

Multilingual voice agent on Bolna AI that takes phone orders for neighbourhood supermarket chains in Hindi, English, and Marathi. Customers request a callback; the agent dials, identifies them by name from a pre-seeded record, takes the order with code-switching support, validates the delivery address, and writes the order to the merchant’s realtime dashboard. Disputes, refunds, and bulk orders escalate to human operators with full conversation context. Built end-to-end solo in ~14 hours as a take-home for Bolna AI’s Full-Stack Engineer role.


Provider-agnostic voice abstraction (1 env var swap)

Live tool-drift diff with deployed Bolna agent

End-to-end shipped solo in 14 hours

Key Features

Provider-agnostic voice layer (`VoiceProvider` interface) with a live Bolna adapter and stubbed adapters for Vapi and a custom ElevenLabs + Deepgram stack, swappable via one env var

5 webhook-driven tools (lookup_customer, catalog_search, validate_address, place_order, escalate_to_human) with shared-secret header auth replacing brittle IP allowlists

Live drift detection: /agent page fetches the deployed agent from Bolna's API on each render and diffs every tool against local source-of-truth (synced / drift / missing badges)

Multilingual catalog search with Devanagari + Marathi aliases, and a tokenizer that drops Hindi/Marathi unit stop-words (किलो, ग्राम, एक, दो)

Pre-call customer upsert so the agent greets by name from word one (no "who am I speaking to" cold-start)

Realtime operator dashboard with live call indicator, order pipeline (pending → confirmed → dispatched → delivered), and escalation queue with resolution notes

Server-side SKU + price re-verification on place_order — LLM never sets the order total (security boundary)

Transcript polymorphism handling (string OR array OR null) normalised at the boundary, defensive against Bolna webhook shape variance

Reproducible Bolna setup via scripts: create-bolna-agent, patch-tool-headers, sync-system-prompt, verify-bolna-tools, diagnose-bolna-tools — no dashboard clicking
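The transcript bullet above describes normalising a field that can arrive as a string, an array, or null. A minimal sketch of that boundary function follows. The `role`/`content` fields on array entries and the name `normaliseTranscript` are assumptions for illustration, not Bolna's documented payload shape.

```typescript
// Hypothetical shape of one transcript turn; Bolna's actual field names may differ.
type TranscriptTurn = { role?: string; content?: string } | string;

// Raw transcript as it may arrive on the webhook: a string, an array of turns, or nothing.
type RawTranscript = string | TranscriptTurn[] | null | undefined;

// Collapse every accepted shape into one string at the boundary,
// so downstream code never branches on the payload shape.
function normaliseTranscript(raw: RawTranscript): string {
  if (raw == null) return "";
  if (typeof raw === "string") return raw.trim();
  if (Array.isArray(raw)) {
    return raw
      .map((turn) => {
        if (typeof turn === "string") return turn.trim();
        const text = (turn.content ?? "").trim();
        // Prefix the speaker when one is given and the turn has text.
        return turn.role && text ? `${turn.role}: ${text}` : text;
      })
      .filter((line) => line.length > 0)
      .join("\n");
  }
  // Any unexpected shape is treated as empty instead of crashing the webhook.
  return "";
}
```

Keeping this at the edge means the order and escalation code only ever sees `string`.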

Tech Stack

Next.js 16 (App Router) · React 19 Server Components · TypeScript (strict) · Supabase (Postgres + Realtime + Auth) · Bolna AI · Deepgram (STT) · Bun · Vercel · Tailwind v4 · Zod

Walkthrough

Merchant dashboard walkthrough — live call indicator, order pipeline, escalation queue, and the /agent diff view (5 min)

Call recording

Uncut audio of a real customer call with Hindi/English code-switching; the agent confirms the order and places it

The Problem

Kiranas live on the phone — but phones don't scale

India's 13 million neighbourhood supermarkets process roughly ₹40 lakh crore a year, and most still take orders over the phone. A typical three-outlet kirana spends ₹95k a month on phone operators alone. At peak hours, 15–25% of incoming calls drop: the customer hangs up and orders from Zepto or Blinkit instead.

The work is also brutally repetitive: 60–80% of calls are repeat customers with similar baskets week after week. That's the gap voice AI should close — not the work humans do well (judgment, escalation), but the work humans do badly (taking the same order from the same person for the fortieth time).

Outcome metric: call-to-order conversion at peak hour. Today ~60%. Target with the agent: 90%+.

Architecture

Provider-agnostic voice layer

The whole voice backend sits behind a single `VoiceProvider` interface — `createAgent`, `dispatchOutbound`, `parseWebhook`, `verifyWebhookSignature`, `getCallRecording`. Bolna is the live adapter; Vapi and a custom ElevenLabs + Deepgram + LLM stack are stubbed. One env var (`VOICE_PROVIDER`) swaps the entire backend.
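A sketch of what such an interface and the env-var switch could look like. Only the five method names and `VOICE_PROVIDER` come from the description above; every parameter and return type, the `call_id` webhook field, the registry keys (`bolna`, `vapi`, `custom`) and the stub adapter are hypothetical.

```typescript
// Method names are from the write-up; every parameter and return type is an assumption.
interface OutboundCall {
  phone: string;
  customerName?: string;
}

interface WebhookEvent {
  callId: string;
  status: string;
}

interface VoiceProvider {
  readonly name: string;
  createAgent(config: Record<string, unknown>): Promise<{ agentId: string }>;
  dispatchOutbound(call: OutboundCall): Promise<{ callId: string }>;
  parseWebhook(body: unknown): WebhookEvent;
  verifyWebhookSignature(headers: Headers, rawBody: string): boolean;
  getCallRecording(callId: string): Promise<string | null>;
}

// Placeholder adapter: a real Bolna adapter would call Bolna's HTTP API here.
function makeStubProvider(name: string): VoiceProvider {
  return {
    name,
    async createAgent() {
      return { agentId: `${name}-agent` };
    },
    async dispatchOutbound(call) {
      return { callId: `${name}:${call.phone}` };
    },
    parseWebhook(body) {
      // `call_id` / `status` are hypothetical field names.
      const b = (body ?? {}) as Record<string, unknown>;
      return { callId: String(b["call_id"] ?? ""), status: String(b["status"] ?? "unknown") };
    },
    verifyWebhookSignature() {
      return false; // stubs reject every webhook
    },
    async getCallRecording() {
      return null;
    },
  };
}

// Registry of adapters; keys are the hypothetical values VOICE_PROVIDER may take.
const PROVIDERS = new Map<string, () => VoiceProvider>([
  ["bolna", () => makeStubProvider("bolna")],
  ["vapi", () => makeStubProvider("vapi")],
  ["custom", () => makeStubProvider("custom")], // ElevenLabs + Deepgram + LLM
]);

// The single switch: VOICE_PROVIDER picks the backend, defaulting to Bolna.
function getVoiceProvider(env: Record<string, string | undefined>): VoiceProvider {
  const key = env["VOICE_PROVIDER"] ?? "bolna";
  const factory = PROVIDERS.get(key);
  if (!factory) throw new Error(`Unknown VOICE_PROVIDER: ${key}`);
  return factory();
}
```

In the app this would be called as `getVoiceProvider(process.env)`; passing the env in explicitly keeps the selector testable.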

On top of that, the /agent page does something most voice-AI projects skip: it fetches the deployed agent from Bolna's API on every render and diffs each tool against the local source-of-truth catalogue. Drift, missing tools, mismatched required-params — all surface as per-tool badges. The page becomes a live audit, not a static README.

Local files are the source of truth; Bolna is the deployment target. The /agent page proves the two are in sync.
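The per-tool diff behind the badges can be sketched as a pure function. This assumes each tool, local or deployed, has already been reduced to a name plus its required-parameter list; the `ToolSpec` type and `diffTools` name are hypothetical, and a fuller version would also compare descriptions, URLs and headers. Tools that exist only on the deployed side are not reported here.

```typescript
// Minimal view of a tool definition; real Bolna tool objects carry more fields.
interface ToolSpec {
  name: string;
  required: string[];
}

type DriftStatus = "synced" | "drift" | "missing";

interface ToolDiff {
  name: string;
  status: DriftStatus;
  detail?: string;
}

// Compare each local (source-of-truth) tool against the deployed agent's tools.
function diffTools(local: ToolSpec[], deployed: ToolSpec[]): ToolDiff[] {
  const deployedByName = new Map(deployed.map((t) => [t.name, t] as const));
  return local.map((tool): ToolDiff => {
    const remote = deployedByName.get(tool.name);
    if (!remote) return { name: tool.name, status: "missing" };
    // Order of required params is irrelevant, so compare sorted lists.
    const want = [...tool.required].sort().join(",");
    const have = [...remote.required].sort().join(",");
    return want === have
      ? { name: tool.name, status: "synced" }
      : { name: tool.name, status: "drift", detail: `required: local [${want}] vs deployed [${have}]` };
  });
}
```

Because the function is pure, the page can fetch the agent on each render and hand both lists to it without any caching logic.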

Engineering Decisions

The trade-offs that mattered

Tool auth: started with Bolna's documented IP allowlist (13.203.39.153). The webhook worked, but every tool call returned 401 — because Bolna's tool runtime is a worker pool with shifting IPs, not a stable origin. Swapped to a shared-secret header (`x-dukan-tool-secret`) injected per-tool via Bolna's `tools_params.headers`. IP allowlist stays only on the webhook (stable origin there).
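A minimal sketch of the header check on a tool endpoint. The header name `x-dukan-tool-secret` is from the text; the function names and passing the expected secret as an argument (in the app it would come from an env var) are assumptions. The comparison uses Node's `timingSafeEqual` so response timing does not leak how much of the secret matched.

```typescript
import { timingSafeEqual } from "node:crypto";

const TOOL_SECRET_HEADER = "x-dukan-tool-secret";

// Constant-time check of the shared secret sent by Bolna's tool runtime.
function isAuthorisedToolCall(headers: Headers, expectedSecret: string): boolean {
  const provided = headers.get(TOOL_SECRET_HEADER); // Headers lookups are case-insensitive
  if (!provided || !expectedSecret) return false;
  const a = new TextEncoder().encode(provided);
  const b = new TextEncoder().encode(expectedSecret);
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}

// Route-handler guard using the standard Request/Response types:
// returns a 401 response to send back, or null when the call may proceed.
function guardToolRoute(req: Request, expectedSecret: string): Response | null {
  return isAuthorisedToolCall(req.headers, expectedSecret)
    ? null
    : new Response("Unauthorized", { status: 401 });
}
```

Unlike an IP allowlist, this check does not care which worker in Bolna's pool makes the request.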

Multilingual catalog: Deepgram transcribes Hindi speech in Devanagari. The agent then passes "आलू" or "टमाटर" to `catalog_search`. Our seed only had romanised aliases, so every Hindi query returned zero results. Fix: tokenize the query, drop Hindi/Marathi unit stop-words (`किलो`, `ग्राम`, `एक`, `दो`), search aliases + name_default + `name_localized->>hi` + `name_localized->>en` per token. Then a one-shot migration to enrich aliases with Devanagari and Marathi variants (`batata`, `kanda`, `sakhar`).
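The tokenize-then-match step can be sketched in memory. The field names `aliases`, `name_default` and `name_localized` (with `hi`/`en` keys) follow the columns named above; the `Product` type, the function names, the substring matching rule and the sample SKUs are assumptions. In the app the same per-token match runs against Postgres, including the `->>` JSON lookups.

```typescript
// Unit/number stop-words listed in the write-up; a production list would be longer.
const STOP_WORDS = new Set(["किलो", "ग्राम", "एक", "दो"]);

interface Product {
  sku: string;
  name_default: string;
  name_localized: { hi?: string; en?: string };
  aliases: string[];
}

// Split on whitespace/punctuation and drop stop-words.
function tokenize(query: string): string[] {
  return query
    .toLowerCase()
    .split(/[\s,.!?]+/u)
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t));
}

// A product matches if any token appears in any searchable field.
function catalogSearch(query: string, products: Product[]): Product[] {
  const tokens = tokenize(query);
  if (tokens.length === 0) return [];
  return products.filter((p) => {
    const fields = [p.name_default, p.name_localized.hi ?? "", p.name_localized.en ?? "", ...p.aliases]
      .map((f) => f.toLowerCase())
      .filter((f) => f.length > 0);
    return tokens.some((tok) => fields.some((f) => f.includes(tok)));
  });
}
```

For example, "एक किलो आलू" tokenizes to just "आलू", which then matches a product whose `name_localized.hi` is "आलू".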

Pre-call upsert: when a customer submits the callback form, we save them (and their address) to the DB *before* dispatching the call. So when the agent invokes `lookup_customer` during Bolna's 2-second welcome message, it returns `found: true` and the agent greets by first name from word one. No cold-start "who am I speaking to."
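The important property is ordering: the upsert must complete before the call is dispatched. A sketch with an in-memory store standing in for the Supabase tables; `CustomerStore`, `requestCallback` and the injected `dispatchOutbound` callback are hypothetical names used only to show the sequence.

```typescript
interface CallbackForm {
  name: string;
  phone: string;
  address: string;
}

interface Customer {
  id: string;
  name: string;
  phone: string;
  addresses: string[];
}

// In-memory stand-in for the customers/addresses tables (the app itself uses Supabase).
class CustomerStore {
  private byPhone = new Map<string, Customer>();

  async upsert(form: CallbackForm): Promise<Customer> {
    const existing = this.byPhone.get(form.phone);
    const addresses = existing ? existing.addresses : [];
    const customer: Customer = {
      id: existing?.id ?? `cust_${this.byPhone.size + 1}`,
      name: form.name,
      phone: form.phone,
      // Keep the address list de-duplicated.
      addresses: addresses.includes(form.address) ? addresses : [...addresses, form.address],
    };
    this.byPhone.set(form.phone, customer);
    return customer;
  }

  // Roughly what the lookup_customer tool answers during the welcome message.
  async lookup(phone: string): Promise<{ found: boolean; firstName?: string }> {
    const c = this.byPhone.get(phone);
    return c ? { found: true, firstName: c.name.split(" ")[0] } : { found: false };
  }
}

// The upsert is awaited *before* the call is dispatched,
// so lookup_customer already finds the record when the agent first speaks.
async function requestCallback(
  form: CallbackForm,
  store: CustomerStore,
  dispatchOutbound: (phone: string) => Promise<string>,
): Promise<string> {
  await store.upsert(form);
  return dispatchOutbound(form.phone);
}
```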

Server-side price verification: `place_order` re-verifies SKUs and recalculates total_paise from the catalogue, so the LLM never sets the order total. That single boundary means neither a hallucinated price nor a customer talking the agent into a discount can change what gets charged.
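A sketch of that boundary: the model's payload is reduced to SKU plus quantity, and every price is read from the catalogue. `total_paise` is from the text; `priceOrder`, `price_paise`, `in_stock` and the other field names are assumptions.

```typescript
// What the model may send to place_order: SKUs and quantities only, never prices or totals.
interface OrderLineInput {
  sku: string;
  qty: number;
}

// Hypothetical catalogue row.
interface CatalogueEntry {
  sku: string;
  price_paise: number;
  in_stock: boolean;
}

interface PricedLine {
  sku: string;
  qty: number;
  unit_paise: number;
  line_paise: number;
}

// Re-verify every SKU against the catalogue and compute total_paise server-side.
// Any price or total the model tucks into the payload is never read.
function priceOrder(
  lines: OrderLineInput[],
  catalogue: Map<string, CatalogueEntry>,
): { lines: PricedLine[]; total_paise: number } {
  if (lines.length === 0) throw new Error("empty order");
  const priced = lines.map((line): PricedLine => {
    const entry = catalogue.get(line.sku);
    if (!entry) throw new Error(`unknown SKU: ${line.sku}`);
    if (!entry.in_stock) throw new Error(`out of stock: ${line.sku}`);
    if (!Number.isFinite(line.qty) || line.qty <= 0) {
      throw new Error(`invalid quantity for ${line.sku}`);
    }
    // Integer paise avoid floating-point rupee maths; round for loose-weight quantities.
    const line_paise = Math.round(entry.price_paise * line.qty);
    return { sku: line.sku, qty: line.qty, unit_paise: entry.price_paise, line_paise };
  });
  const total_paise = priced.reduce((sum, l) => sum + l.line_paise, 0);
  return { lines: priced, total_paise };
}
```

Rejecting unknown or out-of-stock SKUs here also gives the agent a concrete error to read back to the caller instead of silently placing a wrong order.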

The Surface

Operator-grade UI, not a chat demo

The merchant dashboard is the part most voice-AI submissions skip. It's the difference between "agent works" and "this is a product." Realtime call indicator pulses red when a call connects. Order cards land on the dashboard *before* the call ends (Supabase Realtime). The escalation queue has a state machine — queued → taken → resolved / abandoned — with inline resolution notes. The /agent page shows the deployed Bolna config diffed against local. All gated behind a Supabase Auth wall with a seeded admin so reviewers can actually click in.
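The escalation queue's state machine can be expressed as an explicit transition table. This reads the arrows above literally (only a `taken` escalation can be resolved or abandoned); whether a still-queued escalation may go straight to `abandoned` is not stated, so that is an assumption. The `Escalation` shape and `transition` function are hypothetical.

```typescript
type EscalationState = "queued" | "taken" | "resolved" | "abandoned";

// Allowed moves: queued -> taken -> resolved / abandoned. Terminal states have none.
const TRANSITIONS: Record<EscalationState, readonly EscalationState[]> = {
  queued: ["taken"],
  taken: ["resolved", "abandoned"],
  resolved: [],
  abandoned: [],
};

interface Escalation {
  id: string;
  state: EscalationState;
  notes: string[];
}

// Returns a new record (no mutation); throws on an illegal move so the UI cannot skip steps.
function transition(e: Escalation, to: EscalationState, note?: string): Escalation {
  if (!TRANSITIONS[e.state].includes(to)) {
    throw new Error(`illegal transition ${e.state} -> ${to}`);
  }
  return { ...e, state: to, notes: note ? [...e.notes, note] : e.notes };
}
```

Keeping the table as data makes it easy to render the allowed actions for each card straight from `TRANSITIONS`.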

Economics

Why a kirana would pay for this

Per outlet: 4 phone operators × ₹22k/month = ₹88k. With the agent, you keep one supervisor (₹22k) and the agent answers concurrent calls without adding headcount. The target is a peak-hour drop rate under 2%, down from 15–25% today. Per-call variable cost goes from ~₹3 (operator) to ~₹2 (Bolna trial-grade pricing).

Per-outlet savings: roughly ₹65k/month. Break-even in week 1. The moat compounds with usage — the catalogue (aliases, substitutions, brand preferences) and the saved-address graph both get richer per outlet, per month.
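The per-outlet headcount arithmetic, written out. The difference in salaries alone is ₹66k, in line with the roughly ₹65k figure; the source does not break down how per-call costs net against it, so they are left as a separate term.

```latex
\begin{aligned}
\text{operators today} &= 4 \times \text{₹}22\text{k} = \text{₹}88\text{k per month} \\
\text{with the agent} &= \text{₹}22\text{k (one supervisor)} + \text{per-call Bolna cost} \\
\text{headcount saving} &= \text{₹}88\text{k} - \text{₹}22\text{k} = \text{₹}66\text{k per month}
\end{aligned}
```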

What's Next

Roadmap, if this becomes real

Two weeks out: inbound numbers (architecture is ready — blocked only on Indian Pvt Ltd KYC), WhatsApp confirmation + Razorpay payment link, a catalog CMS so operators add SKUs without touching seeds.

Quarter out: browser-audio operator handoff during escalation (so the human picks up *inside the same call*), last-order substitution prompts ("aata khatam ho gaya, wohi laaun?", i.e. "the atta has run out; shall I bring the same one?"), and multi-outlet routing by pincode.