February 18, 2026 · 10 min read · Claw Mart Team

Voice AI Agents: Build Automated Phone Receptionists

Build voice agents that answer phone calls for businesses 24/7. Replaces $3k-5k/month receptionists.

Most businesses are still paying $3,000–$5,000 a month for someone to answer the phone.

Think about that for a second. A full-time receptionist whose primary job is to pick up calls, say "Thanks for calling [Business Name], how can I help you?", route the caller to the right person, and maybe book an appointment. That's the job. And it comes with PTO, benefits, sick days, training, turnover, and the inevitable reality that nobody can answer the phone 24 hours a day, 7 days a week.

Meanwhile, the calls you miss go straight to voicemail. And nobody leaves voicemails anymore. They just call the next business on Google.

Here's what's changed: you can now build a voice AI agent that handles inbound calls, qualifies leads, books appointments, answers FAQs, and routes complex calls to humans—running 24/7 for a fraction of the cost. The technology is finally good enough. Latency is under 500 milliseconds. The voices sound human. And the tooling has matured to the point where you don't need a machine learning team to set it up.

The platform I'd recommend starting with is VAPI. Let me walk you through why, how it works, who should use it, and exactly how to build one.

The Receptionist Problem Is a Math Problem

Let's lay out the numbers honestly.

A full-time receptionist in the US costs $36,000–$60,000/year in salary alone, which is $3,000–$5,000/month before payroll taxes, benefits, and overhead. For that money, you get coverage roughly 40 hours a week, maybe 45 if they're dedicated. That's about 25% of the 168 hours in a week.

What happens the other 75% of the time? After hours, weekends, lunch breaks, holidays? Those calls go unanswered.

Now consider a VAPI-powered voice agent:

  • Cost: ~$0.05–$0.10 per minute of talk time, all in. A business handling 500 calls/month averaging 3 minutes each would pay roughly $75–$150/month. Not $3,000. Not $5,000. Under $200.
  • Availability: 24/7/365. No sick days. No turnover. No training new hires every six months.
  • Scalability: Handles many calls at once, up to your account's concurrency limit. Your agent never puts anyone on hold because it's already talking to someone else.
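To make the arithmetic above easy to rerun with your own call volume, here is a minimal sketch. The function names are mine, and the per-minute rates are the rough figures quoted in this post, not a price sheet.

```javascript
// Rough monthly cost estimate for a per-minute voice agent.
// Rates are passed in cents so the arithmetic stays in whole numbers.
function estimateMonthlyCostUsd(callsPerMonth, avgMinutesPerCall, centsPerMinute) {
  const totalMinutes = callsPerMonth * avgMinutesPerCall;
  return (totalMinutes * centsPerMinute) / 100;
}

// Share of the 168 hours in a week that a staffing schedule covers.
function weeklyCoverage(hoursStaffed) {
  return hoursStaffed / (7 * 24);
}

const low = estimateMonthlyCostUsd(500, 3, 5);   // 500 calls x 3 min at $0.05/min
const high = estimateMonthlyCostUsd(500, 3, 10); // same volume at $0.10/min
console.log(`Voice agent: $${low}-$${high} per month`);
console.log(`A 40-hour week covers ${(weeklyCoverage(40) * 100).toFixed(1)}% of the week`);
```

With the numbers from this section, that works out to $75-$150 per month for the agent and about 23.8% weekly coverage for a 40-hour receptionist.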

The ROI math is so comically lopsided that the only reason every small business isn't doing this yet is because most business owners don't know it's possible. That's the gap. And if you're an agency or a developer, that's your opportunity.

What VAPI Actually Is (And Isn't)

VAPI (Voice API) is a developer-focused platform that abstracts away the nightmare of stitching together telephony, speech-to-text, large language models, and text-to-speech into a single coherent system. It launched around 2023 and has quickly become the go-to for building production-grade voice agents.

Here's what it handles under the hood:

Layer | What It Does | Providers VAPI Integrates
Telephony | Manages actual phone calls (SIP/WebRTC) | Twilio, Plivo
Speech-to-Text | Transcribes what the caller says in real time | Deepgram, AssemblyAI, Whisper
LLM | Generates intelligent responses | OpenAI GPT-4o, Anthropic Claude, Grok
Text-to-Speech | Speaks the response back naturally | ElevenLabs, PlayHT, AWS Polly

Before VAPI, building something like this meant wiring together four or five different APIs yourself, handling audio streaming, managing conversation state, dealing with interruption detection (when someone talks over the AI), and praying the latency didn't make the conversation feel like a satellite phone call from 2004.

VAPI packages all of that into a single API. You configure an assistant, point it at a phone number, and it works.

What VAPI is not: It's not a no-code chatbot builder (though it has a dashboard). It's not a call center platform like Five9 or Dialpad. It's infrastructure for developers and agencies who want to build voice AI products, either for themselves or for clients.

How to Build a Voice AI Receptionist with VAPI

Let me walk through the actual implementation. This isn't theoretical—this is what you'd do to get a working AI receptionist answering calls for a real business.

Step 1: Create Your VAPI Account and Get a Phone Number

Sign up at vapi.ai. You get $10 in free credits, which is enough for roughly 100–200 minutes of testing depending on your provider choices.

From the dashboard, provision a phone number. VAPI lets you buy numbers directly or port in existing ones. For testing, just grab a new number.

Step 2: Configure Your Assistant

This is where the magic happens. An "Assistant" in VAPI is the configuration object that defines everything about your voice agent: its personality, knowledge, voice, and capabilities.

Here's what you need to set:

LLM Selection: I'd start with GPT-4o for the best balance of speed and intelligence. Claude is great for longer, more nuanced conversations. For cost-sensitive deployments, GPT-4o-mini works surprisingly well.

System Prompt: This is the single most important piece. A bad prompt gives you a bad agent. Here's a real-world example for a dental office:

You are Sarah, the virtual receptionist for Bright Smile Dental.

Your responsibilities:
1. Answer incoming calls warmly and professionally
2. Determine the caller's intent (new appointment, reschedule, billing question, emergency)
3. For new appointments: collect name, phone number, preferred date/time, and reason for visit
4. For emergencies: advise the caller to go to the nearest ER if life-threatening, otherwise note the issue and mark as urgent callback
5. For billing questions: collect the caller's name and account number, then let them know someone from billing will call back within 2 hours

Office hours: Monday-Friday 8am-5pm, Saturday 9am-1pm
Providers: Dr. Martinez (general), Dr. Pham (orthodontics), Dr. Williams (pediatric)

Rules:
- Never provide medical advice
- Always confirm the information you've collected before ending the call
- Be concise. Patients don't want a chatty receptionist.
- If you don't know something, say "Let me have someone from our team follow up on that" — never guess
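If you end up writing this prompt for more than one business, it helps to generate it from structured data instead of hand-editing text. The sketch below is one way to do that. The `business` object and its field names are hypothetical, so adapt them to however you store client details.

```javascript
// Build a receptionist system prompt from structured business data.
// The field names on `business` are hypothetical, not a VAPI schema.
function buildReceptionistPrompt(business) {
  const duties = business.responsibilities
    .map((duty, i) => `${i + 1}. ${duty}`)
    .join("\n");
  const rules = business.rules.map((rule) => `- ${rule}`).join("\n");
  const providers = business.providers
    .map((p) => `${p.name} (${p.specialty})`)
    .join(", ");

  return [
    `You are ${business.agentName}, the virtual receptionist for ${business.name}.`,
    "",
    "Your responsibilities:",
    duties,
    "",
    `Office hours: ${business.hours}`,
    `Providers: ${providers}`,
    "",
    "Rules:",
    rules,
  ].join("\n");
}

const prompt = buildReceptionistPrompt({
  agentName: "Sarah",
  name: "Bright Smile Dental",
  hours: "Monday-Friday 8am-5pm, Saturday 9am-1pm",
  providers: [
    { name: "Dr. Martinez", specialty: "general" },
    { name: "Dr. Pham", specialty: "orthodontics" },
  ],
  responsibilities: [
    "Answer incoming calls warmly and professionally",
    "Determine the caller's intent",
  ],
  rules: ["Never provide medical advice", "Be concise."],
});
console.log(prompt);
```

The resulting string is what you would paste into the system message of the assistant configuration.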

Voice Selection: ElevenLabs voices sound the most natural right now. Pick one that matches the brand. A law firm wants something different than a surf shop.

Tools / Function Calling: This is where it gets powerful. You can connect your assistant to external systems via function calls:

  • Google Calendar: Check real-time availability and book appointments
  • CRM (HubSpot, Salesforce): Create new contacts, log call notes
  • Custom webhooks: Trigger any workflow you want (Slack notification, database write, email confirmation)

Step 3: Set Up Function Calling for Appointment Booking

The most common use case is booking appointments, so let's make it concrete. VAPI supports function calling natively—you define tools that the LLM can invoke during the conversation.

Here's a simplified example of defining a booking tool:

const assistant = {
  model: {
    provider: "openai",
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: "You are Sarah, the virtual receptionist for Bright Smile Dental..."
      }
    ],
    tools: [
      {
        type: "function",
        function: {
          name: "book_appointment",
          description: "Book a dental appointment for the caller",
          parameters: {
            type: "object",
            properties: {
              patient_name: {
                type: "string",
                description: "Full name of the patient"
              },
              phone_number: {
                type: "string",
                description: "Patient's callback number"
              },
              preferred_date: {
                type: "string",
                description: "Preferred appointment date (YYYY-MM-DD)"
              },
              preferred_time: {
                type: "string",
                description: "Preferred appointment time (HH:MM)"
              },
              reason: {
                type: "string",
                description: "Reason for the visit"
              },
              provider: {
                type: "string",
                description: "Requested dentist name, if any"
              }
            },
            required: ["patient_name", "phone_number", "preferred_date", "reason"]
          }
        }
      }
    ]
  },
  voice: {
    provider: "11labs",
    voiceId: "your-voice-id"
  },
  transcriber: {
    provider: "deepgram",
    model: "nova-2"
  }
};

When the LLM decides it has enough information to book, it calls book_appointment, and VAPI sends that tool call to the server URL you've configured. Your server then does the actual booking (write to Google Calendar, update your CRM, send a confirmation SMS) and returns a result that the AI reads back to the caller.
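Here is a minimal sketch of the server-side logic for that tool call, written as a plain function so you can drop it into whatever web framework you use. The request and response shapes (`message.toolCallList` in, `results` with a `toolCallId` out) are my assumption about the webhook format, so check them against the current VAPI docs before relying on them. The in-memory `bookedSlots` set is a stand-in for a real calendar.

```javascript
// Hypothetical in-memory calendar; a real server would call Google Calendar or a CRM here.
const bookedSlots = new Set(["2026-03-02 09:00"]);

function bookAppointment(args) {
  // preferred_time is optional in the tool schema, so ask for it rather than guess.
  if (!args.preferred_time) {
    return "No time was given. Ask the caller for a preferred time.";
  }
  const slot = `${args.preferred_date} ${args.preferred_time}`;
  if (bookedSlots.has(slot)) {
    return `Sorry, ${slot} is already taken. Offer the caller another time.`;
  }
  bookedSlots.add(slot);
  return `Booked ${args.patient_name} on ${args.preferred_date} at ${args.preferred_time} for: ${args.reason}.`;
}

// Assumed request shape:  { message: { toolCallList: [{ id, function: { name, arguments } }] } }
// Assumed response shape: { results: [{ toolCallId, result }] }
function handleToolCalls(body) {
  const calls = (body.message && body.message.toolCallList) || [];
  const results = calls.map((call) => {
    const raw = call.function.arguments;
    // Arguments may arrive as a JSON string or as an already-parsed object.
    const args = typeof raw === "string" ? JSON.parse(raw) : raw;
    const result =
      call.function.name === "book_appointment"
        ? bookAppointment(args)
        : `Unknown tool: ${call.function.name}`;
    return { toolCallId: call.id, result };
  });
  return { results };
}

const response = handleToolCalls({
  message: {
    toolCallList: [
      {
        id: "call_1",
        function: {
          name: "book_appointment",
          arguments: {
            patient_name: "Ana Diaz",
            phone_number: "+15555550100",
            preferred_date: "2026-03-02",
            preferred_time: "10:30",
            reason: "cleaning",
          },
        },
      },
    ],
  },
});
console.log(response.results[0].result);
```

Returning a plain sentence as the result works well here because the LLM reads it and decides what to say to the caller, including offering another slot when the first one is taken.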

Step 4: Trigger an Outbound Call (or Handle Inbound)

For inbound, you just assign the assistant to your provisioned phone number in the VAPI dashboard. Done. Every call to that number gets picked up by your agent.

For outbound (e.g., appointment reminders, follow-ups), it's a simple API call:

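// Recent versions of @vapi-ai/server-sdk export the client as VapiClient.
const { VapiClient } = require('@vapi-ai/server-sdk');
const vapi = new VapiClient({ token: 'your-api-key' });

// In CommonJS, `await` has to run inside an async function.
async function placeCall() {
  return vapi.calls.create({
    assistantId: 'your-assistant-id',
    phoneNumberId: 'your-phone-number-id',
    customer: {
      number: '+11234567890' // E.164 format: "+", country code, number
    }
  });
}

placeCall().catch(console.error);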

That's it. The call goes out, the AI runs the conversation, and you get a transcript and recording in your dashboard afterward.
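Outbound numbers need to be in E.164 format, like the +11234567890 placeholder above. A small check before you place the call saves failed requests. This is a sketch with helper names of my own choosing. It only validates the shape of the number and does not guess a missing country code.

```javascript
// E.164: a leading "+", then up to 15 digits, the first of which is not 0.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

function isE164(number) {
  return E164_PATTERN.test(number);
}

// Strip common formatting characters, then validate.
// Returns null when the input cannot be read as E.164.
function normalizeToE164(input) {
  const cleaned = input.replace(/[\s().-]/g, "");
  return isE164(cleaned) ? cleaned : null;
}

console.log(normalizeToE164("+1 (123) 456-7890")); // "+11234567890"
console.log(normalizeToE164("123-456-7890"));      // null (no country code)
```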

Step 5: Monitor, Iterate, Improve

VAPI gives you full transcripts, recordings, and analytics for every call. This is crucial. Your first version of the prompt will get you maybe 70% of the way there. You need to listen to real calls, find where the agent stumbles, and refine.

Common fixes:

  • Agent talks too much: Tighten the prompt. Add "Be concise. Keep responses under 2 sentences."
  • Doesn't handle interruptions well: VAPI's barge-in detection is solid, but you can tune sensitivity.
  • Hallucinates information: Add explicit rules about what NOT to say. "Never quote prices. Never confirm insurance coverage."
  • Awkward pauses: Switch STT providers. Deepgram Nova-2 tends to have the lowest latency.
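You can automate part of that review. The sketch below flags agent turns that run past the two-sentence limit. The transcript shape (`role` and `text` per turn) is hypothetical, so map your exported transcripts into it first. The sentence counter is a crude heuristic and will overcount on abbreviations like "Dr.".

```javascript
// Count sentences by splitting on terminal punctuation followed by whitespace or end of text.
function countSentences(text) {
  return text
    .split(/[.!?]+(?:\s+|$)/)
    .filter((part) => part.trim().length > 0).length;
}

// `turns` is a hypothetical transcript shape: [{ role: "assistant" | "user", text: string }]
function findLongAgentTurns(turns, maxSentences = 2) {
  return turns
    .map((turn, index) => ({ ...turn, index, sentences: countSentences(turn.text) }))
    .filter((turn) => turn.role === "assistant" && turn.sentences > maxSentences);
}

const flagged = findLongAgentTurns([
  { role: "assistant", text: "Thanks for calling Bright Smile Dental. How can I help you?" },
  { role: "user", text: "I need a cleaning." },
  {
    role: "assistant",
    text: "Great! We have several options. Mornings are popular. Afternoons are quieter. Which do you prefer?",
  },
]);
// Only the third turn (index 2) exceeds two sentences.
console.log(flagged.map((turn) => turn.index));
```

Running this across a week of calls gives you a quick list of turns to listen to before you touch the prompt.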

Who Should Be Using This Right Now

Some industries are absurdly well-suited for voice AI agents. If you're building an agency or looking for clients to pitch, start here:

Dental and Medical Offices: The average dental practice misses 30–40% of inbound calls. Each missed call from a new patient is worth $500–$1,200 in lifetime value. An AI receptionist that catches every call and books the appointment pays for itself on the first call of the month.

Law Firms: Especially personal injury and immigration firms. These get high call volume from potential clients who will call the next firm on the list if they hit voicemail. An AI agent that qualifies the lead (type of case, timeline, jurisdiction) and books a consultation is worth its weight in gold.

Home Services (HVAC, Plumbing, Electrical): These businesses live and die by the phone. When someone's AC goes out in July, they're calling three companies and booking with whoever answers first. A 24/7 AI agent that takes the call, captures the issue, and schedules a service window wins that job every time.

Real Estate: Agents are terrible at answering their phones because they're constantly in showings, meetings, and open houses. An AI agent that fields buyer/seller inquiries, captures lead details, and books showing appointments is a competitive advantage.

Auto Dealerships and Service Centers: Inbound calls for service appointments, parts inquiries, and test drive scheduling. High volume, repetitive, and easily scriptable—perfect for voice AI.

Hospitality (Hotels, Restaurants): Reservation booking, hours/location inquiries, event inquiries. Restaurants especially lose a staggering number of reservations to missed calls during the dinner rush when the staff is too busy to answer.

The Real Conversation: What About Quality?

Let me be honest about where things stand.

Voice quality: The best TTS voices (ElevenLabs, specifically) are now good enough that many callers won't immediately realize they're talking to AI. Not all callers—some will notice. But the gap has narrowed dramatically in just the last 12 months.

Conversation intelligence: GPT-4o can handle the vast majority of receptionist-level conversations. We're talking about routing calls, answering basic questions, and collecting information. This isn't asking the AI to negotiate a merger. It's asking it to do what a first-week receptionist does.

Edge cases: There will always be calls the AI can't handle. Someone who's upset and wants to escalate. A complex insurance question. A caller with a heavy accent and a bad cell connection. The key is designing your agent to recognize these moments and transfer to a human gracefully. VAPI supports warm transfers—the AI can say "Let me connect you with someone who can help with that" and bridge the call to a real person.
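A transfer is configured as another tool the assistant can call. The sketch below builds one. Treat the `transferCall` type and the destination fields as my best understanding of the configuration format and verify them against the VAPI API reference. The phone number and wording are placeholders.

```javascript
// Build a call-transfer tool definition for the assistant's tools array.
// Field names here are assumptions; confirm them in the VAPI API reference.
function buildTransferTool(destinations) {
  return {
    type: "transferCall",
    destinations: destinations.map((dest) => ({
      type: "number",
      number: dest.number,           // E.164 number of the human who takes over
      message: dest.message,         // what the agent says before bridging the call
      description: dest.description, // when the LLM should pick this destination
    })),
  };
}

const transferTool = buildTransferTool([
  {
    number: "+15555550123",
    message: "Let me connect you with someone who can help with that.",
    description: "Upset callers, complex insurance questions, or anything outside the script",
  },
]);
console.log(JSON.stringify(transferTool, null, 2));
```

Pair this with a line in the system prompt that spells out when to transfer, so the agent hands off early instead of struggling through a call it can't handle.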

Latency: This was the dealbreaker 18 months ago. It's not anymore. With VAPI's streaming architecture and a fast provider stack, end-to-end response time can come in under 500 milliseconds, which is short enough that the conversation doesn't feel laggy.
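When a call feels sluggish, it helps to think of that 500 milliseconds as a budget split across the pipeline stages. The sketch below adds up per-stage numbers and points at the slowest one. The figures are placeholders I made up for illustration, not benchmarks, so substitute measurements from your own calls.

```javascript
// Per-stage latencies in milliseconds. Placeholder values, not benchmarks.
const stages = [
  { name: "speech-to-text (final transcript)", ms: 150 },
  { name: "LLM (time to first token)", ms: 200 },
  { name: "text-to-speech (time to first audio)", ms: 100 },
  { name: "network and telephony overhead", ms: 40 },
];

// Sum the stages, compare against the budget, and identify the slowest stage.
function latencyReport(stageList, budgetMs) {
  const totalMs = stageList.reduce((sum, stage) => sum + stage.ms, 0);
  const slowest = stageList.reduce((a, b) => (b.ms > a.ms ? b : a));
  return { totalMs, withinBudget: totalMs <= budgetMs, slowest: slowest.name };
}

const report = latencyReport(stages, 500);
console.log(report);
```

With these placeholder numbers the total is 490 ms, just inside the budget, and the LLM is the stage to optimize first.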

The Regulatory Piece (Don't Skip This)

If you're making outbound calls, you need to know about TCPA compliance in the US. The rules around automated calls are strict. You generally need prior express consent to call someone with an AI agent. Inbound calls are far simpler from a compliance standpoint since the caller initiated the conversation.
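On the engineering side, the simplest safeguard is to filter your outbound list down to contacts with recorded consent before any call is placed. This is a sketch with a hypothetical contact shape. What counts as valid consent is a legal question, and this code only enforces that a record exists.

```javascript
// `contacts` is a hypothetical shape:
// { number: string, consent?: { given: boolean, recordedAt: string } }
function callableContacts(contacts) {
  return contacts.filter(
    (contact) =>
      Boolean(contact.consent) &&
      contact.consent.given === true &&
      Boolean(contact.consent.recordedAt) // keep only consent with a timestamp on file
  );
}

const queue = callableContacts([
  { number: "+15555550101", consent: { given: true, recordedAt: "2026-01-12T15:04:00Z" } },
  { number: "+15555550102", consent: { given: false, recordedAt: "2026-01-13T09:30:00Z" } },
  { number: "+15555550103" }, // no consent record at all
]);
console.log(queue.map((contact) => contact.number));
```

Only the first contact makes it into the call queue.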

For healthcare applications, HIPAA matters. Make sure your VAPI configuration uses HIPAA-compliant providers and that you have a BAA (Business Associate Agreement) in place.

This isn't legal advice. Get a lawyer if you're deploying at scale. But don't let compliance fear stop you from building—just build responsibly.

Comparing VAPI to Alternatives

VAPI isn't the only option. Here's how it stacks up:

Platform | Best For | Pricing Model | Developer-Friendliness
VAPI | Custom voice agents, agencies | Per-minute usage | ★★★★★
Retell AI | Similar to VAPI, slightly simpler | Per-minute usage | ★★★★☆
Bland.ai | High-volume outbound campaigns | Per-minute usage | ★★★★☆
Voiceflow | Chatbot-first with voice add-on | Subscription | ★★★☆☆
PlayAI | Voice cloning focus | Per-minute usage | ★★★☆☆

I recommend VAPI for most use cases because of its flexibility with provider choices (you're not locked into one STT or TTS vendor), its function calling support, and its documentation. It's the platform that gets out of your way the fastest.

What to Do Next

If you've read this far, you're either a developer who wants to build this, a business owner who wants this for your own company, or an agency owner who sees the opportunity to sell this as a service. Here's your next step for each:

Developers: Sign up for VAPI, burn through the $10 free credits building a test agent, and try calling it. Get the end-to-end flow working with a simple use case (answer the phone, collect a name and reason for calling, end the call). Then layer on function calling and integrations. The docs at docs.vapi.ai are solid. Their Discord community is active.

Business Owners: You have two options. Build it yourself if you're technical, or hire someone. If you want it done right without learning the tech stack, this is exactly what we do at Claw Mart. We build and deploy voice AI agents for businesses—set up the assistant, connect your calendar, configure the phone number, and hand you a system that answers your calls 24/7.

Agency Owners: This is one of the highest-margin services you can offer right now. Build one voice agent template per vertical (dental, legal, HVAC), customize the prompt and integrations per client, and charge $500–$1,500/month for a service that costs you $50–$150/month in API fees. The demand is massive and most businesses have no idea this exists yet. That window won't stay open forever.
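To put numbers on "highest-margin", here is the gross margin at both ends of those ranges. It's a sketch that counts only API fees as cost. Your own time for setup and support is not included.

```javascript
// Gross margin as a fraction of the monthly fee, counting API fees as the only cost.
function grossMargin(monthlyFeeUsd, monthlyCostUsd) {
  return (monthlyFeeUsd - monthlyCostUsd) / monthlyFeeUsd;
}

const worstCase = grossMargin(500, 150); // lowest fee, highest API cost
const bestCase = grossMargin(1500, 50);  // highest fee, lowest API cost
console.log(`Worst case: ${(worstCase * 100).toFixed(1)}%`);
console.log(`Best case: ${(bestCase * 100).toFixed(1)}%`);
```

That's a 70.0% margin in the worst case and 96.7% in the best case.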

The technology is here. The economics make sense. The businesses that figure this out first are the ones that stop losing customers to voicemail. Everyone else will catch up eventually—but by then, the early movers will have already captured the market.

Stop paying $5,000/month for someone to answer the phone. Build the thing that never sleeps.
