Vapi | CodeWorlds

Vapi - complete guide to the voice AI agent platform

What is Vapi?

Vapi is a developer platform for building, testing, and deploying voice AI agents. Instead of connecting speech-to-text, language models, and text-to-speech yourself, Vapi provides a unified pipeline that handles the entire conversation cycle: listening, thinking, and speaking.

According to Vapi, the platform has processed over 300 million calls, 2.5 million assistants have been launched on it, and more than 500,000 developers use it, from startups to Fortune 500 companies. You can build your first voice agent in less than an hour, and thanks to the modular architecture you can choose your own LLM, TTS, and STT providers.

Why Vapi?

Key advantages of Vapi

Low latency - Vapi advertises sub-500ms voice-to-voice response times under optimal configurations; in practice latency is typically around 700ms and depends on the chosen providers
BYO Keys - Bring your own API keys for supported providers (OpenAI, Anthropic, ElevenLabs, Deepgram, and more)
Flow Studio - Visual builder for designing conversation logic with drag-and-drop
Tool calling - Agents can call your APIs mid-conversation (bookings, CRM, databases)
Multilingual - Over 100 languages and accents
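Scalability - Designed for very high call volumes; concurrency above the lines included in your plan is purchased separately
SDKs for major platforms - Web, iOS, Android, plus server SDKs (Node.js/TypeScript, Python)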

Vapi vs Retell AI vs Bland AI

Feature	Vapi	Retell AI	Bland AI
Latency	~700ms	~600ms	~800ms
Base cost	$0.05/min (+ providers)	~$0.07/min (all-in)	~$0.09/min
True cost	$0.13-$0.31/min	~$0.07-$0.15/min	~$0.09-$0.20/min
BYO models	Full support	Limited	Full support
Visual builder	Flow Studio	Yes	Pathways builder
Open source	No	No	No
HIPAA	$1,000/mo add-on	Included	Included
SDK	Web, iOS, Android, Python	Web, Python	REST API
Concurrent calls	1M+	Unlimited	20,000+/h
Best for	Custom builds, dev-heavy	Inbound support, healthcare	High-volume outbound

Architecture - how does Vapi work?

Vapi acts as an orchestration layer between your application and AI providers. The entire conversation cycle is based on three steps:

1. Listen

Speech-to-text (STT) converts the caller's voice into text. Vapi supports providers like Deepgram, Whisper (OpenAI), Gladia, and Azure Speech.

2. Think

The LLM processes the transcription and generates a response. You can use GPT-4o, Claude, Gemini, Llama, or your own model via a Custom LLM URL. In this step, the agent can also call tools (tool calling).

3. Speak

Text-to-speech (TTS) converts the response into voice. Supported providers include ElevenLabs, Play.ht, Azure TTS, LMNT, and others.
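To make the three steps concrete, here is a minimal Python sketch of a single conversational turn. It is a conceptual model, not Vapi's implementation: the provider functions are hypothetical stubs, and a real pipeline streams audio and text between stages instead of waiting for each stage to finish.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical provider interfaces: each stage is modeled as a plain function.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[list[dict]], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoicePipeline:
    """Toy listen -> think -> speak loop (illustration only)."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech
    history: list[dict] = field(default_factory=list)

    def run_turn(self, caller_audio: bytes) -> bytes:
        # 1. Listen: transcribe the caller's audio into text.
        user_text = self.stt(caller_audio)
        self.history.append({"role": "user", "content": user_text})
        # 2. Think: the LLM sees the whole conversation so far.
        reply = self.llm(self.history)
        self.history.append({"role": "assistant", "content": reply})
        # 3. Speak: synthesize the reply back into audio.
        return self.tts(reply)


# Stub providers so the example runs without any API keys.
pipeline = VoicePipeline(
    stt=lambda audio: audio.decode("utf-8"),
    llm=lambda msgs: f"You said: {msgs[-1]['content']}",
    tts=lambda text: text.encode("utf-8"),
)
print(pipeline.run_turn(b"Where is my order?"))  # b'You said: Where is my order?'
```

Keeping the history on the pipeline is what lets the model answer follow-up questions; each stage can be swapped independently, which mirrors the BYO-provider idea.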

Advanced conversation features

Endpointing - Detects when the caller has finished speaking
Interrupt detection - Allows the caller to interrupt the agent while speaking
Backchanneling - Inserts short acknowledgments ("uh-huh", "I see") while the caller is speaking, signaling that the agent is listening
Emotion detection - Identifies tone signals such as urgency or frustration
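Endpointing and interrupt detection can be illustrated with a deliberately simple energy-threshold rule. This is an assumption made for clarity, not how Vapi implements these features (production systems typically combine voice activity detection with models that judge whether an utterance is complete), and the threshold and frame counts are arbitrary example values.

```python
def detect_endpoint(frame_energies, threshold=0.1, silence_frames=3):
    """Return the index of the frame where the caller is judged to have
    finished speaking, or None if no endpoint is found.

    Simplified rule: the caller must have spoken at least once, then stayed
    below `threshold` for `silence_frames` consecutive frames.
    """
    heard_speech = False
    quiet_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            heard_speech = True
            quiet_run = 0  # speech resets the silence counter
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= silence_frames:
                return i
    return None


def should_interrupt(caller_energy, agent_is_speaking, threshold=0.1):
    """Barge-in: stop the agent's audio if the caller talks over it."""
    return agent_is_speaking and caller_energy >= threshold


energies = [0.0, 0.5, 0.6, 0.05, 0.02, 0.01, 0.0]
print(detect_endpoint(energies))  # 5
```

A fixed silence window is the main trade-off here: too short and the agent cuts callers off mid-pause, too long and every reply feels slow.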

Getting started

SDK installation

Bash

# Web SDK (browser calls)
npm install @vapi-ai/web

# Server SDK for Node.js/TypeScript
npm install @vapi-ai/server-sdk

# Server SDK for Python
pip install vapi_server_sdk

CLI installation

Bash

curl -sSL https://vapi.ai/install.sh | bash

The CLI automatically detects your tech stack (React, Vue, Next.js, Python, Go, Flutter, React Native) and generates code examples tailored to your project.

Creating an assistant

TypeScript

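import { VapiClient } from "@vapi-ai/server-sdk";

// Creating assistants requires your private API key, so run this server-side,
// never in the browser (the @vapi-ai/web SDK is for starting calls in the browser).
const vapi = new VapiClient({ token: process.env.VAPI_API_KEY! });

const assistant = await vapi.assistants.create({
  name: "Customer Support Agent",
  firstMessage: "Hello! How can I help you today?",
  model: {
    provider: "openai",
    model: "gpt-4o",
    temperature: 0.7,
    messages: [
      {
        role: "system",
        content:
          "You are a friendly customer support agent. Help users with their orders and account questions.",
      },
    ],
  },
  voice: {
    provider: "11labs",
    voiceId: "21m00Tcm4TlvDq8ikWAM",
  },
});

console.log(`Assistant created: ${assistant.id}`);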

Python

import os

from vapi import Vapi

# Server-side SDK: authenticate with your private API key from the environment.
vapi = Vapi(token=os.getenv("VAPI_API_KEY"))

assistant = vapi.assistants.create(
    name="Customer Support Agent",
    first_message="Hello! How can I help you today?",
    model={
        "provider": "openai",
        "model": "gpt-4o",
        "temperature": 0.7,
        "messages": [
            {
                "role": "system",
                "content": "You are a friendly customer support agent. Help users with their orders and account questions.",
            }
        ],
    },
    voice={
        "provider": "11labs",
        "voiceId": "21m00Tcm4TlvDq8ikWAM",
    },
)
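The SDK calls above wrap Vapi's REST API. As a sketch, assuming the POST https://api.vapi.ai/assistant endpoint (the create counterpart of the PATCH /assistant/ASSISTANT_ID call used later in this guide), the same request can be prepared with only the Python standard library. The helper name build_create_assistant_request is hypothetical, and the request is built but not sent.

```python
import json
import os
import urllib.request

VAPI_API_BASE = "https://api.vapi.ai"


def build_create_assistant_request(api_key: str, assistant: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST /assistant request.

    The REST body uses camelCase keys (firstMessage, voiceId), whereas the
    Python SDK takes snake_case keyword arguments such as first_message.
    """
    return urllib.request.Request(
        url=f"{VAPI_API_BASE}/assistant",
        data=json.dumps(assistant).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_assistant_request(
    os.getenv("VAPI_API_KEY", "YOUR_API_KEY"),
    {
        "name": "Customer Support Agent",
        "firstMessage": "Hello! How can I help you today?",
        "model": {"provider": "openai", "model": "gpt-4o"},
        "voice": {"provider": "11labs", "voiceId": "21m00Tcm4TlvDq8ikWAM"},
    },
)
# To actually create the assistant (requires a valid private key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["id"])
```

Building the request separately from sending it keeps the payload easy to inspect and unit-test before any real API call is made.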

Starting a call (Web)

TypeScript

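import Vapi from "@vapi-ai/web";

// The web SDK runs in the browser, so it uses your public API key.
const vapi = new Vapi("YOUR_PUBLIC_API_KEY");

// Register listeners before starting the call so no early events are missed.
vapi.on("call-start", () => {
  console.log("Call connected");
});

vapi.on("call-end", () => {
  console.log("Call ended");
});

vapi.on("message", (message) => {
  // Transcript messages carry the speaker role and the recognized text.
  if (message.type === "transcript") {
    console.log(`${message.role}: ${message.transcript}`);
  }
});

vapi.on("error", (error) => {
  console.error("Call error:", error);
});

// Start the call; the browser will ask for microphone permission.
vapi.start("YOUR_ASSISTANT_ID");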

Starting a phone call (Server)

Python

from vapi import Vapi
import os

vapi = Vapi(token=os.getenv("VAPI_API_KEY"))

call = vapi.calls.create(
    phone_number_id="YOUR_PHONE_NUMBER_ID",  # ID of a phone number in your Vapi account
    customer={"number": "+1234567890"},  # number to dial, in E.164 format
    assistant_id="YOUR_ASSISTANT_ID",
)

print(f"Call started: {call.id}")
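Outbound calls fail if the customer number is malformed. Phone numbers are generally expected in E.164 format (a plus sign followed by at most 15 digits), as in the "+1234567890" example. A small, hypothetical helper like the one below can normalize and check numbers before calling vapi.calls.create; it validates the format only, not whether the number actually exists.

```python
import re

# E.164: "+", then a country code and subscriber number, at most 15 digits
# in total, never starting with 0.
E164_PATTERN = re.compile(r"^\+[1-9]\d{1,14}$")


def normalize_e164(raw: str) -> str:
    """Strip common formatting characters and check the result is E.164.

    Raises ValueError early instead of letting the outbound call fail later.
    """
    cleaned = re.sub(r"[\s\-().]", "", raw)
    if not E164_PATTERN.fullmatch(cleaned):
        raise ValueError(f"Not an E.164 phone number: {raw!r}")
    return cleaned


print(normalize_e164("+1 (234) 567-890"))  # +1234567890
```

The normalized string can then be passed as customer={"number": normalize_e164(user_input)}.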

Tool calling

Tool calling is one of Vapi's most powerful features. It allows the agent to call your APIs mid-conversation - check order status, book appointments, update CRM, or fetch data from databases.

Creating a tool via API

Bash

curl -X POST 'https://api.vapi.ai/tool' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "type": "function",
    "function": {
      "name": "get_order_status",
      "description": "Check the current status of a customer order",
      "parameters": {
        "type": "object",
        "properties": {
          "order_id": {
            "type": "string",
            "description": "The unique order identifier"
          }
        },
        "required": ["order_id"]
      }
    },
    "server": {
      "url": "https://your-api.com/webhook/vapi"
    }
  }'

Adding a tool to an assistant

Bash

curl -X PATCH 'https://api.vapi.ai/assistant/ASSISTANT_ID' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": {
      "provider": "openai",
      "model": "gpt-4o",
      "toolIds": ["your-tool-id"]
    }
  }'

Managing tools via CLI

Bash

vapi tool list

vapi tool get <tool-id>

vapi tool create

vapi tool test <tool-id>

vapi tool delete <tool-id>

Handling a tool call on the server

When the agent calls a tool, Vapi sends a POST request to the tool's server URL. An abbreviated payload looks like this:

JSON

{
  "message": {
    "type": "tool-calls",
    "toolCallList": [
      {
        "id": "toolu_01DTPAzUm5Gk3zxrpJ969oMF",
        "name": "get_order_status",
        "arguments": {
          "order_id": "ORD-12345"
        }
      }
    ]
  }
}

Your server must respond with a results array in which each entry carries the toolCallId of the call it answers:

JSON

{
  "results": [
    {
      "toolCallId": "toolu_01DTPAzUm5Gk3zxrpJ969oMF",
      "result": "Order ORD-12345 is currently being shipped. Expected delivery: tomorrow."
    }
  ]
}
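The request and response formats above can be tied together in a framework-independent dispatcher. This is a sketch with hypothetical names (handle_tool_calls, handlers): it looks up a handler per tool name, always returns exactly one result per toolCallId, and turns handler errors into text the agent can relay. As an assumption, it also accepts an OpenAI-style variant in which name and arguments sit under a function key and arguments may arrive as a JSON string; confirm which shape your server actually receives.

```python
import json
from typing import Any, Callable

ToolHandler = Callable[[dict], str]


def _name_and_args(tool_call: dict) -> tuple[str, dict]:
    """Read name/arguments from the flat shape shown above, or from an
    assumed OpenAI-style shape nested under "function"."""
    fn = tool_call.get("function", tool_call)
    args = fn.get("arguments", {})
    if isinstance(args, str):  # arguments may be serialized JSON
        args = json.loads(args or "{}")
    return fn["name"], args


def handle_tool_calls(body: dict, handlers: dict[str, ToolHandler]) -> dict[str, Any]:
    """Turn a "tool-calls" webhook body into the {"results": [...]} reply."""
    results = []
    for tool_call in body["message"].get("toolCallList", []):
        name, args = _name_and_args(tool_call)
        handler = handlers.get(name)
        try:
            result = handler(args) if handler else f"Unknown tool: {name}"
        except Exception as exc:  # report failures to the agent, don't crash
            result = f"Error in {name}: {exc}"
        # Every result must echo the id of the call it answers.
        results.append({"toolCallId": tool_call["id"], "result": result})
    return {"results": results}


# Hypothetical handler standing in for a real order lookup.
handlers = {"get_order_status": lambda a: f"Order {a['order_id']} is being shipped."}

request_body = {
    "message": {
        "type": "tool-calls",
        "toolCallList": [
            {
                "id": "toolu_01DTPAzUm5Gk3zxrpJ969oMF",
                "name": "get_order_status",
                "arguments": {"order_id": "ORD-12345"},
            }
        ],
    }
}
print(handle_tool_calls(request_body, handlers))
```

Keeping the dispatch logic free of any web framework makes it easy to test and to reuse in either the Express or the Flask handler.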

Implementing webhooks

TypeScript

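import express from "express";

// Hypothetical data-access layers: replace with your own database and calendar clients.
declare const db: {
  orders: {
    findById(id: string): Promise<{ id: string; status: string; trackingInfo: string } | null>;
  };
};
declare const calendar: {
  createEvent(args: Record<string, unknown>): Promise<{ date: string; time: string }>;
};

// Shape of each entry in message.toolCallList (as in the payload above).
interface ToolCall {
  id: string;
  name: string;
  arguments: Record<string, any>;
}

const app = express();
app.use(express.json());

// Runs one tool call and returns the text the agent will respond with.
async function runTool(toolCall: ToolCall): Promise<string> {
  switch (toolCall.name) {
    case "get_order_status": {
      const order = await db.orders.findById(toolCall.arguments.order_id);
      if (!order) return `No order found with ID ${toolCall.arguments.order_id}.`;
      return `Order ${order.id}: ${order.status}. ${order.trackingInfo}`;
    }
    case "book_appointment": {
      const booking = await calendar.createEvent(toolCall.arguments);
      return `Appointment booked for ${booking.date} at ${booking.time}.`;
    }
    default:
      return `Unknown tool: ${toolCall.name}`;
  }
}

app.post("/webhook/vapi", async (req, res) => {
  const { message } = req.body;

  if (message?.type === "tool-calls") {
    const results = await Promise.all(
      (message.toolCallList as ToolCall[]).map(async (toolCall) => {
        try {
          return { toolCallId: toolCall.id, result: await runTool(toolCall) };
        } catch (err) {
          // Report the failure to the agent instead of failing the whole batch.
          return { toolCallId: toolCall.id, result: `Error: ${(err as Error).message}` };
        }
      })
    );
    res.json({ results });
    return;
  }

  // Other server messages (status updates, transcripts, ...) are just acknowledged.
  res.json({ received: true });
});

app.listen(3000);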

Python

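from flask import Flask, jsonify, request

app = Flask(__name__)

# `db` and `calendar` are hypothetical placeholders for your own data layer
# and calendar client; they are not defined in this snippet.


def run_tool(name, args):
    """Execute one tool call and return the text result for the agent."""
    if name == "get_order_status":
        order = db.orders.find_by_id(args["order_id"])
        if order is None:
            return f"No order found with ID {args['order_id']}."
        return f"Order {order.id}: {order.status}. {order.tracking_info}"
    if name == "book_appointment":
        booking = calendar.create_event(args)
        return f"Appointment booked for {booking.date} at {booking.time}."
    return f"Unknown tool: {name}"


@app.route("/webhook/vapi", methods=["POST"])
def handle_webhook():
    # silent=True returns None instead of raising on a missing or invalid JSON body.
    message = (request.get_json(silent=True) or {}).get("message", {})

    if message.get("type") == "tool-calls":
        results = []
        for tool_call in message.get("toolCallList", []):
            try:
                result = run_tool(tool_call["name"], tool_call.get("arguments", {}))
            except Exception as exc:  # report failures instead of returning a 500
                result = f"Error: {exc}"
            # Every call gets a result, so the agent is never left waiting.
            results.append({"toolCallId": tool_call["id"], "result": result})
        return jsonify({"results": results})

    return jsonify({"received": True})


if __name__ == "__main__":
    app.run(port=3000)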
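A public webhook URL can be called by anyone, so it is worth rejecting requests that do not come from Vapi. The sketch below assumes you configured a shared secret for your server URL and that Vapi sends it in an X-Vapi-Secret header; verify the header name and setup against Vapi's server authentication documentation. The function name is hypothetical.

```python
import hmac
from typing import Mapping, Optional


def is_authentic(headers: Mapping[str, str], expected_secret: str) -> bool:
    """Check the shared secret assumed to arrive in the X-Vapi-Secret header.

    hmac.compare_digest compares in constant time, so response timing does
    not reveal how much of a guessed secret was correct.
    """
    # HTTP header names are case-insensitive; normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    received: Optional[str] = lowered.get("x-vapi-secret")
    if received is None:
        return False
    return hmac.compare_digest(received.encode(), expected_secret.encode())


# Example use at the top of the Flask handler (VAPI_WEBHOOK_SECRET is a
# hypothetical environment variable):
#     if not is_authentic(request.headers, os.environ["VAPI_WEBHOOK_SECRET"]):
#         return jsonify({"error": "unauthorized"}), 401
```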

Flow Studio - visual builder

Flow Studio is a visual builder for designing conversation logic without writing code. It lets you build complex conversation scenarios with drag-and-drop.

Flow Studio capabilities

Branching prompts - Different conversation paths depending on responses
Conditional paths - Logic conditions (if VIP customer → route to specialist)
Error fallback - Automatic handling of errors and unexpected responses
Webhook triggers - Calling external APIs at any point in the conversation
Transfer calls - Redirecting to a live agent while preserving context
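Under the hood, a conditional path is an ordered list of conditions with a fallback. The following Python sketch models that logic only; it is not Flow Studio's export format or API, and the node names (transfer_to_specialist, billing_branch, general_help) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Route:
    condition: Callable[[dict], bool]  # evaluated against collected caller data
    target: str                        # next node or transfer destination


def next_node(routes: list[Route], caller: dict, fallback: str) -> str:
    """Return the target of the first route whose condition holds.

    The fallback plays the role of the error/unexpected-response path.
    """
    for route in routes:
        try:
            if route.condition(caller):
                return route.target
        except (KeyError, TypeError):
            continue  # missing or malformed data: try the next route
    return fallback


routes = [
    Route(lambda c: c["tier"] == "vip", "transfer_to_specialist"),
    Route(lambda c: c["intent"] == "billing", "billing_branch"),
]
print(next_node(routes, {"tier": "vip", "intent": "billing"}, "general_help"))  # transfer_to_specialist
```

Because routes are checked in order, the VIP rule wins even when the caller also matches the billing branch; order matters in the same way when laying out paths visually.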

Squads - multi-assistant orchestration

Squads allow coordinating multiple assistants in a single conversation. Each assistant has its own specialization, and transfers between them preserve the full conversation context.

Example scenario:

Receptionist - Greets the customer and identifies the need
Technical specialist - Resolves technical issues
Sales department - Presents the offer and finalizes the order

Each transfer is seamless - the next assistant knows what the conversation was about.
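Context preservation across a squad can be modeled as a shared transcript that each newly active assistant receives together with its own system prompt. This toy Python model illustrates the idea only; it is not Vapi's squad configuration API, and the class and member names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Squad:
    """Toy model of a squad: several assistants share one transcript."""
    members: dict[str, str]  # assistant name -> system prompt
    active: str
    transcript: list[dict] = field(default_factory=list)

    def say(self, role: str, text: str) -> None:
        self.transcript.append({"role": role, "content": text, "assistant": self.active})

    def transfer(self, to: str) -> list[dict]:
        """Hand the call to another member and return the messages it will see."""
        if to not in self.members:
            raise KeyError(f"No squad member named {to!r}")
        self.active = to
        # The new assistant gets its own system prompt plus the full history,
        # which lets it continue without re-asking the caller.
        return [{"role": "system", "content": self.members[to]}] + [
            {"role": m["role"], "content": m["content"]} for m in self.transcript
        ]


squad = Squad(
    members={
        "Receptionist": "Greet the caller and identify their need.",
        "Technical specialist": "Resolve technical issues.",
    },
    active="Receptionist",
)
squad.say("user", "My router keeps disconnecting.")
context = squad.transfer("Technical specialist")
print(context[-1]["content"])  # My router keeps disconnecting.
```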

Providers and integrations

LLM (language models)

Provider	Models
OpenAI	GPT-4o, GPT-4o-mini, o1
Anthropic	Claude Sonnet, Claude Haiku
Google	Gemini 1.5 Pro, Gemini Flash
Meta	Llama 3.1 (via Groq/Together)
Custom	Any model via Custom LLM URL

TTS (text-to-speech)

Provider	Description
ElevenLabs	Most realistic voices, voice cloning
Play.ht	Wide voice selection, competitive pricing
Azure TTS	Enterprise-grade, many languages
LMNT	Fast, low cost
Deepgram	Ultra-fast TTS

STT (speech-to-text)

Provider	Description
Deepgram	Fastest, best latency
Whisper (OpenAI)	Most accurate, more languages
Gladia	Good balance of speed and quality
Azure Speech	Enterprise, HIPAA-ready

Telephony

Provider	Description
Twilio	Most popular, global reach
Telnyx	Competitive pricing, good quality
BYOC	Bring Your Own Carrier - connect your own provider

Vapi Evals - testing agents

Vapi Evals is a framework for testing voice agents before deploying them to production. It lets you run simulated conversations and validate agent behavior.

Three validation methods

Exact match - Exact matching for deterministic responses
Regex patterns - Flexible patterns for variable formats
AI judges - Semantic evaluation by AI for complex responses
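The three validation methods can be sketched as plain functions. These are illustrations, not the Vapi Evals API: the judge argument is a hypothetical callable standing in for whatever LLM you use as an AI judge, and the PASS/FAIL prompt is an arbitrary convention.

```python
import re
from typing import Callable


def exact_match(response: str, expected: str) -> bool:
    """Deterministic replies: compare after trimming surrounding whitespace."""
    return response.strip() == expected.strip()


def regex_match(response: str, pattern: str) -> bool:
    """Variable formats, e.g. order numbers or dates."""
    return re.search(pattern, response) is not None


def ai_judge(response: str, rubric: str, judge: Callable[[str], str]) -> bool:
    """Semantic check: ask a judge model for a PASS/FAIL verdict."""
    verdict = judge(f"Rubric: {rubric}\nResponse: {response}\nAnswer PASS or FAIL.")
    return verdict.strip().upper().startswith("PASS")


reply = "Your order ORD-12345 will arrive tomorrow."
print(exact_match(reply, "Your order ORD-12345 will arrive tomorrow."))  # True
print(regex_match(reply, r"ORD-\d{5}"))  # True
# Stub judge so the example runs offline.
print(ai_judge(reply, "Mentions a delivery date", lambda prompt: "PASS"))  # True
```

The methods trade precision for flexibility: exact matches are cheap and strict, regexes tolerate variable values, and AI judges handle paraphrases at the cost of an extra model call.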

Testing webhooks locally via CLI

Bash

vapi listen --forward-to localhost:3000/webhook/vapi

This command starts a local listener that receives webhook events from Vapi and forwards them in real time to your development server, here the /webhook/vapi route used by the webhook examples in this guide.

MCP (Model Context Protocol)

Vapi supports MCP, allowing the assistant to dynamically use tools provided by MCP servers during a call. Instead of defining tools statically, the agent can discover and call tools at runtime.

Pricing

Base cost

Component	Cost
Vapi platform	$0.05/min
STT (Deepgram)	~$0.01/min
LLM (GPT-4o)	~$0.02-$0.20/min
TTS (ElevenLabs)	~$0.04/min
Telephony (Twilio)	~$0.01/min
Total	~$0.13-$0.31/min

Free credits

New users receive $10 in free credits. At a true cost of $0.13-$0.31/min, that covers roughly 32-77 minutes of conversation ($10 / $0.31 ≈ 32 min; $10 / $0.13 ≈ 77 min).
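The per-minute range in the pricing table, and the number of minutes the free credits cover, follow from simple arithmetic. The sketch below reuses the table's estimates (real provider prices vary and change over time) and uses Decimal to avoid floating-point rounding in currency math; the component keys are just labels.

```python
from decimal import Decimal

# Per-minute (low, high) estimates in USD, taken from the pricing table.
COSTS = {
    "vapi_platform": (Decimal("0.05"), Decimal("0.05")),
    "stt_deepgram": (Decimal("0.01"), Decimal("0.01")),
    "llm_gpt4o": (Decimal("0.02"), Decimal("0.20")),
    "tts_elevenlabs": (Decimal("0.04"), Decimal("0.04")),
    "telephony_twilio": (Decimal("0.01"), Decimal("0.01")),
}


def per_minute_range(costs):
    """Sum the low and high ends of every pipeline component."""
    low = sum(c[0] for c in costs.values())
    high = sum(c[1] for c in costs.values())
    return low, high


def minutes_covered(budget, costs):
    """Minutes a budget buys at the highest and the lowest per-minute cost."""
    low, high = per_minute_range(costs)
    return budget / high, budget / low


low, high = per_minute_range(COSTS)
print(low, high)  # 0.13 0.31
worst, best = minutes_covered(Decimal("10"), COSTS)
print(int(worst), int(best))  # 32 76 (whole minutes, truncated)
```

The LLM line dominates the spread: swapping GPT-4o for a cheaper model moves the total far more than any other component.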

SIP Lines

Every plan includes 10 concurrent SIP lines. Additional lines cost $10/month each.

HIPAA Compliance

HIPAA compliance is available as an add-on for $1,000/month.

Cost comparison

Platform	True cost/min	Free credits	HIPAA
Vapi	$0.13-$0.31	$10	$1,000/mo
Retell AI	$0.07-$0.15	Yes	Included
Bland AI	$0.09-$0.20	Yes	Included

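Practical applications

The examples below assume vapi is a server-side client (VapiClient from @vapi-ai/server-sdk) created with your private API key. The toolIds values are placeholders; use the IDs returned when you create each tool.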

Customer support bot

TypeScript

const assistant = await vapi.assistants.create({
  name: "Support Agent",
  firstMessage: "Hi! Welcome to Acme Support. How can I help you?",
  model: {
    provider: "openai",
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: `You are a customer support agent for Acme Corp.
You can check order status, process returns, and answer product questions.
Always be friendly and professional. If you cannot resolve an issue,
offer to transfer to a human agent.`,
      },
    ],
    toolIds: ["order-status-tool", "return-tool", "transfer-tool"],
  },
  voice: {
    provider: "11labs",
    voiceId: "21m00Tcm4TlvDq8ikWAM",
  },
});

Appointment scheduler

TypeScript

const assistant = await vapi.assistants.create({
  name: "Booking Agent",
  firstMessage: "Hello! I can help you schedule an appointment. What day works best for you?",
  model: {
    provider: "openai",
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: `You are an appointment scheduling assistant for a dental clinic.
Collect: patient name, preferred date/time, reason for visit.
Check availability using the check_slots tool before confirming.
Always confirm the final booking details before saving.`,
      },
    ],
    toolIds: ["check-slots-tool", "book-appointment-tool"],
  },
  voice: {
    provider: "11labs",
    voiceId: "pNInz6obpgDQGcFmaJgB",
  },
});

Lead qualification

TypeScript

const assistant = await vapi.assistants.create({
  name: "Sales Qualifier",
  firstMessage:
    "Hi! Thanks for your interest in our product. I'd love to learn more about your needs.",
  model: {
    provider: "anthropic",
    model: "claude-sonnet-4-20250514",
    messages: [
      {
        role: "system",
        content: `You are a sales qualification agent.
Ask about: company size, current solution, budget range, timeline.
Score the lead as hot/warm/cold based on responses.
Save qualification data using the save_lead tool.
If the lead is hot, offer to schedule a demo with a sales rep.`,
      },
    ],
    toolIds: ["save-lead-tool", "schedule-demo-tool"],
  },
  voice: {
    provider: "playht",
    voiceId: "jennifer",
  },
});

React integration

TypeScript

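import { useEffect, useState } from "react";
import Vapi from "@vapi-ai/web";

// Create one client outside the component so re-renders reuse it.
const vapi = new Vapi("YOUR_PUBLIC_API_KEY");

export function VoiceAssistant() {
  const [isActive, setIsActive] = useState(false);
  const [transcript, setTranscript] = useState<string[]>([]);

  // Subscribe once on mount and unsubscribe on unmount; registering listeners
  // in the render body would add a duplicate handler on every render.
  useEffect(() => {
    const onMessage = (message: any) => {
      if (message.type === "transcript" && message.transcriptType === "final") {
        setTranscript((prev) => [...prev, `${message.role}: ${message.transcript}`]);
      }
    };
    // Keep the button in sync if the call ends on the assistant's side.
    const onCallEnd = () => setIsActive(false);

    vapi.on("message", onMessage);
    vapi.on("call-end", onCallEnd);
    return () => {
      vapi.removeListener("message", onMessage);
      vapi.removeListener("call-end", onCallEnd);
    };
  }, []);

  const startCall = async () => {
    setIsActive(true);
    try {
      await vapi.start("YOUR_ASSISTANT_ID");
    } catch (err) {
      console.error("Failed to start call:", err);
      setIsActive(false);
    }
  };

  const endCall = () => {
    vapi.stop();
    setIsActive(false);
  };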

  return (
    <div className="flex flex-col items-center gap-4 p-6">
      <button
        onClick={isActive ? endCall : startCall}
        className={`px-6 py-3 rounded-full font-medium ${
          isActive
            ? "bg-red-500 hover:bg-red-600 text-white"
            : "bg-blue-500 hover:bg-blue-600 text-white"
        }`}
      >
        {isActive ? "End Call" : "Start Call"}
      </button>
      <div className="w-full max-w-md space-y-2">
        {transcript.map((line, i) => (
          <p key={i} className="text-sm text-gray-700 dark:text-gray-300">
            {line}
          </p>
        ))}
      </div>
    </div>
  );
}

Limitations and challenges

Complex pricing - Multi-layered cost structure (platform + STT + LLM + TTS + telephony) makes budgeting difficult
Developer-first - Requires programming skills, not suitable for teams without developers
Provider management - With your own keys, you manage 4-5 separate provider accounts, each with its own billing
Expensive HIPAA - $1,000/month for HIPAA compliance, while competitors include it in their pricing
Vendor lock-in - Despite BYO keys, orchestration logic is tied to the Vapi platform
Variable latency - ~700ms on average, but can increase depending on chosen providers

FAQ

Can I use my own AI models?

Yes, Vapi supports BYO (Bring Your Own) keys for every pipeline component, so you can use any supported LLM, TTS, or STT provider. If your model is hosted with a supported provider, just provide your API key. For custom models hosted elsewhere, point Vapi at an OpenAI-compatible endpoint via the Custom LLM URL.
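For the Custom LLM URL, Vapi is generally described as expecting an OpenAI-compatible chat completions endpoint that streams its reply. Assuming that contract, the core of such a server is formatting the reply as OpenAI-style chat.completion.chunk server-sent events, sketched below. The function name and model label are hypothetical, and the exact request path and fields should be checked against Vapi's Custom LLM documentation.

```python
import json
import time
import uuid
from typing import Iterator


def sse_chunks(text: str, model: str = "my-custom-model") -> Iterator[str]:
    """Yield a reply as OpenAI-style chat.completion.chunk server-sent events."""
    completion_id = f"chatcmpl-{uuid.uuid4().hex}"
    created = int(time.time())

    def event(delta: dict, finish_reason=None) -> str:
        chunk = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "created": created,
            "model": model,
            "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
        }
        return f"data: {json.dumps(chunk)}\n\n"

    # First chunk announces the role, as in OpenAI's streaming format.
    yield event({"role": "assistant"})
    # Stream word by word so TTS can start before the full reply exists.
    for i, word in enumerate(text.split(" ")):
        yield event({"content": word if i == 0 else " " + word})
    yield event({}, finish_reason="stop")
    yield "data: [DONE]\n\n"


for line in sse_chunks("Your order has shipped."):
    print(line, end="")
```

In a Flask app, for example, the generator could be returned as Response(sse_chunks(reply), mimetype="text/event-stream"), where reply comes from your own model.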

How much does a minute of conversation cost?

Vapi's base fee is $0.05/min, but the true cost is $0.13-$0.31/min after adding STT, LLM, TTS, and telephony costs. The exact amount depends on your chosen providers.

Does Vapi support Polish?

Yes, Vapi supports over 100 languages, including Polish. Quality depends on your chosen STT and TTS providers - Whisper (OpenAI) and ElevenLabs offer good support for Polish.

How quickly can I build my first agent?

Registration takes about 10 minutes. Configuring your first agent takes another 20-30 minutes. Within an hour, you can have a working voice agent that answers phones and holds natural conversations.

Is Vapi open source?

No, Vapi is a closed platform. Client SDKs (Web, Python) are available on GitHub, but the orchestration engine itself is proprietary. If you're looking for an open-source alternative, check out Vocode.

When should I choose Vapi over Retell AI or Bland AI?

Choose Vapi when you need maximum control over the voice pipeline, want to use custom models, are building a developer product, or need Flow Studio for visual conversation design. Retell is better for simpler deployments and healthcare, while Bland excels at large-scale outbound campaigns.

Summary

Vapi is one of the most flexible platforms for building voice AI agents if you have a development team and need full control over every element of the pipeline. The modular BYO keys architecture, Flow Studio, tool calling, and support for 100+ languages mean you can build exactly the agent you need.

The main trade-offs are complex pricing (realistically $0.13-$0.31/min, versus simpler competitor pricing models) and the need for technical skills. For teams with developers building advanced voice solutions, Vapi is a strong default choice.