Vapi - complete guide to the voice AI agent platform
What is Vapi?
Vapi is a developer platform for building, testing, and deploying voice AI agents. Instead of connecting speech-to-text, language models, and text-to-speech yourself, Vapi provides a unified pipeline that handles the entire conversation cycle: listening, thinking, and speaking.
The platform has processed over 300 million calls and launched 2.5 million assistants, and it is used by more than 500,000 developers, from startups to Fortune 500 companies. You can build your first voice agent in under an hour, and thanks to the modular architecture, you can choose any LLM, TTS, and STT provider.
Why Vapi?
Key advantages of Vapi
- Low latency - Vapi targets sub-500ms voice-to-voice response times so conversations sound natural; real-world latency depends on the chosen providers, with around 700ms being typical
- BYO Keys - Bring your own API keys for any provider (OpenAI, Anthropic, ElevenLabs, Deepgram, and more)
- Flow Studio - Visual builder for designing conversation logic with drag-and-drop
- Tool calling - Agents can call your APIs mid-conversation (bookings, CRM, databases)
- Multilingual - Over 100 languages and accents
- Scalability - Handles millions of concurrent calls
- SDKs for every platform - Web, iOS, Android, Python, and backend
Vapi vs Retell AI vs Bland AI
| Feature | Vapi | Retell AI | Bland AI |
|---|---|---|---|
| Latency | ~700ms | ~600ms | ~800ms |
| Base cost | $0.05/min (+ providers) | ~$0.07/min (all-in) | ~$0.09/min |
| True cost | $0.13-$0.31/min | ~$0.07-$0.15/min | ~$0.09-$0.20/min |
| BYO models | Full support | Limited | Full support |
| Visual builder | Flow Studio | Yes | Pathways builder |
| Open source | No | No | No |
| HIPAA | $1,000/mo add-on | Included | Included |
| SDK | Web, iOS, Android, Python | Web, Python | REST API |
| Scale | 1M+ concurrent calls | Unlimited | 20,000+ calls/hour |
| Best for | Custom builds, dev-heavy | Inbound support, healthcare | High-volume outbound |
Architecture - how does Vapi work?
Vapi acts as an orchestration layer between your application and AI providers. The entire conversation cycle is based on three steps:
1. Listen
Speech-to-text (STT) converts the caller's voice into text. Vapi supports providers like Deepgram, Whisper (OpenAI), Gladia, and Azure Speech.
2. Think
The LLM processes the transcription and generates a response. You can use GPT-4o, Claude, Gemini, Llama, or any other model. In this step, the agent can also call tools (tool calling).
3. Speak
Text-to-speech (TTS) converts the response into voice. Supported providers include ElevenLabs, Play.ht, Azure TTS, LMNT, and others.
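To make the modular design concrete, here is a hypothetical sketch of one Listen, Think, Speak turn with swappable stages. It is not Vapi's internal code: the `SpeechToText`, `LanguageModel`, and `TextToSpeech` interfaces are made up for illustration, and the fake classes stand in for real providers such as Deepgram, GPT-4o, or ElevenLabs.

```python
from dataclasses import dataclass, field
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, history: list[dict[str, str]]) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoicePipeline:
    """One conversational turn: listen (STT) -> think (LLM) -> speak (TTS)."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech
    history: list[dict[str, str]] = field(default_factory=list)

    def handle_turn(self, caller_audio: bytes) -> bytes:
        text = self.stt.transcribe(caller_audio)              # 1. Listen
        self.history.append({"role": "user", "content": text})
        answer = self.llm.reply(self.history)                 # 2. Think
        self.history.append({"role": "assistant", "content": answer})
        return self.tts.synthesize(answer)                    # 3. Speak


# Stand-in providers for illustration only.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class EchoLLM:
    def reply(self, history: list[dict[str, str]]) -> str:
        return f"You said: {history[-1]['content']}"


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


pipeline = VoicePipeline(FakeSTT(), EchoLLM(), FakeTTS())
print(pipeline.handle_turn(b"Where is my order?"))  # b'You said: Where is my order?'
```

Because each stage depends only on an interface, replacing one provider leaves the other two untouched, which is the idea behind bringing your own keys.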
Advanced conversation features
- Endpointing - Detects when the caller has finished speaking
- Interrupt detection - Allows the caller to interrupt the agent while speaking
- Backchanneling - Inserts short acknowledgments ("okay", "I see") while the caller is speaking, so the agent sounds attentive
- Emotion detection - Identifies tone signals such as urgency or frustration
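Endpointing boils down to a timing decision over a stream of voice-activity flags. The sketch below is a deliberately simplified, hypothetical version that ends the turn after a fixed stretch of silence. It is not Vapi's algorithm: the per-frame `is_speech` flag is assumed to come from a voice activity detector, and the 20 ms frame and 600 ms threshold are illustrative values.

```python
from dataclasses import dataclass, field


@dataclass
class SilenceEndpointer:
    """Declares end-of-turn after `silence_ms` of continuous non-speech audio."""
    frame_ms: int = 20       # duration of one audio frame (assumed)
    silence_ms: int = 600    # trailing silence required to end a turn (assumed)
    heard_speech: bool = field(default=False, init=False)
    silent_for_ms: int = field(default=0, init=False)

    def push(self, is_speech: bool) -> bool:
        """Feed one frame's voice-activity flag; return True once the turn has ended."""
        if is_speech:
            self.heard_speech = True
            self.silent_for_ms = 0     # any speech resets the silence timer
            return False
        if not self.heard_speech:
            return False               # leading silence never ends a turn
        self.silent_for_ms += self.frame_ms
        return self.silent_for_ms >= self.silence_ms


# 200 ms of speech followed by 600 ms of silence, in 20 ms frames.
frames = [True] * 10 + [False] * 30
endpointer = SilenceEndpointer()
ended_at = next(i for i, is_speech in enumerate(frames) if endpointer.push(is_speech))
print(ended_at)  # 39
```

Production endpointers often weigh more signals than silence alone; in this simple version, a longer threshold cuts off fewer callers mid-sentence at the cost of slower replies.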
Getting started
SDK installation
```bash
npm install @vapi-ai/web          # browser (client) SDK
npm install @vapi-ai/server-sdk   # Node.js server SDK
pip install vapi_server_sdk       # Python server SDK
```
CLI installation
```bash
curl -sSL https://vapi.ai/install.sh | bash
```
The CLI automatically detects your tech stack (React, Vue, Next.js, Python, Go, Flutter, React Native) and generates code examples tailored to your project.
Creating an assistant
Assistants are created server-side with your private API key; the public key used by the Web SDK is meant for starting calls from the browser.
```javascript
import { VapiClient } from "@vapi-ai/server-sdk";

// Server-side client authenticated with your private API key.
const vapi = new VapiClient({ token: process.env.VAPI_API_KEY });

const assistant = await vapi.assistants.create({
  name: "Customer Support Agent",
  firstMessage: "Hello! How can I help you today?",
  model: {
    provider: "openai",
    model: "gpt-4o",
    temperature: 0.7,
    messages: [
      {
        role: "system",
        content:
          "You are a friendly customer support agent. Help users with their orders and account questions.",
      },
    ],
  },
  voice: {
    provider: "11labs",
    voiceId: "21m00Tcm4TlvDq8ikWAM",
  },
});

console.log(`Assistant created: ${assistant.id}`);
```
```python
import os

from vapi import Vapi

vapi = Vapi(token=os.getenv("VAPI_API_KEY"))

assistant = vapi.assistants.create(
    name="Customer Support Agent",
    first_message="Hello! How can I help you today?",
    model={
        "provider": "openai",
        "model": "gpt-4o",
        "temperature": 0.7,
        "messages": [
            {
                "role": "system",
                "content": "You are a friendly customer support agent. Help users with their orders and account questions.",
            }
        ],
    },
    voice={
        "provider": "11labs",
        "voiceId": "21m00Tcm4TlvDq8ikWAM",
    },
)
print(f"Assistant created: {assistant.id}")
```
Starting a call (Web)
```javascript
import Vapi from "@vapi-ai/web";

// The Web SDK uses your public API key, which is safe to expose in the browser.
const vapi = new Vapi("YOUR_PUBLIC_API_KEY");

// Register event listeners before starting the call so no events are missed.
vapi.on("call-start", () => {
  console.log("Call connected");
});

vapi.on("call-end", () => {
  console.log("Call ended");
});

vapi.on("message", (message) => {
  if (message.type === "transcript") {
    console.log(`${message.role}: ${message.transcript}`);
  }
});

vapi.on("error", (error) => {
  console.error("Call error:", error);
});

vapi.start("YOUR_ASSISTANT_ID");
```
Starting a phone call (Server)
```python
import os

from vapi import Vapi

vapi = Vapi(token=os.getenv("VAPI_API_KEY"))

# Outbound call: Vapi dials the customer from one of your phone numbers.
call = vapi.calls.create(
    phone_number_id="YOUR_PHONE_NUMBER_ID",
    customer={"number": "+1234567890"},  # E.164 format
    assistant_id="YOUR_ASSISTANT_ID",
)
print(f"Call started: {call.id}")
```
Tool calling
Tool calling is one of Vapi's most powerful features. It allows the agent to call your APIs mid-conversation - check order status, book appointments, update CRM, or fetch data from databases.
Creating a tool via API
```bash
curl -X POST 'https://api.vapi.ai/tool' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "type": "function",
    "function": {
      "name": "get_order_status",
      "description": "Check the current status of a customer order",
      "parameters": {
        "type": "object",
        "properties": {
          "order_id": {
            "type": "string",
            "description": "The unique order identifier"
          }
        },
        "required": ["order_id"]
      }
    },
    "server": {
      "url": "https://your-api.com/webhook/vapi"
    }
  }'
```
Adding a tool to an assistant
Use the `id` returned when the tool was created in place of `your-tool-id`:
```bash
curl -X PATCH 'https://api.vapi.ai/assistant/ASSISTANT_ID' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": {
      "provider": "openai",
      "model": "gpt-4o",
      "toolIds": ["your-tool-id"]
    }
  }'
```
Managing tools via CLI
```bash
vapi tool list
vapi tool get <tool-id>
vapi tool create
vapi tool test <tool-id>
vapi tool delete <tool-id>
```
Handling a tool call on the server
When the agent calls a tool, Vapi sends a request to your server:
```json
{
  "message": {
    "type": "tool-calls",
    "toolCallList": [
      {
        "id": "toolu_01DTPAzUm5Gk3zxrpJ969oMF",
        "name": "get_order_status",
        "arguments": {
          "order_id": "ORD-12345"
        }
      }
    ]
  }
}
```
Your server must return the result with the matching toolCallId:
```json
{
  "results": [
    {
      "toolCallId": "toolu_01DTPAzUm5Gk3zxrpJ969oMF",
      "result": "Order ORD-12345 is currently being shipped. Expected delivery: tomorrow."
    }
  ]
}
```
Implementing webhooks
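Whatever framework you use, the webhook follows the same contract: read `message.toolCallList`, run the matching function, and answer with one `{toolCallId, result}` entry per call. The following framework-independent sketch of that dispatch step assumes the payload shape shown above; the `HANDLERS` registry and the `get_order_status` function are hypothetical stand-ins for your own code.

```python
import json
from typing import Any, Callable

# Hypothetical registry: tool name -> function that receives the parsed arguments.
ToolHandler = Callable[[dict[str, Any]], str]


def get_order_status(args: dict[str, Any]) -> str:
    # Stand-in for a real database lookup.
    return f"Order {args['order_id']} is currently being shipped."


HANDLERS: dict[str, ToolHandler] = {"get_order_status": get_order_status}


def handle_tool_calls(body: dict[str, Any]) -> dict[str, Any]:
    """Build the {"results": [...]} response for a "tool-calls" message."""
    message = body.get("message", {})
    if message.get("type") != "tool-calls":
        return {"received": True}  # acknowledge other message types

    results = []
    for call in message.get("toolCallList", []):
        args = call.get("arguments", {})
        if isinstance(args, str):  # defensive (assumption): accept JSON-encoded arguments too
            args = json.loads(args)
        handler = HANDLERS.get(call["name"])
        try:
            result = handler(args) if handler else f"Unknown tool: {call['name']}"
        except Exception as exc:  # report failures to the agent instead of crashing
            result = f"Error running {call['name']}: {exc}"
        results.append({"toolCallId": call["id"], "result": result})
    return {"results": results}


request_body = {
    "message": {
        "type": "tool-calls",
        "toolCallList": [
            {
                "id": "toolu_01DTPAzUm5Gk3zxrpJ969oMF",
                "name": "get_order_status",
                "arguments": {"order_id": "ORD-12345"},
            }
        ],
    }
}
print(json.dumps(handle_tool_calls(request_body), indent=2))
```

Keeping dispatch separate from the HTTP layer makes it easy to unit-test tool logic without running a server.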
```javascript
import express from "express";
// Hypothetical module: replace with your own order database and calendar clients.
import { db, calendar } from "./services.js";

const app = express();
app.use(express.json());

app.post("/webhook/vapi", async (req, res) => {
  const { message } = req.body;

  if (message?.type === "tool-calls") {
    // Run all requested tools in parallel; every result must echo its toolCallId.
    const results = await Promise.all(
      message.toolCallList.map(async (toolCall) => {
        try {
          if (toolCall.name === "get_order_status") {
            const order = await db.orders.findById(toolCall.arguments.order_id);
            return {
              toolCallId: toolCall.id,
              result: order
                ? `Order ${order.id}: ${order.status}. ${order.trackingInfo}`
                : `No order found with ID ${toolCall.arguments.order_id}.`,
            };
          }
          if (toolCall.name === "book_appointment") {
            const booking = await calendar.createEvent(toolCall.arguments);
            return {
              toolCallId: toolCall.id,
              result: `Appointment booked for ${booking.date} at ${booking.time}.`,
            };
          }
          return { toolCallId: toolCall.id, result: `Unknown tool: ${toolCall.name}` };
        } catch (err) {
          // Return a readable error so the assistant can tell the caller something went wrong.
          return { toolCallId: toolCall.id, result: `Error running ${toolCall.name}: ${err.message}` };
        }
      })
    );
    return res.json({ results });
  }

  // Acknowledge all other server messages.
  res.json({ received: true });
});

app.listen(3000);
```
```python
from flask import Flask, jsonify, request

# Hypothetical module: replace with your own order database and calendar clients.
from services import calendar, db

app = Flask(__name__)


@app.route("/webhook/vapi", methods=["POST"])
def handle_webhook():
    message = (request.get_json(silent=True) or {}).get("message", {})

    if message.get("type") == "tool-calls":
        results = []
        for tool_call in message.get("toolCallList", []):
            name = tool_call["name"]
            args = tool_call.get("arguments", {})
            if name == "get_order_status":
                order = db.orders.find_by_id(args["order_id"])
                result = f"Order {order.id}: {order.status}. {order.tracking_info}"
            elif name == "book_appointment":
                booking = calendar.create_event(args)
                result = f"Appointment booked for {booking.date} at {booking.time}."
            else:
                result = f"Unknown tool: {name}"
            # Every result must echo the toolCallId it answers.
            results.append({"toolCallId": tool_call["id"], "result": result})
        return jsonify({"results": results})

    # Acknowledge all other server messages.
    return jsonify({"received": True})


if __name__ == "__main__":
    app.run(port=3000)
```
Flow Studio - visual builder
Flow Studio is a visual builder for designing conversation logic without writing code. It lets you build complex conversation scenarios with drag-and-drop.
Flow Studio capabilities
- Branching prompts - Different conversation paths depending on responses
- Conditional paths - Logic conditions (if VIP customer → route to specialist)
- Error fallback - Automatic handling of errors and unexpected responses
- Webhook triggers - Calling external APIs at any point in the conversation
- Transfer calls - Redirecting to a live agent while preserving context
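Flow Studio expresses these paths visually, but the underlying idea is an ordered list of conditions checked against what the conversation has collected so far. Below is a hypothetical sketch of a "VIP customer → specialist" rule with an error fallback; the node names and the `is_vip` and `intent` fields are illustrative, and this is not Flow Studio's internal or export format.

```python
from dataclasses import dataclass
from typing import Any, Callable

Condition = Callable[[dict[str, Any]], bool]


@dataclass
class Route:
    condition: Condition
    target: str  # name of the next conversation node


def next_node(routes: list[Route], context: dict[str, Any], fallback: str) -> str:
    """Return the target of the first route whose condition holds, else the fallback node."""
    for route in routes:
        try:
            if route.condition(context):
                return route.target
        except (KeyError, TypeError):
            continue  # missing or malformed data: try the next route
    return fallback


routes = [
    Route(lambda c: c["is_vip"], "transfer_to_specialist"),
    Route(lambda c: c["intent"] == "billing", "billing_questions"),
]
print(next_node(routes, {"is_vip": True, "intent": "billing"}, "general_help"))
```

Order matters: the VIP rule is checked first, so a VIP with a billing question still reaches the specialist.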
Squads - multi-assistant orchestration
Squads allow coordinating multiple assistants in a single conversation. Each assistant has its own specialization, and transfers between them preserve the full conversation context.
Example scenario:
- Receptionist - Greets the customer and identifies the need
- Technical specialist - Resolves technical issues
- Sales department - Presents the offer and finalizes the order
Each transfer is seamless - the next assistant knows what the conversation was about.
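Conceptually, a squad transfer changes which assistant is speaking while keeping a single shared transcript. The following hypothetical sketch models just that idea; `Member`, `Squad`, and `context_for_active` are illustrative names, not Vapi's Squads API.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    system_prompt: str


@dataclass
class Squad:
    members: dict[str, Member]
    active: str
    transcript: list[tuple[str, str]] = field(default_factory=list)  # (speaker, text)

    def say(self, speaker: str, text: str) -> None:
        self.transcript.append((speaker, text))

    def transfer(self, to: str) -> Member:
        """Hand the call to another member; the transcript is shared, not reset."""
        if to not in self.members:
            raise KeyError(f"No squad member named {to!r}")
        self.active = to
        return self.members[to]

    def context_for_active(self) -> list[dict[str, str]]:
        """Messages the active assistant receives: its own prompt plus the full history."""
        history = [
            {"role": "user" if who == "customer" else "assistant", "content": text}
            for who, text in self.transcript
        ]
        return [{"role": "system", "content": self.members[self.active].system_prompt}, *history]


squad = Squad(
    members={
        "receptionist": Member("receptionist", "Greet the caller and identify the need."),
        "tech": Member("tech", "Resolve technical issues."),
        "sales": Member("sales", "Present the offer and finalize the order."),
    },
    active="receptionist",
)
squad.say("customer", "My router keeps disconnecting.")
squad.say("receptionist", "Let me connect you with a technical specialist.")
squad.transfer("tech")
for msg in squad.context_for_active():
    print(msg["role"], "->", msg["content"])
```

Only the system prompt changes on transfer; the history the technical specialist sees is the same one the receptionist built.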
Providers and integrations
LLM (language models)
| Provider | Models |
|---|---|
| OpenAI | GPT-4o, GPT-4o-mini, o1 |
| Anthropic | Claude Sonnet, Claude Haiku |
| Google | Gemini 1.5 Pro, Gemini Flash |
| Meta | Llama 3.1 (via Groq/Together) |
| Custom | Any model via Custom LLM URL |
TTS (text-to-speech)
| Provider | Description |
|---|---|
| ElevenLabs | Most realistic voices, voice cloning |
| Play.ht | Wide voice selection, competitive pricing |
| Azure TTS | Enterprise-grade, many languages |
| LMNT | Fast, low cost |
| Deepgram | Ultra-fast TTS |
STT (speech-to-text)
| Provider | Description |
|---|---|
| Deepgram | Fastest, best latency |
| Whisper (OpenAI) | Most accurate, more languages |
| Gladia | Good balance of speed and quality |
| Azure Speech | Enterprise, HIPAA-ready |
Telephony
| Provider | Description |
|---|---|
| Twilio | Most popular, global reach |
| Telnyx | Competitive pricing, good quality |
| BYOC | Bring Your Own Carrier - connect your own provider |
Vapi Evals - testing agents
Vapi Evals is a framework for testing voice agents before deploying them to production. It lets you create simulated conversations and validate the agent's behavior.
Three validation methods
- Exact match - Exact matching for deterministic responses
- Regex patterns - Flexible patterns for variable formats
- AI judges - Semantic evaluation by AI for complex responses
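The three methods differ only in how an agent response is compared against an expectation. A hypothetical sketch of such checks follows; it is not the Vapi Evals API, and the `judge` callable, which would normally prompt an LLM with a rubric, is stubbed here with a simple keyword check.

```python
import re
from typing import Callable

Judge = Callable[[str, str], bool]  # (rubric, response) -> pass/fail


def exact_match(expected: str, response: str) -> bool:
    # Only surrounding whitespace is ignored; everything else must match exactly.
    return response.strip() == expected.strip()


def regex_match(pattern: str, response: str) -> bool:
    return re.search(pattern, response) is not None


def ai_judge(rubric: str, response: str, judge: Judge) -> bool:
    # In practice `judge` would ask an LLM whether the response satisfies the rubric.
    return judge(rubric, response)


def keyword_judge(rubric: str, response: str) -> bool:
    """Stub grader: passes if every word after 'mentions:' appears in the response."""
    required = rubric.split("mentions:", 1)[1].split()
    return all(word.lower() in response.lower() for word in required)


response = "Your order ORD-12345 ships tomorrow. Anything else I can help with?"
checks = {
    "exact": exact_match("Hello! How can I help you today?", "Hello! How can I help you today? "),
    "regex": regex_match(r"ORD-\d{5}", response),
    "ai_judge": ai_judge("Response mentions: order tomorrow", response, keyword_judge),
}
print(checks)  # {'exact': True, 'regex': True, 'ai_judge': True}
```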
Testing webhooks locally via CLI
```bash
vapi listen --forward-to localhost:3000/webhook/vapi
```
This command starts a local listener that receives webhook events from Vapi and forwards them to your development server in real time.
MCP (Model Context Protocol)
Vapi supports MCP, allowing the assistant to dynamically use tools provided by MCP servers during a call. Instead of defining tools statically, the agent can discover and call tools at runtime.
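Under MCP, runtime discovery and invocation are two JSON-RPC 2.0 requests, `tools/list` and `tools/call`, as defined by the Model Context Protocol specification. The sketch below only builds and parses those messages with no transport; the `lookup_order` tool and its schema are hypothetical, and how Vapi connects an assistant to an MCP server is configured on the Vapi side and not shown here.

```python
import itertools
import json
from typing import Any, Optional

_request_ids = itertools.count(1)


def rpc_request(method: str, params: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request object, the message format MCP uses."""
    request: dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def tool_names(list_response: dict[str, Any]) -> list[str]:
    """Names of the tools advertised in a tools/list result."""
    return [tool["name"] for tool in list_response["result"]["tools"]]


def call_text(call_response: dict[str, Any]) -> str:
    """Concatenate the text content blocks of a tools/call result."""
    return "\n".join(
        block["text"] for block in call_response["result"]["content"] if block.get("type") == "text"
    )


# 1. Discover tools at runtime.
list_req = rpc_request("tools/list")
# What a server might answer (hypothetical `lookup_order` tool).
list_resp = {
    "jsonrpc": "2.0",
    "id": list_req["id"],
    "result": {"tools": [{
        "name": "lookup_order",
        "description": "Look up an order by ID",
        "inputSchema": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    }]},
}

# 2. Call a discovered tool by name with arguments matching its inputSchema.
call_req = rpc_request(
    "tools/call", {"name": tool_names(list_resp)[0], "arguments": {"order_id": "ORD-12345"}}
)
call_resp = {
    "jsonrpc": "2.0",
    "id": call_req["id"],
    "result": {"content": [{"type": "text", "text": "ORD-12345: shipped"}], "isError": False},
}

print(json.dumps(call_req))
print(call_text(call_resp))  # ORD-12345: shipped
```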
Pricing
Base cost
| Component | Cost |
|---|---|
| Vapi platform | $0.05/min |
| STT (Deepgram) | ~$0.01/min |
| LLM (GPT-4o) | ~$0.02-$0.20/min |
| TTS (ElevenLabs) | ~$0.04/min |
| Telephony (Twilio) | ~$0.01/min |
| Total | ~$0.13-$0.31/min |
Free credits
New users receive $10 in free credits. At a true cost of $0.13-$0.31/min, that's roughly 32-77 minutes of conversation.
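The Total row is the sum of the component rows, taking the low and high end of each range, and the free-credit figure divides $10 by those totals. A small sketch of that arithmetic follows, using `Decimal` to avoid floating-point rounding on money. The component prices are the approximate figures from the table above, not live rates, and minutes are rounded down to whole minutes.

```python
from decimal import ROUND_DOWN, Decimal

# (low, high) USD per minute, from the pricing table.
COMPONENTS = {
    "Vapi platform": (Decimal("0.05"), Decimal("0.05")),
    "STT (Deepgram)": (Decimal("0.01"), Decimal("0.01")),
    "LLM (GPT-4o)": (Decimal("0.02"), Decimal("0.20")),
    "TTS (ElevenLabs)": (Decimal("0.04"), Decimal("0.04")),
    "Telephony (Twilio)": (Decimal("0.01"), Decimal("0.01")),
}


def per_minute_range(components: dict[str, tuple[Decimal, Decimal]]) -> tuple[Decimal, Decimal]:
    """Sum the low ends and the high ends of every component's price range."""
    low = sum((lo for lo, _ in components.values()), Decimal("0"))
    high = sum((hi for _, hi in components.values()), Decimal("0"))
    return low, high


def minutes_for_credit(credit: Decimal, rate: Decimal) -> Decimal:
    """Whole minutes a credit balance covers at a given per-minute rate."""
    return (credit / rate).to_integral_value(rounding=ROUND_DOWN)


low, high = per_minute_range(COMPONENTS)
credit = Decimal("10")
print(f"${low}-${high} per minute")  # $0.13-$0.31 per minute
print(f"{minutes_for_credit(credit, high)}-{minutes_for_credit(credit, low)} free minutes")  # 32-76 free minutes
```

The LLM row dominates the spread, so the model choice is the main lever on the final per-minute price.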
SIP Lines
Every plan includes 10 concurrent SIP lines. Additional lines cost $10/month each.
HIPAA Compliance
HIPAA compliance is available as an add-on for $1,000/month.
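For budgeting, the per-minute rate, extra SIP lines, and the optional HIPAA add-on combine into a monthly figure. A rough sketch follows: the $0.13-$0.31 rate, the 10 included lines at $10/month per extra line, and the $1,000/month HIPAA fee come from this section, while the call volume and line count in the example are made up.

```python
from decimal import Decimal

INCLUDED_SIP_LINES = 10
EXTRA_SIP_LINE_USD = Decimal("10")   # per additional line, per month
HIPAA_ADDON_USD = Decimal("1000")    # per month, optional


def monthly_cost(minutes: int, rate_per_min: Decimal, sip_lines: int, hipaa: bool = False) -> Decimal:
    """Estimated monthly spend: usage + SIP lines beyond the included 10 + optional HIPAA add-on."""
    usage = rate_per_min * minutes
    extra_lines = max(0, sip_lines - INCLUDED_SIP_LINES) * EXTRA_SIP_LINE_USD
    return usage + extra_lines + (HIPAA_ADDON_USD if hipaa else Decimal("0"))


# Hypothetical workload: 20,000 minutes/month, 25 concurrent lines, HIPAA required.
for rate in (Decimal("0.13"), Decimal("0.31")):
    print(rate, monthly_cost(20_000, rate, sip_lines=25, hipaa=True))
```

At this volume the per-minute usage outweighs the fixed add-ons, while at low volumes the $1,000 HIPAA fee can dominate the bill.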
Cost comparison
| Platform | True cost/min | Free credits | HIPAA |
|---|---|---|---|
| Vapi | $0.13-$0.31 | $10 | $1,000/mo |
| Retell AI | $0.07-$0.15 | Yes | Included |
| Bland AI | $0.09-$0.20 | Yes | Included |
Practical applications
Customer support bot
```javascript
// `vapi` is the client created in "Creating an assistant".
const assistant = await vapi.assistants.create({
  name: "Support Agent",
  firstMessage: "Hi! Welcome to Acme Support. How can I help you?",
  model: {
    provider: "openai",
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: `You are a customer support agent for Acme Corp.
You can check order status, process returns, and answer product questions.
Always be friendly and professional. If you cannot resolve an issue,
offer to transfer to a human agent.`,
      },
    ],
    // Placeholder IDs: use the IDs returned when you created each tool.
    toolIds: ["order-status-tool", "return-tool", "transfer-tool"],
  },
  voice: {
    provider: "11labs",
    voiceId: "21m00Tcm4TlvDq8ikWAM",
  },
});
```
Appointment scheduler
```javascript
const assistant = await vapi.assistants.create({
  name: "Booking Agent",
  firstMessage: "Hello! I can help you schedule an appointment. What day works best for you?",
  model: {
    provider: "openai",
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: `You are an appointment scheduling assistant for a dental clinic.
Collect: patient name, preferred date/time, reason for visit.
Check availability using the check_slots tool before confirming.
Always confirm the final booking details before saving.`,
      },
    ],
    // Placeholder IDs: use the IDs returned when you created each tool.
    toolIds: ["check-slots-tool", "book-appointment-tool"],
  },
  voice: {
    provider: "11labs",
    voiceId: "pNInz6obpgDQGcFmaJgB",
  },
});
```
Lead qualification
```javascript
const assistant = await vapi.assistants.create({
  name: "Sales Qualifier",
  firstMessage:
    "Hi! Thanks for your interest in our product. I'd love to learn more about your needs.",
  model: {
    provider: "anthropic",
    model: "claude-sonnet-4-20250514",
    messages: [
      {
        role: "system",
        content: `You are a sales qualification agent.
Ask about: company size, current solution, budget range, timeline.
Score the lead as hot/warm/cold based on responses.
Save qualification data using the save_lead tool.
If the lead is hot, offer to schedule a demo with a sales rep.`,
      },
    ],
    // Placeholder IDs: use the IDs returned when you created each tool.
    toolIds: ["save-lead-tool", "schedule-demo-tool"],
  },
  voice: {
    provider: "playht",
    voiceId: "jennifer",
  },
});
```
React integration
```tsx
import { useEffect, useState } from "react";
import Vapi from "@vapi-ai/web";

// One client for the whole app, created outside the component so re-renders don't recreate it.
const vapi = new Vapi("YOUR_PUBLIC_API_KEY");

export function VoiceAssistant() {
  const [isActive, setIsActive] = useState(false);
  const [transcript, setTranscript] = useState<string[]>([]);

  // Register listeners once and remove them on unmount. Registering them in the
  // component body would add a duplicate listener on every render.
  useEffect(() => {
    const onCallEnd = () => setIsActive(false);
    const onMessage = (message: any) => {
      if (message.type === "transcript" && message.transcriptType === "final") {
        setTranscript((prev) => [...prev, `${message.role}: ${message.transcript}`]);
      }
    };

    vapi.on("call-end", onCallEnd);
    vapi.on("message", onMessage);
    return () => {
      vapi.removeListener("call-end", onCallEnd);
      vapi.removeListener("message", onMessage);
    };
  }, []);

  const startCall = async () => {
    setIsActive(true);
    await vapi.start("YOUR_ASSISTANT_ID");
  };

  const endCall = () => {
    vapi.stop();
    setIsActive(false);
  };

  return (
    <div className="flex flex-col items-center gap-4 p-6">
      <button
        onClick={isActive ? endCall : startCall}
        className={`px-6 py-3 rounded-full font-medium ${
          isActive
            ? "bg-red-500 hover:bg-red-600 text-white"
            : "bg-blue-500 hover:bg-blue-600 text-white"
        }`}
      >
        {isActive ? "End Call" : "Start Call"}
      </button>
      <div className="w-full max-w-md space-y-2">
        {transcript.map((line, i) => (
          <p key={i} className="text-sm text-gray-700 dark:text-gray-300">
            {line}
          </p>
        ))}
      </div>
    </div>
  );
}
```
Limitations and challenges
- Complex pricing - Multi-layered cost structure (platform + STT + LLM + TTS + telephony) makes budgeting difficult
- Developer-first - Requires programming skills, not suitable for teams without developers
- Provider management - You must manage 4-5 separate providers with separate billing
- Expensive HIPAA - $1,000/month for HIPAA compliance, while competitors include it in their pricing
- Vendor lock-in - Despite BYO keys, orchestration logic is tied to the Vapi platform
- Variable latency - ~700ms on average, but can increase depending on chosen providers
FAQ
Can I use my own AI models?
Yes, Vapi supports BYO (Bring Your Own) keys for all pipeline components. You can use any LLM, TTS, and STT. If your model is hosted with a supported provider, just provide your API key. For custom models hosted elsewhere, use the Custom LLM URL.
How much does a minute of conversation cost?
Vapi's base fee is $0.05/min, but the true cost is $0.13-$0.31/min after adding STT, LLM, TTS, and telephony costs. The exact amount depends on your chosen providers.
Does Vapi support Polish?
Yes, Vapi supports over 100 languages, including Polish. Quality depends on your chosen STT and TTS providers - Whisper (OpenAI) and ElevenLabs offer good support for Polish.
How quickly can I build my first agent?
Registration takes about 10 minutes. Configuring your first agent takes another 20-30 minutes. Within an hour, you can have a working voice agent that answers phones and holds natural conversations.
Is Vapi open source?
No, Vapi is a closed platform. Client SDKs (Web, Python) are available on GitHub, but the orchestration engine itself is proprietary. If you're looking for an open-source alternative, check out Vocode.
When should I choose Vapi over Retell AI or Bland AI?
Choose Vapi when you need maximum control over the voice pipeline, want to use custom models, are building a developer product, or need Flow Studio for visual conversation design. Retell is better for simpler deployments and healthcare, while Bland excels at large-scale outbound campaigns.
Summary
Vapi is the most powerful platform for building voice AI agents if you have a development team and need full control over every pipeline element. The modular BYO keys architecture, Flow Studio, tool calling, and support for 100+ languages mean you can build exactly the agent you need.
The main trade-offs are complex pricing (realistically $0.13-$0.31/min, compared with competitors' simpler pricing) and the need for technical skills. For teams with developers building advanced voice solutions, Vapi is the industry standard.