CodeWorlds

Vapi

Vapi is a developer platform for building voice AI agents. Guide to SDK, API, Flow Studio, tool calling, webhooks, and integration with LLM, TTS, and STT providers.

Vapi - complete guide to the voice AI agent platform

What is Vapi?

Vapi is a developer platform for building, testing, and deploying voice AI agents. Instead of connecting speech-to-text, language models, and text-to-speech yourself, Vapi provides a unified pipeline that handles the entire conversation cycle: listening, thinking, and speaking.

Over 300 million calls have been processed and 2.5 million assistants launched on the platform, which is used by more than 500,000 developers, from startups to Fortune 500 companies. You can build a first voice agent in less than an hour, and the modular architecture lets you choose any LLM, TTS, and STT provider.

Why Vapi?

Key advantages of Vapi

  1. Low latency - Advertised sub-500ms voice-to-voice response time (about 700ms on average in practice, depending on the chosen providers), which keeps conversations sounding natural
  2. BYO Keys - Bring your own API keys for any supported provider (OpenAI, Anthropic, ElevenLabs, Deepgram, and more)
  3. Flow Studio - Visual builder for designing conversation logic with drag-and-drop
  4. Tool calling - Agents can call your APIs mid-conversation (bookings, CRM, databases)
  5. Multilingual - Over 100 languages and accents
  6. Scalability - Handles millions of concurrent calls
  7. SDK for every platform - Web, iOS, Android, Python, backend

Vapi vs Retell AI vs Bland AI

Feature | Vapi | Retell AI | Bland AI
Latency | ~700ms | ~600ms | ~800ms
Base cost | $0.05/min (+ providers) | ~$0.07/min (all-in) | ~$0.09/min
True cost | $0.13-$0.31/min | ~$0.07-$0.15/min | ~$0.09-$0.20/min
BYO models | Full support | Limited | Full support
Visual builder | Flow Studio | Yes | Pathways builder
Open source | No | No | No
HIPAA | $1,000/mo add-on | Included | Included
SDK | Web, iOS, Android, Python | Web, Python | REST API
Concurrent calls | 1M+ | Unlimited | 20,000+/h
Best for | Custom builds, dev-heavy | Inbound support, healthcare | High-volume outbound

Architecture - how does Vapi work?

Vapi acts as an orchestration layer between your application and AI providers. The entire conversation cycle is based on three steps:

1. Listen

Speech-to-text (STT) converts the caller's voice into text. Vapi supports providers like Deepgram, Whisper (OpenAI), Gladia, and Azure Speech.

2. Think

The LLM processes the transcription and generates a response. You can use GPT-4o, Claude, Gemini, Llama, or any other model. In this step, the agent can also call tools (tool calling).

3. Speak

Text-to-speech (TTS) converts the response into voice. Supported providers include ElevenLabs, Play.ht, Azure TTS, LMNT, and others.
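
The three steps can be pictured as a single function composed from three pluggable stages. The sketch below is only a mental model, not Vapi's implementation: `run_turn` and the stub providers are hypothetical names, and real providers stream audio incrementally rather than exchanging whole strings.

```python
from typing import Callable

# Hypothetical stage signatures; real providers stream audio incrementally.
Transcriber = Callable[[bytes], str]   # Listen: audio -> text
Responder = Callable[[str], str]       # Think: transcript -> reply text
Synthesizer = Callable[[str], bytes]   # Speak: reply text -> audio


def run_turn(audio: bytes, stt: Transcriber, llm: Responder, tts: Synthesizer) -> bytes:
    """Run one conversational turn through the listen/think/speak stages."""
    transcript = stt(audio)
    reply = llm(transcript)
    return tts(reply)


# Stub providers standing in for e.g. Deepgram, GPT-4o and ElevenLabs.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(transcript: str) -> str:
    return f"You said: {transcript}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


reply_audio = run_turn(b"hello", fake_stt, fake_llm, fake_tts)
```

Because each stage is just a callable, swapping one provider for another changes a single argument, which is the idea behind the modular architecture described above.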

Advanced conversation features

  • Endpointing - Detects when the caller has finished speaking
  • Interrupt detection - Allows the caller to interrupt the agent while speaking
  • Backchanneling - Inserts short acknowledgments ("okay", "I see") during processing
  • Emotion detection - Identifies tone signals such as urgency or frustration
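
To make endpointing concrete, here is a minimal sketch of the simplest possible approach: declare the utterance finished once speech is followed by a fixed stretch of silence. This guide does not describe how Vapi actually implements endpointing, so the function name, the 20 ms frame size, and the 500 ms threshold are all assumptions chosen for illustration.

```python
from typing import Optional, Sequence


def find_end_of_speech(
    frames_have_voice: Sequence[bool],
    frame_ms: int = 20,
    silence_ms: int = 500,
) -> Optional[int]:
    """Return the index of the frame at which the utterance is considered finished.

    The utterance ends once speech has been heard and is followed by at least
    `silence_ms` of continuous silence. Returns None if that never happens.
    """
    needed = silence_ms // frame_ms  # silent frames required in a row
    heard_speech = False
    silent_run = 0
    for index, has_voice in enumerate(frames_have_voice):
        if has_voice:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return index
    return None


# 10 voiced frames followed by 30 silent frames of 20 ms each.
frames = [True] * 10 + [False] * 30
end_index = find_end_of_speech(frames)
```

A shorter threshold makes the agent respond faster but risks cutting the caller off mid-pause, which is why production systems tend to combine silence with other signals.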

Getting started

SDK installation

Code
Bash
# Web SDK (browser clients)
npm install @vapi-ai/web

# Server SDK for Node.js / TypeScript
npm install @vapi-ai/server-sdk

# Server SDK for Python
pip install vapi_server_sdk

CLI installation

Code
Bash
curl -sSL https://vapi.ai/install.sh | bash

The CLI automatically detects your tech stack (React, Vue, Next.js, Python, Go, Flutter, React Native) and generates code examples tailored to your project.

Creating an assistant

Code
TypeScript
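import { VapiClient } from "@vapi-ai/server-sdk";

// Assistants are created server-side with the private API key,
// not with the public key used by the web SDK.
const vapi = new VapiClient({ token: process.env.VAPI_API_KEY! });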

const assistant = await vapi.assistants.create({
  name: "Customer Support Agent",
  firstMessage: "Hello! How can I help you today?",
  model: {
    provider: "openai",
    model: "gpt-4o",
    temperature: 0.7,
    messages: [
      {
        role: "system",
        content:
          "You are a friendly customer support agent. Help users with their orders and account questions.",
      },
    ],
  },
  voice: {
    provider: "11labs",
    voiceId: "21m00Tcm4TlvDq8ikWAM",
  },
});

Code
Python
from vapi import Vapi
import os

vapi = Vapi(token=os.getenv("VAPI_API_KEY"))

assistant = vapi.assistants.create(
    name="Customer Support Agent",
    first_message="Hello! How can I help you today?",
    model={
        "provider": "openai",
        "model": "gpt-4o",
        "temperature": 0.7,
        "messages": [
            {
                "role": "system",
                "content": "You are a friendly customer support agent. Help users with their orders and account questions.",
            }
        ],
    },
    voice={
        "provider": "11labs",
        "voiceId": "21m00Tcm4TlvDq8ikWAM",
    },
)

Starting a call (Web)

Code
TypeScript
import Vapi from "@vapi-ai/web";

const vapi = new Vapi("YOUR_PUBLIC_API_KEY");

// Register listeners before starting so no early events are missed.
vapi.on("call-start", () => {
  console.log("Call connected");
});

vapi.on("call-end", () => {
  console.log("Call ended");
});

vapi.on("message", (message) => {
  if (message.type === "transcript") {
    console.log(`${message.role}: ${message.transcript}`);
  }
});

vapi.on("error", (error) => {
  console.error("Call error:", error);
});

vapi.start("YOUR_ASSISTANT_ID");

Starting a phone call (Server)

Code
Python
from vapi import Vapi
import os

vapi = Vapi(token=os.getenv("VAPI_API_KEY"))

call = vapi.calls.create(
    phone_number_id="YOUR_PHONE_NUMBER_ID",
    customer={"number": "+1234567890"},
    assistant_id="YOUR_ASSISTANT_ID",
)

print(f"Call started: {call.id}")
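
Telephony APIs generally expect numbers in E.164 format (a leading `+`, country code, then the subscriber number), as in the example above. Assuming Vapi follows the same convention, it can be worth validating the number before placing the call. The helper below is a hypothetical sketch and only checks the shape of the number, not whether it is reachable.

```python
import re

# E.164: a leading "+", then 2 to 15 digits, the first of which is not 0.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True if `number` looks like an E.164 phone number."""
    return E164_PATTERN.fullmatch(number) is not None
```

Calling `is_e164` on user input before `vapi.calls.create` lets you reject malformed numbers locally instead of waiting for an API error.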

Tool calling

Tool calling is one of Vapi's most powerful features. It allows the agent to call your APIs mid-conversation - check order status, book appointments, update CRM, or fetch data from databases.

Creating a tool via API

Code
Bash
curl -X POST 'https://api.vapi.ai/tool' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "type": "function",
    "function": {
      "name": "get_order_status",
      "description": "Check the current status of a customer order",
      "parameters": {
        "type": "object",
        "properties": {
          "order_id": {
            "type": "string",
            "description": "The unique order identifier"
          }
        },
        "required": ["order_id"]
      }
    },
    "server": {
      "url": "https://your-api.com/webhook/vapi"
    }
  }'
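
The `parameters` object in the tool definition is a JSON Schema. The model usually respects it, but a server may still want to check incoming arguments before acting on them. The sketch below checks only the `required` list; `missing_required` is a hypothetical helper, and full validation would typically use a JSON Schema library.

```python
from typing import Any, Dict, List

# The "parameters" object from the get_order_status tool above.
ORDER_STATUS_PARAMETERS: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string", "description": "The unique order identifier"},
    },
    "required": ["order_id"],
}


def missing_required(parameters: Dict[str, Any], arguments: Dict[str, Any]) -> List[str]:
    """Return the names of required parameters absent from `arguments`."""
    return [name for name in parameters.get("required", []) if name not in arguments]
```

An empty list means the call can proceed; otherwise the missing names can be returned to the agent as the tool result so it can ask the caller for them.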

Adding a tool to an assistant

Code
Bash
curl -X PATCH 'https://api.vapi.ai/assistant/ASSISTANT_ID' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": {
      "provider": "openai",
      "model": "gpt-4o",
      "toolIds": ["your-tool-id"]
    }
  }'
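
The same PATCH request can be built from Python using only the standard library. This is a sketch that mirrors the curl command above; `build_attach_tools_request` is a hypothetical helper name, and the request is constructed but deliberately not sent so the example runs without credentials.

```python
import json
import urllib.request


def build_attach_tools_request(assistant_id: str, api_key: str, tool_ids: list) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that sets the assistant's toolIds."""
    body = {
        "model": {
            "provider": "openai",
            "model": "gpt-4o",
            "toolIds": tool_ids,
        }
    }
    return urllib.request.Request(
        url=f"https://api.vapi.ai/assistant/{assistant_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


request = build_attach_tools_request("ASSISTANT_ID", "YOUR_API_KEY", ["your-tool-id"])
# To send it: urllib.request.urlopen(request)
```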

Managing tools via CLI

Code
Bash
# List all tools
vapi tool list

# Show a single tool
vapi tool get <tool-id>

# Create a new tool
vapi tool create

# Test a tool
vapi tool test <tool-id>

# Delete a tool
vapi tool delete <tool-id>

Handling a tool call on the server

When the agent calls a tool, Vapi sends a request to your server:

Code
JSON
{
  "message": {
    "type": "tool-calls",
    "toolCallList": [
      {
        "id": "toolu_01DTPAzUm5Gk3zxrpJ969oMF",
        "name": "get_order_status",
        "arguments": {
          "order_id": "ORD-12345"
        }
      }
    ]
  }
}

Your server must return the result with the matching toolCallId:

Code
JSON
{
  "results": [
    {
      "toolCallId": "toolu_01DTPAzUm5Gk3zxrpJ969oMF",
      "result": "Order ORD-12345 is currently being shipped. Expected delivery: tomorrow."
    }
  ]
}

Implementing webhooks

Code
TypeScript
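import express from "express";

// `db` and `calendar` below are placeholders for your own data layer
// and calendar integration; they are not part of Vapi or Express.
const app = express();
app.use(express.json());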

app.post("/webhook/vapi", async (req, res) => {
  const { message } = req.body;

  if (message.type === "tool-calls") {
    const results = await Promise.all(
      message.toolCallList.map(async (toolCall) => {
        if (toolCall.name === "get_order_status") {
          const order = await db.orders.findById(
            toolCall.arguments.order_id
          );
          return {
            toolCallId: toolCall.id,
            result: `Order ${order.id}: ${order.status}. ${order.trackingInfo}`,
          };
        }

        if (toolCall.name === "book_appointment") {
          const booking = await calendar.createEvent(
            toolCall.arguments
          );
          return {
            toolCallId: toolCall.id,
            result: `Appointment booked for ${booking.date} at ${booking.time}.`,
          };
        }

        return {
          toolCallId: toolCall.id,
          result: "Unknown tool",
        };
      })
    );

    return res.json({ results });
  }

  res.json({ received: true });
});

app.listen(3000);

Code
Python
from flask import Flask, request, jsonify

# `db` and `calendar` below are placeholders for your own data layer
# and calendar integration; they are not part of Vapi or Flask.
app = Flask(__name__)

@app.route("/webhook/vapi", methods=["POST"])
def handle_webhook():
    # Tolerate empty or non-JSON bodies instead of raising.
    payload = request.get_json(silent=True) or {}
    message = payload.get("message", {})

    if message.get("type") == "tool-calls":
        results = []
        for tool_call in message.get("toolCallList", []):
            if tool_call["name"] == "get_order_status":
                order = db.orders.find_by_id(tool_call["arguments"]["order_id"])
                results.append({
                    "toolCallId": tool_call["id"],
                    "result": f"Order {order.id}: {order.status}. {order.tracking_info}",
                })

            elif tool_call["name"] == "book_appointment":
                booking = calendar.create_event(tool_call["arguments"])
                results.append({
                    "toolCallId": tool_call["id"],
                    "result": f"Appointment booked for {booking.date} at {booking.time}.",
                })

            else:
                # Answer every tool call, matching the TypeScript version.
                results.append({
                    "toolCallId": tool_call["id"],
                    "result": "Unknown tool",
                })

        return jsonify({"results": results})

    return jsonify({"received": True})

if __name__ == "__main__":
    app.run(port=3000)
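
As the number of tools grows, the if/elif chain can be replaced by a lookup table, and the result-building logic can be separated from the web framework so it is easy to unit-test. The sketch below is one possible arrangement, not a Vapi-prescribed pattern: `HANDLERS`, `build_tool_results`, and the stub handler are hypothetical names, and the message shape is the one shown in the payload example above.

```python
from typing import Any, Callable, Dict


# Hypothetical handler; a real one would query a database.
def get_order_status(arguments: Dict[str, Any]) -> str:
    return f"Order {arguments['order_id']} is being shipped."


HANDLERS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "get_order_status": get_order_status,
}


def build_tool_results(message: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a "tool-calls" message into the {"results": [...]} response body."""
    results = []
    for tool_call in message.get("toolCallList", []):
        handler = HANDLERS.get(tool_call["name"])
        if handler is None:
            result = "Unknown tool"
        else:
            try:
                result = handler(tool_call.get("arguments", {}))
            except Exception as error:  # report the failure instead of dropping the call
                result = f"Tool failed: {error}"
        results.append({"toolCallId": tool_call["id"], "result": result})
    return {"results": results}
```

Inside the Flask route, the tool-calls branch then reduces to `return jsonify(build_tool_results(message))`, and every tool call receives a result even when a handler raises.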

Flow Studio - visual builder

Flow Studio is a visual builder for designing conversation logic without writing code. It enables creating complex conversation scenarios using drag-and-drop.

Flow Studio capabilities

  • Branching prompts - Different conversation paths depending on responses
  • Conditional paths - Logic conditions (if VIP customer → route to specialist)
  • Error fallback - Automatic handling of errors and unexpected responses
  • Webhook triggers - Calling external APIs at any point in the conversation
  • Transfer calls - Redirecting to a live agent while preserving context
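
Flow Studio is a visual tool and this guide does not show its underlying format, so the following is purely a conceptual sketch of what conditional paths with a fallback amount to. The `FLOW` structure, node names, and `next_node` function are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
# Each node lists (condition, next_node) pairs; the first match wins.
Flow = Dict[str, List[Tuple[Callable[[Context], bool], str]]]

FLOW: Flow = {
    "greeting": [
        (lambda ctx: ctx.get("is_vip", False), "specialist"),
        (lambda ctx: ctx.get("intent") == "billing", "billing"),
    ],
}


def next_node(flow: Flow, current: str, context: Context, fallback: str = "fallback") -> str:
    """Pick the next node, falling back when no condition matches."""
    for condition, target in flow.get(current, []):
        if condition(context):
            return target
    return fallback
```

Ordering matters here: a VIP caller with a billing question is routed to the specialist because that condition is listed first.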

Squads - multi-assistant orchestration

Squads allow coordinating multiple assistants in a single conversation. Each assistant has its own specialization, and transfers between them preserve the full conversation context.

Example scenario:

  1. Receptionist - Greets the customer and identifies the need
  2. Technical specialist - Resolves technical issues
  3. Sales department - Presents the offer and finalizes the order

Each transfer is seamless - the next assistant knows what the conversation was about.
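
Context-preserving transfer can be modelled as a shared conversation object in which only the active assistant changes. This is an illustrative sketch, not Vapi's Squads API: the `Conversation` class and the assistant names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Conversation:
    """Shared state that travels with the call across assistants."""
    active_assistant: str
    messages: List[Dict[str, str]] = field(default_factory=list)

    def say(self, role: str, content: str) -> None:
        # Record who was handling the call when the message was added.
        self.messages.append(
            {"role": role, "content": content, "assistant": self.active_assistant}
        )

    def transfer(self, to_assistant: str) -> None:
        # Only the active assistant changes; the message history is kept.
        self.active_assistant = to_assistant


call = Conversation(active_assistant="receptionist")
call.say("user", "My router keeps disconnecting.")
call.transfer("technical_specialist")
call.say("assistant", "I see you mentioned your router. Let's check it.")
```

Because the history is never reset on transfer, the technical specialist can refer to what the caller told the receptionist.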

Providers and integrations

LLM (language models)

Provider | Models
OpenAI | GPT-4o, GPT-4o-mini, o1
Anthropic | Claude Sonnet, Claude Haiku
Google | Gemini 1.5 Pro, Gemini Flash
Meta | Llama 3.1 (via Groq/Together)
Custom | Any model via Custom LLM URL

TTS (text-to-speech)

Provider | Description
ElevenLabs | Most realistic voices, voice cloning
Play.ht | Wide voice selection, competitive pricing
Azure TTS | Enterprise-grade, many languages
LMNT | Fast, low cost
Deepgram | Ultra-fast TTS

STT (speech-to-text)

Provider | Description
Deepgram | Fastest, best latency
Whisper (OpenAI) | Most accurate, more languages
Gladia | Good balance of speed and quality
Azure Speech | Enterprise, HIPAA-ready

Telephony

Provider | Description
Twilio | Most popular, global reach
Telnyx | Competitive pricing, good quality
BYOC | Bring Your Own Carrier - connect your own provider

Vapi Evals - testing agents

Vapi Evals is a framework for testing voice agents before deploying to production. It allows creating simulated conversations and validating agent behavior.

Three validation methods

  1. Exact match - Exact matching for deterministic responses
  2. Regex patterns - Flexible patterns for variable formats
  3. AI judges - Semantic evaluation by AI for complex responses
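
The three methods differ only in how a reply is judged, so they can share one interface. The sketch below illustrates the idea and is not the Vapi Evals API: all function names are hypothetical, and `keyword_judge` is a stand-in so the example runs offline where a real AI judge would prompt a model.

```python
import re
from typing import Callable


def exact_match(expected: str) -> Callable[[str], bool]:
    """Pass only when the reply equals `expected` (ignoring surrounding whitespace)."""
    return lambda reply: reply.strip() == expected


def regex_match(pattern: str) -> Callable[[str], bool]:
    """Pass when `pattern` is found anywhere in the reply."""
    compiled = re.compile(pattern)
    return lambda reply: compiled.search(reply) is not None


def ai_judge(criterion: str, judge: Callable[[str, str], bool]) -> Callable[[str], bool]:
    """Delegate to `judge(criterion, reply)`, e.g. a call to an LLM."""
    return lambda reply: judge(criterion, reply)


# Stand-in judge so the example runs offline; a real one would prompt a model.
def keyword_judge(criterion: str, reply: str) -> bool:
    return "sorry" in reply.lower()


checks = [
    regex_match(r"ORD-\d{5}"),
    ai_judge("The agent apologises for the delay", keyword_judge),
]
reply = "Sorry for the wait. Order ORD-12345 ships tomorrow."
passed = all(check(reply) for check in checks)
```

Exact match suits fixed phrases such as a legal disclaimer, regex suits formatted values such as order numbers, and the judge covers everything that cannot be pinned down syntactically.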

Testing via CLI

Code
Bash
vapi listen --forward-to localhost:3000/tools/webhook

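This command starts a local listener that receives webhook events from Vapi and forwards them to your development server in real time. The path after the port must match the route your server exposes; with the webhook examples above, that would be localhost:3000/webhook/vapi.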

MCP (Model Context Protocol)

Vapi supports MCP, allowing the assistant to dynamically use tools provided by MCP servers during a call. Instead of defining tools statically, the agent can discover and call tools at runtime.
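
MCP is built on JSON-RPC 2.0: a client discovers tools with the `tools/list` method and invokes one with `tools/call`. The sketch below shows only the shape of those two messages; the helper names are hypothetical, and how Vapi connects to an MCP server and transports these messages is not covered in this guide.

```python
import json
from typing import Any, Dict


def list_tools_request(request_id: int) -> Dict[str, Any]:
    """JSON-RPC request asking an MCP server which tools it offers."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def call_tool_request(request_id: int, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """JSON-RPC request invoking one discovered tool by name."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


print(json.dumps(call_tool_request(2, "get_order_status", {"order_id": "ORD-12345"})))
```

The contrast with static tools is that the tool name and argument schema come from the server's `tools/list` response at runtime rather than from the assistant configuration.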

Pricing

Base cost

Component | Cost
Vapi platform | $0.05/min
STT (Deepgram) | ~$0.01/min
LLM (GPT-4o) | ~$0.02-$0.20/min
TTS (ElevenLabs) | ~$0.04/min
Telephony (Twilio) | ~$0.01/min
Total | ~$0.13-$0.31/min

Free credits

New users receive $10 in free credits. At a true cost of $0.13-$0.31/min, that covers roughly 32-77 minutes of conversation ($10 / $0.31 ≈ 32 and $10 / $0.13 ≈ 77).
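
The totals and the free-credit estimate can be reproduced with a few lines of arithmetic. The figures are the approximate per-minute costs from the Base cost table; the variable names are only illustrative.

```python
# Per-minute cost ranges (low, high) in USD, taken from the Base cost table.
COMPONENTS = {
    "Vapi platform": (0.05, 0.05),
    "STT (Deepgram)": (0.01, 0.01),
    "LLM (GPT-4o)": (0.02, 0.20),
    "TTS (ElevenLabs)": (0.04, 0.04),
    "Telephony (Twilio)": (0.01, 0.01),
}

low = round(sum(lo for lo, _ in COMPONENTS.values()), 2)
high = round(sum(hi for _, hi in COMPONENTS.values()), 2)

credits = 10.00
min_minutes = round(credits / high, 1)  # most expensive configuration
max_minutes = round(credits / low, 1)   # cheapest configuration

print(f"${low:.2f}-${high:.2f}/min, {min_minutes}-{max_minutes} minutes of credit")
```

The LLM is the only component with a wide range, so model choice and prompt length dominate where a deployment lands between the two bounds.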

SIP Lines

Every plan includes 10 concurrent SIP lines. Additional lines cost $10/month each.

HIPAA Compliance

HIPAA compliance is available as an add-on for $1,000/month.

Cost comparison

Platform | True cost/min | Free credits | HIPAA
Vapi | $0.13-$0.31 | $10 | $1,000/mo
Retell AI | $0.07-$0.15 | Yes | Included
Bland AI | $0.09-$0.20 | Yes | Included

Practical applications

Customer support bot

Code
TypeScript
const assistant = await vapi.assistants.create({
  name: "Support Agent",
  firstMessage: "Hi! Welcome to Acme Support. How can I help you?",
  model: {
    provider: "openai",
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: `You are a customer support agent for Acme Corp.
You can check order status, process returns, and answer product questions.
Always be friendly and professional. If you cannot resolve an issue,
offer to transfer to a human agent.`,
      },
    ],
    toolIds: ["order-status-tool", "return-tool", "transfer-tool"],
  },
  voice: {
    provider: "11labs",
    voiceId: "21m00Tcm4TlvDq8ikWAM",
  },
});

Appointment scheduler

Code
TypeScript
const assistant = await vapi.assistants.create({
  name: "Booking Agent",
  firstMessage: "Hello! I can help you schedule an appointment. What day works best for you?",
  model: {
    provider: "openai",
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: `You are an appointment scheduling assistant for a dental clinic.
Collect: patient name, preferred date/time, reason for visit.
Check availability using the check_slots tool before confirming.
Always confirm the final booking details before saving.`,
      },
    ],
    toolIds: ["check-slots-tool", "book-appointment-tool"],
  },
  voice: {
    provider: "11labs",
    voiceId: "pNInz6obpgDQGcFmaJgB",
  },
});

Lead qualification

Code
TypeScript
const assistant = await vapi.assistants.create({
  name: "Sales Qualifier",
  firstMessage:
    "Hi! Thanks for your interest in our product. I'd love to learn more about your needs.",
  model: {
    provider: "anthropic",
    model: "claude-sonnet-4-20250514",
    messages: [
      {
        role: "system",
        content: `You are a sales qualification agent.
Ask about: company size, current solution, budget range, timeline.
Score the lead as hot/warm/cold based on responses.
Save qualification data using the save_lead tool.
If the lead is hot, offer to schedule a demo with a sales rep.`,
      },
    ],
    toolIds: ["save-lead-tool", "schedule-demo-tool"],
  },
  voice: {
    provider: "playht",
    voiceId: "jennifer",
  },
});

React integration

Code
TypeScript
import { useEffect, useState } from "react";
import Vapi from "@vapi-ai/web";

const vapi = new Vapi("YOUR_PUBLIC_API_KEY");

export function VoiceAssistant() {
  const [isActive, setIsActive] = useState(false);
  const [transcript, setTranscript] = useState<string[]>([]);

  const startCall = async () => {
    setIsActive(true);
    await vapi.start("YOUR_ASSISTANT_ID");
  };

  const endCall = () => {
    vapi.stop();
    setIsActive(false);
  };

  // Subscribe once on mount. Registering in the render body would add a
  // new listener on every render and duplicate transcript lines.
  useEffect(() => {
    const onMessage = (message: any) => {
      if (message.type === "transcript" && message.transcriptType === "final") {
        setTranscript((prev) => [
          ...prev,
          `${message.role}: ${message.transcript}`,
        ]);
      }
    };
    vapi.on("message", onMessage);
    return () => {
      vapi.removeListener("message", onMessage);
    };
  }, []);

  return (
    <div className="flex flex-col items-center gap-4 p-6">
      <button
        onClick={isActive ? endCall : startCall}
        className={`px-6 py-3 rounded-full font-medium ${
          isActive
            ? "bg-red-500 hover:bg-red-600 text-white"
            : "bg-blue-500 hover:bg-blue-600 text-white"
        }`}
      >
        {isActive ? "End Call" : "Start Call"}
      </button>
      <div className="w-full max-w-md space-y-2">
        {transcript.map((line, i) => (
          <p key={i} className="text-sm text-gray-700 dark:text-gray-300">
            {line}
          </p>
        ))}
      </div>
    </div>
  );
}

Limitations and challenges

  1. Complex pricing - Multi-layered cost structure (platform + STT + LLM + TTS + telephony) makes budgeting difficult
  2. Developer-first - Requires programming skills, not suitable for teams without developers
  3. Provider management - You must manage 4-5 separate providers with separate billing
  4. Expensive HIPAA - $1,000/month for HIPAA compliance, while competitors include it in their pricing
  5. Vendor lock-in - Despite BYO keys, orchestration logic is tied to the Vapi platform
  6. Variable latency - ~700ms on average, but can increase depending on chosen providers

FAQ

Can I use my own AI models?

Yes, Vapi supports BYO (Bring Your Own) keys for all pipeline components. You can use any LLM, TTS, and STT. If your model is hosted with a supported provider, just provide your API key. For custom models hosted elsewhere, use the Custom LLM URL.

How much does a minute of conversation cost?

Vapi's base fee is $0.05/min, but the true cost is $0.13-$0.31/min after adding STT, LLM, TTS, and telephony costs. The exact amount depends on your chosen providers.

Does Vapi support Polish?

Yes, Vapi supports over 100 languages, including Polish. Quality depends on your chosen STT and TTS providers - Whisper (OpenAI) and ElevenLabs offer good support for Polish.

How quickly can I build my first agent?

Registration takes about 10 minutes. Configuring your first agent takes another 20-30 minutes. Within an hour, you can have a working voice agent that answers phones and holds natural conversations.

Is Vapi open source?

No, Vapi is a closed platform. Client SDKs (Web, Python) are available on GitHub, but the orchestration engine itself is proprietary. If you're looking for an open-source alternative, check out Vocode.

When should I choose Vapi over Retell AI or Bland AI?

Choose Vapi when you need maximum control over the voice pipeline, want to use custom models, are building a developer product, or need Flow Studio for visual conversation design. Retell is better for simpler deployments and healthcare, while Bland excels at large-scale outbound campaigns.

Summary

Vapi is a strong choice for building voice AI agents if you have a development team and need full control over every pipeline element. The modular BYO keys architecture, Flow Studio, tool calling, and support for 100+ languages let you build exactly the agent you need.

The main trade-offs are complex pricing (realistically $0.13-$0.31/min, versus simpler all-in pricing from competitors) and the need for technical skills. For teams with developers building advanced voice solutions, Vapi is a well-established option.