ElevenLabs - complete guide to the AI voice platform
What is ElevenLabs?
ElevenLabs is an AI platform specializing in generating realistic speech, voice cloning, and building conversational voice agents. Founded in 2022 by Piotr Dąbkowski (ex-Google ML) and Mati Staniszewski (ex-Palantir), the company quickly became the market leader in speech synthesis.
In February 2026, ElevenLabs announced a $500 million funding round at an $11 billion valuation, eyeing a potential IPO. The platform offers over 3,000 voices in 70+ languages, voice cloning from just a few minutes of recording, and Conversational AI for building interactive voice agents.
What sets ElevenLabs apart from the competition is the quality of generated speech - voices sound natural, with proper intonation, emotions, and pacing. The Eleven v3 model can interpret the emotional context of text and modulate the voice accordingly.
Why ElevenLabs?
Key advantages
- Highest voice quality - The most realistic AI voices on the market
- 70+ languages - Including Polish, with culturally appropriate pronunciation and intonation
- Voice cloning - Clone a voice from just a few minutes of recording
- Conversational AI - Platform for building real-time voice agents
- Audio tags (v3) - Control emotions and speaking style directly in text
- Multi-speaker dialogue - Natural dialogues between multiple speakers in a single audio file
- Ultra-low latency - Flash v2.5 achieves 75ms, ideal for real-time applications
ElevenLabs vs Amazon Polly vs Google Cloud TTS vs OpenAI TTS
| Feature | ElevenLabs | Amazon Polly | Google Cloud TTS | OpenAI TTS |
|---|---|---|---|---|
| Voice quality | Best | Good | Very good | Very good |
| Languages | 70+ | 30+ | 40+ | 57 |
| Voice cloning | Yes | No | No | No |
| Audio tags (emotions) | Yes (v3) | No | No | No |
| Multi-speaker | Yes (v3) | No | No | No |
| Conversational AI | Yes | No | Dialogflow | Realtime API |
| Free plan | 10 min/mo | 5M chars/mo (12 mo) | 1M chars/mo | None |
| Cost (TTS, per 1M chars) | From $120 ($0.12/1K) | $4 | $4-$16 | $15 |
| Latency | 75ms (Flash) | ~200ms | ~200ms | ~300ms |
TTS models
ElevenLabs offers several models tailored to different use cases:
Eleven v3
The newest and most powerful model, released in June 2025.
- Languages: 70+
- Character limit: 5,000 per request
- Features: Audio tags, multi-speaker dialogue, natural emotional context
- Use cases: Audiobooks, podcasts, dubbing, premium content
Multilingual v2
The flagship model for high-quality speech synthesis.
- Languages: 29
- Character limit: 10,000 per request
- Features: Most nuanced expression, excellent intonation
- Use cases: Professional voiceovers, ads, e-learning
Flash v2.5
A model optimized for low latency.
- Languages: 32
- Character limit: 40,000 per request
- Latency: ~75ms
- Use cases: Conversational AI, voice assistants, real-time applications
Turbo v2.5
A fast model at lower cost.
- Languages: 32
- Character limit: 40,000 per request
- Latency: 250-300ms
- Use cases: Mass audio production, content automation
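The trade-offs between the four models can be captured in a small lookup table. The sketch below is a hypothetical helper, not part of the ElevenLabs SDK: it encodes the limits and latencies quoted in this section and returns the first model that satisfies a set of requirements. The `ModelSpec` and `pick_model` names are made up for illustration, and the ID `eleven_turbo_v2_5` does not appear elsewhere in this guide, so treat it as an assumption and check the current model list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSpec:
    model_id: str
    languages: int             # number of supported languages quoted in this guide
    max_chars: int             # character limit per request
    latency_ms: Optional[int]  # approximate latency; None where no figure is given
    audio_tags: bool


# Figures copied from the model descriptions in this section.
MODELS = [
    ModelSpec("eleven_v3", 70, 5_000, None, True),
    ModelSpec("eleven_multilingual_v2", 29, 10_000, None, False),
    ModelSpec("eleven_flash_v2_5", 32, 40_000, 75, False),
    # Assumed model ID; 300 is the upper end of the quoted 250-300ms range.
    ModelSpec("eleven_turbo_v2_5", 32, 40_000, 300, False),
]


def pick_model(text_len: int, need_audio_tags: bool = False,
               max_latency_ms: Optional[int] = None) -> Optional[ModelSpec]:
    """Return the first model in MODELS that meets every constraint, or None."""
    for spec in MODELS:
        if text_len > spec.max_chars:
            continue
        if need_audio_tags and not spec.audio_tags:
            continue
        if max_latency_ms is not None:
            # Models without a quoted latency cannot satisfy a latency budget.
            if spec.latency_ms is None or spec.latency_ms > max_latency_ms:
                continue
        return spec
    return None


if __name__ == "__main__":
    print(pick_model(1_200, need_audio_tags=True).model_id)
    print(pick_model(20_000, max_latency_ms=100).model_id)
```

The list order doubles as a preference order, so putting the highest-quality model first means it wins whenever the constraints allow it.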
Audio tags (Eleven v3)
Audio tags are a breakthrough feature of Eleven v3 that lets you control emotions, style, and speaking manner directly in the text.
[excited] I can't believe we won the championship!
[whispers] Don't tell anyone, but I have a secret.
[sighs] Another Monday morning...
[laughs] That's the funniest thing I've heard all week!
[sad] I'm going to miss this place.
You can also combine tags with natural context:
"I'm so proud of you," she said [tearfully]. "You've come so far."
The model interprets both tags and textual context (punctuation, emotion-describing adjectives), producing very natural results.
Getting started with the API
SDK installation
pip install elevenlabs
npm install elevenlabs
API key configuration
Generate an API key in the ElevenLabs dashboard and set it as an environment variable:
export ELEVENLABS_API_KEY="your-api-key"
Basic text-to-speech (Python)
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play
# With no api_key argument, the client reads ELEVENLABS_API_KEY from the environment.
elevenlabs = ElevenLabs()
audio = elevenlabs.text_to_speech.convert(
    text="The first move is what sets everything in motion.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_v3",
    output_format="mp3_44100_128",
)
# Play the generated audio on the local machine.
play(audio)
Basic text-to-speech (TypeScript)
import { ElevenLabsClient } from "elevenlabs";
import { createWriteStream } from "fs";
const elevenlabs = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY,
});
const audio = await elevenlabs.textToSpeech.convert("JBFqnCBsd6RMkjVDRZzb", {
  text: "The first move is what sets everything in motion.",
  model_id: "eleven_v3",
  output_format: "mp3_44100_128",
});
const writeStream = createWriteStream("output.mp3");
audio.pipe(writeStream);
Streaming audio
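from elevenlabs import stream
from elevenlabs.client import ElevenLabs
elevenlabs = ElevenLabs()
audio_stream = elevenlabs.text_to_speech.stream(
    text="This is a streaming example. The audio plays as it generates.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
)
# Play the audio locally as the chunks arrive.
stream(audio_stream)
You can also process chunks manually:
# The stream can only be consumed once, so request a new one rather than
# reusing a stream that was already passed to stream().
# process_audio_chunk is a placeholder for your own handler.
for chunk in audio_stream:
    if isinstance(chunk, bytes):
        process_audio_chunk(chunk)
Async client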
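import asyncio
from elevenlabs.client import AsyncElevenLabs
elevenlabs = AsyncElevenLabs()
async def generate_speech() -> bytes:
    # On the async client, convert() yields audio chunks, so iterate over it
    # with "async for" rather than awaiting it.
    chunks = []
    async for chunk in elevenlabs.text_to_speech.convert(
        text="Async speech generation is great for web servers.",
        voice_id="JBFqnCBsd6RMkjVDRZzb",
        model_id="eleven_v3",
    ):
        chunks.append(chunk)
    return b"".join(chunks)
asyncio.run(generate_speech())
Searching voices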
from elevenlabs.client import ElevenLabs
elevenlabs = ElevenLabs()
response = elevenlabs.voices.search()
for voice in response.voices:
    print(f"{voice.name} ({voice.voice_id})")
Voice cloning
ElevenLabs offers two types of voice cloning:
Instant voice cloning
Quick cloning from short audio samples (30 seconds - a few minutes). Available from the Starter plan.
from elevenlabs.client import ElevenLabs
elevenlabs = ElevenLabs()
voice = elevenlabs.voices.ivc.create(
    name="Alex",
    description="An old American male voice with a slight hoarseness in his throat. Perfect for news",
    files=["./sample_0.mp3", "./sample_1.mp3", "./sample_2.mp3"],
)
audio = elevenlabs.text_to_speech.convert(
    text="This is my cloned voice speaking.",
    voice_id=voice.voice_id,
    model_id="eleven_v3",
)
Professional voice cloning
Advanced cloning from longer recordings (30+ minutes), delivering the highest reproduction quality. Requires the Creator plan or higher. The process includes identity verification and consent for cloning.
Conversational AI
Conversational AI is the ElevenLabs platform for building interactive real-time voice agents. It combines STT, LLM, and TTS into a single pipeline with monitoring and analytics.
Architecture
- Speech-to-text - Scribe (ElevenLabs' own model) converts speech to text
- LLM - Your choice of model (GPT-4o, Claude, Gemini) processes text and generates a response
- Text-to-speech - ElevenLabs TTS converts the response to natural speech
- Knowledge base - Optional knowledge base (documents, FAQ) accessible to the agent
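The flow above can be pictured as a single conversational turn passing through three stages. The sketch below is purely illustrative: `transcribe`, `generate_reply`, and `synthesize` are stand-in stubs rather than ElevenLabs APIs, and the hosted platform runs these stages for you. It only shows the order in which data moves and where the knowledge base enters.

```python
# Hypothetical stand-ins for the three stages. In the real platform these
# would be Scribe, the chosen LLM, and ElevenLabs TTS respectively.

def transcribe(audio: bytes) -> str:
    """Speech-to-text stub: pretends the audio bytes are already UTF-8 text."""
    return audio.decode("utf-8")


def generate_reply(user_text: str, knowledge: dict[str, str]) -> str:
    """LLM stub: answers from the knowledge base when a topic is mentioned."""
    question = user_text.lower()
    for topic, answer in knowledge.items():
        if topic in question:
            return answer
    return "Sorry, I don't know."


def synthesize(reply_text: str) -> bytes:
    """Text-to-speech stub: pretends encoded text is audio."""
    return reply_text.encode("utf-8")


def run_turn(audio_in: bytes, knowledge: dict[str, str]) -> bytes:
    """One turn of the pipeline: speech -> text -> reply text -> speech."""
    user_text = transcribe(audio_in)
    reply_text = generate_reply(user_text, knowledge)
    return synthesize(reply_text)


if __name__ == "__main__":
    kb = {"opening hours": "We are open 9 to 5."}
    print(run_turn(b"What are your opening hours?", kb))
```

Keeping each stage behind its own function mirrors the point of the architecture: the LLM in the middle can be swapped without touching the speech stages.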
Creating a conversational agent
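from elevenlabs.client import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation
from elevenlabs.conversational_ai.default_audio_interface import DefaultAudioInterface
elevenlabs = ElevenLabs()
audio_interface = DefaultAudioInterface()
conversation = Conversation(
    client=elevenlabs,
    agent_id="your-agent-id",
    requires_auth=True,
    audio_interface=audio_interface,
)
# Start the session, then block until it ends. Calling end_session() right
# after start_session() would close the conversation before it begins.
conversation.start_session()
conversation.wait_for_session_end()
# To stop early, call conversation.end_session() from a signal handler or another thread.
Agent with tool calling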
import asyncio
from elevenlabs.client import ElevenLabs
from elevenlabs.conversational_ai.conversation import ClientTools, Conversation
from elevenlabs.conversational_ai.default_audio_interface import DefaultAudioInterface
elevenlabs = ElevenLabs()
async def main():
    custom_loop = asyncio.get_running_loop()
    client_tools = ClientTools(loop=custom_loop)
    async def get_weather(params):
        location = params.get("location", "Unknown")
        return f"Weather in {location}: Sunny, 72°F"
    async def check_order(params):
        order_id = params.get("order_id", "")
        return f"Order {order_id}: Shipped, arriving tomorrow."
    client_tools.register("get_weather", get_weather, is_async=True)
    client_tools.register("check_order", check_order, is_async=True)
    conversation = Conversation(
        client=elevenlabs,
        agent_id="your-agent-id",
        requires_auth=True,
        audio_interface=DefaultAudioInterface(),
        client_tools=client_tools,
    )
    conversation.start_session()
asyncio.run(main())
Tool registration
from elevenlabs.conversational_ai.conversation import ClientTools
client_tools = ClientTools()
def calculate_sum(params):
    numbers = params.get("numbers", [])
    return sum(numbers)
async def fetch_data(params):
    url = params.get("url")
    return {"data": "fetched"}
client_tools.register("calculate_sum", calculate_sum, is_async=False)
client_tools.register("fetch_data", fetch_data, is_async=True)
Additional products
Scribe (Speech-to-Text)
ElevenLabs' own STT model. Transcribes audio with character-level accuracy, timestamps, and speaker diarization.
Eleven Music
AI music generator launched in August 2025. Creates studio-quality music from natural language prompts. Developed in collaboration with record labels and artists - generated music is cleared for commercial use.
Video dubbing and localization
Localizes films and videos into 70+ languages while preserving the original speaker's voice, emotions, and timing.
Reader App
A mobile app (iOS/Android) that lets you listen to articles, PDFs, and ePubs with AI voices.
Audiobook publishing
A platform for creating and publishing AI-generated audiobooks, launched in February 2025.
Pricing
Plans
| Plan | Price/mo | TTS minutes | Conversational AI | Voice cloning |
|---|---|---|---|---|
| Free | $0 | ~10 min | - | - |
| Starter | $5 | ~30 min | - | Instant |
| Creator | $22 | ~100 min | - | Professional |
| Pro | $99 | ~500 min | Yes | Professional |
| Scale | $330 | ~2,000 min | Yes | Professional |
| Business | $1,320 | 11,000 min | 13,750 min | Professional |
| Enterprise | Custom | Custom | Custom | Custom |
Per-character costs (TTS, Multilingual v2)
| Plan | Cost per 1K chars |
|---|---|
| Creator | $0.30 |
| Pro | $0.24 |
| Scale | $0.18 |
| Business | $0.12 |
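To turn the table into a budget figure, multiply the character count by the plan's rate. The sketch below is a minimal estimator that uses only the rates listed above; `estimate_tts_cost` is a hypothetical helper name, and the 500,000-character book in the usage line is an invented example rather than a measured workload.

```python
# Cost per 1,000 characters in USD, copied from the table above.
COST_PER_1K_CHARS = {
    "creator": 0.30,
    "pro": 0.24,
    "scale": 0.18,
    "business": 0.12,
}


def estimate_tts_cost(characters: int, plan: str) -> float:
    """Estimate the USD cost of synthesizing `characters` on the given plan."""
    rate = COST_PER_1K_CHARS[plan.lower()]  # raises KeyError for unknown plans
    return round(characters / 1_000 * rate, 2)


if __name__ == "__main__":
    # Hypothetical workload: a 500,000-character audiobook on each plan.
    for plan in COST_PER_1K_CHARS:
        print(plan, estimate_tts_cost(500_000, plan))
```

The estimate covers per-character pricing only and ignores whatever allowance is already included in the monthly plan price.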
Conversational AI
The cost of Conversational AI on the Business plan is $0.08/min. Unused minutes reset monthly.
Comparison with competitors
| Platform | TTS cost per 1M chars | Conversational AI | Voice cloning |
|---|---|---|---|
| ElevenLabs | From $120 ($0.12/1K) | $0.08/min | Yes |
| Amazon Polly | $4 | No | No |
| Google Cloud TTS | $4-$16 | Via Dialogflow | No |
| OpenAI TTS | $15 | Realtime API | No |
| Play.ht | From $100 ($0.10/1K) | No | Yes |
Savings
Annual plans offer 2 months free. Unused credits roll over to the next month when upgrading plans.
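As a worked example of the annual discount, the sketch below assumes that "2 months free" means paying for 10 months and receiving 12. That reading is an assumption, and `annual_pricing` is a hypothetical helper name; the actual billing terms may differ.

```python
def annual_pricing(monthly_price: float, free_months: int = 2) -> tuple[float, float]:
    """Return (annual_total, effective_monthly_price) in USD.

    Assumes an annual plan is billed as (12 - free_months) monthly payments.
    """
    annual_total = monthly_price * (12 - free_months)
    return round(annual_total, 2), round(annual_total / 12, 2)


if __name__ == "__main__":
    # Monthly prices taken from the plans table in this guide.
    for plan, price in [("Starter", 5), ("Creator", 22), ("Pro", 99)]:
        print(plan, annual_pricing(price))
```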
Security and compliance
- Encryption - Data encrypted in transit and at rest
- SOC 2 - SOC 2 compliance
- HIPAA - HIPAA compliance support
- GDPR - GDPR compliance
- EU Data Residency - Option to store data in the EU
- Zero Retention - Mode without data storage for sensitive applications
- Consent verification - Consent verification for voice cloning
Audio formats
| Format | Sample rate | Description |
|---|---|---|
| MP3 | 22.05-44.1 kHz | Default, universal |
| PCM | 16-44.1 kHz | Raw audio, lowest latency |
| μ-law | 8 kHz | Telephony |
| A-law | 8 kHz | Telephony (Europe) |
| Opus | 48 kHz | WebRTC, streaming |
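The `output_format` values used in this guide's examples (`mp3_44100_128`, `mp3_22050_32`) appear to follow a `codec_samplerate_bitrate` pattern. The sketch below splits such a string into its parts. It assumes formats without a bitrate, such as `pcm_16000` or `ulaw_8000`, follow the same pattern minus the last field; `parse_output_format` is a hypothetical helper, not an SDK function.

```python
from typing import NamedTuple, Optional


class AudioFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # None when the string carries no bitrate part


def parse_output_format(value: str) -> AudioFormat:
    """Split a string such as "mp3_44100_128" into codec, sample rate, bitrate."""
    parts = value.split("_")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected output_format: {value!r}")
    codec = parts[0]
    sample_rate = int(parts[1])
    bitrate = int(parts[2]) if len(parts) == 3 else None
    return AudioFormat(codec, sample_rate, bitrate)


if __name__ == "__main__":
    print(parse_output_format("mp3_44100_128"))
    print(parse_output_format("pcm_16000"))
```

A parser like this can be handy when choosing a `Content-Type` header or configuring an audio player from the same setting that is sent to the API.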
Practical applications
Multi-voice podcast (v3)
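from elevenlabs.client import ElevenLabs
elevenlabs = ElevenLabs()
# Multi-speaker audio goes through the Text to Dialogue endpoint, where each
# line carries its own voice_id. This assumes an SDK version that exposes
# text_to_dialogue; the guest voice ID is a placeholder.
HOST_VOICE = "JBFqnCBsd6RMkjVDRZzb"
GUEST_VOICE = "your-guest-voice-id"
audio = elevenlabs.text_to_dialogue.convert(
    inputs=[
        {"text": "Welcome to Tech Talk! Today we're discussing the future of AI.", "voice_id": HOST_VOICE},
        {"text": "Thanks for having me. I think 2026 is going to be a breakthrough year.", "voice_id": GUEST_VOICE},
        {"text": "[excited] Absolutely! Let's dive right in.", "voice_id": HOST_VOICE},
    ],
)
Audiobook generator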
from elevenlabs.client import ElevenLabs
elevenlabs = ElevenLabs()
chapters = [
    "Chapter 1: The Beginning. It was a dark and stormy night...",
    "Chapter 2: The Journey. The next morning brought clear skies...",
    "Chapter 3: The Discovery. Deep in the forest, she found...",
]
for i, chapter in enumerate(chapters):
    audio = elevenlabs.text_to_speech.convert(
        text=chapter,
        voice_id="JBFqnCBsd6RMkjVDRZzb",
        model_id="eleven_multilingual_v2",
        output_format="mp3_44100_128",
    )
    # convert() returns an iterator of byte chunks; write them out one by one.
    with open(f"chapter_{i+1}.mp3", "wb") as f:
        for chunk in audio:
            f.write(chunk)
Real-time voice assistant (Next.js)
import { ElevenLabsClient } from "elevenlabs";
import { NextResponse } from "next/server";
const elevenlabs = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY,
});
export async function POST(request: Request) {
  const { text, voiceId } = await request.json();
  const audio = await elevenlabs.textToSpeech.convert(voiceId, {
    text,
    model_id: "eleven_flash_v2_5",
    output_format: "mp3_22050_32",
  });
  return new NextResponse(audio as unknown as ReadableStream, {
    headers: {
      "Content-Type": "audio/mpeg",
    },
  });
}
React integration
import { useState, useRef } from "react";
export function TextToSpeechPlayer() {
  const [text, setText] = useState("");
  const [isLoading, setIsLoading] = useState(false);
  const audioRef = useRef<HTMLAudioElement>(null);
  const generateSpeech = async () => {
    setIsLoading(true);
    try {
      const response = await fetch("/api/tts", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          text,
          voiceId: "JBFqnCBsd6RMkjVDRZzb",
        }),
      });
      // Stop on HTTP errors instead of trying to play an error body as audio.
      if (!response.ok) {
        throw new Error(`TTS request failed: ${response.status}`);
      }
      const blob = await response.blob();
      const url = URL.createObjectURL(blob);
      if (audioRef.current) {
        audioRef.current.src = url;
        audioRef.current.play();
      }
    } finally {
      // Re-enable the button even when the request fails.
      setIsLoading(false);
    }
  };
  return (
    <div className="flex flex-col gap-4 p-6 max-w-md">
      <textarea
        value={text}
        onChange={(e) => setText(e.target.value)}
        placeholder="Enter text to convert to speech..."
        className="w-full p-3 border rounded-lg dark:bg-gray-800 dark:border-gray-600"
        rows={4}
      />
      <button
        onClick={generateSpeech}
        disabled={isLoading || !text}
        className="px-4 py-2 bg-blue-500 hover:bg-blue-600 text-white rounded-lg disabled:opacity-50"
      >
        {isLoading ? "Generating..." : "Generate Speech"}
      </button>
      <audio ref={audioRef} controls className="w-full" />
    </div>
  );
}
Limitations and challenges
- Cost at scale - Costs grow quickly at high audio volumes, especially with premium models
- Character limit - Eleven v3 accepts 5,000 characters per request (vs 10,000 for Multilingual v2 and 40,000 for Flash v2.5 and Turbo v2.5), so longer texts must be split
- Free plan - Only 10 minutes per month, no commercial use, requires attribution
- Voice cloning ethics - Requires consent verification, which slows down the process
- No self-hosting - Models available only via API, no on-premise option
- Quality in minor languages - Some languages have lower quality than English
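The per-request character limit is the easiest of these limitations to work around in code: split long text into chunks before sending it. The sketch below is one possible approach, not an SDK feature. It breaks after sentence-ending punctuation where it can and hard-cuts only when a single sentence exceeds the limit; `split_for_tts` is a hypothetical helper name, and the 5,000 default is the Eleven v3 limit quoted in this guide.

```python
import re


def split_for_tts(text: str, max_chars: int = 5_000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence breaks."""
    # Break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-cut into pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate   # sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for piece in split_for_tts("One. Two. Three.", max_chars=10):
        print(repr(piece))
```

Each chunk can then be sent as its own request and the resulting audio files concatenated. Splitting at sentence boundaries keeps the intonation of each request self-contained.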
FAQ
Does ElevenLabs support Polish?
Yes, Polish is one of the 70+ supported languages. The v3 model offers the best Polish quality with culturally appropriate intonation. For real-time applications, Flash v2.5 also supports Polish.
How much does voice cloning cost?
Instant voice cloning is available from the Starter plan ($5/mo). Professional voice cloning requires the Creator plan ($22/mo) or higher. The cloning process itself has no additional fee - you pay for audio generation as usual.
Can I use ElevenLabs for a commercial audiobook?
Yes, from the Starter plan you have commercial usage rights. ElevenLabs also offers a dedicated platform for publishing audiobooks in the Reader app.
How does ElevenLabs compare to OpenAI TTS?
ElevenLabs offers higher voice quality, voice cloning, audio tags, multi-speaker dialogue, and lower latency (75ms vs ~300ms). OpenAI TTS is simpler to use and has a Realtime API for conversations, but doesn't support voice cloning or such advanced emotion control.
Can I build a voice agent with ElevenLabs?
Yes, Conversational AI combines STT (Scribe), LLM (your choice), and TTS into a single pipeline. It supports tool calling, knowledge base, and monitoring. SDKs are available for Python, TypeScript, Flutter, Swift, and Kotlin.
Do audio tags work in all models?
No, audio tags ([excited], [whispers], etc.) are a feature exclusive to the Eleven v3 model. Older models (Multilingual v2, Flash v2.5) interpret emotions from text context but don't support tags.
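If a script written for Eleven v3 also has to run on a model without tag support, the tags can be removed before the text is sent. The sketch below assumes a tag is a short run of letters, spaces, or hyphens in square brackets, as in the examples in this guide; `extract_tags` and `strip_tags` are hypothetical helpers, not SDK functions.

```python
import re

# Assumption: a tag looks like [excited] or [whispers], with letters, spaces, or hyphens only.
TAG_PATTERN = re.compile(r"\[([A-Za-z][A-Za-z \-]*)\]")
# Same pattern, also swallowing whitespace directly before the tag.
TAG_WITH_LEADING_SPACE = re.compile(r"\s*" + TAG_PATTERN.pattern)


def extract_tags(text: str) -> list[str]:
    """Return the tag names in order of appearance."""
    return TAG_PATTERN.findall(text)


def strip_tags(text: str) -> str:
    """Remove tags and tidy up the whitespace they leave behind."""
    without_tags = TAG_WITH_LEADING_SPACE.sub("", text)
    return re.sub(r"\s+", " ", without_tags).strip()


if __name__ == "__main__":
    line = "[sighs] Another Monday morning..."
    print(extract_tags(line))
    print(strip_tags(line))
```

Bracketed text that contains other characters, such as digits or colons, is left untouched by this pattern, which reduces the risk of deleting ordinary bracketed content.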
Summary
ElevenLabs is the undisputed leader in AI speech synthesis quality in 2026. The v3 model with audio tags, multi-speaker dialogue, and 70+ languages sets a new industry standard. Conversational AI lets you build voice agents comparable to those built on Vapi, with the added advantage of the best voice quality on the market.
For developers, ElevenLabs offers well-documented SDKs (Python, TypeScript), streaming API, voice cloning, and flexible architecture connecting to any LLM. The main trade-offs are cost at scale and no self-hosting option - but if voice quality is the priority, ElevenLabs is hard to beat.