FastRTC: The Real-Time Communication Library for Python

Published February 25, 2025

Freddy Boulton
Abubakar Abid
In the last few months, many new real-time speech models have been released and entire companies have been founded around both open and closed source models. To name a few milestones:

  • OpenAI and Google released their live multimodal APIs for ChatGPT and Gemini. OpenAI even went so far as to release a 1-800-ChatGPT phone number!
  • Kyutai released Moshi, a fully open-source audio-to-audio LLM. Alibaba released Qwen2-Audio and Fixie.ai released Ultravox - two open-source LLMs that natively understand audio.
  • ElevenLabs raised $180m in their Series C.

Despite the explosion on the model and funding side, it's still difficult to build real-time AI applications that stream audio and video, especially in Python.

  • ML engineers may not have experience with the technologies needed to build real-time applications, such as WebRTC.
  • Even code assistant tools like Cursor and Copilot struggle to write Python code that supports real-time audio/video applications. I know from experience!

That's why we're excited to announce FastRTC, the real-time communication library for Python. The library is designed to make it super easy to build real-time audio and video AI applications entirely in Python!

In this blog post, we'll walk through the basics of FastRTC by building real-time audio applications. At the end, you'll understand the core features of FastRTC:

  • 🗣️ Automatic Voice Detection and Turn Taking built-in, so you only need to worry about the logic for responding to the user.
  • 💻 Automatic UI - Built-in WebRTC-enabled Gradio UI for testing (or deploying to production!).
  • 📞 Call via Phone - Use fastphone() to get a FREE phone number to call into your audio stream (HF Token required. Increased limits for PRO accounts).
  • ⚡️ WebRTC and WebSocket support.
  • 💪 Customizable - You can mount the stream to any FastAPI app so you can serve a custom UI or deploy beyond Gradio.
  • 🧰 Lots of utilities for text-to-speech, speech-to-text, and stop word detection to get you started.
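To give a feel for what the built-in turn taking has to do, here is a deliberately naive, energy-based pause detector in plain NumPy. This is only a sketch of the idea - FastRTC's actual voice-activity detection is far more robust, and the function and parameter names below are illustrative, not part of its API:

```python
import numpy as np

def has_paused(audio_data: np.ndarray, sample_rate: int,
               frame_ms: int = 20, threshold: float = 0.01,
               min_silence_s: float = 0.5) -> bool:
    """Naive pause detector: True if the tail of the signal is quiet.

    Splits the signal into short frames, computes per-frame RMS energy,
    and reports a pause when the last `min_silence_s` seconds of frames
    all fall below `threshold`. Real detectors handle noise floors,
    spectral features, and learned models on top of this basic idea.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(audio_data) // frame_len
    frames = audio_data[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames.astype(np.float64) ** 2).mean(axis=1))
    silence_frames = int(min_silence_s * 1000 / frame_ms)
    if n_frames < silence_frames:
        return False
    return bool((rms[-silence_frames:] < threshold).all())

sr = 16000
speech = 0.5 * np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # 1 s of tone
silence = np.zeros(sr)                                       # 1 s of silence
print(has_paused(np.concatenate([speech, silence]), sr))  # trailing pause -> True
print(has_paused(np.concatenate([silence, speech]), sr))  # still talking -> False
```

With ReplyOnPause, all of this is handled for you; the sketch just shows why "detect a pause, then reply" is the natural unit of turn taking.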

Let's dive in.

Getting Started

We'll start by building the "hello world" of real-time audio: echoing back what the user says. In FastRTC, this is as simple as:

from collections.abc import Generator

from fastrtc import Stream, ReplyOnPause
import numpy as np

def echo(audio: tuple[int, np.ndarray]) -> Generator[tuple[int, np.ndarray], None, None]:
    # Echo the caller's audio straight back as one (sample_rate, audio_data) chunk
    yield audio

stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")
stream.ui.launch()

Let's break it down:

  • ReplyOnPause handles voice detection and turn taking for you; you only need to supply the logic for responding to the user. Any generator that yields audio chunks (each represented as a (sample_rate, audio_data) tuple) will work.
  • The Stream class builds a Gradio UI for you to quickly test out your stream. Once you have finished prototyping, you can deploy your Stream as a production-ready FastAPI app in a single line of code - stream.mount(app), where app is a FastAPI app.
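To make the handler contract concrete, here is a fastrtc-free sketch of a generator over (sample_rate, audio_data) tuples. The int16 dtype and mono shape are illustrative assumptions, not a FastRTC requirement:

```python
import numpy as np

def chunked_reply(audio: tuple[int, np.ndarray]):
    """A handler is just a generator over (sample_rate, audio_data) tuples.

    This toy handler echoes the input back in half-second pieces, which
    is the same shape of iteration a real streaming handler performs.
    """
    sample_rate, audio_data = audio
    step = sample_rate // 2  # half a second of samples per chunk
    for start in range(0, audio_data.shape[-1], step):
        yield (sample_rate, audio_data[..., start:start + step])

sr = 16000
two_seconds = np.zeros(2 * sr, dtype=np.int16)
chunks = list(chunked_reply((sr, two_seconds)))
print(len(chunks))  # 4 half-second chunks
```

Anything with this generator shape can be wrapped in ReplyOnPause and handed to Stream.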

Here it is in action:

Leveling-Up: LLM Voice Chat

The next level is to use an LLM to respond to the user. FastRTC comes with built-in speech-to-text and text-to-speech capabilities, so working with LLMs is really easy. Let's change our echo function accordingly:

import os

from fastrtc import (ReplyOnPause, Stream, get_stt_model, get_tts_model)
from openai import OpenAI

sambanova_client = OpenAI(
    api_key=os.getenv("SAMBANOVA_API_KEY"), base_url="https://api.sambanova.ai/v1"
)
stt_model = get_stt_model()
tts_model = get_tts_model()

def echo(audio):
    prompt = stt_model.stt(audio)
    response = sambanova_client.chat.completions.create(
        model="Meta-Llama-3.2-3B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200,
    )
    response_text = response.choices[0].message.content
    for audio_chunk in tts_model.stream_tts_sync(response_text):
        yield audio_chunk

stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")
stream.ui.launch()

We're using the SambaNova API since it's fast. The get_stt_model() will fetch Moonshine Base and get_tts_model() will fetch Kokoro from the Hub, both of which have been further optimized for on-device CPU inference. But you can use any LLM/text-to-speech/speech-to-text API or even a speech-to-speech model. Bring the tools you love - FastRTC just handles the real-time communication layer.
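Because any backend can be swapped in, it helps to see the handler reduced to its bare three-stage shape: speech-to-text, a chat completion, then streamed text-to-speech. The sketch below uses stand-in stubs so the structure is visible without any API keys - every function here is a placeholder, not a FastRTC or SambaNova API:

```python
def fake_stt(audio):
    # Placeholder: a real system would transcribe the incoming audio here
    return "hello"

def fake_llm(prompt):
    # Placeholder for the chat-completion call
    return f"You said: {prompt}"

def fake_tts_stream(text, chunk_size=5):
    # Placeholder: a real TTS yields audio chunks; we yield text pieces
    # just to show the streaming shape of the final stage
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]

def handler(audio):
    # The same stt -> llm -> tts pipeline as the FastRTC example above
    prompt = fake_stt(audio)
    reply = fake_llm(prompt)
    yield from fake_tts_stream(reply)

print(list(handler(None)))
```

Swapping providers means replacing one stub at a time while the generator shape - and therefore the Stream wiring - stays identical.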

Bonus: Call via Phone

If, instead of stream.ui.launch(), you call stream.fastphone(), you'll get a free phone number to call into your stream. Note: a Hugging Face token is required, and PRO accounts get increased limits.

You'll see something like this in your terminal:

INFO:	  Your FastPhone is now live! Call +1 877-713-4471 and use code 530574 to connect to your stream.
INFO:	  You have 30:00 minutes remaining in your quota (Resetting on 2025-03-23)

You can then call the number and it will connect you to your stream!

Next Steps

  • Read the docs to learn more about the basics of FastRTC.
  • The best way to start building is by checking out the cookbook. Find out how to integrate with popular LLM providers (including OpenAI's and Gemini's real-time APIs), integrate your stream with a FastAPI app and do a custom deployment, return additional data from your handler, do video processing, and more!
  • ⭐️ Star the repo and file bugs and feature requests!
  • Follow the FastRTC org on Hugging Face for updates and check out deployed examples!

Thank you for checking out FastRTC!

Models mentioned in this article 4


Community

Wow.

Can fastphone() accept an Indian phone number?

We're working on getting a WhatsApp number.

This is amazing!

📻 🎙️ Hey, I generated an AI podcast about this blog post, check it out!

This podcast is generated via ngxson/kokoro-podcast-generator, using DeepSeek-R1 and Kokoro-TTS.

Thx to all. Great work!!!

I have a question about concurrency when using tts_model and stt_model. How does each type of model handle multiple requests at the same time (e.g. a batching technique? CPU-only threading?) @freddyaboulton

Hi @MRU4913! Each stream is an independent event in the event loop. But you can limit how many streams run concurrently very easily; there is a parameter for this in the Stream class.

Anyone else stuck with the "Taking a while to connect. Are you on a VPN?" error (I am not using a VPN)? This only happens on the Gemini examples.

Hi @Nirav-Madhani - I am not sure. Let me see. Feel free to clone and run locally in the meantime.

It would be very cool if you could also add an example with the Azure OpenAI API.

Hi @MechanicCoder - please feel free to add an example here if you'd like. It should be straightforward: take the example in this blog post and replace the LLM with the API call for the Azure-hosted LLM you like.

Hey, I have a working example... should I send you a repo link?

Can I connect something like FreeSWITCH and have its RTC directly parsed by FastRTC?

I have not tried this myself, but I think so. The FastRTC server is completely open, so you can integrate with any telephony/WebRTC client.

Please open a PR to add a guide on how to do this: https://github.com/freddyaboulton/fastrtc/blob/main/docs/userguide/audio.md

Also feel free to join the HF discord and ask questions in the fastrtc-channels: https://discord.gg/TSWU7HyaYu

Hi, I'm new to WebRTC applications, and one of my main questions is: how does the process of capturing audio work? I mean, in demos you always take the audio directly from the microphone, but I'd like to know if it's possible to get the input audio from a specific port (for example, a listening port where RTP packets are arriving). I guess I need to better understand how WebRTC communications work... Thank you!

Can you tell me a bit more about the use case, @JuanRoyo? WebRTC requires a "handshake" to happen between the two clients. This handshake is taken care of by the webrtc/offer route of the FastRTC server, so you can just send a POST request there. See the JS code snippet at:

https://fastrtc.org/userguide/api/

Source: Hugging Face Blog - huggingface.co