Faster Whisper Server: Speech-to-Text Made Easy

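Faster Whisper Server (https://github.com/fedirz/faster-whisper-server) is an OpenAI API-compatible speech-to-text server that uses faster-whisper, a faster implementation of the Whisper model, as its backend. It runs on GPU or CPU, deploys with Docker, and supports streaming transcription.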

Key Features

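GPU and CPU support: Depending on your hardware, you can run inference on a GPU for speed or fall back to the CPU.
Easy Docker deployment: A few commands are enough to run Faster Whisper Server in Docker.
Configuration through environment variables: Server behavior, such as the model and the log level, is set through environment variables; the available options are defined in config.py.
OpenAI API compatibility: Faster Whisper Server exposes OpenAI-compatible endpoints, so you can call it with the tools and code you already use.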

OpenAI API Compatibility

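Faster Whisper Server exposes endpoints that mirror the OpenAI audio API, so existing code can be migrated with little effort.

Audio file transcription: The POST /v1/audio/transcriptions endpoint converts an audio file to text. Unlike the OpenAI API, Faster Whisper Server also supports streaming transcription: results arrive segment by segment instead of only after the whole file has been processed, which is useful for large audio files.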
Audio file translation: The POST /v1/audio/translations endpoint transcribes an audio file and translates the result into English.
Live transcription (still in development): The WS /v1/audio/transcriptions endpoint accepts audio over a WebSocket for real-time transcription.
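
To make the streaming behavior more concrete, here is a minimal sketch of how a client might consume a streamed response. It assumes the server delivers segments as server-sent events, that is, lines of the form "data: <text>". That wire format is an assumption rather than something this article confirms, and iter_sse_data is a hypothetical helper name.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each `data:` line in an event stream.

    Assumption: the server sends one transcription segment per
    server-sent event. Blank lines, comments, and other fields are skipped.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            payload = line[len("data:"):]
            # The SSE format allows one optional space after the colon.
            if payload.startswith(" "):
                payload = payload[1:]
            yield payload


# Hand-written sample stream for illustration (not real server output).
sample = ["data: Hello\n", "\n", "data: world\n", "\n"]
segments = list(iter_sse_data(sample))
print(" ".join(segments))
```

Each data line is treated as its own segment here, which keeps the sketch short; a full SSE client would also join multi-line events.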

Quick Start

Hugging Face Space: https://huggingface.co/spaces/Iatalking/fast-whisper-server
Docker:

docker run --gpus=all --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface fedirz/faster-whisper-server:latest-cuda
# or
docker run --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface fedirz/faster-whisper-server:latest-cpu

Docker Compose:

curl -sO https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.yaml
docker compose up --detach faster-whisper-server-cuda
# or
docker compose up --detach faster-whisper-server-cpu
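
Because the container starts in the background, a script that calls the server right away may run before it is reachable. The sketch below polls the published port until a TCP connection succeeds. wait_for_port is a hypothetical helper, and an open port only shows that something is listening; it does not guarantee that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not reachable yet: wait briefly, then retry.
            time.sleep(0.2)
    return False
```

With the port mapping used above (--publish 8000:8000), calling wait_for_port("localhost", 8000) before the first request is one way to avoid connection errors in scripts.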

Usage Examples

OpenAI API CLI:

export OPENAI_API_KEY="cant-be-empty"
export OPENAI_BASE_URL=http://localhost:8000/v1/

openai api audio.transcriptions.create -m Systran/faster-distil-whisper-large-v3 -f audio.wav --response-format text

openai api audio.translations.create -m Systran/faster-distil-whisper-large-v3 -f audio.wav --response-format verbose_json

OpenAI API Python SDK:

from openai import OpenAI

client = OpenAI(api_key="cant-be-empty", base_url="http://localhost:8000/v1/")

# Open the file in a context manager so it is closed after the request.
with open("audio.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="Systran/faster-distil-whisper-large-v3", file=audio_file
    )
print(transcript.text)
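
The translation endpoint can be called through the same SDK. The following is a minimal sketch: translate_to_english is a hypothetical helper, and the default model name is simply the one used in the examples in this article. It takes the client as a parameter so that it works with any object that provides audio.translations.create.

```python
def translate_to_english(
    client,
    audio_path: str,
    model: str = "Systran/faster-distil-whisper-large-v3",
) -> str:
    """Send an audio file to /v1/audio/translations and return the text.

    `client` is expected to be an `openai.OpenAI` instance whose base_url
    points at the local server.
    """
    # Close the file as soon as the request has completed.
    with open(audio_path, "rb") as audio_file:
        result = client.audio.translations.create(model=model, file=audio_file)
    return result.text
```

With the client created above, translate_to_english(client, "audio.wav") would return the translated text as a string.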

CURL:

# If 'model' isn't specified, the default model is used
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav"
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.mp3"
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "stream=true"
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "model=Systran/faster-distil-whisper-large-v3"
# It's recommended that you always specify the language as that will reduce the transcription time
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "language=en"

curl http://localhost:8000/v1/audio/translations -F "file=@audio.wav"
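
The same requests can be sent from Python without the OpenAI SDK. This sketch uses the third-party requests library and mirrors the -F form fields from the curl commands. build_form_fields and transcribe are hypothetical helper names, and the ten-minute timeout is an arbitrary choice.

```python
from typing import Dict, Optional


def build_form_fields(
    model: Optional[str] = None,
    language: Optional[str] = None,
    stream: bool = False,
) -> Dict[str, str]:
    """Build the non-file form fields, leaving out anything not set."""
    fields: Dict[str, str] = {}
    if model is not None:
        fields["model"] = model
    if language is not None:
        fields["language"] = language
    if stream:
        fields["stream"] = "true"
    return fields


def transcribe(path: str, base_url: str = "http://localhost:8000", **options) -> str:
    """POST an audio file to /v1/audio/transcriptions and return the raw body."""
    import requests  # third-party: pip install requests

    with open(path, "rb") as audio_file:
        response = requests.post(
            f"{base_url}/v1/audio/transcriptions",
            files={"file": audio_file},
            data=build_form_fields(**options),
            timeout=600,
        )
    response.raise_for_status()
    return response.text
```

For example, transcribe("audio.wav", language="en") corresponds to the curl call that sets language=en.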

Live transcription (using WebSocket):

# Requires websocat to be installed
# See the live-audio example for more details: https://github.com/fedirz/faster-whisper-server/blob/master/examples/live-audio

ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - | websocat --binary ws://localhost:8000/v1/audio/transcriptions
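
If websocat is not available, a small Python client can take its place in the pipeline. The sketch below reads raw PCM from standard input and forwards it as binary WebSocket messages using the third-party websockets library. The chunk duration and the helper names are assumptions, and since the endpoint is still in development, the messages it returns are printed as-is without assuming any particular format.

```python
import asyncio
import sys

SAMPLE_RATE = 16000  # matches -ar 16000
SAMPLE_WIDTH = 2     # bytes per sample for s16le
CHANNELS = 1         # matches -ac 1


def chunk_size(seconds: float) -> int:
    """Number of bytes of raw PCM that cover `seconds` of audio."""
    return round(SAMPLE_RATE * seconds) * SAMPLE_WIDTH * CHANNELS


async def stream_stdin(
    uri: str = "ws://localhost:8000/v1/audio/transcriptions",
    seconds_per_chunk: float = 0.1,
) -> None:
    """Forward PCM from stdin to the server and print whatever it sends back."""
    import websockets  # third-party: pip install websockets

    loop = asyncio.get_running_loop()
    async with websockets.connect(uri) as ws:

        async def send_audio() -> None:
            size = chunk_size(seconds_per_chunk)
            while True:
                # Read in a worker thread so the event loop stays responsive.
                chunk = await loop.run_in_executor(None, sys.stdin.buffer.read, size)
                if not chunk:
                    break
                await ws.send(chunk)  # bytes are sent as a binary message

        async def print_results() -> None:
            # Runs until the server closes the connection.
            async for message in ws:
                print(message)

        await asyncio.gather(send_audio(), print_results())
```

Saved to a file that ends with asyncio.run(stream_stdin()), the script can replace the websocat stage: pipe the ffmpeg output into it instead.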

Faster Whisper Server makes speech-to-text straightforward to set up and offers several flexible ways to deploy and use it.
