Transcribing audio to captions

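Overview

Remotion provides several built-in options for transcribing audio to generate captions:

@remotion/install-whisper-cpp: transcribes audio locally on a server using Whisper.cpp
@remotion/whisper-web: transcribes audio in the browser using WebAssembly
@remotion/openai-whisper: transcribes audio in the cloud using the OpenAI Whisper API
@remotion/elevenlabs: transcribes audio in the cloud using the ElevenLabs Speech-to-Text API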

Comparison

Option	`@remotion/install-whisper-cpp`	`@remotion/whisper-web`	`@remotion/openai-whisper`	`@remotion/elevenlabs`
Runtime environment	Server (Node.js)	Client (browser)	Cloud (API)	Cloud (API)
Speed	Fast (depends on hardware)	Slow (WASM overhead)	Fast	Fast
Cost	Free	Free	Paid (OpenAI API pricing)	Paid (ElevenLabs API pricing)
Offline support	Yes	Yes	No	No
Works without a server	No	Yes	Yes	Yes
Conversion function	`toCaptions()`	`toCaptions()`	`openAiWhisperApiToCaptions()`	`elevenLabsTranscriptToCaptions()`

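`Caption` type

Every option can output captions in the Caption type format, which is the recommended format for use in Remotion. Using this format means:

You can use the APIs in @remotion/captions, such as createTikTokStyleCaptions()
The data matches the format used by the Remotion Editor Starter
The data is compatible with the Animated Captions package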

`Caption` type definition

type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};
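
As a rough illustration of how the timing fields can be used, the sketch below picks the caption that should be visible at a given time. The `Caption` type is redeclared locally so the snippet stands alone; `findActiveCaption` and the sample data are hypothetical names made up for this example and are not part of any Remotion package.

```typescript
// Local copy of the Caption shape so this snippet is self-contained.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical sample data: three words with millisecond timings.
const sampleCaptions: Caption[] = [
  { text: "Hello", startMs: 0, endMs: 400, timestampMs: 200, confidence: 0.98 },
  { text: " world", startMs: 400, endMs: 900, timestampMs: 650, confidence: 0.95 },
  { text: " again", startMs: 1200, endMs: 1600, timestampMs: null, confidence: null },
];

// Hypothetical helper: returns the caption whose [startMs, endMs) range
// contains timeMs, or null if nothing is being spoken at that moment.
const findActiveCaption = (
  captions: Caption[],
  timeMs: number,
): Caption | null => {
  for (const caption of captions) {
    if (timeMs >= caption.startMs && timeMs < caption.endMs) {
      return caption;
    }
  }
  return null;
};

// In a composition, timeMs would typically be derived as (frame / fps) * 1000.
const activeCaption = findActiveCaption(sampleCaptions, 500);
console.log(activeCaption ? activeCaption.text.trim() : "(silence)");
```

Treating the range as half-open (start inclusive, end exclusive) means two back-to-back captions never match the same instant.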

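Using `@remotion/install-whisper-cpp` (local server)

This option runs a Whisper.cpp model on the server to transcribe audio. It is free, and once the model has been downloaded it works offline.

Installation

npm install @remotion/install-whisper-cpp

Basic usage example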

import path from "path";
import {
  downloadWhisperModel,
  installWhisperCpp,
  transcribe,
  toCaptions,
} from "@remotion/install-whisper-cpp";
 
const whisperPath = path.join(process.cwd(), "whisper.cpp");
 
// Install Whisper.cpp
await installWhisperCpp({
  to: whisperPath,
  version: "1.5.5",
});
 
// Download a model (sizes range from tiny, base, small, and medium up to large)
await downloadWhisperModel({
  model: "medium",
  folder: whisperPath,
});
 
// Run the transcription
const whisperCppOutput = await transcribe({
  inputPath: "/path/to/audio.wav",
  whisperPath,
  model: "medium",
  tokenLevelTimestamps: true,
});
 
// Convert to the Caption format (toCaptions expects the full transcribe() output)
const { captions } = toCaptions({ whisperCppOutput });
console.log(captions);

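Notes

The audio file must be a 16 kHz mono WAV file
The first run has to download the model; model sizes range from tens of MB to several GB
ffmpeg is a good choice for converting audio to the required format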
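
Since Whisper.cpp expects 16 kHz mono WAV input, a conversion step usually precedes transcription. The following is a minimal sketch that only assembles the ffmpeg arguments; `buildWavConversionArgs` and the file names are hypothetical, and it assumes an `ffmpeg` binary is available on the PATH when the command is eventually run.

```typescript
// Hypothetical helper: builds ffmpeg arguments that resample any input
// to a 16 kHz, mono WAV file.
const buildWavConversionArgs = (
  inputPath: string,
  outputPath: string,
): string[] => [
  "-i", inputPath, // source audio or video file
  "-ar", "16000", // resample to 16 kHz
  "-ac", "1", // downmix to a single (mono) channel
  "-y", // overwrite the output file if it already exists
  outputPath,
];

const ffmpegArgs = buildWavConversionArgs("input.mp4", "audio.wav");

// Print the equivalent shell command.
console.log(["ffmpeg", ...ffmpegArgs].join(" "));
```

The array could be passed to `execFileSync("ffmpeg", ffmpegArgs)` from `node:child_process`, which avoids shell-quoting problems with unusual file names.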

Using `@remotion/whisper-web` (in the browser)

This option runs Whisper in the browser through WebAssembly. It needs no server, but it is slower.

Installation

npm install @remotion/whisper-web

Basic usage example

import { transcribe, toCaptions } from "@remotion/whisper-web";
 
// Transcribe the audio in the browser
const { transcription } = await transcribe({
  inputUrl: "https://example.com/audio.wav",
  model: "tiny",
  onProgress: (progress) => {
    console.log(`Transcription progress: ${Math.round(progress * 100)}%`);
  },
});
 
// Convert to the Caption format
const { captions } = toCaptions({ transcription });
console.log(captions);

Notes

Slower than the server-side option because of WASM overhead
Suited to frontend-only applications that have no server
The browser must support WebAssembly
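
Because the browser option depends on WebAssembly, an application may want to check for it before offering in-browser transcription. This is one possible feature check, not something provided by `@remotion/whisper-web`; `supportsWebAssembly` is a hypothetical helper name.

```typescript
// Hypothetical helper: reports whether the current runtime exposes the
// WebAssembly API that a WASM-based transcriber depends on.
const supportsWebAssembly = (): boolean => {
  const wasm = (globalThis as { WebAssembly?: { instantiate?: unknown } })
    .WebAssembly;
  return typeof wasm === "object" && typeof wasm.instantiate === "function";
};

if (supportsWebAssembly()) {
  console.log("WebAssembly is available; in-browser transcription can be offered.");
} else {
  console.log("WebAssembly is missing; fall back to a server or cloud option.");
}
```

When the check fails, the server-side or cloud options described on this page remain available as fallbacks.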

Using `@remotion/openai-whisper` (OpenAI cloud API)

This option uses OpenAI's Whisper API. It is fast, but it is a paid service.

Installation

npm install @remotion/openai-whisper

Basic usage example

import OpenAI from "openai";
import fs from "fs";
import {
  openAiWhisperApiToCaptions,
} from "@remotion/openai-whisper";
 
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
 
// Call the OpenAI Whisper API, requesting word-level timestamps
const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("/path/to/audio.mp3"),
  model: "whisper-1",
  response_format: "verbose_json",
  timestamp_granularities: ["word"],
});
 
// Convert to the Caption format
const { captions } = openAiWhisperApiToCaptions({ transcription });
console.log(captions);

Notes

Requires a valid OpenAI API key
Billed according to OpenAI API pricing
Supports several audio formats (mp3, mp4, wav, and others)

Using `@remotion/elevenlabs` (ElevenLabs cloud API)

This option uses the ElevenLabs Speech-to-Text API and suits users who already rely on ElevenLabs services.

Installation

npm install @remotion/elevenlabs

Basic usage example

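import fs from "fs";
// Assumption: ElevenLabsClient is provided by the ElevenLabs SDK
// (@elevenlabs/elevenlabs-js) rather than by @remotion/elevenlabs
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { elevenLabsTranscriptToCaptions } from "@remotion/elevenlabs";
 
const client = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY,
});
 
// Call the ElevenLabs Speech-to-Text API
// (parameter names differ between SDK versions; check the version you have installed)
const transcript = await client.speechToText.convert({
  audio: fs.createReadStream("/path/to/audio.mp3"),
  model_id: "scribe_v1",
});
 
// Convert to the Caption format
const { captions } = elevenLabsTranscriptToCaptions({ transcript });
console.log(captions);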

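Saving the transcription

It is a good idea to save the transcription as a JSON file so it can be reused at render time, rather than transcribing again or calling the API on every render: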

import fs from "fs";
import path from "path";
 
// Save the captions to the public folder
const captionsPath = path.join(process.cwd(), "public", "captions.json");
fs.writeFileSync(captionsPath, JSON.stringify(captions, null, 2));
console.log(`Captions saved to: ${captionsPath}`);

At render time, load the file with staticFile():

import { staticFile } from "remotion";
 
const response = await fetch(staticFile("captions.json"));
const captions = await response.json();
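
Because `response.json()` yields untyped data, it can be worth checking the shape before treating it as captions. Below is a minimal sketch of such a check; `isCaption` and `parseCaptions` are hypothetical helpers written for this example, not functions exported by Remotion, and the `Caption` type is redeclared locally so the snippet stands alone.

```typescript
// Local copy of the Caption shape so this snippet is self-contained.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical type guard: checks that a value has every Caption field
// with the expected primitive type.
const isCaption = (value: unknown): value is Caption => {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const c = value as Record<string, unknown>;
  return (
    typeof c.text === "string" &&
    typeof c.startMs === "number" &&
    typeof c.endMs === "number" &&
    (c.timestampMs === null || typeof c.timestampMs === "number") &&
    (c.confidence === null || typeof c.confidence === "number")
  );
};

// Hypothetical helper: parses a JSON string and throws if the result
// is not an array of Caption objects.
const parseCaptions = (json: string): Caption[] => {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data) || !data.every(isCaption)) {
    throw new Error("captions.json does not contain a valid Caption array");
  }
  return data;
};

const loadedCaptions = parseCaptions(
  '[{"text":"Hi","startMs":0,"endMs":300,"timestampMs":null,"confidence":0.9}]',
);
console.log(loadedCaptions.length);
```

With the fetch-based loading shown above, the same check could be applied to the text of the response (`await response.text()`) before the captions are used in a composition.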

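Alternatives

You can also use your own caption format; using the Caption type is not required. This page covers only the built-in options. For example, you can parse SRT or VTT files directly and handle the timing logic yourself.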
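
To make the SRT route concrete, here is a hand-rolled sketch that turns well-formed SRT text into objects shaped like `Caption`. It is an illustration under simplifying assumptions (no styling tags, no malformed blocks); `srtTimeToMs` and `parseSimpleSrt` are hypothetical names, and the `Caption` type is redeclared locally.

```typescript
// Local copy of the Caption shape so this snippet is self-contained.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Converts an SRT timestamp such as "00:01:02,500" to milliseconds.
const srtTimeToMs = (time: string): number => {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(time.trim());
  if (!match) {
    throw new Error(`Invalid SRT timestamp: ${time}`);
  }
  const [, h, m, s, ms] = match;
  return ((Number(h) * 60 + Number(m)) * 60 + Number(s)) * 1000 + Number(ms);
};

// Hypothetical minimal parser: handles well-formed SRT blocks only
// (index line, timing line, then one or more text lines).
const parseSimpleSrt = (srt: string): Caption[] => {
  const blocks = srt.replace(/\r\n/g, "\n").trim().split(/\n{2,}/);
  return blocks.map((block) => {
    const [, timing, ...textLines] = block.split("\n");
    if (!timing || !timing.includes(" --> ")) {
      throw new Error(`Missing timing line in block: ${block}`);
    }
    const [start, end] = timing.split(" --> ");
    return {
      text: textLines.join(" "),
      startMs: srtTimeToMs(start),
      endMs: srtTimeToMs(end),
      timestampMs: null, // SRT has no per-token timestamp
      confidence: null, // SRT carries no confidence score
    };
  });
};

const srtSample = `1
00:00:00,000 --> 00:00:01,500
Hello world

2
00:00:02,000 --> 00:00:03,250
Second line`;

console.log(parseSimpleSrt(srtSample));
```

Mapping a custom format onto the `Caption` shape like this keeps the `@remotion/captions` helpers usable even when the transcription comes from elsewhere.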
