Upload a video · get subtitles, translation, and a burned-in cut

The Agent transcribes the audio with speaker diarization, translates cue by cue so the timing is preserved, and re-encodes through ffmpeg inside an isolated sandbox. You get the .mp4 with subtitles burned in, plus external .srt files in every language you asked for, usually within a few minutes.

A peek at what you get

Deliverable summary
Burned-in frame
.srt sample
Multi-language matrix

Inside the sandbox · the ffmpeg pipeline

Every box below is a real intermediate file produced inside the workspace-scoped sandbox. The command beneath each file is the literal call the Agent made; open the run later and the transcript shows the same commands.

input.mp4
184 MB · 4K · 60 fps · AAC
ffprobe -v quiet -show_streams input.mp4
audio.wav
mono · 16kHz
vb asr transcribe audio.wav --provider whisper-large --diarize
subtitles.en.srt
412 cues · diarized
write_file subtitles.en.srt
subtitles.zh.srt
cue-by-cue · timing intact
translate_cues subtitles.en.srt --target zh-CN --collapse-on length-overflow
output.mp4
192 MB · stacked bilingual
ffmpeg -i input.mp4 -vf 'ass=bilingual.ass' -c:a copy -preset fast output.mp4
Lands in Drive
keynote.subtitled.mp4
192 MB
subtitles.en.srt
32 KB
subtitles.zh-CN.srt
28 KB
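
For readers who want to reproduce the final burn-in step outside the Agent, here is a minimal Python sketch that assembles the same ffmpeg call as an argument list. The function name `burn_in_command` is hypothetical, and the file names are simply the ones shown in the pipeline above; the sketch only builds the command and does not run it.

```python
import shlex


def burn_in_command(source: str, ass_file: str, output: str,
                    preset: str = "fast") -> list[str]:
    """Build an ffmpeg argv that burns an ASS subtitle file into the video."""
    return [
        "ffmpeg",
        "-i", source,
        "-vf", f"ass={ass_file}",  # render the subtitles onto each video frame
        "-c:a", "copy",            # pass the audio stream through unchanged
        "-preset", preset,         # encoder speed/size trade-off
        output,
    ]


cmd = burn_in_command("input.mp4", "bilingual.ass", "output.mp4")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it on a machine with ffmpeg installed. Building a list rather than a shell string avoids quoting problems with file names that contain spaces.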

Burned-in frame · what your viewers see

Two stacked lines, anti-aliased, anchored to the lower third, with a 60% black scrim so the text still reads over light scenes.

keynote · stage shot
Today we're announcing something we've quietly been the proudest of in the last eighteen months.
今天我们要发布的,是过去十八个月里最让我们感到自豪的一件事。
00:00:14
output.mp4 · H.264 · 60 fps · 2.1 Mbps AAC
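
One way to picture the stacked layout is as a cue-by-cue pairing of the two subtitle tracks. The sketch below is an illustration only: the `Cue` type and the `stack_bilingual` and `to_srt` helpers are hypothetical names, and the actual pipeline renders from an .ass file rather than a merged .srt. It pairs cues by position and refuses to merge if the timings differ.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: str  # SRT timestamp, "HH:MM:SS,mmm"
    end: str
    text: str


def stack_bilingual(primary: list[Cue], secondary: list[Cue]) -> list[Cue]:
    """Pair cues by position and stack the secondary text under the primary."""
    if len(primary) != len(secondary):
        raise ValueError("cue counts differ; cannot pair cue by cue")
    stacked = []
    for a, b in zip(primary, secondary):
        if (a.start, a.end) != (b.start, b.end):
            raise ValueError(f"timing mismatch at {a.start}")
        stacked.append(Cue(a.start, a.end, f"{a.text}\n{b.text}"))
    return stacked


def to_srt(cues: list[Cue]) -> str:
    """Serialize cues as SRT: index, timing line, text, blank line."""
    blocks = [f"{i}\n{c.start} --> {c.end}\n{c.text}\n"
              for i, c in enumerate(cues, start=1)]
    return "\n".join(blocks)


en = [Cue("00:00:14,000", "00:00:19,000", "Today we're announcing something.")]
zh = [Cue("00:00:14,000", "00:00:19,000", "今天我们要发布一件事。")]
print(to_srt(stack_bilingual(en, zh)))
```

The timing check is the point of the sketch: stacking is only safe when both tracks share identical cue boundaries, which is what cue-by-cue translation provides.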

How it works

Step 01

Upload the source video

MP4, MOV, WebM, MKV — up to 500 MB and 4K. The Agent uses ffprobe to read the actual stream specs and matches the output to your source resolution, frame rate, and audio channel layout.
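
As a rough illustration of what reading the stream specs involves, the sketch below parses ffprobe's JSON output and pulls out resolution, frame rate, and audio layout. It assumes ffprobe was run with `-print_format json -show_streams` (the JSON flag is not part of the command shown on this page), the `summarize_streams` helper is a hypothetical name, and the sample payload is made up to resemble the 4K, 60 fps, AAC source used in the example.

```python
import json
from fractions import Fraction

# Illustrative payload in the shape ffprobe emits with -print_format json.
SAMPLE = """
{"streams": [
  {"codec_type": "video", "codec_name": "h264",
   "width": 3840, "height": 2160, "r_frame_rate": "60/1"},
  {"codec_type": "audio", "codec_name": "aac",
   "channels": 2, "channel_layout": "stereo"}
]}
"""


def summarize_streams(ffprobe_json: str) -> dict:
    """Extract the specs an encode would need to match from ffprobe JSON."""
    streams = json.loads(ffprobe_json)["streams"]
    video = next(s for s in streams if s["codec_type"] == "video")
    audio = next((s for s in streams if s["codec_type"] == "audio"), None)
    return {
        "width": video["width"],
        "height": video["height"],
        # r_frame_rate is a rational string such as "60/1" or "30000/1001"
        "fps": float(Fraction(video["r_frame_rate"])),
        "audio_codec": audio["codec_name"] if audio else None,
        "channel_layout": audio.get("channel_layout") if audio else None,
    }


specs = summarize_streams(SAMPLE)
print(specs)
```

Parsing the frame rate as a fraction matters for sources such as 29.97 fps video, which ffprobe reports as `30000/1001` rather than a decimal.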

Step 02

Tell the Agent how to deliver

Bilingual burn-in for Bilibili? Just the .srt for an editor? Five languages for a global launch? Say it in the chat — the Agent picks the right pipeline. Interrupt and re-scope anytime.

Step 03

Pick up the package

The .zip lands in your Drive: burned-in .mp4, every requested .srt, a .txt transcript, and a small report listing what was detected (speakers, languages, cue count). You get a public preview URL you can share without unzipping.

Why Vecbase for this

Real ffmpeg in a real sandbox — not a hosted "subtitle SaaS"

Most subtitle tools wrap one vendor and one preset. Here the Agent decides the encode parameters from your source (codec, fps, bitrate) and you can literally see the ffmpeg command in the transcript. If you need a tweak — say it. The next run uses your version.

Timing-preserving translation, not "translate this paragraph"

Cheap subtitle translators dump the .srt to a single string, translate, and re-split — timing drifts by entire seconds. The Agent translates cue-by-cue, only collapsing cues when the target language is significantly shorter, and logs every collapse so a human can audit.
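
The difference between the two approaches can be sketched in a few lines of Python. This is a simplified illustration, not the Agent's implementation: `translate_srt` is a hypothetical helper, the translator is passed in as a plain function, and the collapse behaviour described above is left out. Each cue's index and timing line are copied through untouched and only the text is translated.

```python
from typing import Callable


def translate_srt(srt: str, translate: Callable[[str], str]) -> str:
    """Translate an SRT document one cue at a time, leaving timings untouched."""
    out = []
    for block in srt.strip().split("\n\n"):
        index, timing, *text_lines = block.split("\n")
        translated = translate("\n".join(text_lines))  # only the text changes
        out.append(f"{index}\n{timing}\n{translated}")
    return "\n\n".join(out) + "\n"


srt_en = (
    "1\n00:00:01,000 --> 00:00:03,500\nHello.\n\n"
    "2\n00:00:03,500 --> 00:00:06,000\nThanks for coming.\n"
)

# Stand-in for a real translation call, used here so the sketch runs offline.
fake_translations = {"Hello.": "你好。", "Thanks for coming.": "感谢光临。"}
srt_zh = translate_srt(srt_en, fake_translations.__getitem__)
print(srt_zh)
```

Because the timing line never passes through the translator, there is nothing to re-split afterwards and the cue boundaries cannot drift.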

Speaker diarization is on by default

Two-person interview? Panel discussion? The transcript labels Speaker 1 / Speaker 2 from the start — no separate step, no extra cost. You can rename them in chat ("Speaker 1 is the host, Speaker 2 is the founder") and the labels persist into every output language.
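
A simple way to think about labels persisting across languages: if the speaker is stored as a field separate from the cue text, one rename map can be applied to every language track. The sketch below assumes that data shape; the `rename_speakers` helper and the dictionary layout are hypothetical and are not taken from the product.

```python
def rename_speakers(cues: list[dict], names: dict[str, str]) -> list[dict]:
    """Return new cues with speaker labels replaced; unknown labels are kept."""
    return [
        {**cue, "speaker": names.get(cue["speaker"], cue["speaker"])}
        for cue in cues
    ]


tracks = {
    "en": [{"speaker": "Speaker 1", "text": "Welcome."},
           {"speaker": "Speaker 2", "text": "Glad to be here."}],
    "zh-CN": [{"speaker": "Speaker 1", "text": "欢迎。"},
              {"speaker": "Speaker 2", "text": "很高兴来到这里。"}],
}
names = {"Speaker 1": "Host", "Speaker 2": "Founder"}

# The same map is applied to every language, so labels stay consistent.
renamed = {lang: rename_speakers(cues, names) for lang, cues in tracks.items()}
print(renamed["zh-CN"][0])
```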

Your video never leaves your workspace

The source uploads into the sandbox tied to your workspace, the encode runs there, and only the named outputs are saved to your Drive — intermediate frames, audio tracks, and temp files are wiped when the run ends. We do not train models on customer media.

Frequently asked

Which transcription provider does the Agent use?

The Agent reads `vb asr providers` first and picks based on what is enabled in your workspace and which provider handles the audio best (noisy, multi-speaker, or accented audio versus a clean studio recording). You can pin a specific provider in chat ("use OpenAI for this one" / "use Deepgram for diarization"); `vb asr providers` is the authoritative list.

Get yours in under 90 seconds

Sign in, hand it over to the Agent — the finished file lands in your Drive.