SLATE 03/03 PROJ operator-stack TYPE voice-to-video CODEC h.264
A voice-to-video Claude worker.
Script → MP4. Built on Remotion.
Dispatch a script plus visual direction and it renders a short MP4. The seven-section brief is the contract; Remotion, ffmpeg, and your footage source do the work. MIT licensed.
01 What it is
Accepts a brief. Renders an MP4 through Remotion.
A working Claude worker that does one thing. You give it a brief, it reads the visual system you've defined and the footage source you've configured, and it renders an MP4 through Remotion. Fork the repo, stand up Remotion alongside, configure two reference files, dispatch your first brief.
Track 01
Brief-driven
Every dispatch starts with a seven-section brief at briefs/<date>-<slug>.md. Script, timing, visual direction, voice source — all in the brief. The worker reads it, renders, verifies, hands back the MP4 path.
Track 02
Voice-clone-ready
TTS, recorded audio, or voice-clone: your choice per brief. Voice-clone of a named third party requires a consent confirmation in the brief. The worker refuses without it, on purpose, every time.
Track 03
Footage-configurable
Pexels via API (key env-loaded, never committed), a local licensed footage library, or per-brief operator-supplied assets. The worker pulls from the source you configure, nowhere else.
02 Worked example
One brief. One thirty-second cut.
The repo ships with one worked example. A seven-section brief asks for a 30-second vertical 9:16 voice-over MP4 for a focus-timer app called FocusBlock. The worker reads the brief, constructs the composition props, queries Pexels for the three footage cues, generates the TTS voice-over, invokes Remotion. Output is a 1080×1920 h.264 MP4 with burned-in captions, roughly 8-12MB.
PREVIEW PENDING
Render the example MP4 locally, host it publicly, and wire the <video> tag in docs/index.html.
Full render plan — props JSON, render command, verification table — is in examples.md. The paired brief is at briefs/2026-05-14-example.md.
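As a rough illustration of the "constructs the composition props" step, here is a minimal TypeScript sketch of what props for the worked example could look like. The field names, the 30 fps frame rate, the voice-over path, and the even three-way split of footage cues are all assumptions made for this sketch; the actual props JSON is the one recorded in examples.md.

```typescript
// Hypothetical props shape for a 30-second vertical cut.
// None of these field names are taken from the repo.
interface FootageCue {
  query: string;    // search phrase sent to the footage source
  startSec: number; // where the clip starts in the cut
  endSec: number;   // where the clip ends in the cut
}

interface CompositionProps {
  width: number;
  height: number;
  fps: number;
  durationInFrames: number;
  voiceOverPath: string;
  captions: boolean;
  footage: FootageCue[];
}

function buildProps(durationSec: number, fps: number, footage: FootageCue[]): CompositionProps {
  // The cues should cover the whole cut, otherwise the render has gaps.
  const covered = footage.reduce((sum, cue) => sum + (cue.endSec - cue.startSec), 0);
  if (covered !== durationSec) {
    throw new Error(`footage covers ${covered}s but the brief asks for ${durationSec}s`);
  }
  return {
    width: 1080,
    height: 1920, // 9:16 vertical
    fps,
    durationInFrames: durationSec * fps,
    voiceOverPath: "out/voice-over.mp3", // hypothetical path
    captions: true,
    footage,
  };
}

// Placeholder cue text; the real cues come from the brief.
const exampleProps = buildProps(30, 30, [
  { query: "first footage cue", startSec: 0, endSec: 10 },
  { query: "second footage cue", startSec: 10, endSec: 20 },
  { query: "third footage cue", startSec: 20, endSec: 30 },
]);

console.log(JSON.stringify(exampleProps, null, 2));
```

Checking cue coverage before rendering is one way to catch a brief whose footage plan does not add up to the target duration.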
03 Why this one
Five tracks this starter holds that most don't.
There are other Claude-worker starter kits. The differentiators below are the reasons this one exists. Each is enforced by the architecture, not by a README that hopes you'll remember.
-
ICM rigor
The three always-relevant files (CLAUDE.md, CONTEXT.md, STATUS.md) plus identity, rules, examples, and reference. Not "a CLAUDE.md and hope." Structural, named, enforced.
-
~90% of work lives outside Claude
Infrastructure / Orchestration / AI, in the corrected definition. AI is the 10%; Remotion, ffmpeg, your footage source, and your CDN (the other 90%) live outside Claude. Most kits make Claude the whole stack. This one keeps it in its lane.
-
Brief-as-contract dispatch
Clean role boundary between orchestrator and worker. Especially load-bearing for video: a wrong render burns Pexels quota, TTS / voice-clone API charges, and render compute. The contract closes the guessing space before the API charges start.
-
Self-contained Pages landing
The repo ships its own landing surface: inline CSS, no external font/CDN fetches, deployable via GitHub Pages from the /docs folder in sixty seconds after push. The page you're reading now is that file.
-
Tied to a real article series
The architecture is documented in the operator-AI series, starting with "I burned 800,000 tokens on one daily routine." Read the article, then the code. The two reinforce each other.
04 In production
Real-user receipts.
This repo is a fresh release. Real-user receipts go in the block below once operators who fork it ship work with it.
[ RECEIPTS PENDING ]
Render counts and shipped MP4s go here.
Forked this repo and shipped real video with it? Open an issue with a one-paragraph note on what you rendered and how the brief shape held up. Selected receipts will be quoted here verbatim, with consent.
05 Get started
Five steps from fork to first render.
-
Fork the repo
Click Fork on the GitHub page, or clone directly:
git clone https://github.com/NFTYoginis/your-animation-worker.git
-
Stand up Remotion alongside
In a sibling folder: npx create-video@latest. The repo deliberately doesn't bundle Remotion because pinned dependencies rot fast. You install fresh against current versions. Also install ffmpeg locally (brew install ffmpeg on macOS).
-
Configure visual-system and footage-source
Edit reference/visual-system.md (palette, type, motion) and reference/footage-sources.md (Pexels API key in .env, library path, or per-brief assets). The worker refuses to render against the placeholder text; the gates are there to prevent dispatching against an unconfigured stack.
-
Write your first brief
Copy briefs/_BRIEF-TEMPLATE.md to briefs/<today>-<slug>.md. Fill the seven sections. Keep it on one screen. Read the script aloud at target duration: if your "30-second video" reads as 50 seconds, fix it in the brief, not at render time.
-
Dispatch
Paste this into a Claude session opened in the repo folder:
Read the brief at briefs/<your-filename>.md and execute.
The worker reads the contract, constructs the composition props, invokes Remotion, and hands back the path to the rendered MP4.
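The read-aloud check in the brief-writing step can be approximated before you open your mouth. The sketch below estimates spoken duration from word count; the 150 words-per-minute rate and the two-second tolerance are rule-of-thumb assumptions, not values from the repo, and a real TTS voice may run faster or slower.

```typescript
// Assumed speaking rate: a common rule of thumb, not a repo setting.
const WORDS_PER_MINUTE = 150;

// Count whitespace-separated words and convert to seconds at the given rate.
function estimateSpokenSeconds(script: string, wpm: number = WORDS_PER_MINUTE): number {
  const words = script.trim().split(/\s+/).filter((w) => w.length > 0).length;
  return (words / wpm) * 60;
}

// True when the script plausibly fits the target duration.
function fitsTarget(script: string, targetSec: number, toleranceSec: number = 2): boolean {
  return estimateSpokenSeconds(script) <= targetSec + toleranceSec;
}

// Stand-in scripts: 75 and 125 repeated words.
const shortScript = Array(75).fill("word").join(" ");
const longScript = Array(125).fill("word").join(" ");

console.log(fitsTarget(shortScript, 30)); // true: about 30 seconds at 150 wpm
console.log(fitsTarget(longScript, 30));  // false: about 50 seconds at 150 wpm
```

An estimate like this is only a first filter; reading the script aloud remains the check the brief template asks for.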
06 Questions
What you'll want to know.
Why Remotion and not a video-editor binary or a pure ffmpeg script?
Remotion lets you declare compositions in React/JSX. The brief names which composition and what props; the worker fills the props and renders. You don't generate ffmpeg flag strings from scratch every dispatch — you pick a composition. ffmpeg is still in the stack for post-render passes (aspect conversion, audio-mux for source-with-voice-over mixes, caption burn-in for soft-track MP4s) but Remotion is the primary render path. The reference file reference/remotion-pipeline.md documents the composition shapes and render commands.
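To make "pick a composition, fill the props" concrete, here is a small TypeScript sketch that assembles arguments for the Remotion CLI's render command. The entry point, composition id, and file paths are hypothetical, and the CRF value is borrowed from the file-size answer on this page; the render commands the repo actually uses are the ones documented in reference/remotion-pipeline.md.

```typescript
// What the worker needs to know to render one brief (hypothetical shape).
interface RenderRequest {
  entryPoint: string;    // Remotion entry file in the sibling project
  compositionId: string; // which composition the brief names
  propsFile: string;     // JSON file holding the composition props
  output: string;        // where the MP4 should land
  crf: number;           // h.264 quality setting
}

// Build the argument list for `npx remotion render ...`.
// Keeping this a pure function makes the command easy to log as a receipt.
function buildRenderArgs(req: RenderRequest): string[] {
  return [
    "remotion",
    "render",
    req.entryPoint,
    req.compositionId,
    req.output,
    `--props=${req.propsFile}`,
    "--codec=h264",
    `--crf=${req.crf}`,
  ];
}

const renderArgs = buildRenderArgs({
  entryPoint: "src/index.ts",             // assumption: typical create-video layout
  compositionId: "VoiceOverVertical",     // hypothetical composition name
  propsFile: "out/example-props.json",    // hypothetical path
  output: "out/example.mp4",              // hypothetical path
  crf: 18,
});

console.log(["npx", ...renderArgs].join(" "));
```

The resulting array can be handed to a process spawner such as Node's child_process.spawn with "npx" as the command, which avoids shell-quoting problems with paths that contain spaces.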
Why doesn't the repo include a working Remotion project?
Pinned dependencies rot fast. A starter that bundles node_modules/ and a frozen package.json goes stale within months. The repo documents the composition shapes and the render commands; the operator stands up Remotion fresh against current versions. The setup step is roughly two minutes (npx create-video@latest + npm install), and you get a current Remotion install instead of a six-month-old one.
Why isn't the example MP4 committed to the repo?
A 30-second vertical at h.264 CRF 18 runs roughly 8-15MB. A year of weekly renders would balloon a fork's .git directory past comfortable clone size. The repo carries render *receipts* — the props JSON, the render command, the verification table — and gitignores *.mp4. To see the example output, either reproduce the render locally (the receipt has everything you need) or watch the embedded video on this page once the operator wires up the <video> tag with a public URL.
What does the worker refuse to do?
Five named refusal gates in rules.md, each with verbatim refusal language: empty visual-system configuration, empty footage-source configuration, missing brief preconditions, voice-clone or deepfake of a named third party without consent confirmation, and high-harm domains without explicit operator authorization. When a gate fires, the worker writes a question file at briefs/questions/<slug>-question.md and stops. It does not "do its best."
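The gate behaviour described above can be pictured as a check that runs before any API call. The TypeScript sketch below covers three of the five gates; the state fields, the "TODO" placeholder marker, and the refusal wording are all invented for illustration, since the real gates and their verbatim language live in rules.md.

```typescript
type VoiceSource = "tts" | "recorded" | "voice-clone";

// Hypothetical snapshot of what the worker knows at dispatch time.
interface DispatchState {
  slug: string;
  visualSystemText: string;  // contents of reference/visual-system.md
  footageSourceText: string; // contents of reference/footage-sources.md
  voiceSource: VoiceSource;
  namedThirdParty: boolean;
  consentConfirmed: boolean;
}

interface GateResult {
  ok: boolean;
  reason?: string;
  questionFile?: string;
}

const PLACEHOLDER_MARKER = "TODO"; // assumption: how placeholder text is detected

function isUnconfigured(text: string): boolean {
  return text.trim() === "" || text.includes(PLACEHOLDER_MARKER);
}

// Return the first gate that fires, or ok when none do.
function checkGates(s: DispatchState): GateResult {
  const refuse = (reason: string): GateResult => ({
    ok: false,
    reason,
    questionFile: `briefs/questions/${s.slug}-question.md`,
  });
  if (isUnconfigured(s.visualSystemText)) return refuse("visual system is not configured");
  if (isUnconfigured(s.footageSourceText)) return refuse("footage source is not configured");
  if (s.voiceSource === "voice-clone" && s.namedThirdParty && !s.consentConfirmed) {
    return refuse("voice-clone of a named third party needs consent confirmation");
  }
  return { ok: true };
}

const gateDemo = checkGates({
  slug: "example",
  visualSystemText: "palette: navy and cream",
  footageSourceText: "pexels",
  voiceSource: "voice-clone",
  namedThirdParty: true,
  consentConfirmed: false,
});

console.log(gateDemo.ok, gateDemo.questionFile);
```

Returning the question-file path alongside the refusal keeps the "write the question and stop" step in one place instead of scattering it across each gate.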
Why seven brief sections instead of the content sibling's six?
Video has visual direction as a first-class concern that prose doesn't. The animation brief separates "What to produce" (aspect, duration, format) from "Visual direction" (per-section footage cues, title-card spec, caption style) — two related but distinct contracts. Forcing them into one section either makes the section sprawl or invites the worker to guess what was meant. Seven sections is the smallest contract that closes the guessing space for video. Same principle as the content sibling's six; one section bigger because video carries one more dimension.
How much does a render actually cost?
For the worked example (30-second vertical, 3 Pexels clips, ~75 words of TTS via tts-1-hd): on the order of $0.01 on OpenAI TTS, 3 requests against Pexels (well within the free-tier rate limit), and 30-90 seconds of render compute on a modern laptop. Voice-clone services typically charge $0.30 per 1,000 characters generated; a 75-word script is roughly 450 characters, so the same script via voice-clone costs roughly $0.14. The brief-as-contract pattern matters here precisely because a wrong render costs all of the above twice; the contract gets you to the right render on the first dispatch.
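The per-character arithmetic is easy to rerun for your own script length. This TypeScript sketch uses the $0.30 per 1,000 characters figure quoted above and assumes about six characters per word (five letters plus a space); both numbers are inputs you should replace with your provider's current pricing and your script's actual character count.

```typescript
// Assumption: average English word plus trailing space.
const CHARS_PER_WORD = 6;

function estimateCharacters(words: number): number {
  return words * CHARS_PER_WORD;
}

// Cost of generating `characters` at a per-1,000-character rate.
function costUsd(characters: number, usdPer1000Chars: number): number {
  return (characters / 1000) * usdPer1000Chars;
}

const scriptChars = estimateCharacters(75);          // 450 characters
const voiceCloneCost = costUsd(scriptChars, 0.30);   // about 0.135 USD

console.log(scriptChars, voiceCloneCost.toFixed(3));
```

Because the estimate scales linearly, a second render of the same script doubles the voice charge, which is the cost the brief-as-contract pattern is meant to avoid.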
Is this affiliated with Anthropic, Claude, or Remotion?
No. This is an independent project demonstrating one way to structure a Claude-based voice-to-video worker on top of Remotion. Claude is the model the worker runs on; Remotion is the render engine; the architecture and the opinions are this repo's.
07 The series
Three repos. One architecture. Different domains.
This repo is the third in a three-repo operator-stack series, each demonstrating the same architecture in a different domain. The three pair naturally — content writes the script, design produces the title-card still, animation renders the video.
The architecture is documented in Article 1 of the operator-AI series: I burned 800,000 tokens on one daily routine. Here's the architecture that killed it. Read the article for the receipts and the four mistakes that produced the number; read this repo for the code that runs the fix.