Case Study

End-to-End Video Production the Agent Runs Itself

How we connected an autonomous execution agent to Remotion, Gemini, Flux, Runway, ElevenLabs, and Suno — so client video content goes from brief to rendered file with one human handoff checkpoint.

Autonomous video production pipeline
1
Human Handoff Point per Video
6
AI APIs Orchestrated
MOCK_MODE
Safe Testing Without API Costs
Live
Current Status

The Problem

Producing client video content required manually stitching Remotion (programmatic video), image generation APIs, voice synthesis, and music generation for every project. Each pipeline was built from scratch, and every API had its own auth, rate limits, and output format to manage.

The creative work — art direction, script, pacing — was getting crowded out by infrastructure work. The ratio of time spent on creative decisions to time spent on API plumbing was badly skewed toward the plumbing.

What We Built

We connected an autonomous execution agent to a unified video pipeline: Remotion for programmatic composition, Gemini and Flux/Replicate for image generation, Runway for motion, ElevenLabs for voice synthesis, and Suno for music. The agent orchestrates all six based on a brief and a pre-made decision fallback library for the choices it can make without human input.
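The fallback library can be sketched as a simple lookup: for each creative choice the agent may face, a pre-made safe default it applies when the brief is silent, so it never pauses for human input mid-run. All keys and values below are illustrative assumptions, not the real library.

```typescript
// Hypothetical decision fallback library. Keys and defaults are
// illustrative; the real library's choices are not published here.
type DecisionKey = "aspectRatio" | "voiceStyle" | "musicMood" | "pacing";

const decisionFallbacks: Record<DecisionKey, string> = {
  aspectRatio: "16:9",     // default framing when the brief is silent
  voiceStyle: "narration", // neutral read for the voice step
  musicMood: "ambient",    // low-risk music prompt
  pacing: "medium",
};

// The agent consults the brief first, then falls back to the library,
// so every non-critical choice resolves without a human in the loop.
function decide(
  key: DecisionKey,
  brief: Partial<Record<DecisionKey, string>>,
): string {
  return brief[key] ?? decisionFallbacks[key];
}
```

The design choice here is that fallbacks are data, not code paths: adding a new non-critical decision means adding one entry, not a new branch in the agent.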

A MOCK_MODE flag lets the pipeline run dry-run tests without burning API credits. A Cowork Execution Protocol defines exactly one moment where human review adds value — the final output check before delivery — and keeps the agent autonomous for everything before that.
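A minimal sketch of the MOCK_MODE idea, assuming an environment flag and a generic call wrapper — the names (`isMockMode`, `callApi`, the stub paths) are illustrative, not the pipeline's real API surface:

```typescript
// When MOCK_MODE is set, every API step returns a canned response
// instead of making a billable request.
function isMockMode(): boolean {
  return process.env.MOCK_MODE === "1";
}

interface ApiCall<T> {
  live: () => Promise<T>; // real request that costs credits
  mock: T;                // canned response for dry runs
}

async function callApi<T>(call: ApiCall<T>): Promise<T> {
  if (isMockMode()) return call.mock; // no network, no spend
  return call.live();
}

// Example step: voice synthesis returns a stub file path in mock mode.
// The "live" branch here is a stand-in, not a real ElevenLabs call.
async function synthesizeVoice(): Promise<string> {
  return callApi({
    live: async () => "/renders/voice-live.mp3",
    mock: "/renders/voice-mock.mp3",
  });
}
```

Because every step goes through the same wrapper, one flag switches the entire pipeline between dry-run and live, which is what makes end-to-end testing free.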

Producing client video used to mean stitching four APIs by hand for every project. Now the agent runs the pipeline and I review the output once.

The System Architecture

  • Autonomous agent orchestrating six APIs: Remotion (video composition), Gemini + Flux/Replicate (image generation), Runway (motion), ElevenLabs (voice), Suno (music)
  • Pre-made decision fallback library for non-critical choices
  • MOCK_MODE flag for dry-run testing
  • Cowork Execution Protocol with a single human handoff checkpoint
  • MCP-on-Edge-Functions pattern for the API surface
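The architecture above implies a staged run where every stage is autonomous except the last. A sketch of that ordering, with assumed stage names (the real protocol's stages are not published here):

```typescript
// Illustrative pipeline stages in run order. Only the final review
// requires a human — the single handoff checkpoint per video.
type Stage = { name: string; human: boolean };

const pipeline: Stage[] = [
  { name: "generate-images", human: false },  // Gemini + Flux/Replicate
  { name: "animate", human: false },          // Runway
  { name: "synthesize-voice", human: false }, // ElevenLabs
  { name: "compose-music", human: false },    // Suno
  { name: "render", human: false },           // Remotion composition
  { name: "final-review", human: true },      // output check before delivery
];

// The protocol's invariant: exactly one human checkpoint per video.
const humanCheckpoints = pipeline.filter((s) => s.human).length;
```

Encoding the checkpoint as data on the stage list makes the "one handoff" guarantee checkable: a test can assert the count is exactly one before any run starts.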

The Results

End-to-end video generation the agent can run independently. A project that used to require 4–6 hours of manual API work now runs overnight and surfaces a review-ready cut in the morning.

The MOCK_MODE pattern and Execution Protocol have become a template for other autonomous pipelines at Automaton — any multi-API workflow can be built with the same safety rails.

Client

Automaton (internal — creative automation infrastructure)

Engagement

Internal Build
Initial build: Ongoing

Stack

  • Remotion
  • Gemini
  • Flux / Replicate
  • Runway
  • ElevenLabs
  • Suno
  • MCP Protocol

Services

  • Creative Automation
  • API Integration
  • Pipeline Architecture
  • Autonomous Agent Design

Your Turn
Similar problem?
Every system we build starts with understanding what's broken.
Book a call →