Motion and Video

Why Agents Need Programmatic Video

Traditional video tools --- Premiere, After Effects, Final Cut --- require GUI interaction. Agents can't click timeline scrubbers. They can't drag keyframes. They can't adjust bezier curves on easing functions with a mouse. Programmatic video frameworks solve this by treating video as code: versionable, diffable, reproducible.

The shift mirrors what happened with design. Design moved from GUI-only tools (Photoshop) to code-friendly formats (CSS, HTML, design tokens). Video is making the same transition. The timing matters because agents can now produce video artifacts the same way they produce code and design --- by writing text that compiles to frames.

Three use cases drive agent-driven video production in practice. First, product marketing: a startup needs a 60-second product intro, and the designer writes it once with code rather than editing frames manually. Second, personalized video at scale: a SaaS company generates thousands of onboarding videos, each with the customer's name and usage data. Third, design documentation: a team produces animated walkthroughs of UI flows, generated directly from the design system tokens covered in Chapter 08.

Two approaches have emerged. Remotion uses React components as the authoring format. Hyperframes uses HTML + GSAP. The philosophical difference sounds minor. In practice, it shapes everything: how you author, how you preview, how you debug, how you render, and what your license allows.

Remotion: React-Based Video Production

Remotion is the established player. It was created by Jonny Burger (Remotion AG) and, as of May 2026, has 47.1k GitHub stars, 3.3k forks, and 622+ releases. The current version is v4.0.462. The codebase is TypeScript-first (74.5%), with PHP for a Lambda client SDK and Rust for native components. The ecosystem is substantial: 3M npm installs, 800+ pages of documentation, 35+ templates, 8000+ Discord members, and 300+ contributors. Its users include GitHub, Musixmatch, Wistia, and SoundCloud.

Figure: Remotion Studio showing a video composition with a preview player, a composition list panel, timeline controls with frame-by-frame scrubbing, and render parameter settings.

The mental model: your video is a React component tree. Each frame is a render of that tree at a specific point in time. The framework handles the mapping between time and component state. If you know React, you know most of Remotion already.

Core primitives:

Primitive	Purpose	Key Props
`<Composition>`	Registers a renderable video	`id`, `fps`, `durationInFrames`, `width`, `height`
`<Sequence>`	Time-shifts child components	`from`, `durationInFrames`
`<Series>`	Plays sequences back-to-back	Auto-calculated timing
`spring()`	Physics-based animation	`mass`, `damping`, `stiffness`
`interpolate()`	Maps frame number to value range	`inputRange`, `outputRange`, `extrapolateLeft/Right`
`useCurrentFrame()`	Hook for current frame number	None
`useVideoConfig()`	Hook for video metadata	Returns `fps`, `width`, `height`, `durationInFrames`

// src/Root.tsx — register compositions
import React from 'react';
import { Composition } from 'remotion';
import { ProductIntro } from './ProductIntro';

export const RemotionRoot: React.FC = () => {
  return (
    <Composition
      id="ProductIntro"
      durationInFrames={150}
      fps={30}
      width={1920}
      height={1080}
      component={ProductIntro}
    />
  );
};

// Frame-dependent rendering
import { AbsoluteFill, useCurrentFrame, interpolate } from 'remotion';

export const ProductIntro = () => {
  const frame = useCurrentFrame();

  const opacity = interpolate(frame, [0, 30], [0, 1], {
    extrapolateRight: 'clamp',
  });

  const scale = interpolate(frame, [0, 30], [0.8, 1], {
    extrapolateRight: 'clamp',
  });

  return (
    <AbsoluteFill style={{
      backgroundColor: 'var(--color-background)',
      justifyContent: 'center',
      alignItems: 'center',
    }}>
      <div style={{ opacity, transform: `scale(${scale})` }}>
        <h1 style={{ fontFamily: 'var(--font-display)' }}>
          Introducing the Future
        </h1>
      </div>
    </AbsoluteFill>
  );
};
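The mapping that `interpolate()` performs in the component above can be sketched in plain TypeScript. This is a minimal illustration of a two-point linear map with right-side clamping, not Remotion's source; `lerpFrame` is a hypothetical name, and the real function accepts longer ranges and more options.

```typescript
// Hypothetical helper (not part of Remotion): maps a frame from an input range
// to an output range, optionally holding the end value past the range.
function lerpFrame(
  frame: number,
  [inStart, inEnd]: [number, number],
  [outStart, outEnd]: [number, number],
  clampRight = true,
): number {
  const progress = (frame - inStart) / (inEnd - inStart);
  const bounded = clampRight ? Math.min(progress, 1) : progress;
  return outStart + bounded * (outEnd - outStart);
}

// Halfway through the 30-frame fade, opacity is halfway between 0 and 1
const opacityAt15 = lerpFrame(15, [0, 30], [0, 1]);

// Past frame 30, clamping holds the scale at its end value
const scaleAt60 = lerpFrame(60, [0, 30], [0.8, 1]);

// Without clamping, the value keeps growing along the same line
const unclampedScaleAt60 = lerpFrame(60, [0, 30], [0.8, 1], false);
```

The unclamped case is why the sample passes `extrapolateRight: 'clamp'`: without it, the title would keep scaling up for the rest of the composition.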

// Sequences for timing control
// LogoReveal, FeatureShowcase, and CallToAction are assumed to be components defined elsewhere
import { Sequence } from 'remotion';

const ProductTrailer = () => {
  return (
    <>
      <Sequence durationInFrames={30}>
        <LogoReveal />
      </Sequence>
      <Sequence from={30} durationInFrames={60}>
        <FeatureShowcase />
      </Sequence>
      <Sequence from={90} durationInFrames={60}>
        <CallToAction />
      </Sequence>
    </>
  );
};

The spring() function deserves a closer look. Unlike CSS transitions with fixed durations, spring animations are physics-based. You set mass, damping, and stiffness, and the framework calculates the animation curve. This produces motion that feels natural --- objects accelerate, overshoot slightly, and settle.

import { spring, useCurrentFrame, useVideoConfig } from 'remotion';

const AnimatedTitle = () => {
  const frame = useCurrentFrame();
  const { fps } = useVideoConfig();

  const scale = spring({
    frame,
    fps,
    config: {
      stiffness: 100,
      damping: 10,
      mass: 1,
    },
  });

  return (
    <div style={{
      transform: `scale(${scale})`,
      fontFamily: 'var(--font-display)',
    }}>
      Hello World
    </div>
  );
};
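To make the "accelerate, overshoot, settle" behavior concrete, here is a minimal damped-spring simulation in plain TypeScript. It is a sketch of the underlying physics using a simple per-frame integration step, not Remotion's actual `spring()` implementation; `simulateSpring` and `SpringSketchConfig` are hypothetical names.

```typescript
interface SpringSketchConfig {
  stiffness: number;
  damping: number;
  mass: number;
}

// Simulates a spring moving from 0 toward 1 and returns the value at each frame.
// Assumption: one integration step per frame (semi-implicit Euler).
function simulateSpring(config: SpringSketchConfig, fps: number, frames: number): number[] {
  const dt = 1 / fps;
  let position = 0;
  let velocity = 0;
  const values: number[] = [];
  for (let i = 0; i < frames; i++) {
    values.push(position);
    // Restoring force pulls toward the target (1); damping opposes the velocity
    const force = -config.stiffness * (position - 1) - config.damping * velocity;
    velocity += (force / config.mass) * dt;
    position += velocity * dt;
  }
  return values;
}

// Same parameters as the AnimatedTitle example above
const curve = simulateSpring({ stiffness: 100, damping: 10, mass: 1 }, 30, 150);
const peak = Math.max(...curve); // exceeds 1: the overshoot
const settled = curve[curve.length - 1]; // ends very close to 1
```

With these parameters the spring is underdamped, so the value rises past 1 before settling. Raising `damping` toward 20 removes the overshoot in this model.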

Sequences can nest. A sequence inside another sequence is offset by the parent's from value. This composition pattern maps naturally to how real videos are structured: acts contain scenes, scenes contain shots, shots contain elements.
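The offset rule for nested sequences can be modeled as a small tree walk. This is a sketch of the arithmetic only, under the assumption that each `from` is relative to its parent; `SequenceNode` and `absoluteStarts` are hypothetical names, not Remotion APIs.

```typescript
// Hypothetical model of nested <Sequence> offsets
interface SequenceNode {
  label: string;
  from: number; // offset relative to the parent sequence, in frames
  children?: SequenceNode[];
}

// Flattens the tree into absolute start frames on the composition timeline
function absoluteStarts(node: SequenceNode, parentStart = 0): Record<string, number> {
  const start = parentStart + node.from;
  const result: Record<string, number> = { [node.label]: start };
  for (const child of node.children ?? []) {
    Object.assign(result, absoluteStarts(child, start));
  }
  return result;
}

// An act starting at frame 30 contains a scene at +60, which contains a shot at +10
const act: SequenceNode = {
  label: 'act',
  from: 30,
  children: [{ label: 'scene', from: 60, children: [{ label: 'shot', from: 10 }] }],
};

const starts = absoluteStarts(act); // act: 30, scene: 90, shot: 100
```

Inside each sequence, `useCurrentFrame()` returns the frame relative to that sequence's own start, so a shot component can animate from frame 0 regardless of where its parents place it.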

Remotion Studio provides a browser-based preview with timeline scrubbing, a props editor, and a render button. You can deploy Studio to the cloud for non-technical team members to preview and render videos without touching code.

The rendering targets are extensive:

$ npx remotion render ProductIntro                      # local MP4
$ npx remotion render ProductIntro --sequence           # image sequence
$ npx remotion render ProductIntro out/custom.mp4       # custom output path (positional argument)
$ npx remotion render ProductIntro --codec prores       # ProRes for editing
$ npx remotion render ProductIntro --frames=0-29        # render specific range

Beyond local CLI rendering, Remotion supports a Node.js SSR API (renderMedia()), AWS Lambda for distributed rendering, GitHub Actions for CI/CD, and Google Cloud Run (alpha as of May 2026). The Lambda integration is mature and used in production for large-scale rendering. The SSR API lets you render from any Node.js process, which is particularly useful for agent-driven pipelines.

Setting Up a Remotion Project with Agent Assistance

Quick start:

$ npx create-video@latest

This scaffolds a Remotion project with a src/Root.tsx that registers compositions. Each composition is a React component that receives props and renders frames.

Remotion has first-class agent support. The repository includes .claude/ and .cursor/ directories with CLAUDE.md and AGENTS.md files. This means Claude Code and Cursor understand Remotion's conventions out of the box.

my-video/
├── src/
│   ├── Root.tsx          # registers all compositions
│   ├── ProductIntro.tsx  # composition component
│   └── lib/
│       └── animations.ts # shared animation helpers
├── public/
│   └── assets/           # images, videos, fonts
├── package.json
└── remotion.config.ts

Agents can scaffold the project, write composition components, configure rendering, and even set up Lambda deployment. The Zod schema support on <Composition> enables visual editing of props in Remotion Studio, which agents can configure:

import { Composition } from 'remotion';
import { z } from 'zod';
import { ProductIntro } from './ProductIntro';

const productSchema = z.object({
  title: z.string(),
  subtitle: z.string(),
  accentColor: z.string(),
  logoUrl: z.string(),
});

export const RemotionRoot = () => (
  <Composition
    id="ProductIntro"
    component={ProductIntro}
    schema={productSchema}
    defaultProps={{
      title: 'Product Name',
      subtitle: 'Tagline goes here',
      accentColor: '#E94560',
      logoUrl: '/assets/logo.svg',
    }}
    durationInFrames={150}
    fps={30}
    width={1920}
    height={1080}
  />
);

The agent workflow: you describe the video, the agent writes the composition components, you preview in Remotion Studio, you iterate, then you render. The tight integration with design tokens (from Chapter 08) means your video can share the same var(--color-primary) variables as your web UI.

Remotion also provides a Player component for embedding video previews in web applications and a Recorder for screen recording. The Editor Starter is a commercial template for building custom video editing applications on top of Remotion's rendering engine.

Hyperframes: HTML-Native Video Built for Agents

Hyperframes takes a fundamentally different approach. It was created by HeyGen, has 19k GitHub stars and 1.8k forks, and is released under the Apache 2.0 license. The tagline is direct: "Write HTML. Render video. Built for agents."

Figure: Hyperframes timeline editor with GSAP animation keyframes on multiple layers, a properties panel for the selected keyframe, and a real-time preview of the animated sequence.

Compositions are plain HTML files with data attributes. No React. No build step. The HTML file is both the render layer and the editable source of truth. This is the same principle that powers Huashu Design (Chapter 07) for prototypes and slide decks --- HTML as the native artifact, not an export target.

The key insight from HeyGen's internal evaluation: LLMs produce more creative output writing HTML+GSAP than React compositions. HTML is lower-friction for language models. No imports to manage. No component lifecycle to reason about. No JSX syntax to get right. The agent writes declarative HTML, adds animation attributes, and the framework handles the rest.

Attribute	Purpose
`data-composition-id`	Identifies the root element
`data-start`	Start time in seconds
`data-duration`	Duration in seconds
`data-track-index`	Layer ordering (higher = on top)
`data-width` / `data-height`	Composition dimensions
`data-volume`	Audio volume (0-1)

<div id="root"
     data-composition-id="product-intro"
     data-start="0"
     data-width="1920"
     data-height="1080">

  <video id="clip-1"
         data-start="0"
         data-duration="5"
         data-track-index="0"
         src="intro.mp4"
         muted playsinline></video>

  <h1 id="title"
      class="clip"
      data-start="1"
      data-duration="4"
      data-track-index="1"
      style="font-size: 72px; color: white;">
    Welcome to Hyperframes
  </h1>

  <audio id="bg-music"
         data-start="0"
         data-duration="5"
         data-track-index="2"
         data-volume="0.5"
         src="music.wav"></audio>
</div>

You write HTML, open it in a browser to preview, and render it to MP4. The data attributes tell the renderer when each element appears, for how long, and in what layer. The track index system works like a video editor's timeline: elements on higher tracks overlay elements on lower tracks.
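The timing and layering attributes can be read as a simple data model. The sketch below is a plain TypeScript illustration of how `data-start`, `data-duration`, and `data-track-index` decide what is active at a given time. It is not Hyperframes' renderer: `TimedClip` and `activeClipsAt` are hypothetical names, and treating the end of a clip as exclusive is an assumption.

```typescript
interface TimedClip {
  id: string;
  start: number; // data-start, in seconds
  duration: number; // data-duration, in seconds
  trackIndex: number; // data-track-index, higher = on top
}

// Returns the ids of clips active at `time`, ordered from bottom track to top track.
// Assumption: a clip is active on [start, start + duration).
function activeClipsAt(clips: TimedClip[], time: number): string[] {
  return clips
    .filter((c) => time >= c.start && time < c.start + c.duration)
    .sort((a, b) => a.trackIndex - b.trackIndex)
    .map((c) => c.id);
}

// Mirrors the three elements in the composition above
const sampleClips: TimedClip[] = [
  { id: 'clip-1', start: 0, duration: 5, trackIndex: 0 },
  { id: 'title', start: 1, duration: 4, trackIndex: 1 },
  { id: 'bg-music', start: 0, duration: 5, trackIndex: 2 },
];

const beforeTitle = activeClipsAt(sampleClips, 0.5); // title has not started yet
const withTitle = activeClipsAt(sampleClips, 2); // title overlays the video
```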

The CLI is deliberately non-interactive by default:

$ npx hyperframes init my-video
$ cd my-video
$ npx hyperframes preview         # live preview with hot reload
$ npx hyperframes render          # render to MP4
$ npx hyperframes render --output demo.mp4
$ npx hyperframes add flash-through-white   # add catalog block
$ npx hyperframes lint            # lint composition
$ npx hyperframes doctor          # diagnose issues

The --human-friendly flag enables interactive mode, but the default is flag-driven. This is a deliberate design choice: agents work better with non-interactive CLIs. Every command accepts flags and exits cleanly.

Hyperframes ships with a catalog of 50+ ready-to-use blocks. These are pre-built composition snippets for common patterns: flash-through-white, instagram-follow, title cards, lower thirds, transitions. You add them to your project with npx hyperframes add <block-name> and customize the data attributes. For agents, this means the agent can compose complex videos from building blocks rather than writing every element from scratch.

The Frame Adapter pattern is where Hyperframes' architecture really matters. It supports GSAP, Lottie, CSS animations, Three.js, Anime.js, and Web Animations API. For each adapter, Hyperframes handles deterministic seeking --- it pauses the animation library and scrubs to frame / fps before each capture. This eliminates the wall-clock dependency that plagues other renderers.
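The seek-then-capture loop described above can be sketched as follows. This is an illustration of the idea only: `SeekableTimeline`, `makeLinearTimeline`, and `captureFrames` are hypothetical names, not the Hyperframes adapter API, and the "capture" here just records a number instead of taking a screenshot.

```typescript
// Hypothetical interface: stands in for any animation library that can be paused and scrubbed
interface SeekableTimeline {
  pause(): void;
  seekTo(seconds: number): void;
  currentValue(): number;
}

// A toy timeline: a value that moves linearly from 0 to 100 over `duration` seconds.
// It never advances on its own, so pause() has nothing to do.
function makeLinearTimeline(duration: number): SeekableTimeline {
  let time = 0;
  return {
    pause: () => {},
    seekTo: (seconds) => {
      time = Math.min(Math.max(seconds, 0), duration);
    },
    currentValue: () => (time / duration) * 100,
  };
}

// Pauses the timeline, then for each frame seeks to frame / fps and captures the state
function captureFrames(timeline: SeekableTimeline, fps: number, totalFrames: number): number[] {
  timeline.pause();
  const captured: number[] = [];
  for (let frame = 0; frame < totalFrames; frame++) {
    timeline.seekTo(frame / fps);
    captured.push(timeline.currentValue());
  }
  return captured;
}

// Two independent passes produce identical frames because nothing depends on the clock
const firstPass = captureFrames(makeLinearTimeline(2), 30, 60);
const secondPass = captureFrames(makeLinearTimeline(2), 30, 60);
```

Because each frame's state is a pure function of `frame / fps`, a slow machine and a fast machine capture the same values, and a failed render can resume from any frame.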

My take: Hyperframes' seek-based rendering is technically superior to Remotion's wall-clock approach for animation libraries. GSAP in Remotion races through timelines at wall-clock speed during render, producing mostly-empty frames after the initial frames. Hyperframes pauses GSAP, seeks to the exact time, then captures. If your video uses GSAP heavily, this alone is reason to choose Hyperframes.

Setting Up Hyperframes with AI Agents

The recommended path is skill-based:

$ npx skills add heygen-com/hyperframes

This installs 13 skills that teach the agent framework-specific patterns. Skills register as slash commands in Claude Code: /hyperframes, /hyperframes-cli, /hyperframes-media, /gsap, /lottie, /threejs, and more.

Requirements: Node.js >= 22 and FFmpeg.

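Hyperframes also provides dedicated plugins for other agent hosts, including Codex and Cursor. The commands below install the Codex plugin and load the Claude Code plugin from the current directory: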

# Codex plugin
$ codex plugin marketplace add heygen-com/hyperframes \
    --sparse .codex-plugin --sparse skills --sparse assets

# Claude Code plugin
$ claude --plugin-dir .

Example agent prompts that work well:

"Create a 10-second product intro with a fade-in title."
"Turn this CSV into an animated bar chart race."
"Build a 60-second explainer video with 4 scenes."
"Add background music with ducking when narration plays."

The hyperframes init command installs skills automatically, so you can hand a project to an agent at any point in its lifecycle. Media preprocessing skills handle TTS (via Kokoro), transcription (via Whisper), and background removal (via u2net). The TTS integration is particularly useful for narrated videos --- the agent generates voiceover, measures the duration, and synchronizes visuals to the audio timeline automatically.

There's also a remotion-to-hyperframes skill for migrating React compositions to HTML. Useful if you start with Remotion and want to switch. The migration handles data attribute mapping, sequence-to-track conversion, and GSAP timeline adaptation.

Hyperframes supports two capture modes. BeginFrame mode runs on Linux and is fully deterministic --- no wall-clock dependency. Screenshot mode runs on macOS and Windows, falling back automatically when BeginFrame isn't available. For production CI/CD pipelines, Linux with BeginFrame mode is the recommended setup.

frame.md: Your Design System, Ready for Video

Every brand has a design spec. Colors, typography, spacing, composition rules. These specs are written for the web --- for screens where users scroll, resize, and interact. They are not written for a camera. frame.md bridges this gap. It translates a web-oriented design spec into a video-ready specification that an agent can use to compose branded video without guessing at scale, timing, or motion.

The pipeline:

Figure: The frame.md pipeline, in which a brand design spec is translated into video-ready composition parameters.

You start with a DESIGN.md, the same format used by Open Design (covered in Chapter 06). frame.md reads your colors, type scale, spacing tokens, and composition rules, then rewrites them for the 16:9 frame. The output is a DESIGN.md superset: all your original design tokens plus video-specific parameters like timing, transitions, track ordering, and camera moves. The agent reads this superset and produces Hyperframes HTML that inherits your brand identity automatically.

The tool lives at hyperframes.dev/design. Paste your design spec, and it returns a frame.md you drop into your project. For teams already using Open Design's DESIGN.md format, this creates a direct path: one spec drives both static output and video output without duplication.

Design Templates

If you don't have an existing design spec, frame.md ships with 10 templates. Each template maps a visual identity to video-appropriate defaults. Eight of them are shown here:

Template	Visual Character	Motion Style	Best For
Biennale Yellow	Warm parchment, solar yellow bloom, Instrument Serif	Slow fades, generous pauses	Art, culture, editorial
BlockFrame	Thick black borders, hard offset shadows, candy accents	Hard cuts, snap transitions	Startups, tech launches
Blue Professional	Cobalt primary, Space Grotesk display, Inter body	Clean wipes, measured pace	Enterprise, B2B, SaaS
Bold Poster	Shrikhand tilted display, red accent on cream	Ken Burns zooms, kinetic type	Marketing campaigns
Capsule	Pill-shaped editorial, cream paper, Bodoni Moda serif	Float animations, soft reveals	Lifestyle, fashion, food
Cartesian	Minimal sparse, warm parchment, hairline rules	Restrained, data-driven motion	Analytics, dashboards
Coral	Bebas Neue uppercase, coral on cream, Inter reading	Bold entrances, sweep reveals	Product announcements
Creative Mode	Cream + saturated candy, Archivo Black, JetBrains Mono	Character stagger, chart fills	Developer tools, data viz

Each template comes with fine-tuning controls for palette and typography. Download the frame pack and drop it into your Hyperframes project. The agent reads the template's DESIGN.md and produces video that matches the template's visual identity.

Skeleton Templates for Common Video Types

frame.md also provides skeleton templates that define scene structure, timing, and transition placement for common video formats:

Skeleton	Format	Duration	Scenes	Transitions
Social Reel	1080x1920 (portrait)	15s	6	1 shader at hero reveal, rest hard cuts
Launch Teaser	1920x1080 (landscape)	25s	8	2-3 shaders at key moments
Product Explainer	1920x1080	45s	12	Mixed durations, varied transitions
Cinematic Title	1920x1080	60s	7	Long holds, restrained shaders

These skeletons give the agent a structural starting point. You tell it "use the Launch Teaser skeleton with the BlockFrame template" and the agent produces an 8-scene, 25-second video with thick borders, hard shadows, and snap transitions at the right moments.

Animation Patterns

frame.md defines reusable animation patterns that map to specific use cases. These are copy-paste ready GSAP snippets the agent can drop into any composition:

Pattern	What it does	Use case
Counter animation	Stats animate from 0 to target	Metrics, KPIs, growth numbers
SVG stroke draw	Lines and paths draw themselves	Diagrams, flowcharts, data lines
Character stagger	Letters enter one by one	Headlines, logos, titles
Breathing float	Subtle vertical drift	Logos, icons, floating elements
Bar chart fill	Bars grow from bottom sequentially	Data comparisons, benchmarking
Highlight sweep	Accent underline sweeps across text	Feature callouts, emphasis
Ken Burns	Slow zoom from scale 1 to 1.03	Background images, hero shots

For transitions between scenes, 14 shader effects are available, organized by energy level. The table lists twelve of them:

Energy	Shaders	When to use
Calm	`cross-warp-morph`, `light-leak`, `domain-warp`	Editorial, culture, storytelling
Professional	`cinematic-zoom`, `whip-pan`, `sdf-iris`	Enterprise, product demos, B2B
Aggressive	`glitch`, `chromatic-split`, `ridged-burn`	Startups, launches, hype videos
Ethereal	`gravitational-lens`, `ripple-waves`, `swirl-vortex`	Brand films, ambient, mood

Example: A Product Launch Video with frame.md

A practical walkthrough. You have a SaaS product launching next month. The brand uses the Blue Professional template (cobalt primary, Space Grotesk display, clean and corporate). You want a 25-second launch teaser.

Step 1: Start the project with the template.

npx hyperframes init product-launch --template blue-professional
cd product-launch
npx hyperframes preview

Step 2: Tell the agent what to build.

Create a 25-second product launch teaser using the Blue Professional
template and the Launch Teaser skeleton (8 scenes).

Scenes:
1. Logo fades in on cobalt background (3s)
2. Tagline animates with character stagger (3s)
3. Feature 1: "Real-time collaboration" with icon animation (3s)
4. Feature 2: "AI-powered suggestions" with counter animation (3s)
5. Feature 3: "One-click deploy" with SVG stroke draw (3s)
6. Social proof: "10,000 teams" with bar chart fill (3s)
7. CTA: "Try free today" with highlight sweep (4s)
8. Logo + URL: breathing float on cobalt (3s)

Transitions: cinematic-zoom at scene 3 and scene 7.
All other transitions: hard cut.
Music: subtle, professional, ducks during text reveals.

Step 3: The agent produces the HTML composition.

<div id="stage" data-composition-id="product-launch"
     data-start="0" data-width="1920" data-height="1080" data-duration="25">

  <!-- Scene 1: Logo -->
  <div class="clip" id="s1" data-start="0" data-duration="3" data-track-index="0">
    <div class="scene-content" style="background: var(--cobalt);">
      <img id="logo" src="logo.svg" />
    </div>
  </div>

  <!-- Scene 2: Tagline -->
  <div class="clip" id="s2" data-start="3" data-duration="3" data-track-index="0">
    <div class="scene-content">
      <h1 class="display" style="font-family: 'Space Grotesk';">
        Build faster. Ship smarter.
      </h1>
    </div>
  </div>

  <!-- Scene 3: Feature 1 -->
  <div class="clip" id="s3" data-start="6" data-duration="3" data-track-index="0"
       data-transition="cinematic-zoom" data-transition-duration="0.5">
    <div class="scene-content">
      <img src="collab-icon.svg" />
      <h2>Real-time collaboration</h2>
    </div>
  </div>

  <!-- ... scenes 4-7 follow same pattern ... -->

  <!-- Scene 8: Logo + URL -->
  <div class="clip" id="s8" data-start="22" data-duration="3" data-track-index="0">
    <div class="scene-content" style="background: var(--cobalt);">
      <img id="logo-end" src="logo.svg" />
      <p class="url">example.com</p>
    </div>
  </div>

  <audio data-start="0" data-duration="25" data-track-index="2"
         data-volume="0.3" src="bg-music.wav"></audio>
</div>
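The `data-start` values in the composition follow directly from the scene durations in the prompt. A minimal sketch of that arithmetic, with `sceneStarts` as a hypothetical helper name:

```typescript
// Scene durations in seconds, taken from the prompt in Step 2
const sceneDurations = [3, 3, 3, 3, 3, 3, 4, 3];

// Each scene starts where the previous one ends; the final cursor is the total length
function sceneStarts(durations: number[]): { starts: number[]; total: number } {
  const starts: number[] = [];
  let cursor = 0;
  for (const duration of durations) {
    starts.push(cursor);
    cursor += duration;
  }
  return { starts, total: cursor };
}

const teaserTiming = sceneStarts(sceneDurations);
// starts: [0, 3, 6, 9, 12, 15, 18, 22], total: 25
```

Scene 8 starts at 22 rather than 21 because scene 7 runs four seconds. A check like this is a cheap way to confirm that an agent's output matches the requested 25-second total before rendering.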

Step 4: Review, refine, render.

npx hyperframes preview    # watch it in the browser
npx hyperframes lint       # check for timing issues
npx hyperframes render     # produce the MP4

The agent applied the Blue Professional template's typography (Space Grotesk display, Inter body) and colors (cobalt primary, cream background) automatically. The character stagger on the tagline, the bar chart fill on "10,000 teams," and the cinematic-zoom transitions at scenes 3 and 7 all came from the frame.md animation patterns. The agent didn't invent the visual language. It composed from the template's vocabulary.

When to Use frame.md

frame.md solves the cold-start problem for agent-generated video. Without it, the agent has to make every visual decision from scratch: what colors to use, what typeface, how fast to animate, when to transition. With frame.md, the brand decisions are already made. The agent's job is composition, not art direction.

This matters for three scenarios:

Scenario 1: A team with an existing design system. You maintain a DESIGN.md (Open Design format) or a Figma-based design system with extracted tokens. You want to produce branded video content without hiring a motion designer. frame.md translates your existing spec into video parameters. The output inherits your brand identity without manual configuration.

Scenario 2: Rapid video iteration. You produce multiple video variants for A/B testing social ads. Each variant needs different copy and different feature callouts but the same brand identity. With frame.md, you change the copy in each composition while the visual system stays consistent. Render ten variants in one batch.

Scenario 3: Agent-driven video at scale. A content team runs 50 product videos per month across multiple brands. Each brand has its own frame.md template. The agent selects the right template based on the product's brand, composes the video from the template's animation patterns, and renders. No human touches the visual identity. Humans review the narrative and the data accuracy.

My take: frame.md is the connective tissue between this chapter and Chapter 08 (Design Systems and Tokens). Your design system now has a path to video output that does not require a separate motion design effort. The agent reads the design spec, the frame.md template translates it, and the Hyperframes engine renders the result. This is the design-as-code principle extended to video: the same tokens, the same rules, the same review process, just a different output format.

From Reference Video to Animation: Agent-Driven Motion Extraction

Motion design often starts from reference. You see an animation on stripe.com, in an Apple keynote, or in a competitor's app, and you want to replicate it. The traditional approach is to describe the motion in words, let the agent interpret those words, and iterate through several failed attempts. The agent's mental model of "smooth ease-out with a slight overshoot" never matches what you actually saw.

A better approach emerged in mid-2026: drop the reference video into an agent session, let the agent analyze the motion, and produce a structured specification you can feed into Remotion, Hyperframes, or MagicPath. The agent does the hard part --- frame extraction, timing analysis, easing identification --- and you get an accurate starting point instead of a vague description.

Figure: The video-to-animation pipeline. An agent (Codex or Claude Code) analyzes a reference video, using ffmpeg for frame extraction and an LLM for motion understanding, and produces a motion specification. The specification feeds into AnimSpec for prompt generation or MagicPath for editable design output, then renders via Remotion, Hyperframes, or MagicPath Canvas.

How the Agent Analyzes Video

The agent does not watch video the way you do. It breaks the video into frames, inspects the differences between consecutive frames, and builds a structured description of what moves, when, and how. Pietro Schirano demonstrated this workflow with Codex: drag a video into the prompt, tell Codex to "recreate these animations," and it analyzes the motion and generates implementation code (source: @skirano, retrieved 2026-06-03).

The process works in three steps:

1. Frame extraction. The agent uses ffmpeg to split the video into individual frames at the source framerate. A 10-second video at 30fps produces 300 frames. The agent can also sample at lower rates for faster analysis.
2. Motion analysis. The agent compares consecutive frames to identify what changed. It looks for position shifts (translate), size changes (scale), opacity transitions (fade), rotation, color shifts, and timing. The LLM's visual understanding turns pixel differences into structured motion descriptions.
3. Specification generation. The agent produces a structured output: element, property, start value, end value, duration, easing function, and delay. This specification is the bridge between the reference video and your rendering tool.
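The frame extraction step can be sketched as a pair of small helpers: one for the frame-count arithmetic and one that builds an ffmpeg argument list. This is an illustration under assumptions; the function names, file paths, and the `frame_%04d.png` naming pattern are hypothetical, and an agent may well choose different ffmpeg options.

```typescript
// Number of frames to expect from a clip of a given length and framerate
function expectedFrameCount(durationSeconds: number, fps: number): number {
  return Math.round(durationSeconds * fps);
}

// Builds ffmpeg arguments for dumping frames as numbered PNG files.
// Passing sampleFps adds ffmpeg's fps filter to sample fewer frames per second.
function ffmpegExtractArgs(input: string, outputDir: string, sampleFps?: number): string[] {
  const args = ['-i', input];
  if (sampleFps !== undefined) {
    args.push('-vf', `fps=${sampleFps}`);
  }
  args.push(`${outputDir}/frame_%04d.png`);
  return args;
}

const fullRate = expectedFrameCount(10, 30); // 300 frames at the source framerate
const sampled = expectedFrameCount(10, 5); // 50 frames when sampling at 5 fps
const sampledArgs = ffmpegExtractArgs('reference.mp4', 'frames', 5);
// Equivalent command: ffmpeg -i reference.mp4 -vf fps=5 frames/frame_%04d.png
```

Sampling at a lower rate reduces the number of images the model has to inspect, at the cost of coarser timing estimates for fast motion.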

This is not a Codex-specific capability. Any agent with access to ffmpeg and visual understanding can do it. Claude Code, Codex, Cursor, and OpenCode all support the workflow. The key requirement is that the agent can process the video frames, which means either native multimodal support or a preprocessing step that converts frames to images the agent can inspect.

AnimSpec: Video-to-Prompt Extraction

AnimSpec (animspec.com) automates this extraction. You upload a screen recording, choose an output format, and AnimSpec produces a structured prompt your coding agent can implement. It offers 16 output formats covering UI cloning, animation recreation, design token extraction, and UX audits.

The service runs on Google Gemini models: Gemini 2.5 Flash for fast analysis (1 credit), Gemini 3 Flash for balanced (3 credits), and Gemini 3.1 Pro for precise analysis (20 credits). The output is a text prompt, not code --- you paste it into your agent session and the agent writes the implementation.

For the motion design workflow, the relevant formats are:

Format	What it produces	Best for
Clone UI Animation	Structured motion spec with timing, easing, and CSS/JS implementation code	Recreating specific animations from reference
Clone UI Component	Full component specification including layout, styles, and interactions	Rebuilding an entire UI element with its animations
Extract Design Tokens	Color, spacing, typography, and animation values as structured tokens	Building a motion design system from reference
Export	Production-ready code in React, Vue, or Svelte	Direct implementation without agent interpretation

The AnimSpec workflow is: record your screen → upload → choose format → get prompt → paste into agent. It handles the frame extraction and motion analysis that your agent would otherwise need to do manually with ffmpeg.

MagicPath: Agent-to-Design Handoff

MagicPath takes a different approach. Instead of producing a prompt for your coding agent, it provides a shared canvas where external agents (Claude Code, Codex, Cursor) can build editable designs directly. The agent analyzes the video, generates the motion specification, and creates the animation on the MagicPath canvas --- no manual paste step required (source: MagicPath Documentation, retrieved 2026-06-04).

The workflow from Schirano's demonstration: drag a video into Codex, tell it to "recreate these animations in MagicPath," and Codex analyzes the motion, generates the design files, and sends them to the MagicPath canvas. When Schirano was asked about credit costs, his response was: "If you use external agents like in this example, it costs 0 MagicPath credits." The agent does the work; MagicPath receives the output.

Installation is a single command:

npx skills add https://github.com/magicpathai/agent-skills --skill magicpath

After installation, the agent knows how to read from and write to the MagicPath canvas. It can create new designs, modify existing ones, and pull designs from MagicPath into your codebase. The skill works with any external agent: Claude Code, Codex, Cursor, and the Claude mobile app. MagicPath does not need to be open --- the canvas lives in the cloud and the agent communicates with it via API.

For the motion design workflow, the MagicPath skill enables this pattern: your agent analyzes a reference video, extracts the motion parameters, and builds the animation directly on the canvas. You then review it visually, make edits in the visual editor, and export production-ready code when it looks right. The entire loop --- from reference video to editable animation --- runs through one agent session.

Fitting Extraction into the Motion Stack

These tools solve different parts of the same problem. AnimSpec extracts motion from video and produces a prompt. MagicPath receives an agent's output and renders it on an editable canvas. Remotion and Hyperframes render production video from code. They compose into a pipeline:

Stage	Tool	Input	Output
1. Extract motion	Agent + ffmpeg, or AnimSpec	Reference video	Structured motion spec / prompt
2. Generate animation	Agent (Claude Code, Codex, Cursor)	Motion spec	Animation code (CSS, GSAP, React)
3. Render video	Remotion or Hyperframes	Animation code	MP4, WebM, GIF
3a. Editable design	MagicPath	Animation code	Canvas design (visual editing)

You can skip stages depending on your needs. If you want production video, the full pipeline runs extract → generate → render. If you want an editable prototype to share with your team, extract → generate → MagicPath. If you already know what motion you want and just need to code it, skip extraction and go straight to generate → render.

The extraction step is where agent capability matters most. A good agent extracts accurate timing, identifies the correct easing function, and distinguishes between simultaneous and sequential animations. A less capable agent produces vague descriptions like "fades in smoothly" that still require manual interpretation. This is where AnimSpec's structured analysis adds value: it standardizes the extraction so the output quality does not depend on which agent you use.

Practical Workflow: Recreating a Competitor's Onboarding Animation

A concrete example ties the tools together. You see a competitor's onboarding flow with smooth card transitions and want to replicate the motion pattern in your own product.

Step 1: Capture the reference.

Screen-record the competitor's onboarding flow. A 15-second recording at 60fps is sufficient.

Step 2: Extract the motion.

Option A --- use an agent directly:

# In Claude Code or Codex
# Drop the video file into the session

"Analyze this onboarding animation. Extract:
1. Each card transition: start position, end position, duration, easing
2. The stagger delay between sequential elements
3. Any scale or opacity changes during transitions
4. The overall timeline: when does each animation start and end

Output as structured JSON with CSS animation equivalents."

Option B --- use AnimSpec:

# Upload to animspec.com, select "Clone UI Animation"
# Paste the generated prompt into your agent session
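
It helps to decide in advance what the "structured JSON" from Option A should look like, so you can check the agent's output against a type. The shape below is a hypothetical schema, not a format defined by AnimSpec or any agent; the `toCss` helper shows how one entry could map to its CSS animation equivalent.

```typescript
// Hypothetical shape for one extracted transition. Field names are assumptions.
interface TransitionSpec {
  element: string; // CSS selector for the animated element
  property: "transform" | "opacity";
  from: string; // starting value, e.g. "translateX(100%)"
  to: string; // ending value
  durationMs: number;
  delayMs: number; // offset from timeline start; this is where stagger shows up
  easing: string; // any CSS timing function
}

// Emit a @keyframes rule plus the animation shorthand that plays it.
function toCss(spec: TransitionSpec, name: string): string {
  return [
    `@keyframes ${name} {`,
    `  from { ${spec.property}: ${spec.from}; }`,
    `  to { ${spec.property}: ${spec.to}; }`,
    `}`,
    `${spec.element} {`,
    `  animation: ${name} ${spec.durationMs}ms ${spec.easing} ${spec.delayMs}ms both;`,
    `}`,
  ].join("\n");
}

// Example values are made up for illustration.
const secondCard: TransitionSpec = {
  element: ".card:nth-child(2)",
  property: "transform",
  from: "translateX(100%)",
  to: "translateX(0)",
  durationMs: 400,
  delayMs: 150,
  easing: "cubic-bezier(0.16, 1, 0.3, 1)",
};

console.log(toCss(secondCard, "card-in"));
```

A typed spec also makes the iteration in step 4 easier: "push the second card back" becomes a change to one `delayMs` value.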

Step 3: Generate the animation.

For Remotion:

"Using the motion spec I just extracted, create a Remotion composition
for a 3-card onboarding carousel. Each card slides in from the right
with the same timing and easing as the reference. Use spring()
for physics-based motion. Total duration: 180 frames at 30fps."
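
The numbers in that prompt are worth sanity-checking before you send it: 180 frames at 30fps is 6 seconds, or 2 seconds per card if the three cards share the time equally. The equal split is an assumption for illustration; the reference video may stagger differently, and `scheduleCards` is a hypothetical helper.

```typescript
interface CardSlot {
  card: number;
  startFrame: number;
  endFrame: number; // inclusive
  startSeconds: number;
}

// Split a composition into equal, back-to-back slots, one per card.
function scheduleCards(totalFrames: number, fps: number, cards: number): CardSlot[] {
  const perCard = Math.floor(totalFrames / cards);
  return Array.from({ length: cards }, (_, i) => ({
    card: i + 1,
    startFrame: i * perCard,
    endFrame: (i + 1) * perCard - 1,
    startSeconds: (i * perCard) / fps,
  }));
}

const slots = scheduleCards(180, 30, 3);
for (const s of slots) {
  console.log(`card ${s.card}: frames ${s.startFrame}-${s.endFrame}, starts at ${s.startSeconds}s`);
}
```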

For Hyperframes:

"Using the motion spec, create a Hyperframes scene with GSAP timelines
for the card transitions. Match the reference easing values exactly.
Use our frame.md template for consistent styling."

For MagicPath:

"Using the MagicPath skill, recreate the onboarding animation from
this reference video in my open project. Match the card transitions,
timing, and easing."

Step 4: Iterate and render.

Review the output. If the timing is off, tell the agent which element to adjust: "The second card enters 200ms too early. Push it back." If the easing feels wrong: "Change the ease-out to cubic-bezier(0.16, 1, 0.3, 1)." The structured spec from step 2 gives you precise control over individual parameters.
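
Both corrections translate into numbers you can verify yourself. The sketch below converts a millisecond offset into frames and evaluates a CSS-style cubic-bezier curve so you can see how front-loaded an easing is. The helper names are illustrative, and the bisection solver assumes both x control points lie in [0, 1], as CSS requires.

```typescript
// Convert a millisecond offset to whole frames at a given frame rate.
function msToFrames(ms: number, fps: number): number {
  return Math.round((ms * fps) / 1000);
}

// Build an easing function equivalent to CSS cubic-bezier(x1, y1, x2, y2).
function cubicBezier(x1: number, y1: number, x2: number, y2: number): (x: number) => number {
  // One coordinate of the curve, with endpoints fixed at 0 and 1.
  const axis = (t: number, p1: number, p2: number): number =>
    3 * (1 - t) * (1 - t) * t * p1 + 3 * (1 - t) * t * t * p2 + t * t * t;
  return (x: number): number => {
    if (x <= 0) return 0;
    if (x >= 1) return 1;
    // x(t) is monotonic for x1, x2 in [0, 1], so bisection finds the matching t.
    let lo = 0;
    let hi = 1;
    for (let i = 0; i < 40; i++) {
      const mid = (lo + hi) / 2;
      if (axis(mid, x1, x2) < x) lo = mid;
      else hi = mid;
    }
    return axis((lo + hi) / 2, y1, y2);
  };
}

console.log(msToFrames(200, 30)); // 6 frames at 30fps
console.log(msToFrames(200, 60)); // 12 frames at 60fps

const ease = cubicBezier(0.16, 1, 0.3, 1);
console.log(ease(0.5)); // eased progress at the halfway point in time
```

With this curve most of the movement happens in the first half of the duration, which is why it reads as a sharp ease-out.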

My take: Video-to-animation extraction is the missing piece in the motion design stack. The current workflow for reference-driven motion is broken: you describe what you see in words, the agent interprets those words, and you iterate on a foundation of ambiguity. Extraction tools fix this by starting from the actual motion parameters instead of a verbal description. AnimSpec standardizes the extraction. MagicPath gives the agent a direct path to editable output. Neither replaces Remotion or Hyperframes for production rendering --- they make those tools more effective by giving them accurate input. Expect this category to grow fast. As agents get better at visual analysis, the extraction step will become a standard part of every motion design workflow.

Animation Runtimes: GSAP, CSS, Lottie, Three.js

The animation runtime is the most consequential technical difference between Remotion and Hyperframes. It affects how every library behaves during rendering. Understanding this difference is essential for choosing the right tool.

Figure: GSAP and CSS animation approaches compared for the same interactive element.

GSAP: The primary animation runtime for Hyperframes. Before each capture, Hyperframes pauses the GSAP timeline and seeks it to t = frame / fps seconds, so every frame is sampled at a known time. In Remotion, GSAP's internal performance.now() ticker races through the timeline at wall-clock speed during render, regardless of which frame is being captured. The result: mostly-empty frames after the initial frames. This is not a minor issue. It makes GSAP effectively unusable in Remotion out of the box for any non-trivial animation.

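// GSAP in Hyperframes: deterministic seeking
// Assumes gsap is already loaded on the page and an element with id="title" exists.
const tl = gsap.timeline({ paused: true });
tl.from("#title", { opacity: 0, y: 50, duration: 1 })           // enter: fade in while sliding up
  .to("#title", { opacity: 0, y: -50, duration: 0.5 }, "+=1");  // exit: hold for 1s, then fade out upward
// Hyperframes pauses this timeline and seeks it to frame / fps before each capture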
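
To make the seek-before-capture idea concrete, here is a minimal sketch of such a loop. It is not Hyperframes' implementation. `FadeTimeline` is a hypothetical stand-in for a real timeline, and the `Seekable` interface only borrows the `pause()` and `seek()` method names that GSAP timelines expose.

```typescript
// The minimal surface a capture loop needs from an animation runtime.
interface Seekable {
  pause(): unknown;
  seek(seconds: number): unknown;
}

// Hypothetical stand-in timeline: fades opacity from 0 to 1 over `duration` seconds.
class FadeTimeline implements Seekable {
  opacity = 0;
  paused = false;
  private readonly duration: number;

  constructor(duration: number) {
    this.duration = duration;
  }

  pause(): this {
    this.paused = true;
    return this;
  }

  seek(seconds: number): this {
    this.opacity = Math.min(1, Math.max(0, seconds / this.duration));
    return this;
  }
}

// Sample every frame at time = frame / fps, no matter how long each capture takes.
function captureFrames(tl: Seekable & { opacity: number }, fps: number, totalFrames: number): number[] {
  tl.pause(); // stop the runtime from advancing on its own clock
  const samples: number[] = [];
  for (let frame = 0; frame < totalFrames; frame++) {
    tl.seek(frame / fps);
    samples.push(tl.opacity); // a real renderer would take a screenshot here
  }
  return samples;
}

const samples = captureFrames(new FadeTimeline(1), 30, 31);
console.log(samples[0], samples[15], samples[30]); // 0 0.5 1
```

Because the loop owns the clock, a slow screenshot cannot cause the animation to run ahead of the frame being captured.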

CSS Animations: Both tools support CSS. Hyperframes uses the Web Animations API adapter for frame-accurate seeking. Remotion renders CSS animations frame-by-frame but can struggle with complex keyframe sequences that depend on computed styles.

Lottie: Hyperframes supports Lottie via window.__hfLottie registration for deterministic seeking. Remotion requires the @remotion/lottie package, whose Lottie component drives playback from the current frame, at the cost of an extra dependency.

Three.js: Hyperframes renders from hf-seek events and window.__hfThreeTime instead of wall-clock time. Remotion has @remotion/three for React Three Fiber integration, which works well if you're already in the React ecosystem. The React Three Fiber integration is one of Remotion's genuine strengths --- you get the full React component model for 3D scenes.

Anime.js: Hyperframes registers on window.__hfAnime for deterministic seeking. Remotion has no native Anime.js adapter.

Runtime	Remotion	Hyperframes
GSAP	Wall-clock issues during render	Seekable, frame-accurate
CSS Animations	Frame-by-frame render	Web Animations API adapter
Lottie	`@remotion/lottie` package	`window.__hfLottie` registration
Three.js	`@remotion/three` (React Three Fiber)	`hf-seek` events
Anime.js	No native adapter	`window.__hfAnime`
WAAPI	Not documented	`document.getAnimations()` seeking

The pattern is clear. Hyperframes supports more animation runtimes with deterministic seeking. Remotion supports fewer runtimes but integrates deeply with the React ecosystem. If your video relies on GSAP, the choice is straightforward. If your video is pure React with spring animations, Remotion is the natural fit.

Rendering, Preview, and the Dev Loop

The development loop differs significantly between the two tools. This affects how fast you can iterate, which matters more in agent workflows than in traditional video production.

Figure: Remotion render progress showing frame-by-frame video encoding in the terminal.

Remotion Studio: A browser-based application with a sidebar, timeline, props editor, and render button. You edit code, the studio live-reloads, you scrub the timeline to check timing. Deployable to the cloud for team access. The props editor is particularly useful with Zod schemas --- you can adjust text, colors, and timing visually without touching code. This is Remotion's strongest UX advantage.

Hyperframes Preview: npx hyperframes preview opens a live preview in the browser with instant hot reload. No build step. Edit the HTML, save, see changes. Simpler than Remotion Studio but less feature-rich --- no timeline scrubbing, no props editor. The lack of a build step is the key advantage. In agent workflows, every second of iteration latency compounds. Hyperframes eliminates the webpack compilation that Remotion requires after each code change.

Remotion Render: CLI, Node.js SSR API (renderMedia()), AWS Lambda (distributed and production-tested), GitHub Actions, Google Cloud Run (alpha). The Lambda story is the clear leader in distributed rendering. For high-volume video production (thousands of personalized videos), Lambda scales horizontally. The SSR API lets you render from any Node.js process --- useful for agent pipelines that need to produce video as part of a larger workflow.

Hyperframes Render: npx hyperframes render --output output.mp4. Single-machine today. Docker support exists for containerized workflows. The stateless architecture doesn't block future distributed rendering, but as of May 2026, it hasn't shipped. This is Hyperframes' most significant gap compared to Remotion.

$ npx remotion render ProductIntro                      # Remotion: local render
$ npx remotion lambda render <serve-url> ProductIntro   # Remotion: distributed (serve URL of the deployed site)

$ npx hyperframes render                                # Hyperframes: local render
$ npx hyperframes render --output final.mp4             # Hyperframes: custom output

Remotion supports multiple output codecs: H.264, H.265, VP8, VP9, ProRes, AV1. Output formats include MP4, WebM, audio-only, image sequences, still images, GIF, and transparent video overlays. Hyperframes supports MP4 output and HDR via a two-pass compositing pipeline. For most use cases, MP4 covers what you need.

Common rendering issues and their fixes:

Symptom	Cause	Fix
Blank frames in Remotion with GSAP	Wall-clock timing races ahead	Replace GSAP with `interpolate()` or `spring()`
Blurry text in Hyperframes	Missing device pixel ratio	Set `data-scale` attribute or render at 2x
Audio sync drift	Variable frame timing	Use fixed FPS; avoid dynamic duration calculations
Long render times locally	Single-threaded capture	Remotion: use Lambda. Hyperframes: use Docker parallelism.
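
The first and third fixes in the table share one idea: derive every value from the frame number, never from elapsed time. In a Remotion component you would use interpolate() with useCurrentFrame(). The sketch below uses a simplified stand-in with clamping built in so it runs without dependencies; it is not Remotion's implementation.

```typescript
// Simplified stand-in for a frame-driven interpolation helper, with clamping.
function interpolateClamped(
  frame: number,
  input: [number, number],
  output: [number, number],
): number {
  const [inStart, inEnd] = input;
  const [outStart, outEnd] = output;
  const progress = Math.min(1, Math.max(0, (frame - inStart) / (inEnd - inStart)));
  return outStart + progress * (outEnd - outStart);
}

// Compute the frame count once from a fixed fps so audio and video share one clock.
function durationInFrames(seconds: number, fps: number): number {
  return Math.round(seconds * fps);
}

const fps = 30;
console.log(durationInFrames(6, fps)); // 180
console.log(interpolateClamped(15, [0, 30], [0, 1])); // 0.5: halfway through a 30-frame fade
console.log(interpolateClamped(45, [0, 30], [0, 1])); // 1: clamped after the fade ends
```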

Remotion vs Hyperframes: An Honest Comparison

I've used both. Here's where each wins and loses.

Figure: Course video frame generated by Hyperframes with branded educational animation and text overlays.

Dimension	Remotion	Hyperframes
Authoring format	React components (TSX)	HTML + CSS + GSAP
Build step	Required (webpack/bundler)	None --- index.html plays as-is
GSAP support	Wall-clock during render (broken)	Seekable, frame-accurate
Arbitrary HTML/CSS	Must rewrite as JSX	Paste and animate
Distributed rendering	Lambda, production-ready	Single-machine today
HDR output	Not documented	Supported (two-pass)
Visual editor	Harder (code + build step)	Native (same DOM is editable)
License	Source-available, custom Remotion License	Apache 2.0 (OSI-approved)
Commercial pricing	Free for 3 people, paid above	Free at any scale
GitHub stars	47.1k	19k
Maturity	622+ releases, v4.x	137 releases, v0.6.x
Agent integration	CLAUDE.md, AGENTS.md in repo	13 skills, plugins for 4 agents
React ecosystem	Full reuse	No React dependency

The licensing difference deserves attention. Remotion is source-available under a custom license. Free for teams of 3 or fewer. Above that, you need a Company License: $25/month per seat (Creator tier), $100/month minimum with per-render pricing (Automator tier), or $500+/month (Enterprise). If you're rendering thousands of personalized videos via Lambda, the Automator pricing adds up.
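
A rough calculator makes the tiers easier to compare. This is a sketch under stated assumptions, not Remotion's billing logic: it assumes every seat is billed once a team exceeds three people, treats the $100 Automator figure as a monthly floor, and takes the per-render price as a parameter because the figure is not given here. Check the current license terms before budgeting.

```typescript
const FREE_TEAM_SIZE = 3; // teams of 3 or fewer pay nothing
const CREATOR_SEAT_PRICE = 25; // USD per seat per month
const AUTOMATOR_MINIMUM = 100; // USD per month

// Assumption: all seats are billed once the team exceeds the free size.
function creatorMonthlyCost(seats: number): number {
  return seats <= FREE_TEAM_SIZE ? 0 : seats * CREATOR_SEAT_PRICE;
}

// Assumption: per-render charges apply, with the minimum acting as a floor.
// pricePerRender is a placeholder you must supply.
function automatorMonthlyCost(renders: number, pricePerRender: number): number {
  return Math.max(AUTOMATOR_MINIMUM, renders * pricePerRender);
}

console.log(creatorMonthlyCost(3)); // 0
console.log(creatorMonthlyCost(8)); // 200
```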

Hyperframes is Apache 2.0. Free at any scale. No per-seat pricing. No per-render fees. For companies producing video at volume, this is a significant cost difference.

My take: If you're already in the React ecosystem and need distributed rendering at scale, Remotion is the right choice today. The Lambda story is mature and proven. If you're building agent-first video workflows, care about GSAP accuracy, or want to avoid license fees at scale, Hyperframes is the better bet. The licensing model alone tips the decision for many teams. I started with Remotion because of the ecosystem maturity. I switched to Hyperframes for agent workflows --- the build-step removal alone saves significant iteration time.

The practical decision framework:

Choose Remotion if: your team writes React daily, you need Lambda-scale distributed rendering, you want the mature ecosystem with 800+ pages of docs, or you're building a video editor product (Editor Starter).
Choose Hyperframes if: your primary author is an AI agent, you use GSAP for animations, you need HDR output, you want to paste arbitrary HTML and animate it, or license fees at scale are a concern.
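Choose based on animation runtime: if your video relies heavily on GSAP or Anime.js, Hyperframes' seek-based rendering produces more reliable output, because those libraries run on wall-clock time during a Remotion render. For Lottie and Three.js, both tools have a workable path: Remotion through @remotion/lottie and @remotion/three, Hyperframes through its seek adapters.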

Both tools support design token integration from your design system (Chapter 08). Whether you write var(--color-primary) in a TSX component or an HTML data-attribute div, the token flows through correctly. This means your videos share the same visual language as your web UI, your prototypes, and your slide decks.
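
One way to get tokens into either tool is to emit them as CSS custom properties and include the result in the page or component styles. The sketch below assumes a flat token map; the token names, values, and the `tokensToCss` helper are illustrative and not taken from Chapter 08.

```typescript
type Tokens = Record<string, string>;

// Turn a flat token map into a block of CSS custom properties.
function tokensToCss(tokens: Tokens, selector = ":root"): string {
  const lines = Object.entries(tokens).map(([name, value]) => `  --${name}: ${value};`);
  return `${selector} {\n${lines.join("\n")}\n}`;
}

const css = tokensToCss({
  "color-primary": "#2563eb",
  "font-heading": "Inter, sans-serif",
});

console.log(css);
// Any element in the video can then use var(--color-primary),
// whether it is authored as TSX or as plain HTML.
```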

The video export pipeline from Huashu Design (Chapter 07) sits adjacent to both tools. Huashu produces HTML animations and exports them to MP4/GIF using its own render pipeline (25fps base + 60fps interpolation). For design-focused motion --- product intros, UI walkthroughs, animated infographics --- Huashu's pipeline is simpler and faster to set up. For complex video compositions with multiple clips, audio tracks, and transitions, Remotion or Hyperframes are the right tools.

html-video: a CapCut-style editing layer on top of Hyperframes

Hyperframes is the engine an agent writes against: an HTML-native runtime that turns markup into rendered video frames. html-video is the dashboard built on top of it. It is an open-source (Apache 2.0) editing layer that sits on the Hyperframes runtime, and the pitch is blunt: a CapCut for agents that write HTML. You hand it a website link, a file, or an article, and the agent generates an MP4 from one of 20-plus style templates aimed at product promos and explainers.

What earns it a place in a video workflow is the editing model. Authoring against Hyperframes alone is render-then-look: you write markup, render the whole timeline, then watch it back to find what is wrong. html-video adds paginated preview and frame-level text editing, so you change a caption on page three and see it without re-rendering the whole timeline. That tightens the critique-revision step that pure CLI rendering leaves slow.

Two integrations matter. It auto-detects six local agent CLIs — Claude Code, Codex, Cursor, Hermes among them — and lets you switch in the top bar with no extra API keys. And it wires in MiniMax for narration and background music generated from the video's own content, so audio stops being a separate manual step.

This is a layering pattern: the framework gives the agent a deterministic, diffable target, and html-video adds the human-facing editing surface and template library on top. You do not have to choose. The agent can author against Hyperframes directly, or you can sit in html-video to scrub, edit text inline, and pick a template — the artifact underneath stays HTML either way.

Figure: html-video as an editing layer over the Hyperframes runtime.

Next: All of these design tools connect to agents through MCP servers. Chapter 10 covers the Model Context Protocol in depth --- how to configure MCP servers for Figma, Paper, Pencil, OpenPencil, and other tools, and how to chain them together for multi-tool workflows.
