AD '26

CHAPTER 09

Motion and Video

Remotion, Hyperframes, and programmatic video for agents

Reading time

32 min

Why Agents Need Programmatic Video

Traditional video tools (Premiere, After Effects, Final Cut) require GUI interaction. Agents can't click timeline scrubbers. They can't drag keyframes. They can't adjust Bézier curves on easing functions with a mouse. Programmatic video frameworks solve this by treating video as code: versionable, diffable, reproducible.

The shift mirrors what happened with design. Design moved from GUI-only tools (Photoshop) to code-friendly formats (CSS, HTML, design tokens). Video is making the same transition. The timing matters because agents can now produce video artifacts the same way they produce code and design --- by writing text that compiles to frames.

Three use cases drive agent-driven video production in practice. First, product marketing: a startup needs a 60-second product intro, and the designer writes it once with code rather than editing frames manually. Second, personalized video at scale: a SaaS company generates thousands of onboarding videos, each with the customer's name and usage data. Third, design documentation: a team produces animated walkthroughs of UI flows, generated directly from the design system tokens covered in Chapter 08.

Two approaches have emerged. Remotion uses React components as the authoring format. Hyperframes uses HTML + GSAP. The philosophical difference sounds minor. In practice, it shapes everything: how you author, how you preview, how you debug, how you render, and what your license allows.

Remotion: React-Based Video Production

Remotion is the established player. It was created by Jonny Burger (Remotion AG), and as of May 2026 the project has 47.1k GitHub stars, 3.3k forks, and 622+ releases; the current version is v4.0.462. The codebase is TypeScript-first (74.5%), with PHP for Lambda rendering and Rust for native components. The ecosystem is substantial: 3M npm installs, 800+ pages of documentation, 35+ templates, 8000+ Discord members, and 300+ contributors. Companies using it include GitHub, Musixmatch, Wistia, and SoundCloud.

Remotion Studio showing a video composition with preview player and timeline controls

The mental model: your video is a React component tree. Each frame is a render of that tree at a specific point in time. The framework handles the mapping between time and component state. If you know React, you know most of Remotion already.

Core primitives:

| Primitive | Purpose | Key props |
|---|---|---|
| <Composition> | Registers a renderable video | id, fps, durationInFrames, width, height |
| <Sequence> | Time-shifts child components | from, durationInFrames |
| <Series> | Plays sequences back-to-back | Auto-calculated timing |
| spring() | Physics-based animation | mass, damping, stiffness |
| interpolate() | Maps frame number to value range | inputRange, outputRange, extrapolateLeft/Right |
| useCurrentFrame() | Hook for current frame number | None |
| useVideoConfig() | Hook for video metadata | Returns fps, width, height, durationInFrames |

// src/Root.tsx — register compositions
import { Composition } from 'remotion';
import { ProductIntro } from './ProductIntro';

export const RemotionRoot: React.FC = () => {
  return (
    <Composition
      id="ProductIntro"
      durationInFrames={150}
      fps={30}
      width={1920}
      height={1080}
      component={ProductIntro}
    />
  );
};

// src/ProductIntro.tsx: frame-dependent rendering
import { AbsoluteFill, useCurrentFrame, interpolate } from 'remotion';

export const ProductIntro = () => {
  const frame = useCurrentFrame();

  const opacity = interpolate(frame, [0, 30], [0, 1], {
    extrapolateRight: 'clamp',
  });

  const scale = interpolate(frame, [0, 30], [0.8, 1], {
    extrapolateRight: 'clamp',
  });

  return (
    <AbsoluteFill style={{
      backgroundColor: 'var(--color-background)',
      justifyContent: 'center',
      alignItems: 'center',
    }}>
      <div style={{ opacity, transform: `scale(${scale})` }}>
        <h1 style={{ fontFamily: 'var(--font-display)' }}>
          Introducing the Future
        </h1>
      </div>
    </AbsoluteFill>
  );
};
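
// Sequences for timing control
import { Sequence } from 'remotion';
// Hypothetical scene components; adjust the paths to your project.
import { LogoReveal } from './LogoReveal';
import { FeatureShowcase } from './FeatureShowcase';
import { CallToAction } from './CallToAction';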

const ProductTrailer = () => {
  return (
    <>
      <Sequence durationInFrames={30}>
        <LogoReveal />
      </Sequence>
      <Sequence from={30} durationInFrames={60}>
        <FeatureShowcase />
      </Sequence>
      <Sequence from={90} durationInFrames={60}>
        <CallToAction />
      </Sequence>
    </>
  );
};
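The fade in the ProductIntro snippet relies on interpolate() mapping frames 0 to 30 onto opacity 0 to 1. The sketch below shows what that mapping computes for a single input segment. interpolateLinear and its option handling are a simplified stand-in written for illustration, not Remotion's implementation, which also supports multi-segment ranges and easing. As in Remotion, extrapolation is assumed to default to "extend".

```typescript
type Extrapolate = "extend" | "clamp";

interface InterpolateOptions {
  extrapolateLeft?: Extrapolate;
  extrapolateRight?: Extrapolate;
}

// Maps `input` linearly from inputRange onto outputRange.
// Simplified sketch: a single input segment and no easing function.
function interpolateLinear(
  input: number,
  inputRange: [number, number],
  outputRange: [number, number],
  options: InterpolateOptions = {},
): number {
  const [inStart, inEnd] = inputRange;
  const [outStart, outEnd] = outputRange;
  const { extrapolateLeft = "extend", extrapolateRight = "extend" } = options;
  let x = input;
  // "clamp" pins the input to the edge of the range; "extend" keeps the line going.
  if (x < inStart && extrapolateLeft === "clamp") x = inStart;
  if (x > inEnd && extrapolateRight === "clamp") x = inEnd;
  const progress = (x - inStart) / (inEnd - inStart);
  return outStart + progress * (outEnd - outStart);
}

// The ProductIntro fade: frames 0-30 map to opacity 0-1, clamped on the right.
console.log(interpolateLinear(15, [0, 30], [0, 1], { extrapolateRight: "clamp" })); // 0.5
console.log(interpolateLinear(90, [0, 30], [0, 1], { extrapolateRight: "clamp" })); // 1
console.log(interpolateLinear(90, [0, 30], [0, 1])); // 3 (unclamped)
```

Without the clamp, the value keeps growing past frame 30. For opacity the browser caps it at 1 anyway, but for the scale interpolation the title would keep enlarging for the rest of the composition.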

The spring() function deserves a closer look. Unlike CSS transitions with fixed durations, spring animations are physics-based. You set mass, damping, and stiffness, and the framework calculates the animation curve. This produces motion that feels natural --- objects accelerate, overshoot slightly, and settle.

import { spring, useCurrentFrame, useVideoConfig } from 'remotion';

const AnimatedTitle = () => {
  const frame = useCurrentFrame();
  const { fps } = useVideoConfig();

  const scale = spring({
    frame,
    fps,
    config: {
      stiffness: 100,
      damping: 10,
      mass: 1,
    },
  });

  return (
    <div style={{
      transform: `scale(${scale})`,
      fontFamily: 'var(--font-display)',
    }}>
      Hello World
    </div>
  );
};
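What spring() computes can be approximated by stepping a damped spring one frame at a time: a restoring force proportional to stiffness pulls the value toward 1, a damping force opposes its velocity, and mass scales the response. The simulateSpring function below is a simplified sketch using semi-implicit Euler integration with substeps. It is not Remotion's implementation, so its numbers will differ slightly from the real spring().

```typescript
interface SpringConfig {
  stiffness: number;
  damping: number;
  mass: number;
}

// Returns the spring's value at each frame, animating from 0 toward 1.
// Semi-implicit Euler integration, with substeps for numerical stability.
function simulateSpring(
  totalFrames: number,
  fps: number,
  { stiffness, damping, mass }: SpringConfig,
  substeps = 10,
): number[] {
  const dt = 1 / fps / substeps;
  let position = 0;
  let velocity = 0;
  const values: number[] = [position];
  for (let frame = 1; frame < totalFrames; frame++) {
    for (let i = 0; i < substeps; i++) {
      // Hooke's law pulls toward the target (1); damping opposes velocity.
      const force = -stiffness * (position - 1) - damping * velocity;
      velocity += (force / mass) * dt;
      position += velocity * dt;
    }
    values.push(position);
  }
  return values;
}

const curve = simulateSpring(60, 30, { stiffness: 100, damping: 10, mass: 1 });
console.log(Math.max(...curve) > 1);                 // true: the value overshoots
console.log(Math.abs(curve[curve.length - 1] - 1) < 0.01); // true: it has settled
```

With the config used in AnimatedTitle, critical damping would be 2·√(stiffness·mass) = 20, so damping: 10 leaves the spring underdamped. That is why the title overshoots before it settles. Raising damping to 20 or above removes the overshoot.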

Sequences can nest. A sequence inside another sequence is offset by the parent's from value. This composition pattern maps naturally to how real videos are structured: acts contain scenes, scenes contain shots, shots contain elements.
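The offset rule can be modeled as a small function. Inside a Sequence, the frame a child sees is the parent's frame minus the Sequence's from, and the child renders only while that local frame falls inside the Sequence's duration. This is a simplified model of the behavior described above, not Remotion's source; SequenceWindow is a hypothetical stand-in for the from and durationInFrames props.

```typescript
// Hypothetical stand-in for a <Sequence>'s timing props.
interface SequenceWindow {
  from: number;
  durationInFrames: number;
}

// Returns the frame a child sees inside nested sequences (outermost first),
// or null if any enclosing sequence is not active at this frame.
function localFrame(globalFrame: number, chain: SequenceWindow[]): number | null {
  let frame = globalFrame;
  for (const seq of chain) {
    frame -= seq.from; // each level shifts time by its own `from`
    if (frame < 0 || frame >= seq.durationInFrames) return null;
  }
  return frame;
}

// An act starting at frame 30 contains a shot starting 15 frames into the act.
const actThenShot: SequenceWindow[] = [
  { from: 30, durationInFrames: 90 },
  { from: 15, durationInFrames: 30 },
];
console.log(localFrame(45, actThenShot)); // 0: the shot starts at absolute frame 45
console.log(localFrame(50, actThenShot)); // 5
console.log(localFrame(40, actThenShot)); // null: the act is active, the shot is not
```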

Remotion Studio provides a browser-based preview with timeline scrubbing, a props editor, and a render button. You can deploy Studio to the cloud for non-technical team members to preview and render videos without touching code.

The rendering targets are extensive:

$ npx remotion render ProductIntro                # local MP4
$ npx remotion render --sequence                  # image sequence
$ npx remotion render --output custom.mp4         # custom output path
$ npx remotion render --codec prores              # ProRes for editing
$ npx remotion render --frames=0-29               # render specific range

Beyond local CLI rendering, Remotion offers a Node.js server-side rendering API (renderMedia() in @remotion/renderer), AWS Lambda for distributed rendering, GitHub Actions for CI/CD, and Google Cloud Run (alpha as of May 2026). The Lambda integration is mature and production-tested for rendering at very large scale. The SSR API lets you render from any Node.js process, which is particularly useful for agent-driven pipelines: an agent can trigger a render from a script instead of driving the CLI.
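Distributed rendering works by splitting a composition's frame range into chunks that separate workers render in parallel, after which the pieces are concatenated. The sketch below shows only that splitting step, assuming fixed-size chunks. Remotion Lambda exposes a related framesPerLambda setting, but its actual chunking and stitching logic is not reproduced here; splitIntoChunks is a hypothetical helper.

```typescript
// An inclusive frame range, in the same shape as the CLI's --frames=0-29.
type FrameRange = [start: number, end: number];

// Splits frames [0, totalFrames) into consecutive inclusive chunks.
function splitIntoChunks(totalFrames: number, framesPerChunk: number): FrameRange[] {
  if (framesPerChunk <= 0) throw new Error("framesPerChunk must be positive");
  const chunks: FrameRange[] = [];
  for (let start = 0; start < totalFrames; start += framesPerChunk) {
    const end = Math.min(start + framesPerChunk, totalFrames) - 1;
    chunks.push([start, end]);
  }
  return chunks;
}

// The 150-frame ProductIntro split into 40-frame chunks:
console.log(JSON.stringify(splitIntoChunks(150, 40)));
// [[0,39],[40,79],[80,119],[120,149]]
```

Each range maps directly onto a --frames=start-end value, and the last chunk comes out shorter whenever the total does not divide evenly.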

Setting Up a Remotion Project with Agent Assistance

Quick start:

$ npx create-video@latest

This scaffolds a Remotion project with a src/Root.tsx that registers compositions. Each composition is a React component that receives props and renders frames.

Remotion has first-class agent support. The repository includes .claude/ and .cursor/ directories with CLAUDE.md and AGENTS.md files. This means Claude Code and Cursor understand Remotion's conventions out of the box.

my-video/
├── src/
│   ├── Root.tsx          # registers all compositions
│   ├── ProductIntro.tsx  # composition component
│   └── lib/
│       └── animations.ts # shared animation helpers
├── public/
│   └── assets/           # images, videos, fonts
├── package.json
└── remotion.config.ts

Agents can scaffold the project, write composition components, configure rendering, and even set up Lambda deployment. The Zod schema support on <Composition> enables visual editing of props in Remotion Studio, which agents can configure:

import { Composition } from 'remotion';
import { z } from 'zod';
import { ProductIntro } from './ProductIntro';

const productSchema = z.object({
  title: z.string(),
  subtitle: z.string(),
  accentColor: z.string(),
  logoUrl: z.string(),
});

export const RemotionRoot = () => (
  <Composition
    id="ProductIntro"
    component={ProductIntro}
    schema={productSchema}
    defaultProps={{
      title: 'Product Name',
      subtitle: 'Tagline goes here',
      accentColor: '#E94560',
      logoUrl: '/assets/logo.svg',
    }}
    durationInFrames={150}
    fps={30}
    width={1920}
    height={1080}
  />
);

The agent workflow: you describe the video, the agent writes the composition components, you preview in Remotion Studio, you iterate, then you render. Because compositions render as web pages, your video can share the same var(--color-primary) variables as your web UI (the design tokens from Chapter 08), as long as the composition loads the same token stylesheet.
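One minimal way to make sure both the web UI and the video resolve the same custom properties is to generate a single :root stylesheet from the token source and import it in both places. The sketch below assumes a flat name-to-value token map. tokensToCss, the token names, and the values are illustrative, not a format prescribed by Chapter 08.

```typescript
// Illustrative flat token map; real token files are often nested (color.primary).
type TokenMap = Record<string, string>;

// Emits a :root rule with one custom property per token,
// e.g. { "color-primary": "#E94560" } becomes "--color-primary: #E94560;".
function tokensToCss(tokens: TokenMap): string {
  const declarations = Object.entries(tokens)
    .map(([tokenName, value]) => `  --${tokenName}: ${value};`)
    .join("\n");
  return `:root {\n${declarations}\n}\n`;
}

const brandTokens: TokenMap = {
  "color-primary": "#E94560",
  "color-background": "#0F0F1A",
  "font-display": "'Space Grotesk', sans-serif",
};
console.log(tokensToCss(brandTokens));
```

Writing this string to a stylesheet that both the web app and the Remotion entry file import is one way to make var(--color-primary) resolve identically in both.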

Remotion also provides a Player component for embedding video previews in web applications and a Recorder for screen recording. The Editor Starter is a commercial template for building custom video editing applications on top of Remotion's rendering engine.

Hyperframes: HTML-Native Video Built for Agents

Hyperframes takes a fundamentally different approach. It was created by HeyGen, is licensed under Apache 2.0, and has 19k GitHub stars and 1.8k forks. The tagline is direct: "Write HTML. Render video. Built for agents."

Hyperframes timeline editor with GSAP keyframes, layer panel, and animation preview

Compositions are plain HTML files with data attributes. No React. No build step. The HTML file is both the render layer and the editable source of truth. This is the same principle that powers Huashu Design (Chapter 07) for prototypes and slide decks --- HTML as the native artifact, not an export target.

The key insight from HeyGen's internal evaluation: LLMs produce more creative output writing HTML+GSAP than React compositions. HTML is lower-friction for language models. No imports to manage. No component lifecycle to reason about. No JSX syntax to get right. The agent writes declarative HTML, adds animation attributes, and the framework handles the rest.

| Attribute | Purpose |
|---|---|
| data-composition-id | Identifies the root element |
| data-start | Start time in seconds |
| data-duration | Duration in seconds |
| data-track-index | Layer ordering (higher = on top) |
| data-width / data-height | Composition dimensions |
| data-volume | Audio volume (0-1) |

<div id="root"
     data-composition-id="product-intro"
     data-start="0"
     data-width="1920"
     data-height="1080">

  <video id="clip-1"
         data-start="0"
         data-duration="5"
         data-track-index="0"
         src="intro.mp4"
         muted playsinline></video>

  <h1 id="title"
      class="clip"
      data-start="1"
      data-duration="4"
      data-track-index="1"
      style="font-size: 72px; color: white;">
    Welcome to Hyperframes
  </h1>

  <audio id="bg-music"
         data-start="0"
         data-duration="5"
         data-track-index="2"
         data-volume="0.5"
         src="music.wav"></audio>
</div>

Because there is no build step, you write HTML, open it in a browser to preview, and render it to MP4. The data attributes tell the renderer when each element appears, for how long, and in what layer. The track index system works like a video editor's timeline: elements on higher tracks overlay elements on lower tracks.
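The timing attributes can be read as a simple rule: an element is active from data-start until data-start + data-duration, and active elements stack by data-track-index. The sketch below models that rule for the product-intro example. It illustrates the attribute semantics described here rather than Hyperframes' renderer; the Clip type and activeClips are hypothetical, and the end time is assumed to be exclusive.

```typescript
// Timing data read from an element's data-* attributes (times in seconds).
interface Clip {
  id: string;
  start: number;      // data-start
  duration: number;   // data-duration
  trackIndex: number; // data-track-index
}

// Returns the ids of clips active at time t, lowest track first.
// Assumes a clip covers [start, start + duration), end exclusive.
function activeClips(clips: Clip[], t: number): string[] {
  return clips
    .filter((c) => t >= c.start && t < c.start + c.duration)
    .sort((a, b) => a.trackIndex - b.trackIndex)
    .map((c) => c.id);
}

// The clips from the product-intro example (the audio clip is active, not visible).
const introClips: Clip[] = [
  { id: "clip-1", start: 0, duration: 5, trackIndex: 0 },
  { id: "title", start: 1, duration: 4, trackIndex: 1 },
  { id: "bg-music", start: 0, duration: 5, trackIndex: 2 },
];
console.log(JSON.stringify(activeClips(introClips, 0.5))); // ["clip-1","bg-music"]
console.log(JSON.stringify(activeClips(introClips, 2)));   // ["clip-1","title","bg-music"]
```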

The CLI is deliberately non-interactive by default:

$ npx hyperframes init my-video
$ cd my-video
$ npx hyperframes preview         # live preview with hot reload
$ npx hyperframes render          # render to MP4
$ npx hyperframes render --output demo.mp4
$ npx hyperframes add flash-through-white   # add catalog block
$ npx hyperframes lint            # lint composition
$ npx hyperframes doctor          # diagnose issues

The --human-friendly flag enables interactive mode, but the default is flag-driven. This is a deliberate design choice: agents work better with non-interactive CLIs. Every command accepts flags and exits cleanly.

Hyperframes ships with a catalog of 50+ ready-to-use blocks. These are pre-built composition snippets for common patterns: flash-through-white, instagram-follow, title cards, lower thirds, transitions. You add them to your project with npx hyperframes add <block-name> and customize the data attributes. For agents, this means the agent can compose complex videos from building blocks rather than writing every element from scratch.

The Frame Adapter pattern is where Hyperframes' architecture really matters. It supports GSAP, Lottie, CSS animations, Three.js, Anime.js, and the Web Animations API. For each adapter, Hyperframes handles deterministic seeking: it pauses the animation library and seeks it to time frame / fps (in seconds) before each capture. This removes the wall-clock dependency that causes problems in renderers that let animation libraries run in real time.
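The seek-before-capture loop can be sketched with a toy animation that only changes when it is told what time it is. Seekable, LinearTween, and captureFrames are hypothetical stand-ins for an adapter-wrapped library (in practice, a paused GSAP timeline, which exposes seek(), plays this role). The point is that each captured value depends only on frame / fps, never on how long the previous capture took.

```typescript
// Hypothetical adapter interface: something that can be moved to an exact time.
interface Seekable {
  seek(timeSeconds: number): void;
  sample(): number; // the property value a capture would record
}

// Toy animation: moves a value linearly from `from` to `to` over `duration` seconds.
class LinearTween implements Seekable {
  private time = 0;
  private readonly from: number;
  private readonly to: number;
  private readonly duration: number;

  constructor(from: number, to: number, duration: number) {
    this.from = from;
    this.to = to;
    this.duration = duration;
  }

  seek(timeSeconds: number): void {
    // Hold the first/last value outside the animation's own time range.
    this.time = Math.min(Math.max(timeSeconds, 0), this.duration);
  }

  sample(): number {
    return this.from + (this.to - this.from) * (this.time / this.duration);
  }
}

// Deterministic capture: seek to frame / fps, then record the value.
function captureFrames(anim: Seekable, fps: number, totalFrames: number): number[] {
  const captured: number[] = [];
  for (let frame = 0; frame < totalFrames; frame++) {
    anim.seek(frame / fps); // time comes from the frame index, not the wall clock
    captured.push(anim.sample());
  }
  return captured;
}

console.log(JSON.stringify(captureFrames(new LinearTween(0, 100, 1), 4, 5))); // [0,25,50,75,100]
```

Running the same loop again, on a faster or slower machine, yields the same array, which is the property that makes seek-based capture reproducible.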

My take: for third-party animation libraries, Hyperframes' seek-based rendering is technically superior to Remotion's approach. Remotion's own primitives are driven by useCurrentFrame(), but a GSAP timeline left to play on its own runs at wall-clock speed during render, racing through its animations and producing mostly-empty frames after the first few unless you pause it and seek it from the current frame yourself. Hyperframes pauses GSAP, seeks to the exact time, then captures, with no extra wiring. If your video uses GSAP heavily, this alone is reason to choose Hyperframes.

Setting Up Hyperframes with AI Agents

The recommended path is skill-based:

$ npx skills add heygen-com/hyperframes

This installs 13 skills that teach the agent framework-specific patterns. Skills register as slash commands in Claude Code: /hyperframes, /hyperframes-cli, /hyperframes-media, /gsap, /lottie, /threejs, and more.

Requirements: Node.js >= 22 and FFmpeg.

For Codex and Claude Code, Hyperframes provides dedicated plugins:

# Codex plugin
$ codex plugin marketplace add heygen-com/hyperframes \
    --sparse .codex-plugin --sparse skills --sparse assets

# Claude Code plugin
$ claude --plugin-dir .

Example agent prompts that work well:

"Create a 10-second product intro with a fade-in title."
"Turn this CSV into an animated bar chart race."
"Build a 60-second explainer video with 4 scenes."
"Add background music with ducking when narration plays."

The hyperframes init command installs skills automatically, so you can hand a project to an agent at any point in its lifecycle. Media preprocessing skills handle TTS (via Kokoro), transcription (via Whisper), and background removal (via u2net). The TTS integration is particularly useful for narrated videos --- the agent generates voiceover, measures the duration, and synchronizes visuals to the audio timeline automatically.
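Synchronizing visuals to narration mostly comes down to arithmetic: once each voiceover clip's duration is measured, scene start times follow by accumulation. The helper below is a hypothetical sketch of that step, not part of the Hyperframes skills. It assumes one narration clip per scene plus a fixed pause after each, and it produces values suitable for data-start and data-duration.

```typescript
interface SceneTiming {
  start: number;    // seconds, for data-start
  duration: number; // seconds, for data-duration
}

// Lays scenes end to end so each lasts as long as its narration plus a pause.
function timingsFromNarration(narrationSeconds: number[], pauseSeconds: number): SceneTiming[] {
  const timings: SceneTiming[] = [];
  let cursor = 0;
  for (const spoken of narrationSeconds) {
    const duration = spoken + pauseSeconds;
    timings.push({ start: cursor, duration });
    cursor += duration;
  }
  return timings;
}

// Three measured voiceover clips with a half-second pause after each:
console.log(JSON.stringify(timingsFromNarration([2.5, 3, 4], 0.5)));
// [{"start":0,"duration":3},{"start":3,"duration":3.5},{"start":6.5,"duration":4.5}]
```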

There's also a remotion-to-hyperframes skill for migrating React compositions to HTML. Useful if you start with Remotion and want to switch. The migration handles data attribute mapping, sequence-to-track conversion, and GSAP timeline adaptation.

Hyperframes supports two capture modes. BeginFrame mode runs on Linux and is fully deterministic --- no wall-clock dependency. Screenshot mode runs on macOS and Windows, falling back automatically when BeginFrame isn't available. For production CI/CD pipelines, Linux with BeginFrame mode is the recommended setup.

frame.md: Your Design System, Ready for Video

Every brand has a design spec. Colors, typography, spacing, composition rules. These specs are written for the web --- for screens where users scroll, resize, and interact. They are not written for a camera. frame.md bridges this gap. It translates a web-oriented design spec into a video-ready specification that an agent can use to compose branded video without guessing at scale, timing, or motion.

The pipeline:

The frame.md pipeline: brand design spec translated into video-ready composition parameters

You start with a design.md --- the same format used by Open Design (covered in Chapter 06). frame.md reads your colors, type scale, spacing tokens, and composition rules, then rewrites them for the 16:9 frame. The output is a DESIGN.md superset: all your original design tokens plus video-specific parameters like timing, transitions, track ordering, and camera moves. The agent reads this superset and produces Hyperframes HTML that inherits your brand identity automatically.

The tool lives at hyperframes.dev/design. Paste your design spec, and it returns a frame.md you drop into your project. For teams already using Open Design's DESIGN.md format, this creates a direct path: one spec drives both static output and video output without duplication.
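The superset idea can be pictured as a typed merge: every original token survives unchanged and a video section is layered on top. Everything in the sketch below (the field names, the 1920x1080 at 30 fps defaults, the "hard-cut" transition, and the 2x type multiplier, which assumes on-screen video text should be set larger than web body text) is a hypothetical illustration of that shape, not frame.md's actual output format.

```typescript
// Hypothetical web-oriented design spec.
interface WebDesignSpec {
  colors: Record<string, string>;
  typeScalePx: Record<string, number>;
}

// Hypothetical video parameters layered on top of the web spec.
interface VideoParameters {
  width: number;
  height: number;
  fps: number;
  defaultTransition: string;
  videoTypeScalePx: Record<string, number>;
}

type VideoDesignSpec = WebDesignSpec & { video: VideoParameters };

// Keeps every web token and adds a video section with scaled type sizes.
function toVideoSpec(web: WebDesignSpec, typeMultiplier = 2): VideoDesignSpec {
  const videoTypeScalePx: Record<string, number> = {};
  for (const [step, px] of Object.entries(web.typeScalePx)) {
    videoTypeScalePx[step] = Math.round(px * typeMultiplier);
  }
  return {
    ...web, // original tokens pass through untouched
    video: { width: 1920, height: 1080, fps: 30, defaultTransition: "hard-cut", videoTypeScalePx },
  };
}

// Illustrative values only.
const videoSpec = toVideoSpec({ colors: { primary: "#2952CC" }, typeScalePx: { body: 16, display: 48 } });
console.log(JSON.stringify(videoSpec.video.videoTypeScalePx)); // {"body":32,"display":96}
```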

Design Templates

If you don't have an existing design spec, frame.md ships with 10 templates. Each template maps a visual identity to video-appropriate defaults:

| Template | Visual character | Motion style | Best for |
|---|---|---|---|
| Biennale Yellow | Warm parchment, solar yellow bloom, Instrument Serif | Slow fades, generous pauses | Art, culture, editorial |
| BlockFrame | Thick black borders, hard offset shadows, candy accents | Hard cuts, snap transitions | Startups, tech launches |
| Blue Professional | Cobalt primary, Space Grotesk display, Inter body | Clean wipes, measured pace | Enterprise, B2B, SaaS |
| Bold Poster | Shrikhand tilted display, red accent on cream | Ken Burns zooms, kinetic type | Marketing campaigns |
| Capsule | Pill-shaped editorial, cream paper, Bodoni Moda serif | Float animations, soft reveals | Lifestyle, fashion, food |
| Cartesian | Minimal sparse, warm parchment, hairline rules | Restrained, data-driven motion | Analytics, dashboards |
| Coral | Bebas Neue uppercase, coral on cream, Inter reading | Bold entrances, sweep reveals | Product announcements |
| Creative Mode | Cream + saturated candy, Archivo Black, JetBrains Mono | Character stagger, chart fills | Developer tools, data viz |

Each template comes with fine-tuning controls for palette and typography. Download the frame pack and drop it into your Hyperframes project. The agent reads the template's DESIGN.md and produces video that matches the template's visual identity.

Skeleton Templates for Common Video Types

frame.md also provides skeleton templates that define scene structure, timing, and transition placement for common video formats:

| Skeleton | Format | Duration | Scenes | Transitions |
|---|---|---|---|---|
| Social Reel | 1080x1920 (portrait) | 15s | 6 | 1 shader at hero reveal, rest hard cuts |
| Launch Teaser | 1920x1080 (landscape) | 25s | 8 | 2-3 shaders at key moments |
| Product Explainer | 1920x1080 | 45s | 12 | Mixed durations, varied transitions |
| Cinematic Title | 1920x1080 | 60s | 7 | Long holds, restrained shaders |

These skeletons give the agent a structural starting point. You tell it "use the Launch Teaser skeleton with the BlockFrame template" and the agent produces an 8-scene, 25-second video with thick borders, hard shadows, and snap transitions at the right moments.
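A skeleton is, in effect, a set of constraints the agent's scene plan must satisfy. A hypothetical check for the two numeric ones, scene count and total duration, might look like the sketch below. The Skeleton type, checkPlan, and the tolerance are assumptions for illustration; the Launch Teaser values come from the table.

```typescript
interface Skeleton {
  name: string;
  totalSeconds: number;
  sceneCount: number;
}

// Returns a list of problems; an empty list means the plan fits the skeleton.
function checkPlan(skeleton: Skeleton, sceneSeconds: number[]): string[] {
  const problems: string[] = [];
  if (sceneSeconds.length !== skeleton.sceneCount) {
    problems.push(`expected ${skeleton.sceneCount} scenes, got ${sceneSeconds.length}`);
  }
  const total = sceneSeconds.reduce((sum, s) => sum + s, 0);
  // Small tolerance so fractional scene lengths don't trip the check.
  if (Math.abs(total - skeleton.totalSeconds) > 1e-6) {
    problems.push(`expected ${skeleton.totalSeconds}s total, got ${total}s`);
  }
  return problems;
}

const launchTeaser: Skeleton = { name: "Launch Teaser", totalSeconds: 25, sceneCount: 8 };
console.log(JSON.stringify(checkPlan(launchTeaser, [3, 3, 3, 3, 3, 3, 4, 3]))); // []
console.log(JSON.stringify(checkPlan(launchTeaser, [3, 3, 3, 3, 3, 3, 3])));
// ["expected 8 scenes, got 7","expected 25s total, got 21s"]
```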

Animation Patterns

frame.md defines reusable animation patterns that map to specific use cases. These are copy-paste ready GSAP snippets the agent can drop into any composition:

| Pattern | What it does | Use case |
|---|---|---|
| Counter animation | Stats animate from 0 to target | Metrics, KPIs, growth numbers |
| SVG stroke draw | Lines and paths draw themselves | Diagrams, flowcharts, data lines |
| Character stagger | Letters enter one by one | Headlines, logos, titles |
| Breathing float | Subtle vertical drift | Logos, icons, floating elements |
| Bar chart fill | Bars grow from bottom sequentially | Data comparisons, benchmarking |
| Highlight sweep | Accent underline sweeps across text | Feature callouts, emphasis |
| Ken Burns | Slow zoom from scale 1 to 1.03 | Background images, hero shots |
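Two of these patterns reduce to small formulas. The sketch below computes the number a counter animation displays at a given progress and the start offsets for a character stagger. It is plain TypeScript written for illustration, not one of the GSAP snippets frame.md ships; the cubic ease-out curve and the function names are assumptions.

```typescript
// Cubic ease-out: fast start, gentle landing (an assumed easing choice).
function easeOutCubic(progress: number): number {
  return 1 - Math.pow(1 - progress, 3);
}

// Counter animation: the whole number to display at a given progress (0..1).
function counterValue(progress: number, target: number): number {
  const p = Math.min(Math.max(progress, 0), 1);
  return Math.round(target * easeOutCubic(p));
}

// Character stagger: each character starts `step` seconds after the previous one.
function staggerStarts(text: string, step: number): { char: string; start: number }[] {
  return Array.from(text).map((char, i) => ({ char, start: i * step }));
}

console.log(counterValue(0.5, 10000)); // 8750: the ease-out front-loads the count
console.log(counterValue(1, 10000));   // 10000
console.log(staggerStarts("Ship", 0.05).map((c) => c.start.toFixed(2)));
// [ '0.00', '0.05', '0.10', '0.15' ]
```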

For transitions between scenes, 14 shader effects are available, organized by energy level:

| Energy | Shaders | When to use |
|---|---|---|
| Calm | cross-warp-morph, light-leak, domain-warp | Editorial, culture, storytelling |
| Professional | cinematic-zoom, whip-pan, sdf-iris | Enterprise, product demos, B2B |
| Aggressive | glitch, chromatic-split, ridged-burn | Startups, launches, hype videos |
| Ethereal | gravitational-lens, ripple-waves, swirl-vortex | Brand films, ambient, mood |

Example: A Product Launch Video with frame.md

A practical walkthrough. You have a SaaS product launching next month. The brand uses the Blue Professional template (cobalt primary, Space Grotesk display, clean and corporate). You want a 25-second launch teaser.

Step 1: Start the project with the template.

npx hyperframes init product-launch --template blue-professional
cd product-launch
npx hyperframes preview

Step 2: Tell the agent what to build.

Create a 25-second product launch teaser using the Blue Professional
template and the Launch Teaser skeleton (8 scenes).

Scenes:
1. Logo fades in on cobalt background (3s)
2. Tagline animates with character stagger (3s)
3. Feature 1: "Real-time collaboration" with icon animation (3s)
4. Feature 2: "AI-powered suggestions" with counter animation (3s)
5. Feature 3: "One-click deploy" with SVG stroke draw (3s)
6. Social proof: "10,000 teams" with bar chart fill (3s)
7. CTA: "Try free today" with highlight sweep (4s)
8. Logo + URL: breathing float on cobalt (3s)

Transitions: cinematic-zoom at scene 3 and scene 7.
All other transitions: hard cut.
Music: subtle, professional, ducks during text reveals.

Step 3: The agent produces the HTML composition.

<div id="stage" data-composition-id="product-launch"
     data-start="0" data-width="1920" data-height="1080" data-duration="25">

  <!-- Scene 1: Logo -->
  <div class="clip" id="s1" data-start="0" data-duration="3" data-track-index="0">
    <div class="scene-content" style="background: var(--cobalt);">
      <img id="logo" src="logo.svg" />
    </div>
  </div>

  <!-- Scene 2: Tagline -->
  <div class="clip" id="s2" data-start="3" data-duration="3" data-track-index="0">
    <div class="scene-content">
      <h1 class="display" style="font-family: 'Space Grotesk';">
        Build faster. Ship smarter.
      </h1>
    </div>
  </div>

  <!-- Scene 3: Feature 1 -->
  <div class="clip" id="s3" data-start="6" data-duration="3" data-track-index="0"
       data-transition="cinematic-zoom" data-transition-duration="0.5">
    <div class="scene-content">
      <img src="collab-icon.svg" />
      <h2>Real-time collaboration</h2>
    </div>
  </div>

  <!-- ... scenes 4-7 follow same pattern ... -->

  <!-- Scene 8: Logo + URL -->
  <div class="clip" id="s8" data-start="22" data-duration="3" data-track-index="0">
    <div class="scene-content" style="background: var(--cobalt);">
      <img id="logo-end" src="logo.svg" />
      <p class="url">example.com</p>
    </div>
  </div>

  <audio data-start="0" data-duration="25" data-track-index="2"
         data-volume="0.3" src="bg-music.wav"></audio>
</div>

Step 4: Review, refine, render.

npx hyperframes preview    # watch it in the browser
npx hyperframes lint       # check for timing issues
npx hyperframes render     # produce the MP4
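A timing check of the kind the lint step implies can be sketched as an overlap test: on a single track, no clip should start before the previous one ends, and nothing should run past the composition's duration. The rules, the TimedClip type, and lintTiming are assumptions for illustration; they are not the checks hyperframes lint actually performs.

```typescript
interface TimedClip {
  id: string;
  start: number;
  duration: number;
  trackIndex: number;
}

// Flags same-track overlaps and clips that end after the composition does.
function lintTiming(clips: TimedClip[], compositionSeconds: number): string[] {
  const issues: string[] = [];
  const byTrack = new Map<number, TimedClip[]>();
  for (const clip of clips) {
    if (clip.start + clip.duration > compositionSeconds) {
      issues.push(`${clip.id} ends after ${compositionSeconds}s`);
    }
    const trackClips = byTrack.get(clip.trackIndex) ?? [];
    trackClips.push(clip);
    byTrack.set(clip.trackIndex, trackClips);
  }
  byTrack.forEach((trackClips) => {
    trackClips.sort((a, b) => a.start - b.start);
    for (let i = 1; i < trackClips.length; i++) {
      const prev = trackClips[i - 1];
      if (trackClips[i].start < prev.start + prev.duration) {
        issues.push(`${trackClips[i].id} overlaps ${prev.id} on track ${prev.trackIndex}`);
      }
    }
  });
  return issues;
}

// Scenes 1-3 and 8 from the launch teaser, plus the music bed.
const launchClips: TimedClip[] = [
  { id: "s1", start: 0, duration: 3, trackIndex: 0 },
  { id: "s2", start: 3, duration: 3, trackIndex: 0 },
  { id: "s3", start: 6, duration: 3, trackIndex: 0 },
  { id: "s8", start: 22, duration: 3, trackIndex: 0 },
  { id: "music", start: 0, duration: 25, trackIndex: 2 },
];
console.log(JSON.stringify(lintTiming(launchClips, 25))); // []
```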

The agent applied the Blue Professional template's typography (Space Grotesk display, Inter body) and colors (cobalt primary, cream background) automatically. The character stagger on the tagline, the counter animation in scene 4, and the bar chart fill on "10,000 teams" came from the frame.md animation patterns, and the cinematic-zoom transitions at scenes 3 and 7 came from its shader set. The agent didn't invent the visual language. It composed from the template's vocabulary.

When to Use frame.md

frame.md solves the cold-start problem for agent-generated video. Without it, the agent has to make every visual decision from scratch: what colors to use, what typeface, how fast to animate, when to transition. With frame.md, the brand decisions are already made. The agent's job is composition, not art direction.

This matters for three scenarios:

Scenario 1: A team with an existing design system. You maintain a DESIGN.md (Open Design format) or a Figma-based design system with extracted tokens. You want to produce branded video content without hiring a motion designer. frame.md translates your existing spec into video parameters. The output inherits your brand identity without manual configuration.

Scenario 2: Rapid video iteration. You produce multiple video variants for A/B testing social ads. Each variant needs different copy and different feature callouts but the same brand identity. With frame.md, you change the copy in each composition while the visual system stays consistent. Render ten variants in one batch.

Scenario 3: Agent-driven video at scale. A content team produces 50 product videos per month across multiple brands. Each brand has its own frame.md template. The agent selects the right template based on the product's brand, composes the video from the template's animation patterns, and renders. No human touches the visual identity. Humans review the narrative and the data accuracy.

My take: frame.md is the connective tissue between this chapter and Chapter 08 (Design Systems and Tokens). Your design system now has a path to video output that does not require a separate motion design effort. The agent reads the design spec, the frame.md template translates it, and the Hyperframes engine renders the result. This is the design-as-code principle extended to video: the same tokens, the same rules, the same review process, just a different output format.

From Reference Video to Animation: Agent-Driven Motion Extraction

Motion design often starts from reference. You see an animation on stripe.com, in an Apple keynote, or in a competitor's app, and you want to replicate it. The traditional approach is to describe the motion in words, let the agent interpret those words, and iterate through several failed attempts. The agent's mental model of "smooth ease-out with a slight overshoot" rarely matches what you actually saw.

A better approach emerged in mid-2026: drop the reference video into an agent session, let the agent analyze the motion, and produce a structured specification you can feed into Remotion, Hyperframes, or MagicPath. The agent does the hard part --- frame extraction, timing analysis, easing identification --- and you get an accurate starting point instead of a vague description.

The video-to-animation pipeline: agents extract motion parameters from reference video, producing structured specs for rendering tools

How the Agent Analyzes Video

The agent does not watch video the way you do. It breaks the video into frames, inspects the differences between consecutive frames, and builds a structured description of what moves, when, and how. Pietro Schirano demonstrated this workflow with Codex: drag a video into the prompt, tell Codex to "recreate these animations," and it analyzes the motion and generates implementation code (source: @skirano, retrieved 2026-06-03).

The process works in three steps:

  1. Frame extraction. The agent uses ffmpeg to split the video into individual frames at the source framerate. A 10-second video at 30fps produces 300 frames. The agent can also sample at lower rates for faster analysis.
  2. Motion analysis. The agent compares consecutive frames to identify what changed. It looks for position shifts (translate), size changes (scale), opacity transitions (fade), rotation, color shifts, and timing. The LLM's visual understanding turns pixel differences into structured motion descriptions.
  3. Specification generation. The agent produces a structured output: element, property, start value, end value, duration, easing function, and delay. This specification is the bridge between the reference video and your rendering tool.
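The timing half of steps 2 and 3 can be made concrete with a small sketch. It assumes the frames have already been extracted (for example with ffmpeg's fps filter) and decoded to grayscale pixel arrays. It flags frames that differ from their predecessor by more than a threshold and turns each run of changing frames into a delay and duration in milliseconds. meanAbsDiff and findMotionSegments are hypothetical helpers, and this is a crude heuristic: it finds when something moves, not what moved or with which easing, which is the part the LLM's visual understanding supplies.

```typescript
// A decoded grayscale frame: one brightness value (0-255) per pixel.
type GrayFrame = Uint8Array;

// Mean absolute per-pixel difference between two equally sized frames.
function meanAbsDiff(a: GrayFrame, b: GrayFrame): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
  return sum / a.length;
}

interface MotionSegment {
  delayMs: number;    // when motion starts, relative to the first frame
  durationMs: number; // how long consecutive frames keep changing
}

// Groups consecutive "changing" frames into timed segments.
function findMotionSegments(frameList: GrayFrame[], fps: number, threshold: number): MotionSegment[] {
  const segments: MotionSegment[] = [];
  let runStart = -1; // index of the first changed frame in the current run
  for (let i = 1; i <= frameList.length; i++) {
    const changed = i < frameList.length && meanAbsDiff(frameList[i - 1], frameList[i]) > threshold;
    if (changed && runStart < 0) runStart = i;
    if (!changed && runStart >= 0) {
      // Motion runs from frame runStart - 1 (last still frame) to frame i - 1.
      segments.push({
        delayMs: ((runStart - 1) * 1000) / fps,
        durationMs: ((i - runStart) * 1000) / fps,
      });
      runStart = -1;
    }
  }
  return segments;
}

// Seven 1-pixel frames at 10 fps: still, then moving for three frames, then still.
const clipFrames = [0, 0, 50, 100, 150, 150, 150].map((v) => Uint8Array.of(v));
console.log(JSON.stringify(findMotionSegments(clipFrames, 10, 10)));
// [{"delayMs":100,"durationMs":300}]
```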

This is not a Codex-specific capability. Any agent with access to ffmpeg and visual understanding can do it. Claude Code, Codex, Cursor, and OpenCode all support the workflow. The key requirement is that the agent can process the video frames, which means either native multimodal support or a preprocessing step that converts frames to images the agent can inspect.

AnimSpec: Video-to-Prompt Extraction

AnimSpec (animspec.com) automates this extraction. You upload a screen recording, choose an output format, and AnimSpec produces a structured prompt your coding agent can implement. It offers 16 output formats covering UI cloning, animation recreation, design token extraction, and UX audits.

The service runs on Google Gemini models: Gemini 2.5 Flash for fast analysis (1 credit), Gemini 3 Flash for balanced analysis (3 credits), and Gemini 3.1 Pro for precise analysis (20 credits). Apart from the Export format, which produces code directly, the output is a text prompt rather than code: you paste it into your agent session and the agent writes the implementation.

For the motion design workflow, the relevant formats are:

| Format | What it produces | Best for |
|---|---|---|
| Clone UI Animation | Structured motion spec with timing, easing, and CSS/JS implementation code | Recreating specific animations from reference |
| Clone UI Component | Full component specification including layout, styles, and interactions | Rebuilding an entire UI element with its animations |
| Extract Design Tokens | Color, spacing, typography, and animation values as structured tokens | Building a motion design system from reference |
| Export | Production-ready code in React, Vue, or Svelte | Direct implementation without agent interpretation |

The AnimSpec workflow is: record your screen → upload → choose format → get prompt → paste into agent. It handles the frame extraction and motion analysis that your agent would otherwise need to do manually with ffmpeg.

MagicPath: Agent-to-Design Handoff

MagicPath takes a different approach. Instead of producing a prompt for your coding agent, it provides a shared canvas where external agents (Claude Code, Codex, Cursor) can build editable designs directly. The agent analyzes the video, generates the motion specification, and creates the animation on the MagicPath canvas --- no manual paste step required (source: MagicPath Documentation, retrieved 2026-06-04).

The workflow from Schirano's demonstration: drag a video into Codex, tell it to "recreate these animations in MagicPath," and Codex analyzes the motion, generates the design files, and sends them to the MagicPath canvas. When Schirano was asked about credit costs, his response was: "If you use external agents like in this example, it costs 0 MagicPath credits." The agent does the work; MagicPath receives the output.

Installation is a single command:

npx skills add https://github.com/magicpathai/agent-skills --skill magicpath

After installation, the agent knows how to read from and write to the MagicPath canvas. It can create new designs, modify existing ones, and pull designs from MagicPath into your codebase. The skill works with any external agent: Claude Code, Codex, Cursor, and the Claude mobile app. MagicPath does not need to be open --- the canvas lives in the cloud and the agent communicates with it via API.

For the motion design workflow, the MagicPath skill enables this pattern: your agent analyzes a reference video, extracts the motion parameters, and builds the animation directly on the canvas. You then review it visually, make edits in the visual editor, and export production-ready code when it looks right. The entire loop --- from reference video to editable animation --- runs through one agent session.

Fitting Extraction into the Motion Stack

These tools solve different parts of the same problem. AnimSpec extracts motion from video and produces a prompt. MagicPath receives an agent's output and renders it on an editable canvas. Remotion and Hyperframes render production video from code. They compose into a pipeline:

| Stage | Tool | Input | Output |
|---|---|---|---|
| 1. Extract motion | Agent + ffmpeg, or AnimSpec | Reference video | Structured motion spec / prompt |
| 2. Generate animation | Agent (Claude Code, Codex, Cursor) | Motion spec | Animation code (CSS, GSAP, React) |
| 3. Render video | Remotion or Hyperframes | Animation code | MP4, WebM, GIF |
| 3a. Editable design | MagicPath | Animation code | Canvas design (visual editing) |

You can skip stages depending on your needs. If you want production video, the full pipeline runs extract → generate → render. If you want an editable prototype to share with your team, extract → generate → MagicPath. If you already know what motion you want and just need to code it, skip extraction and go straight to generate → render.

The extraction step is where agent capability matters most. A good agent extracts accurate timing, identifies the correct easing function, and distinguishes between simultaneous and sequential animations. A less capable agent produces vague descriptions like "fades in smoothly" that still require manual interpretation. This is where AnimSpec's structured analysis adds value: it standardizes the extraction, so output quality depends less on which agent you use.
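To make "distinguishes between simultaneous and sequential animations" concrete, here is a minimal sketch of a structured motion spec and a check that classifies each pair of consecutive animations. The field names (`target`, `startMs`, `durationMs`, `easing`) and the three-way classification are assumptions for illustration, not AnimSpec's actual output format.

```javascript
// Hypothetical motion-spec shape: field names are illustrative assumptions,
// not the output format of AnimSpec or any specific agent.
const spec = [
  { target: "#card-1", startMs: 0, durationMs: 400, easing: "cubic-bezier(0.16, 1, 0.3, 1)" },
  { target: "#card-2", startMs: 120, durationMs: 400, easing: "cubic-bezier(0.16, 1, 0.3, 1)" },
  { target: "#card-3", startMs: 600, durationMs: 400, easing: "ease-out" },
];

// Classify each consecutive pair: "simultaneous" if both start together,
// "overlapping" if the next starts before the previous ends, else "sequential".
function classifyPairs(animations) {
  const sorted = [...animations].sort((a, b) => a.startMs - b.startMs);
  const result = [];
  for (let i = 1; i < sorted.length; i++) {
    const prev = sorted[i - 1];
    const next = sorted[i];
    const prevEnd = prev.startMs + prev.durationMs;
    let relation;
    if (next.startMs === prev.startMs) relation = "simultaneous";
    else if (next.startMs < prevEnd) relation = "overlapping";
    else relation = "sequential";
    // staggerMs is the gap between start times, the "stagger delay" a prompt asks for
    result.push({ from: prev.target, to: next.target, relation, staggerMs: next.startMs - prev.startMs });
  }
  return result;
}

console.log(classifyPairs(spec));
```

A check like this gives you something to diff against the reference: if the agent reports a 120ms stagger but the recording shows cards entering one after another, the mismatch shows up as a wrong relation rather than a vague "feels off".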

Practical Workflow: Recreating a Competitor's Onboarding Animation

A concrete example ties the tools together. You see a competitor's onboarding flow with smooth card transitions and want to replicate the motion pattern in your own product.

Step 1: Capture the reference.

Screen-record the competitor's onboarding flow. A 15-second recording at 60fps is sufficient.

Step 2: Extract the motion.

Option A --- use an agent directly:

# In Claude Code or Codex
# Drop the video file into the session

"Analyze this onboarding animation. Extract:
1. Each card transition: start position, end position, duration, easing
2. The stagger delay between sequential elements
3. Any scale or opacity changes during transitions
4. The overall timeline: when does each animation start and end

Output as structured JSON with CSS animation equivalents."

Option B --- use AnimSpec:

# Upload to animspec.com, select "Clone UI Animation"
# Paste the generated prompt into your agent session

Step 3: Generate the animation.

For Remotion:

"Using the motion spec I just extracted, create a Remotion composition
for a 3-card onboarding carousel. Each card slides in from the right
with the same timing and easing as the reference. Use spring()
for physics-based motion. Total duration: 180 frames at 30fps."

For Hyperframes:

"Using the motion spec, create a Hyperframes scene with GSAP timelines
for the card transitions. Match the reference easing values exactly.
Use our frame.md template for consistent styling."

For MagicPath:

"Using the MagicPath skill, recreate the onboarding animation from
this reference video in my open project. Match the card transitions,
timing, and easing."
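The Remotion prompt above asks for spring() motion over 180 frames at 30fps, which is 6 seconds. To show what physics-based motion means in frame terms, here is an illustrative damped-spring integrator sampled per frame. It is not Remotion's implementation; the defaults (mass 1, stiffness 100, damping 10) are assumed to match Remotion's spring() defaults, so check the Remotion docs before relying on them.

```javascript
// Illustrative damped spring from 0 to 1, sampled at a given frame.
// Physics: mass * x'' = -stiffness * (x - 1) - damping * x'
// Not Remotion's implementation; default values are assumed to mirror spring()'s.
function springAt(frame, fps, { mass = 1, stiffness = 100, damping = 10 } = {}) {
  const substeps = 10; // several small steps per frame keep the integration stable
  const dt = 1 / fps / substeps;
  let x = 0; // position (0 = start, 1 = target)
  let v = 0; // velocity
  for (let i = 0; i < frame * substeps; i++) {
    const force = -stiffness * (x - 1) - damping * v;
    v += (force / mass) * dt; // semi-implicit Euler: update velocity first,
    x += v * dt; // then position using the new velocity
  }
  return x;
}

// 180 frames at 30fps = 6 seconds, as in the prompt above.
const curve = Array.from({ length: 180 }, (_, frame) => springAt(frame, 30));
console.log(curve[10].toFixed(3), curve[179].toFixed(3));
```

With damping well below the critical value (2 * sqrt(stiffness * mass) = 20 here), the spring overshoots its target and then settles, an oscillating finish that a single cubic-bezier curve cannot reproduce.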

Step 4: Iterate and render.

Review the output. If the timing is off, tell the agent which element to adjust: "The second card enters 200ms too early. Push it back." If the easing feels wrong: "Change the ease-out to cubic-bezier(0.16, 1, 0.3, 1)." The structured spec from step 2 gives you precise control over individual parameters.
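The corrections in Step 4 translate directly into numbers. At 30fps, "200ms too early" is 6 frames, and a cubic-bezier easing is a curve you can evaluate. The sketch below evaluates a CSS-style cubic-bezier(x1, y1, x2, y2) by bisecting for the curve parameter; it illustrates the math and is not the code any browser or animation library ships.

```javascript
// Evaluate a CSS-style cubic-bezier(x1, y1, x2, y2) timing function.
// The endpoints are fixed at (0, 0) and (1, 1), as in CSS.
function cubicBezier(x1, y1, x2, y2) {
  // One coordinate of the Bezier curve at parameter t, given its two control values.
  const coord = (t, p1, p2) =>
    3 * (1 - t) * (1 - t) * t * p1 + 3 * (1 - t) * t * t * p2 + t * t * t;

  return function ease(x) {
    if (x <= 0) return 0;
    if (x >= 1) return 1;
    // x(t) is monotonic when x1 and x2 are in [0, 1], so bisection finds t with x(t) = x.
    let lo = 0;
    let hi = 1;
    let t = x;
    for (let i = 0; i < 50; i++) {
      t = (lo + hi) / 2;
      if (coord(t, x1, x2) < x) lo = t;
      else hi = t;
    }
    return coord(t, y1, y2); // progress of the animated value at time fraction x
  };
}

// Convert a timing correction in milliseconds into whole frames at a given fps.
function msToFrames(ms, fps) {
  return Math.round((ms / 1000) * fps);
}

const cardEase = cubicBezier(0.16, 1, 0.3, 1);
console.log(cardEase(0.5).toFixed(3)); // progress halfway through the duration
console.log(msToFrames(200, 30)); // 6
```

Evaluating cubic-bezier(0.16, 1, 0.3, 1) at the halfway point shows why it reads as a strong ease-out: most of the movement is finished in the first half of the duration.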

My take: Video-to-animation extraction is the missing piece in the motion design stack. The current workflow for reference-driven motion is broken: you describe what you see in words, the agent interprets those words, and you iterate on a foundation of ambiguity. Extraction tools fix this by starting from the actual motion parameters instead of a verbal description. AnimSpec standardizes the extraction. MagicPath gives the agent a direct path to editable output. Neither replaces Remotion or Hyperframes for production rendering --- they make those tools more effective by giving them accurate input. Expect this category to grow fast. As agents get better at visual analysis, the extraction step will become a standard part of every motion design workflow.

Animation Runtimes: GSAP, CSS, Lottie, Three.js

The animation runtime is the most consequential technical difference between Remotion and Hyperframes. It affects how every library behaves during rendering. Understanding this difference is essential for choosing the right tool.

GSAP and CSS animation approaches compared for the same interactive element

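GSAP: The primary animation runtime for Hyperframes. Hyperframes pauses GSAP and seeks it to time frame / fps before each capture, so each frame shows exactly the state the timeline defines for that moment. In Remotion, GSAP's internal ticker keeps advancing on wall-clock time (performance.now()) during render. Because the timeline's position depends on how much real time has passed rather than on which frame is being captured, it races ahead of the render. The result: mostly-empty frames after the initial frames. This is not a minor issue. It makes GSAP effectively unusable in Remotion for any non-trivial animation.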

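// GSAP in Hyperframes: deterministic seeking
// (assumes gsap is already loaded on the page, e.g. via a <script> tag)
const tl = gsap.timeline({ paused: true });
tl.from("#title", { opacity: 0, y: 50, duration: 1, ease: "power2.out" }) // fade and rise in
  .to("#title", { scale: 1.05, duration: 0.5 });                          // then a subtle emphasis
// Hyperframes pauses this timeline and seeks it to frame / fps before each capture,
// which is conceptually the same as calling tl.seek(frame / fps) for every frame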
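The difference between wall-clock playback and seeking can be shown without GSAP at all. The sketch below is a hypothetical, dependency-free mini-timeline (not GSAP's or Hyperframes' API): the value at any frame is a pure function of frame / fps, so how long each capture takes has no effect on what the frame shows.

```javascript
// Hypothetical mini-timeline for illustration; this is not GSAP's implementation.
// Each tween animates one numeric property from `from` to `to` (times in seconds).
function makeTimeline(tweens) {
  return {
    valueAt(prop, t) {
      const own = tweens.filter((tw) => tw.prop === prop);
      if (own.length === 0) return undefined;
      let value = own[0].from; // value before the first tween starts
      for (const tw of own) {
        if (t < tw.start) break; // tweens assumed sorted by start time
        const progress = Math.min(1, (t - tw.start) / tw.duration);
        value = tw.from + (tw.to - tw.from) * progress; // linear, for simplicity
      }
      return value;
    },
  };
}

// Deterministic capture: frame n is rendered at time n / fps, never at wall-clock time.
function renderFrames(timeline, prop, totalFrames, fps) {
  const frames = [];
  for (let frame = 0; frame < totalFrames; frame++) {
    frames.push(timeline.valueAt(prop, frame / fps));
  }
  return frames;
}

const tl = makeTimeline([{ prop: "opacity", start: 0, duration: 1, from: 0, to: 1 }]);
console.log(renderFrames(tl, "opacity", 31, 30)); // 0 at frame 0, 0.5 at frame 15, 1 at frame 30
```

Rendering the same frames twice, in any order or at any speed, gives identical values. A ticker driven by performance.now() cannot make that guarantee.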

CSS Animations: Both tools support CSS. Hyperframes uses a Web Animations API (WAAPI) adapter for frame-accurate seeking. Remotion renders CSS animations frame-by-frame but can struggle with complex keyframe sequences that depend on computed styles.
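WAAPI is what makes frame-accurate CSS seeking possible in plain browser JavaScript: each running CSS animation or transition is exposed as an Animation object with a pause() method and a writable currentTime in milliseconds. The sketch below shows the general technique; it is an assumption about how such an adapter can work, not Hyperframes' actual adapter code. In a browser you would pass document.getAnimations(); stand-in objects are used here so the sketch runs without a DOM.

```javascript
// Pause every animation and move it to the time that corresponds to `frame`.
// In a browser, `animations` would be document.getAnimations().
function seekAnimationsToFrame(animations, frame, fps) {
  const timeMs = (frame / fps) * 1000; // WAAPI currentTime is in milliseconds
  for (const animation of animations) {
    animation.pause();
    animation.currentTime = timeMs;
  }
  return timeMs;
}

// Stand-in objects with the two members the function uses, so it runs in Node.
const fakeAnimations = [0, 1].map(() => ({
  paused: false,
  currentTime: null,
  pause() {
    this.paused = true;
  },
}));
seekAnimationsToFrame(fakeAnimations, 45, 30);
console.log(fakeAnimations.map((a) => a.currentTime)); // [1500, 1500]
```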

Lottie: Hyperframes supports Lottie via window.__hfLottie registration for deterministic seeking. Remotion requires the @remotion/lottie package, which provides a springify utility but adds a dependency.
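However the runtime is registered, deterministic Lottie playback comes down to mapping a video frame to a frame inside the Lottie file, which carries its own frame rate (the `fr` field) and length (`op` minus `ip`). A minimal sketch of that arithmetic with a hypothetical helper name; in lottie-web, the result would be passed to `animation.goToAndStop(frame, true)`.

```javascript
// Map a video frame to a Lottie frame. Lottie files store their own frame rate
// (`fr`) and length in frames (`op - ip`); this helper is illustrative only.
function lottieFrameForVideoFrame(videoFrame, videoFps, lottieFps, lottieTotalFrames, loop = true) {
  const seconds = videoFrame / videoFps; // time in the video
  const frame = seconds * lottieFps; // same moment, in Lottie frames
  if (loop) return frame % lottieTotalFrames; // wrap around for looping animations
  return Math.min(frame, lottieTotalFrames - 1); // otherwise hold the last frame
}

// A 24fps Lottie placed in a 30fps video: video frame 45 (1.5 s) maps to Lottie frame 36.
console.log(lottieFrameForVideoFrame(45, 30, 24, 120)); // 36
```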

Three.js: Hyperframes renders from hf-seek events and window.__hfThreeTime instead of wall-clock time. Remotion has @remotion/three for React Three Fiber integration, which works well if you're already in the React ecosystem. The React Three Fiber integration is one of Remotion's genuine strengths --- you get the full React component model for 3D scenes.

Anime.js: Hyperframes registers on window.__hfAnime for deterministic seeking. Remotion has no native Anime.js adapter.

Runtime | Remotion | Hyperframes
GSAP | Wall-clock issues during render | Seekable, frame-accurate
CSS Animations | Frame-by-frame render | Web Animations API adapter
Lottie | @remotion/lottie package | window.__hfLottie registration
Three.js | @remotion/three (React Three Fiber) | hf-seek events
Anime.js | No native adapter | window.__hfAnime
WAAPI | Not documented | document.getAnimations() seeking

The pattern is clear. Hyperframes supports more animation runtimes with deterministic seeking. Remotion supports fewer runtimes but integrates deeply with the React ecosystem. If your video relies on GSAP, Hyperframes is the straightforward choice. If your video is pure React with spring animations, Remotion is the natural fit.

Rendering, Preview, and the Dev Loop

The development loop differs significantly between the two tools. This affects how fast you can iterate, which matters more in agent workflows than in traditional video production.

Remotion render progress showing frame-by-frame video encoding in the terminal

Remotion Studio: A browser-based application with a sidebar, timeline, props editor, and render button. You edit code, the studio live-reloads, you scrub the timeline to check timing. Deployable to the cloud for team access. The props editor is particularly useful with Zod schemas --- you can adjust text, colors, and timing visually without touching code. This is Remotion's strongest UX advantage.

Hyperframes Preview: npx hyperframes preview opens a live preview in the browser with instant hot reload. No build step. Edit the HTML, save, see changes. Simpler than Remotion Studio but less feature-rich --- no timeline scrubbing, no props editor. The lack of a build step is the key advantage. In agent workflows, every second of iteration latency compounds. Hyperframes eliminates the webpack compilation that Remotion requires after each code change.

Remotion Render: CLI, Node.js SSR API (renderMedia()), AWS Lambda (distributed and production-tested), GitHub Actions, Google Cloud Run (alpha). The Lambda story is the clear leader in distributed rendering. For high-volume video production (thousands of personalized videos), Lambda scales horizontally. The SSR API lets you render from any Node.js process --- useful for agent pipelines that need to produce video as part of a larger workflow.
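Horizontal scaling works because frames are independent: a composition can be cut into frame ranges, each range rendered by a separate worker, and the pieces concatenated. A minimal sketch of the split. Remotion Lambda exposes a related setting (`framesPerLambda`), but this function is an illustration, not Remotion's scheduler.

```javascript
// Split frames [0, durationInFrames) into contiguous, inclusive [start, end] ranges,
// one per worker. Illustrative only; real schedulers also balance load and retries.
function splitIntoChunks(durationInFrames, framesPerChunk) {
  const chunks = [];
  for (let start = 0; start < durationInFrames; start += framesPerChunk) {
    const end = Math.min(start + framesPerChunk, durationInFrames) - 1;
    chunks.push([start, end]);
  }
  return chunks;
}

// A 300-frame composition (10 s at 30fps) split into 80-frame chunks.
console.log(splitIntoChunks(300, 80)); // [[0, 79], [80, 159], [160, 239], [240, 299]]
```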

Hyperframes Render: npx hyperframes render --output output.mp4. Single-machine today. Docker support exists for containerized workflows. The stateless architecture doesn't block future distributed rendering, but as of May 2026, it hasn't shipped. This is Hyperframes' most significant gap compared to Remotion.

$ npx remotion render ProductIntro                  # Remotion: local render
$ npx remotion lambda render <serve-url> ProductIntro  # Remotion: distributed (serve URL of a deployed site)

$ npx hyperframes render                           # Hyperframes: local render
$ npx hyperframes render --output final.mp4        # Hyperframes: custom output

Remotion supports multiple output codecs: H.264, H.265, VP8, VP9, ProRes, AV1. Output formats include MP4, WebM, audio-only, image sequences, still images, GIF, and transparent video overlays. Hyperframes supports MP4 output and HDR via a two-pass compositing pipeline. For most use cases, MP4 covers what you need.

Common rendering issues and their fixes:

Symptom | Cause | Fix
Blank frames in Remotion with GSAP | Wall-clock timing races ahead | Replace GSAP with interpolate() or spring()
Blurry text in Hyperframes | Missing device pixel ratio | Set data-scale attribute or render at 2x
Audio sync drift | Variable frame timing | Use fixed FPS; avoid dynamic duration calculations
Long render times locally | Single-threaded capture | Remotion: use Lambda. Hyperframes: use Docker parallelism.
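The "use fixed FPS" fix comes down to computing every timestamp from the frame index instead of from per-frame durations. The sketch below, a generic illustration rather than either tool's internals, shows how rounding each 30fps frame to a whole 33ms drifts by 600ms after one minute, and how to find the audio sample that lines up with a frame (48 kHz assumed).

```javascript
// Exact timestamp (ms) of frame n at a fixed frame rate, computed from the index.
function frameTimeMs(frame, fps) {
  return (frame * 1000) / fps;
}

// Anti-pattern for comparison: assume every frame lasts a rounded whole number of ms.
function roundedDurationTimeMs(frame, fps) {
  return frame * Math.round(1000 / fps);
}

// Audio sample index that lines up with a video frame at a fixed fps.
function audioSampleForFrame(frame, fps, sampleRate) {
  return Math.round((frame * sampleRate) / fps);
}

const oneMinuteAt30 = 60 * 30; // 1800 frames
const driftMs = frameTimeMs(oneMinuteAt30, 30) - roundedDurationTimeMs(oneMinuteAt30, 30);
console.log(driftMs); // 600: after one minute the rounded timestamps are 0.6 s off
console.log(audioSampleForFrame(1, 30, 48000)); // 1600 samples per frame at 48 kHz
```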

Remotion vs Hyperframes: An Honest Comparison

I've used both. Here's where each wins and loses.

Course video frame generated by Hyperframes with branded educational animation and text overlays

Dimension | Remotion | Hyperframes
Authoring format | React components (TSX) | HTML + CSS + GSAP
Build step | Required (webpack/bundler) | None; index.html plays as-is
GSAP support | Wall-clock during render (broken) | Seekable, frame-accurate
Arbitrary HTML/CSS | Must rewrite as JSX | Paste and animate
Distributed rendering | Lambda, production-ready | Single-machine today
HDR output | Not documented | Supported (two-pass)
Visual editor | Harder (code + build step) | Native (same DOM is editable)
License | Source-available, custom Remotion License | Apache 2.0 (OSI-approved)
Commercial pricing | Free for teams of up to 3, paid above | Free at any scale
GitHub stars | 47.1k | 19k
Maturity | 622+ releases, v4.x | 137 releases, v0.6.x
Agent integration | CLAUDE.md, AGENTS.md in repo | 13 skills, plugins for 4 agents
React ecosystem | Full reuse | No React dependency

The licensing difference deserves attention. Remotion is source-available under a custom license. Free for teams of 3 or fewer. Above that, you need a Company License: $25/month per seat (Creator tier), $100/month minimum with per-render pricing (Automator tier), or $500+/month (Enterprise). If you're rendering thousands of personalized videos via Lambda, the Automator pricing adds up.

Hyperframes is Apache 2.0. Free at any scale. No per-seat pricing. No per-render fees. For companies producing video at volume, this is a significant cost difference.

My take: If you're already in the React ecosystem and need distributed rendering at scale, Remotion is the right choice today. The Lambda story is mature and proven. If you're building agent-first video workflows, care about GSAP accuracy, or want to avoid license fees at scale, Hyperframes is the better bet. The licensing model alone tips the decision for many teams. I started with Remotion because of the ecosystem maturity. I switched to Hyperframes for agent workflows --- the build-step removal alone saves significant iteration time.

The practical decision framework:

  • Choose Remotion if: your team writes React daily, you need Lambda-scale distributed rendering, you want the mature ecosystem with 800+ pages of docs, or you're building a video editor product (Editor Starter).
  • Choose Hyperframes if: your primary author is an AI agent, you use GSAP for animations, you need HDR output, you want to paste arbitrary HTML and animate it, or license fees at scale are a concern.
  • Choose based on animation runtime: if your video relies heavily on GSAP, Lottie, or Three.js animations, Hyperframes' seek-based rendering produces more reliable output than Remotion's wall-clock approach.

Both tools support design token integration from your design system (Chapter 08). Whether you reference var(--color-primary) in a TSX component or in the CSS of a Hyperframes HTML element, the token resolves the same way. This means your videos share the same visual language as your web UI, your prototypes, and your slide decks.
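Since both tools render in a browser, one simple bridge is to emit tokens as CSS custom properties on :root and reference them with var(). A minimal sketch, assuming a flat token map with hypothetical names and values; the token pipelines from Chapter 08 may use a richer structure.

```javascript
// Turn a flat token map into a :root block of CSS custom properties.
// Token names and values here are hypothetical examples.
function tokensToCss(tokens) {
  const lines = Object.entries(tokens).map(([name, value]) => `  --${name}: ${value};`);
  return `:root {\n${lines.join("\n")}\n}`;
}

const tokens = {
  "color-primary": "#4f46e5",
  "font-heading": "'Inter', sans-serif",
  "radius-card": "12px",
};
console.log(tokensToCss(tokens));
```

The resulting stylesheet can be linked from a Hyperframes HTML page or imported into a Remotion project, and components in either reference var(--color-primary) without knowing where the value came from.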

The video export pipeline from Huashu Design (Chapter 07) sits adjacent to both tools. Huashu produces HTML animations and exports them to MP4/GIF using its own render pipeline (25fps base + 60fps interpolation). For design-focused motion --- product intros, UI walkthroughs, animated infographics --- Huashu's pipeline is simpler and faster to set up. For complex video compositions with multiple clips, audio tracks, and transitions, Remotion or Hyperframes are the right tools.
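Going from a 25fps base to 60fps output means most output frames fall between two source frames. The sketch below computes, for each 60fps frame, which source frames surround it and the blend weight, assuming simple linear blending; how Huashu actually interpolates is not specified here and may differ (for example, motion-compensated interpolation).

```javascript
// For an output frame at `outFps`, find the two surrounding source frames at `inFps`
// and the linear blend weight between them. Illustrative timing arithmetic only.
function interpolationPlan(outFrame, inFps, outFps) {
  const sourcePosition = (outFrame * inFps) / outFps; // fractional source frame
  const before = Math.floor(sourcePosition);
  const weight = sourcePosition - before; // 0 = use `before`, 1 = use `before + 1`
  return { before, after: before + 1, weight };
}

console.log(interpolationPlan(6, 25, 60)); // { before: 2, after: 3, weight: 0.5 }
console.log(interpolationPlan(12, 25, 60)); // { before: 5, after: 6, weight: 0 }
```

Every 12 output frames line up exactly with 5 source frames (both span 0.2 seconds), so the weights repeat in a 12-frame cycle.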

html-video: a CapCut-style editing layer on top of Hyperframes

Hyperframes is the engine an agent writes against: an HTML-native runtime that turns markup into rendered video frames. html-video is the dashboard built on top of it. It is an open-source (Apache 2.0) editing layer that sits on the Hyperframes runtime, and the pitch is blunt: a CapCut for agents that write HTML. You hand it a website link, a file, or an article, and the agent generates an MP4 from one of 20-plus style templates aimed at product promos and explainers.

What earns it a place in a video workflow is the editing model. Authoring against Hyperframes alone is render-then-look: you write markup, render the whole timeline, then watch it back to find what is wrong. html-video adds paginated preview and frame-level text editing, so you change a caption on page three and see it without re-rendering the whole timeline. That tightens the critique-revision step that pure CLI rendering leaves slow.

Two integrations matter. It auto-detects six local agent CLIs (Claude Code, Codex, Cursor, and Hermes among them) and lets you switch between them from the top bar with no extra API keys. And it wires in MiniMax for narration and background music generated from the video's own content, so audio stops being a separate manual step.

This is a layering pattern: the framework gives the agent a deterministic, diffable target, and html-video adds the human-facing editing surface and template library on top. You do not have to choose. The agent can author against Hyperframes directly, or you can sit in html-video to scrub, edit text inline, and pick a template — the artifact underneath stays HTML either way.

Figure: html-video as an editing layer over the Hyperframes runtime.

Next: All of these design tools connect to agents through MCP servers. Chapter 10 covers the Model Context Protocol in depth --- how to configure MCP servers for Figma, Paper, Pencil, OpenPencil, and other tools, and how to chain them together for multi-tool workflows.

©2026 Mehran Mozaffari. All rights reserved.