Video Analysis API

Extract transcripts, visual scene breakdowns, on-screen text, brand detection, and audio analysis from any video with a single API call. Most competitors require 5-6 separate requests. VidContext returns everything in one structured JSON response, in under 60 seconds.

What is a video analysis API?

A video analysis API takes a video as input and returns structured data about everything inside it. Speech is transcribed. Visual scenes are described with timestamps. Text that appears on screen is extracted. Brands and logos are identified. Audio cues — music, sound effects, ambient noise — are cataloged.

The alternative is building this yourself: stitching together ffmpeg for frame extraction, Whisper for transcription, an OCR model for on-screen text, a vision model for scene descriptions, and custom logic to combine it all. A video analysis API replaces that entire pipeline with one HTTP request.
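As a sketch of what that single request can look like, the snippet below builds the POST call in Python. The endpoint, header, and field names are taken from the curl example later on this page; the key is a placeholder, and URL-encoded fields stand in for the multipart form that curl's `-F` flag sends:

```python
import urllib.parse
import urllib.request

API_URL = "https://api.vidcontext.com/v1/analyze"

def build_analyze_request(video_url: str, api_key: str) -> urllib.request.Request:
    """One HTTP request in place of an ffmpeg + Whisper + OCR + vision pipeline."""
    body = urllib.parse.urlencode({
        "source": video_url,          # direct URL to the video file
        "output_format": "context",   # ask for the single structured JSON response
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key},
        method="POST",
    )

req = build_analyze_request("https://example.com/product-demo.mp4", "vc_your_key")
# urllib.request.urlopen(req) would send it; omitted so the sketch stays offline.
```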

What you get from every request

Timestamped transcript

Full speech-to-text with speaker context and timing for every segment of dialogue.

Visual scene breakdown

Frame-by-frame descriptions of what appears on screen — people, objects, settings, actions, transitions.

On-screen text extraction

Every piece of text visible in the video: titles, captions, URLs, phone numbers, watermarks.

Brand and logo detection

Identifies brands, products, and logos that appear visually or are mentioned in speech.

Audio analysis

Music, sound effects, ambient noise, and tone classification beyond just speech content.

Scored analysis modes

8 specialized modes with expert scoring frameworks for ads, e-commerce, creators, and more.

One request, complete analysis

Send a video URL or file. Get structured JSON back in about 50 seconds for a 3-minute video.

curl -X POST https://api.vidcontext.com/v1/analyze \
  -H "X-API-Key: vc_your_key" \
  -F "source=https://example.com/product-demo.mp4" \
  -F "output_format=context"

Example response:

{
  "metadata": {
    "duration": "3:12",
    "resolution": "1920x1080",
    "format": "mp4"
  },
  "transcript": [
    { "time": "0:00-0:15", "text": "Welcome to our product walkthrough..." }
  ],
  "visual_scenes": [
    { "time": "0:00-0:08", "description": "Speaker at desk, laptop open..." }
  ],
  "on_screen_text": ["Try free at vidcontext.com", "API Documentation"],
  "brands_detected": ["VidContext", "Chrome"],
  "audio": { "music": "upbeat corporate", "effects": ["keyboard typing"] }
}
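Because the response is plain JSON, every field maps straight onto native data structures. A short sketch working with the sample response above:

```python
import json

# The sample response from above, verbatim.
sample = """
{
  "metadata": {"duration": "3:12", "resolution": "1920x1080", "format": "mp4"},
  "transcript": [
    {"time": "0:00-0:15", "text": "Welcome to our product walkthrough..."}
  ],
  "visual_scenes": [
    {"time": "0:00-0:08", "description": "Speaker at desk, laptop open..."}
  ],
  "on_screen_text": ["Try free at vidcontext.com", "API Documentation"],
  "brands_detected": ["VidContext", "Chrome"],
  "audio": {"music": "upbeat corporate", "effects": ["keyboard typing"]}
}
"""

result = json.loads(sample)
# Each field is a plain Python structure, ready for downstream logic.
spoken = " ".join(seg["text"] for seg in result["transcript"])
scene_times = [scene["time"] for scene in result["visual_scenes"]]
print(result["brands_detected"])  # ['VidContext', 'Chrome']
```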

Common use cases

  • AI agent workflows — Give your AI agent the ability to watch and understand video. Install the MCP server with pip install vidcontext-mcp and your agent can analyze videos autonomously.
  • Marketing analytics — Analyze competitor ads, track brand mentions across video content, and score ad effectiveness with built-in frameworks.
  • Content moderation — Screen user-generated video for brand safety, compliance issues, or content policy violations at scale.
  • E-commerce optimization — Analyze product videos for conversion factors, messaging clarity, and comparison against top-performing listings.

Frequently asked questions

What is a video analysis API?

A video analysis API is a service that accepts a video file or URL and returns structured data about its contents. This includes transcripts, visual scene descriptions, on-screen text, detected brands, audio analysis, and more. Instead of building your own video processing pipeline, you send one request and get back machine-readable results.

How fast does VidContext process videos?

A typical 3-minute video is processed in about 50 seconds. VidContext uses Gemini 3.1 Pro at 2 frames per second with high resolution, which balances thoroughness with speed. Longer videos scale linearly — a 10-minute video takes around 3 minutes.
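The linear scaling above works out to roughly 50 seconds of processing per 3 minutes of footage. A back-of-the-envelope estimator, illustrating the stated numbers rather than any official guarantee:

```python
SECONDS_PER_VIDEO_MINUTE = 50 / 3  # ~50 s to process a 3-minute video

def estimate_processing_seconds(video_minutes: float) -> float:
    """Rough processing-time estimate, assuming the linear scaling described above."""
    return video_minutes * SECONDS_PER_VIDEO_MINUTE

print(round(estimate_processing_seconds(10) / 60, 1))  # 10-minute video: ~2.8 minutes
```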

What video formats are supported?

VidContext supports all common video formats including MP4, MOV, AVI, WebM, and MKV. You can send a direct URL to a video file or upload the file directly. Videos are deleted immediately after processing.

How does pricing work?

1 credit = 1 minute of video, rounded up. All features are included in every request — no per-feature billing. You get 5 free analyses without an account and 20 credits when you sign up. Credit packs start at 50 credits for $10. For unlimited processing, Pro (BYOK) is $29/mo with your own Gemini key.
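The rounding rule is simple enough to state in code. A sketch of the billing math as described (illustrative only):

```python
import math

def credits_needed(duration_seconds: int) -> int:
    """1 credit per minute of video, rounded up to the next whole minute."""
    return math.ceil(duration_seconds / 60)

print(credits_needed(192))  # the 3:12 demo video above costs 4 credits
```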

Is my video stored after processing?

No. VidContext deletes your video file immediately after processing is complete. No frames, thumbnails, or copies are retained. You receive the structured analysis output, and the original video is gone from our servers.

See how we compare · Full API documentation

Ready to start?

5 free analyses without an account. 20 credits on signup. No credit card required.

Try free