Video to JSON API
Video is the most unstructured data format there is. VidContext converts it into clean, predictable JSON — metadata, transcripts, scene descriptions, on-screen text, brand detection, and audio analysis. One API call, one structured response. Ready for your database, dashboard, AI model, or automation pipeline.
Why convert video to JSON?
Databases cannot store a video and make it queryable. Dashboards cannot chart a video. AI models cannot reason about raw video frames efficiently. Automation workflows cannot branch on what happens in a video. Everything in your stack works with structured data — and video is not structured.
VidContext closes this gap. It watches the entire video, extracts every meaningful signal — visual, audio, textual — and packages it into a JSON document with a consistent, predictable schema. The output slots into any system that reads JSON, which is effectively every system.
What the JSON includes
Complete video structure
Metadata, transcript, visual scenes, on-screen text, brands, and audio analysis — all in a single JSON document. No assembly required.
Consistent schema
Same field names, same data types, same nesting structure for every video. Write your parser once and it works forever.
8 analysis modes
Choose Context, Editor, Ad Analysis, Creator Analysis, E-commerce, Training, UGC Vetting, or Competitor Intelligence. Each adds mode-specific scoring to the base output.
Database-ready output
The JSON maps cleanly to database tables, document stores, and data warehouses. Insert directly without transformation.
Automation-friendly
Pipe the JSON into n8n, Make, Zapier, or any workflow tool. The structured format means each field is individually addressable in automation steps.
AI-agent compatible
Feed the JSON directly into LLM context windows. The structured format gives AI models precise, parseable information to reason about.
One call, full structure
curl -X POST https://api.vidcontext.com/v1/analyze \
  -H "X-API-Key: vc_your_key_here" \
  -F "file=@product-demo.mp4" \
  -F "output_format=context"
{
"metadata": {
"duration_seconds": 187,
"resolution": "1920x1080",
"format": "mp4"
},
"transcript": [
{
"timestamp": "0:00-0:08",
"speaker": "Narrator",
"text": "Meet the all-new TaskFlow — project management
that actually keeps up with your team."
},
{
"timestamp": "0:08-0:22",
"speaker": "Narrator",
"text": "Drag, drop, done. Every task, every deadline,
every dependency — visible in one view."
}
],
"visual_scenes": [
{
"timestamp": "0:00-0:08",
"description": "Animated logo reveal on dark background.
TaskFlow logo assembles from geometric shapes.",
"on_screen_text": ["TaskFlow", "Work, simplified."],
"audio": "Upbeat electronic intro, synth pad"
},
{
"timestamp": "0:08-0:22",
"description": "Screen recording of the TaskFlow dashboard.
Cursor drags a task card from 'To Do' to 'In Progress'.
Gantt chart updates in real time on the right panel.",
"on_screen_text": [
"Sprint 14 — Design System Update",
"Due: March 28",
"3 subtasks remaining"
],
"audio": "Narrator continues over soft background music"
}
],
"brands_detected": ["TaskFlow"],
"audio_analysis": {
"music": "Electronic, upbeat, modern",
"has_speech": true,
"speaker_count": 1
}
}
Response shortened for readability. Actual output includes all scenes, full transcript, and complete brand and audio analysis.
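Every field in the example above is directly addressable in code. A minimal Python sketch that pulls the transcript and on-screen text out of a trimmed copy of the sample response:

```python
# A trimmed copy of the sample response shown above.
response = {
    "metadata": {"duration_seconds": 187, "resolution": "1920x1080", "format": "mp4"},
    "transcript": [
        {"timestamp": "0:00-0:08", "speaker": "Narrator",
         "text": "Meet the all-new TaskFlow — project management that actually keeps up with your team."},
    ],
    "visual_scenes": [
        {"timestamp": "0:00-0:08",
         "description": "Animated logo reveal on dark background.",
         "on_screen_text": ["TaskFlow", "Work, simplified."],
         "audio": "Upbeat electronic intro, synth pad"},
    ],
    "brands_detected": ["TaskFlow"],
}

# Join all speech into one searchable string.
full_transcript = " ".join(seg["text"] for seg in response["transcript"])

# Collect every piece of on-screen text across scenes.
all_ocr = [t for scene in response["visual_scenes"] for t in scene["on_screen_text"]]

print(response["metadata"]["duration_seconds"])  # 187
print(all_ocr)  # ['TaskFlow', 'Work, simplified.']
```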
Where the JSON goes
Databases and data warehouses
Insert video analysis directly into PostgreSQL, MongoDB, BigQuery, or Supabase. Query across thousands of analyzed videos using standard SQL or document queries.
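As a sketch of that mapping (the `scene_rows` helper and its column names are ours, not a VidContext-defined schema), each entry in `visual_scenes` can become one table row:

```python
def scene_rows(video_id: str, analysis: dict) -> list[dict]:
    """Flatten visual_scenes into per-scene rows ready for a SQL insert.

    Column names here are illustrative; the input field names come from
    the sample API response.
    """
    return [
        {
            "video_id": video_id,
            "timestamp": scene["timestamp"],
            "description": scene["description"],
            "on_screen_text": " | ".join(scene["on_screen_text"]),
        }
        for scene in analysis["visual_scenes"]
    ]

rows = scene_rows("vid_001", {
    "visual_scenes": [
        {"timestamp": "0:00-0:08",
         "description": "Animated logo reveal on dark background.",
         "on_screen_text": ["TaskFlow", "Work, simplified."]},
    ],
})
print(rows[0]["on_screen_text"])  # TaskFlow | Work, simplified.
```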
Dashboards and reporting
Feed the structured data into dashboards that track video content metrics — brand mentions, sentiment, topic distribution, and content patterns over time.
Automation workflows
Use the JSON fields in n8n, Make, or Zapier workflows. Route videos based on detected brands, trigger alerts when specific text appears, or auto-categorize content by scene analysis.
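A minimal Python sketch of that branching, assuming the field names from the sample response above; the branch names and routing rules are hypothetical:

```python
def route(analysis: dict) -> str:
    """Pick a workflow branch from the structured output.

    Uses brands_detected and visual_scenes.on_screen_text from the
    sample response; the branch names are illustrative.
    """
    if "TaskFlow" in analysis.get("brands_detected", []):
        return "brand-mention-alert"
    ocr = [t for s in analysis.get("visual_scenes", []) for t in s["on_screen_text"]]
    if any("sale" in t.lower() for t in ocr):
        return "promo-review"
    return "default-archive"

print(route({"brands_detected": ["TaskFlow"]}))  # brand-mention-alert
print(route({}))  # default-archive
```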
AI model pipelines
Inject structured video data into LLM context windows, fine-tuning datasets, or RAG pipelines. The JSON format gives models precise, parseable information rather than unstructured descriptions.
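One way to render the JSON into a compact prompt block, sketched in Python (the `to_context` helper and its line format are our assumptions, not an official formatter):

```python
def to_context(analysis: dict) -> str:
    """Render transcript and scene data as a line-oriented block for an LLM prompt."""
    lines = []
    for seg in analysis.get("transcript", []):
        lines.append(f'[{seg["timestamp"]}] {seg["speaker"]}: {seg["text"]}')
    for scene in analysis.get("visual_scenes", []):
        lines.append(f'[{scene["timestamp"]}] SCENE: {scene["description"]}')
    return "\n".join(lines)

ctx = to_context({
    "transcript": [{"timestamp": "0:00-0:08", "speaker": "Narrator",
                    "text": "Meet the all-new TaskFlow."}],
    "visual_scenes": [{"timestamp": "0:00-0:08",
                       "description": "Animated logo reveal on dark background."}],
})
print(ctx)
```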
Frequently asked questions
What does the JSON structure look like?
The response includes top-level fields for metadata (duration, resolution, format), transcript (timestamped speech), visual_scenes (an array of scene objects with descriptions, on-screen text, and audio), brands_detected, audio_analysis, and mode-specific scoring when using analysis modes. Every field uses consistent naming and types across all videos.
Is the schema consistent across different videos?
Yes. Every response follows the same JSON schema regardless of the input video. The structure, field names, and data types are predictable. This means your parsing code works for a 30-second ad the same way it works for a 10-minute tutorial. No special handling needed per video type.
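Because the schema is stable, types can be written once. A sketch using Python's `TypedDict`, with type names of our choosing and field names taken from the sample response:

```python
from typing import TypedDict

class Metadata(TypedDict):
    duration_seconds: int
    resolution: str
    format: str

class TranscriptSegment(TypedDict):
    timestamp: str
    speaker: str
    text: str

class Scene(TypedDict):
    timestamp: str
    description: str
    on_screen_text: list[str]
    audio: str

# TypedDicts are plain dicts at runtime, so they accept parsed JSON directly.
meta = Metadata(duration_seconds=187, resolution="1920x1080", format="mp4")
print(meta["duration_seconds"])  # 187
```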
Can I customize the output format?
VidContext offers 8 output modes — Context, Editor, Ad Analysis, Creator Analysis, E-commerce, Training, UGC Vetting, and Competitor Intelligence. Each mode returns the same base structure (scenes, transcript, OCR, brands) plus mode-specific scoring frameworks. Choose the mode that matches your use case via the mode parameter.
What about large or long videos?
VidContext handles videos of any length. Processing time scales roughly linearly with duration: a 3-minute video takes about 50 seconds, so a 10-minute video takes roughly three times as long. The JSON output grows with video length but keeps the same schema structure. There is no maximum duration limit.
Can I stream the response instead of waiting for the full JSON?
Currently, VidContext returns the complete JSON response when processing is finished. For long videos, you can poll the status endpoint. Streaming support for progressive JSON delivery is on the roadmap.
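Since the status endpoint's request and response shape are not documented on this page, here is a generic polling sketch in Python; the `check` callable and the `status`/`done` fields stand in for the real status call and should be adapted to the actual API:

```python
import time

def poll(check, interval_s: float = 2.0, max_attempts: int = 30) -> dict:
    """Call `check` until it reports a finished analysis, then return its result.

    `check` stands in for a GET on the status endpoint; the "status"/"done"
    values here are assumptions, not documented API fields.
    """
    for _ in range(max_attempts):
        result = check()
        if result.get("status") == "done":
            return result
        time.sleep(interval_s)
    raise TimeoutError("analysis did not finish in time")

# Simulated status checks: pending twice, then done.
states = iter([{"status": "pending"}, {"status": "pending"}, {"status": "done"}])
final = poll(lambda: next(states), interval_s=0.0)
print(final["status"])  # done
```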
Ready to start?
5 free analyses without an account. 20 credits on signup. No credit card required.
Try VidContext free