Video to JSON API
Video is the most unstructured data format there is. VidContext converts it into clean, predictable JSON — metadata, transcripts, scene descriptions, on-screen text, brand detection, and audio analysis. One API call, one structured response. Ready for your database, dashboard, AI model, or automation pipeline.
Why convert video to JSON?
Databases cannot store a video and make it queryable. Dashboards cannot chart a video. AI models cannot reason about raw video frames efficiently. Automation workflows cannot branch on what happens in a video. Everything in your stack works with structured data — and video is not structured.
VidContext closes this gap. It watches the entire video, extracts every meaningful signal — visual, audio, textual — and packages it into a JSON document with a consistent, predictable schema. The output slots into any system that reads JSON, which is effectively every system.
What the JSON includes
Complete video structure
Metadata, transcript, visual scenes, on-screen text, brands, and audio analysis — all in a single JSON document. No assembly required.
Consistent schema
Same field names, same data types, same nesting structure for every video. Write your parser once and it works forever.
8 analysis modes
Choose Context, Editor, Ad Analysis, Creator Analysis, E-commerce, Training, UGC Vetting, or Competitor Intelligence. Each adds mode-specific scoring to the base output.
Database-ready output
The JSON maps cleanly to database tables, document stores, and data warehouses. Insert directly without transformation.
Automation-friendly
Pipe the JSON into n8n, Make, Zapier, or any workflow tool. The structured format means each field is individually addressable in automation steps.
AI-agent compatible
Feed the JSON directly into LLM context windows. The structured format gives AI models precise, parseable information to reason about.
One call, full structure
curl -X POST https://api.vidcontext.com/v1/analyze \
  -H "X-API-Key: vc_your_key_here" \
  -F "file=@product-demo.mp4" \
  -F "output_format=context"
{
"metadata": {
"duration_seconds": 187,
"resolution": "1920x1080",
"format": "mp4"
},
"transcript": [
{
"timestamp": "0:00-0:08",
"speaker": "Narrator",
"text": "Meet the all-new TaskFlow — project management
that actually keeps up with your team."
},
{
"timestamp": "0:08-0:22",
"speaker": "Narrator",
"text": "Drag, drop, done. Every task, every deadline,
every dependency — visible in one view."
}
],
"visual_scenes": [
{
"timestamp": "0:00-0:08",
"description": "Animated logo reveal on dark background.
TaskFlow logo assembles from geometric shapes.",
"on_screen_text": ["TaskFlow", "Work, simplified."],
"audio": "Upbeat electronic intro, synth pad"
},
{
"timestamp": "0:08-0:22",
"description": "Screen recording of the TaskFlow dashboard.
Cursor drags a task card from 'To Do' to 'In Progress'.
Gantt chart updates in real time on the right panel.",
"on_screen_text": [
"Sprint 14 — Design System Update",
"Due: March 28",
"3 subtasks remaining"
],
"audio": "Narrator continues over soft background music"
}
],
"brands_detected": ["TaskFlow"],
"audio_analysis": {
"music": "Electronic, upbeat, modern",
"has_speech": true,
"speaker_count": 1
}
}
Response shortened for readability. Actual output includes all scenes, full transcript, and complete brand and audio analysis.
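Every field in the example above is directly addressable in code. A minimal Python sketch that pulls the transcript and on-screen text out of a trimmed copy of the sample response:

```python
# A trimmed copy of the sample response shown above.
response = {
    "metadata": {"duration_seconds": 187, "resolution": "1920x1080", "format": "mp4"},
    "transcript": [
        {"timestamp": "0:00-0:08", "speaker": "Narrator",
         "text": "Meet the all-new TaskFlow — project management that actually keeps up with your team."},
    ],
    "visual_scenes": [
        {"timestamp": "0:00-0:08",
         "description": "Animated logo reveal on dark background.",
         "on_screen_text": ["TaskFlow", "Work, simplified."],
         "audio": "Upbeat electronic intro, synth pad"},
    ],
    "brands_detected": ["TaskFlow"],
}

# Join all speech into one searchable string.
full_transcript = " ".join(seg["text"] for seg in response["transcript"])

# Collect every piece of on-screen text across scenes.
all_ocr = [t for scene in response["visual_scenes"] for t in scene["on_screen_text"]]

print(response["metadata"]["duration_seconds"])  # 187
print(all_ocr)  # ['TaskFlow', 'Work, simplified.']
```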
Where the JSON goes
Databases and data warehouses
Insert video analysis directly into PostgreSQL, MongoDB, BigQuery, or Supabase. Query across thousands of analyzed videos using standard SQL or document queries.
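As a sketch of that mapping (the `scene_rows` helper and its column names are ours, not a VidContext-defined schema), each entry in `visual_scenes` can become one table row:

```python
def scene_rows(video_id: str, analysis: dict) -> list[dict]:
    """Flatten visual_scenes into per-scene rows ready for a SQL insert.

    Column names here are illustrative; the input field names come from
    the sample API response.
    """
    return [
        {
            "video_id": video_id,
            "timestamp": scene["timestamp"],
            "description": scene["description"],
            "on_screen_text": " | ".join(scene["on_screen_text"]),
        }
        for scene in analysis["visual_scenes"]
    ]

rows = scene_rows("vid_001", {
    "visual_scenes": [
        {"timestamp": "0:00-0:08",
         "description": "Animated logo reveal on dark background.",
         "on_screen_text": ["TaskFlow", "Work, simplified."]},
    ],
})
print(rows[0]["on_screen_text"])  # TaskFlow | Work, simplified.
```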
Dashboards and reporting
Feed the structured data into dashboards that track video content metrics — brand mentions, sentiment, topic distribution, and content patterns over time.
Automation workflows
Use the JSON fields in n8n, Make, or Zapier workflows. Route videos based on detected brands, trigger alerts when specific text appears, or auto-categorize content by scene analysis.
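A minimal Python sketch of that branching, assuming the field names from the sample response above; the branch names and routing rules are hypothetical:

```python
def route(analysis: dict) -> str:
    """Pick a workflow branch from the structured output.

    Uses brands_detected and visual_scenes.on_screen_text from the
    sample response; the branch names are illustrative.
    """
    if "TaskFlow" in analysis.get("brands_detected", []):
        return "brand-mention-alert"
    ocr = [t for s in analysis.get("visual_scenes", []) for t in s["on_screen_text"]]
    if any("sale" in t.lower() for t in ocr):
        return "promo-review"
    return "default-archive"

print(route({"brands_detected": ["TaskFlow"]}))  # brand-mention-alert
print(route({}))  # default-archive
```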
AI model pipelines
Inject structured video data into LLM context windows, fine-tuning datasets, or RAG pipelines. The JSON format gives models precise, parseable information rather than unstructured descriptions.
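One way to render the JSON into a compact prompt block, sketched in Python (the `to_context` helper and its line format are our assumptions, not an official formatter):

```python
def to_context(analysis: dict) -> str:
    """Render transcript and scene data as a line-oriented block for an LLM prompt."""
    lines = []
    for seg in analysis.get("transcript", []):
        lines.append(f'[{seg["timestamp"]}] {seg["speaker"]}: {seg["text"]}')
    for scene in analysis.get("visual_scenes", []):
        lines.append(f'[{scene["timestamp"]}] SCENE: {scene["description"]}')
    return "\n".join(lines)

ctx = to_context({
    "transcript": [{"timestamp": "0:00-0:08", "speaker": "Narrator",
                    "text": "Meet the all-new TaskFlow."}],
    "visual_scenes": [{"timestamp": "0:00-0:08",
                       "description": "Animated logo reveal on dark background."}],
})
print(ctx)
```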
Frequently asked questions
What does the JSON structure look like?
The response includes top-level fields for metadata (duration, resolution, format), transcript (timestamped speech), visual_scenes (an array of scene objects with descriptions, on-screen text, and audio), brands_detected, audio_analysis, and mode-specific scoring when using analysis modes. Every field uses consistent naming and types across all videos.
Is the schema consistent across different videos?
Yes. Every response follows the same JSON schema regardless of the input video. The structure, field names, and data types are predictable. This means your parsing code works for a 30-second ad the same way it works for a 10-minute tutorial. No special handling needed per video type.
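Because the schema is stable, types can be written once. A sketch using Python's `TypedDict`, with type names of our choosing and field names taken from the sample response:

```python
from typing import TypedDict

class Metadata(TypedDict):
    duration_seconds: int
    resolution: str
    format: str

class TranscriptSegment(TypedDict):
    timestamp: str
    speaker: str
    text: str

class Scene(TypedDict):
    timestamp: str
    description: str
    on_screen_text: list[str]
    audio: str

# TypedDicts are plain dicts at runtime, so they accept parsed JSON directly.
meta = Metadata(duration_seconds=187, resolution="1920x1080", format="mp4")
print(meta["duration_seconds"])  # 187
```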
Can I customize the output format?
VidContext offers 8 output modes — Context, Editor, Ad Analysis, Creator Analysis, E-commerce, Training, UGC Vetting, and Competitor Intelligence. Each mode returns the same base structure (scenes, transcript, OCR, brands) plus mode-specific scoring frameworks. Choose the mode that matches your use case via the mode parameter.
What about large or long videos?
VidContext handles videos of any length. Processing time scales roughly linearly with duration: a 3-minute video takes about 50 seconds, so a 10-minute video takes roughly three times as long. The JSON output grows with video length but keeps the same schema structure. There is no maximum duration limit.
Can I stream the response instead of waiting for the full JSON?
Currently, VidContext returns the complete JSON response when processing is finished. For long videos, you can poll the status endpoint. Streaming support for progressive JSON delivery is on the roadmap.
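Since the status endpoint's request and response shape are not documented on this page, here is a generic polling sketch in Python; the `check` callable and the `status`/`done` fields stand in for the real status call and should be adapted to the actual API:

```python
import time

def poll(check, interval_s: float = 2.0, max_attempts: int = 30) -> dict:
    """Call `check` until it reports a finished analysis, then return its result.

    `check` stands in for a GET on the status endpoint; the "status"/"done"
    values here are assumptions, not documented API fields.
    """
    for _ in range(max_attempts):
        result = check()
        if result.get("status") == "done":
            return result
        time.sleep(interval_s)
    raise TimeoutError("analysis did not finish in time")

# Simulated status checks: pending twice, then done.
states = iter([{"status": "pending"}, {"status": "pending"}, {"status": "done"}])
final = poll(lambda: next(states), interval_s=0.0)
print(final["status"])  # done
```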
Ready to start?
5 free analyses without an account. 20 credits on signup. No credit card required.
Try VidContext free