Status: Brainstorm | Phase: Phase 5 (AI Media) | Tier: Studio

Overview

AI Music Videos is the crown jewel of the AI media strategy. It combines the output of two separate pipelines — Session Music (audio) and Video Reports (visual) — into a single synchronized audiovisual experience. The result is a 60-90 second music video that visualizes your cannabis session with a custom soundtrack generated from the terpene profiles of the strains you purchased.

Think about what that means in practice. You buy Blue Dream and OG Kush from a dispensary. You upload the receipt. High IQ generates a research report, and then it generates a music video. The video opens with abstract blue and green visuals — Myrcene’s warm amber blending with Pinene’s crisp forest tones. The soundtrack is a lo-fi hip-hop beat at 78 BPM because your dominant High Family is Relaxed Highs. The music and visuals evolve together over 90 seconds, building and settling in patterns that mirror the expected onset and peak of the strain combination. At the end, your name and a branded tag appear. You share it to your Instagram Story.

“My cannabis app just made me a music video” is a sentence that makes people stop scrolling. It is absurd in the best possible way. Nobody expects a cannabis tracker to produce audiovisual content. That unexpectedness is the entire point — it reframes the app from a utility to an experience, and the experience is genuinely impressive because the content is genuinely personalized. This is not a generic “chill vibes” video with a stock beat. Every frame and every note is derived from your specific data.

The tradeoff is that music videos are the most expensive AI media format to generate, the slowest to produce, and the most complex to synchronize. This is why it sits at the Studio tier and why it should be built last, after the Session Music and Video Reports pipelines are stable and well-understood. The music video is a composition of those two outputs — it does not need its own generation infrastructure, just a synchronization and rendering layer.

What It Does

Two Contexts

A music video for a single strain. The soundtrack is generated from the strain’s terpene profile (per the Session Music mapping). The visuals are generated from the strain’s data — abstract representations of its terpene colors, High Family aesthetic, and strain personality. The video runs 60-90 seconds. This format works as a “signature video” for a strain in the user’s collection — something they can revisit and share. “This is what OG Kush sounds and looks like according to my cannabis intelligence app.”

A music video for a complete order. The soundtrack blends the terpene profiles of ALL strains in the order. The visuals transition between strain-specific aesthetics, mirroring the musical transitions. If the order has three strains, the video has three visual movements that blend into each other, each with its own color palette and motion style. This is the more impressive format because it captures the full complexity of a multi-strain purchase. The video feels like a journey — different phases representing different strains, all held together by a cohesive musical throughline.
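The data contract for both contexts can be small; a minimal sketch in Python, where `StrainProfile` and `MusicVideoRequest` are hypothetical names and the real fields would come from the existing strain and report models:

```python
from dataclasses import dataclass

@dataclass
class StrainProfile:
    """Per-strain inputs both pipelines already consume (per the Session Music mapping)."""
    name: str                    # e.g. "OG Kush"
    terpenes: dict[str, float]   # terpene -> percentage, e.g. {"myrcene": 0.8}
    high_family: str             # e.g. "Relaxed Highs" -> drives genre and BPM
    palette: list[str]           # hex colors derived from dominant terpenes

@dataclass
class MusicVideoRequest:
    """Covers both contexts: one strain (signature video) or a complete order."""
    strains: list[StrainProfile]   # len == 1 for the single-strain context
    duration_sec: int = 90         # 60-90 seconds per the spec

    @property
    def is_order_video(self) -> bool:
        return len(self.strains) > 1
```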

Video Structure (90-second Report Music Video)

| Time | Visual | Audio | Narrative Purpose |
| --- | --- | --- | --- |
| 0:00-0:05 | Fade in. User’s name. Order date. Dark background. | Silence, then a single sustained tone. | Personalized opening. Builds anticipation. |
| 0:05-0:15 | Abstract visual representing Strain 1. Dominant terpene colors animate into frame. | Musical motif for Strain 1’s terpene profile. Rhythm begins. | Establish the first strain’s character. |
| 0:15-0:30 | Visual evolves and intensifies. High Family aesthetic fully expressed. Data points appear subtly (terpene names, percentages). | Full musical expression of Strain 1. BPM matches High Family genre. | Peak expression of Strain 1. |
| 0:30-0:40 | Transition. Colors shift. Visual elements from Strain 1 dissolve into Strain 2’s palette. | Musical crossfade. New harmonic elements enter as Strain 1’s fade out. | The blend moment — strains interact. |
| 0:40-0:55 | Abstract visual representing Strain 2. Different color palette, different motion style. | Musical motif for Strain 2. Rhythm adapts to the second High Family. | Establish contrast or complement. |
| 0:55-1:05 | Both visual styles merge. A combined aesthetic emerges. | Musical elements from both strains coexist. Harmonic resolution. | The combined experience — what using both feels like. |
| 1:05-1:15 | Fade. Summary frame: strain names, High Families, key terpenes. | Music decrescendos. | Denouement. Data summary. |
| 1:15-1:25 | Branded outro. High IQ logo. Deep link QR code. | Final note. Silence. | CTA. Attribution. |
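That structure can be encoded as a declarative timeline that the renderer and overlay steps both consume; a sketch for the two-strain case, with segment names invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float                 # seconds
    end: float
    visual: str                  # which visual program to render
    audio: str                   # which musical phase plays
    overlay: str | None = None   # optional subtitle/data overlay

# Two-strain order video, mirroring the structure table above.
TIMELINE = [
    Segment(0,  5,  "fade_in_title",    "sustained_tone", overlay="user_name,order_date"),
    Segment(5,  15, "strain1_intro",    "strain1_motif"),
    Segment(15, 30, "strain1_peak",     "strain1_full",   overlay="strain1_terpenes"),
    Segment(30, 40, "crossfade_1_to_2", "crossfade"),
    Segment(40, 55, "strain2_intro",    "strain2_motif"),
    Segment(55, 65, "merged_aesthetic", "harmonic_blend"),
    Segment(65, 75, "summary_frame",    "decrescendo",    overlay="order_summary"),
    Segment(75, 85, "branded_outro",    "final_note",     overlay="qr_code"),
]
```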

User Value

This is the format that makes people say “wait, your app does WHAT?” out loud. The music video is not the most useful AI media output — podcasts deliver more information, music accompanies sessions better, social posts drive more shares. But the music video is the most emotionally impressive, the most shareable as a novelty, and the strongest conversion tool for turning observers into subscribers. It is the demo reel for the entire AI media platform.

Technical Approach

Composition Architecture

AI Music Videos does not have its own generation pipeline. It composes outputs from two existing pipelines and adds a synchronization layer.
┌─────────────────────┐     ┌─────────────────────┐
│   Session Music     │     │   Video Reports     │
│   Pipeline          │     │   Pipeline          │
│                     │     │                     │
│ Terpene data ──────►│     │ Terpene data ──────►│
│ High Family  ──────►│     │ High Family  ──────►│
│ Strain context ────►│     │ Report data  ──────►│
│                     │     │                     │
│ Output: Audio track │     │ Output: Video track │
└────────┬────────────┘     └────────┬────────────┘
         │                           │
         └──────────┬────────────────┘
                    │
                    ▼
         ┌─────────────────────┐
         │   Synchronization   │
         │   Layer             │
         │                     │
         │ Audio waveform      │
         │ analysis ──────────►│
         │ Beat detection ────►│
         │ Visual timing ─────►│
         │                     │
         │ Output: Synced MP4  │
         └─────────────────────┘
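In code, the composition layer reduces to a thin orchestrator over the two pipelines; a sketch where the four helper functions are stubs standing in for the real pipeline entry points (all names are assumptions):

```python
from pathlib import Path

# Stubs standing in for the real pipeline entry points (hypothetical names).
def generate_session_music(strains: list, duration_sec: int) -> Path: ...
def analyze_audio(audio: Path) -> dict: ...
def generate_video_segments(strains: list, timing_map: dict) -> Path: ...
def mux(audio: Path, video: Path, timing_map: dict) -> Path: ...

def compose_music_video(strains: list, duration_sec: int = 90) -> Path:
    """Compose a music video from the two existing pipelines plus the sync layer."""
    audio = generate_session_music(strains, duration_sec)  # audio defines the timing
    timing_map = analyze_audio(audio)                      # beats, spectrum, energy
    video = generate_video_segments(strains, timing_map)   # visuals follow the map
    return mux(audio, video, timing_map)                   # final synced MP4
```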

Synchronization

The key technical challenge unique to music videos is synchronization — making the visuals feel connected to the music rather than randomly overlaid. There are two approaches.

Approach 1: Beat-Reactive Visuals

Analyze the generated audio track for beat positions, frequency spectrum, and energy contour. Map visual parameters to audio features frame by frame during rendering:
  • Beat → visual pulse (brightness flash, scale bump)
  • Bass energy → warm color intensity
  • High-frequency energy → brightness and detail
  • Energy contour → visual complexity over time
This is technically simpler (post-hoc analysis of the audio) and produces a “music visualizer” feel.

Approach 2: Shared Score

Generate a “score” first — a timing document that specifies the musical and visual structure simultaneously. Both the music and video pipelines receive the score as input, ensuring they are structurally aligned from generation. Transitions happen at the same moments in both tracks because both tracks were planned together. This produces tighter synchronization but requires the music and video pipelines to accept external timing constraints, which may not be supported by the generation APIs.

Recommendation: Start with Approach 1 (beat-reactive) for v1. Explore Approach 2 if sync quality needs improvement.
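A minimal sketch of the Approach 1 mapping in Python, assuming the analysis step (Step 2 below) has already produced beat timestamps and normalized per-frame band energies; all names are illustrative:

```python
def visual_params_at(t: float, beats: list[float],
                     bass: float, highs: float, energy: float) -> dict:
    """Map audio features at time t to visual parameters (Approach 1).

    beats: beat timestamps in seconds; bass/highs/energy: 0-1 values
    for the current frame, read from the audio timing map.
    """
    # Beat -> visual pulse: a flash that decays over ~150 ms after each beat.
    since_beat = min((t - b for b in beats if b <= t), default=1.0)
    pulse = max(0.0, 1.0 - since_beat / 0.15)

    return {
        "brightness_boost": 0.3 * pulse + 0.4 * highs,  # beat flash + high-freq detail
        "scale_bump":       1.0 + 0.05 * pulse,         # subtle zoom on each beat
        "warm_intensity":   bass,                       # bass energy -> warm colors
        "complexity":       energy,                     # energy contour -> visual density
    }
```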

Rendering Pipeline

1. Audio Generation

The Session Music pipeline generates a 90-second audio track from the report’s terpene data and High Family classification. The track is structured with an intro, per-strain sections, a blend section, and an outro — matching the video structure described above.
2. Audio Analysis

The generated audio is analyzed for beat positions, spectral features, and energy contour. This analysis produces a timing map that the video renderer will use.
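A sketch of this step using librosa for beat tracking and RMS energy (the library choice is an assumption; any beat-detection tool would do):

```python
import json
import librosa

def build_timing_map(audio_path: str, fps: int = 30) -> dict:
    """Analyze the generated track into a per-frame timing map for the renderer."""
    y, sr = librosa.load(audio_path, sr=None)

    # Beat positions, in seconds.
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beats = librosa.frames_to_time(beat_frames, sr=sr).tolist()

    # Energy contour: one RMS value per video frame, normalized to 0-1.
    hop = sr // fps
    rms = librosa.feature.rms(y=y, hop_length=hop)[0]
    energy = (rms / (rms.max() or 1.0)).tolist()

    return {"tempo_bpm": float(tempo), "fps": fps, "beats": beats, "energy": energy}

# The map is stored as JSON and handed to the video renderer:
# json.dump(build_timing_map("session_track.wav"), open("timing_map.json", "w"))
```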
3. Video Generation

The Video Reports pipeline generates visual segments for each strain and the transitions between them. The visual parameters (timing, color intensity, complexity) are modulated by the audio timing map.
4. Synchronization and Mux

The audio and video tracks are combined using FFmpeg or a cloud video processing service. Beat-reactive effects are applied. Subtitles (strain names, terpene data) are overlaid at the appropriate moments. The final video is encoded as MP4 (H.264 + AAC).
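A sketch of the mux step shelling out to FFmpeg from Python; overlays and beat-reactive effects are omitted here, and the output settings match the spec (H.264 + AAC):

```python
import subprocess

def mux_music_video(audio: str, video: str, out: str = "music_video.mp4") -> None:
    """Combine the generated audio and video tracks into the final MP4."""
    subprocess.run([
        "ffmpeg", "-y",
        "-i", video,                   # visual track from the Video Reports pipeline
        "-i", audio,                   # soundtrack from the Session Music pipeline
        "-map", "0:v:0", "-map", "1:a:0",
        "-c:v", "libx264",             # H.264 video
        "-c:a", "aac",                 # AAC audio
        "-shortest",                   # stop at the shorter of the two tracks
        "-movflags", "+faststart",     # metadata up front for streaming playback
        out,
    ], check=True)
```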
5. Thumbnail and Preview

A thumbnail frame is extracted from the visual peak moment. A 15-second preview clip is extracted for the upgrade hook. Both are stored alongside the full video.
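A sketch of the extraction step, again via FFmpeg; `peak_sec` would come from the energy contour in the timing map (an assumption), and the output paths are placeholders:

```python
import subprocess

def extract_preview_assets(video: str, peak_sec: float) -> None:
    """Extract a thumbnail at the visual peak plus a 15-second preview clip."""
    # Thumbnail: a single frame at the peak moment.
    subprocess.run([
        "ffmpeg", "-y", "-ss", str(peak_sec), "-i", video,
        "-frames:v", "1", "thumbnail.jpg",
    ], check=True)

    # Preview: 15 seconds starting at the peak, re-encoded for clean cut points.
    subprocess.run([
        "ffmpeg", "-y", "-ss", str(peak_sec), "-i", video,
        "-t", "15", "-c:v", "libx264", "-c:a", "aac", "preview.mp4",
    ], check=True)
```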
6. Delivery

The final video is uploaded to the CDN and a push notification is sent. The music video appears on the report detail page and in the user’s media library.

Cost Estimates

| Component | Cost Per Music Video | Notes |
| --- | --- | --- |
| Audio generation (90 sec) | $0.05 - $0.30 | Subset of Session Music cost |
| Audio analysis (beat detection) | $0.01 | Lightweight compute |
| Video generation (90 sec) | $0.50 - $2.00 | Main cost driver |
| Synchronization + rendering | $0.10 - $0.30 | FFmpeg or cloud video processing |
| CDN storage (~50MB) | $0.005 | Negligible |
| Total per music video | $0.66 - $2.64 | Highest per-unit cost of any format |
At $1-$3 per generation and potentially 4+ reports per month per Studio subscriber, music video costs could reach $4-$12 per subscriber per month. This is why the feature is tentatively Studio-tier only. The economics need real benchmarks before committing to unlimited generation.

Tier Impact

| Tier | Access |
| --- | --- |
| Free | See a blurred thumbnail of the music video on the report page. Tap to see a 3-second frozen preview with a “Studio” badge overlay. No audio preview. |
| Pro | See the full thumbnail and a 15-second preview clip (audio + video, faded). Enough to understand what it is and want it. Cannot access the full video. |
| Studio | Full access. Unlimited music video generation for every report. Download as MP4. Share natively. Priority rendering queue. |

Why Music Videos Should Be Built Last

This feature depends on two other features being stable, performant, and well-understood:
  1. Session Music must be generating high-quality audio tracks reliably. The music is half the output.
  2. Video Reports must be rendering video segments reliably. The video is the other half.
Building Music Videos before these pipelines are stable means debugging synchronization issues while the underlying outputs are also unstable — a nightmare. Build the audio pipeline. Build the video pipeline. Verify both. Then compose them. Additionally, the synchronization layer is novel engineering that does not exist in the other features. It deserves focused attention without competing priorities.

Dependencies

  • Research report pipeline — built and live
  • Strain terpene profiles — built and live
  • High Family classification — built and live
  • Trigger.dev infrastructure — built and live
  • Session Music pipeline — see Session Music (prerequisite)
  • Video Reports pipeline — see Video Reports (prerequisite)
  • Audio waveform analysis (beat detection, spectral features)
  • Beat-reactive video rendering
  • Audio/video synchronization and muxing
  • CDN storage for video files (~50MB each)
  • 15-second preview clip extraction
  • Push notification for “music video ready” event
  • Studio tier implementation (gating)

Open Questions

  1. Separate generation or composed? — Should the music video use the SAME audio and video outputs generated for Session Music and Video Reports (cheaper, but the audio/video were not designed to sync), or should it trigger NEW generations with synchronization constraints (more expensive, better quality)? Recommendation: new generations with shared constraints for v1; explore reuse if costs need reduction.
  2. Target length — 60 seconds fits Instagram Reels and TikTok natively. 90 seconds allows more depth but may exceed optimal social media length. 30 seconds is punchier but may feel incomplete. Should we generate all three lengths and let the user choose?
  3. Style consistency across users — Should all music videos share a visual style (branded, cohesive) or should the style vary based on the data (more personalized but less brand-coherent)? A branded opening/closing with personalized middle content may be the balance.
  4. Is this actually a separate feature? — If Session Music and Video Reports both exist, a user could manually play the Session Music track while watching the Video Report. The music video is essentially “those two things, synced.” Is the synchronization value worth the additional engineering and Studio tier positioning? The answer may be yes (synced is dramatically better than manual overlay) or no (good enough is good enough).
  5. Share format — MP4 is universal but large. Should we also offer GIF export (no audio, lower quality, but embeds everywhere)? Or short-form vertical video specifically optimized for each social platform?

Related Features

  • Session Music — Provides the audio half of the music video
  • Video Reports — Provides the visual half of the music video
  • Strain Art — Visual aesthetic shared between art and music video visuals
  • AI Worlds — A world flythrough with session music is conceptually similar
  • Strain Page Videos — Animated mascots could appear in music videos
  • Blog AI Content — Blog articles could also get music video treatments
  • AI Media Overview — Strategic context for all AI media formats