
Overview

The Shopping Agent is a real-time menu intelligence system that combines web scraping, database matching, and AI personalization into a single coherent pipeline. When a user taps Shop Now from a dispensary page, the system extracts the dispensary’s live menu, matches every product against the 16,000+ strain database, scores results against the user’s personal preference profile, and delivers ranked recommendations, typically within 15–60 seconds for a cold scan (cache hits return almost instantly). This page documents the complete technical architecture for developers working on the Shopping Agent pipeline.

System Architecture

Mobile App

  ├─ POST /api/v1/shopping/scan-token          (menu scan flow)
  │     ↓
  │   Hono API (Edge)
  │     ├─ Cache check (Supabase)
  │     ├─ Dedup check (active Trigger.dev runs)
  │     └─ Trigger.dev task trigger → returns publicAccessToken + runId

  ├─ WebSocket subscription (useRealtimeTaskTrigger)
  │     ↓
  │   Trigger.dev: shopping-menu-scan task
  │     │
  │     Stage 1: cache-check ──────────── 5%
  │     Stage 2: scrape (Firecrawl) ───── 10–40%
  │     Stage 3: match (pg_trgm) ──────── 40–60%
  │     Stage 4: personalize (Claude) ─── 60–85%
  │     Stage 5: cache-save (Supabase) ── 85–95%
  │     Stage 6: complete ──────────────── 100%
  │     │
  │     └─ Task output → WebSocket → Mobile App renders results

  └─ POST /api/v1/research/strains/queue-batch  (discovery queue flow)
        Authorization: Bearer <Clerk token>
        body: { strainNames, source: "shopping_discovery" }

      Hono API (Edge)
        ↓
      Trigger.dev: strain research pipeline
        ↓
      Supabase strains_v2 (full profile added within hours)

Stage 1: Cache Check

Before any scraping begins, the task queries the menu_scans Supabase table for an unexpired entry matching the dispensary domain.
const cached = await supabase
  .from('menu_scans')
  .select('*')
  .eq('website_domain', dispensaryDomain)
  .gt('expires_at', new Date().toISOString())
  .order('scanned_at', { ascending: false })
  .limit(1)
  .maybeSingle(); // maybeSingle(): a cache miss yields data: null instead of an error

if (cached.data) {
  // Return cached result immediately — skip all subsequent stages
  return buildOutputFromCache(cached.data);
}
Cache TTL: 4 hours, shared across all users of the same dispensary. One user’s scan benefits every other High IQ user who visits that dispensary page within the same window.

Unique constraint: the table enforces UNIQUE (website_domain, categories_hash). If the menu composition has changed (new categories), a fresh scan overwrites the old entry.

Stage 2: Menu Extraction (Firecrawl Tool-Based Architecture)

Menu extraction uses a tool-based architecture built on Firecrawl. Instead of a single extraction method, the system provides five independent tools that the pipeline orchestrator chains with cascading fallback. Each tool is independently callable and testable.

The Five Tools

| Tool | Function | Best For | Credits |
| --- | --- | --- | --- |
| Extract | extractProducts(urls) | Multi-page menus with wildcard patterns | 1 |
| Scrape+JSON | scrapeAndExtract(url) | Single-page menus with direct URL | 1 |
| Scrape | scrapeMenuPage(url) | Raw markdown for AI processing | 1 |
| Scrape+AI | extractProductsWithAI(markdown) | Fallback when structured extraction fails | ~0.05 (Claude) |
| Sitemap | discoverSitemapUrls(domain) | URL discovery for multi-page menus | 1 |
All tools are exported from @tiwih/trigger and live in packages/trigger/src/lib/firecrawl-agent.ts.

Tool 1: Extract (Primary — Multi-Page)

The Extract tool uses Firecrawl’s extract() API with wildcard URL patterns to crawl an entire menu section and extract structured product data across multiple pages. This is the primary extraction method because most dispensary menus span multiple URLs (e.g., /shop/flower, /shop/edibles, /shop/vapes).
import { extractProducts, DISPENSARY_EXTRACT_SCHEMA } from '@tiwih/trigger';

const result = await extractProducts(
  ['https://dispensary.com/shop/*'],  // Wildcard pattern
  {
    schema: DISPENSARY_EXTRACT_SCHEMA,  // Plain JSON schema (not Zod!)
    prompt: 'Extract all cannabis products...',
    showSources: true,
    timeout: 180,
  }
);
// result.products: DispensaryProduct[]
// result.sources: string[] (URLs that were crawled)
The Firecrawl SDK’s isZodSchema() detection triggers a broken tryZodV4Conversion path with Zod v4. The DISPENSARY_EXTRACT_SCHEMA is a plain JSON schema object that bypasses this entirely. Never pass Zod schemas to the Extract API.

Tool 2: Scrape+JSON (Single-Page)

For menus on a single URL, the Scrape+JSON tool combines scraping and extraction in one call — Firecrawl renders the page, then uses its built-in LLM to fill the schema.
import { scrapeAndExtract } from '@tiwih/trigger';

const result = await scrapeAndExtract(
  'https://dispensary.com/shop/flower',
  { waitFor: 3000 }  // Wait for JS rendering
);
// result.products: DispensaryProduct[]

Tool 3: Scrape+AI (Fallback)

When structured extraction returns 0 products (e.g., heavily JavaScript-rendered SPAs, age-gated sites), the pipeline falls back to scraping raw markdown and having Claude extract products from the text.
import { scrapeMenuPage, extractProductsWithAI } from '@tiwih/trigger';

const scrapeResult = await scrapeMenuPage(url, { waitFor: 3000 });
const products = await extractProductsWithAI(scrapeResult.markdown, {
  maxProducts: 200,
});
// products: DispensaryProduct[]

Tool 4: Sitemap Discovery

The Sitemap tool uses Firecrawl’s map() to discover all URLs on a domain, then filters for menu-related paths. This is used as a fallback when the initial scan returns 0 products — it discovers additional menu URLs to retry extraction.
import { discoverSitemapUrls } from '@tiwih/trigger';

const sitemap = await discoverSitemapUrls('dispensary.com');
// sitemap.menuUrls: string[]  — URLs matching /menu, /shop, /products, etc.
// sitemap.productUrls: string[]  — Individual product page URLs
// sitemap.totalDiscovered: number  — Total URLs found
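As a sketch of how the menu-path filtering might work: the patterns inside discoverSitemapUrls() are not shown in this document, so the regex and helper name below are assumptions for illustration only.

```typescript
// Illustrative menu-path filter (assumed; not the actual discoverSitemapUrls internals).
const MENU_PATH = /\/(menu|shop|products|store|order)\b/i;

function filterMenuUrls(urls: string[]): string[] {
  // Keep only URLs whose path segment looks menu-related.
  return urls.filter((u) => MENU_PATH.test(new URL(u).pathname));
}
```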

Orchestrator: Cascading Fallback

The scanDispensaryMenu() orchestrator chains these tools with automatic fallback:
Extract (wildcard) ─── products > 0? ──→ Return
        │ no
        ↓
Scrape+JSON ────────── products > 0? ──→ Return
        │ no
        ↓
Scrape → AI ────────── products > 0? ──→ Return
        │ no
        ↓
Sitemap Discovery ──── find menu URLs → Retry Extract/Scrape
Each result includes a method field ("extract", "scrape-json", or "scrape-ai") indicating which tool succeeded.
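The fallback chain above can be sketched as a generic loop over extractor tools. The real scanDispensaryMenu() lives in @tiwih/trigger and wraps the Firecrawl tools described earlier; the types and function below are illustrative only.

```typescript
// Illustrative cascading-fallback sketch (not the actual scanDispensaryMenu()).
type Product = { name: string };
type Extractor = { method: string; run: () => Promise<Product[]> };

// Try each tool in order; the first one that yields any products wins.
async function cascade(
  extractors: Extractor[]
): Promise<{ method: string; products: Product[] }> {
  for (const { method, run } of extractors) {
    const products = await run();
    if (products.length > 0) return { method, products };
  }
  return { method: 'none', products: [] };
}
```

The returned method field mirrors the real pipeline's behavior of reporting which tool succeeded.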

Extraction Performance (Real Tests)

Real-world test results against Nuera Cannabis and Ascend Cannabis menus:

| Tool | Products Extracted | Notes |
| --- | --- | --- |
| Extract (wildcard /shop/*) | 392 | Crawls all menu pages |
| Scrape+JSON (single URL) | 89 | Single page only |
| Extract (specific URL) | 39 | Single URL, no wildcard |
| Scrape+AI (Claude fallback) | 72 | From raw markdown |

Why Plain JSON Schema, Not Zod

The Firecrawl JS SDK detects Zod schemas via isZodSchema() and attempts conversion through tryZodV4Conversion. With Zod v4 (used in this project), this conversion silently fails, causing Extract to return 0 products. Using a plain JSON schema object (DISPENSARY_EXTRACT_SCHEMA) bypasses the SDK’s Zod detection entirely.
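For illustration, a plain JSON Schema object for this use case might look like the following. The actual DISPENSARY_EXTRACT_SCHEMA lives in @tiwih/trigger; the field names here are assumptions, not its real definition.

```typescript
// Hypothetical plain JSON Schema (no Zod), sidestepping the SDK's isZodSchema() path.
const exampleExtractSchema = {
  type: 'object',
  properties: {
    products: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          category: { type: 'string' },
          price: { type: 'number' },
          thcPercent: { type: 'number' },
        },
        required: ['name'],
      },
    },
  },
  required: ['products'],
};
```

Because this is a plain object literal, the SDK passes it through to the Extract API untouched.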

Progress Reporting During Extraction

The extraction stage emits progress metadata so the mobile UI can show a live status like “Scanning menu… found 47 products so far.”
task.updateMetadata({
  stage: 'scrape',
  progress: 10 + (extractedSoFar / estimatedTotal) * 30,
  message: `Scanning menu... found ${extractedSoFar} products`,
  productCount: extractedSoFar,
});

Testing the Tools

Two test scripts are available in packages/trigger/:
# Test the orchestrator (cascading fallback)
pnpm test:shopping                    # Default: Nuera Cannabis
pnpm test:shopping --url <url>        # Custom URL
pnpm test:shopping --sitemap          # Test sitemap discovery

# Test all 3 Firecrawl extraction methods side by side
pnpm test:extract                     # Default: Nuera Cannabis /shop/flower
pnpm test:extract <url> scrape        # Test only scrape+JSON mode
pnpm test:extract <url> wildcard      # Test only wildcard extract mode
pnpm test:extract <url> specific      # Test only specific-URL extract mode

Stage 3: Strain Matching

Every extracted product name is run through a three-tier matching algorithm against the strains_v2 Supabase table. Matches are attempted in order; the first successful match wins.

Tier 1: Exact Match

SELECT id, slug, name_display, high_family
FROM strains_v2
WHERE name_canonical = lower(trim($1))
LIMIT 1;
name_canonical is a pre-computed lowercase, stripped version of the strain name stored at ingestion time. This handles the most common case: the dispensary uses the standard strain name. Confidence: high
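A minimal sketch of that normalization, assuming lowercase, trimmed, whitespace-collapsed input; the real ingestion-time rules may strip additional characters:

```typescript
// Assumed canonicalization matching the description of name_canonical.
function canonicalize(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, ' ');
}
```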

Tier 2: Slug Match

SELECT id, slug, name_display, high_family
FROM strains_v2
WHERE slug = slugify($1)
LIMIT 1;
slugify() converts a string to URL-safe format (lowercase, hyphens, no special characters). Catches cases like “OG Kush” → og-kush matching a database entry with slug og-kush. Confidence: high
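A minimal slugify sketch matching the described behavior (lowercase, hyphens, no special characters); the pipeline's actual slugify() may differ in edge cases:

```typescript
// Assumed slugify implementation for illustration.
function slugify(name: string): string {
  return name
    .toLowerCase()
    .trim()
    .replace(/['’]/g, '')         // drop apostrophes entirely
    .replace(/[^a-z0-9]+/g, '-')  // runs of non-alphanumerics become one hyphen
    .replace(/^-+|-+$/g, '');     // trim leading/trailing hyphens
}
```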

Tier 3: Trigram Similarity (pg_trgm)

SELECT id, slug, name_display, high_family,
       similarity(name_canonical, lower(trim($1))) AS sim
FROM strains_v2
WHERE similarity(name_canonical, lower(trim($1))) > 0.4
ORDER BY sim DESC
LIMIT 1;
PostgreSQL pg_trgm breaks strings into trigrams (3-character substrings) and computes a similarity score from 0.0 to 1.0. A threshold of 0.4 is permissive enough to catch common dispensary name variations while filtering out genuinely unrelated strings. Confidence: medium if similarity > 0.6, low if 0.4–0.6.
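The confidence bands described above map to a small helper; this is a sketch of the stated logic, not the pipeline's actual code:

```typescript
// Confidence bands for trigram matches as documented:
// > 0.6 → medium, 0.4–0.6 → low, below threshold → unmatched.
type Confidence = 'high' | 'medium' | 'low' | 'none';

function trigramConfidence(sim: number): Confidence {
  if (sim > 0.6) return 'medium';
  if (sim > 0.4) return 'low';
  return 'none'; // below TRIGRAM_THRESHOLD: treated as unmatched
}
```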

Unmatched Products

Products that pass through all three tiers without a match are added to the discoveries array. These represent real strains available locally that are not yet in the High IQ database.

Parallelism

Matching runs concurrently for all extracted products: Promise.all() is applied to batches of 20 to avoid overwhelming the Supabase connection pool.
const BATCH_SIZE = 20;
const batches = chunk(extractedProducts, BATCH_SIZE);
const matchResults: Product[] = [];

for (const batch of batches) {
  const batchResults = await Promise.all(
    batch.map(product => matchProduct(product, supabase))
  );
  matchResults.push(...batchResults);
}
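The batching above relies on a chunk() helper whose source isn't shown; a minimal version would look like:

```typescript
// Split an array into consecutive groups of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```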

Stage 4: AI Personalization

Once all products are matched, Claude Sonnet generates personalized recommendations. The personalization step has two parts: deterministic tag assignment and AI recommendation generation.

Deterministic Tag Assignment

Tags are assigned by comparing the matched strain IDs against the user’s profile data passed in the request. This is pure logic — no AI involved.
function assignTags(product: Product, userContext: UserContext): PersonalizationTag[] {
  const tags: PersonalizationTag[] = [];

  if (userContext.favoriteStrainIds.includes(product.strainId)) {
    tags.push('favorite_in_stock');
  }
  if (userContext.lowStashStrainIds.includes(product.strainId)) {
    tags.push('running_low');
  }
  if (userContext.recentStrainIds.includes(product.strainId)) {
    tags.push('bought_before');
  }
  if (isSimilarToFavorite(product, userContext)) {
    tags.push('similar_to_favorite');
  }
  if (matchesPreferences(product, userContext)) {
    tags.push('matches_preferences');
  }
  if (!product.strainId) {
    tags.push('new_discovery');
  }
  if (isOnSale(product)) {
    tags.push('great_deal');
  }

  return tags;
}
isSimilarToFavorite() uses the High IQ strain similarity scores (pre-computed and stored in Supabase) to find products with similar terpene and effect profiles to the user’s favorites.

AI Recommendation Generation

After tags are assigned, the full product list (with tags and strain data) is passed to Claude Sonnet. The AI selects the top 3–5 picks and writes a plain-English reason for each.
const recommendations = await generateObject({
  model: anthropic('claude-sonnet-4-6'),
  schema: RecommendationsSchema,
  prompt: `
    You are a knowledgeable cannabis advisor.

    The user has the following preferences:
    - Favorite strains: ${favoriteNames.join(', ')}
    - Preferred types: ${userContext.preferredTypes.join(', ')}
    - Recently purchased: ${recentNames.join(', ')}

    Here are the available products at this dispensary:
    ${JSON.stringify(taggedProducts, null, 2)}

    Select the 3-5 best products for this user and explain in 1-2 sentences why each
    is a great match. Focus on the specific combination of tags and strain properties
    that make it right for them. Be specific — mention actual terpene names, effects,
    or price value when relevant.
  `,
});
The generateObject() call uses AI SDK 6 with a schema for structured output, ensuring the recommendations are always valid JSON.

Stage 5: Cache Save

Results are upserted into the menu_scans table with a 4-hour expiry.
await supabase.from('menu_scans').upsert({
  website_domain: dispensaryDomain,
  categories_hash: computeCategoriesHash(products),
  scanned_at: new Date().toISOString(),
  expires_at: addHours(new Date(), 4).toISOString(),
  total_products: products.length,
  matched_count: products.filter(p => p.matched).length,
  unmatched_count: products.filter(p => !p.matched).length,
  products_json: products,
  recommendations_json: recommendations,
  discoveries_json: discoveries,
}, {
  onConflict: 'website_domain,categories_hash',
});
The categories_hash is an MD5 of the sorted category list. If the dispensary adds a new product category (e.g., starts selling topicals), the hash changes, triggering a fresh scan on the next request.
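A sketch of computeCategoriesHash() as described (an MD5 over the sorted, de-duplicated category list); the real code's exact serialization may differ:

```typescript
import { createHash } from 'node:crypto';

// Assumed implementation: unique categories, sorted, joined, then MD5-hashed.
function computeCategoriesHash(products: { category: string }[]): string {
  const categories = [...new Set(products.map((p) => p.category))].sort();
  return createHash('md5').update(categories.join('|')).digest('hex');
}
```

Because the hash depends only on the set of categories, reordering products or adding items within existing categories does not invalidate the cache entry.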

Stage 6: Complete

The task returns the full output payload, which Trigger.dev delivers to the mobile app via WebSocket. The useRealtimeTaskTrigger hook in the app receives the completed run and triggers a state update to show the results screen.

De-duplication

The Hono API endpoint checks for active Trigger.dev runs before triggering a new one.
const activeRuns = await trigger.runs.list({
  taskIdentifier: 'shopping-menu-scan',
  status: ['EXECUTING', 'WAITING'],
  metadata: { dispensaryDomain },
});

if (activeRuns.data.length > 0) {
  // Return a token for the already-running scan
  const token = await trigger.auth.createPublicToken({
    scopes: { read: { runs: [activeRuns.data[0].id] } },
  });
  return { source: 'dedup', publicAccessToken: token, runId: activeRuns.data[0].id };
}
This prevents two users opening the same dispensary page at the same time from triggering two concurrent scans.

Mobile App Integration

Screens

| Screen | File | Description |
| --- | --- | --- |
| ScanScreen | _screens/shopping/ScanScreen.tsx | Animated progress during scanning |
| ResultsScreen | _screens/shopping/ResultsScreen.tsx | Product list with category tabs and recommendations |
| DiscoveryScreen | _screens/shopping/DiscoveryScreen.tsx | Batch research queue for unmatched strains; calls Hono API directly |

Real-Time Hook

The mobile app uses useRealtimeTaskTrigger from @trigger.dev/react-hooks to subscribe to run progress and output without polling.
import { useRealtimeTaskTrigger } from '@trigger.dev/react-hooks';
import type { ShoppingMenuScanTask } from '@tiwih/trigger';

function ShoppingAgentScreen({ dispensary }) {
  const { submit, runs } = useRealtimeTaskTrigger<typeof ShoppingMenuScanTask>(
    'shopping-menu-scan'
  );

  const handleShopNow = async () => {
    const { publicAccessToken, cachedResult, runId } = await api.shopping.scanToken({
      dispensaryDomain: dispensary.domain,
      menuUrl: dispensary.menuUrl,
      userId: currentUser.id,
      userContext: buildUserContext(favorites, stash, orders),
    });

    if (cachedResult) {
      // Skip scan — go directly to results
      setResults(cachedResult);
      return;
    }

    // Subscribe to the live run
    submit(
      { dispensaryDomain: dispensary.domain, menuUrl: dispensary.menuUrl },
      { publicAccessToken }
    );
  };

  const activeRun = runs[0];
  const metadata = activeRun?.metadata;

  if (activeRun?.status === 'COMPLETED') {
    return <ResultsScreen data={activeRun.output} />;
  }

  return (
    <ScanScreen
      stage={metadata?.stage}
      progress={metadata?.progress ?? 0}
      message={metadata?.message ?? 'Starting...'}
    />
  );
}

Discovery Queue: Queueing Unmatched Strains for Research

When a user taps “Add to Research Queue” in the DiscoveryScreen, the app submits the unmatched strain names directly to the Hono API — no Convex middleman involved.

Why Direct API, Not Convex

Convex is the source of truth for user-owned data: orders, stash, favorites, and dispensaries. Strain research is a platform-level concern — the data ends up in Supabase (strains_v2) and benefits all users, not just the submitter. Routing it through Convex would violate the data layer boundary and add unnecessary latency.

Data Flow

DiscoveryScreen

  ├─ useAuth() from Clerk — obtain Clerk session token

  └─ POST /api/v1/research/strains/queue-batch
       {
         strainNames: ["Strain A", "Strain B", ...],
         source: "shopping_discovery"
       }
       Authorization: Bearer <clerk_token>
         ↓
       Hono API (Edge) — validates auth, enqueues strains
         ↓
       Trigger.dev strain research pipeline
         ↓
       Supabase strains_v2 (full strain profile, typically within hours)

Source Type

The /queue-batch endpoint accepts a source field that identifies how the strain was discovered. The shopping_discovery value was added alongside the existing order_upload and manual values specifically for this flow.
// Source enum values for /api/v1/research/strains/queue-batch
type QueueBatchSource = 'order_upload' | 'manual' | 'shopping_discovery';

Implementation Details

// DiscoveryScreen.tsx (simplified)
import { useRef } from 'react';
import { Alert } from 'react-native';
import { useAuth } from '@clerk/clerk-expo';

const { getToken } = useAuth();
const submittingRef = useRef(false); // guard against double-submission

const handleQueueForResearch = async (strainNames: string[]) => {
  if (submittingRef.current) return;
  submittingRef.current = true;

  try {
    const token = await getToken();
    const response = await fetch(
      `${Config.API_BASE_URL}/api/v1/research/strains/queue-batch`,
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Authorization: `Bearer ${token}`,
        },
        body: JSON.stringify({
          strainNames,
          source: 'shopping_discovery',
        }),
      }
    );

    if (!response.ok) throw new Error('Queue request failed');
    Alert.alert('Queued', `${strainNames.length} strain(s) queued for research.`);
  } catch {
    Alert.alert('Error', 'Could not queue strains. Please try again.');
  } finally {
    submittingRef.current = false;
  }
};
The useRef guard prevents the user from double-submitting if they tap the button quickly while the request is in flight. Unlike useState, a ref update does not trigger a re-render, so the button can remain visually enabled for the next valid submission without a flash of disabled state.
The Trigger.dev strain research pipeline processes queued strains asynchronously. Full strain profiles (genetics, terpenes, effects, images) are typically available within a few hours of submission.

Configuration

Environment Variables

The following environment variables are required in apps/api/.env and must be set in the Vercel project settings for production.
| Variable | Required | Description |
| --- | --- | --- |
| TRIGGER_SECRET_KEY | Yes | Trigger.dev secret key for triggering tasks and creating public tokens |
| FIRECRAWL_API_KEY | Yes | Firecrawl API key for the menu scanning agent |
| ANTHROPIC_API_KEY | Yes | Anthropic API key for Claude Sonnet personalization (or set AI_GATEWAY_API_KEY if routing through Vercel AI Gateway) |
| SUPABASE_URL | Yes | Supabase project URL |
| SUPABASE_SERVICE_ROLE_KEY | Yes | Supabase service role key for server-side reads and writes |

Trigger.dev Package Setup

The shopping-menu-scan task lives in the @tiwih/trigger package at packages/trigger/src/tasks/shopping-menu-scan.ts. It is deployed to the Trigger.dev cloud alongside the other pipeline tasks.
# Deploy trigger tasks (from monorepo root)
cd packages/trigger && npx trigger deploy

Adjustable Limits

| Constant | Default | Description |
| --- | --- | --- |
| CACHE_TTL_HOURS | 4 | Menu scan cache duration |
| TRIGRAM_THRESHOLD | 0.4 | Minimum similarity score for a trigram match |
| TRIGRAM_MEDIUM_CONFIDENCE | 0.6 | Threshold above which trigram matches are rated medium confidence |
| MAX_RECOMMENDATIONS | 5 | Maximum AI recommendations returned |
| MATCH_BATCH_SIZE | 20 | Products matched concurrently per database round-trip |

Performance Characteristics

| Operation | Typical Duration | Notes |
| --- | --- | --- |
| Cache hit response | < 100ms | Full results from Supabase, no task triggered |
| Extract (wildcard) | 15–60 seconds | Crawls multiple pages; larger menus take longer |
| Scrape+JSON (single page) | 5–15 seconds | Single page render + LLM extraction |
| Scrape+AI (fallback) | 8–20 seconds | Scrape (3–8s) + Claude extraction (5–12s) |
| Sitemap discovery | 3–8 seconds | Firecrawl map(); used only as fallback |
| Strain matching (100 products) | 1–3 seconds | pg_trgm with GIN index is fast |
| Claude Sonnet personalization | 2–5 seconds | generateObject with schema is deterministic |
| Total (cold scan) | 15–60 seconds | P50 ~25s, P95 ~60s (Extract crawls more pages) |
The Supabase strains_v2 table has a GIN index on name_canonical for trigram searches: CREATE INDEX idx_strains_name_trgm ON strains_v2 USING GIN (name_canonical gin_trgm_ops). Without this index, trigram matching on 16,000 rows would be too slow for the pipeline.

Observability

All pipeline stages emit structured logs via @tiwih/logger with the shopping category. Trigger.dev’s dashboard shows per-run stage durations, metadata snapshots, and task output — making it straightforward to identify where time is spent on any given scan.