
Overview

Gmail Sync is one of High IQ’s most architecturally complex features, spanning four systems that work together to scan, classify, and import dispensary orders from a user’s Gmail inbox. This guide explains how it all fits together.
This is a technical deep-dive for users curious about how Gmail Sync works under the hood. For usage instructions, see the Gmail Sync help article.

System Architecture

Gmail Sync coordinates four independent systems:
| System | Role | Technology |
| --- | --- | --- |
| Mobile App | Wizard UI, user interactions, real-time display | React Native, Expo Router |
| Convex | Real-time state management, user data, webhook receiver | Convex (reactive database) |
| Trigger.dev | Background job orchestration, Gmail API access, AI classification | Trigger.dev v3 (serverless tasks) |
| Supabase | Crowdsourced dispensary domain database | PostgreSQL with RPC functions |

Mobile App                    Convex                         Trigger.dev
-----------                   ------                         -----------
Gmail Sync Wizard ---------> gmailSync mutations -------->  gmail-scan-task
  Connect Step                 real-time queries               |
  Scan Step                    webhook handler (HTTP)          +--> gmail-classify-task
  Review Dispensaries          import job tracking             |
  Review Orders                order creation                  +--> gmail-batch-import-task
  Import Settings                  ^                           |      +--> gmail-import-task (x N)
  Import Progress                  |                           |
                                   +--- webhook POST ----------+
                                                               |
                                                            Supabase
                                                            --------
                                                            dispensary_email_domains
                                                            (crowdsourced, community-shared)

Why Four Systems?

Each system handles what it does best:
  • Convex excels at real-time subscriptions — the wizard UI updates instantly as background jobs progress, without polling.
  • Trigger.dev handles long-running jobs (scanning years of email can take minutes) with built-in retries, concurrency control, and no timeout limits.
  • Supabase provides a shared PostgreSQL database for crowdsourced domain data that is independent of any single user’s account.
  • The Mobile App provides the guided wizard experience where users review and approve everything before import.

The Pipeline: Step by Step

1. User Starts a Scan

When the user taps “Start Scan” in the wizard, the mobile app calls a Convex action (startGmailScan) which:
  1. Creates an import job record in the gmailImportJobs table with status scanning
  2. Sends a REST API request to Trigger.dev to start the gmail-scan task
  3. Returns the job ID to the mobile app for real-time tracking
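The three steps above can be sketched as a plain function with its dependencies injected. This is a minimal sketch, not the actual Convex action: `ScanDeps`, `createJob`, and `triggerScan` are hypothetical names standing in for the write to the gmailImportJobs table and the REST request to Trigger.dev, and the `JobStatus` union only lists statuses mentioned in this guide.

```typescript
// Statuses named in this guide; "failed" is inferred from the batch.failed webhook.
type JobStatus = "scanning" | "ready_for_review" | "completed" | "failed";

// Hypothetical dependencies, injected so the flow can run without Convex or Trigger.dev.
interface ScanDeps {
  // Stand-in for inserting a record into the gmailImportJobs table.
  createJob(userId: string, status: JobStatus): Promise<string>;
  // Stand-in for the REST API request that starts the scan task.
  triggerScan(payload: { jobId: string; userId: string }): Promise<void>;
}

async function startGmailScan(userId: string, deps: ScanDeps): Promise<string> {
  // 1. Create the import job with status "scanning".
  const jobId = await deps.createJob(userId, "scanning");
  // 2. Ask the background system to start scanning for this job.
  await deps.triggerScan({ jobId, userId });
  // 3. Return the job ID so the client can subscribe to it.
  return jobId;
}
```

Creating the job record before triggering the task means the client always has something to subscribe to, even if the background task takes a moment to start.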
The mobile app subscribes to the import job via a Convex useQuery hook, so any status changes from background tasks appear instantly in the UI.

2. Email Scan

The scan task runs on Trigger.dev with server-side access to the user’s Gmail via a fresh OAuth token retrieved from Clerk’s Backend API. It performs a 3-layer search strategy:

Layer 1: Saved dispensaries
Source: The user’s saved dispensaries in High IQ (website URLs and email addresses)
How it works: Before the scan starts, Convex extracts email domains from the user’s dispensary records. For example, if the user saved “Green Leaf” with website greenleaf.com, the scan searches for from:@greenleaf.com.
Confidence: 0.95 (very high: the user explicitly saved this dispensary)
Why it matters: This is the most accurate layer because the user has already confirmed these are their dispensaries. Even obscure local shops are covered if the user has saved them.

Layer 2: Crowdsourced domains
Source: The dispensary_email_domains table in Supabase, contributed anonymously by all High IQ users
How it works: The app fetches verified dispensary domains from Supabase (e.g., noreply@dutchiepay.com, orders@iheartjane.com) and batches them into Gmail search queries. Domains are searched in groups of 20 to stay within Gmail’s query length limits.
Confidence: 0.90 (high: these domains have been confirmed by multiple users)
Why it matters: This is the “network effect” layer. As more users import orders, the shared domain database grows, making detection better for everyone. The database is pre-seeded with 20+ major POS platform domains.

Layer 3: Generic keywords
Source: Built-in keyword queries targeting cannabis-specific receipt language
How it works: Four keyword queries search for combinations of:
  • Receipt/order confirmation terms + dispensary/cannabis terms
  • Receipt/order terms + product types (flower, edible, concentrate, vape)
  • THC/CBD/indica/sativa + purchase/order terms
  • Known POS platform names (dutchie, iheartjane, leafly, weedmaps)
Confidence: 0.50 (medium: these need AI classification to confirm)
Why it matters: This catches receipts from dispensaries not yet in any domain database. It casts a wide net and relies on AI classification to filter out false positives.
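To make the domain-based layers more concrete, here is a minimal sketch of how domains might be extracted from saved websites or email addresses and batched into Gmail search queries in groups of 20. The function names and the exact query shape (`from:@domain` terms joined with `OR`) are assumptions for illustration; the guide only states the `from:@greenleaf.com` form and the group size.

```typescript
// Group size stated in this guide.
const DOMAINS_PER_QUERY = 20;

// Extract a bare domain from a website URL or an email address.
// Returns null when the input cannot be parsed.
function extractDomain(input: string): string | null {
  const value = input.trim().toLowerCase();
  if (value === "") return null;
  // Treat "name@domain" without a path as an email address.
  if (value.includes("@") && !value.includes("/")) {
    const domain = value.slice(value.lastIndexOf("@") + 1);
    return domain.includes(".") ? domain : null;
  }
  try {
    const url = new URL(value.includes("://") ? value : `https://${value}`);
    const host = url.hostname.replace(/^www\./, "");
    return host.includes(".") ? host : null;
  } catch {
    return null;
  }
}

// Build one Gmail search query per group of domains (hypothetical query shape).
function buildDomainQueries(domains: string[], perQuery: number = DOMAINS_PER_QUERY): string[] {
  const unique = Array.from(new Set(domains));
  const queries: string[] = [];
  for (let i = 0; i < unique.length; i += perQuery) {
    const group = unique.slice(i, i + perQuery);
    queries.push(group.map((d) => `from:@${d}`).join(" OR "));
  }
  return queries;
}
```

With 45 distinct domains this yields three queries containing 20, 20, and 5 domains, which keeps each query string bounded in length.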
All results are deduplicated by Gmail message ID across layers, so a receipt matching multiple layers is only shown once (with the highest confidence).
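The deduplication rule can be sketched as a single pass over the combined results, keeping the highest-confidence entry for each Gmail message ID. The `Candidate` shape below is a hypothetical simplification, not the actual record format.

```typescript
// Hypothetical, simplified candidate record.
interface Candidate {
  messageId: string; // Gmail message ID
  layer: 1 | 2 | 3; // which search layer found the email
  confidence: number;
}

// Keep one candidate per message ID, preferring the highest confidence.
function dedupeByMessageId(candidates: Candidate[]): Candidate[] {
  const best = new Map<string, Candidate>();
  for (const candidate of candidates) {
    const existing = best.get(candidate.messageId);
    if (!existing || candidate.confidence > existing.confidence) {
      best.set(candidate.messageId, candidate);
    }
  }
  return Array.from(best.values());
}
```

A receipt found by both the keyword layer (0.50) and the saved-dispensary layer (0.95) therefore appears once, carrying the 0.95 score.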

3. AI Classification

After scanning, the task chains directly into AI classification using triggerAndWait, a Trigger.dev primitive that triggers the classify task as a child run and waits for its result before the parent task continues. Emails from Layers 1 and 2 (known dispensary domains) are auto-classified with high confidence: they already matched a confirmed dispensary domain, so no AI analysis is needed. Only Layer 3 (generic keyword) matches go through AI classification using GPT-4o-mini:
| Input | What the AI Receives |
| --- | --- |
| From header | orders@greenleaf.com |
| Subject line | Your order #4521 is ready for pickup |
| Email snippet | First ~200 characters of body text |

| Output | What the AI Returns |
| --- | --- |
| Is dispensary receipt? | true / false |
| Confidence | 0.0 to 1.0 |
| Dispensary name | Extracted business name |
| Summary | “3 items from Green Leaf, $85 total” |
| Estimated items | Approximate item count |
| Estimated total | Dollar amount if visible |

Emails are classified in batches of 20 for efficiency. If the AI fails for a batch, those emails are marked as low-confidence (0.3) for manual review rather than being silently dropped.
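The routing and fallback behavior described in this step can be sketched as follows. This is an illustrative sketch under stated assumptions: the `ClassifyBatch` callback stands in for the real GPT-4o-mini call, the record shapes are simplified, and treating a fallback email as a possible receipt (so that it stays visible for manual review) is an assumption rather than documented behavior.

```typescript
interface ScanCandidate {
  messageId: string;
  layer: 1 | 2 | 3;
  from: string;
  subject: string;
  snippet: string;
}

interface Classification {
  messageId: string;
  isReceipt: boolean;
  confidence: number;
  source: "domain" | "ai" | "fallback";
}

// Stand-in for the AI call; the real task would call the model here.
type ClassifyBatch = (batch: ScanCandidate[]) => Promise<Classification[]>;

const BATCH_SIZE = 20; // batch size stated in this guide
const FALLBACK_CONFIDENCE = 0.3; // low-confidence score stated in this guide

async function classifyCandidates(
  candidates: ScanCandidate[],
  classifyBatch: ClassifyBatch
): Promise<Classification[]> {
  const results: Classification[] = [];
  const needsAi: ScanCandidate[] = [];

  // Layers 1 and 2 matched a known domain, so they skip the AI entirely.
  for (const c of candidates) {
    if (c.layer === 3) {
      needsAi.push(c);
    } else {
      const confidence = c.layer === 1 ? 0.95 : 0.9;
      results.push({ messageId: c.messageId, isReceipt: true, confidence, source: "domain" });
    }
  }

  // Layer 3 goes to the AI in batches; a failed batch is kept, not dropped.
  for (let i = 0; i < needsAi.length; i += BATCH_SIZE) {
    const batch = needsAi.slice(i, i + BATCH_SIZE);
    try {
      results.push(...(await classifyBatch(batch)));
    } catch {
      for (const c of batch) {
        // Assumption: keep the email as a possible receipt for manual review.
        results.push({
          messageId: c.messageId,
          isReceipt: true,
          confidence: FALLBACK_CONFIDENCE,
          source: "fallback",
        });
      }
    }
  }
  return results;
}
```

Catching the error per batch limits the blast radius of an AI outage to at most 20 emails at a time, and every email still reaches the review screens.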

4. User Review

Once classification is complete, the import job status changes to ready_for_review. The Convex real-time subscription instantly updates the mobile UI, transitioning the wizard from the scanning animation to the review screens. Candidates are presented in two review steps:
  1. Dispensary Groups — Emails grouped by dispensary name with toggle switches. Users can enable or disable entire dispensaries at once.
  2. Individual Orders — Each email shown with subject, date, confidence badge, and a toggle. Users fine-tune their selection.
All selection changes are persisted to Convex in real-time, so the user can leave and come back without losing their selections.
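The two review steps can be modeled with one flat list of emails that is grouped for display and updated per dispensary or per email. The sketch below is a simplified illustration: `ReviewItem` and both function names are hypothetical, and persisting each change to Convex is left out.

```typescript
// Hypothetical, simplified review record.
interface ReviewItem {
  messageId: string;
  dispensaryName: string;
  selected: boolean;
}

// Group emails by dispensary name for the first review step.
function groupByDispensary(items: ReviewItem[]): Map<string, ReviewItem[]> {
  const groups = new Map<string, ReviewItem[]>();
  for (const item of items) {
    const group = groups.get(item.dispensaryName) ?? [];
    group.push(item);
    groups.set(item.dispensaryName, group);
  }
  return groups;
}

// Toggling a dispensary returns a new list with every email in that group updated.
function setDispensarySelected(
  items: ReviewItem[],
  dispensaryName: string,
  selected: boolean
): ReviewItem[] {
  return items.map((item) =>
    item.dispensaryName === dispensaryName ? { ...item, selected } : item
  );
}
```

Returning a new list instead of mutating in place fits a reactive UI, where each change is written to the backend and flows back through the subscription.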

5. Order Import

When the user taps “Start Import,” a batch import task is triggered on Trigger.dev. It processes each selected email sequentially:
  1. Fetch email content — Retrieves the full email body from Gmail using a fresh OAuth token
  2. Parse receipt — Sends the email text to the Hono API receipt parser, which uses AI to extract structured order data (line items, prices, quantities)
  3. Create order — Posts the parsed data back to Convex via webhook, which creates the order record with items
  4. Check duplicates — Before creating, checks if an order with the same Gmail message ID already exists. If so, marks it as “duplicate” and skips creation.
  5. Auto-save dispensary — If the dispensary is new, creates a dispensary record and logs a visit
  6. Crowdsource domain — Upserts the sender’s email domain to the shared Supabase database
  7. Report progress — Posts a progress webhook so the mobile UI updates in real-time
Each email’s success or failure is reported individually, so the user sees a live feed of results as they happen.
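The core of this loop (fetch, parse, duplicate check, create, report) can be sketched with injected dependencies. This is a rough sketch rather than the actual task: every name in `ImportDeps` is hypothetical, the dispensary auto-save and domain crowdsourcing steps are omitted for brevity, and where the duplicate check really runs (in the task or inside Convex) is not specified in this guide.

```typescript
type ImportOutcome = "imported" | "duplicate" | "failed";

interface ParsedOrder {
  dispensaryName: string;
  total: number;
  items: { name: string; price: number; quantity: number }[];
}

// Hypothetical dependencies standing in for Gmail, the receipt parser, and Convex.
interface ImportDeps {
  fetchEmailBody(messageId: string): Promise<string>;
  parseReceipt(body: string): Promise<ParsedOrder>;
  orderExists(messageId: string): Promise<boolean>;
  createOrder(messageId: string, order: ParsedOrder): Promise<void>;
  reportProgress(messageId: string, outcome: ImportOutcome, error?: string): Promise<void>;
}

async function importEmails(messageIds: string[], deps: ImportDeps): Promise<ImportOutcome[]> {
  const outcomes: ImportOutcome[] = [];
  // Emails are processed one after another, as described above.
  for (const messageId of messageIds) {
    let outcome: ImportOutcome = "failed";
    let error: string | undefined;
    try {
      const body = await deps.fetchEmailBody(messageId);
      const order = await deps.parseReceipt(body);
      if (await deps.orderExists(messageId)) {
        outcome = "duplicate"; // skip creation for already-imported emails
      } else {
        await deps.createOrder(messageId, order);
        outcome = "imported";
      }
    } catch (e) {
      error = e instanceof Error ? e.message : String(e);
    }
    // Reported outside the try block so a failed progress report is not swallowed.
    await deps.reportProgress(messageId, outcome, error);
    outcomes.push(outcome);
  }
  return outcomes;
}
```

A failure in one email is recorded and reported, and the loop moves on to the next email instead of aborting the whole batch.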

6. Completion

When all emails are processed, the batch task sends a batch.complete webhook that:
  • Marks the import job as completed
  • Updates the user’s sync state with the current timestamp (for future Quick Syncs)
The mobile UI then shows a completion summary with total orders imported, dispensaries found, total spend, and any failures.
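The completion summary can be derived from the per-email results. The sketch below assumes a hypothetical `ImportResult` shape and sums amounts in cents to avoid floating-point drift; the real summary may be computed differently.

```typescript
// Hypothetical per-email result record.
interface ImportResult {
  outcome: "imported" | "duplicate" | "failed";
  dispensaryName?: string;
  total?: number; // order total in dollars, if parsed
}

interface CompletionSummary {
  ordersImported: number;
  dispensariesFound: number;
  totalSpend: number;
  failures: number;
}

function summarizeImport(results: ImportResult[]): CompletionSummary {
  const imported = results.filter((r) => r.outcome === "imported");
  const names = new Set<string>();
  let totalCents = 0;
  for (const r of imported) {
    if (r.dispensaryName !== undefined) names.add(r.dispensaryName);
    totalCents += Math.round((r.total ?? 0) * 100);
  }
  return {
    ordersImported: imported.length,
    dispensariesFound: names.size,
    totalSpend: totalCents / 100,
    failures: results.filter((r) => r.outcome === "failed").length,
  };
}
```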

Webhook Communication

Trigger.dev tasks communicate results back to Convex via HTTP webhooks. This decoupled design means each system can be deployed and scaled independently.
| Event | Sent By | Purpose |
| --- | --- | --- |
| scan.complete | Scan task | Delivers candidate email list |
| classify.complete | Classify task | Delivers AI classification results |
| import.progress | Import task (per email) | Reports success/failure for each order |
| batch.complete | Batch import task | Marks the job as done |
| batch.failed | Batch import task | Marks the job as failed with error |

All webhooks are authenticated with a shared bearer token and include retry support — if a webhook fails, the Trigger.dev task throws an error, triggering automatic retries.
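The sending side of this pattern might look like the sketch below. It is an illustration under assumptions: the request body shape (`{ event, payload }`), the function name, and the injected `FetchLike` type are hypothetical; only the event names, the bearer token, and the throw-to-retry behavior come from this guide.

```typescript
type WebhookEvent =
  | "scan.complete"
  | "classify.complete"
  | "import.progress"
  | "batch.complete"
  | "batch.failed";

// Minimal subset of the fetch API, injected so the sketch can run without a network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number }>;

async function postWebhook(
  url: string,
  token: string,
  event: WebhookEvent,
  payload: unknown,
  fetchImpl: FetchLike
): Promise<void> {
  const response = await fetchImpl(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`, // shared bearer token
    },
    body: JSON.stringify({ event, payload }), // assumed body shape
  });
  if (!response.ok) {
    // Throwing fails the current task attempt, which lets the task runner retry it.
    throw new Error(`Webhook ${event} failed with status ${response.status}`);
  }
}
```

Because retries can deliver the same event more than once, the receiving handler is best written so that processing an event twice is harmless.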

Data Lifecycle

Understanding what data is stored and for how long:
| Data | Stored Where | Retention |
| --- | --- | --- |
| Extracted order data (items, prices) | Convex orders table | Permanent (user’s data) |
| Dispensary records | Convex dispensaries table | Permanent (user’s data) |
| Email metadata (subject, sender, snippet) | Convex gmailImportJobs | 7 days (auto-purged by daily cron) |
| Raw email content | Trigger.dev task memory | Never stored (processed and discarded) |
| Google OAuth tokens | Clerk (server-side) | Managed by Clerk’s token lifecycle |
| Crowdsourced domains | Supabase dispensary_email_domains | Permanent (anonymous, shared) |

The 7-day purge of email metadata is handled by a Convex cron job that runs daily. This ensures that sensitive email data (subjects, snippets) does not persist beyond the review window.
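The selection logic behind such a purge can be sketched as a pure function that the daily cron job would call. This sketch assumes the 7 days are measured from a hypothetical `createdAt` timestamp on each job; the guide does not say which timestamp the real job uses.

```typescript
// Retention window stated in this guide, in milliseconds.
const RETENTION_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical, simplified job record.
interface StoredJob {
  id: string;
  createdAt: number; // epoch milliseconds (assumed reference point)
}

// Return the jobs whose email metadata is older than the retention window.
function findExpiredJobs(
  jobs: StoredJob[],
  now: number,
  retentionMs: number = RETENTION_MS
): StoredJob[] {
  const cutoff = now - retentionMs;
  return jobs.filter((job) => job.createdAt < cutoff);
}
```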

Quick Sync vs Full Wizard

| Aspect | Full Wizard (first sync) | Quick Sync (returning) |
| --- | --- | --- |
| Entry point | 6-step wizard with date range picker | Single-tap “Sync Recent Orders” |
| Date range | User-selected (6 months to all time) | Automatic (last sync date to now) |
| Review flow | Dispensary groups then individual orders | Simplified order list with checkboxes |
| Empty state | Not applicable (first scan) | “No new orders” with broader search option |
| Advanced fallback | N/A | “Advanced Options” link to full wizard |
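For the automatic Quick Sync date range, one possible way to express "last sync date to now" is a Gmail search filter built from the stored sync timestamp. Gmail's `after:` operator accepts a Unix timestamp in seconds, which avoids the timezone ambiguity of date strings. The function name below is hypothetical, and whether High IQ builds its filter this way is an assumption.

```typescript
// Build a Gmail date filter from the last sync time (epoch milliseconds).
// No "before:" term is needed because the range ends at the present.
function quickSyncDateFilter(lastSyncMs: number): string {
  const afterSeconds = Math.floor(lastSyncMs / 1000);
  return `after:${afterSeconds}`;
}
```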

Concurrency & Rate Limits

To prevent Gmail API abuse and ensure stable performance:
| Resource | Limit | Scope |
| --- | --- | --- |
| Scans | 1 concurrent | Per user |
| Email imports | 5 concurrent | Global (Trigger.dev queue) |
| Batch imports | 3 concurrent | Global (Trigger.dev queue) |
| Gmail API calls | Standard quota | Google’s per-project limits |

Crowdsourced Domain Intelligence

The crowdsourced domain database is a key differentiator of Gmail Sync. Here is how it works:
  1. Seeded data — The database starts with 20+ pre-loaded domains from major POS platforms (Dutchie, Jane, Leafly, etc.)
  2. User contributions — Every successful order import upserts the sender’s email domain to the shared database with an incremented report count
  3. Confidence scoring — Domains with more reports get higher confidence scores, making them more likely to be used in scans
  4. No user linking — Contributions are completely anonymous. The database only stores the domain name, dispensary name, a sample subject line, and a report counter.
  5. Verification — Domains can be marked as verified by administrators, giving them the highest confidence tier
Over time, this creates a growing network effect: the more users import orders, the better Gmail Sync becomes at finding receipts for all users.
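The contribution and scoring steps above can be illustrated with an in-memory stand-in for the shared table. This is a sketch under stated assumptions: the real upsert runs against Supabase, the field names are simplified, and the scoring curve in `domainConfidence` is entirely hypothetical. The guide only says that more reports mean higher confidence and that verified domains sit in the highest tier.

```typescript
// Simplified record: no user identifier is stored, matching the anonymity rule above.
interface DomainRecord {
  domain: string;
  dispensaryName: string;
  sampleSubject: string;
  reportCount: number;
  verified: boolean;
}

// Insert a new domain or increment the report count of an existing one.
function upsertDomain(
  db: Map<string, DomainRecord>,
  report: { domain: string; dispensaryName: string; sampleSubject: string }
): DomainRecord {
  const key = report.domain.toLowerCase();
  const existing = db.get(key);
  const record: DomainRecord = existing
    ? { ...existing, reportCount: existing.reportCount + 1 }
    : {
        domain: key,
        dispensaryName: report.dispensaryName,
        sampleSubject: report.sampleSubject,
        reportCount: 1,
        verified: false,
      };
  db.set(key, record);
  return record;
}

// Hypothetical scoring curve: rises with report count and levels off below the verified tier.
function domainConfidence(record: DomainRecord): number {
  if (record.verified) return 1.0;
  return 0.5 + 0.4 * (1 - 1 / record.reportCount);
}
```

A saturating curve like this one keeps a heavily reported but unverified domain from ever outranking an administrator-verified one.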