Overview
Gmail Sync is one of High IQ’s most architecturally complex features, spanning four systems that work together to scan, classify, and import dispensary orders from a user’s Gmail inbox. This guide explains how it all fits together.

This is a technical deep-dive for users curious about how Gmail Sync works under the hood. For usage instructions, see the Gmail Sync help article.
System Architecture
Gmail Sync coordinates four independent systems:
| System | Role | Technology |
|---|---|---|
| Mobile App | Wizard UI, user interactions, real-time display | React Native, Expo Router |
| Convex | Real-time state management, user data, webhook receiver | Convex (reactive database) |
| Trigger.dev | Background job orchestration, Gmail API access, AI classification | Trigger.dev v3 (serverless tasks) |
| Supabase | Crowdsourced dispensary domain database | PostgreSQL with RPC functions |
Why Four Systems?
Each system handles what it does best:
- Convex excels at real-time subscriptions: the wizard UI updates instantly as background jobs progress, without polling.
- Trigger.dev handles long-running jobs (scanning years of email can take minutes) with built-in retries, concurrency control, and no timeout limits.
- Supabase provides a shared PostgreSQL database for crowdsourced domain data that is independent of any single user’s account.
- The Mobile App provides the guided wizard experience where users review and approve everything before import.
The Pipeline: Step by Step
1. User Starts a Scan
When the user taps “Start Scan” in the wizard, the mobile app calls a Convex action (startGmailScan) which:
- Creates an import job record in the gmailImportJobs table with status scanning
- Sends a REST API request to Trigger.dev to start the gmail-scan task
- Returns the job ID to the mobile app for real-time tracking

The mobile app subscribes to the job record through a Convex useQuery hook, so any status changes from background tasks appear instantly in the UI.
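The three steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the `JobStore` and `TaskTrigger` interfaces and the `newId` helper are hypothetical stand-ins for the Convex table and the Trigger.dev REST call, not either library's actual API. Only the names startGmailScan, gmail-scan, and the scanning status come from the description above.

```typescript
// Hypothetical job shape; the real gmailImportJobs schema is not shown in this guide.
interface ImportJob {
  id: string;
  userId: string;
  status: "scanning" | "ready_for_review" | "completed" | "failed";
}

// Hypothetical stand-in for the Convex table.
interface JobStore {
  insert(job: ImportJob): void;
}

// Hypothetical stand-in for the REST request that starts a background task.
interface TaskTrigger {
  trigger(taskId: string, payload: { jobId: string; userId: string }): Promise<void>;
}

async function startGmailScan(
  userId: string,
  store: JobStore,
  tasks: TaskTrigger,
  newId: () => string,
): Promise<string> {
  const jobId = newId();
  // 1. Create the import job record with status "scanning".
  store.insert({ id: jobId, userId, status: "scanning" });
  // 2. Ask the background system to start the gmail-scan task.
  await tasks.trigger("gmail-scan", { jobId, userId });
  // 3. Return the job ID so the client can subscribe to the job.
  return jobId;
}
```

Injecting the store and trigger keeps the ordering of the steps visible without depending on any real backend.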
2. Email Scanning (3-Layer Search)
The scan task runs on Trigger.dev with server-side access to the user’s Gmail via a fresh OAuth token retrieved from Clerk’s Backend API. It uses a 3-layer search strategy:
Layer 1: Personal Dispensary Domains
Source: The user’s saved dispensaries in High IQ (website URLs and email addresses)

How it works: Before the scan starts, Convex extracts email domains from the user’s dispensary records. For example, if the user saved “Green Leaf” with website greenleaf.com, the scan searches for from:@greenleaf.com.

Confidence: 0.95 (very high, because the user explicitly saved this dispensary)

Why it matters: This is the most accurate layer because the user has already confirmed these are their dispensaries. Even obscure local shops are covered if the user has saved them.
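As an illustration of the domain extraction described for Layer 1, here is a minimal sketch. The `SavedDispensary` shape and the helper names `toDomain` and `layer1Queries` are assumptions made for this example; the guide does not show the real record schema or normalization rules. Only the `from:@domain` query form comes from the text above.

```typescript
// Hypothetical record shape for a saved dispensary.
interface SavedDispensary {
  name: string;
  website?: string; // e.g. "https://www.greenleaf.com/menu" or "greenleaf.com"
  email?: string;   // e.g. "orders@greenleaf.com"
}

// Reduce a website URL or email address to a bare domain, or null if unusable.
function toDomain(value: string): string | null {
  const trimmed = value.trim().toLowerCase();
  const at = trimmed.lastIndexOf("@");
  let host: string;
  if (at >= 0) {
    // Email address: keep everything after the "@".
    host = trimmed.slice(at + 1);
  } else {
    // URL: drop the scheme, the path/query/fragment, and a leading "www.".
    const withoutScheme = trimmed.replace(/^[a-z]+:\/\//, "");
    host = (withoutScheme.split(/[/?#]/)[0] ?? "").replace(/^www\./, "");
  }
  return host.includes(".") ? host : null;
}

// Build one de-duplicated Gmail search query per domain.
function layer1Queries(dispensaries: SavedDispensary[]): string[] {
  const domains = new Set<string>();
  for (const d of dispensaries) {
    for (const raw of [d.website, d.email]) {
      const domain = raw ? toDomain(raw) : null;
      if (domain) domains.add(domain);
    }
  }
  return Array.from(domains).map((domain) => `from:@${domain}`);
}
```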
Layer 2: Crowdsourced Community Domains
Source: The dispensary_email_domains table in Supabase, contributed anonymously by all High IQ users

How it works: The app fetches verified dispensary domains from Supabase (e.g., noreply@dutchiepay.com, orders@iheartjane.com) and batches them into Gmail search queries. Domains are searched in groups of 20 to stay within Gmail’s query length limits.

Confidence: 0.90 (high, because these domains have been confirmed by multiple users)

Why it matters: This is the “network effect” layer. As more users import orders, the shared domain database grows, making detection better for everyone. The database is pre-seeded with 20+ major POS platform domains.
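The batching described for Layer 2 might look like the sketch below. The group size of 20 comes from the text; the `chunk` and `layer2Queries` helpers and the exact query shape (senders joined with Gmail's `OR` operator inside a parenthesized `from:` group) are assumptions for illustration, not the app's confirmed query format.

```typescript
const DOMAINS_PER_QUERY = 20; // group size stated above

// Split a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Build one Gmail search string per group, OR-ing the senders together.
// Assumed query shape: from:(a OR b OR c)
function layer2Queries(senders: string[]): string[] {
  return chunk(senders, DOMAINS_PER_QUERY).map(
    (group) => `from:(${group.join(" OR ")})`,
  );
}
```

With 45 known senders, this produces three queries covering 20, 20, and 5 senders.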
Layer 3: Generic Keyword Patterns
Source: Built-in keyword queries targeting cannabis-specific receipt language

How it works: Four keyword queries search for combinations of:
- Receipt/order confirmation terms + dispensary/cannabis terms
- Receipt/order terms + product types (flower, edible, concentrate, vape)
- THC/CBD/indica/sativa + purchase/order terms
- Known POS platform names (dutchie, iheartjane, leafly, weedmaps)
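To make the four combinations concrete, here is a hedged sketch of how such queries could be assembled. The product types, cannabinoid terms, and platform names are taken from the list above; the receipt and purchase wordings (`RECEIPT_TERMS`, `PURCHASE_TERMS`) and the `anyOf` helper are assumptions, since the actual built-in queries are not shown in this guide.

```typescript
// Join terms with Gmail's OR operator, quoting multi-word phrases.
function anyOf(terms: string[]): string {
  const quoted = terms.map((t) => (t.includes(" ") ? `"${t}"` : t));
  return `(${quoted.join(" OR ")})`;
}

// Assumed wording; the real receipt and purchase terms are not documented here.
const RECEIPT_TERMS = ["receipt", "order confirmation"];
const PURCHASE_TERMS = ["purchase", "order"];

function layer3Queries(): string[] {
  return [
    // Receipt/order confirmation terms + dispensary/cannabis terms
    `${anyOf(RECEIPT_TERMS)} ${anyOf(["dispensary", "cannabis"])}`,
    // Receipt/order terms + product types
    `${anyOf(RECEIPT_TERMS)} ${anyOf(["flower", "edible", "concentrate", "vape"])}`,
    // THC/CBD/indica/sativa + purchase/order terms
    `${anyOf(["THC", "CBD", "indica", "sativa"])} ${anyOf(PURCHASE_TERMS)}`,
    // Known POS platform names
    anyOf(["dutchie", "iheartjane", "leafly", "weedmaps"]),
  ];
}
```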
3. AI Classification
After scanning, the task chains directly into AI classification using triggerAndWait, a Trigger.dev primitive that triggers the classify task and waits for its result before the scan task finishes, so scanning and classification complete as a single unit.
Emails from Layers 1 and 2 (known dispensary domains) are auto-classified with high confidence — they already matched a confirmed dispensary domain, so no AI analysis is needed.
Only Layer 3 (generic keyword) matches go through AI classification using GPT-4o-mini:
| Input | What the AI Receives |
|---|---|
| From header | orders@greenleaf.com |
| Subject line | Your order #4521 is ready for pickup |
| Email snippet | First ~200 characters of body text |

| Output | What the AI Returns |
|---|---|
| Is dispensary receipt? | true / false |
| Confidence | 0.0 to 1.0 |
| Dispensary name | Extracted business name |
| Summary | “3 items from Green Leaf, $85 total” |
| Estimated items | Approximate item count |
| Estimated total | Dollar amount if visible |
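The routing rule above (known-domain matches skip the model, keyword matches go to it) and the input/output tables can be sketched as types plus a small partition function. The field names and the `routeCandidates` helper are hypothetical; the confidence values 0.95 and 0.90 are the ones stated for Layers 1 and 2.

```typescript
type Layer = 1 | 2 | 3;

// What the classifier receives (see the input table above), plus the matching layer.
interface Candidate {
  messageId: string;
  from: string;
  subject: string;
  snippet: string; // first ~200 characters of body text
  layer: Layer;
}

// What the classifier returns (see the output table above).
interface Classification {
  isDispensaryReceipt: boolean;
  confidence: number; // 0.0 to 1.0
  dispensaryName?: string;
  summary?: string;
  estimatedItems?: number;
  estimatedTotal?: number;
}

// Confidence assigned to known-domain matches, per layer.
const KNOWN_DOMAIN_CONFIDENCE: Record<1 | 2, number> = { 1: 0.95, 2: 0.9 };

function routeCandidates(candidates: Candidate[]): {
  autoClassified: Array<{ candidate: Candidate; result: Classification }>;
  needsModel: Candidate[];
} {
  const autoClassified: Array<{ candidate: Candidate; result: Classification }> = [];
  const needsModel: Candidate[] = [];
  for (const candidate of candidates) {
    const layer = candidate.layer;
    if (layer === 3) {
      // Generic keyword match: only these are sent to the AI model.
      needsModel.push(candidate);
    } else {
      // Known dispensary domain: accepted without AI analysis.
      autoClassified.push({
        candidate,
        result: { isDispensaryReceipt: true, confidence: KNOWN_DOMAIN_CONFIDENCE[layer] },
      });
    }
  }
  return { autoClassified, needsModel };
}
```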
4. User Review
Once classification is complete, the import job status changes to ready_for_review. The Convex real-time subscription instantly updates the mobile UI, transitioning the wizard from the scanning animation to the review screens.
Candidates are presented in two review steps:
- Dispensary Groups — Emails grouped by dispensary name with toggle switches. Users can enable or disable entire dispensaries at once.
- Individual Orders — Each email shown with subject, date, confidence badge, and a toggle. Users fine-tune their selection.
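The two review steps can be modeled as operations on a flat list of candidates. This is a sketch only: the `ReviewCandidate` shape and the helper names are assumptions, and the real wizard state may be organized differently.

```typescript
// Hypothetical shape for one email shown in the review screens.
interface ReviewCandidate {
  messageId: string;
  dispensaryName: string;
  subject: string;
  selected: boolean;
}

// Step 1: bucket candidates by dispensary name for the group toggles.
function groupByDispensary(candidates: ReviewCandidate[]): Map<string, ReviewCandidate[]> {
  const groups = new Map<string, ReviewCandidate[]>();
  for (const c of candidates) {
    const group = groups.get(c.dispensaryName) ?? [];
    group.push(c);
    groups.set(c.dispensaryName, group);
  }
  return groups;
}

// Flipping a dispensary toggle selects or deselects every email in that group.
function setDispensaryEnabled(
  candidates: ReviewCandidate[],
  dispensaryName: string,
  enabled: boolean,
): ReviewCandidate[] {
  return candidates.map((c) =>
    c.dispensaryName === dispensaryName ? { ...c, selected: enabled } : c,
  );
}

// Step 2: flipping a single order toggle fine-tunes the selection.
function setOrderSelected(
  candidates: ReviewCandidate[],
  messageId: string,
  selected: boolean,
): ReviewCandidate[] {
  return candidates.map((c) => (c.messageId === messageId ? { ...c, selected } : c));
}
```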
5. Order Import
When the user taps “Start Import,” a batch import task is triggered on Trigger.dev. It processes each selected email sequentially:
- Fetch email content: Retrieves the full email body from Gmail using a fresh OAuth token
- Parse receipt: Sends the email text to the Hono API receipt parser, which uses AI to extract structured order data (line items, prices, quantities)
- Check duplicates: Before creating an order, checks if an order with the same Gmail message ID already exists. If so, marks it as “duplicate” and skips creation.
- Create order: Posts the parsed data back to Convex via webhook, which creates the order record with items
- Auto-save dispensary: If the dispensary is new, creates a dispensary record and logs a visit
- Crowdsource domain: Upserts the sender’s email domain to the shared Supabase database
- Report progress: Posts a progress webhook so the mobile UI updates in real time
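The per-email loop can be sketched with injected dependencies. Everything in `ImportDeps` is a hypothetical stand-in for the Gmail fetch, the receipt parser, and the Convex webhooks; the dispensary auto-save and domain upsert steps are omitted for brevity. The sketch only illustrates the sequential order, the duplicate check before creation, and the per-email progress report.

```typescript
type ImportOutcome = "imported" | "duplicate" | "failed";

// Hypothetical parsed-order shape.
interface ParsedOrder {
  dispensaryName: string;
  total: number;
  items: Array<{ name: string; price: number; quantity: number }>;
}

// Hypothetical stand-ins for the external systems involved.
interface ImportDeps {
  fetchEmailBody(messageId: string): Promise<string>;
  parseReceipt(body: string): Promise<ParsedOrder>;
  orderExists(messageId: string): Promise<boolean>;
  createOrder(messageId: string, order: ParsedOrder): Promise<void>;
  reportProgress(messageId: string, outcome: ImportOutcome): Promise<void>;
}

async function importBatch(
  messageIds: string[],
  deps: ImportDeps,
): Promise<Record<ImportOutcome, number>> {
  const counts: Record<ImportOutcome, number> = { imported: 0, duplicate: 0, failed: 0 };
  // Emails are processed one at a time, in order.
  for (const id of messageIds) {
    let outcome: ImportOutcome;
    try {
      const body = await deps.fetchEmailBody(id);
      const order = await deps.parseReceipt(body);
      if (await deps.orderExists(id)) {
        // Same Gmail message ID already imported: skip creation.
        outcome = "duplicate";
      } else {
        await deps.createOrder(id, order);
        outcome = "imported";
      }
    } catch {
      // A failure on one email does not stop the rest of the batch.
      outcome = "failed";
    }
    counts[outcome] += 1;
    await deps.reportProgress(id, outcome);
  }
  return counts;
}
```

Catching errors per email is a design choice in this sketch so that one unparseable receipt is reported as a failure rather than aborting the batch.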
6. Completion
When all emails are processed, the batch task sends a batch.complete webhook that:
- Marks the import job as completed
- Updates the user’s sync state with the current timestamp (for future Quick Syncs)

The mobile UI then shows a completion summary with total orders imported, dispensaries found, total spend, and any failures.
Webhook Communication
Trigger.dev tasks communicate results back to Convex via HTTP webhooks. This decoupled design means each system can be deployed and scaled independently.
| Event | Sent By | Purpose |
|---|---|---|
| scan.complete | Scan task | Delivers candidate email list |
| classify.complete | Classify task | Delivers AI classification results |
| import.progress | Import task (per email) | Reports success/failure for each order |
| batch.complete | Batch import task | Marks the job as done |
| batch.failed | Batch import task | Marks the job as failed with error |
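One way to picture the receiving side is a discriminated union over the five event names, with a pure function that folds each event into the job state. The event names come from the table above; the payload fields, the `JobState` shape, and `applyEvent` are assumptions for illustration and do not reflect the real webhook schema.

```typescript
// Payload fields are hypothetical; only the event names come from the table.
type WebhookEvent =
  | { type: "scan.complete"; jobId: string; candidateCount: number }
  | { type: "classify.complete"; jobId: string }
  | { type: "import.progress"; jobId: string; messageId: string; success: boolean }
  | { type: "batch.complete"; jobId: string }
  | { type: "batch.failed"; jobId: string; error: string };

interface JobState {
  status: "scanning" | "ready_for_review" | "completed" | "failed";
  candidates: number;
  imported: number;
  failed: number;
  error?: string;
}

// Apply one webhook event to the job state and return the updated copy.
function applyEvent(job: JobState, event: WebhookEvent): JobState {
  switch (event.type) {
    case "scan.complete":
      return { ...job, candidates: event.candidateCount };
    case "classify.complete":
      return { ...job, status: "ready_for_review" };
    case "import.progress":
      return event.success
        ? { ...job, imported: job.imported + 1 }
        : { ...job, failed: job.failed + 1 };
    case "batch.complete":
      return { ...job, status: "completed" };
    case "batch.failed":
      return { ...job, status: "failed", error: event.error };
  }
}
```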
Data Lifecycle
Understanding what data is stored and for how long:
| Data | Stored Where | Retention |
|---|---|---|
| Extracted order data (items, prices) | Convex orders table | Permanent (user’s data) |
| Dispensary records | Convex dispensaries table | Permanent (user’s data) |
| Email metadata (subject, sender, snippet) | Convex gmailImportJobs | 7 days (auto-purged by daily cron) |
| Raw email content | Trigger.dev task memory | Never stored (processed and discarded) |
| Google OAuth tokens | Clerk (server-side) | Managed by Clerk’s token lifecycle |
| Crowdsourced domains | Supabase dispensary_email_domains | Permanent (anonymous, shared) |
The 7-day purge of email metadata is handled by a Convex cron job that runs daily. This ensures that sensitive email data (subjects, snippets) does not persist beyond the review window.
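The purge rule can be expressed as a pure function that a daily job would run over stored import jobs. This is a sketch under assumptions: the `StoredJob` shape is hypothetical, and measuring age from a `createdAt` timestamp (rather than, say, completion time) is a guess that this guide does not confirm.

```typescript
const RETENTION_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

// Hypothetical stored job shape holding the email metadata to be purged.
interface StoredJob {
  id: string;
  createdAt: number; // epoch milliseconds (assumed reference point for age)
  candidates: Array<{ subject: string; sender: string; snippet: string }>;
}

// Return jobs with email metadata removed once they are older than the retention window.
function purgeExpiredMetadata(jobs: StoredJob[], now: number): StoredJob[] {
  return jobs.map((job) =>
    now - job.createdAt > RETENTION_MS ? { ...job, candidates: [] } : job,
  );
}
```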
Quick Sync vs Full Wizard
| Aspect | Full Wizard (first sync) | Quick Sync (returning) |
|---|---|---|
| Entry point | 6-step wizard with date range picker | Single-tap “Sync Recent Orders” |
| Date range | User-selected (6 months to all time) | Automatic (last sync date to now) |
| Review flow | Dispensary groups then individual orders | Simplified order list with checkboxes |
| Empty state | Not applicable (first scan) | “No new orders” with broader search option |
| Advanced fallback | N/A | “Advanced Options” link to full wizard |
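For the automatic Quick Sync date range (last sync date to now), a date filter could be built as shown below. Gmail's `after:` and `before:` search operators accept epoch seconds; the `quickSyncDateFilter` helper itself and the choice to round outward to whole seconds are assumptions for this sketch.

```typescript
// Build a Gmail search fragment covering the window from the last sync to now.
function quickSyncDateFilter(lastSyncAt: Date, now: Date): string {
  // Round the start down and the end up so the window never shrinks.
  const after = Math.floor(lastSyncAt.getTime() / 1000);
  const before = Math.ceil(now.getTime() / 1000);
  return `after:${after} before:${before}`;
}
```

The resulting fragment would be appended to each layer's search query so that only mail received since the previous sync is scanned.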
Concurrency & Rate Limits
To prevent Gmail API abuse and ensure stable performance:
| Resource | Limit | Scope |
|---|---|---|
| Scans | 1 concurrent | Per user |
| Email imports | 5 concurrent | Global (Trigger.dev queue) |
| Batch imports | 3 concurrent | Global (Trigger.dev queue) |
| Gmail API calls | Standard quota | Google’s per-project limits |
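To illustrate what a concurrency limit such as "5 concurrent email imports" means in practice, here is a generic promise-pool sketch. It is not how Trigger.dev queues are implemented or configured; `runWithLimit` is a hypothetical helper that only demonstrates the idea of capping in-flight work.

```typescript
// Run `worker` over all items with at most `limit` calls in flight at once.
async function runWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T);
    }
  }

  const laneCount = Math.max(0, Math.min(limit, items.length));
  const lanes: Promise<void>[] = [];
  for (let i = 0; i < laneCount; i++) lanes.push(lane());
  await Promise.all(lanes);
  return results; // results keep the same order as `items`
}
```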
Crowdsourced Domain Intelligence
The crowdsourced domain database is a key differentiator of Gmail Sync. Here is how it works:
- Seeded data: The database starts with 20+ pre-loaded domains from major POS platforms (Dutchie, Jane, Leafly, etc.)
- User contributions: Every successful order import upserts the sender’s email domain to the shared database with an incremented report count
- Confidence scoring: Domains with more reports get higher confidence scores, making them more likely to be used in scans
- No user linking: Contributions are completely anonymous. The database only stores the domain name, dispensary name, a sample subject line, and a report counter.
- Verification: Domains can be marked as verified by administrators, giving them the highest confidence tier
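The upsert-and-count behavior can be sketched with an in-memory map standing in for the Supabase table. The stored fields match the ones listed above, plus a `verified` flag for the administrator tier. The scoring curve in `confidenceFor` is entirely hypothetical: this guide does not give the real formula, so the numbers below only demonstrate "more reports, higher score, verified highest".

```typescript
// Fields listed above, plus a flag for administrator verification. No user identifier is stored.
interface DomainRecord {
  domain: string;
  dispensaryName: string;
  sampleSubject: string;
  reportCount: number;
  verified: boolean;
}

// Insert a new domain with a count of 1, or increment the count of an existing one.
function upsertDomain(
  db: Map<string, DomainRecord>,
  domain: string,
  dispensaryName: string,
  sampleSubject: string,
): DomainRecord {
  const key = domain.toLowerCase();
  const existing = db.get(key);
  const record: DomainRecord = existing
    ? { ...existing, reportCount: existing.reportCount + 1 }
    : { domain: key, dispensaryName, sampleSubject, reportCount: 1, verified: false };
  db.set(key, record);
  return record;
}

// Hypothetical scoring curve (assumption, not the real formula).
function confidenceFor(record: DomainRecord): number {
  if (record.verified) return 0.9; // verified domains sit in the top tier
  return Math.min(0.85, 0.5 + 0.05 * record.reportCount);
}
```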
