Gmail Sync Architecture

Overview

Gmail Sync is one of High IQ’s most architecturally complex features, spanning four systems that work together to scan, classify, and import dispensary orders from a user’s Gmail inbox. This guide explains how it all fits together.

This is a technical deep-dive for users curious about how Gmail Sync works under the hood. For usage instructions, see the Gmail Sync help article.

System Architecture

Gmail Sync coordinates four independent systems:

System	Role	Technology
Mobile App	Wizard UI, user interactions, real-time display	React Native, Expo Router
Convex	Real-time state management, user data, webhook receiver	Convex (reactive database)
Trigger.dev	Background job orchestration, Gmail API access, AI classification	Trigger.dev v3 (serverless tasks)
Supabase	Crowdsourced dispensary domain database	PostgreSQL with RPC functions

Mobile App                    Convex                         Trigger.dev
-----------                   ------                         -----------
Gmail Sync Wizard ---------> gmailSync mutations -------->  gmail-scan-task
  Connect Step                 real-time queries               |
  Scan Step                    webhook handler (HTTP)          +--> gmail-classify-task
  Review Dispensaries          import job tracking             |
  Review Orders                order creation                  +--> gmail-batch-import-task
  Import Settings                  ^                           |      +--> gmail-import-task (x N)
  Import Progress                  |                           |
                                   +--- webhook POST ----------+
                                                               |
                                                            Supabase
                                                            --------
                                                            dispensary_email_domains
                                                            (crowdsourced, community-shared)

Why Four Systems?

Each system handles what it does best:

Convex excels at real-time subscriptions — the wizard UI updates instantly as background jobs progress, without polling.
Trigger.dev handles long-running jobs (scanning years of email can take minutes) with built-in retries and concurrency control, and without the short execution time limits of typical serverless functions.
Supabase provides a shared PostgreSQL database for crowdsourced domain data that is independent of any single user’s account.
The Mobile App provides the guided wizard experience where users review and approve everything before import.

The Pipeline: Step by Step

1. User Starts a Scan

When the user taps “Start Scan” in the wizard, the mobile app calls a Convex action (startGmailScan) which:

Creates an import job record in the gmailImportJobs table with status scanning
Sends a REST API request to Trigger.dev to start the gmail-scan task
Returns the job ID to the mobile app for real-time tracking

The mobile app subscribes to the import job via a Convex useQuery hook, so any status changes from background tasks appear instantly in the UI.
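The three steps of startGmailScan can be sketched as plain TypeScript with the Convex write and the Trigger.dev REST call injected as dependencies. This is a minimal sketch under assumptions: the `ScanDeps` interface and its method names are hypothetical stand-ins, not the app's real API, and marking the job `failed` when the trigger call errors is an assumed safeguard that the article does not describe.

```typescript
type JobStatus = "scanning" | "ready_for_review" | "completed" | "failed";

interface ImportJob {
  id: string;
  userId: string;
  status: JobStatus;
  createdAt: number; // epoch milliseconds
}

// Hypothetical dependencies standing in for the Convex table write and the
// Trigger.dev REST call; the names are illustrative only.
interface ScanDeps {
  insertJob(job: Omit<ImportJob, "id">): Promise<string>;
  triggerScanTask(payload: { jobId: string; userId: string }): Promise<void>;
  setStatus(jobId: string, status: JobStatus): Promise<void>;
}

async function startGmailScan(userId: string, deps: ScanDeps): Promise<string> {
  // 1. Create the job record first, so the client has a row to subscribe to.
  const jobId = await deps.insertJob({ userId, status: "scanning", createdAt: Date.now() });
  try {
    // 2. Hand the long-running scan to the background task.
    await deps.triggerScanTask({ jobId, userId });
  } catch (err) {
    // Assumption: a failed trigger marks the job failed instead of leaving it "scanning".
    await deps.setStatus(jobId, "failed");
    throw err;
  }
  // 3. Return the ID; the app subscribes to this job with a reactive query.
  return jobId;
}
```

Creating the record before triggering the task means the subscription always has a row to watch, even if the background task reports back very quickly.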

2. Email Scanning (3-Layer Search)

The scan task runs on Trigger.dev with server-side access to the user’s Gmail via a fresh OAuth token retrieved from Clerk’s Backend API. It performs a 3-layer search strategy:

Layer 1: Personal Dispensary Domains

Source: The user’s saved dispensaries in High IQ (website URLs and email addresses)
How it works: Before the scan starts, Convex extracts email domains from the user’s dispensary records. For example, if the user saved “Green Leaf” with website greenleaf.com, the scan searches for from:@greenleaf.com.
Confidence: 0.95 (very high: the user explicitly saved this dispensary)
Why it matters: This is the most accurate layer because the user has already confirmed these are their dispensaries. Even obscure local shops are covered if the user has saved them.
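Domain extraction for Layer 1 can be sketched as follows. The helper names are hypothetical, and stripping a leading `www.` so that a website and an email address from the same shop collapse to one domain is an assumption rather than documented behavior.

```typescript
// Hypothetical helper: derive a sender domain from a saved website URL or
// email address. Returns null when no usable domain can be found.
function extractDomain(value: string): string | null {
  const trimmed = value.trim().toLowerCase();
  if (trimmed === "") return null;
  let host: string;
  if (trimmed.includes("@") && !trimmed.includes("/")) {
    // Email address: keep everything after the last "@".
    host = trimmed.slice(trimmed.lastIndexOf("@") + 1);
  } else {
    // Website: URL() needs a scheme, so add one if the user omitted it.
    const withScheme = /^[a-z][a-z0-9+.-]*:\/\//.test(trimmed) ? trimmed : `https://${trimmed}`;
    try {
      host = new URL(withScheme).hostname;
    } catch {
      return null;
    }
  }
  // Assumption: treat www.greenleaf.com and greenleaf.com as the same sender.
  host = host.replace(/^www\./, "");
  return host.includes(".") ? host : null;
}

// Build one Gmail "from:" query per unique domain across all saved dispensaries.
function personalDomainQueries(saved: { website?: string; email?: string }[]): string[] {
  const domains = new Set<string>();
  for (const d of saved) {
    for (const v of [d.website, d.email]) {
      const domain = v ? extractDomain(v) : null;
      if (domain) domains.add(domain);
    }
  }
  return Array.from(domains).map((domain) => `from:@${domain}`);
}
```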

Layer 2: Crowdsourced Community Domains

Source: The dispensary_email_domains table in Supabase, contributed anonymously by all High IQ users
How it works: The app fetches verified dispensary domains from Supabase (for example, dutchiepay.com and iheartjane.com, which send from addresses such as noreply@dutchiepay.com and orders@iheartjane.com) and batches them into Gmail search queries. Domains are searched in groups of 20 to stay within Gmail’s query length limits.
Confidence: 0.90 (high: these domains have been confirmed by multiple users)
Why it matters: This is the “network effect” layer. As more users import orders, the shared domain database grows, making detection better for everyone. The database is pre-seeded with 20+ major POS platform domains.
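Batching domains into groups of 20 amounts to chunking the list and joining each chunk with Gmail's `OR` operator. A minimal sketch, assuming the domains arrive as plain strings; the function names are hypothetical.

```typescript
const DOMAINS_PER_QUERY = 20;

// Split a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// One Gmail query per group, e.g. "from:@a.com OR from:@b.com OR ...".
function communityDomainQueries(domains: string[]): string[] {
  // Normalize and deduplicate before batching so no slot is wasted.
  const unique = Array.from(new Set(domains.map((d) => d.trim().toLowerCase()))).filter(Boolean);
  return chunk(unique, DOMAINS_PER_QUERY).map((group) =>
    group.map((d) => `from:@${d}`).join(" OR "),
  );
}
```

With 45 community domains this yields three queries of 20, 20, and 5 domains, trading a few extra API calls for queries that stay well under Gmail's length limit.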

Layer 3: Generic Keyword Patterns

Source: Built-in keyword queries targeting cannabis-specific receipt language
How it works: Four keyword queries search for combinations of:

Receipt/order confirmation terms + dispensary/cannabis terms
Receipt/order terms + product types (flower, edible, concentrate, vape)
THC/CBD/indica/sativa + purchase/order terms
Known POS platform names (dutchie, iheartjane, leafly, weedmaps)

Confidence: 0.50 (medium: these need AI classification to confirm)
Why it matters: This catches receipts from dispensaries not yet in any domain database. It casts a wide net and relies on AI classification to filter out false positives.
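The four keyword combinations can be expressed with Gmail's search syntax, where parentheses group terms, `OR` (uppercase) matches either term, and a space means AND. The term lists below are illustrative assumptions, not the exact queries Gmail Sync uses.

```typescript
// OR together a list of terms, quoting multi-word phrases for exact matching.
function anyOf(terms: string[]): string {
  const quoted = terms.map((t) => (t.includes(" ") ? `"${t}"` : t));
  return `(${quoted.join(" OR ")})`;
}

// Illustrative term lists (assumptions, not the production lists).
const receiptTerms = ["receipt", "order confirmation", "your order"];
const purchaseTerms = ["purchase", "order"];

const keywordQueries: string[] = [
  // Receipt/order terms AND dispensary/cannabis terms
  `${anyOf(receiptTerms)} ${anyOf(["dispensary", "cannabis"])}`,
  // Receipt/order terms AND product types
  `${anyOf(receiptTerms)} ${anyOf(["flower", "edible", "concentrate", "vape"])}`,
  // Cannabinoid/strain terms AND purchase terms
  `${anyOf(["THC", "CBD", "indica", "sativa"])} ${anyOf(purchaseTerms)}`,
  // Known POS platform names on their own
  anyOf(["dutchie", "iheartjane", "leafly", "weedmaps"]),
];
```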

All results are deduplicated by Gmail message ID across layers, so a receipt matching multiple layers is only shown once (with the highest confidence).
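Cross-layer deduplication is a keyed reduction: keep one candidate per Gmail message ID and replace it only when a later match has higher confidence. A minimal sketch with a hypothetical `Candidate` shape:

```typescript
interface Candidate {
  messageId: string; // Gmail message ID
  layer: 1 | 2 | 3;
  confidence: number;
}

// Keep one candidate per message ID, preferring the highest confidence.
function dedupeByMessageId(candidates: Candidate[]): Candidate[] {
  const best = new Map<string, Candidate>();
  for (const c of candidates) {
    const existing = best.get(c.messageId);
    if (!existing || c.confidence > existing.confidence) {
      best.set(c.messageId, c);
    }
  }
  // Map preserves first-insertion order, so output order is stable.
  return Array.from(best.values());
}
```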

3. AI Classification

After scanning, the task chains directly into AI classification using triggerAndWait, a Trigger.dev primitive that triggers the classify task as a child run and suspends the scan task until that run returns its result, so the scan does not finish until classification has finished. Emails from Layers 1 and 2 (known dispensary domains) are auto-classified with high confidence: they already matched a confirmed dispensary domain, so no AI analysis is needed. Only Layer 3 (generic keyword) matches go through AI classification using GPT-4o-mini:

Input	What the AI Receives
From header	`orders@greenleaf.com`
Subject line	`Your order #4521 is ready for pickup`
Email snippet	First ~200 characters of body text

Output	What the AI Returns
Is dispensary receipt?	`true` / `false`
Confidence	0.0 to 1.0
Dispensary name	Extracted business name
Summary	“3 items from Green Leaf, $85 total”
Estimated items	Approximate item count
Estimated total	Dollar amount if visible
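The routing rule and the model input described above can be sketched as two small functions. The `ScanCandidate` type and function names are hypothetical; the 200-character cut mirrors the “first ~200 characters” in the input table.

```typescript
interface ScanCandidate {
  messageId: string;
  layer: 1 | 2 | 3;
  confidence: number; // 0.95, 0.90, or 0.50 depending on the layer
  from: string;
  subject: string;
  snippet: string;
}

// Layers 1 and 2 already matched a known dispensary domain, so they skip the
// model; only Layer 3 keyword matches are sent for AI classification.
function routeForClassification(candidates: ScanCandidate[]): {
  autoAccepted: ScanCandidate[];
  needsAi: ScanCandidate[];
} {
  const autoAccepted: ScanCandidate[] = [];
  const needsAi: ScanCandidate[] = [];
  for (const c of candidates) {
    (c.layer === 3 ? needsAi : autoAccepted).push(c);
  }
  return { autoAccepted, needsAi };
}

// The model sees only the three fields from the input table, not the full body.
function toModelInput(c: ScanCandidate): { from: string; subject: string; snippet: string } {
  return { from: c.from, subject: c.subject, snippet: c.snippet.slice(0, 200) };
}
```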

Emails are classified in batches of 20 for efficiency. If the AI fails for a batch, those emails are marked as low-confidence (0.3) for manual review rather than being silently dropped.
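The batch-with-fallback behavior can be sketched as a loop over slices of 20 that catches a failed model call and keeps every email in that batch at confidence 0.3. The `ClassifyBatch` signature is an assumed stand-in for the GPT-4o-mini call, and flagging fallback emails as possible receipts (so they appear in review) is an assumption.

```typescript
interface EmailInput {
  messageId: string;
  from: string;
  subject: string;
  snippet: string;
}

interface Classification {
  messageId: string;
  isDispensaryReceipt: boolean;
  confidence: number; // 0.0 to 1.0
  dispensaryName: string | null;
}

// Assumed signature for the model call.
type ClassifyBatch = (batch: EmailInput[]) => Promise<Classification[]>;

const BATCH_SIZE = 20;
const FALLBACK_CONFIDENCE = 0.3;

async function classifyAll(emails: EmailInput[], classifyBatch: ClassifyBatch): Promise<Classification[]> {
  const results: Classification[] = [];
  for (let i = 0; i < emails.length; i += BATCH_SIZE) {
    const batch = emails.slice(i, i + BATCH_SIZE);
    try {
      results.push(...(await classifyBatch(batch)));
    } catch {
      // A failed batch is kept, not dropped: each email is flagged
      // low-confidence so the user decides during review.
      for (const e of batch) {
        results.push({
          messageId: e.messageId,
          isDispensaryReceipt: true, // assumption: surface it for review
          confidence: FALLBACK_CONFIDENCE,
          dispensaryName: null,
        });
      }
    }
  }
  return results;
}
```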

4. User Review

Once classification is complete, the import job status changes to ready_for_review. The Convex real-time subscription instantly updates the mobile UI, transitioning the wizard from the scanning animation to the review screens. Candidates are presented in two review steps:

Dispensary Groups — Emails grouped by dispensary name with toggle switches. Users can enable or disable entire dispensaries at once.
Individual Orders — Each email shown with subject, date, confidence badge, and a toggle. Users fine-tune their selection.

All selection changes are persisted to Convex in real-time, so the user can leave and come back without losing their selections.
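The two review steps share one selection state: grouping by dispensary name drives the first screen, and a group toggle rewrites the flag on every email in that group. A minimal sketch with hypothetical names; treating a group switch as “on” only when every email in it is selected is an assumption.

```typescript
interface ReviewItem {
  messageId: string;
  dispensaryName: string;
  selected: boolean;
}

// Group emails by dispensary name for the "Dispensary Groups" step.
function groupByDispensary(items: ReviewItem[]): Map<string, ReviewItem[]> {
  const groups = new Map<string, ReviewItem[]>();
  for (const item of items) {
    const list = groups.get(item.dispensaryName) ?? [];
    list.push(item);
    groups.set(item.dispensaryName, list);
  }
  return groups;
}

// Toggling a group flips every email in it. Returns a new array (no mutation),
// so the change can be persisted as a single update.
function setDispensarySelected(items: ReviewItem[], dispensaryName: string, selected: boolean): ReviewItem[] {
  return items.map((item) => (item.dispensaryName === dispensaryName ? { ...item, selected } : item));
}

// Assumption: a group switch shows "on" only when all of its emails are selected.
function isGroupSelected(items: ReviewItem[], dispensaryName: string): boolean {
  const group = items.filter((i) => i.dispensaryName === dispensaryName);
  return group.length > 0 && group.every((i) => i.selected);
}
```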

5. Order Import

When the user taps “Start Import,” a batch import task is triggered on Trigger.dev. For each selected email, it runs the following steps in order:

Fetch email content: Retrieves the full email body from Gmail using a fresh OAuth token
Parse receipt: Sends the email text to the Hono API receipt parser, which uses AI to extract structured order data (line items, prices, quantities)
Check duplicates: Before an order is created, checks whether an order with the same Gmail message ID already exists. If so, marks the email as “duplicate” and skips creation.
Create order: Posts the parsed data back to Convex via webhook, which creates the order record with its items
Auto-save dispensary: If the dispensary is new, creates a dispensary record and logs a visit
Crowdsource domain: Upserts the sender’s email domain to the shared Supabase database
Report progress: Posts a progress webhook so the mobile UI updates in real time

Each step’s success or failure is reported individually, so the user sees a live feed of results as they happen.
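The per-email pipeline can be sketched as one async function that always ends with a progress report, whatever the outcome. Every dependency in `ImportDeps` is a hypothetical stand-in for Gmail, the Hono receipt parser, Convex, or Supabase. Treating the domain upsert as best-effort, so a crowdsourcing failure never fails an import, is an assumption.

```typescript
type ImportOutcome =
  | { messageId: string; status: "imported"; orderId: string }
  | { messageId: string; status: "duplicate" }
  | { messageId: string; status: "failed"; error: string };

interface ParsedOrder {
  dispensaryName: string;
  total: number;
  items: { name: string; quantity: number; price: number }[];
}

// Hypothetical stand-ins for the external systems.
interface ImportDeps {
  fetchBody(messageId: string): Promise<string>;
  parseReceipt(body: string): Promise<ParsedOrder>;
  orderExists(messageId: string): Promise<boolean>;
  createOrder(messageId: string, order: ParsedOrder): Promise<string>;
  ensureDispensary(name: string): Promise<void>;
  upsertDomain(domain: string, dispensaryName: string): Promise<void>;
  reportProgress(outcome: ImportOutcome): Promise<void>;
}

async function importEmail(messageId: string, senderDomain: string, deps: ImportDeps): Promise<ImportOutcome> {
  let outcome: ImportOutcome;
  try {
    const body = await deps.fetchBody(messageId);
    const order = await deps.parseReceipt(body);
    if (await deps.orderExists(messageId)) {
      outcome = { messageId, status: "duplicate" };
    } else {
      const orderId = await deps.createOrder(messageId, order);
      await deps.ensureDispensary(order.dispensaryName);
      // Assumption: crowdsourcing is best-effort and must not fail the import.
      await deps.upsertDomain(senderDomain, order.dispensaryName).catch(() => undefined);
      outcome = { messageId, status: "imported", orderId };
    }
  } catch (err) {
    outcome = { messageId, status: "failed", error: err instanceof Error ? err.message : String(err) };
  }
  // Exactly one progress event per email, so the UI feed never skips an item.
  await deps.reportProgress(outcome);
  return outcome;
}
```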

6. Completion

When all emails are processed, the batch task sends a batch.complete webhook that:

Marks the import job as completed
Updates the user’s sync state with the current timestamp (for future Quick Syncs)

The mobile UI then shows a completion summary with the total orders imported, dispensaries found, total spend, and any failures.
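The completion summary is an aggregation over the per-email results. A minimal sketch with hypothetical types; excluding duplicates from both the order count and total spend is an assumption. Summing in integer cents avoids floating-point drift when adding dollar amounts.

```typescript
interface ImportResult {
  status: "imported" | "duplicate" | "failed";
  dispensaryName?: string;
  total?: number; // dollars
}

interface CompletionSummary {
  ordersImported: number;
  dispensariesFound: number;
  totalSpend: number; // dollars
  failures: number;
}

function summarize(results: ImportResult[]): CompletionSummary {
  // Assumption: only newly imported orders count toward the summary.
  const imported = results.filter((r) => r.status === "imported");
  const dispensaries = new Set(imported.map((r) => r.dispensaryName).filter((n): n is string => !!n));
  // Sum in cents to avoid floating-point drift, then convert back to dollars.
  const cents = imported.reduce((sum, r) => sum + Math.round((r.total ?? 0) * 100), 0);
  return {
    ordersImported: imported.length,
    dispensariesFound: dispensaries.size,
    totalSpend: cents / 100,
    failures: results.filter((r) => r.status === "failed").length,
  };
}
```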

Webhook Communication

Trigger.dev tasks communicate results back to Convex via HTTP webhooks. This decoupled design means each system can be deployed and scaled independently.

Event	Sent By	Purpose
`scan.complete`	Scan task	Delivers candidate email list
`classify.complete`	Classify task	Delivers AI classification results
`import.progress`	Import task (per email)	Reports success/failure for each order
`batch.complete`	Batch import task	Marks the job as done
`batch.failed`	Batch import task	Marks the job as failed with error

All webhooks are authenticated with a shared bearer token and support retries: if a webhook request fails, the Trigger.dev task throws an error, which triggers Trigger.dev’s automatic retries.
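Both sides of the webhook contract can be sketched in plain TypeScript: a discriminated union for the five events, a bearer-token check on the receiving side, and a sender that throws on a non-2xx response so the task's retry policy takes over. The payload fields are hypothetical, the `post` parameter stands in for an HTTP client, and mapping `classify.complete` to `ready_for_review` follows the User Review step.

```typescript
type WebhookEvent =
  | { type: "scan.complete"; jobId: string; candidates: { messageId: string }[] }
  | { type: "classify.complete"; jobId: string; results: { messageId: string; confidence: number }[] }
  | { type: "import.progress"; jobId: string; messageId: string; status: "imported" | "duplicate" | "failed" }
  | { type: "batch.complete"; jobId: string }
  | { type: "batch.failed"; jobId: string; error: string };

// Compare without exiting early on the first mismatch, so response timing does
// not reveal how much of a guessed token was correct (length still leaks).
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  return diff === 0;
}

// Receiver side: reject any request without the shared bearer token.
function isAuthorized(authHeader: string | null, secret: string): boolean {
  if (!authHeader || !authHeader.startsWith("Bearer ")) return false;
  return constantTimeEqual(authHeader.slice("Bearer ".length), secret);
}

// New job status implied by an event, or null if the event doesn't change it.
function statusAfter(event: WebhookEvent): "ready_for_review" | "completed" | "failed" | null {
  switch (event.type) {
    case "classify.complete": return "ready_for_review";
    case "batch.complete": return "completed";
    case "batch.failed": return "failed";
    default: return null;
  }
}

// Sender side: throwing on failure lets the task runner retry the step.
async function sendWebhook(
  post: (body: string, headers: Record<string, string>) => Promise<number>,
  secret: string,
  event: WebhookEvent,
): Promise<void> {
  const status = await post(JSON.stringify(event), {
    Authorization: `Bearer ${secret}`,
    "Content-Type": "application/json",
  });
  if (status < 200 || status >= 300) {
    throw new Error(`Webhook ${event.type} failed with HTTP ${status}`);
  }
}
```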

Data Lifecycle

Understanding what data is stored and for how long:

Data	Stored Where	Retention
Extracted order data (items, prices)	Convex `orders` table	Permanent (user’s data)
Dispensary records	Convex `dispensaries` table	Permanent (user’s data)
Email metadata (subject, sender, snippet)	Convex `gmailImportJobs`	7 days (auto-purged by daily cron)
Raw email content	Trigger.dev task memory	Never stored (processed and discarded)
Google OAuth tokens	Clerk (server-side)	Managed by Clerk’s token lifecycle
Crowdsourced domains	Supabase `dispensary_email_domains`	Permanent (anonymous, shared)

The 7-day purge of email metadata is handled by a Convex cron job that runs daily. This ensures that sensitive email data (subjects, snippets) does not persist beyond the review window.
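The purge job's selection step reduces to a cutoff comparison. A minimal sketch, assuming the age of a job is measured from a `createdAt` timestamp in epoch milliseconds (the field name is hypothetical).

```typescript
const RETENTION_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

interface JobMetadata {
  id: string;
  createdAt: number; // epoch milliseconds (assumed field)
}

// IDs whose email metadata is strictly older than the retention window.
function jobsToPurge(jobs: JobMetadata[], now: number): string[] {
  const cutoff = now - RETENTION_MS;
  return jobs.filter((j) => j.createdAt < cutoff).map((j) => j.id);
}
```

Because the cron runs once a day, a record that passes the 7-day mark just after a run waits for the next one, so metadata can live for up to roughly 7 days plus 24 hours.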

Quick Sync vs Full Wizard

Aspect	Full Wizard (first sync)	Quick Sync (returning)
Entry point	6-step wizard with date range picker	Single-tap “Sync Recent Orders”
Date range	User-selected (6 months to all time)	Automatic (last sync date to now)
Review flow	Dispensary groups then individual orders	Simplified order list with checkboxes
Empty state	Not applicable (first scan)	“No new orders” with broader search option
Advanced fallback	N/A	“Advanced Options” link to full wizard
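The difference in date range between the two modes can be sketched as a function that produces a Gmail `after:` clause, which accepts dates in `YYYY/MM/DD` form. The `SyncMode` type is hypothetical; clamping the day (so August 31 minus six months becomes the last day of February) is a design choice of this sketch.

```typescript
type SyncMode =
  | { kind: "full"; monthsBack: number | "all" }
  | { kind: "quick"; lastSyncAt: Date };

// Format a date as YYYY/MM/DD (UTC) for Gmail's after: operator.
function gmailDate(d: Date): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  return `${d.getUTCFullYear()}/${pad(d.getUTCMonth() + 1)}/${pad(d.getUTCDate())}`;
}

// Date clause to append to each layer's query ("" means no lower bound).
function dateClause(mode: SyncMode, now: Date): string {
  if (mode.kind === "quick") return `after:${gmailDate(mode.lastSyncAt)}`;
  if (mode.monthsBack === "all") return "";
  // Date.UTC handles negative months by rolling back into earlier years.
  const start = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() - mode.monthsBack, 1));
  // Clamp the day so e.g. Aug 31 minus 6 months is Feb 29, not Mar 2.
  const lastDay = new Date(Date.UTC(start.getUTCFullYear(), start.getUTCMonth() + 1, 0)).getUTCDate();
  start.setUTCDate(Math.min(now.getUTCDate(), lastDay));
  return `after:${gmailDate(start)}`;
}
```

Day-level granularity means a Quick Sync may re-fetch emails from the day of the previous sync; the duplicate check on Gmail message ID keeps those from being imported twice.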

Concurrency & Rate Limits

To prevent Gmail API abuse and ensure stable performance:

Resource	Limit	Scope
Scans	1 concurrent	Per user
Email imports	5 concurrent	Global (Trigger.dev queue)
Batch imports	3 concurrent	Global (Trigger.dev queue)
Gmail API calls	Standard quota	Google’s per-project limits
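In production these limits are enforced by Trigger.dev queues. The in-memory sketch below only illustrates their semantics: a worker pool that never has more than `limit` calls in flight, and a per-user guard allowing one scan at a time. Rejecting a second scan (rather than queueing it) is an assumption, and a process-local `Set` would not coordinate across multiple servers.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function runWithLimit<T, R>(items: T[], limit: number, worker: (item: T) => Promise<R>): Promise<R[]> {
  if (limit < 1) throw new Error("limit must be at least 1");
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each lane claims the next unprocessed index until the list is exhausted.
  // Claiming is safe because `next++` runs synchronously between awaits.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, () => lane()));
  return results;
}

// One concurrent scan per user: a second start for the same user is rejected.
const activeScans = new Set<string>();

async function withUserScanLock<R>(userId: string, fn: () => Promise<R>): Promise<R> {
  if (activeScans.has(userId)) throw new Error(`Scan already running for ${userId}`);
  activeScans.add(userId);
  try {
    return await fn();
  } finally {
    activeScans.delete(userId);
  }
}
```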

Crowdsourced Domain Intelligence

The crowdsourced domain database is a key differentiator of Gmail Sync. Here is how it works:

Seeded data — The database starts with 20+ pre-loaded domains from major POS platforms (Dutchie, Jane, Leafly, etc.)
User contributions — Every successful order import upserts the sender’s email domain to the shared database with an incremented report count
Confidence scoring — Domains with more reports get higher confidence scores, making them more likely to be used in scans
No user linking — Contributions are completely anonymous. The database only stores the domain name, dispensary name, a sample subject line, and a report counter.
Verification — Domains can be marked as verified by administrators, giving them the highest confidence tier

Over time, this creates a growing network effect: the more users import orders, the better Gmail Sync becomes at finding receipts for all users.
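The upsert and scoring rules can be sketched with an in-memory `Map` standing in for the Supabase table. The record holds only the fields the article lists (plus the verified flag), with no user identifier. The scoring curve is purely hypothetical, as is keeping the first sample subject on later reports.

```typescript
interface DomainRecord {
  domain: string;
  dispensaryName: string;
  sampleSubject: string;
  reportCount: number;
  verified: boolean; // set by administrators
  // Deliberately no user ID: contributions are anonymous.
}

// In-memory stand-in for the dispensary_email_domains table.
const table = new Map<string, DomainRecord>();

function upsertDomain(domain: string, dispensaryName: string, subject: string): DomainRecord {
  const key = domain.trim().toLowerCase();
  const existing = table.get(key);
  const record: DomainRecord = existing
    ? { ...existing, reportCount: existing.reportCount + 1 } // assumption: keep first sample subject
    : { domain: key, dispensaryName, sampleSubject: subject, reportCount: 1, verified: false };
  table.set(key, record);
  return record;
}

// Hypothetical curve: grows with reports, capped below the verified tier.
function domainConfidence(r: DomainRecord): number {
  if (r.verified) return 1.0;
  return Math.min(0.9, 0.5 + 0.1 * Math.log2(1 + r.reportCount));
}
```

The read-then-write above is only safe in a single process. Against a shared PostgreSQL table, the increment belongs in one atomic statement such as `INSERT ... ON CONFLICT (domain) DO UPDATE SET report_count = report_count + 1`, so concurrent imports from different users cannot lose reports.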
