Overview

High IQ’s strain database contains over 16,000 cannabis strains with detailed profiles covering genetics, effects, terpenes, cannabinoids, grow information, and more. This data does not come from a single source — it is aggregated, normalized, and enriched through multiple pipelines that run continuously. This page explains where our data originates, how it flows through the platform, and what quality controls ensure accuracy.

Primary Data Sources

Supabase Strain Database

The canonical strain database lives in Supabase (PostgreSQL) and serves as the single source of truth for all strain information across the platform. Every API response, website page, and mobile app screen pulls from this centralized store. The database includes:
  • Strain profiles — Name, aliases, genetics (parent strains), breeder information, strain type (indica/sativa/hybrid percentages)
  • Chemical data — THC, CBD, CBG, CBN, THCV, CBC percentages with lab verification flags
  • Terpene profiles — Individual terpene concentrations in mg/g where available, plus qualitative aroma descriptors
  • Effects — Reported effects with frequency data (e.g., “relaxed” reported by 78% of users)
  • Growing information — Flowering time, yield, difficulty, indoor/outdoor suitability
  • Media — Strain images, descriptions, and educational content
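
As a rough illustration of how the fields above could fit together, here is a minimal Python sketch of one strain record. The class and field names are assumptions made for this example and do not reflect the actual Supabase schema; the sample values are placeholders, apart from the "GSC" alias and the 78% "relaxed" figure quoted on this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cannabinoid:
    """One cannabinoid measurement for a strain."""
    name: str                   # e.g. "THC", "CBD", "CBG"
    percent: float              # percentage by weight
    lab_verified: bool = False  # the lab verification flag from the list above


@dataclass
class StrainProfile:
    """Hypothetical in-memory shape of one strain record (not the real schema)."""
    name: str
    strain_type: str  # "indica", "sativa", or "hybrid"
    aliases: list[str] = field(default_factory=list)
    parents: list[str] = field(default_factory=list)
    breeder: Optional[str] = None
    cannabinoids: list[Cannabinoid] = field(default_factory=list)
    terpenes_mg_per_g: dict[str, float] = field(default_factory=dict)
    effect_frequency: dict[str, float] = field(default_factory=dict)  # 0.0 to 1.0
    flowering_weeks: Optional[int] = None


# Placeholder values for illustration only.
example = StrainProfile(
    name="Girl Scout Cookies",
    strain_type="hybrid",
    aliases=["GSC"],
    cannabinoids=[Cannabinoid("THC", 21.5, lab_verified=True)],
    effect_frequency={"relaxed": 0.78},
)
```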

Competitor Monitoring

High IQ runs automated weekly monitoring of major cannabis platforms to identify new strains and track market trends:

| Source | Strain Count | Sync Frequency | Method |
|--------|--------------|----------------|--------|
| Leafly | ~4,500 strains | Weekly (Sunday 3 AM CT) | Public sitemap extraction |
| AllBud | ~15,000 strains | Weekly (Sunday 3 AM CT) | Public sitemap extraction |

The competitor monitoring pipeline works as follows:

1. Sitemap Extraction

Public sitemaps are parsed to discover strain URLs. Only strain page URLs are extracted; no scraping of copyrighted content occurs.

2. Deduplication

Discovered strain names are normalized and compared against the existing database to identify genuinely new strains versus name variations of existing entries.

3. Status Tracking

Each competitor strain receives a status: pending (new discovery), queued (ready for processing), processed (enriched and added), skipped (duplicate or insufficient data), or duplicate (exact match found).

4. Queue Prioritization

New strain candidates enter the unprocessed strain queue, sorted by popularity so high-demand strains are enriched first.

Competitor monitoring extracts only strain names from public sitemaps. We do not copy descriptions, images, or other copyrighted content from competitor platforms. All strain profile content in High IQ is independently sourced and written.
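
The four steps above can be sketched in a few lines of Python. This is an illustrative outline, not the production pipeline: the function names, the `/strains/` URL marker, the sample sitemap, and the popularity numbers are all assumptions, and only two of the five statuses are modeled.

```python
import re
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def strain_urls(sitemap_xml: str, path_marker: str = "/strains/") -> list[str]:
    """Step 1: return only the <loc> URLs that look like strain pages."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", SITEMAP_NS) if el.text]
    return [u for u in locs if path_marker in u]


def name_key(text: str) -> str:
    """Step 2: collapse a slug or display name to a comparison key."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def classify(urls: list[str], existing_names: set[str]) -> dict[str, str]:
    """Step 3: mark each discovered slug as 'duplicate' or 'pending'."""
    known = {name_key(n) for n in existing_names}
    statuses = {}
    for url in urls:
        slug = url.rstrip("/").rsplit("/", 1)[-1]
        statuses[slug] = "duplicate" if name_key(slug) in known else "pending"
    return statuses


def prioritize(statuses: dict[str, str], popularity: dict[str, int]) -> list[str]:
    """Step 4: order pending slugs so the most popular are enriched first."""
    pending = [s for s, st in statuses.items() if st == "pending"]
    return sorted(pending, key=lambda s: popularity.get(s, 0), reverse=True)


# Hypothetical sitemap content; example.com stands in for a real platform.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/strains/blue-dream</loc></url>
  <url><loc>https://example.com/strains/sample-strain-a</loc></url>
  <url><loc>https://example.com/strains/sample-strain-b</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

urls = strain_urls(SAMPLE)
statuses = classify(urls, existing_names={"Blue Dream"})
queue = prioritize(statuses, popularity={"sample-strain-a": 10, "sample-strain-b": 250})
```

Only the URL slug is read from each sitemap entry, which mirrors the point that no page content is fetched or copied.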

Research Pipelines

Two automated research pipelines enrich the database with scientific and media content:

Strain Research Pipeline

The strain research pipeline is a multi-stage process powered by Trigger.dev that runs in the cloud with no timeout limits:
  1. Data collection — Gathers strain genetics, chemical profiles, and effect data from multiple public sources
  2. AI enrichment — Generates comprehensive descriptions, effect summaries, and educational content
  3. Quality validation — Automated quality gates ensure data meets minimum completeness thresholds
  4. Database insertion — Validated data is written to the production database
Each pipeline run can process strains for over 2 hours without interruption, with per-stage visibility and granular retry capabilities.
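
One way to picture the quality validation stage is a completeness score checked against a threshold. The sketch below is an assumption about how such a gate might look; the required field list and the 0.8 threshold are hypothetical and are not taken from the actual pipeline.

```python
# Hypothetical required fields and threshold, chosen for illustration.
REQUIRED_FIELDS = ("name", "strain_type", "genetics", "thc_percent", "effects", "description")
MIN_COMPLETENESS = 0.8


def completeness(record: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    filled = sum(1 for f in REQUIRED_FIELDS if record.get(f) not in (None, "", [], {}))
    return filled / len(REQUIRED_FIELDS)


def passes_quality_gate(record: dict) -> bool:
    """True when the record is complete enough to be written to the database."""
    return completeness(record) >= MIN_COMPLETENESS


# Five of six fields filled: 5/6 is above the threshold.
mostly_complete = {
    "name": "Sample Strain",
    "strain_type": "hybrid",
    "genetics": ["Parent A", "Parent B"],
    "thc_percent": 20.0,
    "effects": ["relaxed"],
    "description": "",
}

# Three of six fields filled: 3/6 is below the threshold.
sparse = {"name": "Sample Strain", "strain_type": "hybrid", "effects": ["relaxed"]}
```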

Paper Research Pipeline

A daily automated pipeline aggregates cannabis research papers:
  1. Search — Queries PubMed for recent cannabis research publications
  2. Deduplication — Filters out papers already in the database
  3. Summarization — AI generates plain-language summaries of each paper
  4. Quality Gate — Ensures summaries are accurate and educational
  5. Imaging — Generates thumbnail images for each paper
  6. Save — Validated papers are stored for display in the research hub
This pipeline runs daily at 7 AM Central Time, keeping the research hub current with the latest published science.
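
The search and deduplication steps could look roughly like the following. The sketch builds a query URL for the public NCBI E-utilities `esearch` endpoint and filters out PubMed IDs that are already stored. The search term, the one-day window, and the helper names are assumptions; the page does not say which query or client the pipeline actually uses, and no network request is made here.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term: str = "cannabis", days_back: int = 1, max_results: int = 50) -> str:
    """Build an esearch URL for PubMed records entered in the last `days_back` days."""
    params = {
        "db": "pubmed",
        "term": term,
        "reldate": days_back,  # look back this many days
        "datetype": "edat",    # filter on the Entrez entry date
        "retmax": max_results,
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


def new_pmids(found: list[str], already_stored: set[str]) -> list[str]:
    """Drop PMIDs already in the database, keeping order and removing repeats."""
    seen = set(already_stored)
    fresh = []
    for pmid in found:
        if pmid not in seen:
            seen.add(pmid)
            fresh.append(pmid)
    return fresh


url = build_search_url()
# Placeholder IDs, not real PubMed records.
fresh = new_pmids(["1001", "1002", "1002", "1003"], already_stored={"1001"})
```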

User Contributions

High IQ users contribute data in several ways:

Label Scanner

Users scan dispensary labels with the AI-powered label scanner, which extracts terpene profiles, THC/CBD percentages, and strain names. This real-world lab data enriches existing strain profiles.

Stash Data

When users add strains to their stash with dispensary information and pricing, this aggregated data helps track strain availability and market pricing trends.

Ratings & Reviews

User ratings and effect reports contribute to the engagement component of strain scoring and help validate effect profiles.

Strain Submissions

Users can submit strains not found in the database. Submissions enter the processing queue for verification and enrichment.
Every label scan contributes to the collective knowledge base. Even if a strain is already in the database, your scan may add terpene data from a different grower or batch, improving the overall profile.
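
To show how scans from different growers or batches might improve a profile, the sketch below averages each terpene over all scans that reported it. Simple averaging is an assumption made for illustration; the page does not describe the actual aggregation method, and the readings are invented.

```python
from collections import defaultdict
from statistics import mean


def aggregate_terpenes(scans: list[dict[str, float]]) -> dict[str, float]:
    """Average each terpene (mg/g) over the scans that reported it."""
    readings = defaultdict(list)
    for scan in scans:
        for terpene, mg_per_g in scan.items():
            readings[terpene].append(mg_per_g)
    return {t: round(mean(values), 2) for t, values in readings.items()}


# Two hypothetical label scans of the same strain from different batches.
scans = [
    {"myrcene": 4.0, "limonene": 2.0},
    {"myrcene": 6.0, "caryophyllene": 1.5},
]
merged = aggregate_terpenes(scans)
```

Here the second scan both refines the myrcene value and adds a terpene the first scan did not report.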

Data Flow Architecture

The following shows how data moves through the High IQ platform:
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Competitor      │     │  Research         │     │  User            │
│  Monitoring      │     │  Pipelines        │     │  Contributions   │
│  (Weekly)        │     │  (Daily/On-demand)│     │  (Real-time)     │
└────────┬────────┘     └────────┬─────────┘     └────────┬────────┘
         │                       │                         │
         ▼                       ▼                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Supabase Database                              │
│                    (Single Source of Truth)                       │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │    Hono API      │
                    │    (Edge Cache)  │
                    └────────┬────────┘
                             │
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
        ┌──────────┐  ┌──────────┐  ┌──────────┐
        │ Website  │  │ Mobile   │  │ Third    │
        │          │  │ App      │  │ Party    │
        └──────────┘  └──────────┘  └──────────┘

Data Quality Controls

Normalization

Raw strain data arrives in inconsistent formats. The @tiwih/api-normalizers package standardizes:
  • Strain names — Consistent capitalization, removal of special characters, alias resolution (e.g., “GSC” maps to “Girl Scout Cookies”)
  • Terpene names — Standardized to canonical names (e.g., “b-caryophyllene” becomes “caryophyllene”)
  • Effect names — Mapped to a controlled vocabulary to prevent duplicates like “relaxed” vs. “relaxing”
  • Cannabinoid values — Converted to consistent percentage format with validation ranges
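
The four normalization rules above can be illustrated with a short Python sketch. It is not the API of the @tiwih/api-normalizers package; every function name and lookup table here is hypothetical, and the tables contain only the examples mentioned in the list.

```python
import re

# Tiny illustrative lookup tables; real tables would be far larger.
STRAIN_ALIASES = {"gsc": "Girl Scout Cookies"}
TERPENE_ALIASES = {"b-caryophyllene": "caryophyllene", "beta-caryophyllene": "caryophyllene"}
EFFECT_VOCAB = {"relaxing": "relaxed"}


def normalize_strain_name(raw: str) -> str:
    """Strip special characters, collapse whitespace, resolve aliases, fix capitalization."""
    cleaned = re.sub(r"[^A-Za-z0-9# ]+", " ", raw)
    cleaned = " ".join(cleaned.split())
    alias = STRAIN_ALIASES.get(cleaned.lower())
    if alias:
        return alias
    return " ".join(word.capitalize() for word in cleaned.split())


def normalize_terpene(raw: str) -> str:
    """Map spelling variants to one canonical terpene name."""
    key = raw.strip().lower().replace("β", "b")
    return TERPENE_ALIASES.get(key, key)


def normalize_effect(raw: str) -> str:
    """Map an effect word onto the controlled vocabulary."""
    key = raw.strip().lower()
    return EFFECT_VOCAB.get(key, key)


def normalize_percent(raw) -> float:
    """Parse '21.5%' or 21.5 into a float and reject values outside 0 to 100."""
    value = float(str(raw).strip().rstrip("%"))
    if not 0.0 <= value <= 100.0:
        raise ValueError(f"cannabinoid percentage out of range: {raw!r}")
    return value
```

Note that per-word capitalization is a simplification: names with intentional casing such as "OG" would need their own alias entries.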

Deduplication

Multiple layers of deduplication prevent duplicate strain records from being created for the same cultivar:
  • Name matching — Fuzzy matching catches common variations (e.g., “Blue Dream” vs. “BlueDream” vs. “Blue Dream #1”)
  • Genetic matching — Strains with identical parent lineage are flagged for manual review
  • Competitor cross-referencing — The competitor monitoring pipeline tracks which strains have already been imported
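
A minimal sketch of the name and genetic matching layers, using Python's standard `difflib.SequenceMatcher` as a stand-in for whatever fuzzy matcher the platform actually uses. The 0.9 threshold, the phenotype-tag rule, and the function names are assumptions.

```python
import re
from difflib import SequenceMatcher


def match_key(name: str) -> str:
    """Lowercase, drop a trailing phenotype tag like '#1', strip non-alphanumerics."""
    base = re.sub(r"\s*#\d+\s*$", "", name.lower())
    return re.sub(r"[^a-z0-9]", "", base)


def is_probable_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """True when two names share a key or are very similar after normalization."""
    ka, kb = match_key(a), match_key(b)
    return ka == kb or SequenceMatcher(None, ka, kb).ratio() >= threshold


def same_lineage(parents_a: list[str], parents_b: list[str]) -> bool:
    """True when both strains list the same non-empty set of parents, in any order."""
    set_a = {match_key(p) for p in parents_a}
    set_b = {match_key(p) for p in parents_b}
    return bool(set_a) and set_a == set_b
```

For example, `is_probable_duplicate("Blue Dream", "Blue Dream #1")` returns `True`, while a `same_lineage` hit would only flag the pair for manual review rather than merge it automatically, in line with the list above.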

Freshness

Data freshness varies by source:

| Data Type | Update Frequency | Staleness Threshold |
|-----------|------------------|---------------------|
| Competitor discoveries | Weekly | 7 days |
| Research papers | Daily | 24 hours |
| User label scans | Real-time | Immediate |
| Strain enrichment | On-demand via pipeline | Varies by queue position |
| YouTube videos | On-demand with 30-day cache | 30 days |
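
The fixed thresholds in the table lend themselves to a simple staleness check. The sketch below is one possible way to express them; the dictionary keys and the function name are hypothetical, and the two rows without a fixed threshold are left out.

```python
from datetime import datetime, timedelta, timezone

# Thresholds copied from the table; the key names are made up for this example.
STALENESS = {
    "competitor_discoveries": timedelta(days=7),
    "research_papers": timedelta(hours=24),
    "youtube_videos": timedelta(days=30),
}


def is_stale(data_type: str, last_updated: datetime, now: datetime) -> bool:
    """True when the data is older than the threshold for its type."""
    return now - last_updated > STALENESS[data_type]


# Arbitrary timestamps for illustration.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
nine_days_ago = now - timedelta(days=9)
twelve_hours_ago = now - timedelta(hours=12)

competitor_stale = is_stale("competitor_discoveries", nine_days_ago, now)
papers_stale = is_stale("research_papers", twelve_hours_ago, now)
```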

Database Statistics

The strain database currently contains:
  • 16,000+ total strain profiles
  • 11,500+ candidate strains from competitor monitoring awaiting enrichment
  • 6 High Spectrum Families classifying strains by terpene profile
  • 20+ terpenes tracked with concentration data
  • 6+ cannabinoids tracked per strain
  • Daily research paper ingestion from PubMed
Strain data is provided for informational purposes only. Effects and potency can vary based on growing conditions, individual tolerance, and consumption method. Always start low and go slow.

Frequently Asked Questions

We do not copy content from competitor platforms. We extract only strain names from publicly available sitemaps (the same data search engines use to index pages) and do not copy descriptions, images, reviews, or any copyrighted content. All High IQ strain profiles are independently written and sourced.
To report incorrect strain data, contact us at support@thisiswhyimhigh.com with the strain name and the specific data you believe is incorrect. We investigate all reports and update our database accordingly.
Where possible, we include data sourced from Certificates of Analysis (COAs) submitted through the label scanner. However, we cannot independently verify all lab results. Look for the lab-verified badge on strain profiles to identify data backed by COA submissions.
We are exploring partnerships with dispensaries for direct data feeds. Contact support@thisiswhyimhigh.com if your dispensary is interested.