Overview
High IQ’s strain database contains over 16,000 cannabis strains with detailed profiles covering genetics, effects, terpenes, cannabinoids, grow information, and more. This data does not come from a single source: it is aggregated, normalized, and enriched through multiple pipelines that run continuously. This page explains where our data originates, how it flows through the platform, and what quality controls ensure accuracy.
Primary Data Sources
Supabase Strain Database
The canonical strain database lives in Supabase (PostgreSQL) and serves as the single source of truth for all strain information across the platform. Every API response, website page, and mobile app screen pulls from this centralized store. The database includes:
- Strain profiles — Name, aliases, genetics (parent strains), breeder information, strain type (indica/sativa/hybrid percentages)
- Chemical data — THC, CBD, CBG, CBN, THCV, CBC percentages with lab verification flags
- Terpene profiles — Individual terpene concentrations in mg/g where available, plus qualitative aroma descriptors
- Effects — Reported effects with frequency data (e.g., “relaxed” reported by 78% of users)
- Growing information — Flowering time, yield, difficulty, indoor/outdoor suitability
- Media — Strain images, descriptions, and educational content
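
As a rough illustration of how these categories could fit together in one record, here is a minimal sketch. The class name, field names, and sample values are all assumptions made for this example; they are not the actual Supabase schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StrainProfile:
    """Hypothetical in-memory shape of one strain record (not the real schema)."""
    name: str
    aliases: list[str] = field(default_factory=list)
    parents: list[str] = field(default_factory=list)       # genetics
    breeder: Optional[str] = None
    indica_pct: Optional[float] = None                      # strain type split
    sativa_pct: Optional[float] = None
    cannabinoids_pct: dict[str, float] = field(default_factory=dict)
    lab_verified: bool = False                              # lab verification flag
    terpenes_mg_g: dict[str, float] = field(default_factory=dict)
    effects_pct: dict[str, float] = field(default_factory=dict)  # share of users reporting
    flowering_weeks: Optional[int] = None


# Made-up values, purely to show the shape of a record.
example = StrainProfile(
    name="Example Strain",
    aliases=["ES"],
    cannabinoids_pct={"THC": 20.0, "CBD": 0.5},
    terpenes_mg_g={"myrcene": 4.2},
    effects_pct={"relaxed": 78.0},
)
```

Optional fields default to empty or `None` because, as noted above, some data (such as terpene concentrations) is only present where available.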
Competitor Monitoring
High IQ runs automated weekly monitoring of major cannabis platforms to identify new strains and track market trends:

| Source | Strain Count | Sync Frequency | Method |
|---|---|---|---|
| Leafly | ~4,500 strains | Weekly (Sunday 3 AM CT) | Public sitemap extraction |
| AllBud | ~15,000 strains | Weekly (Sunday 3 AM CT) | Public sitemap extraction |
Sitemap Extraction
Public sitemaps are parsed to discover strain URLs. Only strain page URLs are extracted — no scraping of copyrighted content occurs.
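The extraction step can be sketched with Python's standard XML parser. This is an illustration under stated assumptions: the `/strains/` path marker, the function names, and the sample sitemap are hypothetical, and real sitemaps may use different URL patterns.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_strain_urls(sitemap_xml: str, path_marker: str = "/strains/") -> list[str]:
    """Return only the <loc> URLs that look like strain pages."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if path_marker in url:  # assumed URL pattern for strain pages
            urls.append(url)
    return urls


def slug_to_name(url: str) -> str:
    """Turn the last URL segment into a display name, e.g. blue-dream -> Blue Dream."""
    slug = url.rstrip("/").rsplit("/", 1)[-1]
    return slug.replace("-", " ").title()


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/strains/blue-dream</loc></url>
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://example.com/strains/og-kush</loc></url>
</urlset>"""

strain_urls = extract_strain_urls(SAMPLE)
names = [slug_to_name(u) for u in strain_urls]
```

Only the URL itself is read; the pages behind those URLs are never fetched in this sketch.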
Deduplication
Discovered strain names are normalized and compared against the existing database to identify genuinely new strains versus name variations of existing entries.
Status Tracking
Each competitor strain receives a status: pending (new discovery), queued (ready for processing), processed (enriched and added), skipped (duplicate or insufficient data), or duplicate (exact match found).
Queue Prioritization
New strain candidates enter the unprocessed strain queue, sorted by popularity so high-demand strains are enriched first.
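The status values and popularity ordering described above could be modeled roughly as follows. The `Candidate` class, the `popularity` score, and `next_batch` are hypothetical names for this sketch; the source does not specify how popularity is measured.

```python
from dataclasses import dataclass
from enum import Enum


class CandidateStatus(Enum):
    PENDING = "pending"      # new discovery
    QUEUED = "queued"        # ready for processing
    PROCESSED = "processed"  # enriched and added
    SKIPPED = "skipped"      # duplicate or insufficient data
    DUPLICATE = "duplicate"  # exact match found


@dataclass
class Candidate:
    name: str
    popularity: float  # assumed score; higher means more demand
    status: CandidateStatus = CandidateStatus.PENDING


def next_batch(candidates: list[Candidate], size: int) -> list[Candidate]:
    """Pick the most popular queued candidates for enrichment."""
    queued = [c for c in candidates if c.status is CandidateStatus.QUEUED]
    queued.sort(key=lambda c: c.popularity, reverse=True)
    return queued[:size]


pool = [
    Candidate("A", 10, CandidateStatus.QUEUED),
    Candidate("B", 50, CandidateStatus.QUEUED),
    Candidate("C", 99, CandidateStatus.PROCESSED),
    Candidate("D", 30),  # still pending, so not eligible yet
]
batch = [c.name for c in next_batch(pool, 2)]  # ["B", "A"]
```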
Competitor monitoring extracts only strain names from public sitemaps. We do not copy descriptions, images, or other copyrighted content from competitor platforms. All strain profile content in High IQ is independently sourced and written.
Research Pipelines
Two automated research pipelines enrich the database with scientific and media content:
Strain Research Pipeline
The strain research pipeline is a multi-stage process powered by Trigger.dev that runs in the cloud with no timeout limits:
- Data collection — Gathers strain genetics, chemical profiles, and effect data from multiple public sources
- AI enrichment — Generates comprehensive descriptions, effect summaries, and educational content
- Quality validation — Automated quality gates ensure data meets minimum completeness thresholds
- Database insertion — Validated data is written to the production database
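
A completeness-based quality gate of the kind mentioned in the validation step might look like the sketch below. The required field list and the 0.75 threshold are assumptions chosen for illustration; the actual thresholds are not documented here.

```python
# Hypothetical fields a record should carry before insertion.
REQUIRED_FIELDS = ("name", "strain_type", "description", "effects")
MIN_COMPLETENESS = 0.75  # assumed threshold


def completeness(record: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    filled = sum(1 for f in REQUIRED_FIELDS if record.get(f))
    return filled / len(REQUIRED_FIELDS)


def passes_quality_gate(record: dict) -> bool:
    """A record needs a name and enough filled fields to be inserted."""
    return bool(record.get("name")) and completeness(record) >= MIN_COMPLETENESS


good = {"name": "Example Strain", "strain_type": "hybrid", "description": "text"}
thin = {"name": "Example Strain"}
```

Here `good` fills three of four required fields and passes, while `thin` fills one and is held back.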
Paper Research Pipeline
A daily automated pipeline aggregates cannabis research papers:
- Search — Queries PubMed for recent cannabis research publications
- Deduplication — Filters out papers already in the database
- Summarization — AI generates plain-language summaries of each paper
- Quality Gate — Ensures summaries are accurate and educational
- Imaging — Generates thumbnail images for each paper
- Save — Validated papers are stored for display in the research hub
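
The search and deduplication steps can be sketched against NCBI's public E-utilities ESearch endpoint. The source only says the pipeline queries PubMed, so the use of ESearch, the one-day window, and the function names are assumptions of this example. No network request is made here; the code only builds the URL and filters IDs.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term: str, days_back: int = 1, max_results: int = 50) -> str:
    """Build an ESearch URL for records added in the last `days_back` days."""
    params = {
        "db": "pubmed",
        "term": term,
        "reldate": days_back,   # relative date window, in days
        "datetype": "edat",     # filter on the Entrez date
        "retmax": max_results,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def filter_new_pmids(found: list[str], known: set[str]) -> list[str]:
    """Keep PMIDs not already stored, preserving order and dropping repeats."""
    seen = set(known)
    fresh = []
    for pmid in found:
        if pmid not in seen:
            seen.add(pmid)
            fresh.append(pmid)
    return fresh


url = build_search_url("cannabis")
new_ids = filter_new_pmids(["1", "2", "2", "3"], known={"2"})  # ["1", "3"]
```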
User Contributions
High IQ users contribute data in several ways:
Label Scanner
Users scan dispensary labels with the AI-powered label scanner, which extracts terpene profiles, THC/CBD percentages, and strain names. This real-world lab data enriches existing strain profiles.
Stash Data
When users add strains to their stash with dispensary information and pricing, this aggregated data helps track strain availability and market pricing trends.
Ratings & Reviews
User ratings and effect reports contribute to the engagement component of strain scoring and help validate effect profiles.
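Effect frequency figures such as "relaxed reported by 78% of users" can be derived from individual effect reports. The sketch below shows one plausible way to compute them; the function name and input shape are assumptions, and the real scoring may weight reports differently.

```python
from collections import Counter


def effect_frequencies(reports: list[list[str]]) -> dict[str, float]:
    """Percent of reports that mention each effect."""
    if not reports:
        return {}
    counts: Counter = Counter()
    for effects in reports:
        counts.update(set(effects))  # count each effect once per report
    total = len(reports)
    return {effect: round(100 * n / total, 1) for effect, n in counts.items()}


reports = [
    ["relaxed", "happy"],
    ["relaxed"],
    ["sleepy"],
    ["relaxed", "relaxed"],  # a repeated effect in one report counts once
]
freqs = effect_frequencies(reports)
# relaxed appears in 3 of 4 reports -> 75.0
```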
Strain Submissions
Users can submit strains not found in the database. Submissions enter the processing queue for verification and enrichment.
Data Flow Architecture
The following shows how data moves through the High IQ platform:
[Data flow diagram]
Data Quality Controls
Normalization
Raw strain data arrives in inconsistent formats. The @tiwih/api-normalizers package standardizes:
- Strain names — Consistent capitalization, removal of special characters, alias resolution (e.g., “GSC” maps to “Girl Scout Cookies”)
- Terpene names — Standardized to canonical names (e.g., “b-caryophyllene” becomes “caryophyllene”)
- Effect names — Mapped to a controlled vocabulary to prevent duplicates like “relaxed” vs. “relaxing”
- Cannabinoid values — Converted to consistent percentage format with validation ranges
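
The four normalization rules can be illustrated with a few small functions. This sketch is not the API of @tiwih/api-normalizers; the function names, lookup tables, and the 0 to 100 validation range are assumptions, and the tables hold only the examples given above.

```python
import re

# Tiny illustrative lookup tables; a real vocabulary would be far larger.
STRAIN_ALIASES = {"gsc": "Girl Scout Cookies"}
TERPENE_ALIASES = {"b-caryophyllene": "caryophyllene"}
EFFECT_VOCAB = {"relaxing": "relaxed"}


def normalize_strain_name(raw: str) -> str:
    """Strip special characters, collapse spaces, resolve aliases, capitalize."""
    cleaned = re.sub(r"[^A-Za-z0-9# ]+", " ", raw)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    alias = STRAIN_ALIASES.get(cleaned.lower())
    return alias if alias else cleaned.title()


def normalize_terpene(raw: str) -> str:
    key = raw.strip().lower()
    return TERPENE_ALIASES.get(key, key)


def normalize_effect(raw: str) -> str:
    key = raw.strip().lower()
    return EFFECT_VOCAB.get(key, key)


def normalize_percentage(value) -> float:
    """Accept 21.5, "21.5", or "21.5%" and validate the assumed 0-100 range."""
    if isinstance(value, str):
        value = value.strip().rstrip("%").strip()
    pct = float(value)
    if not 0.0 <= pct <= 100.0:
        raise ValueError(f"percentage out of range: {pct}")
    return round(pct, 2)
```

Simple title-casing mishandles acronyms (for example, "OG" becomes "Og"), so a production normalizer would likely need an exception list.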
Deduplication
Multiple layers of deduplication prevent duplicate strain records from being created for the same cultivar:
- Name matching — Fuzzy matching catches common variations (e.g., “Blue Dream” vs. “BlueDream” vs. “Blue Dream #1”)
- Genetic matching — Strains with identical parent lineage are flagged for manual review
- Competitor cross-referencing — The competitor monitoring pipeline tracks which strains have already been imported
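
Name matching of this kind can be approximated with the standard library's `difflib`. The sketch below is one possible approach, not the platform's matcher: the key-building rules, the 0.9 similarity threshold, and the function names are assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def match_key(name: str) -> str:
    """Reduce a name to a comparison key: lowercase, no phenotype number, no punctuation."""
    key = name.lower()
    key = re.sub(r"#\s*\d+\s*$", "", key)  # drop a trailing "#1"-style suffix
    return re.sub(r"[^a-z0-9]", "", key)


def find_duplicate(candidate: str, existing: list[str], threshold: float = 0.9) -> Optional[str]:
    """Return the closest existing name if it is similar enough, else None."""
    cand_key = match_key(candidate)
    best_name, best_score = None, 0.0
    for name in existing:
        score = SequenceMatcher(None, cand_key, match_key(name)).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


known = ["OG Kush", "Blue Dream"]
```

With this key, "BlueDream" and "Blue Dream #1" both reduce to the same string as "Blue Dream" and are reported as duplicates, while an unrelated name such as "Sour Diesel" returns `None`. Treating numbered phenotypes as the same cultivar is a design choice; a stricter matcher could keep the suffix.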
Freshness
Data freshness varies by source:

| Data Type | Update Frequency | Staleness Threshold |
|---|---|---|
| Competitor discoveries | Weekly | 7 days |
| Research papers | Daily | 24 hours |
| User label scans | Real-time | Immediate |
| Strain enrichment | On-demand via pipeline | Varies by queue position |
| YouTube videos | On-demand with 30-day cache | 30 days |
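
The fixed thresholds in the table lend themselves to a simple staleness check. In this sketch the dictionary keys and the `is_stale` function are hypothetical names; only the durations come from the table. Label scans and strain enrichment are left out because their thresholds are not fixed durations.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Durations taken from the freshness table; key names are assumptions.
STALENESS_THRESHOLDS = {
    "competitor_discoveries": timedelta(days=7),
    "research_papers": timedelta(hours=24),
    "youtube_videos": timedelta(days=30),
}


def is_stale(data_type: str, last_updated: datetime, now: Optional[datetime] = None) -> bool:
    """True when the data is older than its staleness threshold."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated > STALENESS_THRESHOLDS[data_type]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
last_sync = datetime(2024, 1, 1, tzinfo=timezone.utc)  # nine days earlier

competitor_stale = is_stale("competitor_discoveries", last_sync, now)  # True
video_stale = is_stale("youtube_videos", last_sync, now)               # False
```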
Database Statistics
The strain database currently contains:
- 16,000+ total strain profiles
- 11,500+ candidate strains from competitor monitoring awaiting enrichment
- 6 High Spectrum Families classifying strains by terpene profile
- 20+ terpenes tracked with concentration data
- 6+ cannabinoids tracked per strain
- Daily research paper ingestion from PubMed
Frequently Asked Questions
Do you scrape competitor websites?
No. We extract only strain names from publicly available sitemaps (the same data search engines use to index pages). We do not copy descriptions, images, reviews, or any copyrighted content. All High IQ strain profiles are independently written and sourced.
How can I report incorrect strain data?
Contact us at support@thisiswhyimhigh.com with the strain name and the specific data you believe is incorrect. We investigate all reports and update our database accordingly.
Is lab data verified?
Where possible, we include data sourced from Certificates of Analysis (COAs) submitted through the label scanner. However, we cannot independently verify all lab results. Look for the lab-verified badge on strain profiles to identify data backed by COA submissions.
Can dispensaries submit their strain data?
We are exploring partnerships with dispensaries for direct data feeds. Contact support@thisiswhyimhigh.com if your dispensary is interested.