HEN Data Validation Report — 2026-06-09

Source: HEN Leads Supabase mirror (HubSpot → hen-sync Cloudflare Worker → Postgres). Read-only analysis. Emails masked (abc***@domain), phones masked (last 4 digits only).

Routine periodic re-run, one day after the 2026-06-08 post-sync-fix baseline. The mirror is healthy and incremental — the scheduled cron completed cleanly across three consecutive cycles overnight (00:00, 03:00, 06:00 UTC), all seven entities, 0 errors. Day-over-day figures are stable with small organic growth; the structural fixes from 2026-06-08 (contacts backfill, deals-hang, ISO-date/high-water) all hold. v_stale_rows is steady at 49.

Snapshot taken ~07:17 UTC (12:47 IST); last successful sync 06:00 UTC; next cron 09:00 UTC.

1. Total counts

| Table | Rows | vs. 2026-06-08 |
| --- | --- | --- |
| companies | 32,996 | +11 |
| contacts | 37,546 | +29 |
| deals | 8,280 | +26 |
| leads | 3,192 | +10 |
| products | 109 | same |
| line_items | 17,425 | +28 |
| deal_contacts | 11,434 | +34 |
| contact_segments | 2,865 | +7 |

select 'contacts', count(*) from contacts where deleted_at is null;  -- repeat per table

Growth is small and incremental — exactly what a healthy 3-hourly sync should look like one day on (no backfill spikes, no regressions).
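
The per-table count above can be collected in one pass instead of being repeated by hand. The following is a minimal sketch, assuming every mirrored table carries the same deleted_at soft-delete column that contacts does; the report only confirms that column for contacts, so the filter may need to be dropped for join tables such as deal_contacts and contact_segments.

```sql
-- One row per table, live (not soft-deleted) rows only.
-- ASSUMPTION: deleted_at exists on all eight tables; only contacts is confirmed.
select 'companies' as table_name, count(*) as live_rows from companies where deleted_at is null
union all select 'contacts',         count(*) from contacts         where deleted_at is null
union all select 'deals',            count(*) from deals            where deleted_at is null
union all select 'leads',            count(*) from leads            where deleted_at is null
union all select 'products',         count(*) from products         where deleted_at is null
union all select 'line_items',       count(*) from line_items       where deleted_at is null
union all select 'deal_contacts',    count(*) from deal_contacts    where deleted_at is null
union all select 'contact_segments', count(*) from contact_segments where deleted_at is null
order by table_name;
```

Saving each run's output with its snapshot time would make the day-over-day column a simple subtraction between two runs.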

Field population health (contacts)

| Field | Populated | % of 37,546 |
| --- | --- | --- |
| first_name | 37,195 | 99.1% |
| email | 36,236 | 96.5% |
| company_id | 33,620 | 89.5% |
| phone | 13,471 | 35.9% |

Population rates are unchanged from yesterday: the contact records that were empty stubs before the fix remain populated.
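
The population figures can be reproduced with count(column), which counts only non-null values. This is a sketch under assumptions: the column names are taken from the field labels in the table, the three text fields are assumed to be text columns (so empty strings can be treated as missing), and it is not known whether the reported figures count empty strings as populated.

```sql
-- count(col) skips NULLs; nullif(col, '') also treats empty strings as missing.
-- ASSUMPTION: first_name, email and phone are text columns on contacts.
select
  count(*)                        as total,
  count(nullif(first_name, ''))   as first_name_populated,
  count(nullif(email, ''))        as email_populated,
  count(company_id)               as company_id_populated,
  count(nullif(phone, ''))        as phone_populated,
  round(100.0 * count(nullif(email, '')) / nullif(count(*), 0), 1) as email_pct
from contacts
where deleted_at is null;
```

The nullif(count(*), 0) guard avoids a division-by-zero error if the table is ever empty, which is the failure mode the pre-fix contacts table came close to.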


2. Sync health — green

All seven entities completed in the 06:00 UTC run with 0 errors:

| Entity | Last successful sync (UTC) | Latest run | Records processed |
| --- | --- | --- | --- |
| companies | 2026-06-09 06:00 | completed | 24 |
| contacts | 2026-06-09 06:00 | completed | 70 |
| deals | 2026-06-09 06:00 | completed | 49 |
| leads | 2026-06-09 06:00 | completed | 28 |
| line_items | 2026-06-09 06:00 | completed | 1 |
| products | 2026-06-09 06:00 | completed | 1 |
| segments | 2026-06-09 06:00 | completed | 2,858 |

v_stale_rows = 49 (unchanged). High-water marks advance per entity, so the small per-run record counts (1 to 70 for every entity except segments) confirm the sync is genuinely incremental rather than re-scanning. Three clean cron cycles ran unattended overnight (00:00 / 03:00 / 06:00 UTC); the scheduler runs entirely on Cloudflare's edge.

select * from last_successful_sync order by entity_type;
select count(*) from v_stale_rows;          -- 49

Minor observation: the segments sync still processes all 2,858 rows every run (its high-water mark advances to "now" each time, so it full-scans). Harmless at this volume, but a candidate for the same incremental treatment as the other entities if segment counts grow.
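
A freshness check against the 3-hourly schedule could flag an entity that silently stops syncing. This is only a sketch: last_success_at is a hypothetical column name (the report shows only entity_type on last_successful_sync), and the 30-minute grace period is an arbitrary choice.

```sql
-- Flags entities whose last successful sync is older than one cron interval plus grace.
-- ASSUMPTION: last_success_at is a timestamptz column; the real name may differ.
select entity_type,
       last_success_at,
       now() - last_success_at as age,
       (now() - last_success_at) > interval '3 hours 30 minutes' as overdue
from last_successful_sync
order by entity_type;
```

At the 07:17 UTC snapshot every entity would be about 1 hour 17 minutes old, so no row should come back overdue.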

3. Dedupe summary (v_dup_summary)

| Check | Clusters | Rows in dupes | Excess rows |
| --- | --- | --- | --- |
| Companies by name | 2,434 | 6,193 | 3,759 |
| Deals by name+company | 454 | 1,147 | 693 |
| Contacts by name+company | 449 | 904 | 455 |
| Companies by domain | 288 | 655 | 367 |
| Companies by phone | 151 | 318 | 167 |
| Leads by contact | 46 | 103 | 57 |
| Contacts by email | 0 | 0 | 0 |

select * from v_dup_summary order by excess_rows desc;

Numbers crept up by single digits vs. yesterday (companies-by-name excess 3,752 → 3,759) — normal as new records land. Contacts-by-email remains 0: HubSpot enforces unique contact emails, so email is a clean merge key. The company-name backlog (3,759 excess) is the dominant cleanup item; domain matching stays noisy (shared registrar/association mailboxes → false positives), so prefer name/phone for companies.
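
The excess-rows column is the number of rows that would disappear if each cluster were merged down to one survivor, that is, rows in dupes minus clusters. The self-contained check below uses the figures from the table (the CTE name and column names are illustrative) and returns the same excess values, from 3,759 down to 0.

```sql
-- Recomputes excess rows from the reported cluster and row counts.
with dup_summary (check_name, clusters, rows_in_dupes) as (
  values
    ('Companies by name',        2434, 6193),
    ('Deals by name+company',     454, 1147),
    ('Contacts by name+company',  449,  904),
    ('Companies by domain',       288,  655),
    ('Companies by phone',        151,  318),
    ('Leads by contact',           46,  103),
    ('Contacts by email',           0,    0)
)
select check_name,
       clusters,
       rows_in_dupes,
       rows_in_dupes - clusters as excess_rows  -- one survivor kept per cluster
from dup_summary
order by excess_rows desc;
```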


4. Data quality flags

| View | Count | Notes / change |
| --- | --- | --- |
| v_deals_without_company | 850 | real (+5) |
| v_deals_without_contacts | 166 | real (+1) |
| v_closed_won_no_line_items | 183 | revenue not itemized (same) |
| v_closed_won_zero_amount | 10 | mostly demo/replacement deals (same) |
| v_orphan_line_items | 106 | line items with no parent deal (same) |
| v_contacts_without_company | 3,926 | 10.5% of contacts (+14) |
| v_contacts_not_in_segments | 34,728 | expected: only 5 segment lists tracked |
| v_existing_customers | 685 | usable (+2) |
| v_reseller_companies | 202 | real (same) |
| v_distributor_companies | 0 | corrected view; 0 because deals.managing_distributor_company_id still unpopulated (Action 4) |
| v_leads_without_contact | 3 | real (same) |
| v_deals_stage_pipeline_mismatch | 0 | clean: no stage/pipeline contradictions |
| v_unknown_owners | 0 | misleading: owners table has null names, so the view can't flag them (Action 5) |
| v_stale_rows | 49 | unchanged |

All flags are flat or moved by single digits vs. 2026-06-08 — no new data-quality regressions.
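
The flag counts can be pulled in a single query so that each re-run produces one comparable result set. A minimal sketch using the view names listed above; it assumes each view returns one row per flagged record, which the report implies but does not state.

```sql
-- One row per data-quality view with its current flagged-row count.
select 'v_deals_without_company' as view_name, count(*) as flagged from v_deals_without_company
union all select 'v_deals_without_contacts',        count(*) from v_deals_without_contacts
union all select 'v_closed_won_no_line_items',      count(*) from v_closed_won_no_line_items
union all select 'v_closed_won_zero_amount',        count(*) from v_closed_won_zero_amount
union all select 'v_orphan_line_items',             count(*) from v_orphan_line_items
union all select 'v_contacts_without_company',      count(*) from v_contacts_without_company
union all select 'v_leads_without_contact',         count(*) from v_leads_without_contact
union all select 'v_deals_stage_pipeline_mismatch', count(*) from v_deals_stage_pipeline_mismatch
union all select 'v_stale_rows',                    count(*) from v_stale_rows
order by flagged desc;
```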


5. Drilldowns (top 5–6)

5.1 Top closed-won deals by revenue

| Deal | Amount | Close |
| --- | --- | --- |
| Hen Foundation - Closed Deal | 250,000 | 2025-12-23 |
| HEN Foundation - Closed Deal | 125,000 | 2026-05-22 |
| Ontario Fire Department - RFQ (Structure) | 121,280 | 2026-03-31 |
| Bauer Compressors Inc. - Ontario Fire Department | 121,280 | 2026-03-31 |
| US Marine Corp | 89,838 | 2024-03-22 |
| Sacramento FD - Hydro 200s - Order 2 | 59,350.31 | 2024-10-29 |

The two Ontario/Bauer rows at 121,280 are confirmed to share the same company_id, amount, and close date, which makes them a duplicate deal (Bauer Compressors is the partner on the Ontario FD order). Surfaced by v_dup_deals; see Action 9.
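
Deals like this pair, which differ in name but match on everything else, can be searched for by grouping on company, amount, and close date. This is a sketch and not the definition of v_dup_deals, which the report does not show; the column names name, amount, and close_date on deals are assumptions.

```sql
-- Candidate duplicate deals: same company, same amount, same close date.
-- ASSUMPTION: deals has name, amount, close_date and deleted_at columns.
select company_id,
       amount,
       close_date,
       count(*)                      as deals,
       array_agg(name order by name) as deal_names
from deals
where deleted_at is null
  and company_id is not null
  and amount is not null
group by company_id, amount, close_date
having count(*) > 1
order by amount desc;
```

Matches from this key are candidates only: repeat orders of the same kit on the same day would also match and need a manual look.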

5.2 Top existing customers by revenue (resellers excluded — business rule 2)

| Company | Revenue | Won deals |
| --- | --- | --- |
| HEN Foundation | 375,000 | 2 |
| Ontario Fire Department | 289,527 | 4 |
| Sacramento Fire Dept | 249,022.12 | 9 |
| Contra Costa County Fire Protection District | 122,418.29 | 11 |
| Maui Fire Department | 95,945.31 | 9 |
| United States Marine Corps Forces Central Command | 89,838 | 1 |

Why the reseller filter matters: an unfiltered "top accounts by revenue" is dominated by distributors, not end customers. Dinges Fire Supply alone shows $1,170,602 across 666 won deals, followed by L.N. Curtis & Sons ($534,563) and Atlantic Emergency Solutions ($407,320). Per business rule 2 (customer_vertical IN ('US_Reseller','International_Reseller')) these are excluded from existing-customer analysis. The table above is the true end-customer ranking and matches the prior report.
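
One detail worth watching when applying business rule 2 as an exclusion: in SQL, NOT IN evaluates to NULL for a null customer_vertical, so companies with no vertical set would silently drop out of the ranking. The sketch below guards against that with coalesce. The names companies.id, companies.name, deals.amount, and especially deals.is_closed_won are hypothetical; the report does not show how closed-won is stored or how v_existing_customers is defined.

```sql
-- End-customer revenue ranking with resellers excluded (business rule 2).
-- ASSUMPTION: is_closed_won is a boolean on deals; the real schema may use a stage column.
select c.name,
       sum(d.amount) as revenue,
       count(*)      as won_deals
from deals d
join companies c on c.id = d.company_id
where d.is_closed_won
  -- coalesce keeps companies whose customer_vertical is NULL
  and coalesce(c.customer_vertical, '') not in ('US_Reseller', 'International_Reseller')
group by c.id, c.name
order by revenue desc
limit 6;
```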

5.3 Closed-won with no line items (top by amount)

| Deal | Amount |
| --- | --- |
| Hen Foundation - Closed Deal | 250,000 |
| HEN Foundation - Closed Deal | 125,000 |
| Sacramento FD - High Rise - 2 | 59,350.31 |
| Sacramento Wildland - Blade45t | 21,806 |
| Bellevue Borough FD PA | 12,936 |
| AES - Demo Kit #2 | 11,888 |

$375k of the top closed-won revenue (the two HEN Foundation deals) has no itemized line items. Revenue is booked at the deal level only, so product-line attribution misses it.
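
A drilldown of this kind is an anti-join: closed-won deals for which no line item exists. The following is a sketch, not the definition of v_closed_won_no_line_items; deals.id, line_items.deal_id, and deals.is_closed_won are assumed names.

```sql
-- Closed-won deals with no live line items, largest first.
-- ASSUMPTION: line_items.deal_id references deals.id; is_closed_won is hypothetical.
select d.name, d.amount
from deals d
where d.is_closed_won
  and not exists (
    select 1
    from line_items li
    where li.deal_id = d.id
      and li.deleted_at is null
  )
order by d.amount desc nulls last
limit 6;
```

NOT EXISTS is used rather than NOT IN so that a null deal_id on an orphan line item cannot empty the result.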

5.4 Contact duplicates by name+company

| Name | Records |
| --- | --- |
| Shannon Cherry | 4 |
| Doke White | 3 |
| Eric Saylors | 3 |
| Keith Gammel | 3 |
| Ken Wangen | 3 |
| Aaron Harris | 2 |

5.5 Top company-name duplicate clusters (merge backlog)

| Sample name | Records |
| --- | --- |
| Los Angeles Fire Department | 18 |
| Orange County fire authority | 16 |
| San Francisco Fire Department | 14 |
| Santa Monica Fire Department | 14 |
| Redwood city fire department | 13 |
| Self-Employed | 12 |

Most are legitimate fire departments entered with case/spelling variants (real merge candidates). "Self-Employed" ×12 is junk-label noise and should be scrubbed rather than merged.
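
Case and whitespace variants collapse onto one key once names are trimmed, lower-cased, and have internal runs of whitespace squeezed, while junk labels can be filtered out before clustering. The self-contained sketch below shows the idea; the three spelling variants are invented for illustration (only the names themselves come from the table), and the actual key used by v_dup_summary is not shown in this report.

```sql
-- Normalized-name clustering on made-up variants of one department.
with sample (name) as (
  values
    ('Orange County fire authority'),
    ('ORANGE COUNTY FIRE AUTHORITY'),
    ('  Orange  County Fire Authority '),
    ('Self-Employed')
)
select lower(regexp_replace(trim(name), '\s+', ' ', 'g')) as name_key,
       count(*) as records
from sample
where lower(trim(name)) <> 'self-employed'   -- junk label: scrub, do not merge
group by 1
having count(*) > 1;
```

This returns a single cluster, orange county fire authority with 3 records. Real spelling variants (for example "Dept" versus "Department") would still need fuzzy matching or manual review.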

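6. Masking applied

Emails are masked as abc***@domain and phone numbers are reduced to their last 4 digits, as stated in the report header.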
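
Masking of this shape can be applied in the query itself so that raw values never leave the database. A minimal sketch on a made-up sample row; it assumes the mask keeps the first three characters of the local part and the full domain, which is one reading of the abc***@domain pattern.

```sql
-- Email: first 3 characters + '***@' + domain. Phone: last 4 digits only.
with sample (email, phone) as (
  values ('abcdef@example.com', '+1 (555) 010-1234')   -- illustrative data, not from the mirror
)
select left(email, 3) || '***@' || split_part(email, '@', 2) as email_masked,
       right(regexp_replace(phone, '\D', '', 'g'), 4)        as phone_last4
from sample;
```

For the sample row this yields abc***@example.com and 1234. Stripping non-digits first keeps the last-4 rule stable across formatting differences in the phone field.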


7. Prioritized action list

P0 — RESOLVED ✅ (from 2026-06-08, holding)

  1. ✅ Contacts skeleton — fixed & backfilled; remains fully populated (96.5% email).
  2. ✅ Deals sync hang — fixed; deals complete cleanly every run.
  3. ✅ ISO-date / frozen high-water — fixed; incremental sync confirmed across multiple cron cycles.

P1 — still open (Worker-side mappings, unchanged)

  4. Managing Distributor association not synced: deals.managing_distributor_company_id is null for all deals, so v_distributor_companies = 0. Add the labeled association to the deals sync. (Note: §5.2 shows distributors *do* carry large revenue, e.g. Dinges at $1.17M, so this association is high-value for distributor analysis.)
  5. Owner identity: owners rows still have null names/emails, so v_unknown_owners reads a misleading 0 and business rule 6 (Jacob McAfee → Joe Schuller; Curt Johnson RSD null) can't run.
  6. Lead stage labels: raw *-stage-id values still leak unmapped; 863 leads have null lead_type (was ~877).
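
The three Worker-side gaps above can be tracked with one probe query, which makes it easy to see on the next re-run whether a mapping fix has landed. A sketch under assumptions: managing_distributor_company_id and lead_type are named in this report, while owners.name is a guess at the column behind "null names".

```sql
-- Progress probe for the open Worker-side mappings.
-- ASSUMPTION: owners.name is hypothetical; the other two columns appear in this report.
select
  (select count(*) from deals  where managing_distributor_company_id is not null) as deals_with_distributor,
  (select count(*) from owners where name is not null)                            as owners_with_name,
  (select count(*) from leads  where lead_type is null)                           as leads_without_type;
```

Going by the figures in this report, the first value should currently be 0 and the third about 863; any movement away from those indicates a mapping has started to populate.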

P2 — data cleanup (on the complete mirror)

  7. Merge company name duplicates (3,759 excess; top: LA Fire ×18, Orange County ×16) and contact name+company duplicates (455 excess); scrub the "Self-Employed" ×12 junk label.
  8. Resolve 850 deals without a company, 183 closed-won with no line items ($375k unattributed at the top), 106 orphan line items.
  9. De-duplicate the Ontario/Bauer 121,280 deal (§5.1), confirmed same company/amount/close.

P3 — optimization (new, minor)

  10. Make the segments sync incremental; it currently full-scans all 2,858 rows each run (harmless now, see §2).

*Routine re-run 2026-06-09 (~07:17 UTC). All figures reflect the live mirror; last sync 06:00 UTC, sync healthy and incremental.*