
Customer Data Unification: Strategies and Implementation

22 Apr
11min read
MaxMax

The average B2B company has customer data scattered across 15-20 different systems. The same account appears in your CRM, marketing automation, product analytics, billing system, and support platform—often with different names, conflicting data, and no clear connection between records.

Data unification solves this by creating a single, authoritative view of each customer. This guide covers how to implement customer data unification that actually works.

The Unification Challenge #

Why Data Gets Fragmented

Multiple Entry Points

  • Website forms with different fields
  • Sales creating records manually
  • Marketing importing lists
  • Product signups
  • Support ticket creation

Different Identifiers

  • Email addresses (work vs. personal)
  • Company names (variations, subsidiaries)
  • Phone numbers (formats)
  • Domain names

System Silos

  • Each tool maintains its own database
  • Limited synchronization
  • Different data models
  • Conflicting updates

Business Impact

| Problem | Consequence |
|---|---|
| Duplicate accounts | Wrong targeting, wasted spend |
| Missing data | Incomplete personalization |
| Conflicting data | Wrong decisions |
| Delayed sync | Stale information |
| No single view | Manual research required |

Unification Fundamentals #

Entity Resolution

Matching records across systems to the same real-world entity:

Account Matching

  • Company name normalization
  • Domain matching
  • Subsidiary mapping
  • Alias handling

Contact Matching

  • Email matching
  • Name + company matching
  • Phone matching
  • LinkedIn ID matching

Data Hierarchy

When data conflicts, which source wins?

Source Priority Example

| Field | Priority 1 | Priority 2 | Priority 3 |
|---|---|---|---|
| Company name | Enrichment | CRM | Marketing |
| Employee count | Enrichment | User-provided | CRM |
| Contact email | User-provided | Enrichment | Marketing |
| Revenue | Billing | CRM | Enrichment |
| Engagement | Marketing | Product | CRM |

Golden Record

The authoritative, unified record combining all sources:

Golden Account Record:
├── Core Identity (from resolution)
│   ├── account_id (generated)
│   ├── domain (canonical)
│   └── company_name (normalized)

├── Firmographics (from enrichment)
│   ├── industry
│   ├── employee_count
│   ├── annual_revenue
│   └── location

├── Engagement (aggregated)
│   ├── total_engagement_score
│   ├── last_activity_date
│   ├── engagement_by_channel
│   └── key_activities

├── Revenue (from CRM + billing)
│   ├── pipeline_value
│   ├── closed_revenue
│   ├── subscription_status
│   └── renewal_date

└── Relationships (linked)
    ├── contacts[]
    ├── opportunities[]
    └── activities[]

Unification Architecture #

Approach 1: Warehouse-Centric

Sources → ETL → Warehouse → Transformation → Golden Records → Activation

Process

  1. Ingest raw data from all sources
  2. Store in warehouse
  3. Apply matching logic via SQL/dbt
  4. Create unified models
  5. Push to operational systems

Pros

  • Full control over logic
  • Flexible transformations
  • Cost-effective at scale

Cons

  • Latency (batch processing)
  • Technical expertise required
  • Complex to maintain

Approach 2: CDP-Centric

Sources → CDP → Unified Profiles → Activation

Process

  1. Connect sources to CDP
  2. CDP handles identity resolution
  3. Profiles unified automatically
  4. Activate to destinations

Pros

  • Faster implementation
  • Built-in identity resolution
  • Real-time capabilities

Cons

  • Vendor dependency
  • Less customization
  • Higher cost at scale

Approach 3: Operational Layer

Sources → Operational Platform → Unified View → Actions

Process

  1. Connect sources to operational platform
  2. Real-time unification
  3. Immediate action capabilities
  4. Bi-directional sync

Pros

  • Real-time unification
  • Action-oriented
  • Purpose-built for GTM

Cons

  • May duplicate warehouse capabilities
  • Integration requirements

Identity Resolution Techniques #

Deterministic Matching

Exact matches on unique identifiers:

Match Conditions:
- Email exact match
- Domain exact match
- LinkedIn ID match
- Phone number (normalized) match

Strengths: High confidence, very few false positives

Weaknesses: Misses variations; sensitive to case and formatting unless identifiers are normalized first
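As a rough illustration of deterministic matching, the sketch below normalizes emails and phone numbers and then groups records that share the exact same key. The record layout and the helper names (`normalize_email`, `normalize_phone`, `group_by_key`) are assumptions made for this example, not part of any specific tool.

```python
from collections import defaultdict

def normalize_email(email: str) -> str:
    """Lowercase and trim so 'Ann@Acme.com ' and 'ann@acme.com' compare equal."""
    return email.strip().lower()

def normalize_phone(phone: str) -> str:
    """Keep digits only; a real pipeline would also handle country codes."""
    return "".join(ch for ch in phone if ch.isdigit())

def group_by_key(records, key_fn):
    """Group record ids that share the same non-empty normalized key."""
    groups = defaultdict(list)
    for rec in records:
        key = key_fn(rec)
        if key:  # skip records that lack this identifier
            groups[key].append(rec["id"])
    # Only keys shared by two or more records are matches
    return {k: ids for k, ids in groups.items() if len(ids) > 1}

# Hypothetical records from three systems
records = [
    {"id": "crm-1", "email": "Ann@Acme.com", "phone": "(415) 555-0100"},
    {"id": "mkt-7", "email": " ann@acme.com", "phone": "415.555.0100"},
    {"id": "sup-3", "email": "bob@globex.com", "phone": ""},
]

email_matches = group_by_key(records, lambda r: normalize_email(r["email"]))
phone_matches = group_by_key(records, lambda r: normalize_phone(r["phone"]))
print(email_matches)
print(phone_matches)
```

Normalizing before comparing is what addresses the case sensitivity weakness: the comparison itself stays an exact match.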

Probabilistic Matching

Fuzzy matching based on multiple signals:

Match Scoring:
- Company name similarity: 0-30 points
- Domain similarity: 0-25 points
- Location match: 0-15 points
- Industry match: 0-10 points
- Employee count proximity: 0-10 points
- Contact overlap: 0-10 points

Threshold: 70+ points = match

Strengths: Catches variations, more complete

Weaknesses: Risk of false positives, requires tuning
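The scoring scheme above could be sketched as follows. The weights and the 70-point threshold come from the list; everything else is an assumption for illustration: the field names, the use of `difflib.SequenceMatcher` for string similarity, the min/max ratio for employee count proximity, and Jaccard overlap for shared contacts.

```python
from difflib import SequenceMatcher

# Maximum points per signal, taken from the scoring list above
WEIGHTS = {"name": 30, "domain": 25, "location": 15,
           "industry": 10, "employees": 10, "contacts": 10}
THRESHOLD = 70

def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] (assumed metric: SequenceMatcher ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def employee_proximity(a: int, b: int) -> float:
    """1.0 when the counts are equal, falling toward 0.0 as they diverge."""
    if not a or not b:
        return 0.0
    return min(a, b) / max(a, b)

def contact_overlap(a: set, b: set) -> float:
    """Jaccard overlap of known contact emails."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def match_score(x: dict, y: dict) -> float:
    """Sum the weighted signals into a 0-100 score."""
    score = WEIGHTS["name"] * similarity(x["name"], y["name"])
    score += WEIGHTS["domain"] * similarity(x["domain"], y["domain"])
    score += WEIGHTS["location"] * (x["city"] == y["city"])
    score += WEIGHTS["industry"] * (x["industry"] == y["industry"])
    score += WEIGHTS["employees"] * employee_proximity(x["employees"], y["employees"])
    score += WEIGHTS["contacts"] * contact_overlap(x["contacts"], y["contacts"])
    return score

# Hypothetical records for illustration
a = {"name": "Acme Corp", "domain": "acme.com", "city": "Austin",
     "industry": "Software", "employees": 500, "contacts": {"ann@acme.com"}}
b = {"name": "Acme Corporation", "domain": "acme.com", "city": "Austin",
     "industry": "Software", "employees": 450,
     "contacts": {"ann@acme.com", "bob@acme.com"}}
c = {"name": "Globex", "domain": "globex.io", "city": "Boston",
     "industry": "Retail", "employees": 20, "contacts": set()}

print(match_score(a, b) >= THRESHOLD)  # a and b agree on most signals
print(match_score(a, c) >= THRESHOLD)  # a and c share almost nothing
```

The weights and the threshold are the tuning knobs: raising the threshold trades missed matches for fewer false positives.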

Hierarchical Matching

Handle parent/subsidiary relationships:

Company Hierarchy:
├── Ultimate Parent: Acme Corp
│   ├── Subsidiary: Acme EMEA Ltd
│   ├── Subsidiary: Acme APAC Pte
│   └── Subsidiary: Acme Canada Inc

Options:
1. Roll up to parent (enterprise view)
2. Keep separate (regional view)
3. Both (flexible analysis)
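One way to support both views is to store each entity separately and compute the roll-up on demand. The sketch below assumes a simple child-to-parent mapping (`PARENT_OF`) built from enrichment or manual research; the revenue figures are made up for the example.

```python
# Hypothetical child -> parent mapping for the hierarchy above
PARENT_OF = {
    "Acme EMEA Ltd": "Acme Corp",
    "Acme APAC Pte": "Acme Corp",
    "Acme Canada Inc": "Acme Corp",
}

def ultimate_parent(company: str, parent_of: dict) -> str:
    """Follow parent links to the top, guarding against cycles in bad data."""
    seen = set()
    while company in parent_of and company not in seen:
        seen.add(company)
        company = parent_of[company]
    return company

def rollup(values_by_entity: dict, parent_of: dict) -> dict:
    """Sum a per-entity metric up to each ultimate parent (enterprise view)."""
    totals = {}
    for entity, amount in values_by_entity.items():
        top = ultimate_parent(entity, parent_of)
        totals[top] = totals.get(top, 0) + amount
    return totals

# Regional view: each legal entity keeps its own number
regional = {"Acme EMEA Ltd": 120_000, "Acme APAC Pte": 80_000, "Acme Corp": 300_000}

# Enterprise view: the same data rolled up to the parent
enterprise = rollup(regional, PARENT_OF)
print(enterprise)
```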

Implementation Guide #

Step 1: Source Inventory

Document all customer data sources:

| Source | Entity Types | Key Fields | Volume | Update Frequency |
|---|---|---|---|---|
| Salesforce | Account, Contact | Domain, Email | 50K accounts | Real-time |
| HubSpot | Company, Contact | Domain, Email | 80K contacts | Hourly |
| Segment | User, Group | Email, GroupID | 100K users | Real-time |
| Clearbit | Company, Person | Domain, Email | On-demand | N/A |

Step 2: Identity Key Design

Define your matching keys:

Account Keys

  • Primary: Domain (canonical)
  • Secondary: Company name (normalized)
  • Tertiary: External IDs

Contact Keys

  • Primary: Email (lowercase)
  • Secondary: LinkedIn URL
  • Tertiary: Name + Company
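A minimal sketch of how the account key priority might be derived in code, assuming hypothetical record fields `website`, `name`, and `external_id`. It uses the standard library's `urllib.parse.urlsplit` to reduce a URL, bare host, or email address to a canonical domain.

```python
from urllib.parse import urlsplit

def canonical_domain(value: str) -> str:
    """Reduce a URL, bare host, or email address to a lowercase domain without 'www.'."""
    value = value.strip().lower()
    if not value:
        return ""
    if "@" in value:
        value = value.rsplit("@", 1)[1]  # keep the part after the last '@'
    if "//" not in value:
        value = "//" + value  # lets urlsplit treat the text as a host
    host = urlsplit(value).hostname or ""
    return host[4:] if host.startswith("www.") else host

def account_key(record: dict) -> tuple:
    """Return (key_type, key) using the first available key in priority order."""
    domain = canonical_domain(record.get("website") or "")
    if domain:
        return ("domain", domain)
    # Secondary: lowercase the name and collapse repeated whitespace
    name = " ".join((record.get("name") or "").lower().split())
    if name:
        return ("name", name)
    if record.get("external_id"):
        return ("external_id", str(record["external_id"]))
    return ("none", "")

print(account_key({"website": "https://www.Acme.com/pricing", "name": "Acme"}))
print(account_key({"name": "  Acme   Corp "}))
```

Returning the key type alongside the key makes it easy to report how many records matched on the primary key versus a weaker fallback.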

Step 3: Matching Rules

Define how records match:

Account Matching Rules:

Rule 1: Domain Match (Deterministic)
IF source1.domain = source2.domain
THEN match (confidence: 100%)

Rule 2: Name + Location Match (Probabilistic)
IF similarity(source1.name, source2.name) > 0.9
AND source1.city = source2.city
THEN match (confidence: 85%)

Rule 3: Name Only Match (Low confidence)
IF similarity(source1.name, source2.name) > 0.95
THEN potential_match (confidence: 60%)
→ Human review
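The three rules above translate fairly directly into code. In this sketch, `similarity` is assumed to be `difflib.SequenceMatcher`'s ratio (the rules do not name a metric), and the `domain`, `name`, and `city` fields are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Name similarity in [0, 1]; empty names never count as similar."""
    a, b = a.lower().strip(), b.lower().strip()
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()

def classify_pair(a: dict, b: dict) -> tuple:
    """Return (decision, confidence), applying the rules in order."""
    # Rule 1: exact domain match (deterministic)
    if a.get("domain") and a.get("domain") == b.get("domain"):
        return ("match", 1.00)
    name_sim = similarity(a.get("name", ""), b.get("name", ""))
    # Rule 2: very similar name plus same city (probabilistic)
    if name_sim > 0.9 and a.get("city") and a.get("city") == b.get("city"):
        return ("match", 0.85)
    # Rule 3: near-identical name only, routed to human review
    if name_sim > 0.95:
        return ("review", 0.60)
    return ("no_match", 0.0)

print(classify_pair({"name": "Acme Corp", "domain": "acme.com"},
                    {"name": "ACME", "domain": "acme.com"}))
print(classify_pair({"name": "Globex Corporation", "city": "Boston"},
                    {"name": "Globex Corporation.", "city": "Boston"}))
```

Ordering the rules from highest to lowest confidence means each pair is labeled by the strongest evidence available.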

Step 4: Conflict Resolution

Define rules for conflicting data:

Conflict Resolution Rules:

Employee Count:
- If enrichment.employees EXISTS: use enrichment
- Else if crm.employees EXISTS: use crm
- Else: null

Company Name:
- If enrichment.name EXISTS: use enrichment (most standardized)
- Else if crm.name EXISTS: use crm
- Else: use first source

Last Activity Date:
- Use MAX(all_sources.last_activity)

Engagement Score:
- Aggregate from all sources
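These rules could be expressed as a small resolver function. This is a sketch under stated assumptions: sources arrive as a dict keyed by source name, the field names are hypothetical, and "aggregate" for the engagement score is taken to mean a simple sum.

```python
from datetime import date

def first_present(*values):
    """Return the first value that is not None or an empty string."""
    for v in values:
        if v not in (None, ""):
            return v
    return None

def resolve(sources: dict) -> dict:
    """sources maps a source name ('enrichment', 'crm', ...) to its field dict."""
    enrichment = sources.get("enrichment", {})
    crm = sources.get("crm", {})
    all_records = list(sources.values())
    first = all_records[0] if all_records else {}
    activity_dates = [r["last_activity"] for r in all_records if r.get("last_activity")]
    return {
        # Enrichment wins, then CRM, else null
        "employee_count": first_present(enrichment.get("employees"), crm.get("employees")),
        # Enrichment wins, then CRM, else whatever the first source has
        "company_name": first_present(enrichment.get("name"), crm.get("name"), first.get("name")),
        # Most recent activity across all sources
        "last_activity": max(activity_dates) if activity_dates else None,
        # Assumption: aggregate by summing per-source scores
        "engagement_score": sum(r.get("engagement", 0) for r in all_records),
    }

# Hypothetical inputs for one account
sources = {
    "crm": {"name": "ACME", "employees": 400,
            "last_activity": date(2025, 3, 1), "engagement": 12},
    "enrichment": {"name": "Acme Corp", "employees": None, "last_activity": None},
    "marketing": {"name": "acme", "last_activity": date(2025, 4, 2), "engagement": 30},
}
golden = resolve(sources)
print(golden)
```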

Step 5: Build Unified Model

Create the golden record model:

-- Unified Account Model
CREATE TABLE unified_accounts AS
WITH matched_accounts AS (
  -- Matching logic here
),
merged_data AS (
  -- Field selection with priority
)
SELECT
  unified_account_id,
  canonical_domain,
  company_name,
  industry,
  employee_count,
  -- ... more fields
  source_ids, -- Keep track of source records
  updated_at
FROM merged_data;
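The matching step typically yields pairs of records, while the unified model needs one id per group. One common way to bridge the two is a union-find pass, sketched here in Python. The record ids and the choice of the smallest id as the cluster's `unified_account_id` are assumptions for illustration; the resulting list per cluster plays the role of `source_ids`.

```python
def cluster(record_ids, matched_pairs):
    """Group records connected by any chain of matches (union-find)."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger id under the smaller one so the
            # cluster id is deterministic across runs
            parent[max(root_a, root_b)] = min(root_a, root_b)

    clusters = {}
    for rid in record_ids:
        clusters.setdefault(find(rid), []).append(rid)
    return clusters

# Hypothetical source record ids and pairwise matches
record_ids = ["crm:001", "hubspot:88", "segment:g5", "crm:002"]
matched_pairs = [("crm:001", "hubspot:88"), ("hubspot:88", "segment:g5")]

unified = cluster(record_ids, matched_pairs)
for unified_account_id, source_ids in unified.items():
    print(unified_account_id, source_ids)
```

Because matches are transitive here, a single bad match can merge two large clusters, which is one reason to keep low-confidence pairs out of this step until they have been reviewed.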

Step 6: Ongoing Maintenance

Unification is not a one-time project:

Daily

  • Process new records
  • Update existing matches
  • Handle conflicts

Weekly

  • Review potential matches
  • Audit match quality
  • Address duplicates

Monthly

  • Analyze match rates
  • Tune matching rules
  • Review source quality

Unification with Cargo #

Cargo provides native data unification:

Real-Time Unification

Workflow: Record Unification

Trigger: New record from any source

→ Extract: Key identifiers
→ Match: Against existing records
→ If match found:
  → Merge: New data with existing
  → Resolve: Conflicts by priority
→ If no match:
  → Create: New unified record
  → Enrich: With external data
→ Update: All connected systems

Continuous Enrichment

Workflow: Unified Enrichment

Trigger: New unified record OR refresh schedule

→ Query: Enrichment providers
→ Merge: Results with unified record
→ Update: Connected systems
→ Track: Data freshness

Bi-Directional Sync

Workflow: System Synchronization

Trigger: Unified record updated

→ Identify: Connected systems
→ Map: Fields to each system
→ Update: Each connected record
→ Handle: Sync conflicts
→ Log: Sync status

Measuring Unification Quality #

Quality Metrics

| Metric | Definition | Target |
|---|---|---|
| Match rate | % records matched | > 85% |
| False positive rate | % incorrect matches | < 2% |
| Data completeness | % fields populated | > 90% |
| Duplicate rate | Duplicate records found | < 3% |
| Freshness | Avg data age | < 24 hours |
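Three of these metrics can be computed with a few lines of code. The sketch below assumes each source record carries a `unified_id` once matched, that completeness is measured over a chosen list of required fields, and that the false positive rate comes from a manually reviewed sample of matches (`True` meaning the match was correct). All names are hypothetical.

```python
def quality_metrics(records, reviewed_matches, required_fields):
    """Compute match rate, false positive rate, and completeness as fractions."""
    total = len(records)
    matched = sum(1 for r in records if r.get("unified_id"))
    filled = sum(
        1 for r in records for f in required_fields if r.get(f) not in (None, "")
    )
    wrong = sum(1 for ok in reviewed_matches if not ok)
    cells = total * len(required_fields)
    return {
        "match_rate": matched / total if total else 0.0,
        "false_positive_rate": wrong / len(reviewed_matches) if reviewed_matches else 0.0,
        "completeness": filled / cells if cells else 0.0,
    }

# Hypothetical sample
records = [
    {"unified_id": "u1", "domain": "acme.com", "industry": "Software"},
    {"unified_id": "u1", "domain": "acme.com", "industry": ""},
    {"unified_id": "u2", "domain": "globex.com", "industry": "Retail"},
    {"unified_id": None, "domain": None, "industry": "Media"},
]
reviewed = [True, True, True, False]  # outcome of a manual audit of four matches

metrics = quality_metrics(records, reviewed, ["domain", "industry"])
print(metrics)
```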

Quality Monitoring

Track over time:

  • Match rate trends
  • New duplicate detection
  • Conflict frequency
  • Source quality by system

Common Unification Challenges #

Challenge 1: Company Name Variations

“Acme Corp” vs “Acme Corporation” vs “ACME” vs “Acme, Inc.”

Solutions:

  • Name normalization algorithms
  • Domain as primary key
  • Alias management
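A minimal name normalizer, as a sketch: it lowercases, drops punctuation, and strips trailing legal suffixes. The suffix list is an illustrative assumption and would need extending for other countries and languages.

```python
import re

# Illustrative, not exhaustive
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co",
                  "company", "llc", "ltd", "limited", "gmbh"}

def normalize_company_name(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    # Keep at least one token so a company literally named "Limited" survives
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

variants = ["Acme Corp", "Acme Corporation", "ACME", "Acme, Inc."]
print({normalize_company_name(v) for v in variants})
```

All four variants from the example collapse to the same normalized form, which can then serve as the secondary matching key behind the domain.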

Challenge 2: Multiple Domains

Large companies with multiple domains (acme.com, acme.io, acme.co.uk)

Solutions:

  • Domain graph relationships
  • Parent company mapping
  • Enrichment data for hierarchy

Challenge 3: Personal vs. Work Email

User signs up with personal email, then uses work email

Solutions:

  • Multi-email support per contact
  • Company email extraction
  • Progressive linking
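Multi-email support and company email extraction might look like the sketch below. The free-mail domain list is a small illustrative sample, and the contact structure is hypothetical. How you learn that two addresses belong to the same person (for example, a logged-in user adding a second email) is outside this sketch.

```python
# Illustrative sample of consumer email providers, not a complete list
FREE_MAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com", "icloud.com"}

def company_domain(email: str):
    """Return the domain for work addresses, or None for free-mail or malformed ones."""
    local, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not local or not domain or domain in FREE_MAIL_DOMAINS:
        return None
    return domain

def link_email(contact: dict, email: str) -> dict:
    """Attach another email to a contact and fill the account domain once known."""
    email = email.strip().lower()
    if email not in contact["emails"]:
        contact["emails"].append(email)
    if contact.get("account_domain") is None:
        contact["account_domain"] = company_domain(email)
    return contact

# The user first signed up with a personal address
contact = {"emails": ["ann.lee@gmail.com"], "account_domain": None}

# Later the same user adds a work address
link_email(contact, "Ann.Lee@Acme.com")
print(contact)
```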

Challenge 4: Data Decay

People change jobs, companies get acquired

Solutions:

  • Regular enrichment refresh
  • Job change monitoring
  • Acquisition tracking
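A regular enrichment refresh needs a way to pick which records to re-check. The sketch below flags records whose last enrichment is missing or older than a chosen age. The `enriched_at` field and the 90-day window are assumptions for illustration, not recommendations from any provider.

```python
from datetime import datetime, timedelta, timezone

def stale_records(records, max_age: timedelta, now: datetime):
    """Return ids whose last enrichment is missing or older than max_age."""
    return [
        r["id"] for r in records
        if r.get("enriched_at") is None or now - r["enriched_at"] > max_age
    ]

# Fixed 'now' so the example is reproducible
now = datetime(2025, 4, 22, tzinfo=timezone.utc)
records = [
    {"id": "acct-1", "enriched_at": datetime(2025, 4, 1, tzinfo=timezone.utc)},
    {"id": "acct-2", "enriched_at": datetime(2024, 11, 15, tzinfo=timezone.utc)},
    {"id": "acct-3", "enriched_at": None},
]

to_refresh = stale_records(records, timedelta(days=90), now)
print(to_refresh)
```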

Best Practices #

  1. Start with high-value use cases - Unify for specific needs first
  2. Use domain as anchor - Most reliable account identifier
  3. Preserve source data - Keep original records accessible
  4. Monitor continuously - Data quality degrades over time
  5. Document decisions - Why specific rules were chosen

Data unification is foundational to effective revenue operations. Get it right, and everything downstream—scoring, routing, personalization, analytics—becomes dramatically more effective.

Ready to unify your customer data? Cargo’s data unification engine creates a single view of every account and contact across all your systems.

Key Takeaways #

  • Unification enables everything downstream: scoring, routing, personalization, and analytics all depend on having a unified customer view
  • Domain is your anchor: email domain is the most reliable account identifier in B2B—use it as the primary matching key
  • Identity resolution is complex: the same person may exist as multiple records across systems, with different emails, names, and companies
  • Preserve source data: unification creates a golden record but should preserve original data for audit and edge cases
  • Continuous monitoring required: data quality degrades over time—build processes to detect and fix degradation


MaxMax, Apr 22, 2025
