The average B2B company has customer data scattered across 15-20 different systems. The same account appears in your CRM, marketing automation, product analytics, billing system, and support platform—often with different names, conflicting data, and no clear connection between records.
Data unification solves this by creating a single, authoritative view of each customer. This guide covers how to implement customer data unification that actually works.
The Unification Challenge
Why Data Gets Fragmented
Multiple Entry Points
- Website forms with different fields
- Sales creating records manually
- Marketing importing lists
- Product signups
- Support ticket creation
Different Identifiers
- Email addresses (work vs. personal)
- Company names (variations, subsidiaries)
- Phone numbers (formats)
- Domain names
System Silos
- Each tool maintains its own database
- Limited synchronization
- Different data models
- Conflicting updates
Business Impact
| Problem | Consequence |
|---|---|
| Duplicate accounts | Wrong targeting, wasted spend |
| Missing data | Incomplete personalization |
| Conflicting data | Wrong decisions |
| Delayed sync | Stale information |
| No single view | Manual research required |
Unification Fundamentals
Entity Resolution
Matching records across systems to the same real-world entity:
Account Matching
- Company name normalization
- Domain matching
- Subsidiary mapping
- Alias handling
Contact Matching
- Email matching
- Name + company matching
- Phone matching
- LinkedIn ID matching
Data Hierarchy
When data conflicts, which source wins?
Source Priority Example
| Field | Priority 1 | Priority 2 | Priority 3 |
|---|---|---|---|
| Company name | Enrichment | CRM | Marketing |
| Employee count | Enrichment | User-provided | CRM |
| Contact email | User-provided | Enrichment | Marketing |
| Revenue | Billing | CRM | Enrichment |
| Engagement | Marketing | Product | CRM |
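A priority table like this one can be encoded as a plain mapping from each field to an ordered list of sources. The sketch below is illustrative only: the lowercase source labels and the `resolve_field` helper are assumed names, not part of any specific product.

```python
from typing import Any, Optional

# Ordered source priority per field, copied from the table above.
# The lowercase source labels are illustrative assumptions.
SOURCE_PRIORITY: dict[str, list[str]] = {
    "company_name": ["enrichment", "crm", "marketing"],
    "employee_count": ["enrichment", "user_provided", "crm"],
    "contact_email": ["user_provided", "enrichment", "marketing"],
    "revenue": ["billing", "crm", "enrichment"],
    "engagement": ["marketing", "product", "crm"],
}


def resolve_field(field: str, values_by_source: dict[str, Any]) -> Optional[Any]:
    """Return the value from the highest-priority source that has one."""
    for source in SOURCE_PRIORITY.get(field, []):
        value = values_by_source.get(source)
        if value not in (None, ""):  # treat empty strings as missing
            return value
    return None
```

For example, `resolve_field("employee_count", {"crm": 120, "enrichment": 150})` picks the enrichment value because enrichment is listed first for that field.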
Golden Record
The authoritative, unified record combining all sources:
Golden Account Record:
├── Core Identity (from resolution)
│ ├── account_id (generated)
│ ├── domain (canonical)
│ └── company_name (normalized)
│
├── Firmographics (from enrichment)
│ ├── industry
│ ├── employee_count
│ ├── annual_revenue
│ └── location
│
├── Engagement (aggregated)
│ ├── total_engagement_score
│ ├── last_activity_date
│ ├── engagement_by_channel
│ └── key_activities
│
├── Revenue (from CRM + billing)
│ ├── pipeline_value
│ ├── closed_revenue
│ ├── subscription_status
│ └── renewal_date
│
└── Relationships (linked)
    ├── contacts[]
    ├── opportunities[]
    └── activities[]
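As a rough illustration, the golden account record could be modeled as a typed structure. The field names follow the outline above; the Python types, the defaults, and the choice to store relationships as lists of IDs are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class GoldenAccount:
    # Core identity (from resolution)
    account_id: str
    domain: str
    company_name: str
    # Firmographics (from enrichment)
    industry: Optional[str] = None
    employee_count: Optional[int] = None
    annual_revenue: Optional[float] = None
    location: Optional[str] = None
    # Engagement (aggregated)
    total_engagement_score: float = 0.0
    last_activity_date: Optional[date] = None
    engagement_by_channel: dict[str, float] = field(default_factory=dict)
    key_activities: list[str] = field(default_factory=list)
    # Revenue (from CRM + billing)
    pipeline_value: float = 0.0
    closed_revenue: float = 0.0
    subscription_status: Optional[str] = None
    renewal_date: Optional[date] = None
    # Relationships, stored here as IDs of linked records (an assumption)
    contacts: list[str] = field(default_factory=list)
    opportunities: list[str] = field(default_factory=list)
    activities: list[str] = field(default_factory=list)
```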
Unification Architecture
Approach 1: Warehouse-Centric
Sources → ETL → Warehouse → Transformation → Golden Records → Activation
Process
- Ingest raw data from all sources
- Store in warehouse
- Apply matching logic via SQL/dbt
- Create unified models
- Push to operational systems
Pros
- Full control over logic
- Flexible transformations
- Cost-effective at scale
Cons
- Latency (batch processing)
- Technical expertise required
- Complex to maintain
Approach 2: CDP-Centric
Sources → CDP → Unified Profiles → Activation
Process
- Connect sources to CDP
- CDP handles identity resolution
- Profiles unified automatically
- Activate to destinations
Pros
- Faster implementation
- Built-in identity resolution
- Real-time capabilities
Cons
- Vendor dependency
- Less customization
- Higher cost at scale
Approach 3: Operational Layer
Sources → Operational Platform → Unified View → Actions
Process
- Connect sources to operational platform
- Real-time unification
- Immediate action capabilities
- Bi-directional sync
Pros
- Real-time unification
- Action-oriented
- Purpose-built for GTM
Cons
- May duplicate warehouse capabilities
- Integration requirements
Identity Resolution Techniques
Deterministic Matching
Exact matches on unique identifiers:
Match Conditions:
- Email exact match
- Domain exact match
- LinkedIn ID match
- Phone number (normalized) match
Strengths: High confidence, very few false positives
Weaknesses: Misses variations; sensitive to case and formatting differences unless identifiers are normalized first
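Deterministic matching usually comes down to normalizing each identifier and then doing an exact lookup. The following is a minimal sketch for contacts keyed by email and phone. The record shape (`id`, `email`, `phone`) and the helper names are assumptions, and the phone normalization deliberately ignores country codes.

```python
import re
from typing import Optional


def normalize_email(email: str) -> str:
    """Lowercase and trim so ' Jane@Acme.com' equals 'jane@acme.com'."""
    return email.strip().lower()


def normalize_phone(phone: str) -> str:
    """Keep digits only; a simplification that ignores country codes."""
    return re.sub(r"\D", "", phone)


def build_index(contacts: list[dict]) -> dict[tuple[str, str], str]:
    """Map each (identifier type, normalized value) pair to a contact id."""
    index: dict[tuple[str, str], str] = {}
    for contact in contacts:
        if contact.get("email"):
            index[("email", normalize_email(contact["email"]))] = contact["id"]
        if contact.get("phone"):
            index[("phone", normalize_phone(contact["phone"]))] = contact["id"]
    return index


def find_match(record: dict, index: dict[tuple[str, str], str]) -> Optional[str]:
    """Return the id of an exactly matching contact, or None."""
    if record.get("email"):
        hit = index.get(("email", normalize_email(record["email"])))
        if hit:
            return hit
    if record.get("phone"):
        return index.get(("phone", normalize_phone(record["phone"])))
    return None
```

Normalizing before comparison addresses the case and formatting weakness, but it does not help when the same person uses two genuinely different identifiers.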
Probabilistic Matching
Fuzzy matching based on multiple signals:
Match Scoring:
- Company name similarity: 0-30 points
- Domain similarity: 0-25 points
- Location match: 0-15 points
- Industry match: 0-10 points
- Employee count proximity: 0-10 points
- Contact overlap: 0-10 points
Threshold: 70+ points = match
Strengths: Catches variations, more complete
Weaknesses: Risk of false positives, requires tuning
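The point scheme above can be sketched as a weighted sum. The weights and the 70-point threshold come from the list; how each signal is measured is an assumption made here for illustration: `difflib` string similarity for name and domain, exact equality for location and industry, a smaller-to-larger ratio for employee count, and Jaccard overlap for contacts. The record keys are also hypothetical and are assumed to be present.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 70  # points, from the scoring scheme above


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(a: dict, b: dict) -> float:
    """Score two account records on a 0-100 scale."""
    score = 30 * similarity(a["name"], b["name"])
    score += 25 * similarity(a["domain"], b["domain"])
    score += 15 if a["city"] == b["city"] else 0
    score += 10 if a["industry"] == b["industry"] else 0
    # Employee count proximity: ratio of the smaller count to the larger one.
    small, large = sorted([a["employees"], b["employees"]])
    score += 10 * (small / large) if large else 0
    # Contact overlap: shared contacts divided by all known contacts.
    contacts_a, contacts_b = set(a["contacts"]), set(b["contacts"])
    union = contacts_a | contacts_b
    score += 10 * (len(contacts_a & contacts_b) / len(union)) if union else 0
    return score


def is_match(a: dict, b: dict) -> bool:
    return match_score(a, b) >= MATCH_THRESHOLD
```

In practice the threshold and the per-signal measures are the parts that need tuning against a manually reviewed sample.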
Hierarchical Matching
Handle parent/subsidiary relationships:
Company Hierarchy:
├── Ultimate Parent: Acme Corp
│ ├── Subsidiary: Acme EMEA Ltd
│ ├── Subsidiary: Acme APAC Pte
│ └── Subsidiary: Acme Canada Inc
Options:
1. Roll up to parent (enterprise view)
2. Keep separate (regional view)
3. Both (flexible analysis)
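Option 3 can be supported by storing a child-to-parent mapping and deriving the rolled-up view on demand. The sketch below uses the example hierarchy above; the revenue figures in the usage note and the function names are hypothetical.

```python
from collections import defaultdict

# child -> parent, taken from the example hierarchy above
PARENT_OF = {
    "Acme EMEA Ltd": "Acme Corp",
    "Acme APAC Pte": "Acme Corp",
    "Acme Canada Inc": "Acme Corp",
}


def ultimate_parent(company: str, parent_of: dict[str, str]) -> str:
    """Walk up the hierarchy until a company with no parent is reached."""
    seen = {company}
    while company in parent_of:
        company = parent_of[company]
        if company in seen:  # guard against cycles in dirty data
            raise ValueError(f"cycle detected at {company!r}")
        seen.add(company)
    return company


def revenue_views(revenue_by_company: dict[str, float], parent_of: dict[str, str]):
    """Return (regional, enterprise) revenue totals from the same data."""
    regional = dict(revenue_by_company)
    enterprise: dict[str, float] = defaultdict(float)
    for company, revenue in revenue_by_company.items():
        enterprise[ultimate_parent(company, parent_of)] += revenue
    return regional, dict(enterprise)
```

Calling `revenue_views({"Acme EMEA Ltd": 100.0, "Acme Corp": 25.0}, PARENT_OF)` keeps both entries in the regional view and sums them under `Acme Corp` in the enterprise view.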
Implementation Guide
Step 1: Source Inventory
Document all customer data sources:
| Source | Entity Types | Key Fields | Volume | Update Frequency |
|---|---|---|---|---|
| Salesforce | Account, Contact | Domain, Email | 50K accounts | Real-time |
| HubSpot | Company, Contact | Domain, Email | 80K contacts | Hourly |
| Segment | User, Group | Email, GroupID | 100K users | Real-time |
| Clearbit | Company, Person | Domain, Email | On-demand | N/A |
Step 2: Identity Key Design
Define your matching keys:
Account Keys
- Primary: Domain (canonical)
- Secondary: Company name (normalized)
- Tertiary: External IDs
Contact Keys
- Primary: Email (lowercase)
- Secondary: LinkedIn URL
- Tertiary: Name + Company
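The account keys can be derived with two small normalizers. This is a sketch under stated assumptions: the legal-suffix list is illustrative and far from exhaustive, and `canonical_domain` only strips a leading `www.` rather than resolving registrable domains.

```python
import re
from urllib.parse import urlparse

# Legal suffixes to strip; an illustrative, non-exhaustive list.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "gmbh", "co"}


def canonical_domain(value: str) -> str:
    """Reduce a URL, email address, or bare domain to a lowercase host."""
    value = value.strip().lower()
    if "@" in value:
        value = value.rsplit("@", 1)[1]
    if "://" not in value:
        value = "//" + value  # lets urlparse treat the value as a host
    host = urlparse(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def normalize_company_name(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)
```

With these helpers, "Acme Corp", "Acme Corporation", "ACME", and "Acme, Inc." all normalize to `acme`, and `https://www.Acme.com/pricing` and `jane@acme.com` both yield `acme.com`.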
Step 3: Matching Rules
Define how records match:
Account Matching Rules:
Rule 1: Domain Match (Deterministic)
  IF source1.domain = source2.domain
  THEN match (confidence: 100%)
Rule 2: Name + Location Match (Probabilistic)
  IF similarity(source1.name, source2.name) > 0.9
  AND source1.city = source2.city
  THEN match (confidence: 85%)
Rule 3: Name Only Match (Low confidence)
  IF similarity(source1.name, source2.name) > 0.95
  THEN potential_match (confidence: 60%)
  → Human review
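The three rules translate fairly directly into an ordered cascade. In the sketch below, the thresholds and confidence values come from the rules above; `similarity` is implemented with `difflib` as a stand-in, since the rules do not say which similarity measure is intended, and the record keys are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_accounts(a: dict, b: dict) -> tuple[str, float]:
    """Apply the rules in order and return (decision, confidence)."""
    # Rule 1: exact domain match (deterministic)
    if a.get("domain") and a.get("domain") == b.get("domain"):
        return "match", 1.0
    name_sim = similarity(a["name"], b["name"])
    # Rule 2: similar name in the same city (probabilistic)
    if name_sim > 0.9 and a.get("city") and a.get("city") == b.get("city"):
        return "match", 0.85
    # Rule 3: very similar name only; route to human review
    if name_sim > 0.95:
        return "potential_match", 0.60
    return "no_match", 0.0
```

Ordering matters: the cheapest and most reliable rule runs first, so fuzzy comparison is only reached when the domain does not settle the question.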
Step 4: Conflict Resolution
Define rules for conflicting data:
Conflict Resolution Rules:
Employee Count:
- If enrichment.employees EXISTS: use enrichment
- Else if crm.employees EXISTS: use crm
- Else: null
Company Name:
- If enrichment.name EXISTS: use enrichment (most standardized)
- Else if crm.name EXISTS: use crm
- Else: use first source
Last Activity Date:
- Use MAX(all_sources.last_activity)
Engagement Score:
- Aggregate from all sources
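These four rules could be applied in a single merge function. The sketch assumes each source contributes a dict with optional `name`, `employees`, `last_activity`, and `engagement_score` keys (hypothetical names), and it interprets "aggregate" as a simple sum, which is only one possible reading.

```python
from datetime import date


def first_present(*values):
    """Return the first value that is not None, or None if all are missing."""
    return next((v for v in values if v is not None), None)


def resolve_account(sources: dict[str, dict]) -> dict:
    """Merge per-source records (keyed by source name) using the rules above."""
    enrichment = sources.get("enrichment", {})
    crm = sources.get("crm", {})
    records = list(sources.values())

    activity_dates = [r["last_activity"] for r in records if r.get("last_activity")]
    names = [r["name"] for r in records if r.get("name")]

    return {
        # Enrichment first, then CRM, else null
        "employee_count": first_present(enrichment.get("employees"), crm.get("employees")),
        # Enrichment first, then CRM, else the first source that has a name
        "company_name": first_present(
            enrichment.get("name"), crm.get("name"), names[0] if names else None
        ),
        # Most recent activity across all sources
        "last_activity": max(activity_dates) if activity_dates else None,
        # "Aggregate" is interpreted here as a sum; this is an assumption
        "engagement_score": sum(r.get("engagement_score", 0) for r in records),
    }
```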
Step 5: Build Unified Model
Create the golden record model:
-- Unified Account Model (skeleton: the two CTE bodies are placeholders to fill in)
CREATE TABLE unified_accounts AS
WITH matched_accounts AS (
    -- Matching logic here
),
merged_data AS (
    -- Field selection with priority
)
SELECT
    unified_account_id,
    canonical_domain,
    company_name,
    industry,
    employee_count,
    -- ... more fields
    source_ids,  -- Keep track of source records
    updated_at
FROM merged_data;
Step 6: Ongoing Maintenance
Unification is not a one-time project:
Daily
- Process new records
- Update existing matches
- Handle conflicts
Weekly
- Review potential matches
- Audit match quality
- Address duplicates
Monthly
- Analyze match rates
- Tune matching rules
- Review source quality
Unification with Cargo
Cargo provides native data unification:
Real-Time Unification
Workflow: Record Unification
Trigger: New record from any source
→ Extract: Key identifiers
→ Match: Against existing records
→ If match found:
→ Merge: New data with existing
→ Resolve: Conflicts by priority
→ If no match:
→ Create: New unified record
→ Enrich: With external data
→ Update: All connected systems
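To make the control flow of this workflow concrete, here is a generic in-memory sketch of the match-then-merge-or-create step. It is not Cargo's implementation or API: the `UnifiedStore` class is hypothetical, matching is by normalized email only, conflicts are resolved with a simple "keep the existing value" rule instead of source priority, and the enrichment and sync steps are left out.

```python
import itertools


class UnifiedStore:
    """In-memory stand-in for a unified record store (hypothetical)."""

    def __init__(self) -> None:
        self.records: dict[str, dict] = {}   # unified id -> merged fields
        self.by_email: dict[str, str] = {}   # normalized email -> unified id
        self._ids = itertools.count(1)

    def upsert(self, incoming: dict) -> str:
        email = incoming["email"].strip().lower()   # Extract key identifier
        unified_id = self.by_email.get(email)       # Match against existing
        if unified_id is None:                      # No match: create record
            unified_id = f"u{next(self._ids)}"
            self.records[unified_id] = {"email": email, "source_ids": []}
            self.by_email[email] = unified_id
        record = self.records[unified_id]
        for key, value in incoming.items():         # Merge new data
            if key in ("email", "source_id"):
                continue
            # Resolve: keep the existing value, fill only missing fields
            if value is not None and record.get(key) is None:
                record[key] = value
        record["source_ids"].append(incoming["source_id"])
        return unified_id
```

Keeping `source_ids` on the unified record is what later makes it possible to push updates back to each connected system.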
Continuous Enrichment
Workflow: Unified Enrichment
Trigger: New unified record OR refresh schedule
→ Query: Enrichment providers
→ Merge: Results with unified record
→ Update: Connected systems
→ Track: Data freshness
Bi-Directional Sync
Workflow: System Synchronization
Trigger: Unified record updated
→ Identify: Connected systems
→ Map: Fields to each system
→ Update: Each connected record
→ Handle: Sync conflicts
→ Log: Sync status
Measuring Unification Quality
Quality Metrics
| Metric | Definition | Target |
|---|---|---|
| Match rate | % records matched | > 85% |
| False positive rate | % incorrect matches | < 2% |
| Data completeness | % fields populated | > 90% |
| Duplicate rate | % records that are duplicates | < 3% |
| Freshness | Avg data age | < 24 hours |
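These metrics are simple ratios once the underlying counts are available. The sketch below computes them and flags any that miss the targets in the table. The input keys (for example `audited_matches`, meaning matches that were manually checked) are hypothetical names chosen for this example.

```python
import operator

# (comparison, threshold) per metric, taken from the table above
TARGETS = {
    "match_rate": (operator.gt, 85.0),
    "false_positive_rate": (operator.lt, 2.0),
    "data_completeness": (operator.gt, 90.0),
    "duplicate_rate": (operator.lt, 3.0),
    "freshness_hours": (operator.lt, 24.0),
}


def pct(part: float, whole: float) -> float:
    """Percentage of part in whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def compute_metrics(counts: dict) -> dict[str, float]:
    return {
        "match_rate": pct(counts["matched"], counts["total"]),
        "false_positive_rate": pct(counts["incorrect_matches"], counts["audited_matches"]),
        "data_completeness": pct(counts["fields_populated"], counts["fields_expected"]),
        "duplicate_rate": pct(counts["duplicates"], counts["total"]),
        "freshness_hours": counts["avg_age_hours"],
    }


def failing_metrics(metrics: dict[str, float]) -> list[str]:
    """Return the names of metrics that miss their target, sorted."""
    return sorted(
        name for name, (compare, threshold) in TARGETS.items()
        if not compare(metrics[name], threshold)
    )
```

The false positive rate can only be estimated from a reviewed sample, which is why it is computed against audited matches and not against all matches.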
Quality Monitoring
Track over time:
- Match rate trends
- New duplicate detection
- Conflict frequency
- Source quality by system
Common Unification Challenges
Challenge 1: Company Name Variations
“Acme Corp” vs “Acme Corporation” vs “ACME” vs “Acme, Inc.”
Solutions:
- Name normalization algorithms
- Domain as primary key
- Alias management
Challenge 2: Multiple Domains
Large companies with multiple domains (acme.com, acme.io, acme.co.uk)
Solutions:
- Domain graph relationships
- Parent company mapping
- Enrichment data for hierarchy
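One way to represent domain graph relationships is a union-find (disjoint set) structure, where linking two domains places them in the same company group. This is a sketch of that idea, not a prescribed method; the `DomainGraph` class is a hypothetical name, and where the links come from (enrichment data, manual aliases) is outside its scope.

```python
class DomainGraph:
    """Groups domains that belong to the same company (union-find)."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def find(self, domain: str) -> str:
        """Return the representative domain of the group containing `domain`."""
        self._parent.setdefault(domain, domain)
        root = domain
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node directly at the root.
        while self._parent[domain] != root:
            self._parent[domain], domain = root, self._parent[domain]
        return root

    def link(self, a: str, b: str) -> None:
        """Record that two domains belong to the same company."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_company(self, a: str, b: str) -> bool:
        return self.find(a) == self.find(b)
```

After `link("acme.com", "acme.io")` and `link("acme.io", "acme.co.uk")`, all three domains resolve to one group even though `acme.com` and `acme.co.uk` were never linked directly.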
Challenge 3: Personal vs. Work Email
User signs up with personal email, then uses work email
Solutions:
- Multi-email support per contact
- Company email extraction
- Progressive linking
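A minimal sketch of multi-email support with company email extraction: each contact keeps every email seen so far, and the account domain is taken from the first address that is not on a free-mailbox list. The provider list is a small illustrative sample, and the `Contact` class and `company_domain` helper are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

# Small illustrative sample of free mailbox providers; not exhaustive.
FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com", "icloud.com"}


def company_domain(email: str) -> Optional[str]:
    """Return the email's domain, or None for a free mailbox provider."""
    domain = email.strip().lower().rsplit("@", 1)[-1]
    return None if domain in FREE_EMAIL_DOMAINS else domain


@dataclass
class Contact:
    contact_id: str
    emails: list[str] = field(default_factory=list)

    def link_email(self, email: str) -> None:
        """Attach another email to the same contact (progressive linking)."""
        email = email.strip().lower()
        if email not in self.emails:
            self.emails.append(email)

    def account_domain(self) -> Optional[str]:
        """Return the first work domain found among the contact's emails."""
        for email in self.emails:
            domain = company_domain(email)
            if domain:
                return domain
        return None
```

A contact who signs up with a Gmail address has no account domain at first; once a work address is linked, the contact can be attached to the matching account.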
Challenge 4: Data Decay
People change jobs, companies get acquired
Solutions:
- Regular enrichment refresh
- Job change monitoring
- Acquisition tracking
Best Practices
- Start with high-value use cases: unify for specific needs first
- Use domain as anchor: it is usually the most reliable account identifier
- Preserve source data: keep original records accessible
- Monitor continuously: data quality degrades over time
- Document decisions: record why specific rules were chosen
Data unification is foundational to effective revenue operations. Get it right, and everything downstream—scoring, routing, personalization, analytics—becomes dramatically more effective.
Ready to unify your customer data? Cargo’s data unification engine creates a single view of every account and contact across all your systems.
Key Takeaways
- Unification enables everything downstream: scoring, routing, personalization, and analytics all depend on having a unified customer view
- Domain is your anchor: the company email domain is usually the most reliable account identifier in B2B, so use it as the primary matching key and handle personal emails and multi-domain companies as exceptions
- Identity resolution is complex: the same person may exist as multiple records across systems with different emails, names, and companies
- Preserve source data: unification creates a golden record but should preserve original data for audit and edge cases
- Continuous monitoring required: data quality degrades over time—build processes to detect and fix degradation