Change Data Capture (CDC) Patterns for Dynamics 365 Integration
Change Data Capture patterns (Change Tracking, Data Lake Export, Synapse Link) enable near-real-time data replication from D365 to analytics & integration platforms.
Change Data Capture (CDC) is the practice of detecting & extracting changes to data so they can be replicated elsewhere. Instead of reading the entire dataset every sync cycle, CDC captures only what changed—additions, updates, deletions—enabling incremental, efficient replication to data lakes, warehouses, & downstream systems.
Dynamics 365 offers multiple CDC mechanisms: Change Tracking (F&O/Dataverse), Azure Data Lake Export, Synapse Link, & custom Debezium/Kafka approaches. This guide covers each mechanism, architectural patterns, trade-offs, & when to choose CDC over full-extract ETL or API polling.
TL;DR
- CDC captures record-level changes (create, update, delete) rather than full syncs; reduces bandwidth & processing load by 10–100x vs full extracts.
- D365 provides built-in CDC: Change Tracking tables in F&O/Dataverse, Azure Data Lake Export, Synapse Link; Debezium adds CDC for non-D365 sources.
- Polling (query change tables on schedule) is simple but higher latency; push (events notify consumer) is lower latency but requires event infrastructure.
- Delta detection uses change versions, timestamps, or modified-date columns to identify what changed since last sync; atomic markers prevent duplicates.
- Data warehousing uses CDC to maintain dimensional & fact tables with slowly changing dimensions (SCD) & late-arriving facts.
- Real-time integrations combine CDC events with message queues (Service Bus, Kafka) to propagate changes sub-second.
- Consistency challenges: late arrivals, out-of-order events, & cascading updates require careful ordering, versioning, & idempotency.
- CDC is best for high-volume incremental syncs; ETL is best for complex transformations; APIs are best for on-demand, low-volume reads.
CDC Foundations & Why It Matters
Without CDC, integrations resort to full extract: read all records from F&O, compare to destination, identify & apply changes. This is simple but expensive:
- Full extracts of large tables (millions of records) take hours.
- Network bandwidth is high.
- Destination systems must store change state (last sync time, hash of each record) to avoid reprocessing.
- Detecting deletions is hard (you don’t know what was removed unless you compare snapshots).
CDC inverts this: instead of you figuring out what changed, the source system tells you. The source database maintains a change log: “Record 12345 was updated at 14:23; Record 67890 was deleted at 14:25.” You read the log, process only those changes, & move on.
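To make the model concrete, here is a minimal Python sketch of the change-log pattern; the `ChangeEvent` shape & sample IDs are illustrative, not a D365 API:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative change-log entry: the shape most CDC feeds share.
@dataclass
class ChangeEvent:
    record_id: str            # e.g., "12345"
    operation: str            # "insert" | "update" | "delete"
    version: int              # monotonically increasing change version
    payload: Optional[dict]   # new record state (None for deletes)

def apply_change(destination: dict, event: ChangeEvent) -> None:
    """Apply one change to a destination keyed by record_id."""
    if event.operation == "delete":
        destination.pop(event.record_id, None)
    else:
        destination[event.record_id] = event.payload  # upsert the latest state

# Consuming the log touches only the changed records; no full-table comparison.
destination: dict = {}
log = [
    ChangeEvent("12345", "update", 101, {"name": "Contoso", "city": "Oslo"}),
    ChangeEvent("67890", "delete", 102, None),
]
for event in log:
    apply_change(destination, event)
```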
Benefits:
- Performance: 10–100x faster for high-volume, low-change-rate scenarios.
- Reliability: Easier to detect deletions & ensure idempotency.
- Scalability: Handles billions of rows efficiently.
- Cost: Lower compute, storage, & network spend.
D365 CDC Mechanisms
Dynamics 365 provides multiple CDC options:
| Mechanism | Module | Destination | Latency | Complexity |
|---|---|---|---|---|
| Change Tracking | F&O, Dataverse | Read via API query | Variable (poll-based) | Low |
| Azure Data Lake Export | F&O | ADLS Gen2 (Parquet + CSV) | Minutes to hours | Low |
| Synapse Link | Dataverse | Azure Synapse (SQL, Spark) | Near real-time (2–5 min) | Low-Medium |
| Debezium | Any DB (SQL Server, PostgreSQL) | Kafka, S3, Elasticsearch, etc. | Sub-second to seconds | High |
| Business Events | F&O | Service Bus, Event Grid, webhooks | Sub-second | Medium |
Change Tracking: Built into F&O & Dataverse. Query changes since a last-known version (SQL Server's CHANGETABLE() function under the hood; Dataverse exposes the same capability via OData delta tokens). Simple, but requires active polling.
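As an illustration of the Dataverse side, a minimal sketch of the delta-token flow via the Web API; the org URL, token, & entity set are placeholders, & production code would page through `@odata.nextLink` results first:

```python
import requests  # pip install requests

BASE = "https://yourorg.api.crm.dynamics.com/api/data/v9.2"  # placeholder org URL
HEADERS = {
    "Authorization": "Bearer <access-token>",  # acquire via MSAL in practice
    "Prefer": "odata.track-changes",           # ask Dataverse to issue a delta link
    "OData-MaxVersion": "4.0",
}

# Initial sync: read the data & capture the delta link for next time.
resp = requests.get(f"{BASE}/accounts?$select=name,address1_city", headers=HEADERS)
resp.raise_for_status()
body = resp.json()
# With large tables, follow @odata.nextLink pages first; the delta link
# appears on the final page. Persist it alongside your watermark.
delta_link = body["@odata.deltaLink"]

# Later: request the delta link to receive only records changed since then,
# including markers for deleted rows.
changes = requests.get(delta_link, headers=HEADERS).json()
for row in changes["value"]:
    ...  # upsert/delete in the destination
```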
Azure Data Lake Export: F&O automatically exports table snapshots + deltas to Azure Data Lake Storage Gen2 (Parquet + CSV). Scheduled every few minutes. Best for data lakes & warehouses.
Synapse Link: Dataverse-only. Continuously syncs Dataverse tables to Azure Synapse via a managed link. Near real-time, serverless query layer.
Debezium: Third-party log-based CDC. Reads SQL Server transaction log, publishes changes to Kafka. Advanced, but provides sub-second latency.
Business Events: F&O pushes events when records change (see Business Events article). Combines event-driven architecture with change notification.
Polling vs Push Patterns
Two styles of CDC consumption:
| Aspect | Polling | Push |
|---|---|---|
| Trigger | Consumer queries change table on schedule (every 5, 15, 60 min) | Source notifies consumer via event/message when change occurs |
| Latency | Up to poll interval (e.g., 15 min) | Sub-second to seconds |
| Infrastructure | Scheduler (ADF, Logic App, cron job), CDC query API | Event hub (Service Bus, Kafka), message queue |
| Complexity | Low; simple query & batch processing | Higher; event handling, ordering, retries |
| Use Cases | EOD reconciliation, reports, data warehouse syncs | Real-time dashboards, supply chain notifications, sub-second SLA |
Polling Example: Every 30 minutes, a Data Factory pipeline queries F&O’s Change Tracking table for all records changed since the last run. It processes them, inserts/updates the warehouse, & records the max version number.
Push Example: F&O publishes a Business Event “Invoice Posted.” A Logic App receives the event, calls the accounting system API, & updates GL in real time.
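A minimal sketch of the polling example above, with a persisted watermark; `fetch_changes_since` is a placeholder for whichever change feed is queried (Change Tracking version, delta link, etc.):

```python
import json
import pathlib
import time

WATERMARK_FILE = pathlib.Path("watermark.json")  # in production: a row in the destination DB

def load_watermark() -> int:
    if WATERMARK_FILE.exists():
        return json.loads(WATERMARK_FILE.read_text())["version"]
    return 0  # first run: start from the beginning (or from a full extract)

def save_watermark(version: int) -> None:
    WATERMARK_FILE.write_text(json.dumps({"version": version}))

def fetch_changes_since(version: int) -> list:
    """Placeholder: query the change feed for changes newer than `version`."""
    raise NotImplementedError

def poll_once() -> None:
    last = load_watermark()
    changes = fetch_changes_since(last)
    for change in changes:
        ...  # idempotent upsert/delete in the destination
    if changes:
        # Advance the watermark only after the whole batch is applied, so a
        # crash mid-batch causes a safe re-read rather than lost changes.
        save_watermark(max(c["version"] for c in changes))

while True:
    poll_once()
    time.sleep(30 * 60)  # 30-minute interval, matching the example above
```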
Delta Detection Strategies
Detecting what changed is the core of CDC. Common strategies:
1. Change Version Numbers: F&O Change Tracking assigns each change a version. Query CHANGETABLE() for changes > last_version. Atomic & reliable.
2. Timestamps (Modified Date): Track ModifiedDateTime on each record. Query records where ModifiedDateTime > @lastSyncTime. Simple, but watch for clock skew & out-of-order updates.
3. CDC Table in Staging: Create a CDC log table (separate from the source). When the source updates, a trigger writes a CDC row. Consume the CDC log. Requires maintenance but gives precise control.
4. Watermark Marker: Track a watermark (version, timestamp, or cursor) after each successful sync. Query changes > watermark. Store watermark in destination so retries resume from the right point.
Idempotency Consideration: If a record is updated multiple times in one sync cycle, the change feed may surface it multiple times (or collapse it to the net change, depending on the mechanism). Treat the record version as the authoritative source: apply only the latest state, skipping anything older than what the destination already holds.
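A minimal sketch of that version check, assuming each change carries a monotonically increasing per-record version:

```python
def should_apply(applied_versions: dict, record_id: str, version: int) -> bool:
    """Apply a change only if it is newer than what the destination holds."""
    return version > applied_versions.get(record_id, -1)

def apply_if_new(destination: dict, applied_versions: dict, event: dict) -> None:
    rid, ver = event["record_id"], event["version"]
    if not should_apply(applied_versions, rid, ver):
        return  # duplicate or stale change: skip, keeping the operation idempotent
    destination[rid] = event["payload"]
    applied_versions[rid] = ver  # remember the version we applied
```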
CDC for Data Warehousing
Data warehouses use CDC to maintain historical & dimensional tables.
Slowly Changing Dimensions (SCD): A customer record carries dimension attributes: Name, Address, Industry. When a customer updates their address, do you overwrite or keep history? SCD Type 2 keeps history by creating a new version: same Customer ID, new row with the new address & an effective date. CDC makes this straightforward: detect the address change, insert a new dimension row, & update the active flag on the old one.
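A minimal sketch of the SCD Type 2 step using in-memory rows; in practice this is a MERGE against the dimension table, & the column names are illustrative:

```python
from datetime import date

def scd2_upsert(dimension_rows: list, customer_id: str, new_attrs: dict) -> None:
    """Close the current row & insert a new version when tracked attributes change."""
    current = next(
        (r for r in dimension_rows
         if r["customer_id"] == customer_id and r["is_current"]),
        None,
    )
    if current and all(current.get(k) == v for k, v in new_attrs.items()):
        return  # nothing changed: no new version needed
    if current:
        current["is_current"] = False
        current["effective_to"] = date.today()  # close out the old version
    dimension_rows.append({
        "customer_id": customer_id,
        **new_attrs,
        "effective_from": date.today(),
        "effective_to": None,
        "is_current": True,
    })
```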
Fact Table Loads: Fact tables (sales, invoices) mostly append new records. CDC captures new & updated facts. Use a merge strategy: insert new facts, update existing ones (e.g., if a PO total changed due to a credit memo).
Late-Arriving Facts: A shipment is recorded in F&O, but the matching invoice posts 2 days later. The fact table already recorded the shipment. CDC ensures the updated invoice amount flows through so the fact table stays correct.
Azure Data Lake Export + Azure Synapse: F&O exports tables to ADLS in Parquet format. Synapse queries the Parquet files directly. CDC updates append to delta logs. Synapse can then build dimensional & fact models via SQL or Spark.
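For illustration, a sketch of stitching a snapshot & deltas together with pandas; the paths are hypothetical, `RecId` & `ModifiedDateTime` follow F&O naming conventions, & delete handling is omitted:

```python
import pandas as pd  # needs pyarrow; adlfs adds abfss:// support for ADLS paths

# Hypothetical folder layout: one snapshot plus incremental delta files per table.
snapshot = pd.read_parquet("lake/Tables/CustTable/snapshot/")
deltas = pd.read_parquet("lake/Tables/CustTable/deltas/")

# Apply deltas over the snapshot: keep only the latest version of each record.
merged = (
    pd.concat([snapshot, deltas])
    .sort_values("ModifiedDateTime")
    .drop_duplicates(subset="RecId", keep="last")  # RecId: F&O surrogate key
)
```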
CDC for Real-Time Integrations
Real-time integrations pair CDC with event infrastructure:
- F&O publishes a Business Event (e.g., “Item Received”) to Service Bus.
- Logic App or Azure Function consumes the event.
- Logic App queries F&O API to fetch full item details (if not in event).
- Logic App posts to warehouse management system.
- WMS updates inventory & alerts picker system.
Alternative using Debezium:
- Debezium connector reads F&O SQL transaction log.
- Publishes changes to Kafka topic (e.g., “inventory.items”).
- Stream processing job (Kafka Streams, Spark Streaming) consumes changes, enriches, & publishes to another topic.
- Multiple consumers subscribe: warehouse system, reporting, analytics.
This pattern decouples producers & consumers, allows multiple subscribers, & provides replay capability (Kafka retains history).
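A minimal sketch of the Business Event consumer in the first flow, using the azure-servicebus SDK; the connection string & queue name are placeholders:

```python
import json
from azure.servicebus import ServiceBusClient  # pip install azure-servicebus

CONN_STR = "<service-bus-connection-string>"  # placeholder

with ServiceBusClient.from_connection_string(CONN_STR) as client:
    with client.get_queue_receiver(queue_name="d365-business-events") as receiver:
        for msg in receiver:  # blocks, yielding messages as they arrive
            event = json.loads(str(msg))  # Business Event payloads are JSON
            ...  # enrich via the F&O API if needed, then post to the WMS
            receiver.complete_message(msg)  # settle only after successful processing
```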
Debezium & Kafka Patterns
Debezium is an open-source log-based CDC platform. It reads database transaction logs & publishes changes to Kafka.
How Debezium Works:
- Debezium connector (e.g., SQL Server connector) reads the transaction log from F&O’s underlying SQL database.
- For each change, Debezium creates a Kafka message with before/after state, timestamp, & operation type.
- Consumers (data lake loaders, analytics platforms) subscribe to the Kafka topics & process the changes, as sketched below.
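A hedged sketch of such a consumer using kafka-python; the topic name is hypothetical (Debezium names topics server.schema.table), & it assumes the default JSON converter with the schema envelope enabled:

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "fno.dbo.custtable",  # hypothetical topic following Debezium naming
    bootstrap_servers="localhost:9092",
    group_id="warehouse-loader",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for message in consumer:
    envelope = message.value["payload"]  # Debezium wraps each change in an envelope
    op = envelope["op"]                  # "c"=create, "u"=update, "d"=delete, "r"=snapshot
    if op == "d":
        ...  # remove the record identified by envelope["before"] from the destination
    else:
        ...  # upsert envelope["after"], the full row state after the change
```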
Advantages:
- Sub-second latency: Changes flow to Kafka milliseconds after they hit the database.
- Capture all operations: inserts, updates, deletes, even soft-deletes.
- No code changes to F&O: Debezium taps the transaction log, not the application.
- Kafka provides replay: Consumers can re-process history if needed.
Challenges:
- Requires database admin access (read transaction log).
- Database changes (schema evolution) require Debezium schema updates.
- Kafka infrastructure adds complexity & operational load.
- Ordering: Kafka partitions guarantee order within partition, but not globally (unless using single partition).
Use Case: A large enterprise needs to feed changes from F&O to a data lake, analytics platform, & multiple downstream systems in real time. Debezium/Kafka provides the backbone.
Data Consistency & Ordering Challenges
CDC can introduce consistency issues:
Out-of-Order Events: Two events (PO created, PO confirmed) fire within milliseconds. If they arrive at the consumer out of order, the consumer may try to confirm a PO it hasn’t seen created. Solution: order by timestamp or version; queue & replay if out of order.
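A minimal sketch of the queue-&-replay approach, buffering early arrivals in a heap & releasing them strictly in version order (versions assumed to be consecutive integers):

```python
import heapq
import itertools

class OrderedReplayer:
    """Buffer out-of-order events & release them strictly in version order."""
    def __init__(self, start_version: int = 1) -> None:
        self.expected = start_version  # next version we are allowed to apply
        self.pending: list = []        # min-heap of (version, seq, payload)
        self._seq = itertools.count()  # tie-breaker so the heap never compares payloads

    def submit(self, version: int, payload: dict, apply) -> None:
        heapq.heappush(self.pending, (version, next(self._seq), payload))
        # Drain the heap while the smallest buffered version is the one we expect.
        while self.pending and self.pending[0][0] == self.expected:
            _, _, ready = heapq.heappop(self.pending)
            apply(ready)
            self.expected += 1

replayer = OrderedReplayer(start_version=1)
replayer.submit(2, {"status": "confirmed"}, print)  # buffered: v1 not seen yet
replayer.submit(1, {"status": "created"}, print)    # releases v1, then v2 in order
```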
Cascading Updates: A purchase order is created, which triggers a forecast recalculation, which updates demand plan. Three CDC events fire. The consumer must handle all three in order or risk stale state in destination system. Solution: use saga patterns or transactional outbox (record all changes to a log, process transactionally).
Late-Arriving Data: A shipment is recorded; warehouse system is notified. 3 days later, the invoice is posted. The fact table already has the shipment. Must update the existing fact (not insert a duplicate). Solution: use idempotent upserts (update if exists, insert if not).
Duplicates: Consumer processes an event, crashes, restarts. Event is reprocessed. Consumer must detect it’s a duplicate & skip or idempotently update. Solution: store event ID in destination; check before processing.
Missing Deletes: If a record is deleted in F&O while the consumer is offline (or lagging) for longer than the change-log retention window, the delete can be missed entirely. Solution: periodically run a full reconciliation scan to detect orphans.
Performance Impact & Optimization
CDC has a cost:
Source Database Impact: Change Tracking & transaction log reading add overhead. On high-volume databases, this can increase CPU & I/O. Mitigation:
- Change Tracking is lightweight for most tables; disable for rarely-needed tables.
- Debezium should run on a read replica, not production primary.
- Schedule bulk extracts (Data Lake Export) off-peak.
Consumer Processing: Polling adds scheduler overhead; push adds event infrastructure overhead. High-volume CDC (millions of changes/hour) requires:
- Parallel processing (e.g., partition Kafka by entity, process partitions in parallel).
- Batch processing (accumulate 100 changes, process them as one batch; see the sketch after this list) to reduce per-message overhead.
- Caching to avoid repeated lookups (e.g., cache customer master while processing 1000 invoice deltas).
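A minimal sketch of that batching step; `flush` is a placeholder for one bulk upsert against the destination:

```python
def process_in_batches(changes, batch_size: int = 100) -> None:
    """Accumulate changes & flush each full batch with one destination call."""
    batch: list = []
    for change in changes:
        batch.append(change)
        if len(batch) >= batch_size:
            flush(batch)  # one bulk upsert instead of 100 single-row writes
            batch = []
    if batch:
        flush(batch)      # flush the final partial batch

def flush(batch: list) -> None:
    ...  # placeholder: bulk MERGE/upsert into the destination
```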
Storage: CDC logs accumulate. Retention policy needed. Azure Data Lake Export retains snapshots + deltas for ~30 days. Debezium/Kafka retention is configurable (e.g., 7 days). Purge old change logs to control storage cost.
CDC vs ETL vs API Comparison
When should you use CDC vs full-extract ETL vs API polling?
| Approach | Latency | Bandwidth | Complexity | Best For |
|---|---|---|---|---|
| CDC | Minutes to seconds | Low (changes only) | Medium | High-volume incremental syncs, data warehousing, near real-time dashboards |
| ETL Full Extract | Hours | High (all data) | Low-Medium | Complex transformations, periodic reconciliation, low-frequency imports |
| API Polling | Variable | Medium (filtered data) | Low | On-demand reads, dashboards, low-volume queries, reference data |
| Business Events | Sub-second | Low (change deltas) | Medium-High | Real-time notifications, supply chain, finance events, sub-second SLA |
Example Decision Tree:
- Need to update data warehouse nightly? Use CDC (Data Lake Export). Efficient, simple setup.
- Need real-time alert when invoice posts? Use Business Events. Pushes immediately to email/Teams.
- Need to sync 100 new customers hourly, with address enrichment? Use API with filtering (top 100 by CreatedDate). ETL with transformation is overkill.
- Need to rebuild entire GL from scratch once per quarter? Use full ETL (Azure Data Factory). No CDC overhead, runs once.
- Need all purchase order changes in real time to supply chain system? Use CDC (Debezium/Kafka) or Business Events. High volume, low latency.
Frequently Asked Questions
Q: Does Change Tracking capture soft deletes?
A: Change Tracking captures deletes only if the record is physically deleted from the database. Soft deletes (flagging IsDeleted=1) are treated as updates. To track soft deletes, monitor the IsDeleted field separately.
Q: How far back can I query CDC changes?
A: Change Tracking retention is configurable (F&O defaults to ~7 days). Azure Data Lake Export retains ~30 days. Debezium/Kafka retention is configurable (typically 7–30 days). After retention expires, you’ll need a full extract to catch up.
Q: Can CDC handle schema changes in F&O?
A: If a new column is added to a table, CDC automatically includes it in future changes. If a column is deleted, CDC stops tracking it. Downstream consumers may need schema migration logic. Debezium can handle schema evolution via schema registry.
Q: How do I ensure idempotency with CDC?
A: Store the change ID (or record ID + version) in the destination after processing. Before processing, check if it’s already been applied. If yes, skip. If no, process & record it.
Q: Is CDC faster than API polling?
A: Yes, typically 10–100x faster for large datasets because you only process what changed. APIs require filtering & comparison, which is slower for millions of rows.
Q: Can I combine CDC with real-time events?
A: Yes. Events provide immediate notification; CDC provides a reliable, durable log. Use events for urgent actions (send alert), CDC for data consistency (ensure the database reflects the change).
Q: How does Synapse Link compare to Data Lake Export for CDC?
A: Synapse Link is Dataverse-only, provides near real-time sync (2–5 min), & integrates tightly with Synapse. Data Lake Export works for F&O, exports to Parquet, allows querying with Synapse SQL/Spark. Both are CDC; Synapse Link is more managed.
Q: What happens if the consumer falls behind?
A: Messages/events queue up in Service Bus or Kafka. If the consumer catches up, all is well. If it stays behind beyond the retention window, old changes are lost. Solution: scale out the consumer, extend retention, or re-baseline with a full extract.
Methodology
This guide synthesizes CDC concepts from database theory, Microsoft’s D365 technical documentation, Apache Kafka & Debezium architecture guides, & enterprise data integration best practices. Topics span CDC mechanisms in D365 (Change Tracking, Data Lake Export, Synapse Link), polling vs push consumption patterns, delta detection strategies, data warehouse & real-time integration use cases, Debezium/Kafka architecture, consistency challenges, performance considerations, & decision frameworks for CDC vs ETL vs API.
Dataset & Sources: Microsoft Learn documentation for Change Tracking, Azure Data Lake Export, Synapse Link; Apache Kafka & Debezium project documentation; enterprise architecture patterns from Gartner & Forrester reports; D365 implementation case studies.
Analytical Approach: Compared CDC mechanisms on latency, complexity, cost, & use-case fit. Evaluated polling vs push tradeoffs. Analyzed consistency & ordering challenges with real-world examples (out-of-order events, late-arriving facts, cascading updates).
Limitations: This guide covers standard D365 CDC options & common third-party tools (Debezium). Custom CDC implementations (e.g., database triggers + custom tables) are beyond scope. Kafka cluster design & tuning is referenced but not detailed.
Data Currency: Accurate as of March 2026. D365 data export & Synapse Link are rapidly evolving; consult Microsoft Learn for latest capabilities & performance benchmarks.