
Case Study: Clean Energy Council Data Platform Stabilisation
Executive Summary
The Clean Energy Council (CEC), Australia's peak body for the clean energy industry, had invested in a Microsoft Fabric-based data platform to power its regulatory and member reporting requirements. One of those requirements was a Memberships Dashboard based on Salesforce data. However, the dashboard was held up in UAT because testers had identified a number of gaps and data accuracy issues in the Salesforce data ingestion, which were impacting key Memberships KPIs. Whilst the data and reporting team could identify the variances and discrepancies, they weren't sure how to fix the pipelines that had been built previously, and so needed a partner who could not only fix the immediate issues but also build a foundation for future scalability.
After an initial week of complimentary discovery and root cause identification, Archeon Consulting deployed one data engineer on a tactical, one-off engagement to re-engineer the Salesforce ingestion pipelines using a modern data ingestion pattern.
The Challenges
The Archeon Approach: Results First, Process Later
When we audited the data architecture itself, the root cause was obvious. The platform wasn't failing because of Microsoft Fabric per se; it was failing because a fragile, legacy SQL Server methodology (stored procedures) had been shoehorned into a modern tool (Fabric Lakehouses).
Rather than patching the existing data pipelines and stored procedures, Archeon Consulting executed a rapid, targeted remediation plan. We transitioned the platform away from brittle legacy patterns to a modern, resilient, lambda-based ingestion pattern that takes full advantage of the Medallion Lakehouse Architecture.
Specific Challenges Identified
The Challenges & Archeon's Engineered Solutions
Challenge 1: The "Full Load & Overwrite" Data Purge
The Issue: The existing architecture forced all data sources into a single "Big Pipeline" using a hard-coded full load and overwrite pattern into the Bronze layer. When the Salesforce API inevitably hit pagination limits or timed out mid-extraction, the partial data fetch would overwrite and permanently delete the existing historical dataset.
The Archeon Fix: The "File-First" Landing Zone. We established an idempotent and immutable Bronze layer. Data is now extracted incrementally and landed as raw Parquet files. Because extraction is decoupled from processing, zero data is lost if a subsequent Delta table write fails. We simply replay the raw files without re-querying the source system.
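The file-first idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline that was deployed: the function names (`land_batch`, `replay`), the `run_id=` folder layout, and the use of JSON Lines rather than Parquet (chosen only to keep the sketch dependency-free) are all hypothetical.

```python
import json
from pathlib import Path


def land_batch(root: Path, source_object: str, run_id: str, records: list[dict]) -> Path:
    """Land one extraction batch as its own immutable file (hypothetical layout)."""
    target = root / source_object / f"run_id={run_id}" / "part-0000.jsonl"
    if target.exists():
        # Idempotent: re-running the same run_id never overwrites landed data.
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    staging = target.with_suffix(".tmp")
    with staging.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    # Publish only once the whole batch is on disk, so a partial fetch
    # can never replace or truncate a previously landed file.
    staging.rename(target)
    return target


def replay(root: Path, source_object: str) -> list[dict]:
    """Rebuild downstream state from landed files without re-querying the source."""
    rows: list[dict] = []
    for path in sorted((root / source_object).glob("run_id=*/*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            rows.extend(json.loads(line) for line in fh if line.strip())
    return rows
```

The design point is that each run only ever adds files. A failed or partial run leaves earlier files untouched, and a failed table write downstream can be retried by calling something like `replay` over what has already landed.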
Challenge 2: The "Blind Watermark" Creating Ghost Records
The Issue: The legacy solution relied on a strict timestamp check without a safety buffer. Records created milliseconds before the pipeline started, but committed to the database after the fetch began, were missed entirely—leaving silent data gaps.
The Archeon Fix: The Overlapping Watermark. We implemented a dynamic lookback buffer, subtracting 30 minutes from the last run time to intentionally capture overlapping records. In the Silver layer, we used PySpark MERGE (upsert) statements to apply the latest state based on modification timestamps, ensuring deduplicated, accurate reporting without the risk of double-counting.
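The interaction between the lookback buffer and the upsert can be shown with a small, plain-Python sketch. It models the MERGE semantics with a dictionary rather than a Delta table, and it assumes the Salesforce standard fields `Id` and `SystemModstamp` as the key and modification timestamp; which fields the project actually used is not stated in the source, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK = timedelta(minutes=30)  # buffer size taken from the text above


def window_start(last_run_time: datetime) -> datetime:
    """Start the next extraction slightly before the previous run began."""
    return last_run_time - LOOKBACK


def upsert_latest(target: dict[str, dict], incoming: list[dict],
                  key: str = "Id", modified: str = "SystemModstamp") -> dict[str, dict]:
    """MERGE-like semantics: insert new keys, update only when strictly newer."""
    for row in incoming:
        current = target.get(row[key])
        if current is None or row[modified] > current[modified]:
            target[row[key]] = row
    return target


utc = timezone.utc
last_run = datetime(2024, 5, 1, 2, 0, tzinfo=utc)
silver = {"A1": {"Id": "A1", "SystemModstamp": datetime(2024, 5, 1, 1, 45, tzinfo=utc)}}

# The overlapping window re-fetches A1 (unchanged) and picks up A2, which was
# committed late and would be missed by a strict "modified > last_run" filter.
fetched = [
    {"Id": "A1", "SystemModstamp": datetime(2024, 5, 1, 1, 45, tzinfo=utc)},
    {"Id": "A2", "SystemModstamp": datetime(2024, 5, 1, 1, 59, 59, tzinfo=utc)},
]
assert all(r["SystemModstamp"] >= window_start(last_run) for r in fetched)

upsert_latest(silver, fetched)
print(sorted(silver))  # ['A1', 'A2']
```

Re-fetching A1 is harmless because the upsert is keyed on `Id`: the overlap costs a little extra extraction volume but cannot produce duplicate rows.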
Challenge 3: Brittle Metadata Mapping Causing Pipeline Crashes
The Issue: Extraction queries were built by concatenating a hard-coded list of columns. If a new custom field was added in Salesforce, the pipeline silently ignored it. If a field was renamed or hidden, the entire pipeline crashed.
The Archeon Fix: Dynamic Schema Evolution. We abandoned hard-coded columns and transitioned to native bulk ingestion using the flexibility of the Spark engine. Using Delta Lake's native mergeSchema capability, any new columns added by the business flow automatically into the Bronze and Silver tables. Zero developer intervention required.
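What additive schema evolution does to a table can be illustrated without Spark. The sketch below is a plain-Python model of the behaviour, not Delta Lake itself: in PySpark the equivalent is typically switched on with the `mergeSchema` write option (`.option("mergeSchema", "true")`) on a Delta append. The function name and the custom field `Tier__c` are hypothetical.

```python
def append_with_schema_merge(columns: list[str], rows: list[dict],
                             batch: list[dict]) -> tuple[list[str], list[dict]]:
    """Append a batch, widening the table schema with any unseen columns."""
    merged = list(columns)
    for record in batch:
        for name in record:
            if name not in merged:
                merged.append(name)  # new field: add a column, do not fail
    # Existing rows get None for columns they predate, and batch rows get
    # None for columns they lack.
    widened = [{name: row.get(name) for name in merged} for row in rows + batch]
    return merged, widened


columns = ["Id", "Name"]
rows = [{"Id": "001", "Name": "Acme Solar"}]
batch = [{"Id": "002", "Name": "Wind Co", "Tier__c": "Gold"}]  # hypothetical custom field

columns, rows = append_with_schema_merge(columns, rows, batch)
print(columns)  # ['Id', 'Name', 'Tier__c']
print(rows[0])  # {'Id': '001', 'Name': 'Acme Solar', 'Tier__c': None}
```

This models added columns only. How renamed or removed Salesforce fields were handled is not described in the source, so the sketch makes no claim about that case.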
Challenge 4: The "One Big Pipeline" Bottleneck
The Issue: A single master pipeline orchestrated everything. A failure in a minor source system (like Monday.com) would block or crash the entire run, starving the Finance team of their Salesforce invoice data.
The Archeon Fix: Decoupled Orchestration. We completely segregated the architecture, isolating Salesforce into its own independent pipeline so that upstream failures in disparate systems no longer impact core CRM/ERP analytics.
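The failure-isolation property can be sketched as follows. In Fabric the separation is achieved with independent, separately scheduled pipelines rather than a Python loop, so this is only a model of the behaviour; the function names and the placeholder row count are hypothetical.

```python
from typing import Callable


def run_independently(pipelines: dict[str, Callable[[], int]]) -> dict[str, str]:
    """Run each source pipeline on its own so one failure cannot block the rest."""
    outcomes: dict[str, str] = {}
    for name, run in pipelines.items():
        try:
            outcomes[name] = f"succeeded ({run()} rows)"
        except Exception as exc:  # failure is recorded per source, not propagated
            outcomes[name] = f"failed: {exc}"
    return outcomes


def ingest_salesforce() -> int:
    return 1200  # placeholder row count


def ingest_monday() -> int:
    raise TimeoutError("source API timed out")  # simulated outage


outcomes = run_independently({"monday": ingest_monday, "salesforce": ingest_salesforce})
print(outcomes["salesforce"])  # succeeded (1200 rows)
print(outcomes["monday"])      # failed: source API timed out
```

Even though the failing source runs first here, the Salesforce ingestion still completes, which is the behaviour the single master pipeline could not provide.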
The Business Impact & High-Velocity Delivery
A Strategic Approach to Platform Stabilisation
Archeon didn't just fix broken pipelines; we fundamentally upgraded the CEC's data integrity. The "missing invoice" anomaly was entirely eradicated. Daily ingestion pipelines now execute flawlessly, and the shift to optimized Delta Parquet files has dramatically improved read speeds and compute efficiency.
Most importantly? We executed this entire architectural step-change—from technical audit to DEV refactoring, UAT, and final PROD deployment—in just 30.5 hours of actual delivery time.
Services Provided
Archeon Consulting delivered an end-to-end data platform engagement, encompassing:
Data Platform Assessment: Technical audit identifying four critical points of failure; detailed remediation roadmap.
Architecture Redesign: Migration from monolithic "Big Pipeline" to decoupled, source-specific orchestrators.
Fabric Data Engineering: Salesforce ingestion and transformation pipeline development using Microsoft Fabric Data Factory and Lakehouses.
Incremental Ingestion Remediation: Review and implementation of overlapping watermark logic; lookback capture mechanisms.
DevOps & Deployment: DEV-first testing methodology with PROD deployment and scheduling.
Quality Assurance: Full reload assurance checks with row count validation; user acceptance testing coordination.
Knowledge Transfer: Comprehensive technical documentation; handover to Fabric Service Account.
Technology Stack
Tech we used on this project
Salesforce: Ingestion of 10 Salesforce objects via SOQL connector.
Microsoft Fabric: Modern, scalable architecture conforming to Azure security best practices.
Data Factory: Fit-for-purpose Salesforce connectors available in Fabric-native Data Factory.
Apache Spark: Enables Medallion table materialization and aligns with Databricks-style patterns for future AI readiness.
OneLake: Stores Lakehouse tables in Delta format, providing robust state management and ACID transaction capabilities.
Fabric Pipelines: Automated scheduling, monitoring, and error handling.
What the Clean Energy Council Says
“Explanations regarding snapshot pipeline setup and backfill considerations were really helpful.”
“Unit testing post-development was detailed and at the right level of validation.”
“There was clear follow-up outlining the rationale, scope boundaries, and success criteria for Phase 1.”
