Clean Energy Council

Case Study: Clean Energy Council Data Platform Stabilisation

Government

Executive Summary

The Clean Energy Council (CEC), Australia's peak body for the clean energy industry, had invested in a Microsoft Fabric-based data platform to power its regulatory and member reporting requirements. One of those requirements was a Memberships Dashboard built on Salesforce data. However, the dashboard stalled in the User Acceptance Testing (UAT) phase after testers identified a number of gaps and data accuracy issues in the Salesforce data ingestion that were affecting key Memberships KPIs. While the data and reporting team could identify the variances and discrepancies, they weren't sure how to fix the pipelines that had been built previously. They needed a partner who could not only fix the immediate issues but also build a foundation for future scalability.

After an initial week of complimentary discovery and root cause identification, Archeon Consulting deployed one data engineer on a tactical, one-off engagement to re-engineer the Salesforce ingestion pipelines using a modern data ingestion pattern.

The Challenges

The Archeon Approach: Results First, Process Later

When we audited the data architecture itself, the root cause was obvious. The platform wasn't failing because of Microsoft Fabric per se; it was failing because a fragile, legacy SQL Server methodology (stored procedures) had been shoehorned into a modern tool (Fabric Lakehouses).

Instead of attempting to fix the existing data pipelines and stored procedures, Archeon Consulting executed a rapid, targeted remediation plan. We transitioned the platform away from brittle legacy patterns to a modern, resilient, lambda-based ingestion pattern that takes full advantage of the Medallion Lakehouse Architecture.

Specific Challenges Identified

The Challenges & Archeon's Engineered Solutions

Challenge 1: The "Full Load & Overwrite" Data Purge

The Issue: The existing architecture forced all data sources into a single "Big Pipeline" using a hard-coded full-load-and-overwrite pattern into the Bronze layer. When the Salesforce API inevitably hit pagination limits or timed out mid-extraction, the partial data fetch would overwrite, and permanently delete, the existing historical dataset.

The Archeon Fix: The "File-First" Landing Zone. We established an idempotent, immutable Bronze layer. Data is now extracted incrementally and landed as raw Parquet files. Because extraction is decoupled from processing, no data is lost if a subsequent Delta table write fails: we simply replay the raw files without re-querying the source system.
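To illustrate the file-first idea, here is a minimal sketch that assumes a local directory stands in for the Lakehouse Files area and uses JSON Lines instead of Parquet so it needs only the Python standard library. The function names (`land_batch`, `replay_batches`) and the folder layout are hypothetical, not the CEC implementation. Each run writes a new file that is never overwritten, and downstream loads read from those files, so a failed table write can be retried from disk rather than from Salesforce.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def land_batch(landing_root: Path, source_object: str, records: list[dict], run_ts: datetime) -> Path:
    """Write one extraction batch as a new, immutable file (hypothetical layout)."""
    folder = landing_root / source_object
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{run_ts.strftime('%Y%m%dT%H%M%S')}.jsonl"
    if path.exists():
        # Landed files are never overwritten, so re-landing the same run is a no-op.
        return path
    # Write to a temporary file, then rename, so a crash never leaves a half-written batch.
    tmp = path.with_suffix(".tmp")
    with tmp.open("w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")
    os.replace(tmp, path)
    return path


def replay_batches(landing_root: Path, source_object: str) -> list[dict]:
    """Re-read every landed batch in run order, e.g. after a failed Delta write."""
    rows = []
    for path in sorted((landing_root / source_object).glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            rows.extend(json.loads(line) for line in fh if line.strip())
    return rows


# Usage: two daily runs land separately; replay rebuilds the full history without touching the source.
root = Path(tempfile.mkdtemp())
land_batch(root, "Account", [{"Id": "001A", "Name": "Acme"}], datetime(2024, 5, 1, 2, 0, tzinfo=timezone.utc))
land_batch(root, "Account", [{"Id": "001B", "Name": "Beta"}], datetime(2024, 5, 2, 2, 0, tzinfo=timezone.utc))
rows = replay_batches(root, "Account")
```

The write-then-rename step is what keeps the landing zone safe to replay: a reader only ever sees complete batch files.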

Challenge 2: The "Blind Watermark" creating Ghost Records

The Issue: The legacy solution relied on a strict timestamp check with no safety buffer. Records created milliseconds before the pipeline started, but committed to the database after the fetch began, were missed entirely, leaving silent data gaps.

The Archeon Fix: The Overlapping Watermark. We implemented a dynamic lookback buffer that subtracts 30 minutes from the last run time, intentionally re-capturing overlapping records. In the Silver layer, we used Delta Lake MERGE (upsert) operations in PySpark to apply the latest state of each record based on its modification timestamp, ensuring deduplicated, pixel-perfect reporting without the risk of double-counting.
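The overlapping-watermark logic can be sketched in plain Python. This is an illustration under stated assumptions, not the production code: it assumes Salesforce's standard `SystemModstamp` field as the modification timestamp, UTC datetimes throughout, and a dictionary keyed on `Id` standing in for the Silver Delta table. `upsert_latest` mimics what a Delta MERGE does here: insert unseen keys and update a row only when the incoming version is strictly newer, so records re-read by the 30-minute overlap are not double-counted.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK = timedelta(minutes=30)  # safety buffer described in the case study


def extraction_window_start(last_successful_run: datetime, lookback: timedelta = LOOKBACK) -> datetime:
    """Start the next incremental pull earlier than the last run to catch late commits."""
    return last_successful_run - lookback


def soql_filter(start: datetime) -> str:
    # Assumes `start` is in UTC; SOQL accepts unquoted datetime literals such as 2024-05-01T01:30:00Z.
    return f"WHERE SystemModstamp >= {start.strftime('%Y-%m-%dT%H:%M:%SZ')}"


def upsert_latest(target: dict[str, dict], incoming: list[dict],
                  key: str = "Id", ts: str = "SystemModstamp") -> dict[str, dict]:
    """Plain-Python stand-in for a Delta MERGE: insert new keys, update only if newer."""
    for row in incoming:
        current = target.get(row[key])
        # Timestamps are fixed-format ISO 8601 UTC strings, so text comparison orders them correctly.
        if current is None or row[ts] > current[ts]:
            target[row[key]] = row
    return target


# Usage: the buffer re-reads one already-loaded record and catches one late commit.
last_run = datetime(2024, 5, 1, 2, 0, tzinfo=timezone.utc)
start = extraction_window_start(last_run)
silver = {"001A": {"Id": "001A", "Name": "Acme", "SystemModstamp": "2024-05-01T01:40:00Z"}}
batch = [
    {"Id": "001A", "Name": "Acme", "SystemModstamp": "2024-05-01T01:40:00Z"},          # overlap, already loaded
    {"Id": "001A", "Name": "Acme Pty Ltd", "SystemModstamp": "2024-05-01T02:05:00Z"},  # genuine update
    {"Id": "001B", "Name": "Beta", "SystemModstamp": "2024-05-01T01:59:59Z"},          # late commit
]
upsert_latest(silver, batch)
```

In Spark, the same rule would typically be expressed with the delta-spark `DeltaTable.merge(...)` builder, using a `whenMatchedUpdateAll` condition on the timestamp and `whenNotMatchedInsertAll` for new keys.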

Challenge 3: Brittle Metadata Mapping Causing Pipeline Crashes

The Issue: Extraction queries were built by concatenating a hard-coded list of columns. If a new custom field was added in Salesforce, the pipeline silently ignored it; if a field was renamed or hidden, the entire pipeline crashed.

The Archeon Fix: Dynamic Schema Evolution. We abandoned hard-coded columns and moved to native bulk ingestion using the flexibility of the Spark engine. Using Delta Lake's native mergeSchema capability, any new columns added by the business automatically flow into the Bronze and Silver tables, with zero developer intervention required.
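The effect of additive schema evolution can be illustrated without Spark. The sketch below is a plain-Python model, not Delta Lake itself: it keeps the existing column order, appends any field it has not seen before, and fills columns missing from a row with `None` instead of failing. In Spark, the corresponding write would be along the lines of `df.write.format("delta").mode("append").option("mergeSchema", "true").saveAsTable(...)`. The custom field `Member_Tier__c` is hypothetical.

```python
def merge_schema(existing_columns: list[str], incoming_rows: list[dict]) -> list[str]:
    """Additive schema evolution: keep existing column order, append any new fields."""
    columns = list(existing_columns)
    for row in incoming_rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    return columns


def align_rows(columns: list[str], rows: list[dict]) -> list[dict]:
    # A column missing from a row (e.g. a field hidden in Salesforce) becomes None
    # rather than crashing the load.
    return [{c: row.get(c) for c in columns} for row in rows]


# Usage: a new custom field appears in one record and another record lacks Name.
table_columns = ["Id", "Name"]
batch = [{"Id": "001A", "Name": "Acme", "Member_Tier__c": "Gold"}, {"Id": "001B"}]
table_columns = merge_schema(table_columns, batch)
aligned = align_rows(table_columns, batch)
```

Under this model, a renamed field shows up as a new column while the old column stays in the table and receives nulls for new rows, which keeps the load running and leaves the rename visible for review.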

Challenge 4: The "One Big Pipeline" Bottleneck

The Issue: A single master pipeline orchestrated everything. A failure in a minor source system (like Monday.com) would block or crash the entire run, starving the Finance team of their Salesforce invoice data.

The Archeon Fix: Decoupled Orchestration. We completely segregated the architecture, isolating Salesforce in its own independent pipeline so that upstream failures in disparate systems no longer impact core CRM/ERP analytics.
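The failure-isolation principle behind decoupled orchestration can be shown in a small sketch. In Fabric, each source would be its own pipeline with its own schedule. Here, each pipeline is modelled as a hypothetical Python callable that returns a row count. The point is that an exception in one source is caught and recorded against that source only, so the Salesforce load completes even when Monday.com times out.

```python
from typing import Callable


def run_isolated(pipelines: dict[str, Callable[[], int]]) -> dict[str, str]:
    """Run each source pipeline independently; one failure never stops the others."""
    results = {}
    for name, pipeline in pipelines.items():
        try:
            rows = pipeline()
            results[name] = f"succeeded ({rows} rows)"
        except Exception as exc:  # contain the failure to this source only
            results[name] = f"failed: {exc}"
    return results


# Usage with hypothetical pipelines: Monday.com fails, Salesforce still loads.
def salesforce_pipeline() -> int:
    return 1250  # hypothetical row count


def monday_pipeline() -> int:
    raise TimeoutError("API timed out")


status = run_isolated({"monday": monday_pipeline, "salesforce": salesforce_pipeline})
```

Under the old "one big pipeline" model, the `TimeoutError` would have propagated and aborted the whole run. Separate pipelines go further than this loop does, because they do not even share a run.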

The Business Impact & High-Velocity Delivery

A Strategic Approach to Platform Stabilisation

Archeon didn't just fix broken pipelines; we fundamentally upgraded the CEC's data integrity. The "missing invoice" anomaly was entirely eradicated. Daily ingestion pipelines now execute flawlessly, and the shift to optimized Delta Parquet files has dramatically improved read speeds and compute efficiency.

Most importantly, we executed this entire architectural step-change (from technical audit through DEV refactoring and UAT to final PROD deployment) in just 30.5 hours of actual delivery time.

Services Provided

Archeon Consulting delivered an end-to-end data platform engagement, encompassing:

Data Platform Assessment

Technical audit identifying four critical points of failure; detailed remediation roadmap.

Architecture Redesign

Migration from monolithic "Big Pipeline" to decoupled, source-specific orchestrators.

Fabric Data Engineering

Salesforce ingestion and transformation pipeline development using Microsoft Fabric Data Factory and Lakehouses.

Incremental Ingestion Remediation

Review and implementation of overlapping watermark logic; lookback capture mechanisms.

DevOps & Deployment

DEV-first testing methodology with PROD deployment and scheduling.

Quality Assurance

Full reload assurance checks with row count validation; user acceptance testing coordination.

Knowledge Transfer

Comprehensive technical documentation; handover to Fabric Service Account.

Technology Stack

Tech we used on this project

Source System

Salesforce

Ingestion of 10 Salesforce objects via SOQL connector

Data Platform

Microsoft Fabric

Modern, scalable architecture conforming to Azure security best practices.

Data Integration

Data Factory

Fit-for-purpose Salesforce connectors available in Fabric-native Data Factory.

Compute

Apache Spark

Enables Medallion table materialization with ease and aligns with Databricks-style patterns for future AI readiness.

Storage

OneLake

Provides robust state management and ACID transaction capabilities

Orchestration

Fabric Pipelines

Automated scheduling, monitoring, and error handling

What the Clean Energy Council Says

Explanations regarding snapshot pipeline setup and backfill considerations were really helpful.
Reporting and Analytics Manager
Unit testing post-development was detailed and at the right level of validation.
Reporting and Analytics Manager
There was clear follow-up outlining the rationale, scope boundaries, and success criteria for Phase 1.
Reporting and Analytics Manager