Testing a Fabric data platform

This guide demonstrates a complete use case of Ploosh in Microsoft Fabric: validating workloads that write data to Lakehouse tables and events to KQL databases, distributed across multiple workspaces.

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Ingestion WS   │     │  Transform WS   │     │  Reporting WS   │
│  ┌───────────┐  │     │  ┌───────────┐  │     │  ┌───────────┐  │
│  │ Raw       │  │     │  │ DW        │  │     │  │ Datamart   │  │
│  │ Lakehouse │  │     │  │ Lakehouse │  │     │  │ Lakehouse  │  │
│  └───────────┘  │     │  └───────────┘  │     │  └───────────┘  │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │     shortcuts         │     shortcuts         │
         └───────────────┬───────┴───────────────────────┘
                         │
              ┌──────────▼──────────┐
              │    Ploosh WS        │
              │  ┌───────────────┐  │
              │  │ Ploosh        │  │
              │  │ Lakehouse     │  │
              │  │  ├ cases/     │  │
              │  │  ├ outputs/   │  │
              │  │  └ shortcuts/ │  │
              │  ├───────────────┤  │
              │  │ Notebook      │  │
              │  │ Semantic Model│  │
              │  │ PBI Report    │  │
              │  └───────────────┘  │
              └─────────────────────┘

Test scenarios

1. Cross-layer validation

Verify that the transformation layer correctly aggregates raw data:

Test employee aggregation:
  options:
    sort:
      - department
  source:
    type: sql_spark
    query: |
      SELECT department, COUNT(*) AS employee_count
      FROM raw_lakehouse.employees
      GROUP BY department
  expected:
    type: sql_spark
    query: |
      SELECT department, employee_count
      FROM dwlakehouse.dimdepartment_summary

2. KQL event completeness

Verify that pipeline events are correctly logged:

Test pipeline events logged:
  source:
    type: fabrickqlspark
    connection: kql_events
    query: |
      PipelineEvents
      | where Timestamp > ago(1d)
      | summarize event_count = count() by PipelineName
      | order by PipelineName asc
  expected:
    type: sql_spark
    query: |
      SELECT pipelinename AS PipelineName, expectedcount AS event_count
      FROM plooshresources.expecteddaily_events
      ORDER BY PipelineName ASC

3. Data quality checks

Detect anomalies using the empty connector:

Test no NULL customer IDs:
  source:
    type: sql_spark
    query: |
      SELECT *
      FROM dwlakehouse.factorders
      WHERE customer_id IS NULL
  expected:
    type: empty_sparkTest no duplicate orders:
  source:
    type: sql_spark
    query: |
      SELECT order_id, COUNT(*) AS cnt
      FROM dwlakehouse.factorders
      GROUP BY order_id
      HAVING cnt > 1
  expected:
    type: empty_spark

4. Reference data validation

Compare Lakehouse data against known reference files:

Test product categories:
  options:
    sort:
      - category_id
  source:
    type: sql_spark
    query: |
      SELECT categoryid, categoryname, is_active
      FROM dwlakehouse.dimproduct_category
  expected:
    type: csv_spark
    path: /lakehouse/default/Files/plooshresources/expectedcategories.csv
    header: true
    inferSchema: true

Connections file

kql_events:
  type: fabrickqlspark
  connection_mode: native
  kusto_uri: https://mycluster.kusto.windows.net
  databaseid: eventsdatabase

Spark SQL queries against Lakehouse tables (via shortcuts) do not require a connection definition.

Results exploitation

After execution, the results are:

Exported as JSON for immediate review
Persisted to a Delta table (ploosh_results) for historical tracking
Visualized in Power BI through a Semantic Model

See Fabric reporting for dashboard setup details.

End-to-end automation

Upstream pipeline completes data processing
Fabric Pipeline triggers the Ploosh notebook
Ploosh executes all test cases and exports results
History tracking appends results to the Delta table
Power BI refreshes and shows updated quality metrics
Alerts notify the team if tests fail

ploosh.

Documentation