Microsoft Fabric setup

This guide details how to set up Ploosh in a Microsoft Fabric environment to validate your data platform workloads.

Architecture overview

The recommended architecture is based on a dedicated Fabric workspace structured as follows:

Ploosh Workspace
├── Python Environment     → Ploosh package pre-installed
├── Lakehouse
│   ├── Files/
│   │   ├── ploosh_cases/         → Test case definitions (YAML)
│   │   ├── ploosh_connections.yaml → Connection definitions
│   │   ├── ploosh_resources/     → Reference datasets (CSV, Parquet, etc.)
│   │   └── ploosh_outputs/       → Test results (JSON, XLSX)
│   ├── Tables/
│   │   └── ploosh_results        → Results history (Delta table)
│   └── Shortcuts/                → Links to other workspace Lakehouses
├── Notebook                → Orchestration notebook
├── Semantic Model          → Results exposure for analysis
└── Power BI Report         → Quality dashboard

Step 1: Create the Python environment

In your Fabric workspace, create a new Environment
In the environment settings, add ploosh as a pip package
Save and publish the environment

This ensures Ploosh is available by default in all notebooks using this environment.
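
A quick check from a notebook attached to the environment can confirm that the package is actually present. The sketch below uses only the Python standard library. The helper name `installed_version` is illustrative, and it assumes the distribution is named ploosh, as in the step above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# In a notebook attached to the published environment, this should print a
# version string rather than None.
print("ploosh version:", installed_version("ploosh"))
```

If the result is None, the environment has probably not finished publishing or the notebook is attached to a different environment.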

Step 2: Create the Lakehouse

Create a Lakehouse (for example, ploosh_lakehouse) and organize its Files/ folder as follows:

Folder	Purpose
`ploosh_cases/`	Test case YAML files
`ploosh_resources/`	Reference data files (CSV, JSON, Parquet) used as expected data in tests
`ploosh_outputs/`	Output folder for test results

Upload your ploosh_connections.yaml file to Files/.
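
The folders can be created through the Lakehouse UI, or from a notebook. The following is a minimal sketch of the notebook route. The function name `create_ploosh_layout` is hypothetical, and the mount path in the final comment assumes ploosh_lakehouse is attached to the notebook as its default Lakehouse.

```python
from pathlib import Path
from typing import List

# Folder names taken from the layout described in this guide.
PLOOSH_FOLDERS = ("ploosh_cases", "ploosh_resources", "ploosh_outputs")


def create_ploosh_layout(files_root: Path) -> List[Path]:
    """Create the Ploosh folders under files_root and return their paths."""
    created = []
    for name in PLOOSH_FOLDERS:
        folder = files_root / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to run more than once
        created.append(folder)
    return created


# Assumption: with a default Lakehouse attached, its Files/ area is mounted
# at /lakehouse/default/Files inside the notebook.
# create_ploosh_layout(Path("/lakehouse/default/Files"))
```

Because `exist_ok=True` is set, re-running the cell leaves existing folders and their contents untouched.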

Step 3: Configure shortcuts

To access data located in other Fabric workspaces, use shortcuts:

In the Lakehouse, go to the Tables section
Click New shortcut
Select the source (OneLake, Azure Data Lake, etc.)
Select the tables from the other workspaces' Lakehouses that you want to expose

This makes remote tables queryable via Spark SQL as if they were local.

See Shortcuts strategy for more details.
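
Before running any tests, it can help to confirm that the shortcut tables resolve from Spark. The sketch below is one way to do that, assuming a runtime with PySpark 3.3 or later, where `spark.catalog.tableExists` is available. The helper `missing_tables` is hypothetical, and whether a qualified name such as `hr_lakehouse.employees` resolves depends on how your shortcuts and Lakehouses are attached.

```python
from typing import Iterable, List


def missing_tables(spark, table_names: Iterable[str]) -> List[str]:
    """Return the table names that the Spark catalog cannot resolve."""
    return [name for name in table_names if not spark.catalog.tableExists(name)]


# In a Fabric notebook, `spark` is the session provided by the runtime.
# The table name below is the one used in the test case example in this guide.
# print(missing_tables(spark, ["hr_lakehouse.employees"]))
```

An empty list means every table in the list is visible to Spark SQL.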

Step 4: Create the connections file

Create a ploosh_connections.yaml file for your Fabric data sources:

# For KQL databases
kql_connection:
  type: fabric_kql_spark
  connection_mode: native
  kusto_uri: https://mycluster.kusto.windows.net
  database_id: my_kql_database

# No connection needed for Spark SQL (uses shortcuts)

Spark SQL queries against Lakehouse tables via shortcuts do not require a connection definition. Use the sql_spark connector directly.
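
Every connection entry in this guide carries a `type` key, so a small pre-flight check can catch an entry where it was forgotten. This is a sketch only: `connections_without_type` is a hypothetical helper, not part of Ploosh, and it works on the dictionary a YAML parser would return for the connections file rather than on the file itself.

```python
from typing import Dict, List


def connections_without_type(connections: Dict[str, object]) -> List[str]:
    """Return the names of connection entries that do not define a 'type' key."""
    return [
        name
        for name, settings in connections.items()
        if not isinstance(settings, dict) or "type" not in settings
    ]


# Example input: an entry where the 'type' line was left out by mistake.
parsed = {
    "kql_connection": {
        "connection_mode": "native",
        "kusto_uri": "https://mycluster.kusto.windows.net",
    }
}
print(connections_without_type(parsed))  # prints ['kql_connection']
```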

Step 5: Create test cases

Create YAML files in ploosh_cases/:

Test employee count:
  source:
    type: sql_spark
    query: |
      SELECT department, COUNT(*) AS employee_count
      FROM hr_lakehouse.employees
      GROUP BY department
  expected:
    type: sql_spark
    query: |
      SELECT department, expected_count AS employee_count
      FROM ploosh_resources.expected_employee_counts

Test no KQL anomalies:
  source:
    type: fabrickqlspark
    connection: kql_connection
    query: |
      AnomalyEvents
      | where Timestamp > ago(1d)
      | where Severity == "Critical"
  expected:
    type: empty_spark
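
Once the files are uploaded, a short listing from a notebook can confirm that they landed in the expected folder. The sketch below assumes the default Lakehouse is mounted at /lakehouse/default. The helper `list_case_files` is hypothetical, and it lists both .yaml and .yml files without implying anything about which extensions Ploosh itself picks up.

```python
from pathlib import Path
from typing import List


def list_case_files(cases_dir: Path) -> List[str]:
    """Return the names of the YAML files in cases_dir, sorted alphabetically."""
    return sorted(
        path.name
        for path in cases_dir.iterdir()
        if path.is_file() and path.suffix.lower() in {".yaml", ".yml"}
    )


# Assumption: the Lakehouse holding ploosh_cases/ is the notebook's default Lakehouse.
# print(list_case_files(Path("/lakehouse/default/Files/ploosh_cases")))
```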

Step 6: Create the orchestration notebook

See Fabric notebook orchestration for a complete notebook implementation.

Step 7: Schedule execution

You can automate test execution through:

Fabric Pipeline: Add a Notebook activity pointing to the orchestration notebook
Schedule: Configure a recurring schedule on the notebook directly
Event trigger: Trigger tests after upstream pipeline completion

See Fabric pipeline integration for pipeline examples.
