Fabric shortcuts strategy
When running Ploosh in Microsoft Fabric, your test cases often need to query data located in multiple workspaces. Fabric shortcuts provide a mechanism to make remote data accessible locally without copying it.
The problem
In a typical Fabric environment, data is distributed across multiple workspaces:
- Workspace A: Raw data Lakehouse (ingestion)
- Workspace B: Data warehouse / transformed data
- Workspace C: Reporting / datamart
Ploosh needs to access tables from all these workspaces to run cross-layer validations.
The solution: shortcuts
Shortcuts create virtual links to data in other locations, making it queryable via Spark SQL from the Ploosh Lakehouse.
Types of shortcuts
| Source | Description |
|---|---|
| OneLake | Link to another Fabric Lakehouse in the same or a different workspace |
| Azure Data Lake Storage | Link to ADLS Gen2 storage |
| Amazon S3 | Link to S3 buckets |
Creating a shortcut
- Open your Ploosh Lakehouse
- In the Tables section, click New shortcut
- Select the source type (e.g. OneLake)
- Navigate to the target workspace and Lakehouse
- Select the tables to link
- The tables appear as local tables in your Lakehouse
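The steps above can in principle be scripted. The sketch below only builds the request for a OneLake shortcut and does not send it. It assumes the Fabric REST endpoint `POST /v1/workspaces/{workspaceId}/items/{itemId}/shortcuts` and a body with `path`, `name`, and `target.oneLake`; both should be checked against the current Fabric API reference. The function name and all IDs are hypothetical placeholders.

```python
import json

# Assumed base URL of the Fabric REST API; verify before use.
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_onelake_shortcut_request(ploosh_workspace_id, ploosh_lakehouse_id,
                                   source_workspace_id, source_lakehouse_id,
                                   table_name):
    """Return (url, json_body) for linking one remote table under Tables/."""
    url = (f"{FABRIC_API}/workspaces/{ploosh_workspace_id}"
           f"/items/{ploosh_lakehouse_id}/shortcuts")
    body = {
        "path": "Tables",          # where the shortcut appears in the Ploosh Lakehouse
        "name": table_name,        # name of the shortcut itself
        "target": {
            "oneLake": {
                "workspaceId": source_workspace_id,
                "itemId": source_lakehouse_id,
                "path": f"Tables/{table_name}",  # table in the source Lakehouse
            }
        },
    }
    return url, json.dumps(body)


# Placeholder IDs only; nothing is sent over the network here.
url, body = build_onelake_shortcut_request(
    "<ploosh-workspace-id>", "<ploosh-lakehouse-id>",
    "<dw-workspace-id>", "<dw-lakehouse-id>", "employees")
print(url)
print(body)
```

Sending the request would additionally require an authenticated HTTP call, which is left out here on purpose.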
Querying shortcut data
Once shortcuts are created, the remote tables are queryable via Spark SQL using the sql_spark connector:
Test data warehouse employees:
  source:
    type: sql_spark
    query: |
      SELECT department, COUNT(*) AS count
      FROM dw_lakehouse.employees
      GROUP BY department
  expected:
    type: sql_spark
    query: |
      SELECT department, count
      FROM reporting_lakehouse.employee_summary
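Before running a suite, it can help to confirm that the shortcut tables actually resolve. This is a minimal sketch for a Fabric notebook, assuming the notebook's built-in `spark` session and PySpark's `spark.catalog.tableExists` (available in recent PySpark versions). The helper name `find_missing_tables` is hypothetical and not part of Ploosh.

```python
def find_missing_tables(spark, table_names):
    """Return the table names that the Spark catalog cannot resolve."""
    return [name for name in table_names
            if not spark.catalog.tableExists(name)]


# In a Fabric notebook, `spark` is already defined, so a check could look like:
# missing = find_missing_tables(spark, ["dw_lakehouse.employees"])
# if missing:
#     raise RuntimeError(f"Shortcut tables not found: {missing}")
```

An empty result means every listed table is visible through the Lakehouse metadata.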
No connection configuration is required for sql_spark, because Spark SQL resolves tables through the Lakehouse metadata.
Combining shortcuts with KQL
For workloads that write events to KQL databases, use the fabric_kql_spark connector alongside sql_spark:
connections:
  kql_events:
    type: fabric_kql_spark
    connection_mode: native
    kusto_uri: https://mycluster.kusto.windows.net
    database_id: events_db
Test event completeness:
  source:
    type: fabric_kql_spark
    connection: kql_events
    query: |
      ProcessingEvents
      | where Timestamp > ago(1d)
      | summarize event_count = count() by Pipeline
  expected:
    type: sql_spark
    query: |
      SELECT pipeline AS Pipeline, expected_count AS event_count
      FROM ploosh_resources.expected_event_counts
Best practices
- Organize shortcuts by source workspace: Create a naming convention (e.g. dw_lakehouse, raw_lakehouse) to identify origins
- Use reference tables: Store expected data as tables or files in the Ploosh Lakehouse for tests that don't compare two live sources
- Minimize shortcut count: Only link tables that are actually tested to reduce metadata overhead
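To keep the shortcut list aligned with what is actually tested, the table references in the test queries can be inventoried. The sketch below is a rough illustration. It scans query text for `FROM` or `JOIN` followed by a two-part `lakehouse.table` name using a simple regular expression (not a real SQL parser) and groups the tables by lakehouse prefix. The function name, the `hr_lakehouse` prefix, and the sample queries are hypothetical.

```python
import re
from collections import defaultdict

# Matches "FROM lakehouse.table" or "JOIN lakehouse.table" (two-part names only).
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)\.([A-Za-z_]\w*)",
                       re.IGNORECASE)


def tables_by_lakehouse(queries):
    """Map each lakehouse prefix to the sorted list of tables referenced."""
    found = defaultdict(set)
    for query in queries:
        for lakehouse, table in TABLE_REF.findall(query):
            found[lakehouse].add(table)
    return {lakehouse: sorted(tables) for lakehouse, tables in found.items()}


# Hypothetical query strings, as they might appear in test case files.
queries = [
    "SELECT department, COUNT(*) AS count FROM dw_lakehouse.employees GROUP BY department",
    "SELECT e.id FROM dw_lakehouse.employees e JOIN hr_lakehouse.badges b ON e.id = b.id",
]
for lakehouse, tables in sorted(tables_by_lakehouse(queries).items()):
    print(lakehouse, tables)
```

Any shortcut that does not show up in such an inventory is a candidate for removal, and the prefixes give a quick view of which source each table comes from.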