Fabric shortcuts strategy

When running Ploosh in Microsoft Fabric, your test cases often need to query data located in multiple workspaces. Fabric shortcuts provide a mechanism to make remote data accessible locally without copying it.

The problem

In a typical Fabric environment, data is distributed across multiple workspaces:

Workspace A: Raw data Lakehouse (ingestion)
Workspace B: Data warehouse / transformed data
Workspace C: Reporting / datamart

Ploosh needs to access tables from all these workspaces to run cross-layer validations.

The solution: shortcuts

Shortcuts create virtual links to data in other locations, making it queryable via Spark SQL from the Ploosh Lakehouse.

Types of shortcuts

Source	Description
OneLake	Link to another Fabric Lakehouse in the same or a different workspace
Azure Data Lake Storage	Link to ADLS Gen2 storage
Amazon S3	Link to S3 buckets

Creating a shortcut

Open your Ploosh Lakehouse
In the Tables section, click New shortcut
Select the source type (e.g. OneLake)
Navigate to the target workspace and Lakehouse
Select the tables to link
The tables appear as local tables in your Lakehouse
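If you need to create many shortcuts, the same steps can be scripted. The sketch below assumes the Fabric REST API's Create Shortcut endpoint (POST .../workspaces/{workspaceId}/items/{itemId}/shortcuts). It only builds the URL and JSON body for a OneLake table shortcut and does not authenticate or send anything. The helper name and the placeholder IDs are hypothetical; check the body schema against the current API reference before relying on it.

```python
from __future__ import annotations

import json

# Public Fabric REST API base URL.
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def onelake_table_shortcut_request(
    ploosh_workspace_id: str,
    ploosh_lakehouse_id: str,
    source_workspace_id: str,
    source_lakehouse_id: str,
    table_name: str,
    shortcut_name: str | None = None,
) -> tuple[str, dict]:
    """Hypothetical helper: build the URL and JSON body that would create a
    OneLake shortcut in the Tables section of the Ploosh Lakehouse, pointing
    at one table of a Lakehouse in another workspace."""
    url = (
        f"{FABRIC_API}/workspaces/{ploosh_workspace_id}"
        f"/items/{ploosh_lakehouse_id}/shortcuts"
    )
    body = {
        "path": "Tables",                     # create the shortcut under Tables
        "name": shortcut_name or table_name,  # local name of the linked table
        "target": {
            "oneLake": {
                "workspaceId": source_workspace_id,
                "itemId": source_lakehouse_id,
                "path": f"Tables/{table_name}",  # table in the source Lakehouse
            }
        },
    }
    return url, body


if __name__ == "__main__":
    # Placeholder IDs; real workspace and item IDs are GUIDs.
    url, body = onelake_table_shortcut_request(
        "<ploosh-workspace-id>", "<ploosh-lakehouse-id>",
        "<dw-workspace-id>", "<dw-lakehouse-id>", "employees",
    )
    print("POST", url)
    print(json.dumps(body, indent=2))
```

Sending the request needs a Microsoft Entra ID bearer token for an identity that can read the source Lakehouse and write to the Ploosh Lakehouse, for example requests.post(url, json=body, headers={"Authorization": f"Bearer {token}"}).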

Querying shortcut data

Once shortcuts are created, the remote tables are queryable via Spark SQL using the sql_spark connector:

Test data warehouse employees:
  source:
    type: sql_spark
    query: |
      SELECT department, COUNT(*) AS count
      FROM dw_lakehouse.employees
      GROUP BY department
  expected:
    type: sql_spark
    query: |
      SELECT department, count
      FROM reporting_lakehouse.employee_summary

No connection configuration is required for sql_spark: Spark SQL resolves table names, shortcut tables included, through the Lakehouse metadata.
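Conceptually, this test passes when the live aggregate over the shortcut table matches the stored summary. The plain-Python sketch below mimics that check on a few hypothetical rows: department_counts plays the role of the source query, and same_rows compares the two result sets without regard to row order. It is only an illustration; Ploosh runs both queries in Spark and applies its own comparison logic, which this does not reproduce.

```python
from __future__ import annotations

from collections import Counter


def department_counts(employees: list[dict]) -> list[tuple[str, int]]:
    """Plain-Python equivalent of
    SELECT department, COUNT(*) AS count ... GROUP BY department."""
    counts = Counter(row["department"] for row in employees)
    return sorted(counts.items())


def same_rows(source: list[tuple], expected: list[tuple]) -> bool:
    """Compare two result sets as multisets: row order is ignored,
    but duplicate rows must appear the same number of times."""
    return Counter(source) == Counter(expected)


# Hypothetical rows standing in for dw_lakehouse.employees ...
employees = [
    {"name": "Ana", "department": "Sales"},
    {"name": "Ben", "department": "IT"},
    {"name": "Chloe", "department": "Sales"},
]
# ... and for the stored summary table.
summary = [("Sales", 2), ("IT", 1)]

print(same_rows(department_counts(employees), summary))  # True
```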

Combining shortcuts with KQL

For workloads that write events to KQL databases, use the fabric_kql_spark connector alongside sql_spark:

connections:
  kql_events:
    type: fabric_kql_spark
    connection_mode: native
    kusto_uri: https://mycluster.kusto.windows.net
    databaseid: eventsdb

Test event completeness:
  source:
    type: fabric_kql_spark
    connection: kql_events
    query: |
      ProcessingEvents
      | where Timestamp > ago(1d)
      | summarize event_count = count() by Pipeline
  expected:
    type: sql_spark
    query: |
      SELECT pipeline AS Pipeline, expected_count AS event_count
      FROM ploosh_resources.expected_event_counts
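The expected query renames its columns so they match the KQL output (Pipeline, event_count), which matters when result sets are compared by column name. The sketch below shows, in plain Python, the kind of per-pipeline check this test expresses: it turns each result set into a Pipeline-to-count mapping and reports every pipeline whose counts disagree, including pipelines present on only one side. The function names and sample values are hypothetical; Ploosh reports differences in its own format.

```python
from __future__ import annotations


def to_counts(rows: list[dict]) -> dict[str, int]:
    """Index result rows by pipeline; both sides use the aligned
    column names Pipeline and event_count."""
    return {row["Pipeline"]: row["event_count"] for row in rows}


def completeness_gaps(
    actual: dict[str, int], expected: dict[str, int]
) -> dict[str, tuple[int | None, int | None]]:
    """Return {pipeline: (actual, expected)} for every pipeline whose counts
    differ; None marks a pipeline missing from one side."""
    gaps = {}
    for pipeline in sorted(actual.keys() | expected.keys()):
        got, want = actual.get(pipeline), expected.get(pipeline)
        if got != want:
            gaps[pipeline] = (got, want)
    return gaps


# Hypothetical KQL output and reference rows.
kql_rows = [
    {"Pipeline": "ingest", "event_count": 120},
    {"Pipeline": "transform", "event_count": 80},
]
reference_rows = [
    {"Pipeline": "ingest", "event_count": 120},
    {"Pipeline": "transform", "event_count": 90},
    {"Pipeline": "export", "event_count": 10},
]

print(completeness_gaps(to_counts(kql_rows), to_counts(reference_rows)))
# {'export': (None, 10), 'transform': (80, 90)}
```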

Best practices

Organize shortcuts by source workspace: Use a naming convention (e.g. dw_lakehouse, raw_lakehouse) that identifies each shortcut's origin
Use reference tables: Store expected data as tables or files in the Ploosh Lakehouse for tests that don't compare two live sources
Minimize shortcut count: Only link tables that are actually tested to reduce metadata overhead
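To apply the last two practices, it helps to know exactly which remote tables your tests read. The sketch below scans Spark SQL query strings (assumed to be already extracted from your test case files) for lakehouse.table references after FROM or JOIN and groups them by lakehouse, so each group corresponds to one source and lists the only tables worth linking. It is a regex heuristic, not a SQL parser: it only recognizes unquoted lakehouse.table names that directly follow FROM or JOIN. The function name and the exclude parameter are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict

# "FROM lakehouse.table" or "JOIN lakehouse.table" (unquoted names only).
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)\.([A-Za-z_]\w*)", re.IGNORECASE
)


def tables_to_link(
    queries: list[str], exclude: frozenset[str] = frozenset()
) -> dict[str, set[str]]:
    """Group the lakehouse.table references found in the queries by
    lakehouse name, skipping lakehouses listed in `exclude` (for example
    the Ploosh Lakehouse that holds local reference tables)."""
    needed: dict[str, set[str]] = defaultdict(set)
    for query in queries:
        for lakehouse, table in TABLE_REF.findall(query):
            if lakehouse not in exclude:
                needed[lakehouse].add(table)
    return dict(needed)


# Hypothetical queries collected from test case files.
queries = [
    "SELECT department, COUNT(*) AS count FROM dw_lakehouse.employees GROUP BY department",
    "SELECT e.id FROM raw_lakehouse.employees e JOIN raw_lakehouse.departments d ON e.dept_id = d.id",
    "SELECT pipeline, expected_count FROM ploosh_resources.expected_event_counts",
]
print(tables_to_link(queries, exclude=frozenset({"ploosh_resources"})))
```

Tables referenced only from the excluded Lakehouse are treated as local reference data, so they never show up as shortcut candidates.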
