Core concepts

Test case

A test case is defined in a YAML file and consists of:

  • A source: the data to validate (query result, file content, etc.)
  • An expected: the reference data to compare against
  • Options (optional): comparison settings (sort, cast, ignore, tolerance, etc.)
My test case:
  disabled: false
  options:
    sort:
      - column1
  source:
    type: mysql
    connection: my_connection
    query: SELECT * FROM my_table
  expected:
    type: csv
    path: ./expected_data.csv

Connectors

A connector defines how to read data from a source. Each connector has:

  • A type identifier (e.g. mysql, csv, parquet_spark)
  • Optional connection parameters (hostname, credentials, etc.) defined in a connections file
  • Configuration parameters (query, path, etc.) defined in the test case

There are two families of connectors:

FamilyEngineConnectors
NativePandasCSV, JSON, Parquet, Delta, Excel, Empty, MySQL, PostgreSQL, SQL Server, Snowflake, BigQuery, Databricks, ODBC, Analysis Services, Semantic Model
SparkPySparkCSV Spark, JSON Spark, Parquet Spark, Delta Spark, Empty Spark, SQL Spark, Fabric KQL Spark, Dremio Spark
⚠️ Both source and expected must use the same family. You cannot mix a native connector with a Spark connector in the same test case.

Connections

Connections are defined in a separate YAML file and contain the parameters needed to connect to data sources (hostnames, credentials, ports, etc.). They are referenced by name in the test case configuration.

my_connection:
  type: mysql
  hostname: server.database.windows.net
  database: my_db
  username: user
  password: $var.db_password

Compare engine

The compare engine validates that source and expected datasets match. The comparison process follows three steps:

  1. Structure check: Verify that both datasets have the same columns (case-insensitive, trimmed).
  2. Row count check: Verify that both datasets have the same number of rows.
  3. Data comparison: Compare data row by row, applying preprocessing options (trim, case_insensitive, tolerance).

Compare modes

ModeDescriptionAvailable in
orderRow-by-row comparison based on row position or sort orderNative, Spark
joinMatch rows by join keys instead of positionSpark only

Test case states

StateDescription
passedSource and expected datasets match
failedDifferences found between datasets
errorAn exception occurred during execution
notExecutedTest case is disabled

Error types

When a test case fails or errors, an error type indicates the stage where the issue occurred:

Error typeDescription
headersColumn structure mismatch
countRow count mismatch
dataData values differ between source and expected
compareException during comparison

Exporters

Exporters save test results to files. All exporters also generate Excel files (.xlsx) with detailed gap analysis for failed test cases.

FormatOutput fileDescription
JSONtestresults.jsonStructured results with nested objects
CSVtestresults.csvFlattened results, one row per test
TRXtest_results.xmlVisual Studio Test Results format, compatible with Azure DevOps
For each failed test, an .xlsx file is generated containing only the rows and columns that differ, with {column}source and {column}expected side by side. This makes it easy to identify exactly where source and expected data diverge.

Custom parameters

Variables can be used in YAML files with the $var. syntax and passed at runtime via --p_ arguments. This avoids hardcoding sensitive information like passwords.

# In connections.yml
my_connection:
  type: mysql
  password: $var.db_password
# At runtime
ploosh --connections connections.yml --cases testcases --pdb_password "secret"