Merge rows (diff) transform Icon Merge rows (diff)

Description

The Merge Rows (Diff) transform compares and merges rows between two data streams called Reference (previous data) and Compare (new data), adding a flag to each row.

Supported Engines

Hop Engine

Supported

Spark

Maybe Supported

Flink

Maybe Supported

Dataflow

Maybe Supported

Usage

This transform is useful for comparing data collected at two different times.

For example, the source system of your data warehouse might not contain a timestamp of the last data update.

You could use this transform to compare the two data streams and merge the dates and timestamps in the rows.

Based on key fields and comparison fields, this transform merges reference rows with compare rows and creates merged output rows.

A flag in each row indicates how the values were compared and merged. The possible flag values are:

  • identical: The key was found in both rows, and the compared values are identical.

  • changed: The key was found in both rows, but one or more compared values are different.

  • new: The key was not found in the reference rows.

  • deleted: The key was not found in the compare rows.

If the row’s flag is identical or deleted, the merged output row is based on the reference row.

If the flag is new or changed, the merged output row is based on the compare row.

You can also send the merged and flagged rows to a subsequent transform in your pipeline, such as Switch-Case or Synchronize after merge. In the subsequent transform, you can use the flag field generated by Merge rows (diff) to control updates/inserts/deletes on a target table.

Options

Option Description

Transform name

Name of the transform.

Reference rows origin

Specify the transform that produces the reference rows. It’s a Stream with original rows (rows that you want to compare the new rows to).

Compare rows origin

Specify the transform that produces the compare rows. It’s a Stream with new rows

Flag fieldname

Specify the name of the flag field on the output stream.

Keys to match

Specify fields containing the keys on which to match. Click "Get key fields" to insert all of the fields from the reference rows

Values to compare

Specify fields contaning the values to compare. Click "Get value fields" to insert all of the fields from the compare rows. Key fields do not need to be repeated here.

Metadata Injection Support

All fields of this transform support metadata injection. You can use this transform with ETL Metadata Injection to pass metadata to your pipeline at runtime.