Pipeline Probe
Description
The Pipeline Probe metadata type allows to stream output rows of a pipeline to another pipeline.
This receiving pipeline can then process this data for e.g. data quality, data profiling, data lineage etc.
The Pipeline Probe metadata type works by specifying a receiving pipeline (Pipeline executed to capture logging
). This receiving pipeline is "just another pipeline" that takes a Pipeline Data Probe as the input transform.
The receiving pipeline can then continue to process the probe data with all the functionality Apache Hop pipelines have to offer.
Options
Option | Default | Description |
---|---|---|
Name | The name to be used for this pipeline probe | |
Enabled | true | |
Pipeline executed to capture logging | the pipeline to process the data for this pipeline probe | |
Capture output of the following transforms | list of pipelines and transforms to capture logging for |
Samples
The samples project comes with a preconfigured data probe metadata item, a probing (receiving) pipeline and a source pipeline.
-
pipeline probe: metadata perspective -→ pipeline probes -→ pipeline-probe-example
-
probing (receiving) pipeline:
${PROJECT_HOME}/pipeline-probe/pipeline-probe-example.hpl
-
source pipeline:
${PROJECT_HOME}/pipeline-probe/pipeline-probe-generate-fake-books.hpl
To run this sample and see the pipeline probe in action, all it takes is to run the source pipeline ${PROJECT_HOME}/pipeline-probe/pipeline-probe-generate-fake-books.hpl
.
This pipeline will generate 10.000 lines of fake books data. The pipeline probe will read the last transform in the pipeline (dummy
) and pass the data that flows through this transform to the receiving (probing) transform.
The receiving (probing) pipeline (${PROJECT_HOME}/pipeline-probe/pipeline-probe-example.hpl
) has a Pipeline Data Probe as input. This pipeline will then denormalize the received data to field, count the number of books per genre, sort the results and writes the final data out to a file (${PROJECT_HOME}/books-per-genre/probe-data.csv
).
After pipeline-probe-generate-fake-books.hpl
finished, check the pipeline-probe/output
folder in your samples project for the csv file that contains these results. You’ll find the data that was generated in ${PROJECT_HOME}/pipeline-probe/pipeline-probe-generate-fake-books.hpl
, aggregated by book genre.