Parquet File Input
Description
The Parquet File Input transform reads (primitive) values from an Apache Parquet file.
For more information on this see: Apache Parquet.
Options
Notes:
-
To support reading from any location through Apache VFS each file is loaded into memory (one at a time). Make sure to allocate enough memory to allow this.
-
Long values can be de-serialized to Dates if they are EPOC: milliseconds since
1970-01-01 00:00:00.000
-
Parquet Binary fields are considered to be Hop Strings but you can read them as Hop Binary.
-
All input values are passed to the output
-
INT96 is converted to the Hop Binary data type.
Option | Description |
---|---|
Transform name | Name of the transform this name has to be unique in a single pipeline. |
Filename field | Specify the input field. Use a transform like Get File Names to obtain file names. Any supported file location is fine. |
Fields | In this table you can specify all the fields you want to obtain from the parquet files as well as their desired Hop output type. |
Get fields button | With this button you can select a parquet file from which we’ll read the schema to populate the Fields grid. |