Variables
What is a Hop variable?
You don’t want to hardcode your solutions: it’s simply bad form to hardcode host names, user names, passwords, directories and so on. Variables allow your solutions to adapt to a changing environment. If, for example, the database server is different in development than it is in production, you define it as a variable.
How do I use a variable?
In the Hop user interface, all places where you can enter a variable have a '$' symbol to the right of the input field.
You can specify a variable by referencing it like this:
${VARIABLE_NAME}
TIP You can see a list of defined variables by using CTRL-SPACE (CMD-SPACE on OSX) in the input field. This helper will insert a selected variable into the input field.
Please note that you don’t have to specify a variable in these places, and that you can combine a variable with other information, for example:
${ENVIRONMENT_HOME}/input/source-file.txt
Hexadecimal values
In rare cases you may need to enter non-character values, for example a zero byte as a separator in 'binary' text files. In those scenarios you can use a special 'variable' format:
$[<hexadecimal value>]
A few examples:
Value | Meaning |
---|---|
$[00] | A single byte with decimal value 0 |
$[FFFF] | Two bytes with decimal value 65535 |
$[CEA3] | The 2 bytes representing UTF-8 character Σ |
$[E28CA8] | The 3 bytes representing UTF-8 character ⌨ |
$[F09D849E] | The 4 bytes representing UTF-8 character 𝄞 |
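For example, to use a zero byte as the field separator, you could enter the following value in a separator field (for example, the Separator field of the Text File Output transform):
$[00]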
How can I define variables?
Variables can be defined and set in all the places where it makes sense:
- in hop-config.json when it applies to the installation
- in an environment configuration file when it concerns a specific lifecycle environment
- in a project
- as a default parameter value in a pipeline or workflow
- using the Set Variables transform in a pipeline
- using the Set Variables action in a workflow
- when executing with Hop run (see the example below)
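As a sketch of the last option, a hop-run command can pass variable values as parameters (the project name, pipeline path and parameter names below are purely illustrative):
sh hop-run.sh -j my-project -r local -f pipelines/read-input.hpl -p INPUT_DIR=/data/input,OUTPUT_DIR=/data/output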
Locality
Variables are local to the place where they are defined. Setting a variable in a certain place means it is inherited from that place onwards. It is therefore important to know where variables can be set and used, and the hierarchy behind this.
Hierarchy
Place | Inheritance |
---|---|
System properties | Inherited by the JVM and all other places in Hop |
Environment | Inherited by run configurations |
Run Configurations | Pipeline run configurations are inherited by the environment |
Pipeline | Inherited by the pipeline run configuration |
Workflow | Inherited by the workflow run configuration |
Metadata objects | Inherited by the place where they are loaded |
System properties
All system properties defined in the Hop configuration file 'hop-config.json' are available as variables, as are all Java system properties. You can define new system properties in the Hop GUI using the system properties editor.
You can also edit the 'hop-config.json' file manually:
{
"systemProperties" : {
"MY_SYSTEM_PROPERTY" : "SomeValue"
}
}
You can also use the hop-config command line tool to define system properties:
sh hop-config.sh -s MY_SYSTEM_PROPERTY=SomeValue
System properties are set in the Java Virtual Machine that Hop runs in. This means that you should limit yourself to variables which are really system-specific.
Environment Variables
As described in the documentation regarding environments, you can set variables there as well. This helps you configure folders and other settings which are environment-specific.
You can set those in the Environment settings dialog or using the command line:
sh hop-config.sh -e MyEnvironment -em -ev VARIABLE1=value1
Run Configurations
You can specify variables here to make a pipeline or workflow run in an engine-agnostic way. For example, you can run the same pipeline on Hadoop with Spark, specifying an input directory using hdfs://, and on Google Dataflow using gs://.
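As a sketch (the run configuration and variable names below are purely illustrative), you could give the same variable a different value in each run configuration and reference it once in the pipeline:
Run configuration 'spark': INPUT_DIR = hdfs://namenode/data/input
Run configuration 'dataflow': INPUT_DIR = gs://my-bucket/data/input
In the pipeline: ${INPUT_DIR}/sales.csv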
Workflow
You can define variables in a workflow either with the "Set Variables" or "JavaScript" actions, or by defining parameters. Parameters are variables that have a description and can have a default value.
Pipelines
You can define variables in a pipeline either with the "Set Variables" or "JavaScript" transforms, or by defining parameters. Parameters are variables that have a description and can have a default value.
IMPORTANT Since all transforms in a pipeline run in parallel, you should never set and use the same variable in the same pipeline.
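A common pattern (sketched below with illustrative file names) is to set variables in a first pipeline and use them in a second pipeline that runs afterwards in the same workflow, choosing a variable scope in the Set Variables transform that is valid in the parent or root workflow:
main.hwf: Start → set-variables.hpl (Set Variables transform) → process-data.hpl (uses ${INPUT_DIR})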
Available global variables
The following variables are available in Hop through the configuration perspective. If you have the menu toolbar enabled in the configuration perspective’s General tab, these variables are also available through Tools → Edit config variables.
Variable name | Default Value | Description |
---|---|---|
HOP_AGGREGATION_ALL_NULLS_ARE_ZERO | N | Set this variable to Y to return 0 when all values within an aggregate are NULL. Otherwise by default a NULL is returned when all values are NULL. |
HOP_AGGREGATION_MIN_NULL_IS_VALUED | N | Set this variable to Y to set the minimum to NULL if NULL is within an aggregate. Otherwise by default NULL is ignored by the MIN aggregate and MIN is set to the minimum value that is not NULL. See also the variable HOP_AGGREGATION_ALL_NULLS_ARE_ZERO. |
HOP_ALLOW_EMPTY_FIELD_NAMES_AND_TYPES | N | Set this variable to Y to allow your pipeline to pass 'null' fields and/or empty types. |
HOP_BATCHING_ROWSET | N | Set this variable to 'Y' if you want to test a more efficient batching row set. |
HOP_DEFAULT_BIGNUMBER_FORMAT | - | The name of the variable containing an alternative default bignumber format |
HOP_DEFAULT_BUFFER_POLLING_WAITTIME | 20 | This is the default polling frequency for the transforms input buffer (in ms) |
HOP_DEFAULT_DATE_FORMAT | - | The name of the variable containing an alternative default date format |
HOP_DEFAULT_INTEGER_FORMAT | - | The name of the variable containing an alternative default integer format |
HOP_DEFAULT_NUMBER_FORMAT | - | The name of the variable containing an alternative default number format |
HOP_DEFAULT_SERVLET_ENCODING | - | Defines the default encoding for servlets; leave it empty to use the Java default encoding |
HOP_DEFAULT_TIMESTAMP_FORMAT | - | The name of the variable containing an alternative default timestamp format |
HOP_DISABLE_CONSOLE_LOGGING | N | Set this variable to Y to disable standard Hop logging to the console (stdout). |
HOP_EMPTY_STRING_DIFFERS_FROM_NULL | N | NULL vs Empty String. If this setting is set to Y, an empty string and null are different. Otherwise they are not. |
HOP_FILE_OUTPUT_MAX_STREAM_COUNT | 1024 | This project variable is used by the Text File Output transform. It defines the max number of simultaneously open files within the transform. The transform will close/reopen files as necessary to ensure the max is not exceeded. |
HOP_FILE_OUTPUT_MAX_STREAM_LIFE | 0 | This project variable is used by the Text File Output transform. It defines the max number of milliseconds between flushes of files opened by the transform. |
HOP_GLOBAL_LOG_VARIABLES_CLEAR_ON_EXPORT | N | Set this variable to N to preserve global log variables defined in the pipeline / workflow Properties → Log panel. Setting it to Y will clear them when exporting a pipeline / workflow. |
HOP_JSON_INPUT_INCLUDE_NULLS | N | Determines whether nulls are considered while parsing JSON files. If HOP_JSON_INPUT_INCLUDE_NULLS is "Y", nulls will be included; otherwise they will not be included (default behavior). |
HOP_LENIENT_STRING_TO_NUMBER_CONVERSION | N | System wide flag to allow lenient string to number conversion for backward compatibility. If this setting is set to "Y", a string starting with digits will be converted successfully into a number (example: 192.168.1.1 will be converted into 192 or 192.168 or 192168 depending on the decimal and grouping symbol). The default (N) will be to throw an error if non-numeric symbols are found in the string. |
HOP_LICENSE_HEADER_FILE | - | When set, this variable should contain the path to a file that will be included in the serialization of pipelines and workflows. |
HOP_LOG_MARK_MAPPINGS | N | Set this variable to 'Y' to precede transform/action names in log lines with the complete path to the transform/action. Useful to pinpoint exactly where a problem happened in your process. |
HOP_LOG_SIZE_LIMIT | 0 | The log size limit for all pipelines and workflows that don’t have the "log size limit" property set in their respective properties. |
HOP_LOG_TAB_REFRESH_DELAY | 1000 | The hop log tab refresh delay. |
HOP_LOG_TAB_REFRESH_PERIOD | 1000 | The hop log tab refresh period. |
HOP_MAX_ACTIONS_LOGGED | 5000 | The maximum number of action results kept in memory for logging purposes. |
HOP_MAX_LOGGING_REGISTRY_SIZE | 10000 | The maximum number of logging registry entries kept in memory for logging purposes. |
HOP_MAX_LOG_SIZE_IN_LINES | 0 | The maximum number of log lines that are kept internally by Hop. Set to 0 to keep all rows (default) |
HOP_MAX_LOG_TIMEOUT_IN_MINUTES | 1440 | The maximum age (in minutes) of a log line while being kept internally by Hop. Set to 0 to keep all rows indefinitely (default) |
HOP_MAX_TAB_LENGTH | - | A variable to configure Tab size |
HOP_MAX_WORKFLOW_TRACKER_SIZE | 5000 | The maximum number of workflow trackers kept in memory |
HOP_PASSWORD_ENCODER_PLUGIN | Hop | Specifies the password encoder plugin to use by ID (Hop is the default). |
HOP_PIPELINE_ROWSET_SIZE | - | The size of the pipeline row sets. This overwrites the value that you set in the pipeline settings. |
HOP_PLUGIN_CLASSES | - | A comma delimited list of classes to scan for plugin annotations |
HOP_ROWSET_GET_TIMEOUT | 50 | The name of the variable that optionally contains an alternative rowset get timeout (in ms). This only makes a difference for extremely short lived pipelines. |
HOP_ROWSET_PUT_TIMEOUT | 50 | The name of the variable that optionally contains an alternative rowset put timeout (in ms). This only makes a difference for extremely short lived pipelines. |
HOP_S3_VFS_PART_SIZE | 5MB | The default part size for multi-part uploads of new files to S3 (added and used by the AWS S3 VFS plugin) |
HOP_SERVER_DETECTION_TIMER | - | The name of the variable that defines the timer used for detecting server nodes |
HOP_SERVER_JETTY_ACCEPTORS | - | A variable to configure the Jetty option: acceptors for Carte |
HOP_SERVER_JETTY_ACCEPT_QUEUE_SIZE | - | A variable to configure the Jetty option: acceptQueueSize for Carte |
HOP_SERVER_JETTY_RES_MAX_IDLE_TIME | - | A variable to configure the Jetty option: lowResourcesMaxIdleTime for Carte |
HOP_SERVER_OBJECT_TIMEOUT_MINUTES | 1440 | This project variable will set a time-out after which waiting, completed or stopped pipelines and workflows will be automatically cleaned up. The default value is 1440 (one day). |
HOP_SERVER_REFRESH_STATUS | - | A variable to configure refresh for Hop server workflow/pipeline status page |
HOP_SPLIT_FIELDS_REMOVE_ENCLOSURE | N | Set this variable to N to preserve the enclosure symbol after splitting the string in the Split fields transform. Setting it to Y will remove the first and last enclosure symbols from the resulting string chunks. |
HOP_SYSTEM_HOSTNAME | - | You can use this variable to speed up hostname lookup. Hostname lookup is performed by Hop so that it is capable of logging the server on which a workflow or pipeline is executed. |
HOP_TRANSFORM_PERFORMANCE_SNAPSHOT_LIMIT | 0 | The maximum number of transform performance snapshots to keep in memory. Set to 0 to keep all snapshots indefinitely (default) |
HOP_USE_NATIVE_FILE_DIALOG | N | Set this value to Y if you want to use the system file open/save dialog when browsing files |
HOP_ZIP_MAX_ENTRY_SIZE_DEFAULT_STRING | - | |
HOP_ZIP_MAX_TEXT_SIZE | - | A variable to configure the maximum number of characters of text that are extracted before an exception is thrown during extracting text from documents |
HOP_ZIP_MAX_TEXT_SIZE_DEFAULT_STRING | - | |
HOP_ZIP_MIN_INFLATE_RATIO | - | A variable to configure the minimum allowed ratio between de- and inflated bytes to detect a zipbomb |
HOP_ZIP_MIN_INFLATE_RATIO_DEFAULT_STRING | - | |
NEO4J_LOGGING_CONNECTION | - | Set this variable to the name of an existing Neo4j connection to enable execution logging to a Neo4j database. |
Environment variables
Set the environment variables listed below in your operating system to configure Hop’s startup behavior:
- HOP_AUDIT_FOLDER: Set this variable to a valid path on your machine to store Hop’s audit information. This information includes last opened files per project, zoom size and lots more.
- HOP_CONFIG_FOLDER: Hop stores your configuration in the config folder by default. Set this environment variable to point to a folder outside of your Hop installation to keep your configuration, projects, environment list etc., no matter which Hop version or installation you use.
TIP Copy the contents of an existing hop/config folder to the path set in HOP_CONFIG_FOLDER to move the configuration from one of your Hop installations to your new central location.
- HOP_PLUGIN_BASE_FOLDERS: Set this variable to point Hop to a comma separated list of folders where you want Hop to look for additional plugins. Using this variable also unsets your default plugins folder, so make sure to add the default plugin folder to the comma separated list. This can be a relative path to the installation, e.g. export HOP_PLUGIN_BASE_FOLDERS=./plugins,/additional/plugin/folder . The ./plugins entry points to the plugins in the base installation folder.
- HOP_SHARED_JDBC_FOLDER: The variable which points to a shared folder containing JDBC drivers.
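For example, on Linux or macOS you could export these variables before starting any of the Hop tools (the paths shown are only illustrative):
export HOP_CONFIG_FOLDER=/opt/hop-config
export HOP_AUDIT_FOLDER=/opt/hop-audit
./hop-gui.sh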
Additionally, the following environment variables can help you to add even more fine-grained configuration for your Apache Hop installation:
Variable | Default | Description |
---|---|---|
HOP_AUTO_CREATE_CONFIG | N | Set this variable to 'Y' to automatically create the config file when it’s missing. |
HOP_METADATA_FOLDER | - | The system environment variable pointing to the alternative location for the Hop metadata folder |
HOP_REDIRECT_STDERR | N | Set this variable to Y to redirect stderr to Hop logging. |
HOP_REDIRECT_STDOUT | N | Set this variable to Y to redirect stdout to Hop logging. |
HOP_SIMPLE_STACK_TRACES | N | System wide flag to log stack traces in a simpler, more human-readable format |