Google BigQuery

Option Info

Type

Relational

Driver

Driver Link

Version Included

None

Hop Dependencies

None

Documentation

Documentation Link

JDBC Url

jdbc:bigquery://[Host]:[Port];ProjectId=[Project];OAuthType=[AuthValue]

Driver folder

<Hop Installation>/lib/jdbc

Driver Installation

The Simba driver is packaged as a .zip containing many jars. Only a subset of the jars included with the Driver are necessary to use Bigquery JDBC with Apache Hop. Furthermore, some of the jars may conflict with those packaged with Hop and must be excluded.

SIMBA DRIVER JARS TO EXCLUDE (THESE JARS ARE INCLUDED IN THE SIMBA DRIVER, BUT CONFLICT WITH HOP LIBRARIES AND MUST BE EXCLUDED

grpc-alts-<VERSION>.jar
grpc-api-<VERSION>.jar
grpc-core-<VERSION>.jar
grpc-netty-shaded-<VERSION>.jar

SIMBA DRIVER JARS TO INCLUDE (Include only these jars for a minimal Bigquery driver install):

api-common-<VERSION>.jar
gax-<VERSION>.jar
gax-grpc-<VERSION>.jar
google-api-services-bigquery-v2-<VERSION>.jar
GoogleBigQueryJDBC42.jar
google-cloud-bigquerystorage-<VERSION>.jar
grpc-google-cloud-bigquerystorage-v1-<VERSION>.jar
grpc-google-cloud-bigquerystorage-v1beta1-<VERSION>.jar
grpc-google-cloud-bigquerystorage-v1beta2-<VERSION>.jar
json-<VERSION>.jar
proto-google-cloud-bigquerystorage-v1-<VERSION>.jar
proto-google-cloud-bigquerystorage-v1beta1-<VERSION>.jar
proto-google-cloud-bigquerystorage-v1beta2-<VERSION>.jar
threetenbp-<VERSION>.jar

* Tested with Hop 2.5.0 and Simba 1.3.3.1004. Not all authentication methods tested either, so this list may not be exhaustive

Connection Configuration

Basic Connection Settings

When creating a Google BigQuery connection in Apache Hop, configure the following basic settings:

Setting Value

Connection name

Your preferred connection name (e.g., "BQ")

Connection type

Google BigQuery

Access

Native (JDBC)

Server host name

https://www.googleapis.com/bigquery/v2

Database name

Your Google Cloud Project ID

Port number

443

Authentication Options

Google BigQuery connections in Apache Hop support multiple authentication methods. The most common and recommended approach is using Service Account authentication with a JSON key file.

In the Options tab of the connection dialog, configure the following parameters:

Parameter Value Description

OAuthType

0

Service Account authentication

ProjectId

your-project-id

Your Google Cloud Project ID

OAuthServiceAcctEmail

service-account@your-project.iam.gserviceaccount.com

Service account email address

OAuthPvtKeyPath

/path/to/service-account-key.json

Path to the service account JSON key file

TimeOut

3600

Connection timeout in seconds (optional)

Example Configuration

Here’s a complete example of a working BigQuery connection configuration:

Options tab parameters:

Google Cloud Platform Service Account Setup

To connect to Google Cloud services, you need to set up a service account in your Google Cloud Platform (GCP) project and download the authentication credentials.

Creating a Service Account

  1. Navigate to the Google Cloud Console:

  2. Create a Service Account:

    • Navigate to "IAM & Admin" > "Service Accounts"

    • Click "Create Service Account"

    • Provide a name and description for your service account

    • Click "Create and Continue"

  3. Assign Roles:

    • Assign the appropriate roles to your service account based on the service you’re connecting to:

      • For BigQuery:

        • BigQuery Data Viewer (roles/bigquery.dataViewer) - for read access

        • BigQuery Job User (roles/bigquery.jobUser) - for running queries (required)

        • BigQuery Data Editor (roles/bigquery.dataEditor) - if write access needed

        • BigQuery Data Owner (roles/bigquery.dataOwner) - if full control needed

      • For Cloud Storage: Storage Object Viewer, Storage Object Admin (if write access needed)

      • For other services: Consult the specific service documentation for required roles

    • Click "Continue" and then "Done"

  4. Generate and Download Key:

    • Click on the created service account

    • Go to the "Keys" tab

    • Click "Add Key" > "Create new key"

    • Select "JSON" format

    • Click "Create" to download the key file

  5. Secure the Key File:

    • Store the downloaded JSON key file in a secure location

    • Note the full path to this file - you’ll need it for authentication configuration

    • Ensure the file has appropriate permissions (readable only by the user running Hop)

Alternative: Using Application Default Credentials

Apache Hop can also use Google Cloud’s Application Default Credentials (ADC) if you’re running Hop on Google Cloud Platform or have configured the Google Cloud SDK locally.

To use ADC: 1. Install and configure the Google Cloud SDK 2. Run gcloud auth application-default login to set up default credentials 3. In your Hop connection, you can omit service account key file parameters

This method is particularly useful for development environments or when running Hop on Google Cloud Platform services.

Security Best Practices

  • Principle of Least Privilege: Only assign the minimum roles necessary for your use case

  • Key Rotation: Regularly rotate service account keys (recommended every 90 days)

  • Environment Variables: Consider using environment variables to store key file paths instead of hardcoding them

  • Access Control: Restrict access to service account key files using appropriate file system permissions

  • Monitoring: Enable audit logging to monitor service account usage

Testing the Connection

After configuring your connection:

  1. Click the "Test" button in the connection dialog

  2. If successful, you should see a confirmation message

  3. If the test fails, verify:

    • Your service account has the necessary BigQuery permissions

    • The JSON key file path is correct and accessible

    • Your project ID matches the one in the service account

    • Network connectivity to Google APIs is available

Troubleshooting

Common Issues

  • Authentication errors: Verify that your service account has the required BigQuery roles and that the JSON key file path is correct

  • Project not found: Ensure the ProjectId parameter matches your actual Google Cloud Project ID

  • Connection timeout: Increase the TimeOut value if you’re experiencing slow connections

  • Driver conflicts: Ensure you’ve excluded the conflicting GRPC jars as listed in the driver installation section

Performance Considerations

  • Use appropriate timeouts for large query operations

  • Consider using BigQuery’s standard SQL dialect for better performance

  • Implement proper error handling in your workflows when working with large datasets