Google BigQuery
Option | Info |
---|---|
Type | Relational |
Driver | |
Version Included | None |
Hop Dependencies | None |
Documentation | |
JDBC Url | jdbc:bigquery://[Host]:[Port];ProjectId=[Project];OAuthType=[AuthValue] |
Driver folder | <Hop Installation>/lib/jdbc |
Driver Installation
The Simba driver is packaged as a .zip containing many jars. Only a subset of the jars included with the Driver are necessary to use Bigquery JDBC with Apache Hop. Furthermore, some of the jars may conflict with those packaged with Hop and must be excluded.
SIMBA DRIVER JARS TO EXCLUDE (THESE JARS ARE INCLUDED IN THE SIMBA DRIVER, BUT CONFLICT WITH HOP LIBRARIES AND MUST BE EXCLUDED
grpc-alts-<VERSION>.jar grpc-api-<VERSION>.jar grpc-core-<VERSION>.jar grpc-netty-shaded-<VERSION>.jar
SIMBA DRIVER JARS TO INCLUDE (Include only these jars for a minimal Bigquery driver install):
api-common-<VERSION>.jar gax-<VERSION>.jar gax-grpc-<VERSION>.jar google-api-services-bigquery-v2-<VERSION>.jar GoogleBigQueryJDBC42.jar google-cloud-bigquerystorage-<VERSION>.jar grpc-google-cloud-bigquerystorage-v1-<VERSION>.jar grpc-google-cloud-bigquerystorage-v1beta1-<VERSION>.jar grpc-google-cloud-bigquerystorage-v1beta2-<VERSION>.jar json-<VERSION>.jar proto-google-cloud-bigquerystorage-v1-<VERSION>.jar proto-google-cloud-bigquerystorage-v1beta1-<VERSION>.jar proto-google-cloud-bigquerystorage-v1beta2-<VERSION>.jar threetenbp-<VERSION>.jar
* Tested with Hop 2.5.0 and Simba 1.3.3.1004. Not all authentication methods tested either, so this list may not be exhaustive
Connection Configuration
Basic Connection Settings
When creating a Google BigQuery connection in Apache Hop, configure the following basic settings:
Setting | Value |
---|---|
Connection name | Your preferred connection name (e.g., "BQ") |
Connection type | Google BigQuery |
Access | Native (JDBC) |
Server host name | |
Database name | Your Google Cloud Project ID |
Port number | 443 |
Authentication Options
Google BigQuery connections in Apache Hop support multiple authentication methods. The most common and recommended approach is using Service Account authentication with a JSON key file.
In the Options tab of the connection dialog, configure the following parameters:
Parameter | Value | Description |
---|---|---|
OAuthType | 0 | Service Account authentication |
ProjectId | your-project-id | Your Google Cloud Project ID |
OAuthServiceAcctEmail | Service account email address | |
OAuthPvtKeyPath | /path/to/service-account-key.json | Path to the service account JSON key file |
TimeOut | 3600 | Connection timeout in seconds (optional) |
Example Configuration
Here’s a complete example of a working BigQuery connection configuration:
-
Connection name: BQ
-
Server host name: https://www.googleapis.com/bigquery/v2
-
Database name: your-project-id
-
Port number: 443
Options tab parameters:
-
OAuthType: 0
-
ProjectId: your-project-id
-
OAuthServiceAcctEmail: your-service-account@your-project-id.iam.gserviceaccount.com
-
OAuthPvtKeyPath: /path/to/your-service-account-key.json
-
TimeOut: 3600
Google Cloud Platform Service Account Setup
To connect to Google Cloud services, you need to set up a service account in your Google Cloud Platform (GCP) project and download the authentication credentials.
Creating a Service Account
-
Navigate to the Google Cloud Console:
-
Select your project or create a new one
-
Create a Service Account:
-
Navigate to "IAM & Admin" > "Service Accounts"
-
Click "Create Service Account"
-
Provide a name and description for your service account
-
Click "Create and Continue"
-
-
Assign Roles:
-
Assign the appropriate roles to your service account based on the service you’re connecting to:
-
For BigQuery:
-
BigQuery Data Viewer
(roles/bigquery.dataViewer
) - for read access -
BigQuery Job User
(roles/bigquery.jobUser
) - for running queries (required) -
BigQuery Data Editor
(roles/bigquery.dataEditor
) - if write access needed -
BigQuery Data Owner
(roles/bigquery.dataOwner
) - if full control needed
-
-
For Cloud Storage:
Storage Object Viewer
,Storage Object Admin
(if write access needed) -
For other services: Consult the specific service documentation for required roles
-
-
Click "Continue" and then "Done"
-
-
Generate and Download Key:
-
Click on the created service account
-
Go to the "Keys" tab
-
Click "Add Key" > "Create new key"
-
Select "JSON" format
-
Click "Create" to download the key file
-
-
Secure the Key File:
-
Store the downloaded JSON key file in a secure location
-
Note the full path to this file - you’ll need it for authentication configuration
-
Ensure the file has appropriate permissions (readable only by the user running Hop)
-
Alternative: Using Application Default Credentials
Apache Hop can also use Google Cloud’s Application Default Credentials (ADC) if you’re running Hop on Google Cloud Platform or have configured the Google Cloud SDK locally.
To use ADC: 1. Install and configure the Google Cloud SDK 2. Run gcloud auth application-default login
to set up default credentials 3. In your Hop connection, you can omit service account key file parameters
This method is particularly useful for development environments or when running Hop on Google Cloud Platform services.
Security Best Practices
-
Principle of Least Privilege: Only assign the minimum roles necessary for your use case
-
Key Rotation: Regularly rotate service account keys (recommended every 90 days)
-
Environment Variables: Consider using environment variables to store key file paths instead of hardcoding them
-
Access Control: Restrict access to service account key files using appropriate file system permissions
-
Monitoring: Enable audit logging to monitor service account usage
Testing the Connection
After configuring your connection:
-
Click the "Test" button in the connection dialog
-
If successful, you should see a confirmation message
-
If the test fails, verify:
-
Your service account has the necessary BigQuery permissions
-
The JSON key file path is correct and accessible
-
Your project ID matches the one in the service account
-
Network connectivity to Google APIs is available
-
Troubleshooting
Common Issues
-
Authentication errors: Verify that your service account has the required BigQuery roles and that the JSON key file path is correct
-
Project not found: Ensure the ProjectId parameter matches your actual Google Cloud Project ID
-
Connection timeout: Increase the TimeOut value if you’re experiencing slow connections
-
Driver conflicts: Ensure you’ve excluded the conflicting GRPC jars as listed in the driver installation section