Supported Databases
- 1: Ascend.io
- 2: Amazon Athena
- 3: Amazon Redshift
- 4: Apache Drill
- 5: Apache Druid
- 6: Apache Hive
- 7: Apache Impala
- 8: Apache Kylin
- 9: Apache Pinot
- 10: Apache Solr
- 11: Apache Spark SQL
- 12: Clickhouse
- 13: CockroachDB
- 14: CrateDB
- 15: Databricks
- 16: Dremio
- 17: Elasticsearch
- 18: Exasol
- 19: Firebird
- 20: Firebolt
- 21: Google BigQuery
- 22: Google Sheets
- 23: Hana
- 24: Hologres
- 25: IBM DB2
- 26: IBM Netezza Performance Server
- 27: Microsoft SQL Server
- 28: MySQL
- 29: Oracle
- 30: Postgres
- 31: Presto
- 32: Rockset
- 33: Snowflake
- 34: Teradata
- 35: Trino
- 36: Vertica
- 37: YugabyteDB
1 - Ascend.io
Ascend.io
The recommended connector library for Ascend.io is impyla.
The expected connection string is formatted as follows:
ascend://{username}:{password}@{hostname}:{port}/{database}?auth_mechanism=PLAIN;use_ssl=true
2 - Amazon Athena
AWS Athena
PyAthenaJDBC
PyAthenaJDBC is a Python DB API 2.0 compliant wrapper for the Amazon Athena JDBC driver.
The connection string for Amazon Athena is as follows:
awsathena+jdbc://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...
Note that you’ll need to escape and URL-encode the S3 staging directory when forming the connection string, like so:
s3://... -> s3%3A//...
PyAthena
You can also use PyAthena library (no Java required) with the following connection string:
awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...
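As a worked example of the encoding note above, here is a minimal Python sketch (all values are placeholders, not real credentials) that builds a PyAthena connection string with urllib.parse.quote_plus and hands it to SQLAlchemy:
from urllib.parse import quote_plus
from sqlalchemy import create_engine

# Placeholder values -- substitute your own AWS credentials and S3 staging directory.
aws_access_key_id = "AKIA..."
aws_secret_access_key = quote_plus("secret/with/special/chars")   # URL-encode the secret
region_name = "us-east-1"
schema_name = "default"
s3_staging_dir = quote_plus("s3://my-bucket/athena-results/")     # s3:// becomes s3%3A%2F%2F

engine = create_engine(
    f"awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}"
    f"@athena.{region_name}.amazonaws.com/{schema_name}"
    f"?s3_staging_dir={s3_staging_dir}"
)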
3 - Amazon Redshift
AWS Redshift
The sqlalchemy-redshift library is the recommended way to connect to Redshift through SQLAlchemy.
You’ll need the following setting values to form the connection string:
- User Name: userName
- Password: DBPassword
- Database Host: AWS Endpoint
- Database Name: Database Name
- Port: default 5439
Here’s what the connection string looks like:
redshift+psycopg2://<userName>:<DBPassword>@<AWS End Point>:5439/<Database Name>
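If your password contains characters that are not URL-safe, you can let SQLAlchemy assemble the URI instead of concatenating strings. A minimal sketch, assuming sqlalchemy-redshift and psycopg2 are installed and SQLAlchemy 1.4+ is available (all values below are placeholders):
from sqlalchemy import create_engine
from sqlalchemy.engine import URL

# Placeholder values -- substitute your own cluster details.
url = URL.create(
    drivername="redshift+psycopg2",
    username="userName",
    password="DBPassword",   # special characters are escaped automatically
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    database="dev",
)
engine = create_engine(url)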
4 - Apache Drill
Apache Drill
SQLAlchemy
The recommended way to connect to Apache Drill is through SQLAlchemy. You can use the sqlalchemy-drill package.
Once that is done, you can connect to Drill in two ways, either via the REST interface or by JDBC. If you are connecting via JDBC, you must have the Drill JDBC Driver installed.
The basic connection string for Drill looks like this:
drill+sadrill://<username>:<password>@<host>:<port>/<storage_plugin>?use_ssl=True
To connect to a Drill instance running in embedded mode on your local machine, you can use the following connection string:
drill+sadrill://localhost:8047/dfs?use_ssl=False
JDBC
Connecting to Drill through JDBC is more complicated and we recommend following this tutorial.
The connection string looks like:
drill+jdbc://<username>:<password>@<host>:<port>
ODBC
We recommend reading the Apache Drill documentation and the GitHub README to learn how to work with Drill through ODBC.
5 - Apache Druid
Apache Druid
Use the SQLAlchemy / DBAPI connector made available in the pydruid library.
The connection string looks like:
druid://<User>:<password>@<Host>:<Port-default-9088>/druid/v2/sql
Customizing Druid Connection
When adding a connection to Druid, you can customize the connection in a few different ways in the Add Database form.
Custom Certificate
You can add certificates in the Root Certificate field when configuring the new database connection to Druid:
When using a custom certificate, pydruid will automatically use the https scheme.
Disable SSL Verification
To disable SSL verification, add the following to the Extras field:
engine_params:
{"connect_args": {"scheme": "https", "ssl_verify_cert": false}}
6 - Apache Hive
Apache Hive
The pyhive library is the recommended way to connect to Hive through SQLAlchemy.
The expected connection string is formatted as follows:
hive://hive@{hostname}:{port}/{database}
7 - Apache Impala
Apache Impala
The recommended connector library to Apache Impala is impyla.
The expected connection string is formatted as follows:
impala://{hostname}:{port}/{database}
8 - Apache Kylin
Apache Kylin
The recommended connector library for Apache Kylin is kylinpy.
The expected connection string is formatted as follows:
kylin://<username>:<password>@<hostname>:<port>/<project>?<param1>=<value1>&<param2>=<value2>
9 - Apache Pinot
Apache Pinot
The recommended connector library for Apache Pinot is pinotdb.
The expected connection string is formatted as follows:
pinot+http://<pinot-broker-host>:<pinot-broker-port>/query?controller=http://<pinot-controller-host>:<pinot-controller-port>/
10 - Apache Solr
Apache Solr
The sqlalchemy-solr library provides a Python / SQLAlchemy interface to Apache Solr.
The connection string for Solr looks like this:
solr://{username}:{password}@{host}:{port}/{server_path}/{collection}[/?use_ssl=true|false]
11 - Apache Spark SQL
Apache Spark SQL
The recommended connector library for Apache Spark SQL is pyhive.
The expected connection string is formatted as follows:
hive://hive@{hostname}:{port}/{database}
12 - Clickhouse
Clickhouse
To use Clickhouse with StreamZero you will need to add the following Python libraries:
clickhouse-driver==0.2.0
clickhouse-sqlalchemy==0.1.6
If running StreamZero using Docker Compose, add the following to your ./docker/requirements-local.txt file:
clickhouse-driver>=0.2.0
clickhouse-sqlalchemy>=0.1.6
The recommended connector library for Clickhouse is sqlalchemy-clickhouse.
The expected connection string is formatted as follows:
clickhouse+native://<user>:<password>@<host>:<port>/<database>[?options…]
or, using the HTTP interface:
clickhouse://{username}:{password}@{hostname}:{port}/{database}
Here’s a concrete example of a real connection string:
clickhouse+native://demo:demo@github.demo.trial.altinity.cloud/default?secure=true
If you’re using Clickhouse locally on your computer, you can get away with using a native protocol URL that uses the default user without a password (and doesn’t encrypt the connection):
clickhouse+native://localhost/default
13 - CockroachDB
CockroachDB
The recommended connector library for CockroachDB is sqlalchemy-cockroachdb.
The expected connection string is formatted as follows:
cockroachdb://root@{hostname}:{port}/{database}?sslmode=disable
14 - CrateDB
CrateDB
The recommended connector library for CrateDB is crate. You need to install the extras as well for this library. We recommend adding something like the following text to your requirements file:
crate[sqlalchemy]==0.26.0
The expected connection string is formatted as follows:
crate://crate@127.0.0.1:4200
15 - Databricks
Databricks
To connect to Databricks, first install databricks-dbapi with the optional SQLAlchemy dependencies:
pip install "databricks-dbapi[sqlalchemy]"
There are two ways to connect to Databricks: using a Hive connector or an ODBC connector. Both ways work similarly, but only ODBC can be used to connect to SQL endpoints.
Hive
To use the Hive connector you need the following information from your cluster:
- Server hostname
- Port
- HTTP path
These can be found under “Configuration” -> “Advanced Options” -> “JDBC/ODBC”.
You also need an access token from “Settings” -> “User Settings” -> “Access Tokens”.
Once you have all this information, add a database of type “Databricks (Hive)” in StreamZero, and use the following SQLAlchemy URI:
databricks+pyhive://token:{access token}@{server hostname}:{port}/{database name}
You also need to add the following configuration to “Other” -> “Engine Parameters”, with your HTTP path:
{"connect_args": {"http_path": "sql/protocolv1/o/****"}}
ODBC
For ODBC you first need to install the ODBC drivers for your platform.
For a regular connection use this as the SQLAlchemy URI:
databricks+pyodbc://token:{access token}@{server hostname}:{port}/{database name}
And for the connection arguments:
{"connect_args": {"http_path": "sql/protocolv1/o/****", "driver_path": "/path/to/odbc/driver"}}
The driver path should be:
- Mac OS: /Library/simba/spark/lib/libsparkodbc_sbu.dylib
- Linux: /opt/simba/spark/lib/64/libsparkodbc_sb64.so
For a connection to a SQL endpoint you need to use the HTTP path from the endpoint:
{"connect_args": {"http_path": "/sql/1.0/endpoints/****", "driver_path": "/path/to/odbc/driver"}}
16 - Dremio
Dremio
The recommended connector library for Dremio is sqlalchemy_dremio.
The expected connection string for ODBC (Default port is 31010) is formatted as follows:
dremio://{username}:{password}@{host}:{port}/{database_name}/dremio?SSL=1
The expected connection string for Arrow Flight (Dremio 4.9.1+. Default port is 32010) is formatted as follows:
dremio+flight://{username}:{password}@{host}:{port}/dremio
This blog post by Dremio has some additional helpful instructions on connecting StreamZero to Dremio.
17 - Elasticsearch
Elasticsearch
The recommended connector library for Elasticsearch is elasticsearch-dbapi.
The connection string for Elasticsearch looks like this:
elasticsearch+http://{user}:{password}@{host}:9200/
Using HTTPS
elasticsearch+https://{user}:{password}@{host}:9200/
Elasticsearch has a default limit of 10000 rows, so you can increase this limit on your cluster or set StreamZero’s row limit in the config:
ROW_LIMIT = 10000
You can query multiple indices in SQL Lab, for example:
SELECT timestamp, agent FROM "logstash"
But to use visualizations for multiple indices, you need to create an alias index on your cluster:
POST /_aliases
{
"actions" : [
{ "add" : { "index" : "logstash-**", "alias" : "logstash_all" } }
]
}
Then register your table with the alias name logstash_all.
Time zone
By default, StreamZero uses the UTC time zone for Elasticsearch queries. If you need to specify a time zone, please edit your Database and enter the settings of your specified time zone in Other > ENGINE PARAMETERS:
{
"connect_args": {
"time_zone": "Asia/Shanghai"
}
}
One more note about time zones: before Elasticsearch 7.8, converting a string into a DATETIME object required the CAST function, which does not support the time_zone setting above, so upgrading to Elasticsearch 7.8 or later is recommended. From Elasticsearch 7.8 onward you can use the DATETIME_PARSE function instead, which does honor the time_zone setting. To make StreamZero use DATETIME_PARSE for the conversion, fill in your Elasticsearch version number in the Other > VERSION setting.
18 - Exasol
Exasol
The recommended connector library for Exasol is sqlalchemy-exasol.
The connection string for Exasol looks like this:
exa+pyodbc://{username}:{password}@{hostname}:{port}/my_schema?CONNECTIONLCALL=en_US.UTF-8&driver=EXAODBC
19 - Firebird
Firebird
The recommended connector library for Firebird is sqlalchemy-firebird.
StreamZero has been tested on sqlalchemy-firebird>=0.7.0, <0.8.
The recommended connection string is:
firebird+fdb://{username}:{password}@{host}:{port}//{path_to_db_file}
Here’s a connection string example of StreamZero connecting to a local Firebird database:
firebird+fdb://SYSDBA:masterkey@192.168.86.38:3050//Library/Frameworks/Firebird.framework/Versions/A/Resources/examples/empbuild/employee.fdb
20 - Firebolt
Firebolt
The recommended connector library for Firebolt is firebolt-sqlalchemy.
StreamZero has been tested on firebolt-sqlalchemy>=0.0.1.
The recommended connection string is:
firebolt://{username}:{password}@{database}
or
firebolt://{username}:{password}@{database}/{engine_name}
Here’s a connection string example of StreamZero connecting to a Firebolt database:
firebolt://email@domain:password@sample_database
or
firebolt://email@domain:password@sample_database/sample_engine
21 - Google BigQuery
Google BigQuery
The recommended connector library for BigQuery is pybigquery.
Install BigQuery Driver
Follow the steps here about how to install new database drivers when setting up StreamZero locally via docker-compose.
echo "pybigquery" >> ./docker/requirements-local.txt
Connecting to BigQuery
When adding a new BigQuery connection in StreamZero, you’ll need to add the GCP Service Account credentials file (as a JSON).
- Create your Service Account via the Google Cloud Platform control panel, provide it access to the appropriate BigQuery datasets, and download the JSON configuration file for the service account.
- In StreamZero you can either upload that JSON or add the JSON blob in the following format (this should be the content of your credential JSON file):
{
"type": "service_account",
"project_id": "...",
"private_key_id": "...",
"private_key": "...",
"client_email": "...",
"client_id": "...",
"auth_uri": "...",
"token_uri": "...",
"auth_provider_x509_cert_url": "...",
"client_x509_cert_url": "..."
}
Additionally, you can connect via a SQLAlchemy URI instead.
The connection string for BigQuery looks like:
bigquery://{project_id}
Go to the Advanced tab and add a JSON blob to the Secure Extra field in the database configuration form, in the following format:
{ "credentials_info": <contents of credentials JSON file> }
The resulting file should have this structure:
{ "credentials_info": { "type": "service_account", "project_id": "...", "private_key_id": "...", "private_key": "...", "client_email": "...", "client_id": "...", "auth_uri": "...", "token_uri": "...", "auth_provider_x509_cert_url": "...", "client_x509_cert_url": "..." } }
You should then be able to connect to your BigQuery datasets.
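Outside the web form, the same credentials can also be passed to the SQLAlchemy engine directly. A minimal sketch, assuming pybigquery is installed; the project ID and key-file path are placeholders:
from sqlalchemy import create_engine, text

# Placeholder values -- substitute your GCP project ID and service-account key file.
engine = create_engine(
    "bigquery://my-gcp-project",
    credentials_path="/path/to/service-account-key.json",
)

with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())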
To be able to upload CSV or Excel files to BigQuery in StreamZero, you’ll need to also add the pandas_gbq library.
22 - Google Sheets
Google Sheets
Google Sheets has a very limited SQL API. The recommended connector library for Google Sheets is shillelagh.
23 - Hana
Hana
The recommended connector library is sqlalchemy-hana.
The connection string is formatted as follows:
hana://{username}:{password}@{host}:{port}
24 - Hologres
Hologres
Hologres is a real-time interactive analytics service developed by Alibaba Cloud. It is fully compatible with PostgreSQL 11 and integrates seamlessly with the big data ecosystem.
Hologres sample connection parameters:
- User Name: The AccessKey ID of your Alibaba Cloud account.
- Password: The AccessKey secret of your Alibaba Cloud account.
- Database Host: The public endpoint of the Hologres instance.
- Database Name: The name of the Hologres database.
- Port: The port number of the Hologres instance.
The connection string looks like:
postgresql+psycopg2://{username}:{password}@{host}:{port}/{database}
25 - IBM DB2
IBM DB2
The IBM_DB_SA library provides a Python / SQLAlchemy interface to IBM Data Servers.
Here’s the recommended connection string:
db2+ibm_db://{username}:{password}@{hostname}:{port}/{database}
There are two DB2 dialect versions implemented in SQLAlchemy. If you are connecting to a DB2 version without LIMIT [n] syntax, the recommended connection string to be able to use SQL Lab is:
ibm_db_sa://{username}:{password}@{hostname}:{port}/{database}
26 - IBM Netezza Performance Server
IBM Netezza Performance Server
The nzalchemy library provides a Python / SQLAlchemy interface to IBM Netezza Performance Server (aka Netezza).
Here’s the recommended connection string:
netezza+nzpy://{username}:{password}@{hostname}:{port}/{database}
27 - Microsoft SQL Server
SQL Server
The recommended connector library for SQL Server is pymssql.
The connection string for SQL Server looks like this:
mssql+pymssql://<Username>:<Password>@<Host>:<Port-default:1433>/<Database Name>/?Encrypt=yes
28 - MySQL
MySQL
The recommended connector library for MySQL is mysqlclient.
Here’s the connection string:
mysql://{username}:{password}@{host}/{database}
Host:
- For Localhost or Docker running Linux: localhost or 127.0.0.1
- For On Prem: IP address or Host name
- For Docker running in OSX: docker.for.mac.host.internal
Port: 3306 by default
One problem with mysqlclient is that it will fail to connect to newer MySQL databases using caching_sha2_password for authentication, since the plugin is not included in the client. In this case, you should use mysql-connector-python (https://pypi.org/project/mysql-connector-python/) instead:
mysql+mysqlconnector://{username}:{password}@{host}/{database}
29 - Oracle
Oracle
The recommended connector library is cx_Oracle.
The connection string is formatted as follows:
oracle://<username>:<password>@<hostname>:<port>
30 - Postgres
Postgres
Note that, if you’re using docker-compose, the Postgres connector library psycopg2 comes out of the box with StreamZero.
Postgres sample connection parameters:
- User Name: UserName
- Password: DBPassword
- Database Host:
- For Localhost: localhost or 127.0.0.1
- For On Prem: IP address or Host name
- For AWS: Endpoint
- Database Name: Database Name
- Port: default 5432
The connection string looks like:
postgresql://{username}:{password}@{host}:{port}/{database}
You can require SSL by adding ?sslmode=require at the end:
postgresql://{username}:{password}@{host}:{port}/{database}?sslmode=require
You can read about the other SSL modes that Postgres supports in Table 31-1 from this documentation.
More information about PostgreSQL connection options can be found in the SQLAlchemy docs and the PostgreSQL docs.
31 - Presto
Presto
The pyhive library is the recommended way to connect to Presto through SQLAlchemy.
The expected connection string is formatted as follows:
presto://{hostname}:{port}/{database}
You can pass in a username and password as well:
presto://{username}:{password}@{hostname}:{port}/{database}
Here is an example connection string with values:
presto://datascientist:securepassword@presto.example.com:8080/hive
By default StreamZero assumes the most recent version of Presto is being used when querying the datasource. If you’re using an older version of Presto, you can configure it in the extra parameter:
{
"version": "0.123"
}
32 - Rockset
Rockset
The connection string for Rockset is:
rockset://apikey:{your-apikey}@api.rs2.usw2.rockset.com/
For more complete instructions, we recommend the Rockset documentation.
33 - Snowflake
Snowflake
The recommended connector library for Snowflake is snowflake-sqlalchemy<=1.2.4.
The connection string for Snowflake looks like this:
snowflake://{user}:{password}@{account}.{region}/{database}?role={role}&warehouse={warehouse}
The schema is not necessary in the connection string, as it is defined per table/query. The role and warehouse can be omitted if defaults are defined for the user, i.e.
snowflake://{user}:{password}@{account}.{region}/{database}
Make sure the user has privileges to access and use all required databases/schemas/tables/views/warehouses, as the Snowflake SQLAlchemy engine does not test for user/role rights during engine creation by default. However, when pressing the “Test Connection” button in the Create or Edit Database dialog, user/role credentials are validated by passing “validate_default_parameters”: True to the connect() method during engine creation. If the user/role is not authorized to access the database, an error is recorded in the StreamZero logs.
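To reproduce the same credential check outside the UI, the validation flag can be passed straight to the Snowflake connector through connect_args. A minimal sketch, assuming snowflake-sqlalchemy is installed; the account, credentials, and object names are placeholders:
from sqlalchemy import create_engine

# Placeholder values -- substitute your own Snowflake account, credentials, and objects.
engine = create_engine(
    "snowflake://my_user:my_password@my_account.eu-west-1/my_database"
    "?role=my_role&warehouse=my_warehouse",
    connect_args={"validate_default_parameters": True},  # fail early on a bad role/warehouse/database
)
with engine.connect() as conn:
    pass  # the connection succeeds only if the defaults above are valid for the user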
34 - Teradata
Teradata
The recommended connector library is teradatasqlalchemy.
The connection string for Teradata looks like this:
teradata://{user}:{password}@{host}
ODBC Driver
There’s also an older connector named sqlalchemy-teradata that requires the installation of ODBC drivers. The Teradata ODBC Drivers are available here: https://downloads.teradata.com/download/connectivity/odbc-driver/linux
Here are the required environment variables:
export ODBCINI=/.../teradata/client/ODBC_64/odbc.ini
export ODBCINST=/.../teradata/client/ODBC_64/odbcinst.ini
We recommend using the first library (teradatasqlalchemy) because it does not require ODBC drivers and is more regularly updated.
35 - Trino
Trino
Trino version 352 and higher is supported.
The sqlalchemy-trino library is the recommended way to connect to Trino through SQLAlchemy.
The expected connection string is formatted as follows:
trino://{username}:{password}@{hostname}:{port}/{catalog}
If you are running Trino with Docker on your local machine, please use the following connection URL:
trino://trino@host.docker.internal:8080
Reference: Trino-Feris-Podcast
36 - Vertica
Vertica
The recommended connector library is sqlalchemy-vertica-python. The Vertica connection parameters are:
- User Name: UserName
- Password: DBPassword
- Database Host:
- For Localhost: localhost or 127.0.0.1
- For On Prem: IP address or Host name
- For Cloud: IP Address or Host Name
- Database Name: Database Name
- Port: default 5433
The connection string is formatted as follows:
vertica+vertica_python://{username}:{password}@{host}/{database}
Other parameters:
- Load Balancer - Backup Host
37 - YugabyteDB
YugabyteDB
YugabyteDB is a distributed SQL database built on top of PostgreSQL.
Note that, if you’re using docker-compose, the Postgres connector library psycopg2 comes out of the box with StreamZero.
The connection string looks like:
postgresql://{username}:{password}@{host}:{port}/{database}