Database Integration

How to integrate a Database with the StreamZero Platform.

Install Database Drivers

StreamZero DX requires a Python DB-API database driver and a SQLAlchemy dialect to be installed for each datastore you want to connect to within the executor image.

Configuring Database Connections

StreamZero can manage preset connection configurations. This enables a platform wide set up for both confidential as well as general access databases.

StreamZero uses the SQL Alchemy Engine along with the URL template based approach to connection management. The connection configurations are maintained as secrets within the platform and are therefore not publicly accessible i.e. access is provided for administrators only.

Retrieving DB Connections

The following is how to retrieve a named connection. The following sample assumes that the connection identifier key is uploaded to the package as a secrets.json.

1
2
3
4
5
6
7
8
from fx_ef import context
import sqlalchemy as db

db_url = context.secrets.get('my_connection')
engine = db.create_engine(db_url)

connection = engine.connect()
metadata = db.MetaData()

In the above example the db_url is set up as a secret with name 'my_connection'.

Depending on whether this is a service, project or platform level secret there are different approaches to set up the secret. For service level secret the following is a sample set up for a secrets.json file of the package.

1
2
3
{
  "my_connection" = "mysql://scott:tiger@localhost/test"
}
  • For Project scope use the 'secrets' tab of the Project Management UI.
  • For Platform scope secrets use the Vault UI in the DX Manager Application.

Database Drivers

The following table provides a guide on the python libs to be installed within the Executor docker image. For instructions on how to extend the Executor docker image please check this page: /docs/extending_executor_image

You can read more here about how to install new database drivers and libraries into your StreamZero DX executor image.

Note that many other databases are supported, the main criteria being the existence of a functional SQLAlchemy dialect and Python driver. Searching for the keyword “sqlalchemy + (database name)” should help get you to the right place.

If your database or data engine isn’t on the list but a SQL interface exists, please file an issue so we can work on documenting and supporting it.

A list of some of the recommended packages.

Database PyPI package
Amazon Athena pip install "PyAthenaJDBC>1.0.9 , pip install "PyAthena>1.2.0
Amazon Redshift pip install sqlalchemy-redshift
Apache Drill pip install sqlalchemy-drill
Apache Druid pip install pydruid
Apache Hive pip install pyhive
Apache Impala pip install impyla
Apache Kylin pip install kylinpy
Apache Pinot pip install pinotdb
Apache Solr pip install sqlalchemy-solr
Apache Spark SQL pip install pyhive
Ascend.io pip install impyla
Azure MS SQL pip install pymssql
Big Query pip install pybigquery
ClickHouse pip install clickhouse-driver==0.2.0 && pip install clickhouse-sqlalchemy==0.1.6
CockroachDB pip install cockroachdb
Dremio pip install sqlalchemy_dremio
Elasticsearch pip install elasticsearch-dbapi
Exasol pip install sqlalchemy-exasol
Google Sheets pip install shillelagh[gsheetsapi]
Firebolt pip install firebolt-sqlalchemy
Hologres pip install psycopg2
IBM Db2 pip install ibm_db_sa
IBM Netezza Performance Server pip install nzalchemy
MySQL pip install mysqlclient
Oracle pip install cx_Oracle
PostgreSQL pip install psycopg2
Trino pip install sqlalchemy-trino
Presto pip install pyhive
SAP Hana pip install hdbcli sqlalchemy-hana or pip install apache-Feris[hana]
Snowflake pip install snowflake-sqlalchemy
SQLite No additional library needed
SQL Server pip install pymssql
Teradata pip install teradatasqlalchemy
Vertica pip install sqlalchemy-vertica-python
Yugabyte pip install psycopg2


Last modified September 4, 2023: mobile fix (c3370e3)