StreamZero SX

The following section provides a short overview of key features, concepts and architecture of StreamZero SX.

Overview

StreamZero SX is a streaming automation solution for the StreamZero Platform. It utilizes Apache Kafka, the distributed message broker used by thousands of companies, to enable data processing across the data mesh.

StreamZero SX drastically simplifies the creation of data pipelines and the deployment of data streams, reducing the time it takes to build stream processing applications.

It automates sourcing, streaming, and data management, and greatly reduces the need for engineers’ involvement in topic management, DevOps, and DataOps.

What is Stream Processing?

Stream processing is a data management technique that involves ingesting a continuous data stream to quickly analyze, filter, transform or enhance the data in real time. Apache Kafka is the most popular open-source stream-processing software for collecting, processing, storing, and analyzing data at scale.

Best known for its excellent performance and fault tolerance, Kafka is designed to deliver high throughput while maintaining low latency for real-time data feeds. It can handle thousands of messages per second with an average latency of 5–15 ms.

Kafka is therefore an ideal platform for building real-time streaming applications such as online streaming analytics or real-time message queues.
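
To make this concrete, below is a minimal consume-transform-produce loop written directly against Kafka. It is a hedged sketch only: it assumes the third-party kafka-python client, a broker on localhost:9092, and placeholder topic names (raw_events, enriched_events), none of which are part of StreamZero SX. The later sections show how StreamZero SX removes the need to write this plumbing yourself.

import json
from kafka import KafkaConsumer, KafkaProducer

# Subscribe to the input topic and deserialize each record's JSON payload.
consumer = KafkaConsumer(
    "raw_events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Producer used to publish the transformed records.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# The loop blocks and yields records as they arrive, i.e. a continuous stream.
for record in consumer:
    event = record.value
    event["processed"] = True  # a trivial stand-in for real filtering or enrichment
    producer.send("enriched_events", value=event)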

Apache Kafka has several advantages over other tools. Some notable benefits are:

  • Building data pipelines.
  • Leveraging real-time data streams.
  • Enabling operational metrics.
  • Integrating data across countless sources.

Common Struggles For Companies Trying to Implement Kafka as an Integration Pattern

While Kafka is great for building scalable, high-performance streaming applications, it is hard to implement and maintain.

  1. For one thing, the system is large and complex, which is why many companies fall short of their goals when adopting it.
  2. On top of that, integrating client systems with Kafka brings additional challenges that can be difficult even for experienced teams, because many technical details, such as data schemas, supported protocols, and serialization, can cause hiccups in an integration strategy.
  3. As a result, Kafka requires a dedicated team with advanced knowledge and varying skill sets to handle its adoption: engineers, DevOps specialists, DataOps engineers, and GitOps experts.
  4. Moreover, due to the complexity of the applications, especially scalability concerns, building each application can take significant time.

There are many steps involved, from defining and writing business logic, to setting up Kafka and integrating it with other services, to automating and deploying the applications.

How Does StreamZero SX Address And Solve These Issues?

StreamZero SX takes streaming automation to a whole new level, and the way it works is simple: it removes the complexity of Kafka connections, integrations, setup, automation, and deployment, letting the end user focus on building client applications instead of losing time learning how to manage Kafka.

But how exactly does StreamZero SX solve the common issues and pitfalls mentioned above? By simplifying all processes:

  • It is easy to adopt and therefore has a low learning curve: Users can start working with StreamZero SX and see first results within an hour.
  • It removes all the complexities of Kafka: Engineers focus strictly on business logic for processing messages. The StreamZero SX Python package takes care of configuration, Kafka connections, error handling, logging, and functions to interact with other services inside StreamZero.
  • It is flexible: StreamZero SX allows using different underlying images and installing additional components or pip modules.
  • It automatically connects service code to streams and topics.
  • It helps you iterate quickly on your service architecture: With StreamZero SX, once the images are deployed and the services are running, results are displayed right away.
  • It takes care of all the underlying core processes, so you do not need to worry about technical or operational details.
  • It is highly scalable and can be scaled up or down at any time, according to the user’s needs and the number of topic partitions.

With the experience and knowledge gained over the past 7 years, the StreamZero Labs team has built an out-of-the-box module that lets developers concentrate on coding while taking care of all the complex processes that come with stream-processing data integrations.

1 - Developer Guide

StreamZero SX Developer Guide

Overview

StreamZero is a container-level solution for building highly scalable, cross-network synchronous or asynchronous applications.

Using the StreamZero SX platform to run and manage stream processing containers on top of the StreamZero messaging infrastructure significantly reduces the cost of deploying enterprise applications and offers standardized data streaming between workflow steps. This simplifies development and, as a result, creates a platform with agile data processing and ease of integration.

Getting started with Stream Processors

Take a look at this library for creating Stream Processors on top of Kafka and running them inside the StreamZero platform: StreamZero-SX

Example of a Stream Processor

Below is an example application that uses the StreamZero-sx Python library to count the number of words in each incoming message and send the result to the twitter_feed_wc Kafka topic.

import json

# StreamZero-sx provides the application object and a ready-made Kafka producer;
# configuration and the Kafka connection are handled by the library.
from StreamZero_sx.core import app
from StreamZero_sx.utils import sx_producer


def process(message):
    # Build a new record containing the original text and its word count.
    message_new = dict()
    message_new['text'] = message['text']
    message_new['word_count'] = len(message['text'].split(' '))
    message_new_json = json.dumps(message_new)
    print(message_new_json)
    # Publish the result to the twitter_feed_wc topic.
    sx_producer.send(topic="twitter_feed_wc", value=message_new_json.encode('utf-8'))


# Register the handler; StreamZero-sx calls it for every incoming message.
app.process = process

Creating Docker Container

Below is an example of a Dockerfile to create a Docker image for the Twitter Word Count application shown in the previous section. The user is free to use any base Python image and then add the StreamZero module and other libraries.

FROM python:3.9-alpine
#RUN pip install -i https://test.pypi.org/simple/ StreamZero-sx==0.0.8 --extra-index-url https://pypi.org/simple/ StreamZero-sx
RUN pip install StreamZero-sx
COPY twitter_word_count.py app.py

After the user has built the image and pushed it to a Docker image registry, they can run it in the StreamZero SX UI.

2 - Integrations Guide

StreamZero SX Integrations Guide.

How does it Work?

There are two main approaches to implementing external notification support:

  • Implementation within a StreamZero SX container
  • Implementation in an Exit Gateway

The second option is used on platforms that sit behind a firewall and therefore require the gateway to be outside the firewall in order to access external services. In these cases the adapter runs as a separate container.

Irrespective of the infrastructure implementation, the service-internal API (as illustrated above) does not change.
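
To make that point concrete, here is a hypothetical sketch in the style of the Developer Guide: the service publishes a notification event to a topic, and the deployment decides whether that topic is consumed by an adapter inside the StreamZero SX container or by an exit gateway container outside the firewall. The topic name external_notifications and the payload fields are illustrative assumptions, not a documented API.

import json
from StreamZero_sx.core import app
from StreamZero_sx.utils import sx_producer


def process(message):
    # Build a notification from the incoming message; fields are placeholders.
    notification = {
        "recipient": message.get("owner", "ops-team"),
        "subject": "Threshold exceeded",
        "body": json.dumps(message),
    }
    # The code is identical in both deployment variants; only the consumer of
    # the external_notifications topic (in-container adapter or exit gateway)
    # differs.
    sx_producer.send(topic="external_notifications",
                     value=json.dumps(notification).encode("utf-8"))


app.process = process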

3 - User Guide

StreamZero SX User Guide.

StreamZero SX Management UI

Create a Stream Adapter

After a developer has built an image of a stream processing task and stored it in a container registry, you can configure and launch it with the StreamZero Management UI.

In the left-side menu, open the Stream Adapters menu and select “Stream Adapter Definition”. Fill in the details.

create_stream_adapter_ui

Go to the “List Stream Adapters” page. You should find the Stream Adapter you created in the list. You can start the container by clicking the “Run” button. Downloading and starting up the image can take a few minutes.

list_stream_adapter_ui

When the Stream Adapter is running you can find it in the list of running adapters.

list_running_adapters_ui

StreamZero also shows a list of all Kafka topics that are currently attached to or available to Stream Adapters.

list_topics_ui

4 - Solution Snippets

The following section provides StreamZero SX solution snippets that explain the problem solved and link to relevant use cases.

Twitter message processing example

The first example application uses the StreamZero-sx Python library to implement a stream processor that counts the number of words in incoming messages. The messages are queried from the Twitter API with a specific filter condition and then fed to the processor. The results are sent to a Kafka topic.

import json
from StreamZero_sx.core import app
from StreamZero_sx.utils import sx_producer


def process(message):
    message_new = dict()
    message_new['text'] = message['text']
    message_new['word_count'] = len(message['text'].split(' '))
    message_new_json = json.dumps(message_new)
    print(message_new_json)
    sx_producer.send(topic="twitter_feed_wc", value=message_new_json.encode('utf-8'))


app.process = process
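
The feeding side is not shown in the snippet above. As a hedged sketch, a separate producer could push the queried tweets into the processor’s input topic. The topic name twitter_feed, the broker address, the kafka-python client, and the fetch_filtered_tweets() helper are all illustrative assumptions rather than part of StreamZero-sx or the Twitter API.

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)


def fetch_filtered_tweets():
    # Placeholder for the real Twitter API query with the filter condition.
    yield {"text": "stream processing with kafka is fun"}


# Publish each tweet's text to the topic the word-count processor consumes.
for tweet in fetch_filtered_tweets():
    producer.send("twitter_feed", value={"text": tweet["text"]})

producer.flush()  # ensure all records are delivered before exiting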

5 - Containers + Purpose

StreamZero SX Containers + Purpose.

Creating a Docker Container

Below is an example of a Dockerfile to create a Docker image for a StreamZero SX application. The user is free to choose which base Python image to use and then add the StreamZero module and other libraries.

FROM python:3.9-alpine
#RUN pip install -i https://test.pypi.org/simple/ StreamZero-sx==0.0.8 --extra-index-url https://pypi.org/simple/ StreamZero-sx
RUN pip install StreamZero-sx
COPY app.py utils.py ./

After the user has built the image and pushed it to a Docker image registry, they can run it in the StreamZero SX Management UI.

6 - Architecture

StreamZero SX Architecture.

StreamZero SX Architecture Principles

Stream processing is a technique for processing large volumes of data in real-time as it is generated or received. One way to implement a stream processing architecture is to use Docker containers for individual workflow steps and Apache Kafka for the data pipeline.

Docker is a platform for creating, deploying, and running containers, which are lightweight and portable units of software that can be run on any system with a compatible container runtime. By using Docker containers for each step in the stream processing workflow, developers can easily package and deploy their code, along with any dependencies, in a consistent and reproducible way. This can help to improve the reliability and scalability of the stream processing system.

Apache Kafka is a distributed streaming platform that can be used to build real-time data pipelines and streaming applications. It provides a publish-subscribe model for sending and receiving messages, and can handle very high throughput and low latency. By using Kafka as the backbone of the data pipeline, developers can easily scale their stream processing system to handle large volumes of data and handle failover scenarios.

Overall, by using Docker containers for the individual workflow steps and Apache Kafka for the data pipeline, developers can create a stream processing architecture that is both scalable and reliable. This architecture can be used for a wide range of use cases, including real-time analytics, event-driven architectures, and data integration.
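
As a hedged illustration of this pattern, the sketch below shows one workflow step written in the StreamZero-sx style from the Developer Guide. It is packaged in its own container, and its only coupling to the next step is the normalized_events topic; a second container would register its own process() handler on that topic in exactly the same way. Topic and field names are placeholders.

import json
from StreamZero_sx.core import app
from StreamZero_sx.utils import sx_producer


def process(message):
    # Workflow step 1: normalize the raw record coming from the input topic.
    event = {"id": message["id"], "value": float(message["value"])}
    # Hand the normalized record to the next workflow step via Kafka.
    sx_producer.send(topic="normalized_events",
                     value=json.dumps(event).encode("utf-8"))


app.process = process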

Below is the high-level architecture diagram of StreamZero SX:

streamzero_sx_architecture

Required Infrastructure

The following infrastructure components are required for a StreamZero SX installation:

  • Apache Kafka: Serves as the backbone to pass events and operational data within a StreamZero SX installation.
  • PostgreSQL: Used as the database for the StreamZero SX Management Application.
  • Consul: The configuration store used by the StreamZero SX platform; it is also used by the services to store their configurations.
  • MinIO: Provides the platform-internal storage for scripts and assets used by the services.
  • Elasticsearch: Used as a central store for all operational data, making the data easily searchable.
  • Kibana: Used to view and query the data stored in Elasticsearch.
  • StreamZero Management UI: The main UI used for all activities on the StreamZero SX platform.
  • StreamZero FX-Router: The Router container is responsible for listening to events flowing through the system and forwarding them to the appropriate micro-services that you create.
  • StreamZero FX-Executor: The Executor container(s) is where the code gets executed.