Streams

A stream is an aggregation pipeline with stages that transform JSON messages. They can consume messages from other streams or Kafka topics. They can produce messages to other streams that are connected to it or to Kafka topics. The transformations are MongoDB aggregation pipeline stages. The following is a simple example that filters out messages based on a field in them.

---
application: "my-app"
version: "1.0"
parts:
  - type: "stream"
    name: "my-stream"
    fromTopic: "my-messages"
    toTopic: "my-messages-filtered"
    pipeline:
      - $match:
          shouldFilter: true

When pipeline stages are large it is more convenient to put them in a separate file. A file can contain just one stage or an array of stages. This example moves the stage to another file.

---
application: "my-app"
version: "1.0"
parts:
  - type: "stream"
    name: "my-stream"
    fromTopic: "my-messages"
    toTopic: "my-messages-filtered"
    pipeline:
      - "filter.yml"

Where the file filter.yml is then like this:

$match:
  shouldFilter: true

Supported Fields

Field Mandatory Description
fromCollection Exclusive with fromTopic and fromStream The name of a MongoDB collection. The _id fields in the documents will become the message ID. They are converted to strings if they aren't strings already.
fromStream Exclusive with fromTopic and fromCollection The name of the stream to which this stream will be connected.
fromTopic Exclusive with fromStream and fromCollection The name of the Kafka topic to which this stream will be connected as a consumer.
name Yes The part name of the stream. Other parts can connect to this stream with that name. It should be unique within the application.
pipeline No An array of pipeline stages. These are either objects or relative filenames, in which case the stage is loaded from there.
toCollection No The name of the MongoDB collection to which the messages are sent.
toString No A boolean field that, when the toTopic field is present, will cause the JSON messages to be written as strings.
toTopic No The name of the Kafka topic to which this stream will be connected as a producer.
type Yes The value is always stream.

Supported Aggregation Stages

These are described on the aggregation pipeline stages page.