## Troubleshooting
Creating declarative data pipelines like this can quickly become complex, so you need a way to debug them. A few tools help with this.
Wherever you write MongoDB expressions that are not used to actually query the database, you can use the `$trace` operator. It can be wrapped around another expression. The result is the same, but as a side effect some tracing is written to the log. The Java logger for this is `net.pincette.mongo.expressions`. Its level is `INFO` by default.
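For example, in a `$set` stage you might wrap a computation as sketched below. This is a minimal sketch: the `price` and `quantity` fields are hypothetical, and the shape assumes the wrapped expression is given as the value of `$trace`, following the "wrapped around another expression" description above.

```json
{
  "$set": {
    "total": {
      "$trace": { "$multiply": ["$price", "$quantity"] }
    }
  }
}
```

The stage still sets `total` to the product; the traced value only appears in the log as a side effect.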
The same technique is available for JSLT scripts, where you have the custom function `trace`, which can also be wrapped around another expression. The Java logger for this is `net.pincette.json.streams`. Its level is `INFO` by default.
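A minimal JSLT sketch, assuming `trace` takes the wrapped expression as its argument; the `price` and `quantity` input fields are again hypothetical:

```jslt
{
  "total": trace(.price * .quantity)
}
```

The script produces the same output with or without the call; only the logging differs.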
With the custom MongoDB aggregation pipeline stage `$trace` you get the contents of whatever goes through it. If you set its value to `null`, the entire message is traced. Alternatively, you can provide a MongoDB expression to show only a piece of it. The Java logger for this is `net.pincette.mongo.streams`. Its level is `INFO` by default.
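Both forms are sketched below; the field reference `$customer.id` is only an illustration. In practice you would drop a single `$trace` stage into the pipeline at the point you want to inspect:

```json
[
  { "$trace": null },
  { "$trace": "$customer.id" }
]
```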
The custom MongoDB aggregation pipeline stage `$probe` emits the number of messages it has seen per minute to a Kafka topic. If the topics in your data pipeline are partitioned, you should combine this with a grouping pipeline as shown above. You can give your probes a name in order to distinguish the various places where you've put them.
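A sketch of such a stage follows. The field names `name` and `topic` are assumptions based on the description above, not confirmed by this section; consult the stage reference for the exact shape:

```json
{
  "$probe": {
    "name": "after-aggregation",
    "topic": "probes"
  }
}
```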
If you have doubts about how the applications are spread across the running instances, you can inspect the logs, provided the log level is at least `INFO`. You will also see which instance is the leader. The MongoDB collection where the built applications are saved, which is set with the `mongodb.collection` configuration entry, also contains a document for each running instance and one for the leader. Make sure there is a TTL index on the `aliveAt` field with an expiration period that is longer than the `keepAliveInterval` and `leaderInterval` settings.
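Such an index can be created with MongoDB's `createIndexes` database command. In this sketch the collection name `build` stands in for whatever `mongodb.collection` is set to, and the 60-second expiration is an arbitrary example; choose a value higher than your `keepAliveInterval` and `leaderInterval`:

```json
{
  "createIndexes": "build",
  "indexes": [
    {
      "key": { "aliveAt": 1 },
      "name": "aliveAt_ttl",
      "expireAfterSeconds": 60
    }
  ]
}
```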