Monday, September 10, 2018

Exactly Once is NOT Exactly the Same: Article Analysis

Introduction

I decided to analyze an article describing some interesting details of the stream processing exactly-once. The thing is that sometimes the authors misunderstand important terms. The analysis of the article just will clarify many aspects and details so revealing illogicalities and oddities allows you to fully experience the concepts and meaning.

Let’s get started.

Analysis

Everything starts very well:

Distributed event stream processing has become an increasingly hot topic in the area of Big Data. Notable Stream Processing Engines (SPEs) include Apache Storm, Apache Flink, Heron, Apache Kafka (Kafka Streams), and Apache Spark (Spark Streaming). One of the most notable and widely discussed features of SPEs is their processing semantics, with “exactly-once” being one of the most sought after and many SPEs claiming to provide “exactly-once” processing semantics.

Meaning that data processing is extremely important bla-bla-bla and the topic under discussion is exactly-once processing. Let us discuss it.

There exists a lot of misunderstanding and ambiguity, however, surrounding what exactly “exactly-once” is, what it entails, and what it really means when individual SPEs claim to provide it.

Indeed, it is very important to understand what it is. To do this, it would be nice to give a correct definition before the lengthy reasoning. And who am I to give such damn sensible advice?