Our tutorial, titled “The Role of Event-Time Analysis Order in Data Streaming”, will be presented next week at the 14th ACM International Conference on Distributed and Event-Based Systems (DEBS). We have recorded the tutorial, and you can find the videos at the following links:
Part 1: https://youtu.be/SW_WS6ULsdY
Part 2: https://youtu.be/bq3ECNvPwOU
You can find the slides, as well as the code examples, here. The slides are also available on SlideShare (here).
Abstract:
The data streaming paradigm was introduced around the year 2000 to overcome the limitations of the traditional store-then-process paradigm found in relational databases (DBs). In contrast to DBs’ “first-the-data-then-the-query” approach, data streaming applications build on the “first-the-query-then-the-data” alternative. More concretely, data streaming applications do not rely on storage to first persist data and later query it, but rather build on continuous single-pass analysis in which incoming streams of data are processed on the fly and result in continuous streams of outputs.
In contrast with traditional batch processing, data streaming applications require the user to reason about an additional dimension in the data: event-time. Numerous models have been proposed in the literature to reason about event-time, each with different guarantees and trade-offs. Since it is not always clear which of these models is appropriate for a particular application, this tutorial studies the relevant concepts and compares the available options. This study can be highly relevant for people working with data streaming applications, both researchers and industrial practitioners.
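To give a flavor of what reasoning about event-time involves, here is a minimal, library-free Python sketch (not taken from the tutorial’s code examples) of tumbling event-time windows driven by a watermark: events carry their own timestamps, may arrive out of order, and a window’s result is emitted only once the watermark passes its end. The names and constants (WINDOW_SIZE, MAX_DELAY, stream_counts) are illustrative assumptions, not part of the tutorial.

```python
# Illustrative sketch only: tumbling event-time windows with a watermark.
# Events are (event_time, value) pairs that may arrive out of order.

from collections import defaultdict

WINDOW_SIZE = 10   # seconds of event time per tumbling window (assumed)
MAX_DELAY = 5      # assumed bound on out-of-order arrival, in seconds

def window_start(ts):
    """Map an event timestamp to the start of its tumbling window."""
    return (ts // WINDOW_SIZE) * WINDOW_SIZE

def stream_counts(events):
    """Consume (event_time, value) pairs in arrival order and yield
    (window_start, count) once the watermark passes the window end."""
    open_windows = defaultdict(int)   # window start -> running count
    watermark = float("-inf")

    for event_time, _value in events:
        open_windows[window_start(event_time)] += 1
        # Watermark heuristic: assume no event older than
        # (max event time seen so far - MAX_DELAY) will still arrive.
        watermark = max(watermark, event_time - MAX_DELAY)
        # Emit and close every window whose end the watermark has passed.
        for start in sorted(w for w in open_windows if w + WINDOW_SIZE <= watermark):
            yield start, open_windows.pop(start)

# Out-of-order arrivals: the event with timestamp 4 arrives after 12.
arrivals = [(1, "a"), (3, "b"), (12, "c"), (4, "d"), (15, "e"), (27, "f")]
for start, count in stream_counts(arrivals):
    print(f"window [{start}, {start + WINDOW_SIZE}): {count} events")
```

Running the sketch prints counts for the windows [0, 10) and [10, 20) only after the watermark has moved past their ends; the last window stays open because no later event has advanced the watermark far enough. Different event-time models essentially differ in how such completeness decisions are made and what they guarantee, which is exactly what the tutorial compares.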