Abstract and slides for the “Performance Modeling of Stream Joins” paper (DEBS 17)

I just uploaded the slides I used to present the paper “Performance Modeling of Stream Joins” at the DEBS 17 conference. You can find them here.

Abstract
Streaming analysis is widely used in a variety of environments, from cloud computing infrastructures up to the network’s edge. In these contexts, accurate modeling of streaming operators’ performance enables fine-grained prediction of applications’ behavior without the need of costly monitoring. This is of utmost importance for computationally-expensive operators like stream joins, that observe throughput and latency very sensitive to rate-varying data streams, especially when deterministic processing is required.
In this paper, we present a modeling framework for estimating the throughput and the latency of stream join processing. The model is presented in an incremental step-wise manner, starting from a centralized non-deterministic stream join and expanding up to a deterministic parallel stream join. The model describes how the dynamics of throughput and latency are influenced by the number of physical input streams, as well as by the amount of parallelism in the actual processing and the requirement for determinism. We present an experimental validation of the model with respect to the actual implementation. The proposed model can provide insights that are catalytic for understanding the behavior of stream joins against different system deployments, with special emphasis on the influences of determinism and parallelization.

Posted in Uncategorized

BEST PAPER AWARD at DEBS 2017

I am very happy to share that our paper “Maximizing Determinism in Stream Processing Under Latency Constraints”, a joint work between Nikos Zacheilas, Vana Kalogeraki, Yiannis Nikolakopoulos, Marina Papatriantafilou, Philippas Tsigas and me, received the BEST PAPER AWARD at the 11th ACM International Conference on Distributed Event-Based Systems (DEBS 2017)!!!

Abstract

The problem of coping with the demands of determinism and meeting latency constraints is challenging in distributed data stream processing systems that have to process high volume data streams that arrive from different unsynchronized input sources. In order to deterministically process the streaming data, they need mechanisms that synchronize the order in which tuples are processed by the operators. On the other hand, achieving real-time response in such a system requires careful tradeoff between determinism and low latency performance. We build on a recently proposed approach to handle data exchange and synchronization in stream processing, namely ScaleGate, which comes with guarantees for determinism and an efficient lock-free implementation, enabling high scalability. Considering the challenge and trade-offs implied by real-time constraints, we propose a system which comprises (a) a novel data structure called Slack-ScaleGate (SSG), along with its algorithmic implementation; SSG enables us to guarantee the deterministic processing of tuples as long as they are able to meet their latency constraints, and (b) a method to dynamically tune the maximum amount of time that a tuple can wait in the SSG data structure, relaxing the determinism guarantees when needed, in order to satisfy the latency constraints. Our detailed experimental evaluation using a traffic monitoring application deployed in the city of Dublin, illustrates the working and benefits of our approach.
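To give a rough feel for the trade-off the abstract describes, here is a minimal single-threaded sketch in Python. It is not the paper's SSG (which is lock-free and concurrent), and every name in it (`SlackMergeBuffer`, `max_wait`, `add`, `poll`) is made up for illustration. It assumes each source delivers tuples in timestamp order. A tuple is released deterministically once every source has reached its timestamp; if a tuple has waited `max_wait` time units, it is released anyway (together with any buffered tuples that precede it), and flagged as not deterministic.

```python
import heapq


class SlackMergeBuffer:
    """Illustrative sketch only: merges tuples from several sources in
    timestamp order, but never holds a tuple longer than max_wait."""

    def __init__(self, num_sources, max_wait):
        # Latest timestamp seen from each source (none seen yet).
        self.latest = [float("-inf")] * num_sources
        self.max_wait = max_wait
        self.heap = []  # entries: (timestamp, seq, arrival_time, payload)
        self.seq = 0    # tie-breaker so payloads are never compared

    def add(self, source, timestamp, payload, now):
        self.latest[source] = max(self.latest[source], timestamp)
        heapq.heappush(self.heap, (timestamp, self.seq, now, payload))
        self.seq += 1

    def poll(self, now):
        """Return (timestamp, payload, deterministic) for released tuples."""
        # Tuples up to the watermark can no longer be preceded by new ones.
        watermark = min(self.latest)
        cutoff = watermark
        # Any tuple that exhausted its slack pushes the cutoff forward.
        for ts, _, arrival, _ in self.heap:
            if now - arrival >= self.max_wait:
                cutoff = max(cutoff, ts)
        released = []
        while self.heap and self.heap[0][0] <= cutoff:
            ts, _, _, payload = heapq.heappop(self.heap)
            released.append((ts, payload, ts <= watermark))
        return released


buf = SlackMergeBuffer(num_sources=2, max_wait=5)
buf.add(0, 10, "a", now=0)
first = buf.poll(now=1)    # source 1 is silent, slack not exhausted
buf.add(1, 7, "b", now=2)
second = buf.poll(now=2)   # both sources have reached timestamp 7
third = buf.poll(now=5)    # "a" has waited 5 time units
print(first, second, third)
```

In this sketch a larger `max_wait` keeps more of the output deterministic at the cost of latency, which is the knob the abstract says is tuned dynamically.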

 

Posted in Uncategorized

Lecture about data streaming in IoT and Big Data Analysis (PhD course held at Mälardalens högskola – MDH)

I recently took part in a PhD course at Mälardalens högskola (MDH), where I presented what data streaming is and how (and why) it connects to the IoT and Big Data analytics. You can find the slides here.

Posted in Data Streaming, Presentation, Research, Teaching