BEST PAPER AWARD at DEBS 2017

I am very happy to share that our paper Maximizing Determinism in Stream Processing Under Latency Constraints, joint work with Nikos Zacheilas, Vana Kalogeraki, Yiannis Nikolakopoulos, Marina Papatriantafilou and Philippas Tsigas, received the BEST PAPER AWARD at the 11th ACM International Conference on Distributed Event-Based Systems (DEBS 2017)!

ABSTRACT

The problem of coping with the demands of determinism and meeting latency constraints is challenging in distributed data stream processing systems that have to process high-volume data streams that arrive from different unsynchronized input sources. In order to deterministically process the streaming data, they need mechanisms that synchronize the order in which tuples are processed by the operators. On the other hand, achieving real-time response in such a system requires a careful trade-off between determinism and low-latency performance. We build on a recently proposed approach to handle data exchange and synchronization in stream processing, namely ScaleGate, which comes with guarantees for determinism and an efficient lock-free implementation, enabling high scalability. Considering the challenge and trade-offs implied by real-time constraints, we propose a system which comprises (a) a novel data structure called Slack-ScaleGate (SSG), along with its algorithmic implementation; SSG enables us to guarantee the deterministic processing of tuples as long as they are able to meet their latency constraints, and (b) a method to dynamically tune the maximum amount of time that a tuple can wait in the SSG data structure, relaxing the determinism guarantees when needed, in order to satisfy the latency constraints. Our detailed experimental evaluation, using a traffic monitoring application deployed in the city of Dublin, illustrates the working and benefits of our approach.
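To give a rough feel for the trade-off described in the abstract, here is a minimal single-threaded sketch in Python. It is not the SSG data structure from the paper (which is lock-free and tunes its slack dynamically); the class name `SlackMerger`, its methods, and the fixed `slack` parameter are hypothetical and exist only for illustration. It also assumes that each source delivers tuples in timestamp order and that tuple timestamps and `now` are measured on the same clock. A tuple is released in deterministic order once every source has reached its timestamp, or is forced out once it has waited longer than the slack.

```python
import heapq


class SlackMerger:
    """Toy merge of timestamp-sorted streams with a bounded waiting time."""

    def __init__(self, num_sources, slack):
        # Newest timestamp seen so far from each source.
        self.latest = [float("-inf")] * num_sources
        # Assumed fixed bound on how long a tuple may wait (hypothetical parameter).
        self.slack = slack
        # Min-heap of (timestamp, source, payload), so the oldest tuple is on top.
        self.pending = []

    def add(self, source, timestamp, payload):
        """Buffer a tuple; each source is assumed to call this in timestamp order."""
        self.latest[source] = timestamp
        heapq.heappush(self.pending, (timestamp, source, payload))

    def get_ready(self, now):
        """Return tuples that are safe to emit or whose slack has expired."""
        # No source can still deliver a tuple older than this value.
        watermark = min(self.latest)
        out = []
        while self.pending:
            ts, source, payload = self.pending[0]
            safe = ts <= watermark
            expired = now - ts >= self.slack
            if not (safe or expired):
                break
            heapq.heappop(self.pending)
            out.append((ts, source, payload, "ordered" if safe else "forced"))
        return out


merger = SlackMerger(num_sources=2, slack=5)
merger.add(0, 1, "a")
merger.add(0, 3, "b")
early = merger.get_ready(now=3)   # source 1 is silent, nothing is safe yet
merger.add(1, 2, "c")
ordered = merger.get_ready(now=4)  # tuples up to timestamp 2 are now safe
forced = merger.get_ready(now=8)   # "b" has waited for the full slack
```

In this sketch a "forced" tuple is where determinism is relaxed: it leaves the buffer even though a slower source could still deliver an older tuple.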

 

Posted in Uncategorized

Lecture about data streaming in IoT and Big Data Analysis (PhD course held at Mälardalens högskola – MDH)

I recently participated in a PhD course at Mälardalens högskola (MDH), presenting what data streaming is and how (and why) it connects to the IoT and Big Data analytics. You can find the slides here.

Posted in Data Streaming, Presentation, Research, Teaching

ScaleJoin journal – IEEE Transactions on Big Data

I am happy to share with you that our ScaleJoin paper has been accepted for publication in the IEEE Transactions on Big Data!

This work extends our previous conference submission in two directions. First, we discuss how ScaleJoin can be used to join streams on both time-based and tuple-based windows. Second, we implement and evaluate ScaleJoin on the Xeon Phi coprocessor, based on the Intel® Many Integrated Core (Intel MIC) architecture, which allows for a scalability study of up to 220 physical threads.
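For readers new to the two window types, the difference can be sketched with a small sequential two-stream join in Python. This is not ScaleJoin's parallel algorithm; the function `window_join` and its parameters are hypothetical names used only for this illustration, and the input is assumed to be already merged in timestamp order. With a time-based window, a tuple is compared against opposite-side tuples that are at most `window_size` time units older; with a tuple-based window, each side simply keeps its `window_size` most recent tuples.

```python
from collections import deque


def window_join(tuples, predicate, window_size, time_based=True):
    """Join two streams given as (timestamp, side, value), side being 0 or 1."""
    windows = (deque(), deque())  # one window of (timestamp, value) per side
    results = []
    for ts, side, value in tuples:
        other = windows[1 - side]
        if time_based:
            # Drop opposite-side tuples more than window_size time units old.
            while other and ts - other[0][0] > window_size:
                other.popleft()
        # Compare the incoming tuple with everything left in the other window.
        for _, other_value in other:
            if predicate(value, other_value):
                results.append((ts, value, other_value))
        own = windows[side]
        own.append((ts, value))
        if not time_based and len(own) > window_size:
            # Tuple-based: keep only the window_size most recent tuples per side.
            own.popleft()
    return results


stream = [(1, 0, "x"), (2, 1, "x"), (10, 0, "x"), (11, 1, "x")]
same = lambda a, b: a == b
by_time = window_join(stream, same, window_size=5, time_based=True)
by_count = window_join(stream, same, window_size=5, time_based=False)
```

On this input the time-based window discards the old tuples before the later ones arrive, while the tuple-based window still holds them, so the two variants produce different numbers of matches.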

Abstract:

The inherently large and varying volumes of information generated in large scale systems demand near real-time processing of data streams. In this context, data streaming is imperative for data-intensive processing infrastructures. Stream joins, the streaming counterpart of database joins, compare tuples coming from different streams and constitute one of the most important and expensive data streaming operators. Algorithmic implementations of stream joins have to be capable of efficiently processing bursty and rate-varying data streams in a deterministic and skew-resilient fashion. To leverage the design of modern multicore architectures, scalability and parallelism need to be addressed also in the algorithmic design.

In this paper we present ScaleJoin, an algorithmic construction for deterministic and parallel stream joins that guarantees all the above properties, thus filling in a gap in the existing state-of-the-art. Key to the novelty of ScaleJoin is the ScaleGate data structure and its lock-free implementation. ScaleGate facilitates concurrent data exchange and balances independent actions among processing threads; enabling fine-grain parallelism and deterministic processing. It allows ScaleJoin to run on an arbitrary number of processing threads, evenly sharing the overall comparisons run in parallel and achieving disjoint and skew-resilient high processing throughput and low processing latency.

Posted in Concurrent Data Structures, Data Streaming, Research, ScaleGate